-----
-----

Laboratory for Linguistics
and Computation

-----

Brandeis University logo
Brandeis University


Mildly Context Sensitive Parsing

There is increasing consensus on the necessity of formalisms more powerful than CFGs to account for certain phenomena that are characteristic of Natural Languages (NLs). Typical natural languages constructions that require context sensitive power are:

  • reduplication, leading to languages of the form { ww | w in Sigma* }
  • multiple agreements, corresponding to languages of the form { anbncn | n > 0 }, { anbncndn | n > 0 }, etc.
  • crossed agreements, as modeled by { anbmcndm | n > 0 },

Mildly context-sensitive grammars (Joshi85-tag) have been proposed as capable of modeling the above mentioned phenomena. The most restricted mildly context-sensitive grammar formalisms are LIG, HGs, TAGs, and CCGs which are weakly equivalent. They correspond to the class L2 in Weir's hierarchy, a level-2 control grammar. They were proven to have recognition algorithms with time complexity O(n6) considering the size of the grammar a constant factor ( Satta:94, Rajasekaran:96).

More expressive power can be gained in a mildly context-sensitive model increasing the progression of the same mechanism ( e.g. in a grammar model adding another level of grammar control and a machine model allowing another level of embeddedness in the stack). This increases the time (and space) complexity but it still is in P: for a level-k control grammar (or a level-k of stack embeddedness, or ordering) the recognition problem is in O(n3·2K-1) (Weir:88, Weir:92).

We investigate grammatical (a type of Indexed Grammar) and automaton models ( constrained two-stack automata) which are semi-linear and have mildly context-sensitive power. We have developed a parsing algorithm which has time complexity O(n5) for this grammatical model. Our next steps will consider the possible relationship of the class of mildly context-sensitive grammars with the formalisms that we are proposing.

Questions or comments: José Castaño


Bibliography

Culy 1985
Culy, C. (1985).
The complexity of the vocabulary of bambara.
Linguistics and Philosophy 8, 345-351.

Gazdar 1988
Gazdar, G. (1988).
Applicability of indexed grammars to natural languages.
In U. Reyle and C. R. (eds.) (Eds.), Natural Language Parsing and Linguistic Theories, pp. 69-94. D. Reidel, Dordrecht.

Joshi 1985
Joshi, A. (1985).
Tree adjoining grammars: How much context-sensitivity is required to provide reasonable structural description?
In D. Dowty, L. Karttunen, and A. Zwicky (Eds.), Natural language processing: psycholinguistic, computational and theoretical perspectives, pp. 206-250. New York: Chicago University Press.

Rajasekaran 1996
Rajasekaran, S. (1996).
Tree-adjoining language parsing in o(n6) time.
SIAM J. of Computation 25.

Satta 1994
Satta, G. (1994).
Tree-adjoining grammar parsing and boolean matrix multiplication.
Computational linguistics 20, No. 2.

Shieber 1985
Shieber, S. (1985).
Evidence against the context-freeness of natural language.
Linguistics and Philosophy 8, 333-343.

Weir 1988
Weir, D. (1988).
Characterizing mildly context-sensitive grammar formalisms.
Ph. D. thesis, University of Pennsylvania.

Weir 1992
Weir, D. J. (1992).
A geometric hierarchy beyond context-free languages.
Theoretical Computer Science 104(2), 235-261.

Back to LLC Home Page