A New Parsing Algorithm for Combinatory Categorial Grammar

We present a polynomial-time parsing algorithm for CCG, based on a new decomposition of derivations into small, shareable parts. Our algorithm has the same asymptotic complexity, O(n^6), as a previous algorithm by Vijay-Shanker and Weir (1993), but is easier to understand, implement, and prove correct.


Introduction
Combinatory Categorial Grammar (CCG; Steedman and Baldridge (2011)) is a lexicalized grammar formalism that belongs to the class of so-called mildly context-sensitive formalisms, as characterized by Joshi (1985). CCG has been successfully used for a wide range of practical tasks including data-driven parsing (Clark and Curran, 2007), wide-coverage semantic construction (Bos et al., 2004; Kwiatkowski et al., 2010; Lewis and Steedman, 2013), and machine translation (Weese et al., 2012).
Several parsing algorithms for CCG have been presented in the literature. Earlier proposals show running time exponential in the length of the input string (Pareschi and Steedman, 1987; Tomita, 1988). A breakthrough came with the work of Vijay-Shanker and Weir (1990) and Vijay-Shanker and Weir (1993), who reported the first polynomial-time algorithm for CCG parsing. To this day, this algorithm, which we shall refer to as the V&W algorithm, remains the only published polynomial-time parsing algorithm for CCG. However, we are not aware of any practical parser for CCG that actually uses it. We speculate that there are two main reasons for this: First, some authors have argued that linguistic resources available for CCG can be covered with context-free fragments of the formalism (Fowler and Penn, 2010), for which more efficient parsing algorithms can be given. Second, the V&W algorithm is considerably more complex than parsing algorithms for equivalent mildly context-sensitive formalisms, such as Tree-Adjoining Grammar (Joshi and Schabes, 1997), and is quite hard to understand, implement, and prove correct.
The V&W algorithm is based on a special decomposition of CCG derivations into smaller parts that can then be shared among different derivations. This sharing is the key to the polynomial runtime. In this article we build on the same idea, but develop an alternative polynomial-time algorithm for CCG parsing. The new algorithm is based on a different decomposition of CCG derivations, and is arguably simpler than the V&W algorithm in at least two respects: First, the new algorithm uses only three basic steps, against the nine basic steps of the V&W parser. Second, the correctness proof of the new algorithm is simpler than the one reported by Vijay-Shanker and Weir (1993). The new algorithm runs in time O(n^6), where n is the length of the input string, the same as the V&W parser.
We organize our presentation as follows. In Section 2 we introduce CCG and the central notion of derivation trees. In Section 3 we start with a simple but exponential-time parser for CCG, from which we derive our polynomial-time parser in Section 4. Section 5 further simplifies the algorithm and proves its correctness. We then provide a discussion of our algorithm and possible extensions in Section 6. Section 7 concludes the article.

Combinatory Categorial Grammar
We assume basic familiarity with CCG in general and the formalism of Weir and Joshi (1988) in particular. In this section we set up our terminology and notation. A CCG has two main parts: a lexicon that associates words with categories, and rules that specify how categories can be combined into other categories. Together, these components give rise to derivations such as the one shown in Figure 1.

Lexicon
The CCG lexicon is a finite set of word-category pairs w := X. Categories are built from a finite set of atomic categories and two binary operators: forward slash (/) and backward slash (\). Atomic categories represent the syntactic types of complete constituents; they include a distinguished category S for complete sentences. A constituent with the complex category X/Y represents a function that seeks a constituent of category Y immediately to its right and returns a constituent of category X; similarly, X\Y represents a function that seeks a Y to its left. We treat slashes as left-associative operators and omit unnecessary parentheses. By this convention, every category X can be written as X = A |_m X_m ... |_1 X_1, where m ≥ 0, A is an atomic category called the target of X, each |_i stands for one of the two slashes, and the |_i X_i are slash-category pairs called the arguments of X. We view these arguments as being arranged in a stack, with |_1 X_1 at the top and |_m X_m at the bottom. Thus another way of writing the category X above is as X = Aα, where α is a (possibly empty) stack of m arguments. The number m is called the arity of X; we denote it by ar(X).
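The stack-based view of categories can be made concrete in code. The following is an illustrative sketch (not part of the formalism): an atomic category is a string, and a complex category is a triple (function, slash, argument), with the outermost triple holding the top of the argument stack.

```python
# Illustrative encoding of the stack-based view of categories (a sketch,
# not part of the formalism): atomic category = string; complex category
# = (function, slash, argument), outermost triple = top of the stack.
# For example, the category S\H/A/F is encoded as:
c = ((('S', '\\', 'H'), '/', 'A'), '/', 'F')

def target(cat):
    """The atomic category A at the bottom of X = A |_m X_m ... |_1 X_1."""
    while isinstance(cat, tuple):
        cat = cat[0]
    return cat

def ar(cat):
    """The arity of X: the number m of arguments on its stack."""
    n = 0
    while isinstance(cat, tuple):
        cat, n = cat[0], n + 1
    return n
```

For the category above, `target(c)` is `'S'` and `ar(c)` is `3`.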

Rules
The rules of CCG are directed versions of (generalized) functional composition. There are two forms, forward rules and backward rules:

  X/Y   Y |_d Y_d ... |_1 Y_1   ⇒   X |_d Y_d ... |_1 Y_1       (forward)
  Y |_d Y_d ... |_1 Y_1   X\Y   ⇒   X |_d Y_d ... |_1 Y_1       (backward)

(The formalism of Weir and Joshi (1988) also allows lexicon entries for the empty string, a feature that we ignore here.) Every rule is obtained by choosing a specific degree d ≥ 0 and specific directions (forward or backward) for each of the slashes |_i, while X, Y and the Y_i are variables ranging over categories. Thus for every degree d ≥ 0 there are 2^d forward rules and 2^d backward rules. The rules of degree 0 are called application rules. In contexts where we refer to both application and composition, we use the latter term for "proper" composition rules of degree d > 0. Note that in most of this article we ignore additional rules required for linguistic analysis with CCG, in particular type-raising and substitution. We briefly discuss these rules in Section 6.
Every CCG grammar restricts itself to a finite set of rules, but each such rule may give rise to infinitely many rule instances. A rule is instantiated by substituting concrete categories for the variables. For example, the derivation in Figure 1 contains an instance of forward composition (>1). Note that we overload the double arrow to denote not only rules but also rule instances. Given a rule instance, the category that instantiates the pattern X/Y (forward) or X\Y (backward) is called the primary input, and the category that instantiates the pattern Y |_d Y_d ... |_1 Y_1 is called the secondary input. Adopting our stack-based view, each rule can be understood as an operation on argument stacks: pop |Y off the stack of the primary input; pop the |_i Y_i off the stack of the secondary input and push them onto the stack of the primary input (preserving their order).
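This stack-based reading of a rule can be sketched in code. The following is an illustrative sketch, not the paper's implementation, using a hypothetical tuple encoding: an atomic category is a string, a complex category is a triple (function, slash, argument), and the outermost triple is the top of the argument stack.

```python
def compose(primary, secondary, d):
    """Instance of a forward rule of degree d:
         X/Y   Y|_dY_d...|_1Y_1   =>   X|_dY_d...|_1Y_1
    Pop /Y off the primary input's stack, pop the top d arguments off the
    secondary input's stack, and push them onto the primary (in order)."""
    x, slash, y = primary
    if slash != '/':
        raise ValueError('forward rule needs /Y on top of the primary stack')
    popped = []
    for _ in range(d):                     # pop |_1Y_1, ..., |_dY_d (top first)
        if not isinstance(secondary, tuple):
            raise ValueError('secondary input has fewer than d arguments')
        secondary, s, a = secondary
        popped.append((s, a))
    if secondary != y:
        raise ValueError('secondary input does not match Y|_dY_d...|_1Y_1')
    for s, a in reversed(popped):          # push back, |_dY_d first
        x = (x, s, a)
    return x
```

For instance, application (d = 0) of S/B to B yields S, and composition (d = 1) of S/B with B/C yields S/C.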
The formalism of Weir and Joshi (1988) allows restricting the valid instances of individual rules. As with our treatment of additional combinatory rules, in most of this article we ignore these rule restrictions; but see the discussion in Section 6.
Figure 2: Recursive definition of derivation trees. Nodes labeled with primary input categories are shaded.

Derivation Trees
The set of derivation trees of a CCG can be formally defined as in Figure 2. There and in the remainder of this article we use β and other symbols from the beginning of the Greek alphabet to denote a (possibly empty) stack of arguments. Derivation trees consist of unary branchings and binary branchings: unary branchings (drawn with dotted lines) correspond to lexicon entries; binary branchings correspond to (valid instances of) composition rules. The yield of a derivation tree is the left-to-right sequence of its leaves. The type of a derivation tree is the category at its root.

CKY-Style Parsing Algorithm
As the point of departure for our own work, we now introduce a straightforward, CKY-style parsing algorithm for CCGs. It is a simple generalization of the algorithm presented by Shieber et al. (1995), which is restricted to grammars with rules of degree 0 or 1. As in that article, we specify our algorithm in terms of a grammatical deduction system.

Deduction System
We are given a CCG and a string w = w_1 ... w_n to be parsed, where each w_i is a lexical token. As a general notation, for integers i, j with 0 ≤ i ≤ j ≤ n we write w[i, j] to denote the substring w_{i+1} ... w_j of w. As usual, we take w[i, i] to be the empty string.
Items
The CKY-style algorithm uses a logic with items of the form [X, i, j], where X is a category and i, j are fencepost positions in w. The intended interpretation of such an item is to assert that we can build a derivation tree with yield w[i, j] and type X. The goal of the algorithm is the construction of the item [S, 0, n], which asserts the existence of a derivation tree for the entire input string. (Recall that S is the distinguished category for sentences.)

Axioms and Inference Rules
The steps of the algorithm are specified by means of inference rules over items. These rules implement the recursive definition of derivation trees given in Figure 2. The construction starts with axioms of the form [X, i, i+1] where w_{i+1} := X is a lexicon entry; these items assert the existence of a unary-branching derivation tree of the form shown in the left of Figure 2 for each lexical token w_{i+1}. There is one inference rule for every forward rule (application or composition):

  [X/Y, i, j]   [Yβ, j, k]
  ------------------------   where X/Y Yβ ⇒ Xβ is a rule instance       (1)
  [Xβ, i, k]

A symmetrical rule is used for backward application and composition. However, here and in the remainder of the article we only specify the forward version of each rule and leave the backward version implicit.
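The deduction system above can be realized as a chart parser. The following is a minimal sketch under some simplifying assumptions: a hypothetical tuple encoding of categories (atomic category = string; complex category = (function, slash, argument), outermost triple on top of the stack), only forward rules, and a bounded composition degree; the backward versions are symmetric, as noted above.

```python
from collections import defaultdict

def combine(left, right, max_degree=2):
    """Yield the results of forward application/composition of the primary
    input `left` with the secondary input `right`, up to a bounded degree."""
    if not isinstance(left, tuple) or left[1] != '/':
        return
    x, _, y = left
    if right == y:                       # application: X/Y  Y  =>  X
        yield x
    popped, sec = [], right
    for _ in range(max_degree):          # composition of degree d >= 1
        if not isinstance(sec, tuple):
            break
        sec, s, a = sec
        popped.append((s, a))
        if sec == y:                     # X/Y  Y|_d..|_1  =>  X|_d..|_1
            res = x
            for s2, a2 in reversed(popped):
                res = (res, s2, a2)
            yield res

def cky(words, lexicon, start='S'):
    """CKY-style deduction over items [X, i, j] (exponential in general)."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):        # axioms [X, i, i+1]
        chart[i, i + 1] |= set(lexicon[w])
    for span in range(2, n + 1):         # rule over [X/Y, i, j], [Yb, j, k]
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):
                for l in chart[i, j]:
                    for r in chart[j, k]:
                        chart[i, k] |= set(combine(l, r))
    return start in chart[0, n]          # goal item [S, 0, n]
```

A toy lexicon such as `{'a': {('S', '/', 'B')}, 'b': {('B', '/', 'C')}, 'c': {'C'}}` lets `cky(['a', 'b', 'c'], ...)` succeed via composition followed by application.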

Correctness and Runtime
The soundness and completeness of the CKY-style algorithm can be proved by induction on the number of inferences and the number of nodes in a derivation tree, respectively. It is not hard to see that, in the general case, the algorithm uses an amount of time and space exponential in the length of the input string, n. This is because rule (1) may be used to grow the arity of primary input categories up to some linear function of n, resulting in exponentially many categories. Note that this is only possible if there are rules with degree 2 or more. For grammars restricted to rules with degree 0 or 1, such as those considered by Shieber et al. (1995), the runtime of the algorithm is cubic in n. This restricted class of grammars has only context-free generative power, while the power of general CCG is beyond that of context-free grammars (Vijay-Shanker and Weir, 1994).

A Polynomial-Time Algorithm
We now introduce our polynomial-time algorithm. This algorithm uses the same axioms and the same goal item as the CKY-style algorithm, but features new items and new inference rules.

New Items
In a first step, in order to avoid runtime exponential in n, we restrict our item set to categories whose arity is bounded by some grammar constant c_G; the exact choice of this constant will be discussed in Section 5. With the restricted item set, the new algorithm behaves like the old one as long as the arity of categories does not exceed c_G. However, rule (1) alone is no longer complete: derivations with categories whose arity exceeds c_G can no longer be simulated. To remedy this deficiency, we introduce a new type of item to implement a specific decomposition of long derivations into smaller pieces. Consider a derivation t of the form shown in Figure 3(a). Note that the yield of t is w[i, j]. The derivation consists of two parts, named t′ and c; these share a common node with a category of the form X/Y. Now assume that c has the special property that none of the combinatory rules that it uses pops the argument stack of the category X. This means that c, after popping the argument /Y, may push new arguments and pop them again, but may never "touch" X. We call a fragment with this special property a derivation context. (A formal definition will be given in Section 5.2.) The special property of c is useful because it implies that c can be carried out for any choice of X.
To be more specific, let us write β for the (possibly empty) sequence of arguments that c pushes onto the argument stack of X in place of /Y. We shall refer to /Y as the bridging argument and to the sequence β as the excess of c. Suppose now that we replace t′ by a derivation tree with the same yield but with a type X′/Y where X′ ≠ X. Then because c does not touch X′, we obtain another valid derivation tree with the same yield as t; the type of this tree will be X′β.
For the combination with c, the internal structure of t′ is of no importance; the only important information is the extent of the yield of t′ and the identity of the bridging argument /Y. In terms of our deduction system, this can be expressed as follows: the derivation context c can be combined with any tree t′ that is associated with an item of the form [X/Y, i′, j′], where X is any category. Similarly, the internal structure of c is of no importance either, as long as the argument stack of the category X remains untouched. It suffices to record the following: the extent of the yield of t, specified in terms of the positions i and j; the extent of the yield of t′, specified in terms of the positions i′ and j′; the bridging argument /Y; and the excess β.
We represent these pieces of information in a new type of item of the form [/Y, β, i, i′, j′, j]. The intended interpretation of these items is to assert that, for any choice of X, if we can build a derivation tree t′ with yield w[i′, j′] and type X/Y, then we can also build a derivation tree t with yield w[i, j] and type Xβ. We also use items [\Y, β, i, i′, j′, j] with a backward slash, with a similar meaning. Like items that represent derivation trees, our items for derivation contexts are arity-restricted: in an item [|Y, β, i, i′, j′, j] we require ar(Yβ) ≤ c_G. As we will see in Section 5, these restricted items suffice to simulate all derivations of a CCG. Furthermore, this can be done in time polynomial in n, because our encoding allows sharing of the same items among several derivations.

Figure 4: A sample derivation of the grammatical deduction system of Section 4. Inference (2) triggers a new context item from a tree item; inference (3) reuses the tree item (as indicated by the arrow), recombining it with the (modified) context item.
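The arity restrictions on the two item types can be sketched as a validity check. This is an illustrative sketch with a hypothetical flat-string encoding of categories, where the arity of a category is the number of its top-level slashes and the excess β is a tuple of argument strings.

```python
def ar(cat):
    """Arity of a flat-string category such as 'S\\H/A/F' (assuming no
    parenthesized complex arguments, for brevity)."""
    return sum(ch in '/\\' for ch in cat)

def valid(item, c_G):
    """Check the arity restriction on the two item types of the text."""
    if len(item) == 3:                   # tree item [X, i, j]: ar(X) <= c_G
        x, i, j = item
        return ar(x) <= c_G and 0 <= i <= j
    # context item [|Y, beta, i, i', j', j]: ar(Y beta) <= c_G
    bridge, beta, i, i_, j_, j = item
    return ar(bridge[1:]) + len(beta) <= c_G and i <= i_ and j_ <= j
```

For example, with c_G = 3, the tree item `('S\\H/A/F', 2, 5)` and the context item `('/F', ('\\G', '/B'), 2, 2, 5, 6)` are both valid, while a tree item with a category of arity 4 is not.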

New Inference Rules
In our parsing algorithm, context items are introduced whenever the composition of two categories whose arities are bounded by c_G would result in a category whose arity exceeds this bound:

  [X/Y, i, j]   [Yβ, j, k]
  ------------------------   where X/Y Yβ ⇒ Xβ is a rule instance and ar(Xβ) > c_G       (2)
  [/Y, β, i, i, j, k]

The new rule has the same antecedents as rule (1), but rather than extending the derivation asserted by the first antecedent [X/Y, i, j], which is not possible because of the arity bound, it triggers a new derivation context, asserted by the item [/Y, β, i, i, j, k]. Further applications and compositions will extend the new context, and only when the excess of this context has become sufficiently small will it be recombined with the derivation that originally triggered it. This is done by the following rule:

  [X/Y, i′, j′]   [/Y, β, i, i′, j′, j]
  -------------------------------------       (3)
  [Xβ, i, j]

Note that this rule (like all rules in the deduction system) is only defined on valid items; in particular, it only fires if the arity of the category Xβ is bounded by c_G. The remaining rules of the algorithm parallel the three rules that we have introduced so far, but take items that represent derivation contexts rather than derivation trees as their first antecedents. First, rule (4) extends a derivation context in the same way as rule (1) extends a derivation tree:

  [|Y, β/Z, i, i′, j′, j]   [Zγ, j, k]
  ------------------------------------   where X/Z Zγ ⇒ Xγ is a rule instance and ar(Yβγ) ≤ c_G       (4)
  [|Y, βγ, i, i′, j′, k]

Rule (5) is the obvious correspondent of rule (2): it triggers a new context when the antecedent context cannot be extended because of the arity bound.

  [|Y, β/Z, i, i′, j′, j]   [Zγ, j, k]
  ------------------------------------   where X/Z Zγ ⇒ Xγ is a rule instance and ar(Yβγ) > c_G       (5)
  [/Z, γ, i, i, j, k]

Finally, and parallel to rule (3), we need a rule that recombines a context with the context that originally triggered it. As it will turn out, we only need this in cases where the triggered context has no excess:

  [|_1 Y, β|_2 Z, i″, i′, j′, j″]   [|_2 Z, ε, i, i″, j″, j]
  ----------------------------------------------------------       (6)
  [|_1 Y, β, i, i′, j′, j]

Sample Derivation
We now illustrate our algorithm on a toy grammar; its lexicon can be read off the axioms in Figure 4. The start symbol is S. The grammar allows all instances of application and all instances of composition with degree bounded by 2. We let c_G = 3 (as explained later in Section 5.2). A derivation of our deduction system on the input string w_1 ... w_8 is given in Figure 4. We start by applying rule (1) twice (once forward, once backward) to obtain the item [S\H/A/F, 2, 5]. Combining this item with the axiom [F\G/B, 5, 6] is not possible using rule (1), as this would result in a category with arity 4, exceeding the arity bound. We therefore use rule (2) to trigger the context item [/F, \G/B, 2, 2, 5, 6]. Successively, we use rule (4) twice to obtain the item [/F, ε, 1, 2, 5, 7]. At this point we use rule (3) (with β = ε) to recombine the context item with the tree item that originally triggered it; this yields the item [S\H/A, 1, 7]. Note that the recombination effectively retrieves the portion of the stack that was below the argument /F when the context item was triggered. Double application of rule (1) produces the goal item [S, 0, 8].
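The recombination step of rule (3) can be sketched on items directly. This is an illustrative sketch with a hypothetical flat-string encoding of categories (the excess is a tuple of argument strings), using the item values from the sample derivation.

```python
def recombine(tree_item, context_item):
    """Rule (3): from [X/Y, i', j'] and [/Y, beta, i, i', j', j]
    derive [X beta, i, j]. Categories are flat strings like 'S\\H/A/F';
    the excess beta is a tuple of argument strings like ('\\G', '/B')."""
    cat, i2, j2 = tree_item
    bridge, beta, i, i_, j_, j = context_item
    if not (cat.endswith(bridge) and (i2, j2) == (i_, j_)):
        raise ValueError('items do not match')
    # pop the bridging argument, then push the excess (here possibly empty)
    return (cat[: -len(bridge)] + ''.join(beta), i, j)
```

With the items from the derivation above, `recombine(('S\\H/A/F', 2, 5), ('/F', (), 1, 2, 5, 7))` yields `('S\\H/A', 1, 7)`.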

Runtime Analysis
We now turn to an analysis of the runtime complexity of our algorithm. We first consider runtime complexity with respect to the length of the input string, n. The runtime is dominated by the number of instantiations of rule (6), which involves two context items as antecedents. By inspection of this rule, we see that each instantiation is determined by the six positions i, i′, i″, j, j′, j″, so the number of possible instantiations is bounded by n^6. Therefore we conclude that the algorithm runs in time O(n^6).
We now consider runtime complexity with respect to the size of the input grammar. Here the runtime is dominated by the number of instantiations of rules (1)-(5). For example, rule (5) combines items [|Y, β/Z, i, i′, j′, j] and [Zγ, j, k]. By our restrictions on items, both the arity of Yβ/Z and the arity of Zγ are upper-bounded by the constant c_G. Now recall that every category X can be written as X = Aα for some atomic category A and stack of arguments α. Let A be the set of atomic categories in the input grammar, and let L be the set of all arguments occurring in any category of the lexicon. By a result of Vijay-Shanker and Weir (1994, Lemma 3.1), every argument that may occur in a derived category occurs in L. Then the number of possible instantiations of rule (5), as well as of rules (1)-(4), and hence the runtime of the algorithm, is bounded by a polynomial in |A| and |L| whose degree depends on c_G. Note that both A and L may grow with the grammar size. As we will see in Section 5.2, the constant c_G also depends on the grammar size. This means that the worst-case runtime complexity of our parser is exponential in the size of the input grammar. We will return to this point in Section 7.

Correctness
In this section we prove the correctness of our parsing algorithm. In order to simplify the proofs, we start by simplifying our algorithm, at the cost of making it less efficient: We remove the rules for extending trees and contexts, rule (1) and rule (4).
We conflate the rules for triggering contexts, rule (2) and rule (5), into the single rule

  [Yβ, j, k]
  --------------------   where X/Y Yβ ⇒ Xβ is a rule instance       (7)
  [/Y, β, i, i, j, k]

This rule blindly guesses the extension of the triggering tree or context (specified by the positions i and j), rather than waiting for a corresponding item to be derived.
The simplified algorithm is specified in Figure 5. We now argue that this algorithm and the algorithm from Section 4 parse exactly the same derivation trees, although they use different parsing strategies. First, we observe that rule (1) in the old algorithm can be simulated by a combination of other rules in the simplified algorithm: from [Yβ, j, k], rule (7) derives the context item [/Y, β, i, i, j, k], which rule (3) then recombines with [X/Y, i, j] to yield [Xβ, i, k]. Furthermore, the simplified algorithm no longer needs rule (4), whose role is now taken over by rule (7). To see this, recall that rule (4) extends an existing context whenever the composition of two categories results in a new category whose arity does not exceed c_G. In contrast, rule (7) always triggers a new context c, even if the result of the composition of c with some existing context satisfies the above arity restriction. Despite the difference in the adopted strategy, these two rules are equivalent in terms of stack content, leading to the same derivation trees.

Definitions
We introduce some additional terminology and notation that we will use in the proofs. For a derivation tree t and a node u of t, we write t[u] to denote the category at u, and we write t|_u to denote the subtree of t at u. Formally, t|_u is the restriction of t to node u and all of its descendants. Each subtree of a derivation tree is another derivation tree.
Definition 1 Let t be a derivation tree with root r. Then t has signature [X, i, j] if (1) the yield of t is w[i, j], and (2) the type of t is X, that is, t[r] = X.
Note that while we use the same notation for signatures as for items, the signature of a derivation tree is a purely structural concept, whereas an item is an object in the algorithm.
A central concept in our proof is the notion of spine. Recall that a derivation tree consists of unary branchings and binary branchings. In each binary branching, we refer to the two children of the branching's root node as the primary child and the secondary child, depending on which of the two is labeled with the primary and secondary input category of the corresponding rule instance. In Figure 2, the primary children of the root node are shaded.
Definition 2 For a derivation tree t, the spine of t is the unique path that starts at the root node of t and at each node u continues to the primary child of u.
The spine of a derivation tree always ends at a node that is labeled with a category from the lexicon.

Definition 3 Let t be a derivation tree with root r.
A derivation context c is obtained by removing all proper descendants of some node f ≠ r on the spine of t, under the restriction that ar(t[u]) > ar(t[r]) for every node u on the spine properly between f and r.
The node f is called the foot node of c. The yield of c is the pair whose first component is the yield of t to the left of f and whose second component is the yield of t to the right of f. For a derivation context c and a node u of c, we write c[u] to denote the category at u.
Definition 3 formalizes the concept of derivation contexts that we introduced in Section 4.1. First, because f is on the spine and f ≠ r, the category c[f] takes the form X|Y. The arity restriction implies that the category of every node u on the spine properly between f and r takes the form Xβ_u with |β_u| > 0, and that the category at the root takes the form Xβ with |β| ≥ 0. Thus the category X is never exposed in c, except perhaps at r. As we will see in Section 5.4, this property, together with a careful selection of "split nodes", will allow us to decompose derivations into smaller, shareable parts. The basic idea is the same as in the tabulation of pushdown automata (Lang, 1974; Nederhof and Satta, 2004), where the pushdown in our case is the argument stack of the primary input categories along a spine.
The concepts of signature and spine are generalized to derivation contexts as follows: Definition 4 Let c be a derivation context with root node r and foot node f. Then c has signature [|Y, β, i, i′, j′, j] if (1) the yield of c is (w[i, i′], w[j′, j]); and (2) for some X, c[f] = X|Y and c[r] = Xβ.
Definition 5 For a derivation context c, the spine of c is the path from its root node to its foot node.

Grammar Constant
Before we start with the proof as such, we turn to the choice of the grammar constant c_G, which was left pending in previous sections. Recall that we are using c_G as a bound on the arity of X in type 1 items [X, i, j]. Since these items are produced by our axioms from the set of categories in the lexicon, c_G must not be smaller than the maximum arity ℓ of a category in this finite set.
We also use c_G as a bound on the arity of the category Yβ in type 2 items [|Y, β, i, i′, j′, j]. These items are produced by inference rule (7) to simulate instances of composition of the form X/Y Yβ ⇒ Xβ. Here the length of β is bounded by the maximum degree d of a composition rule in the grammar, and ar(Y) is bounded by the maximum arity a of an argument from the (finite) set L of arguments in the lexicon (recall Section 4.4). Therefore c_G cannot be smaller than a + d. Putting everything together, we obtain the condition

  c_G ≥ max{ℓ, a + d}.       (8)

The next lemma will be used in several places later.
Lemma 1 Let c be a derivation context with signature [|Y, β, i, i′, j′, j]. Then ar(Yβ) ≤ c_G.

Proof. Let r and f be the root and the foot node, respectively, of c. From the definition of signature, there must be some X such that c[r] = Xβ and c[f] = X|Y. Now let p be the parent node of f, and assume that the rule used at p is instantiated as X/Y Yβ′ ⇒ Xβ′, so that c[p] = Xβ′. If p = r then β′ = β; otherwise, because of the arity restriction in the definition of derivation contexts (Definition 3), we have |β′| > |β|. Then

  ar(Yβ) ≤ ar(Yβ′) ≤ c_G,

where the right inequality follows from the assumption that X/Y Yβ′ ⇒ Xβ′ is a rule instance of the grammar, and from inequality (8).
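Condition (8) can be computed directly from a lexicon. The following is an illustrative sketch using a hypothetical tuple encoding of categories (atomic category = string; complex category = (function, slash, argument)); only top-level arguments are inspected, for brevity.

```python
def ar(cat):
    """Arity: the number of arguments on the category's stack."""
    n = 0
    while isinstance(cat, tuple):
        cat, n = cat[0], n + 1
    return n

def arguments(cat):
    """Top-level arguments of a category (a simplification; the set L in
    the text collects arguments occurring in lexicon categories)."""
    while isinstance(cat, tuple):
        yield cat[2]
        cat = cat[0]

def grammar_constant(lexicon_categories, d):
    """Smallest c_G satisfying condition (8): c_G >= max(l, a + d),
    where l is the maximum arity of a lexicon category and a is the
    maximum arity of an argument drawn from the lexicon."""
    l = max(ar(c) for c in lexicon_categories)
    a = max((ar(y) for c in lexicon_categories for y in arguments(c)),
            default=0)
    return max(l, a + d)
```

For a lexicon containing S\H/A/F (arity 3, atomic arguments) and maximum composition degree 2, this yields c_G = 3, matching the choice in the sample derivation of Section 4.3.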

Soundness
We start the correctness proof by arguing for the soundness of the deduction system in Figure 5. More specifically, we show that for every item of type 1 there exists a derivation tree with the same signature, and that for every item of type 2 there exists a derivation context with the same signature. The soundness of the axioms is obvious. Rule (7) states that, if we have built a derivation tree t with signature [Yβ, j, k], then we can build a derivation context c with signature [/Y, β, i, i, j, k]. Under the condition that the grammar admits the rule instance X/Y Yβ ⇒ Xβ, this inference is sound; the context can be built as shown in Figure 6. Rule (3) states that, if we have built a derivation tree t′ with signature [X/Y, i′, j′] and a context c with signature [/Y, β, i, i′, j′, j], then we can build a new tree t with signature [Xβ, i, j]. We obtain t by substituting t′ for the foot node of c (see Figure 3(a)).
Rule (6) states that, if we have built a derivation context c_1 with signature [|_1 Y, β|_2 Z, i″, i′, j′, j″] and another context c_2 with signature [|_2 Z, ε, i, i″, j″, j], then we can build a derivation context c with signature [|_1 Y, β, i, i′, j′, j]. We obtain c by substituting c_1 for the foot node of c_2; this is illustrated by Figure 3(b) for the case where the excess of the outer context is ε.

Completeness
In the final part of our correctness proof we now prove the completeness of the deduction system in Figure 5. Specifically we show the following stronger statement: For every derivation tree and for every derivation context with a signature I satisfying the arity bounds for items of Figure 5, the deduction system infers the corresponding item I . From this statement we can immediately conclude that the system constructs the goal item whenever there exists a derivation tree whose yield is the complete input string and whose type is the distinguished category S.
Our proof is by induction on a measure that we call rank. The rank of a derivation tree or context is the number of its non-leaf nodes. Note that this definition implies that the foot node of a context is not counted against its rank. The rank of a tree or context is always at least 1, with rank 1 only realized for derivation trees consisting of a single node.

Base Case
Consider a derivation tree t with signature [X, i, j] and rank(t) = 1. The tree t takes the form shown in the left of Figure 2, and we have j = i + 1 and w_j := X. The item [X, i, j] is then produced by one of the axioms of our deduction system.

Inductive Case
The general idea underlying the inductive case can be stated as follows. We consider a derivation tree or context φ with signature I satisfying the bounds stated in Figure 5 for items of type 1 or 2. We then identify a special node s on φ's spine, which we call the split node. We use s to "split" φ into two parts that are either derivation trees or contexts, that both satisfy the bounds for items of type 1 or 2, and that both have rank smaller than the rank of φ. We then apply the induction hypothesis to obtain two items that can be combined by one of the inference rules of our algorithm, resulting in the desired item I for φ. We first consider the case in which φ is a tree, and later the case in which φ is a context.

Splitting Trees
Consider a derivation tree t with signature [X_0, i, j], root node r, and rank(t) > 1. Then the spine of t consists of at least 2 nodes. Now assume that ar(X_0) ≤ c_G, that is, [X_0, i, j] is a valid item.
Choose the split node s to be the highest (closest to the root) non-root node on the spine for which ar(t[s]) ≤ c_G. Node s always exists, as the arity constraint is satisfied at least for the lowest (farthest from the root) node on the spine, which is labeled with a category from the lexicon.
Consider the subtree t′ = t|_s; thus s is the root node of t′. Because s is a primary node in t, the category at s has at least one argument. We deal here with the case where this category takes the form t[s] = X/Y; the case t[s] = X\Y is symmetrical. Thus the signature of t′ takes the form [X/Y, i′, j′], where i′, j′ are integers with i ≤ i′ < j′ ≤ j. By our choice of s, ar(X/Y) ≤ c_G, and therefore [X/Y, i′, j′] is a valid item. Furthermore, rank(t|_s) < rank(t), as r does not belong to t|_s. We may then use the induction hypothesis to deduce that our algorithm constructs the item [X/Y, i′, j′]. Now consider the context c that is obtained from t by removing all proper descendants of s; thus r is the root node of c and s is its foot node. To see that c is well-defined, note that our choice of s guarantees that ar(t[u]) > ar(t[r]) for every node u that is properly between s and r: if there were a node u such that ar(t[u]) ≤ ar(t[r]), then because of t[r] = X_0 and our assumption that ar(X_0) ≤ c_G, we would have chosen u instead of s. Now let β be the excess of c; then X_0 = t[r] = Xβ. Thus the signature of c takes the form [/Y, β, i, i′, j′, j]. Applying Lemma 1 to c, we get ar(Yβ) ≤ c_G, and therefore [/Y, β, i, i′, j′, j] is also a valid item. Furthermore, rank(c) < rank(t), since node s is counted in rank(t) but not in rank(c). By the induction hypothesis, we conclude that our algorithm constructs the item [/Y, β, i, i′, j′, j].
Finally, we apply the inference rule (3) to the previously constructed items [X/Y, i′, j′] and [/Y, β, i, i′, j′, j]. This yields the item [Xβ, i, j] = [X_0, i, j] for t, as desired.

Splitting Contexts
Consider a derivation context c with signature [/Y, β, i, i′, j′, j], root r, and foot f. (The case where we have \Y instead of /Y can be covered with a symmetrical argument.) From Definition 4 we know that there is a category X such that c[f] = X/Y and c[r] = Xβ, and from the definition of context we know that for every spinal node u that is properly between f and r it holds that ar(c[u]) > ar(c[r]). Now assume that ar(Yβ) ≤ c_G, that is, [/Y, β, i, i′, j′, j] is a valid item. We distinguish two cases below.
Case 1 Suppose that the spine of c consists of exactly 2 nodes. In this case the foot f is the left child of the root r and i = i′. Let f′ be the right sibling of f and consider the subtree t′ = c|_{f′}; thus f′ is the root node of t′. The signature of t′ takes the form [Yβ, j′, j]. By our assumption, ar(Yβ) ≤ c_G, so [Yβ, j′, j] is a valid item. Furthermore, rank(t′) < rank(c), since the root node r is counted in rank(c) but not in rank(t′). Then, by the induction hypothesis, the item [Yβ, j′, j] is constructed by our algorithm. We now apply inference rule (7) to this item; this yields the item [/Y, β, i, i′, j′, j] for c, as required.
Case 2 Suppose that the spine of c consists of more than 2 nodes. This means that there is at least one spinal node that is properly between f and r.
Choose the split node s to be the deepest (farthest from the root) node properly between f and r for which ar(c[s]) = ar(c[r]) + 1. Node s always exists, as the arity constraint is satisfied at least for the primary child of r. This is because the definition of context states that ar(c[u]) > ar(c[r]) for every node u on the spine, and at the same time, no combinatory rule can reduce the arity of its primary input category by more than one unit.
Consider the context c₁ that is obtained by restricting c to node s and all of its descendants; thus s is the root node of c₁ and f is the foot node. To see that c₁ is well-defined, note that our choice of s guarantees that ar(c[u]) > ar(c[s]) for every node u that is properly between f and s. To see this, suppose that there were a node u ≠ s such that ar(c[u]) ≤ ar(c[s]). Since ar(c[s]) = ar(c[r]) + 1 and, by our definition of s, ar(c[u]) ≠ ar(c[s]), we would have ar(c[u]) ≤ ar(c[r]), which cannot be, because in c every node u properly between f and r has arity ar(c[u]) > ar(c[r]).
Because f is a primary node in c, the category at f has at least one argument; call it |₁Y. The node s is a primary node in c as well, so the excess of c₁ takes the form β|₂Z, where |₂Z is the topmost argument of the category at s. Thus the signature of c₁ takes the form [|₁Y, β|₂Z, i'', i', j', j''], where i'', j'' are integers with i ≤ i'' ≤ i' and j' ≤ j'' ≤ j. Applying Lemma 1 to c₁, we get ar(Yβ|₂Z) ≤ c_G, and therefore [|₁Y, β|₂Z, i'', i', j', j''] is a valid item. Finally, we note that rank(c₁) < rank(c), since the root node r is counted in c but not in c₁. By the induction hypothesis we conclude that our algorithm constructs the item [|₁Y, β|₂Z, i'', i', j', j''].
Now consider the context c₂ that is obtained from c by removing all proper descendants of the node s; thus r is the root node of c₂ and s is the foot node. To see that c₂ is well-defined, note that ar(c[u]) > ar(c[r]) for every node u that is properly between s and r, simply because every such node is also properly between f and r. The excess of c₂ is the empty stack ε by our choice of s. Thus the signature of c₂ is [|₂Z, ε, i, i'', j'', j]. We apply Lemma 1 once more, this time to c₂, to show that ar(Z) ≤ c_G, and conclude that [|₂Z, ε, i, i'', j'', j] is also a valid item. Finally, we note that rank(c₂) < rank(c), as the node s is counted in c but not in c₂. By the induction hypothesis we conclude that our algorithm constructs the item [|₂Z, ε, i, i'', j'', j]. We now apply inference rule (6) to the items for c₁ and c₂; this yields the item [/Y, β, i, i', j', j] for c, as required.
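The choice of the split node s in Case 2 can be made concrete with a small sketch. The list-of-arities representation and the function name below are illustrative only, not part of the algorithm proper:

```python
def choose_split_node(spine_arities):
    """Pick the split node s of Case 2.

    spine_arities[k] is the arity ar(c[u]) of the k-th spinal node,
    ordered from the foot (index 0) to the root (last index).
    Returns the index of the deepest node properly between foot and
    root whose arity equals ar(c[r]) + 1.
    """
    root_arity = spine_arities[-1]
    # Every node properly between foot and root has arity > ar(c[r]),
    # and no combinatory rule drops the arity of its primary input by
    # more than one, so a node with arity ar(c[r]) + 1 must exist.
    for k in range(1, len(spine_arities) - 1):  # deepest candidates first
        if spine_arities[k] == root_arity + 1:
            return k
    raise ValueError("not a well-formed derivation context spine")
```

Scanning from the foot upward returns the deepest matching node first, which is exactly the choice that makes the excess of c₂ empty.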

Discussion
We round off the article with a discussion of our algorithm and possible extensions.

Support for Rule Restrictions
As we mentioned in Section 2, the CCG formalism of Weir and Joshi (1988) allows a grammar to impose certain restrictions on valid rule instances. More specifically, for every rule a grammar may restrict (a) the target of the primary input category and/or (b) parts of or the entire secondary input category.³ The algorithm in Figure 5 can be extended to support such rule restrictions. Note that already in its present form, the algorithm only allows inferences that are licensed by valid instances of a given rule. Supporting restrictions on the secondary input category (restrictions of type b) is straightforward, assuming that these restrictions can be tested efficiently. To also support restrictions on the target of the primary input category (restrictions of type a), the items can be extended with an additional component that keeps track of that target category for the corresponding derivation subtree or context. With this information, rule (7) can perform a check against the restrictions specified for the composition rule, and rules (3) and (6) merely need to test whether the target categories of their two antecedents match, and propagate the common target category to the conclusion. This is essentially the same solution as the one adopted in the V&W algorithm.
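As an illustration of this extension, the following sketch (all names hypothetical, with the item's remaining components collapsed into a single tuple) shows how a target component can be carried in the items and checked by the inference rules:

```python
from dataclasses import dataclass

# Hypothetical item representation: the algorithm's items, here
# collapsed into a `payload` tuple, extended with a `target` field
# that records the target of the primary input category.
@dataclass(frozen=True)
class Item:
    payload: tuple
    target: str

def propagate_target(antecedent1, antecedent2):
    """Check used by rules (3) and (6): the target categories of the
    two antecedent items must match; the common target category is
    then propagated to the conclusion."""
    if antecedent1.target != antecedent2.target:
        return None  # the inference step is blocked
    return antecedent1.target

def check_target_restriction(item, allowed_targets):
    """Check used by rule (7): test the tracked target of the primary
    input category against the type (a) restrictions declared for the
    composition rule being applied."""
    return item.target in allowed_targets
```

Since the target is a single category from a finite set, this extra component increases the number of items only by a grammar-dependent constant factor.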

Support for Multi-Modal CCG
The modern version of CCG has abandoned rule restrictions in favor of a new, lexicalized control mechanism in the form of modalities or slash types (Steedman and Baldridge, 2011). However, as shown by Baldridge and Kruijff (2003), every multi-modal CCG can be translated into an equivalent CCG with rule restrictions. The basic idea is to specialize the target of each category and argument for a slash type, and to reformulate the multi-modal rules as rules with restrictions that reference this information. With this simulation, our parsing algorithm can also be used as a parsing algorithm for multi-modal CCG.

Comparison with the V&W Algorithm
As already mentioned in Section 1, apart from the algorithm presented in this article, the only other algorithm known to run in polynomial time in the length of the input string is the one presented by Vijay-Shanker and Weir (1993). At an abstract level, the two algorithms are based on the same basic idea of decomposing CCG derivations into pieces of two different types, one of which spans a portion of the input string that includes a gap. This idea actually underlies several parsing algorithms for equivalent mildly context-sensitive formalisms, such as Tree-Adjoining Grammar (Joshi and Schabes, 1997).
The main difference between the V&W algorithm and the one presented in this article is the use of different decompositions of CCG derivations. In our algorithm we allow the excess of a derivation context to be the empty list of arguments, something that is not possible in the V&W algorithm. There, when an application operation empties the excess β of some context, one is forced to retrieve, in the same elementary step, the portion of the stack placed right below β. This requires the distinction of several possible cases, resulting in four different realizations of the application rule (Vijay-Shanker and Weir, 1993, p. 616). As a consequence, the V&W algorithm uses nine (forward) inference rules, whereas our algorithm in Figure 5 uses only three. Furthermore, some of the inference rules in the V&W algorithm use three antecedent items, while ours use at most two. This results in a runtime complexity of O(n^7) for the V&W algorithm, where n is the length of the input string; however, Vijay-Shanker and Weir (1993) show how their algorithm can be implemented in time O(n^6) at the cost of some extra bookkeeping. In contrast, our algorithm directly runs in time O(n^6).
In our opinion, this proliferation of inference rules, combined with their increased complexity, makes the specification of the V&W parser more difficult to understand and implement, and calls for a more articulated correctness proof.

Support for Additional Types of Rules
Like the V&W algorithm, our algorithm currently only supports (generalized) composition but no other combinatory rules required for linguistic analysis, in particular type-raising and substitution.
Type-raising is a unary rule of the (forward) form X ⇒ T/(T\X), where T is a variable over categories. Under the standard assumption that T\X is limited to a finite set of categories (Steedman, 2000), this rule can be implemented in our algorithm by introducing a new unary inference rule and choosing the constant c_G large enough to accommodate all instances of T\X.
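A minimal sketch of such a unary inference rule, assuming a hypothetical finite table of permitted raised categories (the entries below are illustrative only):

```python
# Finite table of permitted type-raisings, keyed by the input
# category X; the values are the allowed raised categories T/(T\X).
# The table and its entries are illustrative, not part of the
# algorithm as presented in the article.
TYPE_RAISING_TABLE = {
    "NP": ["S/(S\\NP)", "(S\\NP)/((S\\NP)/NP)"],
}

def type_raise(item):
    """Unary inference rule: from a tree item [X, i, j], derive the
    items [T/(T\\X), i, j] for every permitted raised category."""
    category, i, j = item
    for raised in TYPE_RAISING_TABLE.get(category, []):
        yield (raised, i, j)
```

Because the table is finite, the rule adds at most a constant number of new items per span, leaving the asymptotic complexity unchanged.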
Substitution is a binary rule of the (forward) form (X/Y)|Z Y|Z ⇒ X|Z. This rule is easy to implement if both /Y and |Z are stored in the same item. Otherwise, we need to pass |Z to any item storing the /Y. This can be done by changing the second antecedent of rule (6) to allow a single argument |Z instead of the empty excess ε. The price of this change is spurious ambiguity in the derivations of the grammatical deduction system.
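The rule itself is easy to state on a flat encoding of categories. The following sketch (encoding and names illustrative only) applies (X/Y)|Z Y|Z ⇒ X|Z when both arguments are available in the same place:

```python
def forward_substitution(primary, secondary):
    """Apply the substitution rule (X/Y)|Z  Y|Z => X|Z.

    A category is encoded as a pair (target, args), where args is a
    tuple of (slash, category) pairs with the topmost argument last.
    This flat encoding is illustrative only.
    """
    tgt, args = primary
    if len(args) < 2:
        return None
    slash_y, y = args[-2]       # the argument /Y of the primary input
    slash_z, z = args[-1]       # the topmost argument |Z
    if slash_y != "/":
        return None
    y_tgt, y_args = y
    # The secondary input must be Y|Z, for the same slash and Z.
    if secondary != (y_tgt, y_args + ((slash_z, z),)):
        return None
    return (tgt, args[:-2] + ((slash_z, z),))   # the result X|Z
```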

Conclusion
Recently, there has been a surge of interest in the mathematical properties of CCG; see for instance Hockenmaier and Young (2008), Kuhlmann (2009), Fowler and Penn (2010), and Kuhlmann et al. (2010). Following this line of work, this article has revisited the parsing problem for CCG.
Our work, like the polynomial-time parsing algorithm previously discovered by Vijay-Shanker and Weir (1993), is based on the idea of decomposing large CCG derivations into smaller, shareable pieces. Here we have proposed a derivation decomposition different from the one adopted by Vijay-Shanker and Weir (1993). This results in an algorithm which, in our own opinion, is simpler and easier to understand.
Although we have specified only a recognition version of the algorithm, standard techniques can be applied to obtain a derivation forest from our parsing table. This amounts to saving backpointers at each application of an inference rule, linking each newly inferred item to its antecedent items.
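A minimal sketch of this standard technique, assuming items are represented as hashable tuples (all names illustrative):

```python
from collections import defaultdict

# Map each chart item to the list of ways it was derived; each entry
# records the inference rule used and the antecedent items, so the
# derivation forest can be read off the table afterwards.
backpointers = defaultdict(list)

def record(conclusion, rule_name, antecedents):
    """Called whenever an inference rule fires."""
    backpointers[conclusion].append((rule_name, antecedents))

def unpack(item):
    """Enumerate derivations of `item` from the forest.  No cycle
    handling; a real implementation would memoize and guard loops."""
    if not backpointers[item]:      # axiom / lexical item
        yield item
        return
    for rule_name, antecedents in backpointers[item]:
        for subderivations in _product_of_derivations(antecedents):
            yield (rule_name, subderivations)

def _product_of_derivations(items):
    """All combinations of derivations of the given antecedent items."""
    if not items:
        yield ()
        return
    head, tail = items[0], items[1:]
    for derivation in unpack(head):
        for rest in _product_of_derivations(tail):
            yield (derivation,) + rest
```

Recording backpointers adds only constant work per inference step, so the O(n^6) bound on building the table is preserved; the cost of unpacking depends on the number of derivations extracted.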
As observed in Section 4.4, the worst-case runtime of our algorithm is exponential in the size of the grammar. The same holds true for the algorithm of Vijay-Shanker and Weir (1993). We are not aware of any published discussion of this issue, and we therefore leave open the question of whether CCG parsing can be done in polynomial time when the grammar is considered part of the input.