Efficient Parsing for Head-Split Dependency Trees

Head splitting techniques have been successfully exploited to improve the asymptotic runtime of parsing algorithms for projective dependency trees, under the arc-factored model. In this article we extend these techniques to a class of non-projective dependency trees, called well-nested dependency trees with block-degree at most 2, which has been previously investigated in the literature. We define a structural property that allows head splitting for these trees, and present two algorithms that improve over the runtime of existing algorithms at no significant loss in coverage.


Introduction
Much of the recent work on dependency parsing has been aimed at finding a good balance between accuracy and efficiency. At one end of the spectrum, Eisner (1997) showed that the highest-scoring projective dependency tree under an arc-factored model can be computed in time $O(n^3)$, where $n$ is the length of the input string. Later work has focused on making projective parsing viable under more expressive models (Carreras, 2007; Koo and Collins, 2010).
At the same time, it has been observed that for many standard data sets, the coverage of projective trees is far from complete (Kuhlmann and Nivre, 2006), which has led to an interest in parsing algorithms for non-projective trees. While non-projective parsing under an arc-factored model can be done in time $O(n^2)$ (McDonald et al., 2005), parsing with more informed models is intractable (McDonald and Satta, 2007). This has led several authors to investigate 'mildly non-projective' classes of trees, with the goal of achieving a balance between expressiveness and complexity (Kuhlmann and Nivre, 2006).
In this article we focus on a class of mildly non-projective dependency structures called well-nested dependency trees with block-degree at most 2. This class was first introduced by Bodirsky et al. (2005), who showed that it corresponds, in a natural way, to the class of derivation trees of lexicalized tree-adjoining grammars (Joshi and Schabes, 1997). While there are linguistic arguments against the restriction to this class (Maier and Lichte, 2011; Chen-Main and Joshi, 2010), Kuhlmann and Nivre (2006) found that it has excellent coverage on standard data sets. Assuming an arc-factored model, well-nested dependency trees with block-degree $\leq 2$ can be parsed in time $O(n^7)$ using the algorithm of Gómez-Rodríguez et al. (2011). Recently, Pitler et al. (2012) have shown that if an additional restriction called 1-inherit is imposed, parsing can be done in time $O(n^6)$, without any additional loss in coverage on standard data sets.
Standard context-free parsing methods, when adapted to the parsing of projective trees, provide $O(n^5)$ time complexity. The $O(n^3)$ time result reported by Eisner (1997) has been obtained by exploiting more sophisticated dynamic programming techniques that 'split' dependency trees at the position of their heads, in order to save bookkeeping. Splitting techniques have also been exploited to speed up parsing time for other lexicalized formalisms, such as bilexical context-free grammars and head automata (Eisner and Satta, 1999). However, to our knowledge no attempt has been made in the literature to extend these techniques to non-projective dependency parsing.
In this article we leverage the central idea from Eisner's algorithm and extend it to the class of well-nested dependency trees with block-degree at most 2. We introduce a structural property, called head-split, that allows us to split these trees at the positions of their heads. The property is restrictive, meaning that it reduces the class of trees that can be generated. However, we show that the restriction to head-split trees comes at no significant loss in coverage, and it allows parsing in time $O(n^6)$, an asymptotic improvement of one order of magnitude over the algorithm by Gómez-Rodríguez et al. (2011) for the unrestricted class. We also show that restricting the class of head-split trees by imposing the already mentioned 1-inherit property does not cause any additional loss in coverage, and that parsing for the combined class is possible in time $O(n^5)$, one order of magnitude faster than the algorithm by Pitler et al. (2012) for the 1-inherit class without the head-split condition.
The above results have consequences also for the parsing of other related formalisms, such as the already mentioned lexicalized tree-adjoining grammars. This will be discussed in the final section.

Head Splitting
To introduce the basic idea of this article, we briefly discuss in this section two well-known algorithms for computing the set of all projective dependency trees for a given input sentence: the naïve, CKY-style algorithm, and the improved algorithm with head splitting, in the version of Eisner and Satta (1999). (Eisner (1997) describes a slightly different algorithm.)
CKY parsing. The CKY-style algorithm works in a pure bottom-up way, building dependency trees by combining subtrees. Assuming an input string $w = a_1 \cdots a_n$, $n \geq 1$, each subtree $t$ is represented by means of a finite signature $[i, j, h]$, called item, where $i, j$ are the boundary positions of $t$'s span over $w$ and $h$ is the position of $t$'s root. This is the only information we need in order to combine subtrees under the arc-factored model. Note that the number of possible signatures is $O(n^3)$.
The main step of the algorithm is displayed in Figure 1(a). Here we introduce the graphical convention, used throughout this article, of representing a subtree by a shaded area, with a horizontal line indicating the spanned fragment of the input string, and of marking the position of the head by a bullet. The illustrated step attaches a tree with signature $[k, j, d]$ as a dependent of a tree with signature $[i, k, h]$. This step involves the five indices $i, k, j, h, d$, so there can be $O(n^5)$ instantiations of it, and this is also the running time of the algorithm.
Eisner's algorithm. Eisner and Satta (1999) improve over the CKY algorithm by reducing the number of position records in an item. They do this by 'splitting' each tree into a left and a right fragment, so that the head is always placed at one of the two boundary positions of a fragment, as opposed to being placed at an internal position. In this way items need only two indices. Left and right fragments can be processed independently, and merged afterwards. Let us consider a right fragment $t$ with head $a_h$. Attachment at $t$ of a right dependent tree with head $a_d$ is now performed in two steps. The first step attaches a left fragment with head $a_d$, as in Figure 1(b). This results in a new type of fragment/item that has both heads $a_h$ and $a_d$ placed at its boundaries. The second step attaches a right fragment with head $a_d$, as in Figure 1(c). The number of possible instantiations of these steps, and the asymptotic runtime of the algorithm, is $O(n^3)$.
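The effect of head splitting on the runtime can be illustrated with a minimal sketch of the cubic-time dynamic program for projective trees. The sketch below follows the common textbook formulation with 'complete' and 'incomplete' spans, which is an assumption on our part and not the exact item set of Eisner and Satta (1999); the function name `eisner_best_score`, the artificial root at position 0, and the score matrix layout are all hypothetical choices made for illustration. It only returns the score of the best tree, not the tree itself.

```python
def eisner_best_score(score):
    """Best projective tree score under an arc-factored model.

    score[h][d] is the score of the arc a_h -> a_d; position 0 is an
    artificial root (an assumption of this sketch).
    """
    n = len(score)
    neg = float("-inf")
    # Direction index: 0 = head at the right end t, 1 = head at the left end s.
    complete = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    incomplete = [[[neg, neg] for _ in range(n)] for _ in range(n)]
    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            # Join two complete fragments that face each other, then add an arc.
            best = max(complete[s][r][1] + complete[r + 1][t][0]
                       for r in range(s, t))
            incomplete[s][t][0] = best + score[t][s]
            incomplete[s][t][1] = best + score[s][t]
            # Extend an incomplete fragment with a complete one on the far side.
            complete[s][t][0] = max(complete[s][r][0] + incomplete[r][t][0]
                                    for r in range(s, t))
            complete[s][t][1] = max(incomplete[s][r][1] + complete[r][t][1]
                                    for r in range(s + 1, t + 1))
    return complete[0][n - 1][1]
```

Every table entry is indexed by two positions and filled with one loop over a split point, which gives the three nested loops behind the $O(n^3)$ bound; the head never needs a third index because it always sits at a boundary of the fragment.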
In this article we extend the splitting technique to the class of well-nested dependency trees with block-degree at most 2. This amounts to defining a factorization for these trees into fragments, each with its own head at one of its boundary positions, along with some unfolding of the attachment operation into intermediate steps. While for projective trees head splitting can be done without any loss in coverage, for the extended class head splitting turns out to be a proper restriction. The empirical relevance of this will be discussed in Section 7.

Head-Split Trees
In this section we introduce the class of well-nested dependency trees with block-degree at most 2, and define the subclass of head-split dependency trees.

Preliminaries
For non-negative integers $i, j$ we write $[i, j]$ to denote the set $\{i, i+1, \ldots, j\}$; when $i > j$, $[i, j]$ is the empty set. For a string $w = a_1 \cdots a_n$, where $n \geq 1$ and each $a_i$ is a lexical token, and for $i, j \in [0, n]$ with $i \leq j$, we write $w_{i,j}$ to denote the substring $a_{i+1} \cdots a_j$ of $w$; $w_{i,i}$ is the empty string.
A dependency tree $t$ over $w$ is a directed tree whose nodes are a subset of the tokens $a_i$ in $w$ and whose arcs encode a dependency relation between two nodes. We write $a_i \to a_j$ to denote the arc $(a_i, a_j)$ in $t$; here, the node $a_i$ is the head, and the node $a_j$ is the dependent. If each token $a_i$, $i \in [1, n]$, is a node of $t$, then $t$ is called complete. Sometimes we write $t_{a_i}$ to emphasize that tree $t$ is rooted in node $a_i$. If $a_i$ is a node of $t$, we also write $t[a_i]$ to denote the subtree of $t$ composed of node $a_i$ as its root and all of its descendant nodes.
The nodes of $t$ uniquely identify a set of maximal substrings of $w$, that is, substrings separated by tokens not in $t$. The sequence of such substrings, ordered from left to right, is the yield of $t$, written $\mathrm{yd}(t)$. Let $a_i$ be some node of $t$. The block-degree of $a_i$ in $t$, written $\mathrm{bd}(a_i, t)$, is defined as the number of string components of $\mathrm{yd}(t[a_i])$. The block-degree of $t$, written $\mathrm{bd}(t)$, is the maximal block-degree of its nodes. Tree $t$ is non-projective if $\mathrm{bd}(t) > 1$. Tree $t$ is well-nested if, for each node $a_i$ of $t$ and for every pair of outgoing dependencies $a_i \to a_{d_1}$ and $a_i \to a_{d_2}$, the string components of $\mathrm{yd}(t[a_{d_1}])$ and $\mathrm{yd}(t[a_{d_2}])$ do not 'interleave' in $w$. More precisely, it is required that, if some component of $\mathrm{yd}(t[a_{d_i}])$, $i \in [1, 2]$, occurs in $w$ in between two components $s_1, s_2$ of $\mathrm{yd}(t[a_{d_j}])$, $j \in [1, 2]$ and $j \neq i$, then all components of $\mathrm{yd}(t[a_{d_i}])$ occur in between $s_1, s_2$.
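The two definitions above can be checked mechanically on small examples. The following is a minimal sketch, assuming a tree is encoded as a dictionary `heads` that maps each token position to the position of its head (`None` for the root); this encoding and all function names are hypothetical and chosen only for illustration. Interleaving of two sibling subtrees is detected by labelling their positions and looking for an alternation of the form a b a b.

```python
def subtree_positions(heads, root):
    """Sorted positions of the subtree rooted at `root`."""
    nodes, frontier = {root}, [root]
    while frontier:
        v = frontier.pop()
        for d, h in heads.items():
            if h == v:
                nodes.add(d)
                frontier.append(d)
    return sorted(nodes)

def blocks(positions):
    """Maximal runs of adjacent positions, as (first, last) pairs."""
    out = []
    for p in positions:
        if out and out[-1][1] == p - 1:
            out[-1][1] = p
        else:
            out.append([p, p])
    return [tuple(b) for b in out]

def block_degree(heads):
    """Maximal number of yield components over all subtrees."""
    return max(len(blocks(subtree_positions(heads, v))) for v in heads)

def is_well_nested(heads):
    """True if no two sibling subtrees interleave."""
    for h in heads:
        deps = [d for d, hd in heads.items() if hd == h]
        for x in range(len(deps)):
            for y in range(x + 1, len(deps)):
                a = set(subtree_positions(heads, deps[x]))
                b = set(subtree_positions(heads, deps[y]))
                labels = []
                for p in sorted(a | b):
                    lab = "a" if p in a else "b"
                    if not labels or labels[-1] != lab:
                        labels.append(lab)
                # Four or more alternating runs means the pattern a b a b occurs.
                if len(labels) >= 4:
                    return False
    return True
```

For instance, with `heads = {1: None, 2: 4, 3: 1, 4: 1}` the subtree rooted at position 4 covers positions 2 and 4, so the tree has block-degree 2, and it is well-nested because the sibling subtree at position 3 lies entirely inside that gap.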
Throughout this article, whenever we consider a dependency tree $t$ we always implicitly assume that $t$ is over $w$, that $t$ has block-degree at most 2, and that $t$ is well-nested. Let $t_{a_i}$ be such a tree, with $\mathrm{bd}(a_i, t_{a_i}) = 2$. We call the portion of $w$ in between the two substrings of $\mathrm{yd}(t_{a_i})$ the gap of $t_{a_i}$, denoted by $\mathrm{gap}(t_{a_i})$.
Example 1 Figure 2 schematically depicts a well-nested tree $t_{a_h}$ with block-degree 2; we have marked the root node $a_h$ and its dependent nodes $a_{d_i}$. For each node $a_{d_i}$, a shaded area highlights $t[a_{d_i}]$. We have $\mathrm{bd}(a_h, t_{a_h}) = \mathrm{bd}(a_{d_1}, t_{a_h}) = \mathrm{bd}(a_{d_4}, t_{a_h}) = 2$ and $\mathrm{bd}(a_{d_2}, t_{a_h}) = \mathrm{bd}(a_{d_3}, t_{a_h}) = 1$.

The Head-Split Property
We say that a dependency tree $t$ has the head-split property if it satisfies the following condition. Let $a_h \to a_d$ be any dependency in $t$ with $\mathrm{bd}(a_h, t) = \mathrm{bd}(a_d, t) = 2$. Whenever $\mathrm{gap}(t[a_d])$ contains $a_h$, it must also contain $\mathrm{gap}(t[a_h])$. Intuitively, this means that if $\mathrm{yd}(t[a_d])$ 'crosses over' the lexical token $a_h$ in $w$, then $\mathrm{yd}(t[a_d])$ must also 'cross over' $\mathrm{gap}(t[a_h])$.
Example 2 Dependency $a_h \to a_{d_1}$ in Figure 3 violates the head-split condition, since $\mathrm{yd}(t[a_{d_1}])$ crosses over the lexical token $a_h$ in $w$, but does not cross over $\mathrm{gap}(t[a_h])$. The remaining outgoing dependencies of $a_h$ trivially satisfy the head-split condition, since the child nodes have block-degree 1.
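The head-split condition can be tested directly from its definition. The sketch below assumes the same hypothetical encoding as a `heads` dictionary from token positions to head positions (`None` for the root), and represents a gap as an inclusive pair of positions; both choices are assumptions made for illustration only.

```python
def yield_blocks(heads, root):
    """Yield components of the subtree at `root`, as [first, last] pairs."""
    nodes, frontier = {root}, [root]
    while frontier:
        v = frontier.pop()
        for d, h in heads.items():
            if h == v:
                nodes.add(d)
                frontier.append(d)
    out = []
    for p in sorted(nodes):
        if out and out[-1][1] == p - 1:
            out[-1][1] = p
        else:
            out.append([p, p])
    return out

def gap(heads, v):
    """Inclusive position range of the gap of the subtree at v, or None."""
    b = yield_blocks(heads, v)
    return (b[0][1] + 1, b[1][0] - 1) if len(b) == 2 else None

def has_head_split_property(heads):
    for d, h in heads.items():
        if h is None:
            continue
        gap_d, gap_h = gap(heads, d), gap(heads, h)
        if gap_d is None or gap_h is None:
            continue  # the condition only concerns nodes of block-degree 2
        crosses_head = gap_d[0] <= h <= gap_d[1]
        crosses_gap = gap_d[0] <= gap_h[0] and gap_h[1] <= gap_d[1]
        if crosses_head and not crosses_gap:
            return False
    return True
```

As a small example, in `{4: None, 2: 4, 1: 2, 3: 1, 5: 2}` the subtree at position 1 covers positions 1 and 3, so its gap contains the head at position 2 but not the gap of that head (position 4), and the check fails, in the spirit of Example 2.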
Let $t_{a_h}$ be a dependency tree satisfying the head-split property and with $\mathrm{bd}(a_h, t_{a_h}) = 2$. We specify below a construction that 'splits' $t_{a_h}$ with respect to the position of the head $a_h$ in $\mathrm{yd}(t_{a_h})$, resulting in two dependency trees sharing the root $a_h$ and having all of the remaining nodes forming two disjoint sets. Furthermore, the resulting trees have block-degree at most 2. Let $\mathrm{yd}(t_{a_h}) = \langle w_{i,j}, w_{p,q} \rangle$ and assume that $a_h$ is placed within $w_{i,j}$. (A symmetric construction should be used in case $a_h$ is placed within $w_{p,q}$.) The mirror image of $a_h$ with respect to $\mathrm{gap}(t_{a_h})$, written $m(t_{a_h})$, is the largest integer in $[p, q]$ such that there are no dependencies linking nodes in $w_{i,h-1}$ and nodes in $w_{p,m(t_{a_h})}$ and there are no dependencies linking nodes in $w_{h,j}$ and nodes in $w_{m(t_{a_h}),q}$. It is not hard to see that such an integer always exists, since $t_{a_h}$ is well-nested.
We classify every dependent $a_d$ of $a_h$ as being an 'upper' dependent or a 'lower' dependent of $a_h$, according to the following conditions:
The upper tree of $t_{a_h}$ is the dependency tree rooted in $a_h$ and composed of all dependencies $a_h \to a_d$ in $t_{a_h}$ with $a_d$ an upper dependent of $a_h$, along with all subtrees $t_{a_h}[a_d]$ rooted in those dependents. Similarly, the lower tree of $t_{a_h}$ is the dependency tree rooted in $a_h$ and composed of all dependencies $a_h \to a_d$ in $t_{a_h}$ with $a_d$ a lower dependent of $a_h$, along with all subtrees $t_{a_h}[a_d]$ rooted in those dependents. As a general convention, in this article we write $t_{U,a_h}$ and $t_{L,a_h}$ to denote the upper and the lower trees of $t_{a_h}$, respectively. Note that, in some degenerate cases, the set of lower or upper dependents may be empty; then $t_{U,a_h}$ or $t_{L,a_h}$ consists of the root node $a_h$ only.
The importance of the head-split property can be informally explained as follows. Let $a_h \to a_d$ be a dependency in $t_{a_h}$. When we take apart the upper and the lower trees of $t_{a_h}$, the entire subtree $t_{a_h}[a_d]$ ends up in either of these two fragments. This allows us to represent upper and lower fragments for some head independently of the other, and to freely recombine them. More formally, our algorithms will make use of the following three properties, stated here without any formal proof:
P1 Trees $t_{U,a_h}$ and $t_{L,a_h}$ are well-nested, have block-degree $\leq 2$, and satisfy the head-split property.
P2 Trees $t_{U,a_h}$ and $t_{L,a_h}$ have their head $a_h$ always placed at one of the boundaries in their yields.
P3 Let $t'_{U,a_h}$ and $t''_{L,a_h}$ be the upper and lower trees of distinct trees $t'_{a_h}$ and $t''$

Parsing Items
Let $w = a_1 \cdots a_n$, $n \geq 1$, be the input string. We need to compactly represent trees that span substrings of $w$ by recording only the information that is needed to combine these trees into larger trees during the parsing process. We do this by associating each tree with a signature, called item, which is a tuple $[i, j, p, q, h]_X$, where $h \in [1, n]$ identifies the token $a_h$, $i, j$ with $0 \leq i \leq j \leq n$ identify a substring $w_{i,j}$, and $p, q$ with $j < p \leq q \leq n$ identify a substring $w_{p,q}$. We also use the special setting $p = q = -$.
The intended meaning is that each item repres-
The two cases $h < i$ and $h > j + 1$ above will be used when the root node $a_h$ of $t_{a_h}$ has not yet collected all of its dependents. Note that $h \in \{i, j+1\}$ is not used in the definition of item. This is meant to avoid different items representing the same dependency tree, which is undesired for the specification of our algorithm. As an example, items $[i, j, -, -, i+1]_X$ and $[i+1, j, -, -, i+1]_X$ both represent a dependency tree $t_{a_{i+1}}$ with $\mathrm{yd}(t_{a_{i+1}}) = \langle w_{i,j} \rangle$. This and other similar cases are avoided by the ban against $h \in \{i, j+1\}$, which amounts to imposing some normal form for items. In our example, only item $[i, j, -, -, i+1]_X$ is a valid signature.
Finally, we distinguish among several item types, indicated by the value of subscript $X$. These types are specific to each parsing algorithm, and will be defined in later sections.

Parsing of Head-Split Trees
We present in this section our first tabular algorithm for computing the set of all dependency trees for an input sentence $w$ that have the head-split property, under the arc-factored model. Recall that $t_{a_i}$ denotes a tree with root $a_i$, and $t_{L,a_i}$ and $t_{U,a_i}$ are the lower and upper trees of $t_{a_i}$. The steps of the algorithm are specified by means of deduction rules over items, following the approach of Shieber et al. (1995).

Basic Idea
Our algorithm builds trees step by step, by attaching a tree $t_{a_{h'}}$ as a dependent of a tree $t_{a_h}$ and creating the new dependency $a_h \to a_{h'}$. Computationally, the worst case for this operation is when both $t_{a_h}$ and $t_{a_{h'}}$ have a gap; then, for each tree we need to keep a record of the four boundaries, along with the position of the head, as done by Gómez-Rodríguez et al. (2011). However, if we are interested in parsing trees that satisfy the head-split property, we can avoid representing a tree with a gap by means of a single item. We instead follow the general idea of Section 2 for projective parsing, and use different items for the upper and the lower trees of the source tree.
When we need to attach $t_{a_{h'}}$ as an upper dependent of $t_{a_h}$, defined as in Section 3.2, we perform two consecutive steps. First, we attach $t_{L,a_{h'}}$ to $t_{U,a_h}$, resulting in a new intermediate tree $t_1$. As a second step, we attach $t_{U,a_{h'}}$ to $t_1$, resulting in a new tree $t_2$ which is $t_{U,a_h}$ with $t_{a_{h'}}$ attached as an upper dependent, as desired. Both steps are depicted in Figure 5; here we introduce the convention of indicating tree grouping through a dashed line. A symmetric procedure can be used to attach $t_{a_{h'}}$ as a lower dependent to $t_{L,a_h}$. The correctness of the two-step approach follows from properties P1 and P3 in Section 3.2. By property P2 in Section 3.2, in both steps above the lexical heads $a_h$ and $a_{h'}$ can be read from the boundaries of the involved trees. These steps can therefore be implemented more efficiently than the naïve method of attaching $t_{a_{h'}}$ to $t_{a_h}$ in a single step. A more detailed computational analysis will be provided in Section 5.7. To simplify the presentation, we restrict the use of head splitting to trees with a gap and parse trees with no gap with the naïve method; this does not affect the computational complexity.

Item Types
We distinguish five different types of items, indicated by the subscript $X \in \{0, L, U, /L, /U\}$, as described in what follows.
If $X = 0$, we have $p = q = -$ and $\mathrm{yd}(a_h)$ is specified as in Section 4.
If $X = L$, we use the item to represent some lower tree. We have therefore $p, q \neq -$ and $h \in \{i+1, q\}$.
If $X = U$, we use the item to represent some upper tree. We have therefore $p, q \neq -$ and $h \in \{j, p+1\}$.
If $X = /L$ or $X = /U$, we use the item to represent some intermediate step in the parsing process, in which only the lower or upper tree of some dependent has been collected by the head $a_h$, and we are still missing the upper ($/U$) or the lower ($/L$) tree.
We further specialize symbol $/U$ by writing $/U_<$ ($/U_>$) to indicate that the missing upper tree should have its head to the left (right) of its gap. We also use $/L_<$ and $/L_>$ with a similar meaning.

Item Normal Form
It could happen that our algorithm produces items of type 0 that do not satisfy the normal form condition discussed in Section 4. To avoid this problem, we assume that every item of type 0 that is produced by the algorithm is converted into an equivalent normal form item, by means of the following rules:
$$\frac{[i, j, -, -, i]_0}{[i-1, j, -, -, i]_0} \qquad (1)$$
$$\frac{[i, j, -, -, j+1]_0}{[i, j+1, -, -, j+1]_0} \qquad (2)$$
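As an illustration of rules (1) and (2), the conversion to normal form can be sketched as a small function. The encoding of a type 0 item as a triple `(i, j, h)`, with the two unused gap positions omitted, and the name `normalize_type0` are assumptions made only for this sketch.

```python
def normalize_type0(i, j, h):
    """Return the normal form of the type 0 item with span w_{i,j} and head a_h.

    The head may not sit immediately outside the span: if h == i or
    h == j + 1, the span is widened by one token so that it includes a_h.
    """
    if h == i:        # rule (1): head is the token just left of the span
        return (i - 1, j, h)
    if h == j + 1:    # rule (2): head is the token just right of the span
        return (i, j + 1, h)
    return (i, j, h)  # already in normal form
```

Since $i \leq j$, the two conditions can never hold at the same time, so a single application always suffices.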

Items of Type 0
We start with deduction rules that produce items of type 0. As already mentioned, we do not apply the head splitting technique in this case. The next rule creates trees with a single node, representing the head, and no dependents. The rule is actually an axiom (there is no antecedent) and the statement $i \in [1, n]$ is a side condition.
$$\frac{}{[i-1, i, -, -, i]_0} \quad \{\, i \in [1, n]$$
The next rule takes a tree headed in $a_{h'}$ and makes it a dependent of a new head $a_h$. This rule implements what has been called the 'hook trick'. The first side condition enforces that the tree headed in $a_{h'}$ has collected all of its dependents, as discussed in Section 4. The second side condition enforces that no cycle is created. We also write $a_h \to a_{h'}$ to indicate that a new dependency is created in the parse forest.
$$\frac{[i, j, -, -, h']_0}{[i, j, -, -, h]_0} \quad \left\{\ \ldots \right.$$
We need the special case in (6) to deal with the concatenation of two items that share the head $a_h$ at the concatenation point. Observe the apparent mismatch in step (6) between index $h$ in the first antecedent and index $h - 1$ in the second antecedent. This is because in our normal form, both the first and the second antecedent have already incorporated a copy of the shared head $a_h$. The next two rules collect a dependent of $a_h$ that wraps around the dependents that have already been collected. As already discussed, this operation is performed by two successive steps: We first collect the lower tree and then the upper tree. We present the case in which the shared head of the two trees is placed at the left of the gap. The case in which the head is placed at the right of the gap is symmetric.
Again, there is an overlap in rule (8) between the two antecedents, due to the fact that both items have already incorporated copies of the same head.

Items of Type U
We now consider the deduction rules that are needed to process upper trees. Throughout this subsection we assume that the head of the upper tree is placed at the left of the gap. The other case is symmetric. The next rule creates an upper tree with a single node, representing its head, and no dependents. We construct an item for all possible right gap boundaries $j$.
$$\frac{}{[i-1, i, j, j, i]_U} \quad \{\, i \in [1, n],\ j \in [i+1, n]$$
The next rule adds to an upper tree a group of new dependents that do not have any gap. We present the case in which the new dependents are placed at the left of the gap of the upper tree.
$$\frac{[i, i', -, -, j]_0 \quad [i', j, p, q, j]_U}{[i, j, p, q, j]_U}$$
The next two rules collect a new dependent that wraps around the upper tree. Again, this operation is performed by two successive steps: We first collect the lower tree, then the upper tree. We present the case in which the shared head of the two trees is placed at the left of the gap.
$$\frac{[i', j, p, q', j]_U \quad [i, i', q', q, i+1]_L}{[i, j, p, q, j]_{/U_<}}$$
$$\frac{[i', j, p, q', j]_{/U_<} \quad [i, i'+1, q', q, i'+1]_U}{[i, j, p, q, j]_U} \quad \{\, a_j \to a_{i'+1} \qquad (12)$$
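The index sharing in the two wrapping steps for upper trees (the second of which is rule (12)) may be easier to follow when written as functions over items. This is only a sketch under our own assumptions: the `Item` record, the string encoding of item types, and the function names are hypothetical, and no chart or scoring is included.

```python
from typing import NamedTuple, Optional, Tuple

class Item(NamedTuple):
    i: int
    j: int
    p: int
    q: int
    h: int
    kind: str  # "U", "L" or "/U<" in this sketch

def attach_lower(upper: Item, lower: Item) -> Optional[Item]:
    """[i', j, p, q', j]_U and [i, i', q', q, i+1]_L give [i, j, p, q, j]_{/U<}."""
    ok = (upper.kind == "U" and lower.kind == "L"
          and upper.h == upper.j       # head of the upper tree left of its gap
          and lower.j == upper.i       # shared boundary i'
          and lower.p == upper.q       # shared boundary q'
          and lower.h == lower.i + 1)  # dependent head at its left boundary
    if not ok:
        return None
    return Item(lower.i, upper.j, upper.p, lower.q, upper.h, "/U<")

def attach_upper(partial: Item, dep_upper: Item) -> Optional[Tuple[Item, Tuple[int, int]]]:
    """[i', j, p, q', j]_{/U<} and [i, i'+1, q', q, i'+1]_U give [i, j, p, q, j]_U.

    Also returns the new dependency as a (head, dependent) pair.
    """
    ok = (partial.kind == "/U<" and dep_upper.kind == "U"
          and partial.h == partial.j
          and dep_upper.j == partial.i + 1   # overlap on the dependent's head
          and dep_upper.h == partial.i + 1
          and dep_upper.p == partial.q)      # shared boundary q'
    if not ok:
        return None
    new_item = Item(dep_upper.i, partial.j, partial.p, dep_upper.q, partial.h, "U")
    return new_item, (partial.h, dep_upper.h)
```

For example, combining `Item(2, 4, 6, 7, 4, "U")` with the lower tree `Item(1, 2, 7, 8, 2, "L")` yields `Item(1, 4, 6, 8, 4, "/U<")`; adding the upper tree `Item(0, 2, 8, 9, 2, "U")` of the same dependent then yields `Item(0, 4, 6, 9, 4, "U")` together with the dependency from position 4 to position 2.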

Items of Type L
So far we have always expanded items (type 0 or $U$) at their external boundaries. When dealing with lower trees, we have to reverse this strategy and expand items (type $L$) at their internal boundaries. Apart from this difference, the deduction rules below are entirely symmetric to those in Section 5.5. Again, we assume that the head of the lower tree is placed at the left of the gap, the other case being symmetric. Our first rule creates a lower tree with a single node, representing its head. We blindly guess the right boundary of the gap of such a tree.
$$\frac{}{[i-1, i, j, j, i]_L} \quad \{\, i \in [1, n],\ j \in [i+1, n]$$
The next rule adds to a lower tree a group of new dependents that do not have any gap. We present the case in which the new dependents are placed at the left of the gap of the lower tree.
$$\frac{[j', j, -, -, i+1]_0 \quad [i, j', p, q, i+1]_L}{[i, j, p, q, i+1]_L}$$
The next two rules collect a new dependent with a gap and embed it within the gap of our lower tree, creating a new dependency. Again, this operation is performed by two successive steps, and we present the case in which the common head of the lower and upper trees that are embedded is placed at the left of the gap, the other case being symmetric.
$$\frac{[i, j', p', q, i+1]_L \quad [j', j, p, p', j]_U}{[i, j, p, q, i+1]_{/L_<}}$$
$$\frac{[i, j', p', q, i+1]_{/L_<} \quad [j'-1, j, p, p', j']_L}{[i, j, p, q, i+1]_L} \quad \{\, a_{i+1} \to a_{j'} \qquad (16)$$
Figure 6: Node $a_h$ satisfies both the 1-inherit and head-split conditions. Accordingly, tree $t_{a_h}$ can be split into three fragments $t_{U,a_h}$, $t_{LL,a_h}$ and $t_{LR,a_h}$.

Runtime
The algorithm runs in time $O(n^6)$, where $n$ is the length of the input sentence. The worst case is due to deduction rules that combine two items, each of which represents trees with one gap. For instance, rule (11) involves six free indices ranging over $[1, n]$, and thus could be instantiated $O(n^6)$ many times. If the head-split property does not hold, attachment of a dependent in one step results in time $O(n^7)$, as seen for instance in Gómez-Rodríguez et al. (2011).

Parsing of 1-Inherit Head-Split Trees
In this section we specialize the parsing algorithm of Section 5 to a new, more efficient algorithm for a restricted class of trees.

1-Inherit Head-Split Trees
Pitler et al. (2012) introduce a restriction on well-nested dependency trees with block-degree at most 2. A tree $t$ satisfies the 1-inherit property if, for every node $a_h$ in $t$ with $\mathrm{bd}(a_h, t) = 2$, there is at most one dependency $a_h \to a_d$ such that $\mathrm{gap}(t[a_d])$ contains $\mathrm{gap}(t[a_h])$. Informally, this means that $\mathrm{yd}(t[a_d])$ 'crosses over' $\mathrm{gap}(t[a_h])$, and we say that $a_d$ 'inherits' the gap of $a_h$. In this section we investigate the parsing of head-split trees that also have the 1-inherit property.
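The 1-inherit condition can be checked by counting, for each head with a gap, the dependents whose gap contains the gap of the head. The sketch below assumes a hypothetical encoding of the tree as a dictionary `heads` from token positions to head positions (`None` for the root), with gaps represented as inclusive position ranges; the function names are ours.

```python
def gap(heads, v):
    """Inclusive position range of the gap of the subtree at v, or None."""
    nodes, frontier = {v}, [v]
    while frontier:
        u = frontier.pop()
        for d, h in heads.items():
            if h == u:
                nodes.add(d)
                frontier.append(d)
    runs = []
    for p in sorted(nodes):
        if runs and runs[-1][1] == p - 1:
            runs[-1][1] = p
        else:
            runs.append([p, p])
    # Only subtrees with exactly two yield components have a gap.
    return (runs[0][1] + 1, runs[1][0] - 1) if len(runs) == 2 else None

def satisfies_1_inherit(heads):
    for h in heads:
        gap_h = gap(heads, h)
        if gap_h is None:
            continue
        inheriting = 0
        for d, head_of_d in heads.items():
            if head_of_d != h:
                continue
            gap_d = gap(heads, d)
            # d inherits the gap of h if its own gap contains it.
            if gap_d and gap_d[0] <= gap_h[0] and gap_h[1] <= gap_d[1]:
                inheriting += 1
        if inheriting > 1:
            return False
    return True
```

In `{4: None, 3: 4, 1: 3, 2: 3, 5: 2, 6: 1}`, for instance, both dependents of the node at position 3 wrap around its gap (position 4), so the tree is well-nested and head-split but not 1-inherit.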
Example 4 Figure 6 shows a head node $a_h$ along with dependents $a_{d_i}$, satisfying the head-split condition. Only $t_{a_{d_1}}$ has its yield crossing over $\mathrm{gap}(t_{a_h})$. Thus $a_h$ also satisfies the 1-inherit condition.

Basic Idea
Let $t_{a_h}$ be some tree satisfying both the head-split property and the 1-inherit property. Assume that the dependent node $a_d$ which inherits the gap of $t_{a_h}$ is placed within $t_{U,a_h}$. This means that, for every dependency $a_h \to a_d$ in $t_{L,a_h}$, $\mathrm{yd}(t[a_d])$ does not cross over $\mathrm{gap}(t_{L,a_h})$. Then we can further split $t_{L,a_h}$ into two trees, both with root $a_h$. We call these two trees the lower-left tree, written $t_{LL,a_h}$, and the lower-right tree, written $t_{LR,a_h}$; see again Figure 6. The basic idea behind our algorithm is to split $t_{a_h}$ into three dependency trees $t_{U,a_h}$, $t_{LL,a_h}$ and $t_{LR,a_h}$, all sharing the same root $a_h$. This means that $t_{a_h}$ can be attached to an existing tree through three successive steps, each processing one of the three trees above. The correctness of this procedure follows from a straightforward extension of properties P1 and P3 from Section 3.2, stating that the tree fragments $t_{U,a_h}$, $t_{LL,a_h}$ and $t_{LR,a_h}$ can be represented and processed independently of one another, and freely combined if certain conditions are satisfied by their yields.
In case $a_d$ is placed within $t_{L,a_h}$, we introduce the upper-left and the upper-right trees, written $t_{UL,a_h}$ and $t_{UR,a_h}$, and apply a similar idea.

Item Types
When processing an attachment, the order in which the algorithm assembles the three tree fragments of $t_{a_h}$ defined in Section 6.2 is not always the same. Such an order is chosen on the basis of where the head $a_h$ and the dependent $a_d$ inheriting the gap are placed within the involved trees. As a consequence, in our algorithm we need to represent several intermediate parsing states. Besides the item types from Section 5.2, we therefore need additional types. The specification of these new item types is rather technical, and is therefore delayed until we introduce the relevant deduction rules.

Items of Type 0
We start with the deduction rules for parsing of trees $t_{LL,a_h}$ and $t_{LR,a_h}$; trees $t_{UL,a_h}$ and $t_{UR,a_h}$ can be treated symmetrically. The yields of $t_{LL,a_h}$ and $t_{LR,a_h}$ have the form specified in Section 4 for the case $p = q = -$. We can therefore use items of type 0 to parse these trees, adopting a strategy similar to the one in Section 5.4. The main difference is that, when a tree $t_{a_{h'}}$ with a gap is attached as a dependent to the head $a_h$, we use three consecutive steps, each processing a single fragment of $t_{a_{h'}}$. We assume below that $t_{a_{h'}}$ can be split into trees $t_{U,a_{h'}}$, $t_{LL,a_{h'}}$ and $t_{LR,a_{h'}}$; the other case can be treated in a similar way.
Figure 7: Tree $t_{U,a_h}$ is decomposed into $t_{a_d}$ and subtrees covering substrings $\sigma_i$, $i \in [1, 4]$. Tree $t_{a_d}$ is in turn decomposed into three fragments (trees $t_{LL,a_d}$, $t_{LR,a_d}$, and $t_{U,a_d}$ in this example).
We use rules (3), (4) and (5) from Section 5.4. Since in the trees $t_{LL,a_h}$ and $t_{LR,a_h}$ the head is never placed in the middle of the yield, rule (6) is not needed now and it can safely be discarded. Rule (7), attaching a lower tree, needs to be replaced by two new rules, processing a lower-left and a lower-right tree. We assume here that the common head of these trees is placed at the left boundary of the lower-left tree; we leave out the symmetric case.
$$\frac{[i, i', -, -, i+1]_0 \quad [i', j, -, -, h]_0}{[i, j, -, -, h]_{/LR_<}} \quad \{\, h \notin [i+1, i'] \qquad (17)$$
$$\frac{[j', j, -, -, i+1]_0 \quad [i, j', -, -, h]_{/LR_<}}{[i, j, -, -, h]_{/U_<}} \quad \{\, h \notin [j'+1, j]$$
The first antecedent in (17) encodes a lower-left tree with its head at the left boundary. The consequent item has then the new type $/LR_<$, meaning that a lower-right tree is missing that must have its head at the left. The first antecedent in (18) provides the missing lower-right tree, having the same head as the already incorporated lower-left tree. After these rules are applied, rule (8) from Section 5.4 can be applied to the consequent item of (18). This completes the attachment of a 'wrapping' dependent of $a_h$, with the incorporation of the missing upper tree and with the construction of the new dependency.

Items of Type U
We now assume that node $a_d$ is realized within
We start by observing that $\mathrm{yd}(t_{a_d})$ splits $\mathrm{yd}(t_{U,a_h})$ into at most four substrings $\sigma_i$; see Figure 7. Because of the well-nested property, within the tree $t_{U,a_h}$ each dependent of $a_h$ other than $a_d$ has a yield that is entirely placed within one of the substrings $\sigma_i$. This means that each substring $\sigma_i$ can be parsed independently of the other substrings.
As a first step in the process of parsing $t_{U,a_h}$, we parse each substring $\sigma_i$. We do this following the parsing strategy specified in Section 6.4. As a second step, we assume that each of the three fragments resulting from the decomposition of tree $t_{a_d}$ has already been parsed; see again Figure 7. We then 'merge' these three fragments and the trees for the segments $\sigma_i$ into a complete parse tree representing $t_{U,a_h}$. This is described in detail in what follows.
We assume that $a_h$ is placed at the left of the gap of $t_{U,a_h}$ (the right case being symmetrical) and we distinguish four cases, depending on the two ways in which $t_{a_d}$ can be split, and the two side positions of the head $a_d$ with respect to $\mathrm{gap}(t_{a_d})$.
Case 1 We assume that $t_{a_d}$ can be split into trees $t_{U,a_d}$, $t_{LL,a_d}$, $t_{LR,a_d}$, and the head $a_d$ is placed at the left of $\mathrm{gap}(t_{a_d})$; see again Figure 7.
Rule (19) below combines $t_{LL,a_d}$ with a parse for segment $\sigma_2$, which has its head $a_h$ placed at its right boundary; see Figure 8 for a graphical representation of rule (19)
Figure 9: Decomposition of $t_{U,a_h}$ as in Figure 7, with highlighted application of rules (22) and (23).
between these heads will be constructed later.
$$\frac{[i, i', -, -, i+1]_0 \quad [i', j, -, -, j]_0}{[i, j, -, -, j]_{HH}}$$
Rule (20) combines $t_{U,a_d}$ with a type 0 item representing $t_{LR,a_d}$; see again Figure 8. Note that this combination operation expands an upper tree at one of its internal boundaries, something that was not possible with the rules specified in Section 5.5.
$$\frac{[i, j, p', q, j]_U \quad [p, p', -, -, j]_0}{[i, j, p, q, j]_U}$$
Finally, we combine the consequents of (19) and (20), and process the dependency that was left pending in the item of type $HH$.
$$\frac{[i, j', p, q, j']_U \quad [j'-1, j, -, -, j]_{HH}}{[i, j, p, q, j]_U} \quad \{\, a_j \to a_{j'} \qquad (21)$$
After the above steps, parsing of $t_{U,a_h}$ can be completed by combining item $[i, j, p, q, j]_U$ from (21) with items of type 0 representing parses for the substrings $\sigma_1$, $\sigma_3$ and $\sigma_4$.
Case 2 We assume that $t_{a_d}$ can be split into trees $t_{U,a_d}$, $t_{LL,a_d}$, $t_{LR,a_d}$, and the head $a_d$ is placed at the right of $\mathrm{gap}(t_{a_d})$, as depicted in Figure 9.
Rule (22) below, graphically represented in Figure 9, combines $t_{U,a_d}$ with a type 0 item representing $t_{LL,a_d}$. This can be viewed as the symmetric version of rule (20) of Case 1, expanding an upper tree at one of its internal boundaries.
$$\frac{[i, j', p, q, p+1]_U \quad [j', j, -, -, p+1]_0}{[i, j, p, q, p+1]_U}$$
Next, we combine the result of (22) with a parse for substring $\sigma_2$. The result is an item of the new type $/LR_>$. This item is used to represent an intermediate tree fragment that is missing a lower-right tree with its head at the right. In this fragment, two heads are left pending, and a dependency relation will be eventually established between them.
$$\frac{[i, j', p, q, p+1]_U \quad [j', j, -, -, j]_0}{[i, j, p, q, j]_{/LR_>}}$$
The next rule combines the consequent item of (23) with a tree $t_{LR,a_d}$ having its head at the right boundary, and processes the dependency that was left pending in the $/LR_>$ item.
$$\frac{[i, j, p', q, j]_{/LR_>} \quad [p, p'+1, -, -, p'+1]_0}{[i, j, p, q, j]_U} \quad \{\, a_j \to a_{p'+1} \qquad (24)$$
After the above rules, parsing of $t_{U,a_h}$ continues by combining the consequent item $[i, j, p, q, j]_U$ from rule (24) with items representing parses for the substrings $\sigma_1$, $\sigma_3$ and $\sigma_4$.
Cases 3 and 4 We informally discuss the cases in which $t_{a_d}$ can be split into trees $t_{L,a_d}$, $t_{UL,a_d}$, $t_{UR,a_d}$, for both positions of the head $a_d$ with respect to $\mathrm{gap}(t_{a_d})$. In both cases we can adopt a strategy similar to that of Case 2. We first expand $t_{L,a_d}$ externally, at the side opposite to the head $a_d$, with a tree fragment $t_{UL,a_d}$ or $t_{UR,a_d}$, similarly to rule (22) of Case 2. This results in a new fragment $t_1$. Next, we merge $t_1$ with a parse for substring 2 containing the head $a_h$, similarly to rule (23) of Case 2. This results in a new fragment $t_2$ where a dependency relation involving the heads $a_d$ and $a_h$ is left pending. Finally, we merge $t_2$ with a missing tree $t_{UL,a_d}$ or $t_{UR,a_d}$, and process the pending dependency, similarly to rule (24). One should contrast this strategy with the alternative strategy adopted in Case 1, where the fragment of $t_{a_d}$ having block-degree 2 cannot be merged with a parse for the segment containing the head $a_h$ (substring 2 in Case 1), because of an intervening fragment of $t_{a_d}$ with block-degree 1 ($t_{LL,a_d}$ in Case 1).
Finally, if there is no node $a_d$ in $t_{U,a_h}$ that inherits the gap of $a_h$, we can split $t_{U,a_h}$ into two dependency trees, as we have done for $t_{L,a_h}$ in §6.2, and parse the two fragments using the strategy of §6.4.

Runtime
Our algorithm runs in time $O(n^5)$, where $n$ is the length of the input sentence. The reason for the improvement over the $O(n^6)$ result of §5 is that we no longer have deduction rules where both antecedents represent trees with a gap. In the new algorithm, the worst case is due to rules where only one antecedent has a gap. This leads to rules involving a maximum of five indices, each ranging over $[1, n]$. These rules can be instantiated in $O(n^5)$ ways.
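The index-counting argument can be checked mechanically for the rules of Cases 1 and 2. The sketch below is an illustration only: the dictionary `RULES` is a hand transcription of rules (19) to (24) into strings, and `free_indices` is a hypothetical helper that collects the distinct index variables (a letter, optionally primed) occurring in a rule.

```python
import re

# Each rule is listed as [antecedent, antecedent, consequent]; "-" is the
# placeholder used in items without a gap.
RULES = {
    19: ["i i' - - i+1", "i' j - - j", "i j - - j"],
    20: ["i j p' q j", "p p' - - j", "i j p q j"],
    21: ["i j' p q j'", "j'-1 j - - j", "i j p q j"],
    22: ["i j' p q p+1", "j' j - - p+1", "i j p q p+1"],
    23: ["i j' p q p+1", "j' j - - j", "i j p q j"],
    24: ["i j p' q j", "p p'+1 - - p'+1", "i j p q j"],
}

def free_indices(items):
    """Return the set of distinct index variables used anywhere in a rule."""
    return set(re.findall(r"[a-z]'?", " ".join(items)))

degrees = {number: len(free_indices(items)) for number, items in RULES.items()}
worst = max(degrees.values())
print(degrees)
print(f"worst case: {worst} free indices, i.e. O(n^{worst}) instantiations")
```

Every rule with a gapped antecedent uses exactly five distinct indices, while rule (19), which involves no gap, uses only three.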

Empirical Coverage
We have seen that the restriction to head-split dependency trees enables parsing one order of magnitude faster ($O(n^6)$ instead of $O(n^7)$) than is possible for the full class of well-nested dependency trees with block-degree at most 2.
In connection with the 1-inherit property, this even increases to two orders of magnitude ($O(n^5)$). However, as already stated in §2, this improvement is paid for by a loss in coverage; for instance, trees of the form shown in Figure 3 can no longer be parsed.

Quantitative Evaluation
In order to assess the empirical loss in coverage that the restriction to head-split trees incurs, we evaluated the coverage of several classes of dependency trees on standard data sets. Following Pitler et al. (2012), we report in Table 1 figures for the training sets of six languages used in the CoNLL-X shared task on dependency parsing (Buchholz and Marsi, 2006). As we can see, the $O(n^6)$ class of head-split trees has only slightly lower coverage on this data than the baseline class of well-nested dependency trees with block-degree at most 2. The losses are up to 0.2 percentage points on five of the six languages, and 0.9 points on the Dutch data. Our even more restricted $O(n^5)$ class of 1-inherit head-split trees has the same coverage as our $O(n^6)$ class, which is expected given the results of Pitler et al. (2012): their $O(n^6)$ class of 1-inherit trees has exactly the same coverage as the baseline (and thereby more coverage than our $O(n^6)$ class). Interestingly though, their $O(n^5)$ class of 'gap-minding' trees has a significantly smaller coverage than our $O(n^5)$ class. We conclude that our class seems to strike a good balance between expressiveness and parsing complexity.
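Coverage figures of this kind rest on testing, tree by tree, whether a gold-standard analysis belongs to a given class. As a minimal illustration of one ingredient of such a test, the sketch below computes the block-degree of a dependency tree, that is, the maximum number of maximal contiguous intervals in the yield of any node. It does not check well-nestedness, the 1-inherit property, or the head-split property, and it is not the evaluation code behind Table 1. The function name `block_degree` and the input convention (a list in which entry k is the head of word k+1, with 0 for the root) are assumptions made for this example.

```python
def block_degree(heads):
    """Block-degree of a dependency tree given as a list of head positions.

    heads[k] is the head of word k+1 (words are numbered from 1); the
    artificial root is 0. The input is assumed to be a well-formed tree.
    """
    n = len(heads)
    children = {k: [] for k in range(n + 1)}
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)

    def yield_of(node):
        # All positions dominated by `node`, including `node` itself.
        positions = {node}
        for child in children[node]:
            positions |= yield_of(child)
        return positions

    def blocks(positions):
        # Number of maximal runs of consecutive positions.
        ordered = sorted(positions)
        return 1 + sum(1 for a, b in zip(ordered, ordered[1:]) if b != a + 1)

    return max(blocks(yield_of(k)) for k in range(1, n + 1))

projective = [2, 0, 2]         # words 1 and 3 depend on word 2
non_projective = [3, 0, 2, 1]  # the yield of word 1 is {1, 4}
print(block_degree(projective), block_degree(non_projective))
```

A tree passes the block-degree filter of the baseline class when the returned value is at most 2; projective trees are exactly those with value 1.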

Qualitative Evaluation
While the original motivation behind introducing the head-split property was to improve parsing complexity, it is interesting to also discuss the linguistic relevance of this property. A first inspection of the structures that violate the head-split property revealed that many such violations disappear if one ignores gaps caused by punctuation. Some decisions about what nodes should function as the heads of punctuation symbols lead to more gaps than others. In order to quantify the implications of this, we recomputed the coverage of the class of head-split trees on data sets where we first removed all punctuation. The results are given in Table 2. We restrict ourselves to the five native dependency treebanks used in the CoNLL-X shared task, ignoring treebanks that have been converted from phrase structure representations. We see that when we remove punctuation from the sentences, the number of violations of the head-split property never increases. For Danish and Slovene, removing punctuation even has the effect that all well-nested dependency trees with block-degree at most 2 become head-split. Overall, the absolute numbers of violations are extremely small, except for Czech, where we have 139 violations with and 46 without punctuation. A closer inspection of the Czech sentences reveals that many of these feature rather complex coordinations. Indeed, out of the 46 violations in the punctuation-free data, only 9 remain when one ignores those with coordination. For the remaining ones, we have not been able to identify any clear patterns.

Concluding Remarks
In this article we have extended head splitting techniques, originally developed for parsing of projective dependency trees, to two subclasses of well-nested dependency trees with block-degree at most 2. We have improved over the asymptotic runtime of two existing algorithms, at no significant loss in coverage. With the same goal of improving parsing efficiency for subclasses of non-projective trees, in very recent work Pitler et al. (2013) have proposed an $O(n^4)$ time algorithm for a subclass of non-projective trees that are not well-nested, using an approach that is orthogonal to the one we have explored here.
Beyond dependency parsing, our results also have implications for mildly context-sensitive phrase structure formalisms. In particular, the algorithm of §5 can be adapted to parse a subclass of lexicalized tree-adjoining grammars, improving the result by Eisner and Satta (2000) from $O(n^7)$ to $O(n^6)$. Similarly, the algorithm of §6 can be adapted to parse a lexicalized version of the tree-adjoining grammars investigated by Satta and Schuler (1998), improving a naïve $O(n^7)$ algorithm to $O(n^5)$.