Learning Strictly Local Subsequential Functions

We define two proper subclasses of subsequential functions based on the concept of Strict Locality (McNaughton and Papert, 1971; Rogers and Pullum, 2011; Rogers et al., 2013) for formal languages. They are called Input and Output Strictly Local (ISL and OSL). We provide an automata-theoretic characterization of the ISL class and theorems establishing how the classes are related to each other and to Strictly Local languages. We give evidence that local phonological and morphological processes belong to these classes. Finally we provide a learning algorithm which provably identifies the class of ISL functions in the limit from positive data in polynomial time and data. We demonstrate this learning result on appropriately synthesized artificial corpora. We leave a similar learning result for OSL functions for future work and suggest future directions for addressing non-local phonological processes.


Introduction
In this paper we define two proper subclasses of subsequential functions based on the properties of the well-studied Strictly Local formal languages (McNaughton and Papert, 1971; Rogers and Pullum, 2011; Rogers et al., 2013). These are languages that can be defined with grammars of substrings of length k (called k-factors), such that a string is in the language only if its own k-factors are a subset of the grammar. These languages have also been characterized by Rogers and Pullum (2011) as those that have the property expressed in the following theorem (which can be taken as a defining property):

Theorem 1 (Suffix Substitution Closure). L is Strictly Local iff there exists k ∈ N such that for all strings u1, v1, u2, v2 and for any string x of length k − 1, if u1xv1, u2xv2 ∈ L, then u1xv2 ∈ L.

These languages can model natural language phonotactic constraints which pick out contiguous substrings bounded by some length k (Heinz, 2007; Heinz, 2010). We define Input Strictly Local (ISL) and Output Strictly Local (OSL) functions, which model phonological processes for which the target and triggering context form a bounded contiguous substring. Here our use of 'process' is not specific to rule-based grammatical formalisms (such as SPE (Chomsky and Halle, 1968)). ISL and OSL functions model mappings from underlying forms to surface forms, which are also the bedrock of constraint-based frameworks like Optimality Theory (Prince and Smolensky, 1993).
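The k-factor view of Strict Locality is easy to make concrete. The following toy sketch (ours, not from the paper) marks word boundaries with '#', extracts a string's k-factors, and checks membership against a grammar of permitted k-factors:

```python
def k_factors(w, k):
    """All substrings of length k of #w#, with '#' marking word boundaries."""
    w = "#" + w + "#"
    return {w[i:i + k] for i in range(len(w) - k + 1)}

def in_sl_language(w, grammar, k):
    """A string is in a Strictly Local language iff its k-factors are a
    subset of the grammar."""
    return k_factors(w, k) <= grammar
```

For example, a 2-factor grammar that omits "bb" defines the language of strings over {a, b} with no adjacent b's.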
By showing that local phonological processes can be modeled with ISL (and OSL) functions, we provide the strongest computational characterization of the input-output mappings these processes represent. While it has been shown that phonological mappings describable with rules of the form A → B / C D (where A, B, C, and D are regular languages) are regular (Johnson, 1972; Kaplan and Kay, 1994), and even subsequential (Heinz and Lai, 2013), many logically possible regular and subsequential mappings are not plausible phonological mappings. Since these implausible mappings cannot be modeled with ISL or OSL functions, we provide a more precise notion of what constitutes a plausible phonological mapping.
In addition, we present the Input SL Function Learning Algorithm (ISLFLA) and prove that it identifies this class in the limit (Gold, 1967) in polynomial time and data (de la Higuera, 1997). Our approach follows previous work on learning subsequential transductions, namely Oncina et al. (1993), Oncina and Varò (1996), Castellanos et al. (1998), and Gildea and Jurafsky (1996). Oncina et al. (1993) present OSTIA (Onward Subsequential Transducer Inference Algorithm), an algorithm that learns the class of subsequential functions in the limit from positive data. OSTIA is only guaranteed to identify total functions exactly, but Oncina and Varò (1996) and Castellanos et al. (1998) present the modifications OSTIA-N, OSTIA-D, and OSTIA-R, which learn partial functions using negative data, domain information, and range information, respectively.
In terms of linguistic applications, Gildea and Jurafsky (1996) show that OSTIA fails to learn the phonological mapping of English flapping when given natural language data. The authors modified OSTIA with three learning heuristics (context, community, and faithfulness) and showed that the modified learner successfully learns flapping and several other phonological rules. Context encodes the idea that phonological changes depend on the context of the segment undergoing the change. Community gives the learner the ability to deduce that segments belonging to a natural class are likely to behave similarly. Lastly, faithfulness, by which underlying segments are assumed to be realized similarly on the surface, was encoded with a forced alignment between the input-output strings in the data set. We believe this alignment removes OSTIA's guarantee that all subsequential functions are learned.
Similar to the approach of Gildea and Jurafsky (1996), our learner employs a context bias because it knows its target is an ISL function and therefore the transduction only involves bounded contiguous substrings. And similar to OSTIA-D (Oncina and Varò, 1996; Castellanos et al., 1998), the ISLFLA makes use of domain information, because it makes decisions based on the input strings of the data set. It also employs a faithfulness bias in terms of the property of onwardness (see §2). The ISLFLA is supported by a theoretical result like that of Oncina et al. (1993), but learns a more restrictive class of mappings. We believe the theoretical results for this class will lead to new algorithms which include something akin to the community bias and that will succeed on natural language data while keeping strong theoretical results.
The proposed learner also builds on earlier work by Chandlee and Koirala (2014) and Chandlee and Jardine (2014) which also used strict locality to learn phonological processes but with weaker theoretical results. The former did not precisely identify the class of functions the learner could learn, and the latter could only guarantee learnability of the ISL functions with a closed learning sample.
The paper is organized as follows. §2 presents the mathematical notations to be used. §3 defines ISL and OSL functions, provides an automata-theoretic characterization for ISL, and establishes some properties of these classes. §4 demonstrates how these functions can model local phonological processes, including substitution, insertion, and deletion. §5 presents the ISLFLA, proves that it efficiently learns the class of ISL functions, and provides demonstrations. §6 concludes.

Preliminaries
The set of all possible finite strings of symbols from a finite alphabet Σ and the set of strings of length ≤ n are Σ* and Σ^≤n, respectively. The unique empty string is represented with λ. The length of a string w is |w|, so |λ| = 0. The set of prefixes of w, Pref(w), is {p ∈ Σ* | (∃s ∈ Σ*)[w = ps]}, and the set of suffixes of w, Suff(w), is {s ∈ Σ* | (∃p ∈ Σ*)[w = ps]}. For all w ∈ Σ* and n ∈ N, Suff^n(w) is the single suffix of w of length n if |w| ≥ n; otherwise Suff^n(w) = w. If w = ps, then ws^−1 = p and p^−1w = s. The longest common prefix of a set of strings S, lcp(S), is the p ∈ ⋂_{w∈S} Pref(w) such that for all p′ ∈ ⋂_{w∈S} Pref(w), |p′| ≤ |p|.
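These primitives are straightforward to implement; a sketch (function names are ours):

```python
def suff(n, w):
    """Suff^n(w): the suffix of w of length n, or w itself if |w| < n."""
    if n == 0:
        return ""
    return w if len(w) < n else w[-n:]

def lcp(strings):
    """Longest common prefix of a non-empty collection of strings."""
    strings = list(strings)
    p = strings[0]
    for s in strings[1:]:
        while not s.startswith(p):
            p = p[:-1]  # shorten the candidate until it prefixes s
    return p
```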
A function f with domain A and co-domain B is written f : A → B. We sometimes write x →_f y for f(x) = y. When A and B are free monoids (like Σ*), the input and output languages of a function f are the stringsets dom(f) = {x | (∃y)[f(x) = y]} and image(f) = {y | (∃x)[f(x) = y]}, respectively.

Following Oncina et al. (1993), a subsequential finite state transducer (SFST) is a 6-tuple (Q, q0, Σ, Γ, δ, σ), where Q is a finite set of states, Σ and Γ are finite alphabets, q0 ∈ Q is the initial state, δ ⊆ Q × Σ × Γ* × Q is a set of edges, and σ : Q → Γ* is the final output function that maps states to strings that are appended to the output if the input ends in that state. δ recursively defines a mapping δ*: (q, λ, λ, q) ∈ δ*; if (q, u, v, q′) ∈ δ* and (q′, a, w, q″) ∈ δ then (q, ua, vw, q″) ∈ δ*. SFSTs are deterministic, which means their edges have the following property: [(q, a, u, r), (q, a, v, s) ∈ δ] ⇒ [u = v ∧ r = s]. Hence, we also refer to δ as the transition function, and ∀(q, a, u, r) ∈ δ, we let δ1(q, a) = u and δ2(q, a) = r.
The relation that a SFST T recognizes/accepts/generates is R(T) = {(x, y) | (∃q ∈ Q)(∃y′ ∈ Γ*)[(q0, x, y′, q) ∈ δ* ∧ y = y′σ(q)]}. Since SFSTs are deterministic, the relations they recognize are functions. Subsequential functions are defined as those describable with SFSTs.
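In code, a deterministic SFST and its extended transition mapping can be sketched with a dictionary encoding (the encoding is ours, not the paper's):

```python
def sfst_apply(delta, sigma, q0, w):
    """Run a deterministic SFST on input w.

    delta maps (state, symbol) -> (output string, next state);
    sigma maps each state to its final output string.
    Returns None if w falls outside the domain of the transduction."""
    q, out = q0, ""
    for a in w:
        if (q, a) not in delta:
            return None
        u, q = delta[(q, a)]
        out += u  # concatenate edge outputs along the unique path
    return out + sigma.get(q, "")
```

For example, a one-state transducer whose edges copy each symbol computes the identity function.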
For any function f : Σ* → Γ* and x ∈ Σ*, let the tails of x with respect to f be defined as tails_f(x) = {(y, v) | f(xy) = uv ∧ u = lcp(f(xΣ*))}. If strings x1, x2 ∈ Σ* have the same set of tails with respect to a function f, they are tail-equivalent with respect to f and we write x1 ∼_f x2. Clearly, ∼_f is an equivalence relation which partitions Σ*.

Theorem 2 (Oncina and García, 1991). A function f is subsequential iff ∼_f partitions Σ* into finitely many equivalence classes.
The above theorem can be seen as the functional analogue of the Myhill-Nerode theorem for regular languages. Recall that for any stringset L, the tails of a word w w.r.t. L are defined as tails_L(w) = {u | wu ∈ L}. These tails can be used to partition Σ* into a finite set of equivalence classes iff L is regular. Furthermore, these equivalence classes are the basis for constructing the (unique up to isomorphism) smallest deterministic acceptor for a regular language. Likewise, Oncina and García's proof of Theorem 2 shows how to construct the (unique up to isomorphism) smallest SFST for a subsequential function f. We refer to this transducer as the canonical transducer for f and denote it with T^C_f. The states of T^C_f are in one-to-one correspondence with the classes tails_f(x) for x ∈ Σ* (Oncina and García, 1991). To construct T^C_f, first let, for all x ∈ Σ* and a ∈ Σ, the contribution of a w.r.t.
x be cont_f(a, x) = lcp(f(xΣ*))^−1 · lcp(f(xaΣ*)). Then, T^C_f has an important property called onwardness.

Definition 1 (onwardness). For all q ∈ Q, let the outputs of the edges out of q be outs(q) = {u | (∃a ∈ Σ)(∃r ∈ Q)[(q, a, u, r) ∈ δ]} ∪ {σ(q)}. A SFST is onward if, for every state q other than the initial state, lcp(outs(q)) = λ.

Informally, this means that the writing of output is never delayed. Readers are referred to Oncina and García (1991), Oncina et al. (1993), and Mohri (1997) for more on SFSTs.

Strictly Local Functions
In this section we define Input and Output Strictly Local functions and provide properties of these classes. These definitions are analogous to the language-theoretic definition of Strictly Local languages (Theorem 1) (Rogers and Pullum, 2011; Rogers et al., 2013).

Definition 2 (Input Strictly Local Function). A function f is ISL if there is a k such that for all u1, u2 ∈ Σ*, if Suff^{k−1}(u1) = Suff^{k−1}(u2) then tails_f(u1) = tails_f(u2).
The theorem below establishes an automata-theoretic characterization of ISL functions.

Theorem 3. A function f is ISL iff there is some k such that f can be described with a SFST for which Q = Σ^≤k−1, q0 = λ, and for all q ∈ Q and a ∈ Σ, δ2(q, a) = Suff^{k−1}(qa).

This theorem helps make clear how ISL functions are Markovian: the output for input symbol a depends on the last (k − 1) input symbols. Also, since the transducer defined in Theorem 3 is deterministic, it is unique, and we refer to it as T^ISL_f. T^ISL_f may not be isomorphic to T^C_f. Figure 1 shows T^ISL_f (with k = 2) and T^C_f for the identity function. Before we present the proof of Theorem 3, we make the following remarks, and then prove a lemma.
Remark 1. For all n, m ∈ N with n ≤ m, and for all w ∈ Σ*, Suff^n(Suff^m(w)) = Suff^n(w), since both w and Suff^m(w) share the same n-long suffix.
Remark 2. For all n, m ∈ N with n ≤ m, and for all w, v ∈ Σ*, Suff^n(Suff^m(w)v) = Suff^n(wv). This is a direct consequence of Remark 1.
Next, we show that T computes the same function as T^C_f. We first establish, by induction on δ*, the following (★): (tails_f(w), x, u, tails_f(wx)) ∈ δ*_c iff (Suff^{k−1}(w), x, u, Suff^{k−1}(wx)) ∈ δ*. Clearly (tails_f(λ), λ, λ, tails_f(λ)) ∈ δ*_c and (λ, λ, λ, λ) ∈ δ*, by definition of the extended transition function. Thus the base case is satisfied.
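The transducer shape in Theorem 3 is direct to realize in code: states are the strings in Σ^≤k−1, and reading a in state q leads to Suff^{k−1}(qa). A sketch (ours), parameterized by an edge-output function and a final-output function:

```python
def isl_apply(k, out, final, w):
    """Apply a k-ISL transducer in the shape of Theorem 3.

    out(q, a): the string written on the edge leaving state q on symbol a.
    final(q): the final output sigma(q).
    States are strings over the input alphabet of length at most k-1."""
    q, result = "", ""
    for a in w:
        result += out(q, a)
        # the next state is the (k-1)-suffix of qa
        q = (q + a)[-(k - 1):] if k > 1 else ""
    return result + final(q)
```

With out(q, a) = a and final(q) = λ this computes the identity function, matching the k = 2 example in Figure 1.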
Similar to Definition 2 above, the following definition of OSL functions says that if the outputs of two input strings share the same suffix of length (k − 1), then those inputs have the same tails.

Definition 3 (Output Strictly Local Function). A function f is OSL if there is a k such that for all u1, u2 ∈ Σ*, if Suff^{k−1}(f(u1)) = Suff^{k−1}(f(u2)) then tails_f(u1) = tails_f(u2).
The automata-theoretic characterization of this class is left as future work.
Next, some properties of ISL and OSL functions are identified. The proofs of the following theorems will refer to the example SFSTs in Figure 2.
First we establish that ISL and OSL functions are proper subclasses of the subsequential functions.

Theorem 4. The ISL and OSL functions are proper subclasses of the subsequential functions.

Proof. Consider the subsequential function f1 in Figure 2 and choose any k ∈ N.
Since f1(bc^k b) = bc^k b and f1(ac^k b) = ac^k a, it follows that tails_{f1}(bc^k) ≠ tails_{f1}(ac^k), even though the inputs bc^k and ac^k share a common suffix of length (k − 1). Likewise, the outputs of these inputs, f1(bc^k) = bc^k and f1(ac^k) = ac^k, share a common suffix of length (k − 1), but the tails of these inputs, as mentioned, are distinct. Hence by Definitions 2 and 3, there is no k for which f1 is ISL or OSL.
Theorem 5. The class of ISL functions is incomparable to the class of OSL functions.
Proof. Consider f2 in Figure 2. This function is ISL by Theorem 3. However, it is not OSL. Consider any k ∈ N. Observe that f2(aa^k) = f2(ab^k) = ab^k, so the outputs share the same suffix of length (k − 1). However, tails_{f2}(aa^k) ≠ tails_{f2}(ab^k), since (a, b) ∈ tails_{f2}(aa^k) but (a, a) ∈ tails_{f2}(ab^k).
Similarly, consider f3 in Figure 2. This function is 2-OSL because inputs mapped to outputs which end in a have the same tails (i.e., they all lead to state a), and inputs mapped to outputs ending in b have the same tails.
However, f3 is not ISL. Consider any k ∈ N. The inputs cb^k and ab^k share the same suffix of length (k − 1). However, tails_{f3}(cb^k) ≠ tails_{f3}(ab^k).

Finally, the two classes are obviously not disjoint, since the identity function is both ISL and OSL.
Theorem 6. The output language of an ISL or OSL function is not necessarily a Strictly Local language.
Proof. Consider f4 in Figure 2. By Theorem 3, it is ISL. It is also 2-OSL, since inputs mapped to outputs which end in a have the same tails (i.e., they all lead to state a). Similarly, inputs mapped to outputs ending in b have the same tails.
Let k ∈ N. Observe that f4(a^k) = a^{2k}, and further that there is no input x such that f4(x) = a^{2k+1}. Since a^{2k} = a^{k−1} · a^k · a = a^{k−2} · a^k · aa, if the output language of f4 were SL it would follow, by Theorem 1, that a^{k−1} · a^k · aa = a^{2k+1} also belongs to the output language. But there is no input in Σ* which f4 maps to an odd number of a's.
Theorem 7. If either the output or input language of a subsequential function f is Strictly Local then it does not follow that f belongs to either ISL or OSL.
Proof. Let Σ = Γ = {a} and consider the function f5 which, for all n ∈ N, maps a^n to a^n if n is even and to a^n a if n is odd. f5 is subsequential, as shown in Figure 2, and its domain, a*, is a Strictly Local language. However, f5 is neither ISL nor OSL for any k, since the tails of a^{2k} include (λ, λ) but the tails of aa^{2k} include (λ, a).
Next consider f6, which for all n ∈ N maps a^n to a^m where m = (n div 2) if n is even and m = (n div 2) + 1 if n is odd. f6 is subsequential, as shown in Figure 2, and image(f6) = a* is Strictly Local. However, f6 is neither ISL nor OSL for any k, since the tails of a^{2k} include (a, a) but the tails of aa^{2k} include (a, λ).

Relevance to Phonology
This section briefly discusses the range of phonological processes that can be modeled with ISL and/or OSL functions by providing illustrative examples for three common, representative processes. These are shown in (1) with SPE-style rules.

First, consider the process of final obstruent devoicing (1-a), attested in German and other languages. This process causes an underlying voiced obstruent in word-final position to surface as its voiceless counterpart. In (1-a), D abbreviates the set of voiced obstruents and T the voiceless obstruents. The mapping from underlying form to surface form that this process describes is represented with the 2-ISL function in Figure 3 (note N represents any sonorant).

In addition to substitution processes like (1-a), another common type of process is insertion of a segment, such as the @-epenthesis process attested in Dutch (Warner et al., 2001). This process inserts a schwa between a liquid and a non-coronal consonant, as specified by (1-b). Using L to represent the liquids {l, r} and K to represent any non-coronal consonant, Figure 4 presents T^ISL_f for this process. Following Beesley and Karttunen (2003), the symbol '?' represents any segment of the alphabet other than the ones for which transitions are defined.

Lastly, an example deletion process from Greek (Joseph and Philippaki-Warburton, 1987) deletes interdental fricatives before /T/ or /s/, as in (1-c). The German, Dutch, and Greek examples demonstrate how ISL functions can be used to model the input-output mapping of a phonological rule. Beyond substitution, insertion, and deletion, it is shown in Chandlee (2014) that ISL and/or OSL functions can also model many metathesis patterns, specifically those for which there is an upper bound on the amount of material that intervenes between a segment's original and surface positions (this appears to include all synchronic patterns).
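The final devoicing map of Figure 3 can be sketched directly as code. The one subtlety is onwardness: an underlying D cannot be written until the next symbol (or the end of the word) is seen, since a word-final D surfaces as T. A sketch over the abbreviated alphabet {D, T, N} (the implementation is ours):

```python
def devoice(w):
    """2-ISL final obstruent devoicing over {D, T, N}:
    a word-final voiced obstruent D surfaces as T."""
    q, out = "", ""
    for a in w:
        if q == "D":
            out += "D"   # the withheld D was not word-final after all
        if a == "D":
            q = "D"      # withhold output until we see what follows
        else:
            out += a
            q = a
    # final output function: a withheld word-final D is devoiced
    return out + ("T" if q == "D" else "")
```

Only the last obstruent is affected: non-final D's are passed through once the next symbol is read.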
In addition, morpho-phonological processes such as local partial reduplication (i.e., in which the reduplicant is affixed adjacent to the portion of the base it was copied from) and general affixation are also shown to be ISL or OSL. More generally, we currently conjecture that a SPE-style rewrite rule of the form A → B / C D which applies simultaneously (left-to-right) describes an Input (Output) Strictly Local function iff there is a k such that for all w ∈ CAD it is the case that |w| ≤ k. We refer readers to Kaplan and Kay (1994) and Hulden (2009) for more on how SPE rules and application modes determine mappings.
The next section presents a learning algorithm for ISL functions, the ISLFLA. The development of a corresponding algorithm for OSL functions is the subject of ongoing work, but see §6 for comments.

Learning Input Strictly Local Functions
We now present a learning algorithm for the class of ISL functions that uses its defining property as an inductive principle to generalize from a finite amount of data to a possibly infinite function. This learner begins with a prefix tree representation of this data and then generalizes by merging states. Its state merging criterion is based on the defining property of ISL functions: two input strings with the same suffix of length (k − 1) have the same tails. The next section explains in detail how the algorithm works.

The Algorithm: ISLFLA
Given a finite dataset D of input-output string pairs (w, w′) such that f(w) = w′, where f is the target function, the learner tries to build T^ISL_f. The dataset is submitted to the learner in the form of a prefix tree transducer (PTT), defined in Definition 4.

Definition 4 (Prefix Tree Transducer). A prefix tree transducer for a finite set D of pairs (w, w′) is the SFST PTT(D) = (Q, q0, Σ, Γ, δ, σ) such that:
• Q = ⋃_{(w,w′)∈D} Pref(w) and q0 = λ
• δ = {(u, a, λ, ua) | u, ua ∈ Q}
• σ(w) = w′ for all (w, w′) ∈ D

As an example, the sample of data in (2) exemplifies the final devoicing rule in (1-a). Figure 6 gives the PTT for this data.

Given such a PTT, the learner's first step is to make it onward. In the PTT, the output string is withheld until the end of the input (i.e., #) is reached. In the onward version (shown in Figure 7), output is advanced as close to the root as possible. This involves determining the lcp of all the output strings of all outgoing transitions of a state (starting from the leaves) and suffixing that lcp to the output of the single incoming transition of the state.

Once the learner has constructed this onward representation of the data, it begins to merge states, using two nested loops. The outer loop proceeds through the entire state set (ordered lexicographically) and merges each state with the state that corresponds to its (k − 1)-length suffix. For example, for final devoicing k = 2, so each state will be merged with the state that represents its final symbol. This merging may introduce non-determinism, which must be removed, since by definition T^ISL_f is deterministic. Non-determinism is removed in the inner loop with additional state merges.
Consider the situation depicted on the left-hand side of Figure 8, which resulted from a previous merge. The non-determinism could be resolved by merging states 1 and 2, except for the fact that the output strings of the two transitions differ. Before merging 1 and 2, therefore, the learner performs an operation called pushback, which retains the lcp of the two output strings and prefixes what remains to all output strings of all outgoing transitions from the respective destination states.
In the example in Figure 8, pushback is applied to the edges (0, T, DT, 1) and (0, T, DTN, 2). Only the lcp of the output strings, which is DT, is retained as the output string of both edges. The remainder (λ and N, respectively) is prefixed to the output string of all transitions leaving the respective destination state. The result is shown on the right-hand side of Figure 8. Essentially, pushback 'undoes' onwardness when needed.
After pushback, states 1 and 2 can be merged. This removes the initial non-determinism but creates new non-determinism. The inner loop iterates until all non-determinism is resolved, after which the outer loop continues with the next state in the order. If the inner loop encounters non-determinism that cannot be removed, the ISLFLA exits with a message indicating that the data sample is insufficient.
Non-removable non-determinism occurs if and only if the situation depicted on the left in Figure 9 obtains. The normal procedure for removing non-determinism cannot be applied in this case. Assuming z ≠ λ, all of z would have to be pushed back, but since this transition has no destination state there is nowhere for z to go. OSTIA handles this situation by rejecting the outer loop merge that led to it, restoring the FST to its state before that merge and moving on to the next possible merge. But the ISLFLA cannot reject merges. If it could, the possibility would arise that two states with the same (k − 1)-length suffix would remain distinct in the final FST the learner outputs. Such a FST would not (by definition) be ISL. Therefore, the algorithm is at an impasse: rejecting a merge can lead to a non-ISL FST, while allowing it can lead to a non-subsequential (hence non-ISL) FST. It therefore terminates. Below is pseudo-code for the ISLFLA.
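A reconstruction of that pseudo-code from the description in this section (procedure names are ours):

```
ISLFLA(D, k):
    T ← onward PTT for D
    for each state q of T, in lexicographic order:        # outer loop
        if q ≠ Suff^{k-1}(q):
            merge(T, Suff^{k-1}(q), q)
        while T has a non-deterministic pair of edges      # inner loop
              (p, a, u, r) and (p, a, v, s):
            if u ≠ v and one of them is a final (#) output
               with no destination to push onto:           # Figure 9
                exit("data sample insufficient")
            pushback(T, (p, a, u, r), (p, a, v, s))        # keep lcp(u, v)
            merge(T, r, s)
    return T
```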

Learning Results
Here, we present a proof that the ISLFLA identifies the class of ISL functions in the limit from positive data, in the sense of Gold (1967), with polynomial bounds on time and data (de la Higuera, 1997). First, we establish the notion of characteristic sample.
Definition 6 (Characteristic sample). A sample CS is characteristic of a function f for an algorithm A if for all samples S s.t. CS ⊆ S, A returns a representation τ such that for all x ∈ dom(f ), τ (x) = f (x), and for all x / ∈ dom(f ), τ (x) is not defined.
We can now define the learning criteria.
Definition 7 (Identification in polynomial time and data (de la Higuera, 1997)). A class F of functions is identifiable in polynomial time and data using a class T of representations iff there exist an algorithm A and two polynomials p() and q() such that: 1. Given a sample S of size m for f ∈ F, A returns a hypothesis in O(p(m)) time; 2. For each representation τ of size k of a function f ∈ F, there exists a characteristic sample CS of f for A of size at most O(q(k)).
Essentially the proof for convergence follows from the fact that given a sufficient sample the algorithm merges all and only states with the same (k − 1)-length suffix.
Clearly, merges in the outer loop only involve states with the same (k − 1)-length suffix. This is also the case for inner loop merges. Consider the scenario depicted on the right in Figure 9, in which q is a state created by an outer loop merge.
After pushback, states s and t will be merged. If x = Suff^{k−1}(q), then both s and t must have xa as a suffix. Since |xa| = k, it follows that Suff^{k−1}(s) = Suff^{k−1}(t). It also follows that additional states merged to remove non-determinism resulting from the merge of s and t will have the same suffix of length (k − 1). To show that all states with the same (k − 1)-length suffix will be merged, it is shown that the ISLFLA will never encounter the situation in Figure 9, provided the data set includes a seed defined as follows.
Definition 8 (ISLFLA seed). Given a k-ISL function f, a seed for f is a set S = S′ ∪ S″ of pairs (w, f(w)) such that the left projection of S′ includes every string in Σ^≤k−1, so that the seed covers every state of T^ISL_f, and the left projection of S″ includes strings of length (k + 1) sufficient to ensure that every state of length up to k in the resulting PTT has outgoing transitions for every a ∈ Σ.

Lemma 2 (Convergence). A seed for an ISL function f is a characteristic sample for the ISLFLA.
Proof. We show that for any ISL function f and a dataset D that contains a seed S, the output of the ISLFLA is T^ISL_f. Let PTT(D) = (Q_PTT, q0_PTT, Σ, Γ, δ_PTT, σ_PTT) be the input to the ISLFLA and T = (Q_T, q0_T, Σ, Γ, δ_T, σ_T) be its output. First we show that Q_T = Σ^≤k−1. By Definitions 4 and 8, Σ^≤k−1 ⊆ Q_PTT. Since the ISLFLA only merges states with the same (k − 1)-length suffix, Σ^≤k−1 ⊆ Q_T. Since it does not exit until all states q have been merged with Suff^{k−1}(q), Q_T = Σ^≤k−1.
Next we show that, given S, the algorithm will never need to merge two states q1 and q2 such that δ1(q1, #) ≠ δ1(q2, #). Let δ1(q1, #) = z and δ1(q2, #) = x with z ≠ x and q1 = Suff^{k−1}(q2). By Definition 2, tails_f(q1) = tails_f(q2), so if z ≠ x it must be the case that q2 does not have transitions for all a ∈ Σ. This is because the only way for the output strings of the outgoing transitions of q2 to differ from those of q1 is if fewer transitions were present on q2 when the PTT was made onward. (By definition of S we know q1 has transitions for all a ∈ Σ.) But since tails_f(q1) = tails_f(q2), we also know that z = ux for some u ∈ Γ*.
By Definition 8, all states up to length k have transitions for all a ∈ Σ; therefore, |q2| ≥ k + 1. This means there exists q ∈ Σ^k between q1 and q2, which will be merged with some other state before q2 will. This merge will cause non-determinism, which in turn will trigger pushback and cause u to move further down the branch toward q2. By extension there will be |q2| − k states between q1 and q2, each of which will be merged, triggering pushback of u, so that by the time q1 and q2 are merged, δ1(q2, #) = ux = z = δ1(q1, #). Thus, all non-determinism can be removed, and so T is subsequential.
It remains to show that ∀q ∈ Q_T, a ∈ Σ, δ2(q, a) = Suff^{k−1}(qa). Since state merging preserves transitions, this follows from the construction of PTT(D). By Theorem 3, T = T^ISL_f.

Next we establish complexity bounds on the runtime of the ISLFLA and the size of the characteristic sample for ISL functions. We observe that both of these bounds improve the bounds of OSTIA. While not surprising, since ISL functions are less general than subsequential functions, the result is important since it is an example of greater a priori knowledge enabling learning with less time and data.
Let m be the length of the longest output string in the sample and let n denote the number of states of the PTT; n is at most the sum of the lengths of the input strings of the pairs in the sample.
Lemma 3 (Polynomial time). The time complexity of the ISLFLA is in O(n · m · k · |Σ|).
Proof. First, making the PTT onward can be done in O(m · n): it consists of a depth-first parsing of the PTT from its root, with a computation at each state of the lcp of the outgoing transition outputs after the recursive computation of the function (see de la Higuera (2010), Chap. 18, for details). As the computation of the lcp takes at most m steps, and as it has to be done for each state, the complexity of this step is effectively in O(m · n).
For the two loops, we need to find a bound on the number of merges that can occur. States q such that |q| < k do not yield any merges in the outer loop. All other states q are merged with Suff^{k−1}(q), in the outer loop or in the inner one. The number of merges is thus bounded by n. Computing the suffix of length (k − 1) of any word can be done in O(k) with a suitable string implementation. The test of the inner loop can be done in constant time, and so can the merge and pushback procedures. After each merge, the test of the inner loop needs to be done at most |Σ| times. As computing the lcp has a complexity in O(m), the overall complexity of the two loops is in O(n · m · k · |Σ|).
Lemma 4 (Polynomial data). The size of the characteristic sample of a k-ISL function is polynomial in the size of its target transducer.

Proof. The first item of the seed, S′, covers all and only the states of the target: the left projection of these pairs is thus linear in n, and every right projection is at most n · m + p. Thus the size of S′ is at most n · (n + n · m + p) = O(n² · m + n · p).
Concerning the second part, S″, its cardinality is at most n · |Σ| (in the rare case where Q′ = Q).
Each element of the left projection of S″ is of length (k + 1) and each element of its right projection is at most of length (k + 1) · m + p. The size of S″ is thus in O(n · |Σ| · (k · m + p)).
Therefore, the size of the characteristic sample is in O(n · |Σ| · k · m + n² · m + n · |Σ| · p), which is clearly polynomial in the size of the target transducer.
Theorem 8. The ISLFLA identifies the class of ISL functions in polynomial time and data.

Demonstrations
We tested the ISLFLA with the three examples in §4, as well as the English flapping process (t → R / V́ __ V, where V́ represents a stressed vowel and V an unstressed one). For each case, a data set was constructed according to Definition 8 using the alphabets presented in §4. The alphabet for English flapping was {V́, V, t, ?}. The value of k is 2 for final devoicing, @-epenthesis, and fricative deletion, and 3 for English flapping. A few additional data points of length 5 or 6 were also added to make the data set a superset of the seed. In all four cases, the learner returned the correct T^ISL_f.

The decision to use artificial corpora in these demonstrations was motivated by the fact that the sample in Definition 8 will not be present in a natural language corpus. That sample includes all possible sequences of symbols from the alphabet of a given length, whereas a natural language corpus will reflect the language-particular restrictions against certain segment sequences (i.e., phonotactics).
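A data set "constructed according to Definition 8" amounts, in the simplest case, to pairing every input string up to a given length with its output under the target process; a sketch (ours):

```python
from itertools import product

def make_sample(alphabet, max_len, f):
    """All strings over `alphabet` of length <= max_len, paired with their
    outputs under the target function f."""
    sample = []
    for n in range(max_len + 1):
        for tup in product(alphabet, repeat=n):
            w = "".join(tup)
            sample.append((w, f(w)))
    return sample
```

For a k-ISL target, taking max_len = k + 1 yields the string lengths Definition 8 requires; such exhaustive samples are exactly what a phonotactically restricted natural corpus lacks.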
As discussed in the introduction, Gildea and Jurafsky (1996) address this issue with natural language data by equipping OSTIA with a community bias, whereby segments belonging to a natural class (i.e., stops, fricatives, sonorants) are expected to behave similarly, and a faithfulness bias, whereby segments are assumed to be realized similarly on the surface. In our demonstrations we put aside the issue of the behavior of segments in a natural class by using abbreviated alphabets (e.g., T for all voiceless stops). But if in fact knowledge of natural classes precedes the learning of phonological processes, the use of such an alphabet is appropriate.
In future developments of the ISLFLA we likewise aim to accommodate natural language data, but in a way that maintains the theoretical result of identification in the limit. The restrictions on segment sequences represented in natural language data amount to 'missing' transitions in the initial prefix tree transducer that is built from that data. In other words, the transducer represents a partial, not a total function. Thus it seems the approach of Oncina and Varò (1996) and Castellanos et al. (1998) could be very instructive, as their use of domain information enabled OSTIA to learn partial functions. In our case, the fact that the domain of an ISL function is an SL language could provide a means of 'filling in' the missing transitions. The details of such an approach are, however, being left for future work.

Conclusion
This paper has defined Input and Output Strictly Local functions, which synthesize the properties of subsequential transduction and Strictly Local formal languages. It has provided language-theoretic characterizations of these functions and argued that they can model many phonological and morphological processes. Lastly, an automata-theoretic characterization of ISL functions was presented, along with a learning algorithm that efficiently learns this class in the limit from positive data.
Current work includes developing a comparable automata characterization and learning algorithm for OSL functions, as well as defining additional functional classes to model those phonological processes that cannot be modeled with ISL or OSL functions. The SL languages are just one region of a subregular hierarchy of formal languages (McNaughton and Papert, 1971; Rogers and Pullum, 2011; Rogers et al., 2013). The ISL and OSL functions defined here are the first step in developing a corresponding hierarchy of subregular functions. Of immediate interest to phonology are functional counterparts for the Tier-Based Strictly Local and Strictly Piecewise language classes, which have been shown to model long-distance phonotactics (Heinz, 2010; Heinz et al., 2011). Such functions might be useful for modeling the long-distance processes that repair violations of these phonotactic constraints.