Large-scale Semantic Parsing without Question-Answer Pairs

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.


Introduction
Querying a database to retrieve an answer, telling a robot to perform an action, or teaching a computer to play a game are tasks requiring communication with machines in a language interpretable by them. Semantic parsing addresses the specific task of learning to map natural language (NL) to machine interpretable formal meaning representations. Traditionally, sentences are converted into logical form grounded in the symbols of some fixed ontology or relational database.
Approaches for learning semantic parsers have been for the most part supervised, using annotated training data consisting of sentences and their corresponding logical forms (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Wong and Mooney, 2007; Kwiatkowski et al., 2010). More recently, alternative forms of supervision have been proposed to alleviate the annotation burden, e.g., by learning from conversational logs (Artzi and Zettlemoyer, 2011), from sentences paired with system behavior (Chen and Mooney, 2011; Goldwasser and Roth, [...]

Figure 1: An example question with annotated logical query, and its answer. Question: What is the capital of Texas? Logical form: λx.city(x) ∧ capital(x, Texas). Answer: {Austin}.
In this paper, we build a semantic parser that does not require example annotations or question-answer pairs but instead learns from a large knowledge base (KB) and web-scale corpora. Specifically, we exploit Freebase, a large community-authored knowledge base that spans many sub-domains and stores real world facts in graphical format, and parsed sentences from a large corpus. We formulate semantic parsing as a graph matching problem. We convert the output of an open-domain combinatory categorial grammar (CCG) parser (Clark and Curran, 2007) into a graphical representation and subsequently map it onto Freebase. The parser's graphs (also called ungrounded graphs) are mapped to all possible Freebase subgraphs (also called grounded graphs) by replacing edges and nodes with relations and types in Freebase. Each grounded graph corresponds to a unique grounded logical query. During learning, our semantic parser is trained to identify which KB subgraph best corresponds to the NL graph. Problematically, ungrounded graphs may give rise to many grounded graphs. Since we do not make use of manual annotations of sentences or question-answer pairs, we do not know which grounded graphs are correct. To overcome this, we rely on comparisons between denotations of natural language queries and related Freebase queries as a form of weak supervision in order to learn the mapping between NL and KB graphs.

Figure 2(a): Semantic parse of the sentence Austin is the capital of Texas: capital(Austin) ∧ UNIQUE(Austin) ∧ capital.of.arg1(e, Austin) ∧ capital.of.arg2(e, Texas).

Figure 2 illustrates our approach for the sentence Austin is the capital of Texas. From the CCG syntactic derivation (which we omit here for the sake of brevity) we obtain a semantic parse (Figure 2a) and convert it to an ungrounded graph (Figure 2b). Next, we select an entity from the graph and replace it with a variable x, creating a graph corresponding to the query What is the capital of Texas? (Figure 2c). The math function UNIQUE on Austin in Figure 2b indicates Austin is the only value of x which can satisfy the query graph in Figure 2c. Therefore, the denotation of the NL query graph is {AUSTIN}. Figure 2d shows two different groundings of the query graph in the Freebase KB. We obtain these by replacing edges and nodes in the query graph with Freebase relations and types. We use the denotation of the NL query as a form of weak supervision to select the best grounded graph. Under the constraint that the denotation of a Freebase query should be the same as the denotation of the NL query, the graph on the left-hand side of Figure 2d is chosen as the correct grounding.
Experimental results on two benchmark datasets consisting of questions to Freebase, FREE917 (Cai and Yates, 2013) and WEBQUESTIONS (Berant et al., 2013), show that our semantic parser improves over state-of-the-art approaches. Our contributions include: a novel graph-based method to convert natural language sentences to grounded semantic parses which exploits the similarities in the topology of knowledge graphs and linguistic structure, together with the ability to train using a wide range of features; a proposal to learn from a large scale web corpus, without question-answer pairs, based on denotations of queries from natural language statements as weak supervision; and the development of a scalable semantic parser which besides Freebase uses CLUEWEB09 for training, a corpus of 503.9 million webpages. Our semantic parser can be downloaded from http://sivareddy.in/downloads.

Framework
Our goal is to build a semantic parser which maps a natural language sentence to a logical form that can be executed against Freebase. We begin with CLUEWEB09, a web-scale corpus automatically annotated with Freebase entities (Gabrilovich et al., 2013). We extract the sentences containing at least two entities linked by a relation in Freebase. We parse these sentences using a CCG syntactic parser, and build semantic parses from the syntactic output. Semantic parses are then converted to semantic graphs which are subsequently grounded to Freebase. Grounded graphs can be easily converted to a KB query deterministically. During training we learn which grounded graphs correspond best to the natural language input. In the following, we provide a brief introduction to Freebase and its graph structure. Next, we explain how we obtain semantic parses from CCG (Section 2.2), how we convert them to graphs (Section 2.3), and ground them in Freebase (Section 2.4). Section 3 presents our learning algorithm.

The Freebase Knowledge Graph
Freebase consists of 42 million entities and 2.5 billion facts. A fact is defined by a triple containing two entities and a relation between them. Entities represent real world concepts, and edges represent relations, thus forming a graph-like structure.
A Freebase subgraph is shown in Figure 3 with

Combinatory Categorial Grammar
The graph-like structure of Freebase inspires us to create a graph-like structure for natural language, and learn a mapping between them. To do this we take advantage of the representational power of Combinatory Categorial Grammar (Steedman, 2000). CCG is a linguistic formalism that tightly couples syntax and semantics, and can be used to model a wide range of language phenomena [...] (Clark et al., 2002), thus supporting wide-coverage semantic analysis. Moreover, due to the transparent interface between syntax and semantics, it is relatively straightforward to build a semantic parse for a sentence from its corresponding syntactic derivation tree (Bos et al., 2004).
In our case, the choice of syntactic parser is motivated by the scale of our problem; the parser must be broad-coverage and robust enough to handle a web-sized corpus. For these reasons, we rely on the C&C parser (Clark and Curran, 2004), a general-purpose CCG parser, to obtain syntactic derivations. To our knowledge, we present the first attempt to use a CCG parser trained on treebanks for grounded semantic parsing. Most previous work has induced task-specific CCG grammars (Zettlemoyer and Collins, 2005, 2007; Kwiatkowski et al., 2010). An example CCG derivation is shown in Figure 4.
Semantic parses are constructed from syntactic CCG parses, with semantic composition being guided by the CCG syntactic derivation. [2] We use a neo-Davidsonian (Parsons, 1990) semantics to represent semantic parses. [3] Each word has a semantic category based on its syntactic category and part of speech. For example, the syntactic category for directed is (S\NP)/NP, i.e., it takes two argument NPs and becomes S. To represent its semantic category, we use a lambda term λyλx.directed.arg1(e,x) ∧ directed.arg2(e,y), where e identifies the event of directed, and x and y are arguments corresponding to the NPs in the syntactic category.

[2] See Bos et al. (2004) for a detailed introduction to semantic representation using CCG.
[3] Neo-Davidsonian semantics is a form of first-order logic that uses event identifiers (e) to connect verb predicates and their subcategorized arguments through conjunctions.
We obtain semantic categories automatically using the indexed syntactic categories provided by the C&C parser.
The latter reveal the bindings of basic constituent categories in more complex categories. For example, in order to convert ((S\NP)\(S\NP))/NP to its semantic category, we must know whether all NPs have the same referent and thus use the same variable name. The indexed category ((S_e\NP_x)\(S_e\NP_x))/NP_y reveals that there are only two different NPs, x and y, and that one of them (i.e., x) is shared across two subcategories. We discuss the details of semantic category construction in the Appendix.
Apart from n-ary predicates representing events (mostly verbs), we also use unary predicates representing types in language (mostly common nouns and noun modifiers). For example, capital(Austin) indicates Austin is of type capital. Prepositions, adjectives and adverbs are represented by predicates lexicalized with their head words to provide more information (see capital.of.arg1 instead of of.arg1 in Figure 2a).
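The lambda term for directed can be illustrated with a small sketch. This is not the authors' implementation: it assumes a hypothetical encoding in which a conjunction of predicates is a set of (predicate, event, argument) atoms, and the function name `directed` is chosen only for illustration.

```python
def directed(y):
    """Semantic category of 'directed': λy.λx. directed.arg1(e, x) ∧ directed.arg2(e, y)."""
    def with_subject(x, event="e"):
        # A conjunction is modelled as a set of (predicate, event, argument) atoms
        # (an assumed encoding, used here only to make the term executable).
        return {("directed.arg1", event, x), ("directed.arg2", event, y)}
    return with_subject


# (S\NP)/NP first consumes the object NP on its right, then the subject NP on its left.
parse = directed("Titanic")("Cameron")
```

Applying the term to Titanic and then to Cameron yields the two atoms directed.arg1(e, Cameron) and directed.arg2(e, Titanic), mirroring the order in which the syntactic category consumes its arguments.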

Ungrounded Semantic Graphs
We will now illustrate how we create ungrounded semantic graphs from CCG-derived semantic parses. Figure 5a displays the ungrounded graph for the sentence Cameron directed Titanic in 1997. In order to construct ungrounded graphs topologically similar to Freebase, we define five types of nodes:

Word Nodes (Ovals) Word nodes are denoted by ovals. They represent natural language words (e.g., directed in Figure 5a, capital and state in Figure 6b). Word nodes are connected to other word nodes via syntactic dependencies. For readability, we do not show inter-word dependencies.
Entity Nodes (Rectangles) Entity nodes are denoted by rectangles and represent entities, e.g., Cameron in Figure 5a. In cases where an entity is not known, we use variables, e.g., x in Figure 6a. Entity variables are connected to their corresponding word nodes from which they originate by dotted links, e.g., x in Figure 6a is connected to the word node who.
Mediator Nodes (Circles) Mediator nodes are denoted by circles and represent events in language. They connect pairs of entities which participate in an event forming a clique (see the entities Cameron, Titanic and 1997 in Figure 5a). We define an edge as a link that connects any two entities via a mediator. The subedge of an edge, i.e., the link between a mediator and an entity, corresponds to the predicate denoting the event and taking the entity as its argument (e.g., directed.arg1 links e and Cameron in Figure 5a). Mediator nodes are connected to their corresponding word nodes from which they originate by dotted links, e.g., mediators in Figure 5a are connected to word node directed.

For example, the graph in Figure 6a represents the question Who directed The Nutty Professor?. Here, TARGET attaches to x representing the word who. UNIQUE attaches to the entity variable modified by the definite article the. In Figure 6b, UNIQUE attaches to Austin implying that only Austin satisfies the graph. Finally, COUNT attaches to entity nodes which have to be counted. For the sentence Julie Andrews has appeared in 40 movies in Figure 7, the KB could either link Julie Andrews and 40, with type node movies matching the grounded type integer, or it could link Julie Andrews to each movie she acted in, with the count of these different movies adding up to 40.
In anticipation of this ambiguity, we generate two semantic parses resulting in two ungrounded graphs (see Figures 7a and 7b). We generate all possible grounded graphs corresponding to each ungrounded graph, and leave it up to the learning to decide which ones the KB prefers.

Grounded Semantic Graphs
We [...] we use an automatically constructed lexicon which maps ungrounded types to grounded ones (see Section 4.2 for details).
Edges An edge between two entities is grounded using all edges linking the two entities in the knowledge graph. For example, to ground the edge between Titanic and Cameron in Figure 5, we use the following edges linking TITANIC and CAMERON in Freebase: (film.directedby.arg1, film.directedby.arg2), (film.producedby.arg1, film.producedby.arg2). If only one entity is grounded, we use all possible edges from this grounded entity. If no entity is grounded, we use a mapping lexicon which is automatically created as described in Section 4.2. Given an ungrounded graph with n edges, there are O((k + 1)^n) possible grounded graphs, where k is the number of grounded edges in the knowledge graph for each ungrounded edge and the additional option is the empty (no) edge.
Mediator nodes In an ungrounded graph, mediator nodes represent semantic event identifiers. In the grounded graph, they represent Freebase fact identifiers. Fact identifiers help distinguish if neighboring edges belong to a single complex fact, which may or may not be coextensive with an ungrounded event.
In Figure 8a, the edges corresponding to the event identifier e are grounded to a single complex fact in Figure 8b, with the fact identifier m. However, in Figure 5a, the edges of the ungrounded event e are grounded to different Freebase facts, distinguished in Figure 5b by the identifiers m and n. Furthermore, the edge in 5a between CAMERON and 1997 is not grounded in 5b, since no Freebase edge exists between the two entities.

We convert grounded graphs to SPARQL queries, but for readability we only show logical expressions. The conversion is deterministic and is exactly the inverse of the semantic parse to graph conversion (Section 2.3). Wherever a node/edge is instantiated with a grounded entity/type/relation in Freebase, we use them in the grounded parse (e.g., type node capital.state in Figure 6b becomes location.capitalcity). Math function TARGET is useful in retrieving instantiations of entity variables of interest (see Figure 6a).
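The size of the grounding space can be made concrete with a short sketch. It assumes a hypothetical dictionary from each ungrounded edge to its candidate Freebase edges; the labels `rel_a` and `rel_b` are placeholders rather than real Freebase relations, and the function name is illustrative only.

```python
from itertools import product


def enumerate_groundings(candidates):
    """Yield every assignment of a grounded edge (or None, i.e. no edge) to each ungrounded edge.

    candidates -- dict: ungrounded edge -> list of candidate grounded edges (assumed input format)
    """
    edges = list(candidates)
    # Each ungrounded edge may also stay ungrounded, hence the extra None option.
    options = [list(candidates[edge]) + [None] for edge in edges]
    for choice in product(*options):
        yield dict(zip(edges, choice))


candidates = {
    ("Cameron", "Titanic"): ["film.directedby", "film.producedby"],
    ("Titanic", "1997"): ["rel_a", "rel_b"],  # placeholder labels, not real Freebase relations
}
groundings = list(enumerate_groundings(candidates))
```

With n = 2 ungrounded edges and k = 2 candidates each, the sketch produces (k + 1)^n = 9 grounded graphs, which is why exhaustive enumeration quickly becomes impractical and a beam is needed.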

Learning
A natural language sentence may give rise to several grounded graphs, but only one (or a few) of them will be a faithful representation of the sentence in Freebase. We next describe our algorithm for finding the best Freebase graph for a given sentence, our learning model, and the features it uses.

Algorithm
Freebase has a large number of relations and entities, and as a result there are many possible grounded graphs g for each ungrounded graph u. We construct and score graphs incrementally, traversing each node in the ungrounded graph and matching its edges and types in Freebase. Given a NL sentence s, we construct from its CCG syntactic derivation all corresponding ungrounded graphs u. Using a beam search procedure (described in Section 4.2), we find the best scoring graphs (ĝ, û), maximizing over different graph configurations (g, u) of s:

$$(\hat{g}, \hat{u}) = \arg\max_{g, u} \; \Phi(g, u, s, KB) \cdot \theta$$

We define the score of (ĝ, û) as the dot product between a high dimensional feature representation Φ = (Φ_1, ..., Φ_m) and a weight vector θ (see Section 3.3 for details on the features we employ).
We estimate the weights θ using the averaged structured perceptron algorithm (Collins, 2002). As shown in Algorithm 1, the perceptron makes several passes over sentences, and in each iteration it computes the best scoring (ĝ, û) among the candidate graphs for a given sentence. In line 6, the algorithm updates θ with the difference (if any) between the feature representations of the best scoring graph (ĝ, û) and the gold standard graph (g+, u+).
The goal of the algorithm is to rank gold standard graphs higher than any other graphs. The final weight vector θ is the average of weight vectors over T iterations and N sentences. This averaging procedure avoids overfitting and produces more stable results (Collins, 2002). As we do not make use of question-answer pairs or manual annotations of sentences, gold standard graphs (g+, u+) are not available. In the following, we explain how we approximate them by relying on graph denotations as a form of weak supervision.
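The update and averaging steps described above can be sketched as follows. This is a simplified illustration of an averaged structured perceptron, not a reproduction of Algorithm 1: it assumes each sentence comes with a precomputed list of candidate feature dictionaries and the index of one (surrogate) gold candidate, and the function names are hypothetical.

```python
from collections import defaultdict


def score(features, theta):
    """Dot product between a sparse feature dict and the weight vector."""
    return sum(theta.get(name, 0.0) * value for name, value in features.items())


def averaged_perceptron(data, T):
    """Train for T passes over the data and return the averaged weights.

    data -- list of (candidates, gold_index); candidates is a list of sparse
            feature dicts, one per candidate graph pair (assumed input format)
    """
    theta = defaultdict(float)
    total = defaultdict(float)
    steps = 0
    for _ in range(T):
        for candidates, gold in data:
            best = max(range(len(candidates)), key=lambda i: score(candidates[i], theta))
            if best != gold:
                # Move the weights towards the gold graph and away from the prediction.
                for name, value in candidates[gold].items():
                    theta[name] += value
                for name, value in candidates[best].items():
                    theta[name] -= value
            for name, value in theta.items():
                total[name] += value
            steps += 1
    # Average over all T iterations and N sentences.
    return {name: value / steps for name, value in total.items()}


data = [([{"capital.of=location.containedby": 1}, {"capital.of=location.capital": 1}], 1)]
weights = averaged_perceptron(data, T=3)
```

In this toy run the first pass makes one correcting update and later passes leave the weights unchanged, so the averaged vector prefers the grounding marked as gold.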

Selecting Surrogate Gold Graphs
Let u be an ungrounded semantic graph of s. We select an entity E in u, replace it with a variable x, and make it a target node. Let u+ represent the resulting ungrounded graph. Next, we obtain all grounded graphs g+ which correspond to u+ such that the denotations [[u+]]_NL = [[g+]]_KB. We use these surrogate graphs g+ as gold standard, and the pairs (u+, g+) for model training. There is considerable latitude in choosing which entity E to replace. This can be done randomly, according to entity frequency, or some other criterion. We found that substituting the entity with the most connections to other entities in the sentence works well in practice. All the entities that can replace x in u+ to constitute a valid fact in Freebase will be the denotation of [...] [[u+]]_NL because of the mismatch between our natural language semantic language and the Freebase query language. To ensure that graphs u+ and g+ have the same denotations, we impose the following constraints:

Constraint 1 If the math function UNIQUE is attached to the entity being replaced in the ungrounded graph, we assume the denotation of u+ contains only that entity. For example, in Figure 2b, we replace Austin by x, and thus assume [[u+]]_NL = {AUSTIN}. Any grounded graph which results in [[g+]]_KB = {AUSTIN} will be considered a surrogate gold graph. This allows us to learn entailment relations, e.g., capital.of should be grounded to location.capital (left-hand side graph in Figure 2d) and not to location.containedby, which results in all locations in Texas (right-hand side graph in Figure 2d).
Constraint 2 If the target entity node is a number, we select the Freebase graphs with denotation close to this number. For example, in Figure 8a, if 120,000 is replaced by x, we assume [[u+]]_NL = {120,000}. However, the grounded graph 8b retrieves [...] Integers can either occur directly in relation with an entity as in Figure 8b, or must be enumerated as in Figure 7c.
Constraint 3 If the target entity node is a date, we select the grounded graph which results in the smallest set containing the date based on the intuition that most sentences in the data describe specific rather than general events.
Constraint 4 If none of the above constraints apply to the target entity E, we know E ∈ [[u+]]_NL, and hence we select the grounded graphs which satisfy E ∈ [[g+]]_KB as surrogate gold graphs.
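The four constraints can be summarized in a small sketch. It is an illustration under stated assumptions rather than the authors' procedure: denotations are plain Python sets keyed by a grounded graph id, the `tolerance` parameter is a hypothetical way of deciding what counts as "close" for numbers (the text does not specify a threshold), and the toy denotations in the usage example are made up.

```python
def numeric_value(denotation):
    """Read a number from a denotation: stored directly, or obtained by counting members."""
    if len(denotation) == 1:
        (only,) = denotation
        if isinstance(only, (int, float)):
            return only
    return len(denotation)


def select_surrogate_gold(entity, grounded, kind="entity", is_unique=False, tolerance=0.1):
    """Return the ids of grounded graphs accepted as surrogate gold.

    entity    -- the entity E that was replaced by the variable x
    grounded  -- dict: grounded graph id -> its Freebase denotation (a set)
    tolerance -- hypothetical relative threshold for Constraint 2
    """
    if is_unique:  # Constraint 1: the denotation must be exactly {E}
        return [g for g, d in grounded.items() if d == {entity}]
    if kind == "number":  # Constraint 2: the denotation is close to the number
        return [g for g, d in grounded.items()
                if abs(numeric_value(d) - entity) <= tolerance * abs(entity)]
    if kind == "date":  # Constraint 3: smallest denotation containing the date
        containing = {g: d for g, d in grounded.items() if entity in d}
        smallest = min((len(d) for d in containing.values()), default=0)
        return [g for g, d in containing.items() if len(d) == smallest]
    return [g for g, d in grounded.items() if entity in d]  # Constraint 4


# Toy denotations for the two groundings of "What is the capital of Texas?"
groundings = {
    "location.capital": {"Austin"},
    "location.containedby": {"Austin", "Dallas", "Houston"},
}
gold = select_surrogate_gold("Austin", groundings, is_unique=True)
```

With UNIQUE attached to Austin, only the grounding whose denotation is exactly {Austin} survives; without it, Constraint 4 would accept both groundings.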

Features
Our feature vector Φ(g, u, s, KB) denotes the features extracted from a sentence s and its corresponding graphs u and g with respect to a knowledge base KB. The elements of the vector (φ_1, φ_2, ...) take integer values denoting the number of times a feature appeared. We devised the following broad feature classes:

Lexical alignments Since ungrounded graphs are similar in topology to grounded graphs, we extract ungrounded and grounded edge and type alignments.
In a similar fashion we extract type alignments (e.g., φ_type(capital, location.city)).

Contextual features
In addition to lexical alignments, we also use contextual features which essentially record words or word combinations surrounding grounded edge labels. Feature φ_event records an event word and its grounded predicates (e.g., in Figure 7c we extract features φ_event(appear, performance.film) and φ_event(appear, performance.actor)).
Feature φ_arg records a predicate and its argument words (e.g., φ_arg(performance.film, movie) in Figure 7c). Word combination features are extracted from the parser's dependency output. The feature φ_dep records a predicate and the dependencies of its event word (e.g., from the grounded version of Figure 6b

Lexical similarity
We count the number of word stems shared by grounded and ungrounded edge labels, e.g., in Figure 5 directed.arg1 and film.directedby.arg2 have one stem overlap (ignoring the argument labels arg1 and arg2). For a grounded graph, we compute φ_stem, the aggregate stem overlap count over all its grounded and ungrounded edge labels. We did not incorporate WordNet/Wiktionary-based lexical similarity features but these were found fruitful in Kwiatkowski et al. (2013). We also have a feature for stem overlap count between the grounded edge labels and the context words.
Graph connectivity features These features penalize graphs with non-standard topologies. For example, we do not want a final graph with no edges. The feature value φ_hasEdge is one if there exists at least one edge in the graph. We also have a feature φ_nodeCount for counting the number of connected nodes in the graph. Finally, feature φ_colloc captures the collocation of grounded edges (e.g., edges belonging to a single complex fact are likely to co-occur; see Figure 8b).
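As a rough illustration of the stem overlap count, the sketch below uses a deliberately crude suffix stripper in place of a real stemmer. Two things are assumptions: the stemmer itself (the text does not say which one is used), and the spelling `film.directed_by.arg2`, which supposes that grounded labels can be split into words on "." and "_".

```python
import re


def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a proper stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stems(label):
    """Stems of the words in an edge label, ignoring argument labels such as arg1."""
    tokens = [t for t in re.split(r"[._]", label.lower())
              if t and not re.fullmatch(r"arg\d+", t)]
    return {crude_stem(t) for t in tokens}


def stem_overlap(ungrounded_label, grounded_label):
    """Number of word stems shared by an ungrounded and a grounded edge label."""
    return len(stems(ungrounded_label) & stems(grounded_label))


overlap = stem_overlap("directed.arg1", "film.directed_by.arg2")
```

Summing this count over all aligned edge pairs of a grounded graph would give a value playing the role of φ_stem.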

Experimental Setup
In this section we present our experimental set-up for assessing the performance of the semantic parser described above. We present the datasets on which our model was trained and tested, discuss implementation details, and briefly introduce the models used for comparison with our approach.

Data
We evaluated our approach on the FREE917 (Cai and Yates, 2013) and WEBQUESTIONS (Berant et al., 2013) datasets. FREE917 consists of 917 questions and their meaning representations (written in a variant of lambda calculus) which we, however, do not use. The dataset represents 81 domains covering 635 Freebase relations, with most domains containing fewer than 10 questions. We report results on three domains, namely film, business, and people as these are relatively large in both FREE917 and Freebase. WEBQUESTIONS consists of 5,810 question-answer pairs, 2,780 of which are reserved for testing. Our experiments used a subset of WEBQUESTIONS representing the three target domains. We extracted domain-specific queries semi-automatically by identifying question-answer pairs with entities in target domain relations. In both datasets, named entities were disambiguated to Freebase entities with a named entity lexicon. Table 1 presents descriptive statistics for each domain. Evaluating on all domains in Freebase would generate a very large number of queries for which denotations would have to be computed (the number of queries is linear in the number of domains and the size of training data). Our system loads Freebase using Virtuoso and queries it with SPARQL. Virtuoso is slow in dealing with millions of queries indexed on the entire Freebase, which is the only reason we did not work with the complete Freebase.

Implementation
To train our model, we extracted sentences from CLUEWEB09 which contain at least two entities associated with a relation in Freebase, and have an edge between them in the ungrounded graph. These were further filtered so as to remove sentences which do not yield at least one semantic parse without an uninstantiated entity variable. For example, the sentence Avatar is directed by Cameron would be used for training, whereas Avatar directed by Cameron received a critical review would not. In the latter case, any semantic parse will have an uninstantiated entity variable for review. Table 1 (Train) shows the number of sentences we obtained.
In order to train our semantic parser, we initialized the alignment and type features (φ_edge and φ_type, respectively) with the alignment lexicon weights. These weights are computed as follows. Let count(r′, r) denote the number of pairs of entities which are linked with edge r′ in Freebase and edge r in CLUEWEB09 sentences. We then estimate the probability distribution

$$P(r'/r) = \frac{\mathrm{count}(r', r)}{\sum_i \mathrm{count}(r'_i, r)}$$

Analogously, we created a type alignment lexicon. The counts were collected from CLUEWEB09 sentences containing pairs of entities linked with an edge in Freebase (business 390k, film 130k, and people 490k). Contextual features were initialized to −1 since most word contexts and grounded predicates/types do not appear together. All other features were set to 0.
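The alignment lexicon estimate is a relative frequency, which the following sketch makes explicit. The input format (one (grounded edge, ungrounded edge) pair per co-occurring entity pair) and the function name are assumptions made for illustration, and the counts in the usage example are invented.

```python
from collections import Counter, defaultdict


def build_alignment_lexicon(pairs):
    """Estimate P(r'/r) = count(r', r) / sum_i count(r'_i, r).

    pairs -- iterable of (grounded_edge, ungrounded_edge), one entry per entity
             pair linked by both edges (assumed input format)
    """
    counts = Counter(pairs)
    totals = Counter()
    for (grounded, ungrounded), c in counts.items():
        totals[ungrounded] += c
    lexicon = defaultdict(dict)
    for (grounded, ungrounded), c in counts.items():
        lexicon[ungrounded][grounded] = c / totals[ungrounded]
    return dict(lexicon)


# Invented toy counts: three entity pairs support directedby, one supports producedby.
pairs = [("film.directedby", "directed")] * 3 + [("film.producedby", "directed")]
lexicon = build_alignment_lexicon(pairs)
```

Here the ungrounded edge directed maps to film.directedby with weight 0.75 and to film.producedby with weight 0.25; a type lexicon could be built the same way from type co-occurrences.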
We used a beam-search algorithm to convert ungrounded graphs to grounded ones. The edges and types of each ungrounded graph are placed in a priority queue. Priority is based on edge/type tf-idf scores collected over CLUEWEB09. At each step, we pop an element from the queue and ground it in Freebase. We rank the resulting grounded graphs using the perceptron model, and pick the n-best ones, where n is the beam size. We continue until the queue is empty. In our experiments we used a beam size of 100. We trained a single model for all the domains combined together. We ran the perceptron for 20 iterations (around 5-10 million queries). At each training iteration we used 6,000 randomly selected sentences from the training corpus.
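The beam search loop can be sketched as below. This is a minimal illustration, assuming a partial grounded graph is a tuple of (element, label) pairs, that `candidates` and `score` are supplied by the caller, and that leaving an element ungrounded (label None) is always allowed; the names and the placeholder label `rel_x` are hypothetical.

```python
import heapq


def beam_search(queue, candidates, score, beam_size=100):
    """Ground the elements of an ungrounded graph one at a time, keeping the n-best graphs.

    queue      -- list of (priority, element); higher priority is grounded first
    candidates -- function: element -> list of grounded labels
    score      -- function: tuple of (element, label) pairs -> float (e.g. the model score)
    """
    beam = [()]  # start from the empty grounded graph
    for _, element in sorted(queue, key=lambda item: -item[0]):
        options = list(candidates(element)) + [None]  # None: leave the element ungrounded
        expanded = [graph + ((element, label),) for graph in beam for label in options]
        beam = heapq.nlargest(beam_size, expanded, key=score)
    return beam


# Toy example with invented priorities and weights.
table = {"directed": ["film.directedby", "film.producedby"], "in": ["rel_x"]}
toy_weights = {("directed", "film.directedby"): 2.0,
               ("directed", "film.producedby"): 1.0,
               ("in", "rel_x"): 0.5}
beam = beam_search([(2.0, "directed"), (1.0, "in")],
                   lambda element: table[element],
                   lambda graph: sum(toy_weights.get(pair, 0.0) for pair in graph),
                   beam_size=2)
```

Sorting the queue once stands in for repeatedly popping the highest-priority element; the loop ends when every element has been processed, matching the "until the queue is empty" condition.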

Comparison Systems
We compared our graph-based semantic parser (henceforth GRAPHPARSER) against two state-of-the-art systems both of which are open-domain and work with Freebase. The semantic parser developed by Kwiatkowski et al. (2013) (henceforth KCAZ13) is learned from question-answer pairs and follows a two-stage procedure: first, a natural language sentence is converted to a domain-independent semantic parse and then grounded onto Freebase using a set of logical-type equivalent operators. The operators explore possible ways sentential meaning could be expressed in Freebase and essentially transform logical form to match the target ontology. Our approach also has two steps (i.e., we first generate multiple ungrounded graphs and then ground them to different Freebase graphs). We do not use operators to perform structure matching, rather we create multiple graphs and leave it up to the learner to find an appropriate grounding using a rich feature space. To give a specific example, their operator literal to constant is equivalent to having named entities for larger text chunks in our case. Their operator split literal explores different edge possibilities in an event whereas we start with a clique and remove unwanted edges. Our approach has (almost) similar expressive power but is conceptually simpler.
Our second comparison system was the semantic parser of Berant and Liang (2014) (henceforth PARASEMPRE) which also uses QA pairs for training and makes use of paraphrasing. Given an input NL sentence, they first construct a set of logical forms based on hand-coded rules, and then generate sentences from each logical form (using generation templates and a lexicon). Pairs of logical forms and natural language are finally scored using a paraphrase model consisting of two components. An association model determines whether they contain phrase pairs likely to be paraphrases and a vector space model assigns a vector representation for each sentence, and learns a scoring function that ranks paraphrase candidates. Our semantic parser employs a graph-based representation as a means of handling the mismatch between natural language and Freebase, whereas PARASEMPRE opts for a text-based one through paraphrasing.
Finally, we compared our semantic parser against a baseline which is based on graphs but employs no learning. The baseline converts an ungrounded graph to a grounded one by replacing each ungrounded edge/type with the highest weighted grounded label, creating a maximum weighted graph, henceforth MWG. Both GRAPHPARSER and the baseline use the same alignment lexicon (a weighted mapping from ungrounded to grounded labels).
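The MWG baseline amounts to an independent argmax per label, as the sketch below illustrates. It assumes the alignment lexicon is a nested dictionary from ungrounded labels to weighted grounded labels; the function name and the weights in the example are hypothetical.

```python
def max_weighted_graph(ungrounded_labels, lexicon):
    """Replace each ungrounded edge/type with its highest weighted grounded label.

    lexicon -- dict: ungrounded label -> {grounded label: weight} (assumed format)
    Labels missing from the lexicon stay ungrounded (None).
    """
    grounded = {}
    for label in ungrounded_labels:
        options = lexicon.get(label, {})
        grounded[label] = max(options, key=options.get) if options else None
    return grounded


# Invented weights for illustration.
toy_lexicon = {"directed": {"film.directedby": 0.75, "film.producedby": 0.25}}
mwg = max_weighted_graph(["directed", "in"], toy_lexicon)
```

Because each label is chosen in isolation, the baseline cannot use context or graph-level features, which is the gap the learned model is meant to close.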

Results
Table 2 summarizes our results on FREE917. As described earlier, we evaluated GRAPHPARSER on a subset of the dataset representing three domains (business, film, and people). Since this subset contains a relatively small number of instances (124 in total), we performed 10-fold cross validation with 9 folds as development data, and one fold as test data. We report results averaged over all test folds. With respect to KCAZ13, we present results with their cross-domain trained models, where training data from multiple domains is used to test foreign domains. KCAZ13 used generic features like string similarity and knowledge base features which apply across domains and do not require in-domain training data. We do not report results with PARASEMPRE as the small number of training instances would put their method at a disadvantage. We treat a predicted query as correct if its denotation is exactly equal to the denotation of the manually annotated gold query.
As can be seen, GRAPHPARSER outperforms KCAZ13 and the MWG baseline by a wide margin. This is an encouraging result bearing in mind that our model does not use question-answer pairs. We should also point out that our domain relation set is larger compared to KCAZ13. We do not prune any of the relations in Freebase, whereas KCAZ13 use only 112 relations and 83 types from our three domains (see Table 1). We further performed a feature ablation study to examine the contribution of different feature classes. As shown in Table 3, the most important features are those based on lexical similarity, as also observed in KCAZ13. Graph connectivity and lexical alignments are equally important (these features are absent from KCAZ13). Contextual features are not very helpful over and above alignment features which also encode contextual information. Overall, generic features like lexical similarity are helpful only to a certain extent; the performance of GRAPHPARSER improves considerably when additional graph-related features are taken into account.
We also analyzed the errors GRAPHPARSER makes. 25% of these are caused by the C&C parser and are cases where it either returns no syntactic analysis or a wrong one. 19% of the errors are due to Freebase inconsistencies. For example, our system answered the question How many stores are in Nittany mall? with 65 using the relation shopping center.number of stores whereas the gold standard provides the answer 25 counting all stores using the relation shopping center.store. Around 15% of errors include structural mismatches between natural language and Freebase; for the question Who is the president of Gap Inc?, our method grounds president to a grounded type whereas in Freebase it is represented as a relation [...]. The remaining errors are miscellaneous. For example, the question What are some films on Antarctica? receives two interpretations, i.e., movies filmed in Antarctica or movies with Antarctica as their subject.
We next discuss our results on WEBQUESTIONS. PARASEMPRE was trained with 1,115 QA pairs (corresponding to our target domains) together with question paraphrases obtained from the PARALEX corpus (Fader et al., 2013). [10] While training PARASEMPRE, out-of-domain Freebase relations and types were removed. Both GRAPHPARSER and PARASEMPRE were tested on the same set of 570 in-domain QA pairs with exact answer match as the evaluation criterion. For development purposes, GRAPHPARSER uses 200 QA pairs. Table 4 displays our results. We observe that GRAPHPARSER obtains a higher F1 than MWG and PARASEMPRE. Differences in performance among these systems are less pronounced compared to FREE917. This is for a good reason. WEBQUESTIONS is a challenging dataset, created by non-experts. The questions are not tailored to Freebase in any way; they are more varied and display a wider vocabulary. As a result the mismatch between natural language and Freebase is greater and the semantic parsing task harder.
Error analysis further revealed that parsing errors are responsible for 13% of the questions GRAPHPARSER fails to answer. Another cause of errors is mismatches between natural language and Freebase. Around 7% of the questions are of the type Where did X come from?, and our model answers with the individual's nationality, whereas annotators provide the birthplace (city/town/village) as the right answer. Moreover, 8% of the questions are of the type What does X do?, which the annotators answer with the individual's profession. In natural language, we rarely attest constructions like X does dentist/researcher/actor. The proposed framework assumes that Freebase and natural language are somewhat isomorphic, which is not always true. An obvious future direction would be to paraphrase the questions so as to increase the number of grounded and ungrounded graphs. As an illustration, we rewrote questions like Where did X come from to What is X's birth place, and What did X do to What is X's profession and evaluated our model GRAPHPARSER + PARA. As shown in Table 4, even simple paraphrasing can boost performance.

[10] We used the SEMPRE package (http://www-nlp.stanford.edu/software/sempre/) which does not use any hand-coded entity disambiguation lexicon.
Finally, Table 3 (third column) examines the contribution of different features on the WEBQUESTIONS development dataset. Interestingly, we observe that contextual features are not useful and in fact slightly harm performance. We hypothesize that this is due to the higher degree of mismatch between natural language and Freebase in this dataset. Features based on similarity, graph connectivity, and lexical alignments are more robust and generally useful across datasets.

Discussion
In this paper, we introduce a new semantic parsing approach for Freebase. A key idea in our work is to exploit the structural and conceptual similarities between natural language and Freebase through a common graph-based representation. We formalize semantic parsing as a graph matching problem and learn a semantic parser without using annotated question-answer pairs. We have shown how to obtain graph representations from the output of a CCG parser and subsequently learn their correspondence to Freebase using a rich feature set and their denotations as a form of weak supervision. Our parser yields state-of-the-art performance on three large Freebase domains and is not limited to question answering: we can create semantic parses for any type of NL sentence.
Our work brings together several strands of research. Graph-based representations of sentential meaning have recently gained some attention in the literature (Banarescu et al., 2013), and attempts to map sentences to semantic graphs have met with good inter-annotator agreement. Our work is also closely related to Kwiatkowski et al. (2013) and Berant and Liang (2014), who present open-domain semantic parsers based on Freebase and trained on QA pairs. Despite differences in formulation and model structure, both approaches have explicit mechanisms for handling the mismatch between natural language and the KB (e.g., using logical-type equivalent operators or paraphrases). The mismatch is handled implicitly in our case via our graphical representation, which allows for the incorporation of all manner of powerful features. More generally, our method is based on the assumption that linguistic structure has a correspondence to Freebase structure, which does not always hold (e.g., in Who is the grandmother of Prince William?, grandmother is not directly expressed as a relation in Freebase). Additionally, our model fails when questions are too short and offer no lexical clues (e.g., What did Charles Darwin do?). Supervision from annotated data or paraphrasing could improve performance in such cases. In the future, we plan to explore cluster-based semantics (Lewis and Steedman, 2013) to increase the robustness on unseen NL predicates.
Our work joins others in exploiting the connections between natural language and open-domain knowledge bases. Recent approaches in relation extraction use distant supervision from a knowledge base to predict grounded relations between two target entities (Mintz et al., 2009; Hoffmann et al., 2011; Riedel et al., 2013). During learning, they aggregate sentences containing the target entities, ignoring richer contextual information. In contrast, we learn from each individual sentence, taking into account all entities present, their relations, and how they interact. Krishnamurthy and Mitchell (2012) formalize semantic parsing as a distantly supervised relation extraction problem combined with a manually specified grammar to guide semantic parse composition.

Figure 2: Steps involved in converting a natural language sentence to a Freebase grounded graph.

Figure 5: Graph representations for the sentence Cameron directed Titanic in 1997.

Figure 6: Ungrounded graphs with math functions TARGET and UNIQUE.
Type nodes (Rounded rectangles) Type nodes are denoted by rounded rectangles. They represent unary predicates in natural language. In Figure 6b, type nodes capital and capital.state are attached to Austin, denoting that Austin is of type capital and capital.state. Type nodes are also connected by dotted links to the word nodes from which they originate, e.g., type node capital.state and word node state in Figure 6b.
Math nodes (Diamonds) Math nodes are denoted by diamonds. They describe functions to be applied on the nodes/subgraphs they attach to. The function TARGET attaches to the entity variable of interest.
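The node types described above can be made concrete with a small data-structure sketch. The class `UngroundedGraph`, its field names, and the method `to_query` are illustrative assumptions rather than the representation used in our implementation; the example follows the semantic parse of Austin is the capital of Texas.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UngroundedGraph:
    """Toy ungrounded graph; every field name here is an illustrative assumption."""
    # (node, unary predicate): type nodes, e.g. ("Austin", "capital")
    type_nodes: List[Tuple[str, str]] = field(default_factory=list)
    # (node, function): math nodes, e.g. ("Austin", "UNIQUE")
    math_nodes: List[Tuple[str, str]] = field(default_factory=list)
    # (node1, node2, label1, label2): labelled edges between entity nodes
    edges: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def to_query(self, entity: str, variable: str = "x") -> "UngroundedGraph":
        """Replace `entity` with a variable and attach TARGET to that variable."""
        def swap(node: str) -> str:
            return variable if node == entity else node

        return UngroundedGraph(
            type_nodes=[(swap(n), t) for n, t in self.type_nodes],
            math_nodes=[(swap(n), f) for n, f in self.math_nodes]
            + [(variable, "TARGET")],
            edges=[(swap(a), swap(b), la, lb) for a, b, la, lb in self.edges],
        )


# "Austin is the capital of Texas"
statement = UngroundedGraph(
    type_nodes=[("Austin", "capital")],
    math_nodes=[("Austin", "UNIQUE")],
    edges=[("Austin", "Texas", "capital.of.arg1", "capital.of.arg2")],
)
# Graph for the query "What is the capital of Texas?"
query = statement.to_query("Austin")
```

Keeping type and math nodes as attachments to entity nodes, rather than as edges, mirrors the distinction drawn above between unary predicates and functions applied to a node.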

Table 4: Experimental results on WEBQUESTIONS.

Table 5: Rules used to classify words into semantic classes. * represents a wildcard expression which matches anything. lex_x denotes the lexicalised form of x, e.g., when state : NP_x/NP_x : λPλx.lex_x.state(x) ∧ P(x) is applied to capital : NP : λy.capital(y), the lexicalised form of x becomes capital, and therefore the predicate lex_x.state becomes capital.state. The resulting semantic parse after application is λx.capital.state(x) ∧ capital(x).
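The lexicalisation step in this caption can be traced with a short sketch. The class `NounSemantics` and the function `apply_modifier` are hypothetical names introduced for illustration; they mimic only the single function application described in the caption and are not a general CCG semantics.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class NounSemantics:
    """A unary predicate λy.head(y) together with its lexical head."""
    head: str

    def atoms(self, var: str) -> List[str]:
        return [f"{self.head}({var})"]


def apply_modifier(modifier: str, noun: NounSemantics) -> Callable[[str], List[str]]:
    """Apply λPλx.lex_x.modifier(x) ∧ P(x) to the noun predicate P."""
    def result(var: str) -> List[str]:
        # lex_x is instantiated with the lexical head of the argument.
        lexicalised = f"{noun.head}.{modifier}"
        return [f"{lexicalised}({var})"] + noun.atoms(var)
    return result


parse = apply_modifier("state", NounSemantics("capital"))
conjuncts = parse("x")
formula = "λx." + " ∧ ".join(conjuncts)
```

Here `formula` reproduces the parse given in the caption, with lex_x.state resolved to capital.state.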