Large-Scale Information Extraction from Textual Definitions through Deep Syntactic and Semantic Analysis

We present DefIE, an approach to large-scale Information Extraction (IE) based on a syntactic-semantic analysis of textual definitions. Given a large corpus of definitions, we leverage syntactic dependencies to reduce data sparsity, then disambiguate the arguments and content words of the relation strings, and finally exploit the resulting information to organize the acquired relations hierarchically. The output of DefIE is a high-quality knowledge base consisting of several million automatically acquired semantic relations.


Introduction
The problem of knowledge acquisition lies at the core of Natural Language Processing. Recent years have witnessed the massive exploitation of collaborative, semi-structured information as the ideal middle ground between high-quality, fully-structured resources and the larger amount of cheaper (but noisy) unstructured text (Hovy et al., 2013). Collaborative projects like Freebase (Bollacker et al., 2008) and Wikidata (Vrandečić, 2012) have been under development for many years and are continuously being improved. A great deal of research also focuses on enriching available semi-structured resources, most notably Wikipedia, thereby creating taxonomies (Ponzetto and Strube, 2011; Flati et al., 2014), ontologies (Mahdisoltani et al., 2015) and semantic networks (Navigli and Ponzetto, 2012; Nastase and Strube, 2013). These solutions, however, are inherently constrained to small and often prespecified sets of relations. A more radical approach is adopted in systems like TEXTRUNNER (Etzioni et al., 2008) and REVERB (Fader et al., 2011), which developed from the Open Information Extraction (OIE) paradigm (Etzioni et al., 2008) and focused on the unconstrained extraction of a large number of relations from massive unstructured corpora. Ultimately, all these endeavors were geared towards addressing the knowledge acquisition problem and tackling long-standing challenges in the field, such as Machine Reading (Mitchell, 2005).
While earlier OIE approaches relied mostly on dependencies at the level of surface text (Etzioni et al., 2008; Fader et al., 2011), more recent work has focused on deeper language understanding at the level of both syntax and semantics (Nakashole et al., 2012), tackling challenging linguistic phenomena like synonymy and polysemy. However, these issues have not yet been addressed in their entirety. Relation strings are still bound to surface text, lacking actual semantic content. Furthermore, most OIE systems do not have a clear and unified ontological structure and require additional processing steps, such as statistical inference mappings (Dutta et al., 2014), graph-based alignments of relational phrases (Grycner and Weikum, 2014), or knowledge base unification procedures (Delli Bovi et al., 2015), in order for their potential to be exploitable in real applications.
In DEFIE the key idea is to leverage the linguistic analysis of recent semantically-enhanced OIE techniques while moving from open text to smaller corpora of dense prescriptive knowledge. The aim is then to extract as much information as possible by unifying syntactic analysis and state-of-the-art disambiguation and entity linking. Using this strategy, from an input corpus of textual definitions (short and concise descriptions of a given concept or entity) we are able to harvest fully disambiguated relation instances on a large scale, and integrate them automatically into a high-quality taxonomy of semantic relations. As a result, a large knowledge base is produced that shows competitive accuracy and coverage against state-of-the-art OIE systems based on much larger corpora. Our contributions can be summarized as follows:
• We propose an approach to IE that ties together syntactic dependencies and unified entity linking/word sense disambiguation, designed to discover semantic relations from a relatively small corpus of textual definitions;
• We create a large knowledge base of fully disambiguated relation instances, ranging over named entities and concepts from available resources like WordNet and Wikipedia;
• We exploit our semantified relation patterns to automatically build a rich, high-quality relation taxonomy, showing competitive results against state-of-the-art approaches.
Our approach comprises three stages. First, we extract from our input corpus an initial set of semantic relations (Section 2); each relation is then scored and augmented with semantic type signatures (Section 3); finally, the augmented relations are used to build a relation taxonomy (Section 4).

Relation Extraction
Here we describe the first stage of our approach, where a set of semantic relations is extracted from the input corpus. In the following, we refer to a relation instance as a triple t = ⟨a_i, r, a_j⟩, with a_i and a_j being the arguments and r the relation pattern. Each relation pattern r_k identifies an associated relation R_k, defined as the set of all relation instances where r = r_k. In order to extract a large set of fully disambiguated relation instances we bring together syntactic and semantic analysis on a corpus of plain textual definitions. Each definition is first parsed and disambiguated (Figure 1a-b, Section 2.1); syntactic and semantic information is combined into a structured graph representation (Figure 1c, Section 2.2) and relation patterns are then extracted as shortest paths between concept pairs (Section 2.3).
The semantics of our relations draws on BabelNet (Navigli and Ponzetto, 2012), a wide-coverage multilingual semantic network obtained from the automatic integration of WordNet, Wikipedia and other resources. This choice is not mandatory; however, inasmuch as it is a superset of these resources, BabelNet brings together lexicographic and encyclopedic knowledge, enabling us to reach higher coverage while still being able to accommodate different disambiguation strategies. For each relation instance t extracted, both a_i, a_j and the content words appearing in r are linked to the BabelNet inventory. In the remainder of the paper we identify BabelNet concepts or entities using the notation band^i_bn, which refers to the i-th BabelNet sense of the English word band.

Textual Definition Processing
The first step of the process is the automatic extraction of syntactic information (typed dependencies) and semantic information (word senses and named entity mentions) from each textual definition. Each definition undergoes the following steps:

Syntactic Analysis. Each textual definition d is parsed to obtain a dependency graph G_d (Figure 1a). Parsing is carried out using C&C (Clark and Curran, 2007), a log-linear parser based on Combinatory Categorial Grammar (CCG). Although our algorithm seamlessly works with any syntactic formalism, CCG rules are especially suited to longer definitions and linguistic phenomena like coordinating conjunctions (Steedman, 2000).
Semantic Analysis. Semantic analysis is based on Babelfy (Moro et al., 2014), a joint, state-of-the-art approach to entity linking and word sense disambiguation. Given a lexicalized semantic network as underlying structure, Babelfy uses a dense subgraph algorithm to identify high-coherence semantic interpretations of words and multi-word expressions across an input text. We apply Babelfy to each definition d, obtaining a sense mapping S_d from surface text (words and entity mentions) to word senses and named entities (Figure 1b).
As a matter of fact, any disambiguation or entity linking strategy can be used at this stage. However, a knowledge-based unified approach like Babelfy is best suited to our setting, where context is limited and exploiting definitional knowledge as much as possible is key to attaining high-coverage results (as we show in Section 6.4).

Syntactic-Semantic Graph Construction
The information extracted by parsing and disambiguating a given definition d is unified into a syntactic-semantic graph G^sem_d, in which the concepts and entities identified in d are arranged in a graph structure encoding their syntactic dependencies (Figure 1c). We start from the dependency graph G_d, as provided by the syntactic analysis of d in Section 2.1. Semantic information from the sense mapping S_d can be incorporated directly into the vertices of G_d by attaching available matches between words and senses to the corresponding vertices. Dependency graphs, however, encode dependencies solely on a word basis, while our sense mappings may include multi-word expressions (e.g. Pink Floyd^1_bn). In order to extract consistent information, subsets of vertices referring to the same concept or entity are merged into a single semantic node, which replaces the covered subgraph in the original dependency structure. Consider the example in Figure 1: an entity like Pink Floyd^1_bn covers two distinct and connected vertices in the dependency graph G_d, one for the noun Floyd and one for its modifier Pink. In the actual semantics of the sentence, as encoded in G^sem_d (Figure 1c), these two vertices are merged into a single node referring to the entity Pink Floyd^1_bn (the English rock band), instead of being assigned individual word interpretations.
Our procedure for building G^sem_d takes as input a typed dependency graph G_d and a sense mapping S_d, both extracted from a given definition d. G^sem_d is first populated with the vertices of G_d referring to disambiguated content words, merging those vertices covered by the same sense s ∈ S_d into a single node (like Pink Floyd^1_bn and Atom Heart Mother^1_bn in Figure 1c). Then, the remaining vertices and edges are added as in G_d, discarding non-disambiguated adjuncts and modifiers (like the and fifth in Figure 1).
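As an illustration, the node-merging step can be sketched in a few lines of Python. This is a simplified sketch: the word-level dependency format and the set of modifier labels are assumptions for the example, while the real system operates on CCG-derived typed dependencies.

```python
# Assumed set of adjunct/modifier labels whose non-disambiguated
# dependents are discarded (illustrative, not the system's actual list).
DROP_LABELS = {"det", "amod", "advmod", "num"}

def merge_sense_nodes(edges, sense_map):
    """Collapse dependency-graph vertices covered by the same sense into a
    single semantic node (e.g. 'Pink' and 'Floyd' -> Pink Floyd^1_bn).

    edges: iterable of (head, label, dependent) word-level dependencies
    sense_map: word -> sense id; a multi-word sense maps several words
               to the same id
    """
    merged = set()
    for head, label, dep in edges:
        # Discard non-disambiguated adjuncts and modifiers.
        if label in DROP_LABELS and dep not in sense_map:
            continue
        h = sense_map.get(head, head)
        d = sense_map.get(dep, dep)
        if h != d:  # drop edges internal to a multi-word sense
            merged.add((h, label, d))
    return merged
```

On the Atom Heart Mother example, the three word vertices of the album title collapse into one node, the intra-title edges disappear, and the non-disambiguated determiner is dropped.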

Relation Pattern Identification
At this stage, all the information in a given definition d has been extracted and encoded in the corresponding graph G^sem_d (Section 2.2). We now consider those paths connecting entity pairs across the graph and extract the relation pattern r between two entities and/or concepts as the shortest path between the two corresponding vertices in G^sem_d. This enables us to exclude less relevant information (typically carried by adjuncts or modifiers) and reduce data sparsity in the overall extraction process.
Our algorithm works as follows: given a textual definition d, we consider every pair of identified concepts or entities and compute the corresponding shortest path in G^sem_d using the Floyd-Warshall algorithm (Floyd, 1962). The only constraint we enforce is that resulting paths must include at least one verb node. This condition filters out meaningless single-node patterns (e.g. two concepts connected with a preposition) and, given the prescriptive nature of d, is unlikely to discard semantically relevant attributes compacted in noun phrases. As an example, consider the two sentences "Mutter is the third album by German band Rammstein" and "Atom Heart Mother is the fifth album by English band Pink Floyd". In both cases, two valid shortest-path patterns are extracted. The first is X is album^1_bn by Y, with a_i = Mutter^3_bn, a_j = Rammstein^1_bn for the first sentence and a_i = Atom Heart Mother^1_bn, a_j = Pink Floyd^1_bn for the second one. The second is the is-a pattern, with a_i = Mutter^3_bn, a_j = album^1_bn for the first sentence and a_i = Atom Heart Mother^1_bn, a_j = album^1_bn for the second one. In fact, our extraction process seamlessly discovers both general knowledge (e.g. that Mutter^3_bn and Atom Heart Mother^1_bn are instances of the concept album^1_bn) and facts (e.g. that the entities Rammstein^1_bn and Pink Floyd^1_bn have an isAlbumBy relation with the two recordings).
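A minimal sketch of the constrained shortest-path extraction follows. It uses a per-pair breadth-first search, which finds the same shortest paths as Floyd-Warshall on an unweighted graph; the graph and node names are toy examples, not the system's actual data structures.

```python
from collections import deque

def shortest_pattern(graph, verbs, src, dst):
    """BFS shortest path from src to dst over an undirected
    syntactic-semantic graph. Returns the path as a node list, or
    None if no path exists or the path contains no verb node
    (the verb-node constraint described above).

    graph: dict node -> set of neighbouring nodes
    verbs: set of nodes that are verbs
    """
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            # Reconstruct the path by walking predecessors back to src.
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            path.reverse()
            return path if any(n in verbs for n in path) else None
        for v in graph.get(u, ()):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None
```

A pair of concepts linked only by a preposition yields a verb-less path and is rejected, matching the filtering behaviour described in the text.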
Pseudo-code for the entire extraction algorithm is shown in Algorithm 1: given a set of textual definitions D, a set of relations ℛ is generated, with each relation R ∈ ℛ comprising the relation instances extracted from D. Each d ∈ D is first parsed and disambiguated to produce a syntactic-semantic graph G^sem_d (Sections 2.1-2.2); then all the concept pairs s_i, s_j are examined to detect relation instances as shortest paths. Finally, we filter out from the resulting set all relations for which the number of extracted instances is below a fixed threshold ρ. The overall algorithm extracts over 20 million relation instances in our experimental setup (Section 5), with almost 256,000 distinct relations.
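The final grouping and threshold filter of the algorithm can be sketched as follows; the triples and the value of ρ are purely illustrative.

```python
from collections import defaultdict

def build_relations(triples, rho=2):
    """Group extracted triples (a_i, r, a_j) by relation pattern r and
    discard relations with fewer than rho instances (the threshold
    filter applied at the end of the extraction algorithm)."""
    relations = defaultdict(list)
    for a_i, r, a_j in triples:
        relations[r].append((a_i, a_j))
    # Keep only relations supported by at least rho instances.
    return {r: insts for r, insts in relations.items() if len(insts) >= rho}
```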

Relation Type Signatures and Scoring
We further characterize the semantics of our relations by computing semantic type signatures for each relation R, i.e. by attaching a proper semantic class to both its domain and range (the sets of arguments occurring on the left and right of the pattern). As every element in the domain and range of R is disambiguated, we retrieve the corresponding senses and collect their direct hypernyms. Then we select the hypernym covering the largest subset of arguments as the representative semantic class for the domain (or range) of R. We extract hypernyms using BabelNet, where taxonomic information covers both general concepts (from the WordNet taxonomy (Fellbaum, 1998)) and named entities (from the Wikipedia Bitaxonomy (Flati et al., 2014)).
From the distribution of direct hypernyms over the domain and range arguments of R we estimate the quality of R and associate a confidence value with its relation pattern r. Intuitively, we want to assign higher confidence to relations where the corresponding distributions have low entropy. For instance, if both sets have a single hypernym covering all arguments, then R arguably captures a well-defined semantic relation and should be assigned high confidence. For each relation R, we compute the entropy

H_R = - Σ_{i=1}^{n} p(h_i) log p(h_i)

where h_i (i = 1, ..., n) are all the distinct argument hypernyms over the domain and range of R, and the probabilities p(h_i) are estimated from the proportion of arguments covered in such sets. The lower H_R, the better defined the semantic types of R. As a matter of fact, however, some valid but over-general relations (e.g. X is a Y, X is used for Y) have inherently high values of H_R. To obtain a balanced score, we therefore consider two additional factors, i.e. the number of extracted instances of R and the length of the associated pattern r, obtaining the following empirical measure:

score(R) = |S_R| / ( length(r) · (H_R + 1) )

with S_R being the set of extracted relation instances for R. The +1 term accounts for cases where H_R = 0. As shown in the examples of Table 1, relations with rather general patterns (such as X known for Y) achieve higher scores compared to very specific ones (like X is village^2_bn founded in 1912 in Y) despite higher entropy values. We validated our measure on the samples of Section 6.1, computing the overall precision for different score thresholds. The monotonic decrease of sample precision in Figure 2a shows that our measure captures the quality of extracted patterns better than H_R alone (Figure 2b).
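The type-signature selection and the scoring measure above can be sketched in a few lines; the hypernym distributions below are toy data, and the score follows the formula given in the text.

```python
import math
from collections import Counter

def type_signature(hypernyms_per_arg):
    """Pick the direct hypernym covering the largest number of
    arguments as the semantic class of a domain (or range).

    hypernyms_per_arg: list of hypernym lists, one per argument."""
    counts = Counter(h for hs in hypernyms_per_arg for h in set(hs))
    return counts.most_common(1)[0][0]

def relation_score(hypernym_counts, n_instances, pattern_length):
    """Return (H_R, score) where H_R is the entropy of the hypernym
    distribution and score = |S_R| / (length(r) * (H_R + 1))."""
    total = sum(hypernym_counts.values())
    h_r = -sum((c / total) * math.log2(c / total)
               for c in hypernym_counts.values())
    return h_r, n_instances / (pattern_length * (h_r + 1))
```

When a single hypernym covers all arguments the entropy is zero and the score reduces to the instance count divided by the pattern length, matching the intuition that low-entropy signatures deserve higher confidence.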

Relation Taxonomization
In the last stage of our approach, the set of extracted relations is automatically arranged in a relation taxonomy. The process is carried out by comparing relations pairwise, looking for hypernymy-hyponymy relationships between the corresponding relation patterns; we then build our taxonomy by connecting with an edge those relation pairs for which such a relationship is found. Both of the relation taxonomization procedures described here examine noun nodes across each relation pattern r, and consider for taxonomization only those relations whose patterns are identical except for a single noun node.

Hypernym Generalization
A direct way of identifying hypernym/hyponym noun nodes across relation patterns is to analyze the semantic information attached to them. Given two relation patterns r_i and r_j, differing only in respect of the noun nodes n_i and n_j, we first look at the associated concepts or entities, c_i and c_j, and retrieve the corresponding hypernym sets, H(c_i) and H(c_j). Hypernym sets are obtained by iteratively collecting the superclasses of c_i and c_j from the semantic network of BabelNet, up to a fixed height. For instance, given c_i = album^1_bn, H(c_i) = {work of art^1_bn, creation^2_bn, artifact^1_bn}. We then check whether c_j ∈ H(c_i) or c_i ∈ H(c_j) (Figure 3a). According to which is the case, we conclude that r_j is a generalization of r_i, or that r_i is a generalization of r_j.
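The hypernym-based check can be sketched as follows, with a toy taxonomy dictionary standing in for BabelNet's semantic network and an assumed maximum height.

```python
def hypernym_set(taxonomy, concept, max_height=3):
    """Collect superclasses of a concept up to a fixed height by
    iterating over direct-hypernym links.

    taxonomy: dict concept -> set of direct hypernyms (a toy slice
    of a semantic network such as BabelNet)."""
    frontier, collected = {concept}, set()
    for _ in range(max_height):
        frontier = {h for c in frontier for h in taxonomy.get(c, ())}
        collected |= frontier
    return collected

def generalizes(taxonomy, c_i, c_j):
    """Return 'j_generalizes_i' if c_j is a hypernym of c_i,
    'i_generalizes_j' for the converse, else None."""
    if c_j in hypernym_set(taxonomy, c_i):
        return "j_generalizes_i"
    if c_i in hypernym_set(taxonomy, c_j):
        return "i_generalizes_j"
    return None
```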

Substring Generalization
The second procedure focuses on the noun (or compound) represented by the node. Given two relation patterns r_i and r_j, we apply the following heuristic: from one of the two nouns, be it n_i, any adjunct or modifier is removed, retaining the sole head word n̄_i. Then, n̄_i is compared with n_j and, if n̄_i = n_j, we assume that the relation r_j is a generalization of r_i (Figure 3b).

Experimental Setup

Input. The input corpus used for the relation extraction procedure is the full set of English textual definitions in BabelNet 2.5 (Navigli and Ponzetto, 2012). In fact, any set of textual definitions can be provided as input to DEFIE, ranging from existing dictionaries (like WordNet or Wiktionary) to the set of first sentences of Wikipedia articles. Since it merges various resources of this kind, BabelNet provides a large heterogeneous set comprising definitions from WordNet, Wikipedia, Wiktionary, Wikidata and OmegaWiki. To the best of our knowledge, this set constitutes the largest available corpus of definitional knowledge. We therefore worked on a total of 4,357,327 textual definitions from the English synsets of BabelNet's knowledge base. We then used the same version of BabelNet as the underlying semantic network structure for disambiguating with Babelfy.

Statistics. Comparative statistics are shown in Table 2. DEFIE extracts 20,352,903 relation instances, of which 13,753,133 feature a fully disambiguated pattern, yielding an average of 3.15 disambiguated relation instances extracted from each definition. After the extraction process, our knowledge base comprises 255,881 distinct semantic relations, 94% of which also have disambiguated content words in their patterns. DEFIE extracts a considerably larger number of relation instances compared to similar approaches, despite the much smaller amount of text used.
For example, we managed to harvest over 5 million more relation instances than PATTY, using a much smaller corpus (single sentences as opposed to full Wikipedia articles) and generating one sixth as many distinct relations as PATTY. As a result, we obtained an average number of extractions per relation that was substantially higher than those of our OIE competitors. This suggests that DEFIE is able to exploit the nature of textual definitions effectively and generalize over relation patterns. Furthermore, our semantic analysis captured 2,398,982 distinct arguments (either concepts or named entities), outperforming almost all the open-text systems examined.
Evaluation. All the evaluations carried out in Section 6 were based on manual assessment by two human judges, with an inter-annotator agreement, as measured by Cohen's kappa coefficient, above 70% in all cases. In these evaluations we compared DEFIE with the following OIE approaches:
• NELL (Carlson et al., 2010), with knowledge base beliefs updated to November 2014;
• PATTY (Nakashole et al., 2012), with Freebase types and pattern synsets from the English Wikipedia dump of June 2011;
• REVERB (Fader et al., 2011), using the set of normalized relation instances from the ClueWeb09 dataset;
• WISENET (Moro and Navigli, 2012), with relational phrases from the English Wikipedia dump of December 2012.
In addition, we also compared our knowledge base with up-to-date human-contributed resources, namely Freebase (Bollacker et al., 2008) and DBpedia (Lehmann et al., 2014), both from the dumps of April/May 2014.

Quality of Relations

We first assessed the quality and the semantic consistency of our relations using manual evaluation. We ranked our relations according to their score (Section 3) and then created two samples (of size 100 and 250 respectively) of the top-scoring relations. In order to evaluate the long tail of less confident relations, we created another two samples of the same size with randomly extracted relations. We presented these samples to our human judges, accompanying each relation with a set of 50 argument pairs and the corresponding textual definitions from BabelNet. For each item in the sample we asked whether it represented a meaningful relation and whether the extracted argument pairs were consistent with this relation and the corresponding definitions. If the answer was positive, the relation was considered correct. Finally, we estimated the overall precision of the sample as the proportion of correct items. Results are reported in Table 3 and compared to those obtained by our closest competitor, PATTY, in the setting of Section 5.
In PATTY, the confidence of a given pattern was estimated from its statistical strength (Nakashole et al., 2012). As shown in Table 3, DEFIE achieved a comparable level of accuracy in every sample. An error analysis identified most errors as related to the vagueness of some short and general patterns, e.g. X take Y, X make Y. Others were related to parsing (e.g. in labeling the head word of complex noun phrases) or disambiguation. In addition, we used the same samples to estimate the novelty of the extracted information in comparison to currently available resources. We examined each correct relation pattern and manually looked for an equivalent relation in the knowledge bases under comparison. Results are reported in Table 4 for both the top 100 sample and the random sample. The high proportion of relations not appearing in existing resources (especially across the random samples) suggests that DEFIE is capable of discovering information not obtainable from available knowledge bases, including very specific relations (X is blizzard in Y, X is Mayan language spoken by Y, X is government-owned corporation in Y), as well as general but unusual ones (X used by writer of Y).

Coverage of Relations
To assess the coverage of DEFIE we first tested our extracted relations on a public dataset described in (Nakashole et al., 2012), consisting of 163 semantic relations manually annotated from five Wikipedia pages about musicians. Following previous work (Nakashole et al., 2012), for each annotation we sought a relation in our knowledge base carrying the same semantics. Results are reported in Table 5. Consistently with the results in Table 4, the proportion of novel information places DEFIE in line with its closest competitors, achieving a coverage of 80.3% with respect to the gold standard. Examples of relations not covered by our competitors are hasFatherInLaw and hasDaughterInLaw. Furthermore, relations holding between entities and general concepts (e.g. criticizedFor, praisedFor, sentencedTo) are captured only by DEFIE and REVERB (which, however, lacks any argument semantics).
We also assessed coverage against resources based on a predefined inventory of relations. As shown in Table 6, DEFIE reports a coverage between 81% and 89% depending on the resource, failing to cover mostly relations that refer to numerical properties (e.g. numberOfMembers). Finally, we tested the coverage of DEFIE over individual relation instances. We selected a random sample of 100 triples from each of the two closest competitors exploiting textual corpora, i.e. PATTY and WISENET. For each selected triple ⟨a_i, r, a_j⟩, we sought an equivalent relation instance in our knowledge base, i.e. one comprising a_i and a_j and a relation pattern expressing the same semantic relation as r. Results in Table 7 show a coverage greater than 65% over both samples. Given the dramatic reduction of corpus size and the high precision of the items extracted, these figures demonstrate that definitional knowledge is extremely valuable for relation extraction approaches. This might suggest that, even in large-scale OIE-based resources, a substantial amount of knowledge is likely to come from a rather small subset of definitional sentences within the source corpus.

Quality of Relation Taxonomization
We evaluated our relation taxonomy by manually assessing the accuracy of our taxonomization heuristics. We then compared our results against PATTY, the only system among our closest competitors that generates a taxonomy of relations. The setting for this evaluation was the same as that of Section 6.1. However, as we lacked a confidence measure in this case, we simply extracted a random sample of 200 hypernym edges for each generalization procedure. We presented these samples to our human judges and, for each hypernym edge, asked whether the corresponding pair of relations represented a correct generalization. We then estimated the overall precision as the proportion of edges regarded as correct.
Results are reported in Table 8, along with PATTY's results in the setting of Section 5; as PATTY's edges are ranked by confidence, we considered both its 100 most confident subsumptions and a random sample of the same size. As shown in Table 8, DEFIE outperforms PATTY in terms of precision, and generates more than twice the number of edges overall. HARPY (Grycner and Weikum, 2014) enriches PATTY's taxonomy with 616,792 hypernym edges, but its alignment algorithm, in the setting of Section 5, also includes transitive edges and still yields a sparser taxonomy compared to ours, with a graph density of 2.32 × 10^-7. Generalization errors in our taxonomy are mostly related to disambiguation errors or flaws in the Wikipedia Bitaxonomy (e.g. the concept Titular Church^1_bn marked as hyponym of Cardinal^1_bn).

Quality of Entity Linking and Disambiguation
We evaluated the disambiguation stage of DEFIE (Section 2.1) by comparing Babelfy against other state-of-the-art entity linking systems. In order to compare different disambiguation outputs we selected a random sample of 60,000 glosses from the input corpus of textual definitions (Section 5) and ran the relation extraction algorithm (Sections 2.1-2.3) using a different competitor in the disambiguation step each time. We eventually used the mappings in BabelNet to express each output using a common dictionary and sense inventory.
The coverage obtained by each competitor was assessed by looking at the number of distinct relations extracted in the process, the total number of relation instances extracted, the number of distinct concepts or entities involved, and the average number of semantic nodes within the relation patterns. For each competitor, we also assessed the precision obtained by evaluating the quality and semantic consistency of the relation patterns, in the same manner as in Section 6.1. Results are reported in Tables 9 and 10 for Babelfy and the following systems:
• TagME 2.0 (Ferragina and Scaiella, 2012; tagme.di.unipi.it), which links text fragments to Wikipedia based on measures like sense commonness and keyphraseness (Mihalcea and Csomai, 2007);
• WAT (Piccinno and Ferragina, 2014), an entity annotator that improves over TagME and features a re-designed spotting, disambiguation and pruning pipeline;
• DBpedia Spotlight (Mendes et al., 2011; spotlight.dbpedia.org), which annotates text documents with DBpedia URIs using scores such as prominence, topical relevance and contextual ambiguity;
• Wikipedia Miner (Milne and Witten, 2013; wikipediadataminer.cms.waikato.ac.nz), which combines parallelized processing of Wikipedia dumps, relatedness measures and annotation features.

As shown in Table 9, Babelfy outperforms all its competitors in terms of coverage and, thanks to its unified word sense disambiguation and entity linking approach, extracts semantically richer patterns, with 2.37 semantic nodes per sentence on average. This is reflected in the quality of semantic relations, reported in Table 10, with an overall increase of precision both in terms of relations and in terms of individual instances. Even though WAT shows slightly higher precision over relations, its considerably lower coverage yields semantically poor patterns (0.39 semantic nodes on average) and impacts the overall quality of relations, where some ambiguity is necessarily retained. As an example, the pattern X is station in Y, extracted from WAT's disambiguation output, covers both railway stations and radio broadcasts. Babelfy instead produces a distinct relation pattern for each sense, tagging station as railway station^1_bn for the former and station^5_bn for the latter.

Impact of Definition Sources
We carried out an empirical analysis over the input corpus in our experimental setup, studying the impact of each source of textual definitions in isolation. In fact, as explained in Section 5, BabelNet's textual definitions come from various resources: WordNet, Wikipedia, Wikidata, Wiktionary and OmegaWiki. Table 11 shows the composition of the input corpus with respect to each of these definition sources. The distribution is rather skewed, with the vast majority of definitions coming from Wikipedia (almost 90% of the input corpus).
We ran the relation extraction algorithm (Sections 2.1-2.3) on each subset of the input corpus. As in previous experiments, we report the number of relation instances extracted, the number of distinct relations, and the average number of extractions for each relation. Results, as shown in Table 12, are consistent with the composition of the input corpus in Table 11: by relying solely on Wikipedia's first sentences, the extraction algorithm discovered 98% of all the distinct relations identified across the whole input corpus, and 93% of the total number of extracted instances. Wikidata provides more than 1 million extractions (5% of the total) but definitions are rather short and most of them (44.2%) generate only is-a relation instances. The remaining sources (WordNet, Wiktionary, OmegaWiki) account for less than 2% of the extractions.
Impact of the Approach vs. Impact of the Data

DEFIE's relation extraction algorithm is explicitly designed to target textual definitions. Hence, the results it achieves are due to the mutual contribution of two key factors: the OIE approach itself and the use of definitional data. In order to decouple these two factors and study their respective impacts, we carried out two experiments: first we applied DEFIE to a sample of non-definitional text; then we applied our closest competitor, PATTY, to the same definition corpus described in Section 5.
Extraction from non-definitional text. We selected a random sample of Wikipedia pages from the English Wikipedia dump of October 2012. We processed each sentence as in Sections 2.1-2.2 and extracted instances of those relations produced by DEFIE in the original definitional setting (Section 5); we then automatically filtered out those instances whose arguments' hypernyms did not agree with the semantic types of the relation. We manually evaluated the quality of extractions on a sample of 100 items (as in Section 6.1), both for the full set of extracted instances and for the subset of extractions from the top 100 scoring relations. Results are reported in Table 13: in both cases, precision figures show that extraction quality drops consistently in comparison to Section 6.1, suggesting that our extraction approach by itself is less accurate when moving to more complex sentences (featuring, e.g., subordinate clauses or coreference).
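The automatic type-signature filter used in this experiment can be sketched as follows; this is a simplified check that assumes one representative class per domain and range, with toy argument data.

```python
def filter_by_signature(instances, arg_hypernyms, domain_class, range_class):
    """Keep only instances whose argument hypernyms agree with the
    relation's semantic type signature.

    instances: list of (a_i, a_j) argument pairs
    arg_hypernyms: dict argument -> set of its hypernyms
    domain_class, range_class: the relation's type signature."""
    return [(a_i, a_j) for a_i, a_j in instances
            if domain_class in arg_hypernyms.get(a_i, set())
            and range_class in arg_hypernyms.get(a_j, set())]
```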

PATTY on textual definitions.
Since no open-source implementation of PATTY is available, we implemented a version of its algorithm that uses Babelfy for named entity disambiguation. We then ran it on our corpus of BabelNet definitions and compared the results against those originally obtained by PATTY (on the entire Wikipedia corpus) and those obtained by DEFIE. Figures are reported in Table 14 in terms of the number of extracted relation instances, distinct relations, and hypernym edges in the relation taxonomy. Results show that the dramatic reduction of corpus size affects the support sets of PATTY's relations, worsening both coverage and generalization capability.

6.7 Preliminary Study: Resource Enrichment
To further investigate the potential of our approach, we explored the application of DEFIE to the enrichment of existing resources, focusing on BabelNet as a case study. In BabelNet's semantic network, nodes representing concepts and entities are only connected via lexicographic relationships from WordNet (hypernymy, meronymy, etc.) or via unlabeled edges derived from Wikipedia hyperlinks. Our extraction algorithm has the potential to provide useful information both to augment unlabeled edges with labels and explicit semantic content, and to create additional connections based on semantic relations. Examples are shown in Table 15. We carried out a preliminary analysis over all disambiguated relations with at least 10 extracted instances. For each relation pattern r, we first examined the concept pairs associated with its type signature and looked in BabelNet for an unlabeled edge connecting the pair. Then we examined the whole set of extracted relation instances of r and looked in BabelNet for an unlabeled edge connecting the arguments a_i and a_j. Results in Table 16 show that only 27.7% of the concept pairs representing relation type signatures are connected in BabelNet, and most of these connections are unlabeled. By the same token, more than 4 million distinct argument pairs (53.5%) do not share any edge in the semantic network and, among those that do, less than 14% have a labeled relationship. These proportions suggest that our relations provide a potential enrichment of the underlying knowledge base in terms of both connectivity and labeling of existing edges. In BabelNet, our case study, cross-resource mappings might also propagate this information across other knowledge bases and rephrase semantic relations in terms of, e.g., automatically generated Wikipedia hyperlinks.
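The connectivity analysis behind Table 16 amounts to classifying each argument pair as labeled, unlabeled, or missing in the semantic network. The sketch below illustrates this bookkeeping on a toy graph; the edge representation and all names are illustrative assumptions, and BabelNet's actual API differs.

```python
from collections import Counter

# Toy semantic network: (node_a, node_b) -> edge label, or None if the
# edge is an unlabeled Wikipedia-hyperlink edge.
EDGES = {
    ("Jimi_Hendrix", "guitar"): "plays",
    ("Jimi_Hendrix", "Seattle"): None,  # bare hyperlink, no label
}

def edge_status(pair, edges=EDGES):
    """Classify an argument pair against the semantic network."""
    a, b = pair
    for key in ((a, b), (b, a)):  # treat edges as undirected
        if key in edges:
            return "labeled" if edges[key] is not None else "unlabeled"
    return "missing"

pairs = [
    ("Jimi_Hendrix", "guitar"),
    ("Jimi_Hendrix", "Seattle"),
    ("Jimi_Hendrix", "Electric_Ladyland"),
]
print(Counter(edge_status(p) for p in pairs))
```

Aggregating these counts over all extracted argument pairs yields the proportions reported in Table 16; pairs classified as "missing" or "unlabeled" are the ones our relations could enrich.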

Related Work
From the earliest days, OIE systems had to cope with the size and heterogeneity of huge unstructured sources of text. The first systems employed statistical techniques and relied heavily on information redundancy. Then, as soon as semi-structured resources came into play (Hovy et al., 2013), researchers started developing learning systems based on self-supervision (Wu and Weld, 2007) and distant supervision (Mintz et al., 2009; Krause et al., 2012). Crucial issues in distant supervision, like noisy training data, have been addressed in various ways: probabilistic graphical models (Riedel et al., 2010; Hoffmann et al., 2011), sophisticated multi-instance learning algorithms (Surdeanu et al., 2012), matrix factorization techniques (Riedel et al., 2013), labeled data infusion (Pershina et al., 2014) or crowd-based human computing (Kondreddi et al., 2014). A different strategy consists of moving from open text extraction to more constrained settings. For instance, the KNOWLEDGE VAULT (Dong et al., 2014) combines Web-scale extraction with prior knowledge from existing knowledge bases; BIPERPEDIA (Gupta et al., 2014) relies on schema-level attributes from the query stream in order to create an ontology of class-attribute pairs; RENOUN (Yahya et al., 2014) in turn exploits BIPERPEDIA to extract facts expressed as noun phrases. DEFIE focuses, instead, on smaller and denser corpora of prescriptive knowledge. Although early works, such as MindNet (Richardson et al., 1998), had already highlighted the potential of textual definitions for extracting reliable semantic information, no OIE approach to the best of our knowledge has exploited definitional data to extract and disambiguate a large knowledge base of semantic relations. The direction of most papers (especially in the recent OIE literature) seems rather the opposite, namely, to target Web-scale corpora.
In contrast, we manage to extract a large amount of high-quality information by combining an unsupervised OIE approach with definitional data.
A deeper linguistic analysis constitutes the focus of many OIE approaches. Syntactic dependencies are used to construct general relation patterns (Nakashole et al., 2012), or to improve the quality of surface pattern realizations. Phenomena like synonymy and polysemy have been addressed with kernel-based similarity measures and soft clustering techniques (Min et al., 2012), or by exploiting the semantic types of relation arguments (Nakashole et al., 2012; Moro and Navigli, 2012). An appropriate modeling of semantic types (e.g., selectional preferences) constitutes a line of research in itself, rooted in earlier works like (Resnik, 1996) and focused on either class-based (Clark and Weir, 2002) or similarity-based (Erk, 2007) approaches. However, these methods are used to model the semantics of verbs rather than arbitrary patterns. More recently, some strategies based on topic modeling have been proposed, either to infer latent relation semantic types from OIE relations (Ritter et al., 2010), or to directly learn an ontological structure from a starting set of relation instances (Movshovitz-Attias and Cohen, 2015). However, the knowledge generated is often hard to interpret and integrate with existing knowledge bases without human intervention (Ritter et al., 2010). In this respect, the semantic predicates proposed by Flati and Navigli (2013) seem more promising.
A novelty of our approach is that issues like polysemy and synonymy are explicitly addressed with a unified entity linking and disambiguation algorithm. By incorporating explicit semantic content in our relation patterns, not only do we make relations less ambiguous, but we also abstract away from specific lexicalizations of the content words and merge together many patterns conveying the same semantics. Rather than using plain dependencies, we also inject explicit semantic content into the dependency graph to generate a unified syntactic-semantic representation. Previous works used similar semantic graph representations to produce filtering rules for relation extraction, but they required a starting set of relation patterns and did not exploit syntactic information. A joint syntactic-semantic analysis of text was used in works such as (Lao et al., 2012), but these addressed a substantially different task (inference for knowledge base completion) and assumed a radically different setting, with a predefined starting set of semantic relations from a given knowledge base. As we adopt an OIE approach, we do not have such requirements and directly process the input text via parsing and disambiguation. This enables DEFIE to generate relations already integrated with resources like WordNet and Wikipedia, without additional alignment steps (Grycner and Weikum, 2014) or semantic type propagation. As shown in Section 6.3, explicit semantic content within relation patterns underpins a rich and high-quality relation taxonomy, whereas generalization in (Nakashole et al., 2012) is limited to support set inclusion and leads to sparser and less accurate results.

Conclusion and Future Work
We presented DEFIE, an approach to OIE that, thanks to a novel unified syntactic-semantic analysis of text, harvests instances of semantic relations from a corpus of textual definitions. DEFIE extracts knowledge on a large scale, reducing data sparsity and disambiguating both arguments and relation patterns at the same time. Unlike previous semantically-enhanced approaches, which mostly rely on the semantics of argument types, DEFIE is able to semantify relation phrases as well, by providing explicit links to the underlying knowledge base. We leveraged an input corpus of 4.3 million definitions and extracted over 20 million relation instances, with more than 250,000 distinct relations and almost 2.4 million concepts and entities involved. From these relations we automatically constructed a high-quality relation taxonomy by exploiting the explicit semantic content of the relation patterns. In the resulting knowledge base, concepts and entities are linked to existing resources, such as WordNet and Wikipedia, via the BabelNet semantic network. We evaluated DEFIE in terms of precision, coverage, novelty of information in comparison to existing resources, and quality of disambiguation, and we compared our relation taxonomy against state-of-the-art systems, obtaining highly competitive results.
A key feature of our approach is its deep syntactic-semantic analysis targeted to textual definitions. In contrast to our competitors, where syntactic constraints are necessary in order to keep precision high when dealing with noisy data, DEFIE shows comparable (or better) performance by exploiting a dense, noise-free definitional setting. DEFIE generates a large knowledge base, in line with collaboratively-built resources and state-of-the-art OIE systems, but uses a much smaller amount of input data: our corpus of definitions comprises less than 83 million tokens overall, while other OIE systems exploit massive corpora like Wikipedia (typically more than 1.5 billion tokens), ClueWeb (more than 33 billion tokens), or the Web itself. Furthermore, our semantic analysis based on Babelfy enables the discovery of semantic connections between both general concepts and named entities, with the potential to enrich existing structured and semi-structured resources, as we showed in a preliminary study on BabelNet (cf. Section 6.7).
As the next step, we plan to apply DEFIE to open text and integrate it with definition extraction and automatic gloss finding algorithms (Navigli and Velardi, 2010; Dalvi et al., 2015). Also, by further exploiting the underlying knowledge base, inference and learning techniques (Lao et al., 2012; Wang et al., 2015) can be applied to complement our model, generating new triples or correcting wrong ones. Finally, another future direction is to leverage the increasingly large variety of multilingual resources, like BabelNet, and move towards the modeling of language-independent relations.