Knowledge Completion for Generics using Guided Tensor Factorization

Given a knowledge base or KB containing (noisy) facts about common nouns or generics, such as “all trees produce oxygen” or “some animals live in forests”, we consider the problem of inferring additional such facts at a precision similar to that of the starting KB. Such KBs capture general knowledge about the world, and are crucial for various applications such as question answering. Different from commonly studied named entity KBs such as Freebase, generics KBs involve quantification, have more complex underlying regularities, tend to be more incomplete, and violate the commonly used locally closed world assumption (LCWA). We show that existing KB completion methods struggle with this new task, and present the first approach that is successful. Our results demonstrate that external information, such as relation schemas and entity taxonomies, if used appropriately, can be a surprisingly powerful tool in this setting. First, our simple yet effective knowledge guided tensor factorization approach achieves state-of-the-art results on two generics KBs (80% precise) for science, doubling their size at 74%–86% precision. Second, our novel taxonomy guided, submodular, active learning method for collecting annotations about rare entities (e.g., oriole, a bird) is 6x more effective at inferring further new facts about them than multiple active learning baselines.


Introduction
We consider the problem of completing a partial knowledge base (KB) containing facts about generics or common nouns, represented as a third-order tensor of (source, relation, target) triples, such as (butterfly, pollinate, flower). Since generics represent classes of similar individuals, the truth value $y_i$ of a generics triple $x_i = (s, r, t)$ depends on the quantification semantics one associates with $s$ and $t$. A natural setting, especially relevant for noisy facts derived automatically from text, is to associate $s$ with a categorical quantification from {all, some, none} and associate $t$ (implicitly) with some. For instance, "all butterflies pollinate (some) flower" and "some animals live in (some) forest". We treat the quantification of $s$ as the categorical label $y_i$ for the triple $x_i$. Given a KB of such triples, the task is to infer as many more triples as possible at a high (e.g., 80%) precision.
Tensor factorization and graph based methods have both been found to be very effective for expanding knowledge bases, but have focused on named entity KBs such as Freebase [2], involving relations with clear semantics such as bornIn and isACityIn, and disambiguated entities such as Barack Obama or Hawaii. It has been observed that there are many reliable Horn clauses that connect predicates in this setting. For instance, for any person x, city y, and country z, (x, bornIn, y) & (y, isACityIn, z) ⇒ (x, nationalityIs, z). With generics, however, clear patterns or reliable first-order logic rules are rare, in part because each generic represents a collection of entities that often have both similarities and differences w.r.t. various relations. For example, (x, liveIn, mountain) is true for many cats and caribou, but there is little tangible similarity between the two animals otherwise.
On the other hand, generics often come with additional rich background knowledge complementing the information present in the KB itself, such as a taxonomic hierarchy of entities (available from sources such as WordNet [15]) and the corresponding entity types and relation schema.
Our key insight is that, if used appropriately, taxonomic and schema information can make tensor factorization methods vastly more effective at deriving high precision facts about generics. Intuitively, many properties of generics tend to be shared by their siblings in the taxonomy (e.g., finch, oriole, and hummingbird), whereas siblings of named entities (e.g., various people) differ widely (e.g., in who they are married to, where they live, etc.). We propose three ways of using this information and empirically demonstrate the effectiveness of each for generics.
First, we observe that simply imposing schema consistency (Section 3.1) on derived facts can boost the performance of state-of-the-art methods such as Holographic Embeddings (HolE) [19] from nearly no new facts at 80% precision to over 10,000 new facts, starting with a generics KB of about the same size. Other low-dimensional embedding methods, such as TransE [3], RESCAL [17], and CNTF [20] (which uses schema information as well), also obtained no new facts at 80% precision. Graph-based completion methods did not scale to our densely connected tensors.¹ Second, one can further boost performance by transferring knowledge up and down the taxonomic hierarchy, using the quantification semantics of generics (Section 3.2). We show that simply expanding the starting tensor this way before applying tensor factorization results in a statistically significantly higher precision over new facts at the same yield.
Finally, we propose a novel limited-budget, taxonomy guided active learning method to address the challenge of significant incompleteness in generics KBs, quantifying uncertainty via siblings (Section 4). Reliable facts about generics are much harder to derive using state-of-the-art information extraction methods than facts about named entities. The large number of many-to-many relations makes it challenging to cover most of the true facts using only textual information from books, the Web, or other corpora. This, in turn, makes generics KBs vastly incomplete, with little or no information about certain entities such as caribou or oriole.
To predict facts about a previously unseen or rarely seen entity ẽ, we first identify and annotate a small number of active queries about ẽ and then feed these to the factorization framework to infer more facts about ẽ. Specifically, we define a correlation based measure of the uncertainty of each triple involving ẽ, based on how frequently the corresponding triple is true for ẽ's siblings in the taxonomic hierarchy (Section 4.1). We then propose a submodular objective function, and a corresponding greedy (1 − 1/e)-approximation, to search for a small subset of triples to annotate that optimally balances diversity with coverage (Section 4.2). We demonstrate that annotating this balanced subset enables tensor factorization to derive substantially more new and interesting facts compared to several active learning baselines.

Related Work
KB completion approaches fall into two main classes: graph-based methods and those employing low-dimensional embeddings via matrix or tensor factorization [8,18,21]. The former use graph traversal techniques to complete the KB [9,13]. While highly effective on named entity KBs, this class of solutions does not scale well to our setting (cf. Footnote 1). This may be due to the different connectivity characteristics of generics tensors compared to named entity ones such as FB15k [3].
Several embedding-based methods have been highly successful at KB completion.We compare against many of these, including variants of HolE, TransE, and RESCAL.
Recent work on incorporating entity types and relation schema in tensor factorization [11,12,25] has focused on factual databases with very different characteristics than generics tensors. Nimishakavi et al. [20] use entity type information as a matrix in the context of non-negative RESCAL for schema induction on medical research documents. As a byproduct, they complete missing entries in the tensor in a schema-compatible manner. We show that our proposal performs better on generics tensors than their method, CNTF. Chang et al. [4]'s TRESCAL system also incorporates types in RESCAL in a manner very similar to CNTF.
For schema-aware discriminative training of embeddings, Xie et al. [25] use a flexible ratio of negative samples drawn from both schema consistent and schema inconsistent triples. Their combined ideas, however, do not improve upon vanilla HolE (one of our baselines) on the standard FB15k [3] dataset. They also consider imposing hierarchical types for Freebase, since entities may have different meanings under different types, an issue that typically does not apply to generics KBs. Incorporating given first order logic rules (which are unavailable for generics) has been explored for the simpler case of matrix factorization [7,22].
Xie et al. [24] consider inferring facts about a new entity ẽ given a 'description' of that entity. They use Convolutional Neural Networks (CNNs) to encode the description, deriving an embedding for ẽ. Such a description in our context would correspond to knowing some factual triples about ẽ, which is a restricted version of our active learning setting.
Krishnamurthy and Singh [10] consider active learning for CP decomposition of tensors. They start with an empty tensor and look for the most informative slices and columns to fill completely, to achieve optimal sample complexity. The incoherence assumption on the column space that their framework builds upon, however, does not apply to generics KBs.

Tensors of Generics
We consider knowledge expressed in terms of (source, relation, target) triples, abbreviated as (s, r, t). Such a triple may refer to (subject, predicate, object) style facts commonly used in information extraction. Each source and target is an entity that is a generic noun, e.g., animals, habitats, or food items. Examples of relations include foundIn, eat, etc. With each generics triple (s, r, t), we associate a categorical truth value q ∈ {all, some, none}, defining the quantification semantics "q s r (some) t". For instance, "some animals live in (some) forest" and "all dogs eat (some) bone". Given a set K of such triples with annotated truth values, the task is to predict additional triples K′ that are also likely to be true.
In addition to a list of triples, we assume access to background information in the form of entity types and the corresponding relation schema, as well as a taxonomic hierarchy.² Let $E_T$ denote the set of possible entity types. For each relation $r$, the relation schema imposes a type constraint on the entities that may appear as its source or target. Specifically, the schema for $r$ is a collection $S_r \subseteq E_T \times E_T$ of permissible (source type, target type) pairs. For example, the relation foundIn may be associated with the schema $S_{foundIn}$ = {(animal, location), (insect, animal), (plant, habitat), . . .}. Similarly, the taxonomic hierarchy defines a partial order H over all entities that captures the "isa" relation, with direct links such as isa(dog, mammal) or isa(gerbil, rodent). We use this information to extract "siblings" of a given entity, i.e., entities that share a common parent (this may easily be generalized to any common ancestor).
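To make this background knowledge concrete, the following minimal Python sketch (an illustration, not the system's actual code; `schema`, `isa`, and both helper functions are hypothetical names) shows one way to represent the relation schema and to extract siblings from the taxonomy:

```python
# Illustrative sketch: relation schema as permissible (source type, target type)
# pairs, and sibling extraction from a taxonomy of isa(child, parent) links.

schema = {
    "foundIn": {("animal", "location"), ("insect", "animal"), ("plant", "habitat")},
}

isa = {"dog": "mammal", "gerbil": "rodent", "finch": "bird", "oriole": "bird"}

def siblings(entity):
    """Entities sharing a direct taxonomy parent with `entity`."""
    parent = isa.get(entity)
    return {e for e, p in isa.items() if p == parent and e != entity}

def schema_consistent(s_type, r, t_type):
    """True iff (s_type, t_type) is an allowed type pairing for relation r."""
    return (s_type, t_type) in schema.get(r, set())

print(siblings("oriole"))                                  # {'finch'}
print(schema_consistent("animal", "foundIn", "location"))  # True
```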

Guided Knowledge Completion
We begin with an overview of tensor factorization for KB completion for generics. Let (s, r, t) be a generics triple associated with a categorical quantification label q ∈ {all, some, none}; for example, ((cat, havePart, whiskers), all), ((cat, liveIn, homes), some), and ((cat, eat, bear), none). Predicting such labels is thus a multi-class classification problem. Given a set K of labeled triples, the goal of tensor factorization is to learn a low-dimensional embedding h for each entity and relation such that some function f of h best captures the given labels. Given a new triple, we can then use f and the learned h to predict the probability of each label for it. K often contains only "positive" triples, i.e., those with label all or some. A common step in discriminative training for h is thus negative sampling, i.e., generating additional triples that (are expected to) have label none.
Let $K = \{(x_i, y_i)\}_{i=1}^{n}$ be a set of triples $x_i = (s_i, r_i, t_i)$ and corresponding labels $y_i \in \{1, 2, 3\}$, equivalent to the categorical quantification labels $q_i \in$ {all, some, none}. We learn entity and relation embeddings $\Theta$ that minimize the multinomial logistic loss:

$$\min_{\Theta} \; \sum_{i=1}^{n} -\log \Pr(y_i \mid x_i; \Theta), \quad \Pr(y_i = c \mid x_i; \Theta) = \frac{\exp(f_c(x_i; \Theta))}{\sum_{c'=1}^{3} \exp(f_{c'}(x_i; \Theta))} \qquad (1)$$

where $h_r, h_s, h_t \in \mathbb{R}^d$ denote the learned embeddings (latent vectors) for $s, r, t$, respectively, $f_c(\cdot)$ is a model-specific score for class $c$ computed from these embeddings, and $\sigma(\cdot)$ is the sigmoid function $\sigma(z) = \frac{1}{1 + \exp(-z)}$. If the all categorical label for generics is unavailable, we can simplify the label space to {some, none}, modeled as $y_i \in \{\pm 1\}$, and reduce the model to binary classification:

$$\min_{\Theta} \; \sum_{i=1}^{n} -\log \sigma\big(y_i \, f(x_i; \Theta)\big) \qquad (2)$$

While all our proposed schemes are embedding oblivious, for concreteness, we describe and evaluate them for the recent Holographic Embeddings (HolE) model [19], which models the label probability as

$$\Pr(\phi_r(s, t) = 1) = \sigma\big(h_r^\top (h_s \star h_t)\big) \qquad (3)$$

where $\star$ denotes circular correlation:

$$[h_s \star h_t]_k = \sum_{i=0}^{d-1} h_s[i] \, h_t[(k + i) \bmod d] \qquad (4)$$

As can be deduced from eqns. (3)-(4), this model can capture asymmetry of relations. In addition, circular correlation can be computed using the fast Fourier transform (FFT), making HolE highly efficient.
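For concreteness, here is a short Python sketch of the HolE scoring function in eqns. (3)-(4), using the standard FFT identity for circular correlation; the embeddings below are random stand-ins rather than trained vectors:

```python
import numpy as np

def circular_correlation(a, b):
    # [a * b]_k = sum_i a_i * b_{(k+i) mod d}, computed in O(d log d) via FFT.
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real

def hole_probability(h_s, h_r, h_t):
    # Pr(phi_r(s, t) = 1) = sigmoid(h_r^T (h_s * h_t)), as in eqn. (3).
    eta = h_r @ circular_correlation(h_s, h_t)
    return 1.0 / (1.0 + np.exp(-eta))

d = 8
rng = np.random.default_rng(0)
h_s, h_r, h_t = rng.normal(size=(3, d))
# Circular correlation is not commutative, so swapping source and target
# generally changes the score: the model can capture asymmetric relations.
print(hole_probability(h_s, h_r, h_t), hole_probability(h_t, h_r, h_s))
```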

Incorporating Types, Relation Schema (ITRS)
As described earlier, the relation schema $S_r$ imposes a restriction on the sources and targets that may occur with a relation $r$. We can incorporate this knowledge both at training and at test time. Doing so at test time simply translates to relabeling schema inconsistent predicted triples as none. Incorporating this knowledge at training time can be done as a constraint on the random negative samples that the method generates to complement the given, typically positive, triples. In general, the ratio of random negative samples drawn from the entire tensor T to those drawn from the schema consistent portion T′ of T is a parameter that should be tuned so that the resulting negative samples mimic the true distribution of labels. It is worth noting that the locally closed world assumption (LCWA) plays an important role in determining this ratio. However, this idea has been used in the literature without considering the nature of the dataset, resulting in some seemingly contradictory empirical results on the optimal ratio [14,25].
As discussed later, we found that sampling from the entire tensor T (rather than only its schema consistent portion T′) worked best on our datasets.
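The sketch below illustrates this sampling scheme under stated assumptions: `entity_type` maps each entity to its type, `schema` is as in Section "Tensors of Generics", and the mixing parameter `schema_consistent_ratio` is the tunable ratio discussed above (0 recovers sampling from the entire tensor T); all names are illustrative:

```python
import random

def negative_samples(known_triples, entities, entity_type, schema,
                     n, schema_consistent_ratio=0.5):
    """Perturb the source of known triples to create presumed-negative ones."""
    known = set(known_triples)
    negatives = []
    while len(negatives) < n:  # assumes enough candidate perturbations exist
        s, r, t = random.choice(known_triples)
        if random.random() < schema_consistent_ratio:
            # Replacement source drawn only from schema-consistent entities.
            pool = [e for e in entities
                    if any(entity_type[e] == src for src, _ in schema[r])]
        else:
            pool = entities  # replacement drawn from the entire tensor T
        s_new = random.choice(pool)
        if s_new != s and (s_new, r, t) not in known:
            negatives.append((s_new, r, t))
    return negatives
```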

Incorporating Entity Taxonomy (IET)
It is challenging to come up with complex Horn or first order logic rules for generics, as each entity represents a class of individuals that may not all behave identically. However, we can derive simple yet highly effective rules based on categorical quantification labels, leveraging the fact that entities come from different levels of a taxonomic hierarchy. Let $p$ be the parent entity of the entity set $\{c_i\}$.
Then we have:

$$((p, r_j, t_j), \text{all}) \;\Rightarrow\; \forall i \; ((c_i, r_j, t_j), \text{all})$$
$$\forall i \; ((c_i, r_j, t_j), \text{all}) \;\Rightarrow\; ((p, r_j, t_j), \text{all})$$
$$\exists i \; ((c_i, r_j, t_j), \text{all}) \;\Rightarrow\; ((p, r_j, t_j), \text{some})$$
$$\exists i \; ((c_i, r_j, t_j), \text{some}) \;\Rightarrow\; ((p, r_j, t_j), \text{some})$$

We apply these rules to address the sparsity of generics tensors, making tensor factorization more robust. Specifically, given initial triples K, we use the applicable rules to derive additional triples K′, perform tensor factorization on K ∪ K′, and then revisit the triples in K′ using their predicted label probabilities. Note that this approach makes us robust to taxonomic errors: instead of assuming each triple in K′ is true, we use it only as a prior and let tensor factorization determine the final prediction based on the global patterns it finds.
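A minimal sketch of this expansion step over one parent-child level of the taxonomy follows; the data structures (`facts` as a triple-to-label map, `children` as a parent-to-children map) are assumptions for illustration:

```python
def expand_with_taxonomy(facts, children):
    """Derive extra labeled triples K' from K via the quantification rules."""
    derived = {}
    for parent, kids in children.items():
        # Rule 1: (p, r, t) labeled `all` transfers to every child c_i.
        for (s, r, t), q in facts.items():
            if s == parent and q == "all":
                for c in kids:
                    if (c, r, t) not in facts:
                        derived.setdefault((c, r, t), "all")
        # Rules 2-4: aggregate the children's labels up to the parent.
        for r, t in {(r, t) for (s, r, t) in facts if s in kids}:
            if (parent, r, t) in facts:
                continue
            labels = [facts.get((c, r, t)) for c in kids]
            if all(q == "all" for q in labels):
                derived.setdefault((parent, r, t), "all")
            elif any(q in ("all", "some") for q in labels):
                derived.setdefault((parent, r, t), "some")
    return derived

facts = {("mammal", "have", "hair"): "all",
         ("dog", "eat", "bone"): "all",
         ("cat", "eat", "bone"): "some"}
children = {"mammal": ["dog", "cat"]}
print(expand_with_taxonomy(facts, children))
# {('dog', 'have', 'hair'): 'all', ('cat', 'have', 'hair'): 'all',
#  ('mammal', 'eat', 'bone'): 'some'}
```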

Active Learning for New or Rare Entities
To address the incomplete nature of generics KBs, we consider rare entities, for which we have very few facts, and new entities, which are present in the taxonomy but have no facts in the KB. The goal is to use tensor factorization to generate high quality facts about such entities. For instance, consider the task of inferring facts about oriole, where all we know is that it is a bird.
We assume a restricted budget on the number of facts we can query (for human annotation) about oriole, using which we would like to predict many more high-quality facts about it. Given a fixed query budget B, what is the optimal set of queries to generate for human annotation about a new or rare entity ẽ? We view this as an active learning problem and propose a two-step algorithm. First, we use taxonomy guided uncertainty sampling to propose a list L of triples to potentially query. Next, we describe a submodular objective function and a corresponding linear time algorithm to choose an optimal subset L̃ ⊆ L satisfying |L̃| = B. We then use L̃ for human annotation, append the result to the original KB, and perform tensor factorization to predict additional new facts about ẽ. For notational simplicity and without loss of generality, throughout this section we consider the case where ẽ appears as the source entity in the triple; the ideas apply equally when ẽ appears as the target.

Knowledge Guided Uncertainty Quantification
We now discuss the active learning, and specifically uncertainty sampling, method we use to propose a list of triples to query. Uncertainty sampling considers the uncertainty of each possible triple (ẽ, r_i, e_i), defined as one minus the conditional probability of this fact given the facts we already know (i.e., the KB) [23]. The question is how to model this conditional probability. A simple baseline is to consider Random queries, i.e., r and e are selected randomly from the lists of relations and entities in the tensor, respectively.
To infer information about ẽ, we propose the following approximation for the conditional probability of a new fact about ẽ given the KB. Let $\tilde E_{\tilde e} = \{e \mid corr(\tilde e, e) > 0\}$ be the set of entities that are correlated with ẽ, and let $\Omega = \{((e_i, r_i, e'_i), y_i) \mid e_i \in \tilde E_{\tilde e}\}$ be the set of known facts about such entities, where $y_i$ is the label for the triple $(e_i, r_i, e'_i)$. We estimate the probability of a candidate fact as the correlation-weighted fraction of correlated entities for which the corresponding fact holds:

$$\Pr\big(\phi_r(\tilde e, e') = 1 \mid \Omega\big) \;\approx\; \frac{\sum_{e_i \in \tilde E_{\tilde e}} corr(\tilde e, e_i) \cdot \mathbb{1}\big[(e_i, r, e') \text{ is true in the KB}\big]}{\sum_{e_i \in \tilde E_{\tilde e}} corr(\tilde e, e_i)} \qquad (5)$$

However, in practice, we cannot measure $corr(\tilde e, e_i)$ for every entry in the KB, as we do not have complete information about ẽ. One simple idea is to consider every entity to be correlated with ẽ: $corr(\tilde e, e_i) = 1 \; \forall e_i \in E$. We refer to this as Schema Consistent query proposal, as it amounts to summing over all possible (hence schema consistent) facts. Since we have access to taxonomy information, we can make a more precise, Sibling Guided, approximation:

$$corr(\tilde e, e_i) = \begin{cases} 1 & \text{if } e_i \in sibling(\tilde e) \\ 0 & \text{otherwise} \end{cases} \qquad (6)$$
Eqns. (5) and (6) can be used to infer uncertain triples: if every sibling of ẽ has relationship r with an entity e′, we can infer that this holds for ẽ as well. On the other hand, when siblings disagree in this respect, there is more uncertainty about (ẽ, r, e′) (according to (5) and (6)), making this triple a good candidate to query. Setting upper and lower bounds on the uncertainty (one minus the conditional probability in (5)), we obtain a set L = {(ẽ, r_i, e_i), i ∈ I} of triples to query. We also infer the set M = {(ẽ, r_j, e_j), j ∈ J} of triples that a large majority of siblings agree upon, and with which ẽ is hence expected to agree as well. See Algorithm 1. In our example of oriole, the siblings are the birds that exist in the tensor, e.g., hummingbird, finch, woodpecker, etc. All of them (eat, insect), and hence we infer this fact for oriole. But there is no agreement on (appearIn, farm), and hence this triple is added to the query list.
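The following compact Python sketch shows the sibling-guided estimate of eqns. (5)-(6) and the resulting query/majority split of Algorithm 1; the threshold values and all function names are illustrative assumptions:

```python
def sibling_agreement(e_new, r, t, true_facts, siblings):
    """Fraction of e_new's siblings sib for which (sib, r, t) is a true fact."""
    sibs = siblings(e_new)
    votes = [(sib, r, t) in true_facts for sib in sibs]
    return sum(votes) / len(votes) if votes else 0.0

def propose_queries(e_new, candidates, true_facts, siblings,
                    kappa_M=0.9, tau_L=0.3, tau_U=0.7):
    queries, majority = [], []
    for (r, t) in candidates:
        p = sibling_agreement(e_new, r, t, true_facts, siblings)
        if p >= kappa_M:
            majority.append((e_new, r, t))  # siblings agree: infer directly
        elif tau_L <= 1.0 - p <= tau_U:
            queries.append((e_new, r, t))   # uncertain: worth annotating
    return queries, majority
```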

Efficient Subset Selection
Given the list L as above (Algorithm 1), which we write in short as L = {(r_i, e_i), i ∈ I}, the problem is to find the "best" subset L̃. A baseline for such a selection is to choose the top k queries; we refer to this as TK subset selection.
Viewing subset selection as a combinatorial problem, we devise an objective F that models several natural properties of this subset. We then prove that F is submodular, implying that a simple greedy algorithm can efficiently compute a (1 − 1/e)-approximation of the global optimum [16]. We refer to this as the SM subset selection method.

Algorithm 1: Active Learning for Query Proposal
input: new entity ẽ, KB, taxonomy, lower bound κ_M on agreement, lower bound τ_L and upper bound τ_U on uncertainty
1: Extract the list S_ẽ of sibling(ẽ) using the taxonomy
2: For each e_i ∈ S_ẽ, add all facts about e_i to Ω
3: for each candidate triple (ẽ, r_i, e_i) do
4: Use (5) and (6) to estimate p = Pr(φ_{r_i}(ẽ, e_i) = 1)
5: if p ≥ κ_M, add (ẽ, r_i, e_i) to the majority set M
6: if τ_L ≤ 1 − p ≤ τ_U, add (ẽ, r_i, e_i) to the query list L
output: L, M
Since queried samples will eventually be fed into tensor factorization, we would like L̃ to cover entities (for the other argument of the triple) and relations as much as possible. In addition, we would like L̃ to be diverse, i.e., to prioritize relations and entities that are more varied.³ At the same time, we want to minimize redundancy, i.e., avoid choosing relations (entities) that are too similar. Let $F(\tilde L, R_{\tilde L}, E_{\tilde L})$ denote our objective, where $R_{\tilde L}$ and $E_{\tilde L}$ are the sets of relations and entities in L̃, respectively. We decompose it as

$$F(\tilde L, R_{\tilde L}, E_{\tilde L}) = w_C \, C(\tilde L) + w_D \, D(\tilde L) - w_R \, R(\tilde L) \qquad (7)$$

where the terms on the RHS correspond to coverage, diversity, and redundancy, resp., and $w_C, w_D, w_R$ are the corresponding non-negative weights. Next, we propose functional forms for these terms; note that any function capturing the described properties can be used instead, as long as the objective remains submodular. Let $R$ and $E$ denote the sets of relations and entities in the KB, resp. The coverage simply captures the fraction of relations and entities included in L̃:

$$C(\tilde L) = \frac{|R_{\tilde L}|}{|R|} + \frac{|E_{\tilde L}|}{|E|}$$

The diversity of L̃ is the sum of the diversity measures of the relations and entities included in the set:

$$D(\tilde L) = \sum_{r \in R_{\tilde L}} V_r + \sum_{e \in E_{\tilde L}} V_e$$

We define the diversity measure $V_r$ of each relation $r$ as the ratio of the number of entities that appear in the KB as its source or target to the number of all entities, and the diversity measure $V_e$ of an entity $e$ in the same manner, as shown below. Note that the diversity measure is an intrinsic characteristic of each entity and relation dictated by the KB, and is independent of the set L̃. We use $E_{S_r}, E_{T_r}$ to denote the sets of sources and targets that appear for relation $r$ in the KB, $R_e$ the set of relations in the KB that have $e$ as their target, and $E_{S_e}$ the set of entities that appear as the source when $e$ is the target of a triple in the KB:

$$V_r = \frac{|E_{S_r} \cup E_{T_r}|}{|E|}, \qquad V_e = \frac{|R_e| + |E_{S_e}|}{|R| + |E|}$$

As described above, redundancy measures the similarity between the relations (entities) in L̃. Tensor factorization yields an embedding for each relation (entity) based on the facts it participates in, so the learned embeddings are a natural choice for capturing similarity. Let $h_e$ ($h_r$) denote the learned embedding for entity $e$ (relation $r$), and let $sim(\cdot, \cdot)$ be a non-negative similarity measure between embeddings. We define

$$R(\tilde L) = \sum_{r, r' \in R_{\tilde L}} sim(h_r, h_{r'}) + \sum_{e, e' \in E_{\tilde L}} sim(h_e, h_{e'})$$

Theorem 1. Algorithm 2 gives a (1 − 1/e)-approximation of the global optimum of F.
Theorem 1 follows by proving that the objective F in eqn. (7) is submodular. Since addition preserves submodularity and the weights $w_C, w_D, w_R$ are non-negative, it suffices to show that each term in F is submodular. For the proof, see Appendix A. We compare our query proposal and subset selection methods with the corresponding baselines in Table 2.
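A minimal sketch of the greedy loop behind Algorithm 2, assuming an evaluator for the set function F is available; a practical implementation would exploit lazy evaluation of marginal gains, but even this plain loop attains the (1 − 1/e) guarantee of Theorem 1:

```python
def greedy_select(candidates, F, budget):
    """Greedily pick the element with the largest marginal gain G(l)."""
    selected = []
    for _ in range(budget):
        remaining = [l for l in candidates if l not in selected]
        if not remaining:
            break
        best = max(remaining, key=lambda l: F(selected + [l]) - F(selected))
        selected.append(best)
    return selected
```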

Experiments
Dataset and Setup: To assess the quality of our guided KB completion method, we consider two knowledge bases about generics: (1) a Science tensor containing facts about animals, occupations, activities, locations, etc. [6]; and (2) its Animals sub-tensor, which focuses on facts about animals. This data does not include ((s, r, t), all) style triples, so we use the objective function in Eqn. (2) rather than the multi-class one in Eqn. (1). Dalvi et al. [6] use a pipeline consisting of Open IE [1] extractions, aggregation, and clean-up via crowd-sourcing to generate domain-targeted facts about elementary level science. These facts come with a relevant WordNet [15] based taxonomy, entity types, and relation schema. Table 1 summarizes statistics of the datasets we use for experimentation,⁴ where we ensure every entity is mentioned at least 20 times in the corresponding tensor.⁵ In order to limit error propagation, we use a two-level abstraction of the taxonomy.

Guided KB Completion: We first compare our method (Section 3) with existing KB completion techniques on the Animals tensor, and then demonstrate that its effectiveness carries over scalably to the larger Science tensor as well.
We examine two alternatives for generating negative samples: given a triple (s, r, t) ∈ T, replace s with (1) any entity s′, or (2) an entity s′ of the same type as s. The resulting perturbed triple (s′, r, t) is then treated as a negative sample if it is not present in T. We also considered a weighted combination of (1) and (2), and found purely random sampling to be the most reliable on our datasets. This is consistent with the lack of LCWA in generics KBs.
We consider extensions of three state-of-the-art embedding-based KB completion methods: HolE, TransE, and RESCAL. As mentioned earlier, two leading graph-based methods, SFE and PRA, did not scale well. Both vanilla TransE and RESCAL resulted in poor performance; we report numbers only for their extensions. Specifically, we consider 3 baselines: (1) HolE, (2) TransE+Schema, and (3) CNTF, which extends RESCAL and incorporates schema. Figure 1 shows the resulting precision-yield curves for the predictions made by each method on the Animals dataset containing 10.6K facts. TransE+ITRS gave a precision of only around 10% and is omitted from the plot. We make two observations. First, deriving new facts for these generics tensors at a high precision is challenging! Specifically, none of the baseline methods achieves a yield of more than 10% of T with precision at least 60%. Second, external information, if used appropriately, can be surprisingly powerful in this setting. Specifically, simply incorporating relation schema (ITRS) allows HolE-based completion to double the size of the starting tensor T by producing over 10K new triples at a precision of 82%. Further, incorporating entity taxonomy (IET) to address tensor sparsity results in the same yield at a statistically significantly higher precision of 86.4%.
Figure 2 lists the top 20 predictions by various approaches, illustrating that our proposed method infers many interesting and useful generic facts about the world. Next, we evaluate our proposal on the entire Science dataset with 66.6K facts. Since graph-based methods did not scale well even to the much smaller Animals dataset and other methods performed substantially worse there, we focus here on the scalability and prediction quality of our method. We found that HolE+ITRS+IET scales well to this much larger tensor, doubling the number of facts by adding 66K new facts at 74% precision. Although the Science tensor is 1,000 times larger than the Animals tensor, the method's runtime grew far more slowly (3 minutes on the Animals tensor vs. 56 minutes on the Science tensor, using a 2.8GHz, 16GB MacBook Pro).
Active Learning for New Entities: To assess the quality of our active learning mechanism (Section 4), we consider predicting facts about a new entity ẽ that is not in the Animals tensor. For illustration, we choose ẽ from the Science vocabulary while ensuring that it is present in the WordNet taxonomy. The setup is as follows. We first use a query generation mechanism (Random, Schema Consistent, or Sibling Guided; cf. Section 4.1) to propose an ordered list L of facts about ẽ to annotate. Next, we perform subset selection (Top K or TK, Submodular or SM; cf. Section 4.2) on L to identify a subset L̃ of up to 100 most promising queries. These are then annotated, and the true ones are fed into tensor factorization as additional input to infer further new facts about ẽ.
In Table 2, we assess the quality of L̃ in two ways when |L̃| = 100: how many true facts L̃ contains, and how many new facts overall this annotation produces about ẽ. Figure 3 provides a complementary view, focusing on the overall number of new facts inferred as |L̃| increases. While these illustrative numbers are for a representative new entity, reindeer, the overall trend and relative ordering remain the same across entities.
We highlight a few observations. First, not surprisingly, randomly choosing triples about ẽ to annotate is ineffective. Second, choosing schema consistent triples results in 73 true triples (out of 100), but these facts help tensor factorization very little, resulting in only 10 additional new triples about ẽ. Our proposed sibling guided querying mechanism not only results in nearly all 100 facts being true, along with 17 true facts inferred from sibling agreement (set M in Algorithm 1), but also enables tensor factorization to infer many more new facts about ẽ, making it roughly 6x more effective than the baselines overall (Table 2).

Conclusion
This work explores KB completion for a new class of problems, namely completing generics KBs, which is an essential step toward including general world knowledge in intelligent machines. The differences between generics and much-studied named entity KBs make existing techniques either not scale well or produce facts at an undesirably low precision out of the box. We demonstrate that incorporating entity taxonomy and relation schema appropriately can be highly effective for generics KBs. Further, to address the scarcity of facts about certain entities in such KBs, we devise a novel active learning approach using sibling guided uncertainty estimation along with submodular subset selection. The proposed techniques substantially outperform various baselines, setting a new state of the art for this challenging class of completion problems.

A Proofs
Proof of Theorem 1. To prove the result, it suffices to show that $F(\tilde L, R_{\tilde L}, E_{\tilde L})$ in Equation (7) is submodular; Algorithm 2 can then find a (1 − 1/e)-approximation of the global optimum of F [16].
To demonstrate that the objective F is submodular, we need to show that for all $L' \subseteq L'' \subseteq L$ and every $l = (r_l, e_l) \in L \setminus L''$,

$$F(L' \cup \{l\}) - F(L') \;\geq\; F(L'' \cup \{l\}) - F(L'')$$

Since addition preserves submodularity and the weights $w_C, w_D, w_R$ are non-negative, it suffices to show that each term in F is submodular.

1. First consider the coverage term. Adding $l$ increases $|R_{\cdot}|$ by one iff $r_l$ is not already among the selected relations, and likewise increases $|E_{\cdot}|$ by one iff $e_l$ is not already among the selected entities. Thus each difference is +1 or 0 (up to the normalizing constants), and since $L' \subseteq L''$ implies $R_{L'} \subseteq R_{L''}$ and $E_{L'} \subseteq E_{L''}$, whenever the marginal increase is positive for $L''$ it is also positive for $L'$. Hence the LHS is always greater than or equal to the RHS, and the inequality holds.
2. Next, consider the diversity term. The above argument applies directly, and hence this term is submodular as well.
3. To show that $-R(\tilde L, R_{\tilde L}, E_{\tilde L})$ is submodular, note that when taking the difference between $R(L' \cup \{l\})$ and $R(L')$, the terms corresponding to pairs that are both in $L'$ cancel out; the same holds for $R(L'' \cup \{l\}) - R(L'')$. Therefore, we have

$$R(L' \cup \{l\}) - R(L') = \sum_{r \in R_{L'}} sim(h_{r_l}, h_r) + \sum_{e \in E_{L'}} sim(h_{e_l}, h_e) \;\leq\; R(L'' \cup \{l\}) - R(L'')$$

since $R_{L'} \subseteq R_{L''}$, $E_{L'} \subseteq E_{L''}$, and $sim(\cdot, \cdot)$ is non-negative. Negating both sides reverses the inequality, establishing submodularity of $-R$. This concludes the proof.

Algorithm 2: Subset Selection for Query Selection
input: KB, budget size B, list L from Algorithm 1
1: For every (r, e) ∈ L, compute the diversity measures V_r, V_e
2: L̃ ← ∅
3: for j = 1 to B do
4: Compute the marginal gain G(l) = F(L̃ ∪ {l}) − F(L̃) for each l ∈ L \ L̃
5: Select l* = arg max_{l ∈ L \ L̃} G(l)
6: Add l* to L̃
output: L̃

Figure 1: Precision-yield curves for various embedding-based methods on the Animals tensor.

Figure 2: Top 20 predictions by various methods. True triples are shown in black and false triples in red.

Figure 3: Active learning for new entities: total number of new inferred facts for various human annotation query sizes.

Table 2: Active learning for new entities: number of new facts inferred (from annotation, sibling agreement, tensor factorization, and in total) for a representative new entity ẽ, when querying 100 facts about ẽ for human annotation.