Utilizing Temporal Information for Taxonomy Construction

Taxonomies play an important role in many applications by organizing domain knowledge into a hierarchy of 'is-a' relations between terms. Previous work on automatic construction of taxonomies from text documents either ignored temporal information or used fixed time periods to discretize the time series of documents. In this paper, we propose a time-aware method to automatically construct and effectively maintain a taxonomy from a given series of documents preclustered for a domain of interest. The method extracts temporal information from the documents and uses a timestamp contribution function to score the temporal relevance of the evidence from source texts when identifying the taxonomic relations for constructing the taxonomy. Experimental results show that our proposed method outperforms the state-of-the-art methods, increasing the F-measure by 7% to 20% on average. Furthermore, the proposed method can incrementally update the taxonomy by adding fresh relations from new data and removing outdated relations using an information decay function. It thus avoids rebuilding the whole taxonomy from scratch for every update and keeps the taxonomy up-to-date in order to track the latest information trends in a rapidly evolving domain.


Introduction
The explosion in the amount of unstructured text data gives us the opportunity to explore knowledge in depth, but it also makes it challenging to recognize the information that is useful for our interests. To provide access to information effectively, it is important to organize unstructured data in a structured and meaningful manner. Taxonomies, which serve as backbones for structured knowledge, are useful for many NLP applications such as question answering (Harabagiu et al., 2003) and document clustering (Fodeh et al., 2011). However, handcrafted, well-structured taxonomies such as WordNet (Miller, 1995), OpenCyc (Matuszek et al., 2006) and Freebase (Bollacker et al., 2008), which are publicly available, can be incomplete for new or specialized domains. As it is time-consuming and cumbersome to create a new taxonomy manually, methods for automatic domain-specific taxonomy construction from text corpora are highly desirable.
Previous work on automatic construction of domain-specific taxonomies from text documents assumed that the data sets (that is, the document sets) and the underlying taxonomic relations are static. However, the data sets for certain domains may evolve over time, as new documents are added while older documents are deleted or modified. As such, the taxonomic relations for these potentially fast-changing domains may not remain static but become dynamic over time as new domain terms emerge while some older ones disappear. For example, in World Health Organization reports about disease outbreak, the term 'smallpox' used to be a hyponym of 'dangerous disease', but it has fallen off since 1980. On the other hand, since 2014 the term 'Ebola' has become an emerging hyponym of 'dangerous disease'. As another example, up until 1992, in a collection of US yearly reports of terrorism, the term 'Palestine Liberation Organization' used to be a hyponym of 'terrorist group', but it is no longer true nowadays. 'Palestine Liberation Organization' should now be classified as a 'national organization' of Palestine.
When temporal information in data sets is not captured, the resultant taxonomy may be incomplete or outdated and misleading. This could be caused by the overwhelming evidence of older patterns/contexts compared to emerging, but relatively small amount of, evidence of newer relations. For example, in the taxonomy of US yearly reports on terrorism, many previous methods could fail to recognize the taxonomic relation between the two terms 'ISIS' and 'terrorist group' due to relatively infrequent mentions of 'ISIS' (which only appears in reports from 2014). Meanwhile, 'Palestine Liberation Organization' could still be classified as a hyponym of 'terrorist group' because of the relatively larger number of mentions in the documents from the earlier years.
In this paper, we propose a time-aware method for domain-specific taxonomy construction from a time series of text documents for a particular domain. We incorporate temporal information into the process of identifying taxonomic relations by computing evidence scores for the data sources weighted by a timestamp contribution function (Efron and Golovchinsky, 2011;Li and Croft, 2003) to capture the temporally-varying contributions of evidence from various documents at a particular point in time. We assume that newer evidence is more important than older evidence. For example, the evidence that 'Palestine Liberation Organization' was a hyponym of 'terrorist group' in 1990 is less important now than the evidence that 'ISIS' is a hyponym of 'terrorist group' in 2014. In the proposed method, we incorporate the timestamp contribution function into the method of Tuan et al. (2014) to measure the weights of the evidence for both statistical and linguistic methods. With such built-in time-awareness for taxonomy construction, we ensure that the constructed taxonomies are up-to-date for the fast-changing domains found in newswire and social media, where users constantly search for updated relations and track information trends.
Most previous work requires re-running the taxonomy construction process whenever there are new incoming data. Our proposed method enables incremental update of the constructed taxonomies to avoid costly reconstructions. We incorporate an information decay function (Smucker and Clarke, 2012) to manage outdated relations in the constructed taxonomy. The decay function measures the extent that the relation is out-of-date over time, and we incorporate it into a time-aware graph-based algorithm for taxonomy update.
The contributions of our research are summarized as follows:
• We propose a time-aware method for taxonomy construction that extracts and utilizes temporal information to measure evidence weights of taxonomic relationships. Our method constructs an up-to-date taxonomy by adding new emerging relations and discarding obsolete and incorrect ones.
• We propose an incremental time-aware graph-based algorithm to update an existing taxonomy, instead of rebuilding a new taxonomy from scratch.

Related Work
Previous work on taxonomy construction can be roughly divided into two main approaches: statistical learning and linguistic pattern matching. Statistical learning methods for taxonomy construction include co-occurrence analysis (Lawrie and Croft, 2003), hierarchical Latent Dirichlet Allocation (LDA) (Blei et al., 2004), clustering (Li et al., 2013), term embedding (Tuan et al., 2016), linguistic feature-based semantic distance learning (Yu et al., 2011), and co-occurrence subnetwork mining (Wang et al., 2013). Supervised statistical methods (Petinot et al., 2011) rely on hierarchical labels to learn the corresponding terms for each label. The labeled training data, however, are costly to obtain and may not always be available in practice. Unsupervised statistical methods (Pons-Porrata et al., 2007; Li et al., 2013; Wang et al., 2013) are based on the idea that terms that frequently co-occur may have taxonomic relationships. However, these methods generally achieve low accuracy.
Linguistic approaches for taxonomy construction rely on lexical-syntactic patterns (Hearst, 1992) (e.g., 'A such as B') to capture textual expressions of taxonomic relations, matching them with given documents or Web information to identify the relations between a term and its hypernyms (Kozareva and Hovy, 2010; Navigli et al., 2011; Wu et al., 2012). These patterns can be manually created (Wu et al., 2012) or automatically identified (Navigli et al., 2011). Such linguistic pattern matching methods can generally achieve higher precision than the statistical methods, but they suffer from lower coverage. To balance precision and recall, Zhu et al. (2013) and Tuan et al. (2014) combine statistical and linguistic approaches to find taxonomic relations.
The approach that is closest to our work is the one proposed by Zhu et al. (2013), which performs dynamic taxonomy update. To keep up with ever changing social media data, new terms are mined from incoming data and added to the existing taxonomy. The data are divided into separate clusters using pre-defined time periods based on their document timestamps. The newly found taxonomic relations in each period are then added to the existing taxonomy. The use of a pre-defined time period to discretize the time series of the documents for taxonomy update can be problematic. If the chosen time period is too long, rapid changes of domain terms and their taxonomic relations that have occurred within the time period may not be reported. If the time period is too short, it may fail to identify valid taxonomic relations that needed a longer time period to establish. The method also does not remove those relations in the existing taxonomy that may have become obsolete over time.

Problem Specification
We define the root term of a domain-specific taxonomy as a word or phrase that represents the domain of interest. It can be any informative concept such as an entity ('animal') or event ('Ebola outbreak'). Given a root term R, we define a corpus C as a preclustered set of a time series of text documents according to R.
Given two terms $t_1$ and $t_2$, we denote by $t_1 \to t_2$ a taxonomic relation in which $t_1$ is a hypernym of $t_2$. In this work, we define a taxonomy as a triple $H = (V, E, s)$, where:
• $V$ is the set of the taxonomy's vertices, i.e., the set of terms, including the root term.
• $E$ is the set of the taxonomy's edges, i.e., the set of taxonomic relations.
• $s$ is the creation time of the taxonomy. It can be the current date or any specified time.
Our task is formally defined as follows: Given a root term $R$, a corpus $C$ and an optional existing taxonomy $H_1 = (V_1, E_1, s_1)$ constructed at time $s_1$, we aim to build a new taxonomy $H_2 = (V_2, E_2, s_2)$ at time $s_2$, where $s_2 > s_1$, such that $H_2$ organizes the relevant terms from the documents in $C$ up to time $s_2$.
If $H_1$ does not exist, the problem becomes creating a taxonomy $H_2$ for corpus $C$. Otherwise, the problem is to update the existing taxonomy with the newly obtained data for corpus $C$. Note that the taxonomy construction method is not totally unsupervised, as it requires as input a corpus ($C$) pre-clustered by a domain of interest (i.e., $R$); the subsequent steps for constructing the taxonomy given the text corpus, however, are unsupervised.
Figure 1 shows the workflow of the proposed time-aware taxonomy construction method. There are two key processes in the proposed method: temporal information processing and taxonomy construction.

Temporal information processing
The aim of temporal information processing is to generate temporal information (a timestamp, for short) for each sentence in the input documents or Web data.
Previous taxonomy construction methods (Zhu et al., 2013) only extract temporal information at the document level, i.e., all information in the document has the same timestamp as the document creation date. This assumption, however, is not always correct. Figure 2 shows a sample report about the flight MH370 created on 30 July 2015. In this report, the timestamps of the individual sentences are very different from the document creation date. If we were to simply use the temporal information at the document level, the timestamps for the search areas of MH370 at different periods would be incorrect.
Figure 2: A sample report about the flight MH370 created on 30 July 2015.
Thus, we propose a method to extract timestamps (i.e., temporal expressions) at the sentence level. The method comprises the following three steps:
Document creation date extraction: First, we extract the timestamp at the document level. The text corpus that we are using for this study consists of a collection of reports, scientific publications and Web search results. For the first two types of documents, the timestamp is the document creation date that can be extracted directly from the data source, i.e., the date of the report, or the date of the publication. For Web search results, we use Google advanced search with a customized time range, which returns the search results together with their creation dates at the beginning of the search snippets.
Temporal expression extraction: Next, we extract all temporal expressions (e.g., "2015 December 31") in the document. Here, we use SUTime (Chang and Manning, 2012), a library for recognizing and normalizing time expressions using a deterministic rule-based method. The output of this step is a list of time expressions, together with their positions in the document.
Sentence timestamp extraction and normalization: Finally, in the third step, we assign each sentence in the document a timestamp as follows:
• First, we initialize a temporal value $\tau$ to the document creation date.
• For each sentence $s$ in the document:
  - If $s$ contains a temporal expression $\tau_1$, assign $\tau_1$ as the timestamp of $s$ and update $\tau = \tau_1$.
  - Otherwise, assign $\tau$ as the timestamp of $s$.
Note that we use the format 'YYYY-MM-DD' for the temporal expression. If the information of DD or MM is missing, it is replaced with the first day or first month respectively. For example, 'December 2015' will be normalized as '2015-12-01'. Using the proposed method, sentence (1) and sentence (2) in the example of Figure 2 will have the same timestamp of '2014-03-08', while sentence (3) will have the timestamp of '2014-03-17'.
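The propagation rule and the 'YYYY-MM-DD' normalization can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the temporal expressions have already been extracted (e.g., by SUTime) and are passed in per sentence, and the names `normalize` and `assign_timestamps` as well as the placeholder sentence texts are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple


def normalize(year: int, month: Optional[int] = None, day: Optional[int] = None) -> str:
    """Format a date as 'YYYY-MM-DD'; a missing month or day defaults to 1."""
    return date(year, month or 1, day or 1).isoformat()


def assign_timestamps(creation_date: str,
                      sentences: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Give every sentence a timestamp.

    `sentences` holds (text, expression) pairs, where `expression` is the
    already-normalized temporal expression found in the sentence, or None.
    """
    tau = creation_date              # start from the document creation date
    stamped = []
    for text, expression in sentences:
        if expression is not None:   # the sentence carries its own date: use it and remember it
            tau = expression
        stamped.append((text, tau))  # otherwise the most recent date is inherited
    return stamped


# Hypothetical stand-in for the three sentences of the Figure 2 example.
report = [
    ("sentence (1)", normalize(2014, 3, 8)),
    ("sentence (2)", None),
    ("sentence (3)", normalize(2014, 3, 17)),
]
stamped = assign_timestamps(normalize(2015, 7, 30), report)
```

With this input, the first two sentences share one timestamp and the third gets its own, rather than all three inheriting the document creation date.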
In Section 5.2, we will show that the extraction of timestamps at the sentence level will improve the performance of the proposed taxonomy construction method as compared to the extraction of timestamps at the document level.

Taxonomy construction
There are three general steps to constructing a taxonomy: domain term extraction, taxonomic relation identification, and taxonomy induction. We make use of the taxonomy construction method of Tuan et al. (2014) for the first step, incorporate timestamps into the second step of identifying taxonomic relations (Section 4.2.2), and propose an incremental taxonomy induction algorithm for the third step (Section 4.2.4). As domain term extraction does not affect the temporal aspects of taxonomy construction, this first step is not within the scope of this study. The reader can refer to Tuan et al. (2014) or Zhu et al. (2013), in which linguistic approaches to extract domain terms are discussed. In this paper, we assume that the list of domain terms is available and focus only on the second and third steps of taxonomy construction.

Taxonomic relation identification
In this section, we give an overview of the method to identify taxonomic relations proposed in Tuan et al. (2014). Given an ordered pair of two terms $t_1$ and $t_2$, Tuan et al. (2014) calculate the evidence score that $t_1 \to t_2$ based on the following three methods:
Syntactic contextual subsumption (SCS): This method derives evidence for $t_1 \to t_2$ from their syntactic contexts, particularly from triples of the form (subject, verb, object). It is observed that if the context set of $t_1$ mostly contains that of $t_2$ but not vice versa, then $t_1$ is likely to be a hypernym of $t_2$. To implement this idea, the method finds the most common relation (or verb) $r$ of $t_1$ and $t_2$, submits the queries "$t_1$ $r$" and "$t_2$ $r$" to a Web search engine and collects all search results to construct two corpora, $Corpus^{\Gamma}_{t_1}$ and $Corpus^{\Gamma}_{t_2}$, for $t_1$ and $t_2$. The syntactic context sets are then created from these contextual corpora using a non-taxonomic relation identification method. The details of $Score_{SCS}(t_1, t_2)$ can be found in Tuan et al. (2014).

Lexical-syntactic pattern (LSP):
This method measures how much more evidence for $t_1 \to t_2$ is found on the Web than for $t_2 \to t_1$. Specifically, a list of manually constructed taxonomic patterns (e.g., "$t_2$ is a $t_1$") is queried with a Web search engine to estimate the amount of evidence for $t_1 \to t_2$ from the Web. The LSP measure $Score_{LSP}(t_1, t_2)$ is calculated from the number of search results, where $C_{Web}(t_1, t_2)$ denotes the set of search results; the formula is given in Tuan et al. (2014).

String inclusion with WordNet (SIWN):
This method checks the evidence for $t_1 \to t_2$ by using the combination of string inclusion and references in WordNet synsets. $Score_{SIWN}(t_1, t_2)$ is set to 1 if there is such evidence; otherwise, it is set to 0.
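Combined evidence: The three scores are then combined linearly as follows:
$$Score(t_1, t_2) = \alpha \times Score_{SCS}(t_1, t_2) + \beta \times Score_{LSP}(t_1, t_2) + \gamma \times Score_{SIWN}(t_1, t_2)$$
If $Score(t_1, t_2)$ is greater than a threshold value, then $t_1$ is regarded as a hypernym of $t_2$.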

Incorporating temporal information into taxonomic relation identification
Previous studies of taxonomic relation identification treated all evidence equally, i.e., evidence from 1950 is treated equally with evidence from 2014. This assumption is not always appropriate, as discussed in Section 1. We propose a time-aware method to identify taxonomic relations by incorporating timestamps into the process of finding evidence, using the following timestamp contribution function:
Definition 1 (Timestamp contribution function). Given a text sentence $d$ with timestamp $s_d$, the timestamp contribution of $d$ at time $s_0$ is defined as:
$$T_d(s_0) = \xi e^{-\xi (s_0 - s_d)} \tag{1}$$
where $\xi$ is a control rate, $s_0 > s_d$ and $(s_0 - s_d)$ is the time lapse between $s_d$ and $s_0$.
Equation (1) describes the timestamp contribution of a sentence at a specific time by using an exponential distribution function T d . The intuition behind this function is that the evidence of taxonomic relations found in more recent sentences will be of higher relevance than that found in older sentences. This function is inspired by the work of Efron and Golovchinsky (2011), and Li and Croft (2003), in which it was used to effectively rank documents over time intervals.
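As a rough numerical illustration of the timestamp contribution, the sketch below assumes the exponential-density form $T_d(s_0) = \xi e^{-\xi (s_0 - s_d)}$, the control rate $\xi = 0.15$ used in the experiments, and a time unit of one year. The function name and the evaluation year 2015 are illustrative assumptions.

```python
import math


def timestamp_contribution(s0: float, sd: float, xi: float = 0.15) -> float:
    """Timestamp contribution of a sentence dated `sd`, evaluated at time `s0`.

    Assumes the exponential form xi * exp(-xi * (s0 - sd)); both times must be
    expressed in the same unit (day, month, year, ...).
    """
    if sd > s0:
        raise ValueError("the sentence timestamp must not be later than s0")
    return xi * math.exp(-xi * (s0 - sd))


# Evidence from 1990 versus evidence from 2014, both evaluated in 2015 (unit: year).
plo_1990 = timestamp_contribution(2015, 1990)
isis_2014 = timestamp_contribution(2015, 2014)
ratio = isis_2014 / plo_1990   # how much more the recent sentence counts
```

Under these assumptions, a single sentence from 2014 contributes as much as several dozen sentences from 1990, which is how recent but sparse evidence can compete with abundant older evidence.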
Using the timestamp contribution function, we incorporate temporal information into the three taxonomic relation identification methods described in Section 4.2.1, as follows:
LSP method: For each search result snippet $d$ in $C_{Web}(t_1, t_2)$ collected from the Web search engine, we calculate the timestamp contribution score of $d$ by using $T_d$:
$$T_d(s_0) = \xi e^{-\xi (s_0 - s_d)}$$
where $s_0$ is a chosen specific time (i.e., the time of taxonomy construction) and $s_d$ is the timestamp of $d$. Note that $s_d$ has to be earlier than $s_0$. The unit of the time lapse $(s_0 - s_d)$ depends on the nature of the corpus and can be, for instance, a day, a month or even a year. For example, if the corpus is from a fast-changing source such as social media, we can set the unit to a day to keep up with the change of data on a daily basis. In contrast, for a corpus from slower-changing domains such as scientific disciplines, the unit can be a year. The time-aware score for the LSP method is calculated as follows:
$$Score^{Time}_{LSP}(t_1, t_2) = Score_{LSP}(t_1, t_2) \times \frac{\sum_{d \in C_{Web}(t_1, t_2)} T_d(s_0)}{|C_{Web}(t_1, t_2)|} \tag{2}$$
In Equation (2), the original LSP evidence score is multiplied by the average timestamp contribution score of all evidence sentences for the taxonomic relation from the Web. If the number of returned search results is too large, we use only the first 1,000 results to estimate the average timestamp contribution of the evidence.
Note that the total timestamp contribution score of all evidence sentences, $\sum_{d \in C_{Web}(t_1, t_2)} T_d(s_0)$, can be considered as the "weighted size" of $C_{Web}(t_1, t_2)$, i.e., we weigh each evidence sentence using Equation (1) and sum all these weights. However, if we use only the "weighted size" of $C_{Web}(t_1, t_2)$ for the time-aware score $Score^{Time}_{LSP}(t_1, t_2)$, there will be some issues. Firstly, the score $Score^{Time}_{LSP}$ will not be normalized with respect to the number of evidence sentences. This may lead to potential bias due to large amounts of past evidence: if there were an obsolete or incorrect taxonomic relation with many evidence sentences in the past, it may overwhelm the new taxonomic relations, which may only have a small number of recent evidence sentences. Secondly, if we normalize the score, the information on the number of evidence sentences, which is important for the LSP method to recognize true taxonomic relationships, will be lost. Therefore, we propose to use Equation (2), which combines both information on the number of evidence sentences (embedded inside the original $Score_{LSP}$ score) and the normalized "weighted size" of $C_{Web}(t_1, t_2)$.
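The bias argument can be checked with a small numerical sketch. It assumes the exponential contribution $\xi e^{-\xi (s_0 - s_d)}$ with $\xi = 0.15$ and a yearly time unit; the snippet counts (200 old snippets, 5 recent ones) and all function names are hypothetical, chosen only to make the effect visible.

```python
import math
from typing import Sequence


def contribution(s0: float, sd: float, xi: float = 0.15) -> float:
    """Assumed exponential timestamp contribution of one evidence sentence."""
    return xi * math.exp(-xi * (s0 - sd))


def weighted_size(timestamps: Sequence[float], s0: float, xi: float = 0.15) -> float:
    """Sum of contributions: the un-normalized 'weighted size' of the evidence set."""
    return sum(contribution(s0, sd, xi) for sd in timestamps)


def average_contribution(timestamps: Sequence[float], s0: float,
                         xi: float = 0.15, max_results: int = 1000) -> float:
    """Average contribution over at most the first `max_results` snippets."""
    used = list(timestamps)[:max_results]
    if not used:
        return 0.0
    return weighted_size(used, s0, xi) / len(used)


def time_aware_score(base_score: float, timestamps: Sequence[float],
                     s0: float, xi: float = 0.15) -> float:
    """Original evidence score multiplied by the average timestamp contribution."""
    return base_score * average_contribution(timestamps, s0, xi)


# Hypothetical evidence sets, evaluated in 2015 with a yearly time unit.
old_evidence = [1990] * 200   # obsolete relation, many old snippets
new_evidence = [2014] * 5     # emerging relation, few recent snippets

old_weighted = weighted_size(old_evidence, 2015)
new_weighted = weighted_size(new_evidence, 2015)
old_average = average_contribution(old_evidence, 2015)
new_average = average_contribution(new_evidence, 2015)
```

In this setting the raw weighted size still favors the obsolete relation because of its sheer volume, whereas the average contribution favors the recent one; multiplying the average by the original score keeps the evidence count in play without letting it dominate.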

SCS method:
Similarly, for each search result snippet $d$ in $Corpus^{\Gamma}_{t_1}$ and $Corpus^{\Gamma}_{t_2}$, we calculate the timestamp contribution score of $d$ using the function
$$T_d(s_0) = \xi e^{-\xi (s_0 - s_d)}$$
where $s_0$ is a specific time and $s_d$ is the timestamp of $d$. The time-aware score for the SCS method is calculated as follows:
$$Score^{Time}_{SCS}(t_1, t_2) = Score_{SCS}(t_1, t_2) \times \frac{\sum_{d \in Corpus^{\Gamma}_{t_1} \cup Corpus^{\Gamma}_{t_2}} T_d(s_0)}{|Corpus^{\Gamma}_{t_1} \cup Corpus^{\Gamma}_{t_2}|} \tag{3}$$
In Equation (3), the original evidence score of $t_1 \to t_2$ is multiplied by the average timestamp contribution score of the returned search snippets. Similar to Equation (2), Equation (3) combines both information on the number of evidence sentences (embedded inside the original score $Score_{SCS}$) and their normalized "weighted size".
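SIWN method: Because WordNet does not contain information about timestamps, we set:
$$Score^{Time}_{SIWN}(t_1, t_2) = Score_{SIWN}(t_1, t_2) \tag{4}$$
Combined evidence: The final combined evidence score for the time-aware method is calculated as:
$$Score^{Time}(t_1, t_2) = \alpha \times Score^{Time}_{SCS}(t_1, t_2) + \beta \times Score^{Time}_{LSP}(t_1, t_2) + \gamma \times Score^{Time}_{SIWN}(t_1, t_2) \tag{5}$$
If the value $Score^{Time}(t_1, t_2)$ is greater than a threshold value, we extract the relation $t_1 \to t_2$.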
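The final decision step is a weighted sum followed by a threshold test, sketched below. The pairing of $\alpha$, $\beta$ and $\gamma$ with the SCS, LSP and SIWN scores follows the order in which the methods are presented and is an assumption, as are the function names and all numeric values.

```python
def combined_time_score(scs: float, lsp: float, siwn: float,
                        alpha: float, beta: float, gamma: float) -> float:
    """Linear combination of the three time-aware evidence scores.

    Assumption: alpha weights SCS, beta weights LSP, gamma weights SIWN.
    """
    return alpha * scs + beta * lsp + gamma * siwn


def is_taxonomic_relation(score: float, threshold: float) -> bool:
    """A relation t1 -> t2 is extracted only if its score exceeds the threshold."""
    return score > threshold


# Hypothetical scores and weights, for illustration only.
score = combined_time_score(scs=0.2, lsp=0.4, siwn=1.0, alpha=1.0, beta=2.0, gamma=0.5)
accepted = is_taxonomic_relation(score, threshold=1.0)
```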

Parameter learning
We need to estimate the optimal values for the parameters $\alpha$, $\beta$ and $\gamma$, which are used in Equation (5). For this purpose, we apply ridge regression (Hastie et al., 2009). First, we use the time-aware method to create taxonomies for the 'Animal', 'Plant' and 'Vehicle' domains using corpora constructed by a bootstrapping method (Kozareva et al., 2008). Then, we ask two annotators to construct gold standard taxonomies of the three domains (see Section 5.2 for more details) and use them to build the training sets. For each pair of terms $(t_1, t_2)$ found in the gold standard taxonomies, its evidence score is estimated as $(\tau + 1)$, where $\tau$ is the threshold value for $Score^{Time}$. Finally, we use Equation (5) to learn the best combination of $\alpha$, $\beta$ and $\gamma$ using the ridge regression algorithm. Note that we learn the parameters only once and use them subsequently for the other domains.
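To make the fitting step concrete, the following sketch solves ridge regression without an intercept in closed form, $w = (X^\top X + kI)^{-1} X^\top y$, using only the standard library. It is an illustration under stated assumptions rather than the authors' setup: the penalty $k$, the threshold $\tau = 0.5$, the feature rows and the function name `ridge_fit` are hypothetical, and each row is assumed to hold the three time-aware scores of one gold-standard pair.

```python
from typing import List


def ridge_fit(X: List[List[float]], y: List[float], penalty: float = 1.0) -> List[float]:
    """Closed-form ridge regression without intercept: solve (X^T X + k I) w = X^T y."""
    n = len(X[0])
    # Normal-equation matrix A = X^T X + penalty * I and right-hand side b = X^T y.
    A = [[sum(row[i] * row[j] for row in X) + (penalty if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


# Hypothetical training rows: (SCS, LSP, SIWN) time-aware scores of gold-standard pairs.
tau = 0.5                                  # hypothetical threshold value
features = [[0.3, 0.5, 1.0], [0.2, 0.6, 0.0], [0.4, 0.3, 1.0], [0.1, 0.7, 0.0]]
targets = [tau + 1.0] * len(features)      # each gold pair gets the target score tau + 1
alpha, beta, gamma = ridge_fit(features, targets, penalty=0.1)
```

Setting every gold pair's target just above the threshold pushes the learned weights toward values for which correct relations pass the threshold test; the penalty keeps the weights from growing without bound when the three scores are correlated.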

Incremental taxonomy induction
To avoid reconstructing a taxonomy whenever there is new incoming data, we propose a novel incremental graph-based algorithm to update an existing taxonomy with a given set of taxonomic relations. The proposed algorithm updates a taxonomy automatically over time based on the information decay function defined below.
Definition 2 (Information decay function). Given a taxonomic relation $r$, the information decay of $r$ over the period from time $s_1$ to time $s_2$ is computed by the information decay function:
$$Decay(r, s_1, s_2) = e^{-\lambda (s_2 - s_1)} \tag{6}$$
where $\lambda$ is a decay rate and $s_2 > s_1$.
The intuition behind the information decay function is that the evidential value of a relation will decrease over time at an exponential rate.
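A short sketch shows what exponential decay implies for how long an unrefreshed relation survives. It assumes the decay factor $e^{-\lambda (s_2 - s_1)}$ with the decay rate $\lambda = 0.15$ used in the experiments; the function names and the example weight and threshold are hypothetical.

```python
import math


def information_decay(s1: float, s2: float, lam: float = 0.15) -> float:
    """Assumed decay factor exp(-lam * (s2 - s1)) applied to a relation's weight."""
    if s2 < s1:
        raise ValueError("s2 must not be earlier than s1")
    return math.exp(-lam * (s2 - s1))


def time_to_fall_below(weight: float, threshold: float, lam: float = 0.15) -> float:
    """Time units after which an unrefreshed weight decays down to the threshold.

    Meaningful only when weight > threshold > 0.
    """
    return math.log(weight / threshold) / lam


# A relation with weight 1.0 and a threshold of 0.5 (both hypothetical values).
lifetime = time_to_fall_below(1.0, 0.5)
remaining = 1.0 * information_decay(0.0, lifetime)
```

With these numbers, a relation that receives no new evidence drops to the threshold after roughly four and a half time units, so the choice of unit (day, month, year) directly controls how quickly the taxonomy forgets.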
Given a root node $R$, a set of taxonomic relations $T$ and, optionally, an existing taxonomy $H_1 = (V_1, E_1, s_1)$ created at time $s_1$ with vertex set $V_1$ and edge set $E_1$, the proposed graph-based algorithm constructs a new taxonomy $H_2 = (V_2, E_2, s_2)$ created at time $s_2$ with vertex set $V_2$ and edge set $E_2$. We denote by $t_1 \to t_2$ the edge from $t_1$ to $t_2$ in a taxonomy, and by $w(t_1 \to t_2)$ the weight of this edge (i.e., its evidence score). Algorithm 1 consists of four steps, which are described below.
Algorithm 1 Taxonomy induction algorithm
Input: $R$: root node of taxonomy; $T$: new taxonomic relation set; $H_1 = (V_1, E_1, s_1)$: existing taxonomy created at time $s_1$ with vertex set $V_1$ and edge set $E_1$
Output: $H_2 = (V_2, E_2, s_2)$: new taxonomy created at time $s_2$ with vertex set $V_2$ and edge set $E_2$
1: Set $V_2 = V_1$ and $E_2 = E_1$
2-4: for each edge $(t_1 \to t_2) \in E_2$ with $t_1 \neq R$ and $t_2 \neq R$, decay the edge weight (Step 1)
5-25: add the relations in $T$ to the taxonomy and update the edge weights (Step 2)
26: edge filtering (Step 3)
27: graph pruning (Step 4)
Step 1: Update existing taxonomy (lines 2-4). This step aims to update the existing taxonomy from time $s_1$ to $s_2$. In this step, the weight of each edge $(t_1 \to t_2)$ in $E_1$ (except the edges connected to root $R$) is reduced using the information decay function:
$$w(t_1 \to t_2) = w(t_1 \to t_2) \times e^{-\lambda (s_2 - s_1)}$$
Step 2: Add new relations to existing taxonomy (lines 5-25). This step adds new taxonomic relations to the existing taxonomy and updates their weights. It adds each relation $t_1 \to t_2$ as a directed edge from the parent node $t_1$ to the child node $t_2$ if this edge does not exist in the existing taxonomy. Otherwise, we update the weight of this edge with a new evidence score. If $t_1$ does not have any parent node, $t_1$ will become a child node of root $R$. The edge's weight is updated using the time-aware evidence score $Score^{Time}(t_1, t_2)$ of the relation. The result of this step is a weighted connected graph containing all taxonomic relations with root $R$.
Step 3: Edge filtering (line 26). The graph generated in Step 2 contains some edges with low evidence scores. The reason is that some relations in the existing taxonomy can become outdated during the period from $s_1$ to $s_2$ (according to the information decay function), and they do not exist in the new relation set. In this step, each edge $t_1 \to t_2$ in the graph is revisited, and if its weight is lower than the threshold value of $Score^{Time}$, it is removed from the graph. In the case that $t_2$ does not have any other parent node except $t_1$, $t_2$ is deleted from the vertex set, and edges from $t_1$ to $t_2$'s children are added to the edge set with weights that are equal to the weights of the edges from $t_2$ to $t_2$'s children. Then, all edges from $t_2$ to $t_2$'s children are removed from the edge set.
Step 4: Graph pruning (line 27). The graph generated in Step 3 is not an optimal tree, as it may contain redundant or incorrect edges, for example, edges that form a loop in the graph. This step aims to produce an optimal tree of the taxonomy from the weighted graph in Step 3. For this purpose, we apply Edmonds' algorithm (Edmonds, 1967) for finding the optimal spanning arborescence of a weighted directed graph. Using this algorithm, we can find a subset of the current edge set that forms a taxonomy in which every non-root node has in-degree 1 and the sum of the edge weights is maximized.
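Steps 1 to 3 of the update can be sketched with a plain dictionary of weighted edges. This is a simplified illustration, not Algorithm 1 itself. Several choices are assumptions made loudly here: an existing edge's weight is simply overwritten by the new evidence score (the exact update rule is not reproduced), edges attached to the root receive a fixed hypothetical weight `root_weight`, and Step 4 (Edmonds' algorithm) is left out. All names and numeric values are hypothetical.

```python
import math
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (hypernym, hyponym)


def update_taxonomy(root: str,
                    edges: Dict[Edge, float],
                    s1: float,
                    s2: float,
                    new_relations: Dict[Edge, float],
                    threshold: float,
                    lam: float = 0.15,
                    root_weight: float = 1.0) -> Dict[Edge, float]:
    """Return the edge set at time s2 after decay, insertion and filtering."""
    e2 = dict(edges)

    # Step 1: decay every existing edge that is not connected to the root.
    decay = math.exp(-lam * (s2 - s1))
    for parent, child in list(e2):
        if parent != root and child != root:
            e2[(parent, child)] *= decay

    # Step 2: add new relations (assumption: a new score overwrites an old weight).
    for (parent, child), score in new_relations.items():
        e2[(parent, child)] = score
        if parent != root and not any(c == parent for _, c in e2):
            e2[(root, parent)] = root_weight  # parentless terms hang off the root

    # Step 3: repeatedly remove the edges whose weight is below the threshold.
    while True:
        low = [edge for edge, weight in e2.items() if weight < threshold]
        if not low:
            break
        parent, child = low[0]
        del e2[(parent, child)]
        if any(c == child for _, c in e2):
            continue  # the child still has another parent and stays in the graph
        # The child is dropped: lift its outgoing edges to the former parent.
        for edge in [e for e in e2 if e[0] == child]:
            weight = e2.pop(edge)
            grandchild = edge[1]
            if grandchild != parent:  # avoid creating a self-loop
                e2[(parent, grandchild)] = max(weight, e2.get((parent, grandchild), weight))
    return e2


# Hypothetical Terrorism-domain taxonomy built in 2010 and updated in 2015.
existing = {
    ("terrorism", "terrorist group"): 1.0,
    ("terrorist group", "Palestine Liberation Organization"): 0.6,
}
fresh = {("terrorist group", "ISIS"): 0.8}
updated = update_taxonomy("terrorism", existing, 2010, 2015, fresh, threshold=0.5)
```

In this example the decayed weight of the older relation falls below the threshold and the edge is dropped, while the new relation is added. A full implementation would then run Edmonds' algorithm on the result to obtain a tree.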

Performance Evaluation
We have conducted two experiments for performance evaluation. The first experiment evaluates the performance of our proposed time-aware method on constructing a taxonomy from a given list of terms without any prior knowledge (i.e., without any existing taxonomies). The second experiment evaluates the performance of our proposed method on taxonomy update.

Datasets
We evaluate our method for taxonomy construction based on the following four datasets of document collections obtained from different domains:
• Artificial Intelligence (AI) domain (Navigli et al., 2011) [...]
[...] Equation (1) and decay rate $\lambda$ in Equation (6) as 0.15. The setting of these parameters will be discussed in Section 5.4.

Experiment
In this experiment, we compare our time-aware taxonomy construction method with other state-of-the-art methods in the task of constructing a new taxonomy from a given list of terms without any prior knowledge (i.e., without any existing taxonomy). Three state-of-the-art methods in the literature are selected for comparison:
• Zhu's method (Zhu et al., 2013): It constructs the taxonomy using evidence from multiple sources such as WordNet, Wikipedia and Web search engines. In their method, both statistical and linguistic approaches are used to infer taxonomic relations.
• Kozareva's method (Kozareva and Hovy, 2010): It constructs the taxonomy using evidence from a Web search engine by matching the search results with a predefined set of syntactic patterns.
• Tuan's method (Tuan et al., 2014): It is the non time-aware method described in Section 4.2.1. This method ignores temporal information during taxonomy construction.
To evaluate the effectiveness of extracting timestamps at the sentence level (as described in Section 4.1), we also conduct an experiment on a setting that uses the timestamps at the document level (i.e., all evidence in the document has the same timestamp as the document creation date). We use the subscript docstamp to denote this setting.

Evaluation metric
In this experiment, we evaluate the constructed taxonomies against the manually created gold standard taxonomies. The gold standard taxonomies are created as follows. For each domain, two annotators are employed at the same time to create taxonomies independently using the list of terms obtained from the domain term extraction module, according to the following rules:
• Rule 1 (Relevancy): Every term in the taxonomy should be related to the root term.
• Rule 2 (Appropriateness): Each edge between two terms should be established at the time the taxonomy is created, if their relation is correct and not obsolete. A relation is obsolete if it is invalid at the time of consideration.
• Rule 3 (Hierarchical structure): The gold standard taxonomy of each domain should form a tree, without redundant paths or cycles.
The annotators then compare their constructed taxonomies. A taxonomic relation $t_1 \to t_2$ is counted as an agreement if and only if both annotators have $t_1$ and $t_2$ in their taxonomies, and there is a directed path from $t_1$ to $t_2$. If an annotator has a taxonomic relation with one vertex not in the other annotator's taxonomy, it is considered a disagreement. The average inter-annotator agreement on the edges of the constructed taxonomies is 87%, measured with Cohen's kappa coefficient. Finally, the two annotators discuss their differences to arrive at the gold standard taxonomies. The number of nodes and the average depth of the resulting taxonomies are summarized in Table 1. We use precision, recall and F-measure to measure the performance of taxonomy construction. Let $R$ and $R_{gold}$ be the sets of taxonomic relations of our constructed taxonomy and the gold standard taxonomy, respectively; then the metrics are given as follows:
$$Precision = \frac{|R \cap R_{gold}|}{|R|}, \qquad Recall = \frac{|R \cap R_{gold}|}{|R_{gold}|}, \qquad F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
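The three metrics can be computed directly from two sets of relations. The sketch below uses the standard set-based definitions of precision, recall and F-measure; the function name and the toy relation sets are hypothetical.

```python
from typing import Set, Tuple

Relation = Tuple[str, str]  # (hypernym, hyponym)


def evaluate(predicted: Set[Relation], gold: Set[Relation]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of predicted relations against the gold standard."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy example: two of the three predicted relations appear in a four-relation gold standard.
gold = {("a", "b"), ("a", "c"), ("b", "d"), ("b", "e")}
predicted = {("a", "b"), ("b", "d"), ("c", "e")}
precision, recall, f_measure = evaluate(predicted, gold)
```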

Experimental results
The experimental results are given in Table 2, which shows that our time-aware method achieves significantly better performance than Kozareva's method and Zhu's method in terms of F-measure (t-test, p-value < 0.05). Our method shows slightly lower precision than Kozareva's method due to the SCS method, but much higher recall and F-measure. In contrast, our method shows slightly lower recall but much higher precision and F-measure than Zhu's method, which is based on statistical methods such as pointwise mutual information and cosine similarity. On average, our time-aware method improves the F-measure by 20% compared to Kozareva's method, and by 10% compared to Zhu's method.
Moreover, the incorporation of timestamps into the time-aware method also contributes to better performance, as it helps identify new taxonomic relations effectively while getting rid of obsolete and incorrect relations. As shown by the experimental results, the time-aware method performs significantly better than the non-time-aware method (i.e., Tuan's method) in all four domains in terms of F-measure (t-test, p-value < 0.05). On average, our time-aware method improves the F-measure by 7% compared to Tuan's method. We further examine the taxonomic relations identified by the time-aware method but not by the non-time-aware method, and vice versa. We observed that around 91% of the relations found by the time-aware method but not by the non-time-aware method are recent relations (i.e., relations found in recent documents), while around 86% of the relations found by the non-time-aware method but not by the time-aware method are obsolete relations. The percentage of taxonomic relations that become obsolete in each of the datasets is summarized in Table 3.
For example, in the Terrorism domain, our method recognizes 'ISIS' as a hyponym of 'terrorist group', while the three state-of-the-art methods cannot recognize this. In addition, while the other three methods have extracted the outdated taxonomic relation between 'Palestine Liberation Organization' and 'terrorist group', our method was able to ignore it. The reason is that the three state-of-the-art methods inferred taxonomic relations using co-occurrence frequency, but 'ISIS' has only appeared in reports since 2014. The occurrence frequency of 'ISIS' was very low compared to that of 'Palestine Liberation Organization', which was mentioned over many past years.
In contrast, by using the timestamp contribution function to better profile the relevance of evidence over time, our method can recognize the recent relationship of 'terrorist group' with 'ISIS' while getting rid of the obsolete and incorrect relation with 'Palestine Liberation Organization'.
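The contrast between frequency-based and time-weighted evidence can be illustrated with a minimal sketch. It does not reproduce the timestamp contribution function of Equation (1); it assumes a simple geometric weight `rate ** age` (with 0 < rate < 1), and the names `Evidence`, `timestamp_weight` and `score_relations`, as well as the mention counts and years, are invented for illustration only.

```python
from collections import namedtuple

# One piece of evidence: a sentence supporting a relation, together with the
# year extracted for that sentence. All values used below are made up.
Evidence = namedtuple("Evidence", ["hyponym", "hypernym", "year"])


def timestamp_weight(year, current_year, rate):
    """Assumed contribution of evidence dated `year`: rate ** age, 0 < rate < 1."""
    age = max(current_year - year, 0)
    return rate ** age


def score_relations(evidence, current_year, rate):
    """Return {(hyponym, hypernym): (raw_count, time_weighted_score)}."""
    scores = {}
    for ev in evidence:
        key = (ev.hyponym, ev.hypernym)
        count, weighted = scores.get(key, (0, 0.0))
        scores[key] = (
            count + 1,
            weighted + timestamp_weight(ev.year, current_year, rate),
        )
    return scores


# Toy corpus: an old relation mentioned often, a recent one mentioned rarely.
toy_evidence = (
    [Evidence("Palestine Liberation Organization", "terrorist group", year)
     for year in range(1995, 2005)]                      # 10 old mentions
    + [Evidence("ISIS", "terrorist group", 2014)] * 3    # 3 recent mentions
)

toy_scores = score_relations(toy_evidence, current_year=2014, rate=0.5)
for relation, (count, weighted) in toy_scores.items():
    print(relation, "count:", count, "time-weighted:", round(weighted, 4))
```

In this toy setting the older relation wins on raw count, while the recent relation wins once each mention is weighted by its age, which is the qualitative behaviour described above.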
From the experimental results of the time-aware and time-aware docstamp methods, we also observe that the use of timestamps extracted at the sentence level is more effective than the use of timestamps at the document level. The timestamps extracted at the sentence level can capture more precisely the temporal information of the facts in fast-changing domains than those at the document level. The results showed that the use of sentence-level timestamps can improve the precision and recall of our taxonomy construction method, improving the F-measure by 4% on average, as compared to the use of timestamps at the document level.

Experiment
For fast-changing domains, taxonomies should be frequently and quickly updated. In this experiment, we examine how the proposed time-aware method can effectively update the constructed taxonomies over time to keep up with the latest information trends.
We use the case study of the 'MH370' domain for this experiment. During the search operation for the missing flight MH370, there were several turning points which can be captured by the following phases (according to well-known news agencies such as CNN, BBC and the New York Times):
• Phase 1 (from March 08, 2014): The flight lost contact with the airport. The search started from the South China Sea and Gulf of Thailand, and was extended to the Strait of Malacca.
• Phase 2 (from March 13, 2014): Images from satellites indicated the plane might have fallen into the Indian Ocean. The search focus was moved from the South of Sumatra to the South-West of Perth in the Southern Indian Ocean.
• Phase 3 (from March 28, 2014): Estimation of the aircraft's remaining fuel and the radar track led the search to shift to a new area, the North-West of Perth in the Southern Indian Ocean.
We apply the proposed time-aware method to construct and update the taxonomy for 'MH370' incrementally every two days. We compare our time-aware method with the following three methods:
• Zhu's method (Zhu et al., 2013): It applies a graph-based algorithm to update taxonomies incrementally with timestamp information.
• Baseline 1: The taxonomy is updated with the newly obtained data every two days, but does not use any temporal information in either taxonomic relation identification (Section 4.2.2) or taxonomy induction (Section 4.2.4). Specifically, Step 1 (update existing taxonomy) in Section 4.2.4 is excluded since we are not using any temporal information, so there is no updating of the weights of the existing taxonomy using the decay function.
• Baseline 2: We construct the taxonomies using temporal information every two days, but only with the new documents from these two days. This allows us to evaluate the effect of retiring all the taxonomic relations built from the previous documents instead of the gradual decay approach in our proposed method.
We chose a time period of two days because 'MH370' was a truly fast-changing domain. As we shall see shortly, even when only the new documents from the last two days are used to build the taxonomy, as in baseline method 2, new taxonomic relations emerge from the latest information (as shown in the example in Figure 4).
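The difference between the gradual-decay update and the two baselines can be sketched as follows. This is an illustration only: the decay form `weight * decay_rate ** days_elapsed`, the pruning cutoff `min_weight`, the function name `update_taxonomy` and all scores are assumptions, not the decay function of Equation (6) or the induction procedure of Section 4.2.4.

```python
def update_taxonomy(weights, new_evidence, days_elapsed, decay_rate, min_weight):
    """Decay existing relation weights, add fresh evidence, drop stale relations.

    weights:      {(hyponym, hypernym): weight} from the previous update
    new_evidence: {(hyponym, hypernym): evidence score from the new documents}
    """
    # Assumed decay form: weight * decay_rate ** days_elapsed, 0 < decay_rate <= 1.
    factor = decay_rate ** days_elapsed
    updated = {rel: w * factor for rel, w in weights.items()}
    for rel, score in new_evidence.items():
        updated[rel] = updated.get(rel, 0.0) + score
    # Relations whose weight falls below the cutoff are treated as outdated.
    return {rel: w for rel, w in updated.items() if w >= min_weight}


# Toy two-day batches of evidence scores (invented numbers).
batches = [
    {("South China Sea", "search area"): 1.0,
     ("Strait of Malacca", "search area"): 2.0},
    {("Southern Indian Ocean", "search area"): 3.0},
    {("Southern Indian Ocean", "search area"): 3.0},
]

taxonomy = {("South China Sea", "search area"): 4.0}
for batch in batches:
    taxonomy = update_taxonomy(taxonomy, batch, days_elapsed=2,
                               decay_rate=0.6, min_weight=0.5)
print(taxonomy)
```

Under these assumptions, calling the function with `decay_rate=1.0` never down-weights old relations, which mimics Baseline 1, and passing an empty `weights` dictionary at every step keeps only the newest evidence, which mimics Baseline 2.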

Evaluation metric
When constructing the gold standard taxonomies using the same rules described in Section 5.2, we asked the annotators to select for each parent term at most three sub-terms that are most related to it at the time of taxonomy construction. We denote the set of gold-standard taxonomic relations as S_gold. In the same way, when applying the methods of taxonomy construction, we select for each parent term at most three sub-terms with the highest evidence scores. We denote the set of those automatically extracted taxonomic relations as S. We evaluate the update of the taxonomy by comparing S against S_gold; the results below are reported in terms of F-measure.
The intuition for limiting the number of sub-terms to three for the evaluation is that if a taxonomy can keep up with the newly updated data, it should be able to detect the emerging terms and relations and add them to the taxonomy with high evidence scores, so that the user can easily observe an emerging trend of information in the domain as it occurs. In addition, the method should also be able to remove any obsolete relations in the taxonomy when they are no longer valid.
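The evaluation procedure can be sketched as below, assuming the standard set-based definitions of precision, recall and F-measure over S and S_gold. That choice of formulas is an assumption here, and the function names and the toy scores are hypothetical.

```python
def top_k_relations(scored, k=3):
    """Keep, for each parent term, the k sub-terms with the highest scores.

    scored: {(sub_term, parent_term): evidence score}
    """
    by_parent = {}
    for (sub, parent), score in scored.items():
        by_parent.setdefault(parent, []).append((score, sub))
    selected = set()
    for parent, candidates in by_parent.items():
        for _score, sub in sorted(candidates, reverse=True)[:k]:
            selected.add((sub, parent))
    return selected


def precision_recall_f(extracted, gold):
    """Set-based precision, recall and F-measure (assumed standard definitions)."""
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented evidence scores for the hyponyms of one parent term.
toy_scored = {
    ("Southern Indian Ocean", "search area"): 4.1,
    ("Sumatra", "search area"): 2.0,
    ("Perth", "search area"): 1.5,
    ("South China Sea", "search area"): 0.3,
}
# Invented gold standard with at most three sub-terms for the parent term.
toy_gold = {
    ("Southern Indian Ocean", "search area"),
    ("Perth", "search area"),
    ("Strait of Malacca", "search area"),
}

S = top_k_relations(toy_scored, k=3)
precision, recall, f_measure = precision_recall_f(S, toy_gold)
print(precision, recall, f_measure)
```

Because both S and S_gold are capped at three sub-terms per parent, a method scores well only if the currently relevant sub-terms receive the highest evidence scores, not merely if they appear somewhere in the taxonomy.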

Experimental results
From the results shown in Figure 3, we can see that our time-aware method achieves the best performance and significantly outperforms the two baseline methods and Zhu's method in terms of F-measure (t-test, p-value<0.05). One interesting point to observe is that there are two periods when the time-aware method shows much higher F-measure than the baseline methods and Zhu's method: from March 12 to March 14, and from March 28 to March 30. During these periods, the performance of baseline method 1 (which does not use any timestamp information) and Zhu's method drops significantly, while our time-aware method's performance increases slightly.
One plausible explanation is that the turning points on March 13 and March 28 fall within these periods, as described above. During these periods, many new terms and relations, such as search area, search focus and search device, are added to the corpus. Our time-aware method was able to assign higher weights to the new taxonomic relations than to the older relations due to their recent timestamps, even though the frequencies of these new relations are lower than those of the older relations. In contrast, Zhu's method and baseline method 1 were unable to recognize these new relations due to their relatively low frequencies in the corpus. In addition, our time-aware method removed incorrect relations in the existing taxonomy from the new taxonomy using the information decay function, whereas the other two methods still kept them. In short, our time-aware method can update the taxonomy faster with the latest information trends, as well as remove incorrect relations effectively, as compared to the other methods.
Also, from the experimental results of our time-aware method and baseline method 2, we can observe that updating the existing taxonomy with new taxonomic relations is more effective than rebuilding a new taxonomy using only the new data. The reason is that some older taxonomic relations are still valid even though they are mentioned only occasionally in the new data. Therefore, if we ignore the older data, these taxonomic relations will be lost when the new taxonomy is constructed with only the new data. In addition, there are also many taxonomic relations that need a longer time period to become established.
Figure 4 shows an example of the changes of the hyponym list for the term 'search area' over time using different methods. We observe that both the time-aware method and baseline method 2, which utilize temporal information, can quickly update the relations with the latest information as compared to Zhu's method and baseline method 1, which ignore temporal information for taxonomy construction. For example, in the taxonomy constructed on March 14, the time-aware method and baseline method 2 quickly recognize the change of the search area to 'Southern Indian Ocean' and 'Sumatra', thereby ranking them at the top of the hyponym list of 'search area', whereas Zhu's method and baseline method 1 both missed this update until March 26. Another interesting point is that, due to the lack of temporal information, both Zhu's method and baseline method 1 still ranked 'South China Sea' at the top of the taxonomies constructed on April 30, while this term was removed earlier from the hyponym list of 'search area' by our time-aware method using temporal information.

Parameter tuning
In our method for taxonomy construction, some parameters are tuned to optimize performance.
The threshold value for Score_Time in Equation (5) controls the number of extracted taxonomic relations. In general, the larger this threshold value is, the more true taxonomic relations we can get. However, a higher number of incorrect relations may also occur. From our experiments, we found that the threshold value for Score_Time can be set between 2.1 and 2.3 for the time-aware method to achieve the best performance.
The control rate ξ in Equation (1) and decay rate λ in Equation (6) affect the contribution of old and new data. Specifically, smaller values for the control rate and decay rate will allow newer data to contribute more evidence of taxonomic relations than older data, whereas larger values will cause the old and new data to have similar evidence contributions. According to our experiments, the time-aware method shows the best performance when the values of these rates are set between 0.15 and 0.20.

Conclusion
In this paper, we have proposed a novel time-aware method for taxonomy construction given a time series of text documents from a domain that could be fast-changing with emerging concepts or events. By using timestamp contribution and information decay functions, our method can effectively utilize temporal information for both taxonomic relation identification and taxonomy update. The experimental results show that our method achieves better performance than the state-of-the-art methods. In addition, the proposed method can be used to update the taxonomy incrementally over time and keep the taxonomy up-to-date with the latest information trends for the domain. All the datasets, including the gold standards of the four domains and the outputs of our method, are publicly available at https://sites.google.com/site/tuanluu219/research/tacl1.