Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora

Much of scientific progress stems from previously published findings, but searching through the vast sea of scientific publications is difficult. We often rely on metrics of scholarly authority to find prominent authors, but these authority indices do not differentiate authority by research topic. We present Latent Topical-Authority Indexing (LTAI) for jointly modeling the topics, citations, and topical authority in a corpus of academic papers. LTAI differs from previous models in two main aspects. First, it explicitly models the generative process of the citations, rather than treating the citations as given. Second, it models each author's influence on the citations of a paper based on the topics of the cited papers as well as the citing papers. We fit LTAI to four academic corpora: CORA, Arxiv Physics, PNAS, and Citeseer. We compare the performance of LTAI against various baselines, ranging from latent Dirichlet allocation to more advanced models including the author-link topic model and the dynamic author-cite topic model. The results show that LTAI achieves improved accuracy over other similar models when predicting words, citations, and authors of publications.


Introduction
With a corpus of scientific literature, we can observe the complex and intricate process of scientific progress. We can learn the major topics in journal articles and conference proceedings, follow authors who are prolific and influential, and find papers that are highly cited. The huge number of publications and authors, however, makes it practically impossible to attain any deep or detailed understanding beyond the very broad trends. For example, if we want to identify authors who are particularly influential in a specific research field, it is difficult to do so without the aid of automatic analysis.
Online publication archives, such as Google Scholar, provide near real-time metrics of scholarly impact, such as the h-index (Hirsch, 2005), the journal impact factor (Garfield, 2006), and citation count. Those indices, however, are still at a coarse level of granularity. For example, both Michael Jordan and Richard Sutton are researchers with very high citation counts and h-indices, but they are authoritative in different topics: Jordan in the more general machine learning topic of statistical learning, and Sutton in the topic of reinforcement learning. It would be much more helpful to know this via topical authority scores, as shown in Figure 1.
Fortunately, various academic publication archives contain the full contents, references, and meta-data including titles, venues, and authors. With such data, we can build and fit a model to partition researchers' scholarly domains into topics at a much finer grain and discover their academic authority within each topic. To do that, we propose a model named Latent Topical-Authority Indexing (LTAI), based on latent Dirichlet allocation, to jointly model the topics, authors' topical authority, and citations among the publications.
We illustrate the modeling power of the LTAI with four corpora encompassing a diverse set of academic fields: CORA, Arxiv Physics, PNAS, and Citeseer. To show the improvements over other related models, we carry out prediction tasks on words, citations, and authorship using the LTAI and compare the results with those of latent Dirichlet allocation (Blei et al., 2003), the relational topic model (Chang and Blei, 2010), the author-link topic model and the dynamic author-cite topic model (Kataria et al., 2011), as well as a simple baseline of topical h-index. The results show that the LTAI outperforms these other models for all prediction tasks.
The rest of this paper is organized as follows. In section 2, we describe related work, including models that are most similar to the LTAI, and describe how the LTAI fits in and contributes to the field. In section 3, we describe the LTAI model in detail and present the generative process. In section 4, we explain the algorithm for approximate inference, and in section 5, we present a faster algorithm for scalability. In section 6, we describe the experimental setup and in section 7, we present the results to show that the LTAI performs better than other related models for word, citation and authorship prediction.

Related Work
In this section, we review related papers, first in the field of NLP and ML-based analysis of scientific corpora, then the approaches based on Bayesian topic models for academic corpora, and lastly joint models of topics, authors, and citations. In analyzing scientific corpora, previous research has addressed classifying scientific publications (Caragea et al., 2015), recommending yet unlinked citations (Huang et al., 2015; Neiswanger et al., 2014; Jiang, 2015), summarizing and extracting key phrases (Cohan and Goharian, 2015; Caragea et al., 2014), triggering better model fit (He et al., 2015), incorporating authorship information to increase the content and link predictability (Sim et al., 2015), estimating a paper's potential influence on the academic community (Dong et al., 2015), and finding and classifying different functionalities of citation practices (Moravcsik and Murugesan, 1975; Teufel et al., 2006; Valenzuela et al., 2015).
Several variants of topic modeling consider the relationship between topics and citations in academic corpora. Topic models that use text and the citation network are divided into two types: (a) models that generate text given the citation network (Dietz et al., 2007; Foulds and Smyth, 2013) and (b) models that generate the citation network given text (Nallapati et al., 2008; Liu et al., 2009; Chang and Blei, 2010). While our model falls into the latter category, we also take into account the influence of the authors on the citation structure.
Most closely related to the LTAI are the citation author topic model (Tu et al., 2010), the author-link topic model, and the dynamic author-cite topic model (Kataria et al., 2011). Similar to the LTAI, they are designed to capture the influence of the authors. However, these models infer authority by referencing only the citing papers' text, while our authority is based on predictive modeling that compares both the citing and the cited papers. Furthermore, the LTAI defines a generative model of citations and publications by introducing a latent authority index, whereas the previous models assume the citation structure is given. The LTAI thus explicitly gives a topical authority index, which directly answers the question of which author increases the probability of a paper being cited.

Latent Topical-Authority Indexing
The LTAI models the complex relationship among the topics of publications, the topical authority of the authors, and the citations among these publications. The generative process of the LTAI can be divided into two parts: content generation and citation network generation. We make several assumptions in the LTAI to model the citation structure of academic corpora. First, we assume a citation is more likely to occur between two papers that are similar in their topic proportions. Second, we assume that an author's authority (i.e., potential to induce citation) differs for each topic, and that an author's topical authority positively correlates with the probability of citation among publications. Third, when a cited publication has multiple authors, their contributions to forming citations from different citing papers vary according to their topical authority. Lastly, we assign different concentration parameters to pairs of papers with and without citation. In this paper, we use the terms positive and negative links to denote pairs of papers with and without citations, respectively. Figure 2 illustrates the graphical model of the LTAI. We summarize the generative process as follows; the variables of the model are explained in the remainder of this section.
1. For each topic $k$, draw topic $\beta_k \sim \mathrm{Dir}(\alpha_\beta)$.
2. For each document $i$:
   (a) Draw topic proportion $\theta_i \sim \mathrm{Dir}(\alpha_\theta)$.
   (b) For each word $w_{in}$, draw topic assignment $z_{in} \sim \mathrm{Mult}(\theta_i)$ and then draw word $w_{in} \sim \mathrm{Mult}(\beta_{z_{in}})$.
3. For each author $a$ and topic $k$, draw authority index $\eta_{ak} \sim \mathcal{N}(0, \alpha_\eta^{-1})$.
4. For each document pair from $i$ to $j$:
   (a) Draw influence proportion parameter $\pi_{i\leftarrow j} \sim \mathrm{Dir}(\pi_i)$.
   (b) Draw author $a_{i\leftarrow j} \sim \mathrm{Mult}(\pi_{i\leftarrow j})$.
   (c) Draw link $x_{i\leftarrow j} \sim \mathcal{N}\big(\sum_k \eta_{a_{i\leftarrow j}k}\,\bar{z}_{ik}\,\bar{z}_{jk},\; c_{i\leftarrow j}^{-1}\big)$.
The LTAI jointly models the content-related variables $\theta$, $z$, $w$, $\beta$, and the author- and citation-related variables $\eta$ and $\pi$.
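As a rough illustration of the two-part generative view (content first, then links), the following sketch samples a tiny synthetic corpus. It is not the authors' implementation: the function names, the symmetric Dirichlet prior over a paper's authors, and the default values of `alpha_eta` and `c_link` are assumptions made for the example, and the link value is drawn as a real number because the model approximates the binary citation indicator with a Gaussian.

```python
import random

def dirichlet(rng, alpha):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def categorical(rng, probs):
    """Draw an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_corpus(rng, n_topics, vocab_size, doc_authors, doc_len,
                    alpha_theta=1.0, alpha_beta=0.1, alpha_eta=1.0, c_link=1.0):
    """doc_authors holds one list of author ids per document (hypothetical input)."""
    K = n_topics
    # Content generation: topics, topic proportions, topic assignments, words.
    beta = [dirichlet(rng, [alpha_beta] * vocab_size) for _ in range(K)]
    theta, words, zbar = [], [], []
    for _ in doc_authors:
        th = dirichlet(rng, [alpha_theta] * K)
        z = [categorical(rng, th) for _ in range(doc_len)]
        words.append([categorical(rng, beta[k]) for k in z])
        theta.append(th)
        zbar.append([z.count(k) / doc_len for k in range(K)])  # empirical topic mix
    # Authority: one zero-mean Gaussian value per author and topic.
    authors = sorted({a for auths in doc_authors for a in auths})
    eta = {a: [rng.gauss(0.0, alpha_eta ** -0.5) for _ in range(K)] for a in authors}
    # Citation generation for every ordered pair (cited i, citing j).
    links = {}
    for i, auths in enumerate(doc_authors):
        for j in range(len(doc_authors)):
            if i == j:
                continue
            pi = dirichlet(rng, [1.0] * len(auths))  # assumed symmetric prior
            a = auths[categorical(rng, pi)]          # author responsible for this link
            mean = sum(eta[a][k] * zbar[i][k] * zbar[j][k] for k in range(K))
            links[(i, j)] = rng.gauss(mean, c_link ** -0.5)
    return beta, theta, words, eta, links
```

Calling `generate_corpus(random.Random(0), 3, 20, [["a"], ["a", "b"], ["c"]], 15)` produces three short documents and six directed link values.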

Content Generation
To model the content of publications, we follow the standard document generative process of latent Dirichlet allocation (LDA) (Blei et al., 2003). We also inherit the notation for variables from LDA: $\theta$ is the per-document topic distribution, $\beta$ is the per-topic word distribution, $z$ is the topic for each word in a document, $w$ is the corresponding word, and $\alpha_\theta$ and $\alpha_\beta$ are the Dirichlet parameters of $\theta$ and $\beta$, respectively.

Citation Generation
Let $x_{i\leftarrow j}$ be a binary-valued variable which indicates that publication $j$ cites publication $i$. We formulate a continuous variable $r_{i\leftarrow j}$, a linear combination of the authority variable and the topic proportion variable, to approximate $x_{i\leftarrow j}$ by minimizing the sum of squared errors between the two variables. There is a body of research in the field of recommender systems on using continuous user- and item-related variables to approximate binary variables (Rennie and Srebro, 2005; Koren et al., 2009). Approximating binary variables using a linear combination of continuous variables can be probabilistically generalized (Salakhutdinov and Mnih, 2007). Using probabilistic matrix factorization, we approximate the probability mass function $p(x_{i\leftarrow j})$ using the probability density function $\mathcal{N}(x_{i\leftarrow j} \mid r_{i\leftarrow j}, c_{i\leftarrow j}^{-1})$, where the precision parameter $c_{i\leftarrow j}$ can be set differently for each pair of papers, as discussed below.

Content Similarity Between Publications: In the LTAI, we model the relationship between a random pair of documents $i$ and $j$. The probability of publication $j$ citing publication $i$ is proportional to the similarity of the topic proportions of the two publications, i.e., $r_{i\leftarrow j}$ positively correlates with $\sum_k \theta_{ik}\theta_{jk}$. Following the relational topic model's approach (Chang and Blei, 2010), we use $\bar{z}_i = \frac{1}{N_i}\sum_n z_{i,n} \approx \theta_i$ instead of the topic proportion parameter $\theta_i$.
Topical Authority of Cited Paper: We introduce a $K$-dimensional vector $\eta_a$ to represent the topical authority index of author $a$. Each $\eta_{ak}$ is a real number drawn from the zero-mean normal distribution with variance $\alpha_\eta^{-1}$. Given the authority indices $\eta_{a_i}$ for author $a$ of cited publication $i$, the probability of citation is further modeled as $r_{i\leftarrow j} = \sum_k \eta_{a_i k}\,\bar{z}_{ik}\,\bar{z}_{jk}$, where the authority indices can promote or demote the probability of citation.
Different Degree of Contribution among Multiple Authors: Academic publications are often written by more than one author. Thus, we need to distinguish the influence of each author on a citation between two publications. Let $A_i$ be the set of authors of publication $i$. To measure the influence proportion of author $a \in A_i$ on the citation from $j$ to $i$, we introduce an additional parameter $\pi_{i\leftarrow j}$, which is a one-hot vector drawn from a Dirichlet distribution with $|A_i|$-dimensional parameter $\pi_i$. $\pi_{i\leftarrow ja} \in \{0, 1\}$ is the element of $\pi_{i\leftarrow j}$ which measures the influence of author $a$ on the citation from $j$ to $i$, and the elements sum to one ($\sum_{a\in A_i} \pi_{i\leftarrow ja} = 1$) over all authors of publication $i$. We approximate the probability of citation $x_{i\leftarrow j}$ from publication $j$ to publication $i$ by
$$p(x_{i\leftarrow j} \mid z, \pi_{i\leftarrow j}, a_{i\leftarrow j}, \eta_a) \approx \sum_{a\in A_i} \pi_{i\leftarrow ja}\, \mathcal{N}\Big(x_{i\leftarrow j} \,\Big|\, \sum_k \eta_{ak}\,\bar{z}_{ik}\,\bar{z}_{jk},\; c_{i\leftarrow j}^{-1}\Big),$$
which is a mixture of normal distributions with precision parameter $c_{i\leftarrow j}$. Therefore, if the topic distributions of papers $i$ and $j$ are similar and the $\eta$ values of the cited paper's authors are high, the probability of citation formation increases; on the other hand, a dissimilar or topically irrelevant pair of papers with less authoritative authors on the cited paper is assigned a low probability of citation formation.
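To make the citation density concrete, the sketch below evaluates the score of one author and the author-weighted mixture for a single pair of papers. The function names and the plain-list representation of $\bar{z}$, $\eta$, and $\pi$ are illustrative assumptions, not part of the original model description.

```python
import math

def normal_pdf(x, mean, precision):
    """Density of a normal distribution with the given mean and precision at x."""
    return math.sqrt(precision / (2.0 * math.pi)) * math.exp(-0.5 * precision * (x - mean) ** 2)

def citation_score(eta_a, zbar_i, zbar_j):
    """r = sum_k eta_ak * zbar_ik * zbar_jk for one author of the cited paper i."""
    return sum(e * zi * zj for e, zi, zj in zip(eta_a, zbar_i, zbar_j))

def citation_density(x, zbar_i, zbar_j, author_etas, pi, precision):
    """Mixture over the cited paper's authors, weighted by their influence pi."""
    return sum(p * normal_pdf(x, citation_score(eta, zbar_i, zbar_j), precision)
               for p, eta in zip(pi, author_etas))

# A topically matching pair receives a higher density for x = 1 than a mismatched pair.
near = citation_density(1.0, [1.0, 0.0], [1.0, 0.0], [[1.0, 0.0]], [1.0], 1.0)
far = citation_density(1.0, [1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]], [1.0], 1.0)
```

Here `near` is larger than `far` because the first pair shares a topic on which the cited author has positive authority.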
Different Treatment between Positive and Negative Links: Citation is a binary problem where $x_{i\leftarrow j}$ is either one or zero. When $x_{i\leftarrow j}$ is zero, this can be interpreted in two ways: 1) the authors of citing publication $j$ are unaware of publication $i$, or 2) publication $j$ is not relevant to publication $i$. Identifying which case is true is impossible unless we are the authors of the publication. Therefore, the model embraces this uncertainty in the absence of a link between publications. We control the ambiguity through the precision parameter $c_{i\leftarrow j}$ of the Gaussian distribution as follows:
$$c_{i\leftarrow j} = \begin{cases} c_+ & \text{if } x_{i\leftarrow j} = 1, \\ c_- & \text{if } x_{i\leftarrow j} = 0, \end{cases}$$
where $c_+ > c_-$ to ensure that we have more confidence in the observed citations. This is an implicit feedback approach that permits using negative examples ($x_{i\leftarrow j} = 0$) of sparse observations by mitigating their importance (Hu et al., 2008; Wang and Blei, 2011; Purushotham et al., 2012). Setting different values of the precision parameter $c_{i\leftarrow j}$ according to $x_{i\leftarrow j}$ induces cyclic dependencies between the two variables, and due to this cycle, the model is no longer a Bayesian network, i.e., a directed acyclic graph. However, we note that this setting does lead to better experimental results, and we show the pragmatic benefit of the setting in the Evaluation section.
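The effect of the two precision values can be seen in the confidence-weighted squared error that a Gaussian with precision $c$ induces. The following sketch is only an illustration: the function names and the default values `c_pos=100.0` and `c_neg=1.0` are assumptions chosen for the example.

```python
def link_precision(x, c_pos=100.0, c_neg=1.0):
    """Precision of the Gaussian for a pair: high for observed citations, low otherwise."""
    return c_pos if x == 1 else c_neg

def weighted_squared_error(links, scores, c_pos=100.0, c_neg=1.0):
    """Sum of 0.5 * c * (x - r)^2, the data-dependent part of the negative log-likelihood."""
    return sum(0.5 * link_precision(x, c_pos, c_neg) * (x - r) ** 2
               for x, r in zip(links, scores))

# The same residual of 0.5 costs 12.5 on a positive link but only 0.125 on a negative link.
loss = weighted_squared_error([1, 0], [0.5, 0.5])
```

With these settings a misfit on an observed citation is penalized one hundred times more than the same misfit on an absent link, which is how the absence of a link is treated as weak evidence.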

Joint Modeling of the LTAI
In the LTAI, the topics and the link structures are simultaneously learned, and thus the content-related variables and the citation-related variables mutually reshape one another during the posterior inference.
On the other hand, if content and citation data are modeled separately, the topics would not reflect any information about the document citation structure. Thus, in the LTAI, documents with shared links are more likely to have similar topic distributions which leads to better model fit. We develop and explain this joint inference in section 4. In section 7, we illustrate the differences in word-level predictive powers of the LTAI and LDA.

Posterior Inference
We develop a hybrid inference algorithm in which the posterior of the content-related parameters θ, z, and β is approximated by variational inference, and the author-related parameters π and η are estimated by EM. Algorithm 1 summarizes the full inference procedure of the LTAI.

Content Parameters: Variational Update
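Since computing the posterior distribution of the LTAI is intractable, we use variational inference to optimize variational parameters, each of which corresponds to an original content-related variable. Following the standard mean-field variational approach, we define fully factorized variational distributions over the topic-related latent variables,
$$q(\theta, \beta, z) = \prod_k q(\beta_k \mid \lambda_k) \prod_i q(\theta_i \mid \gamma_i) \prod_n q(z_{in} \mid \phi_{in}),$$
where for each factorized variational distribution we use the same family of distributions as the original distribution. Using the variational distributions, we bound the log-likelihood of the model as follows:
$$\log p(w, x \mid \alpha_\theta, \alpha_\beta, \pi, \eta) \ge \mathcal{L} = E_q[\log p(\theta, \beta, z, w, x \mid \alpha_\theta, \alpha_\beta, \pi, \eta)] - H[q],$$
where $H[q] = E_q[\log q(\theta, \beta, z)]$ is the negative entropy of $q$.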
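Taking the derivatives of this lower bound with respect to each variational parameter, we obtain the coordinate ascent updates. The updates for the variational Dirichlet parameters $\gamma_i$ and $\lambda_k$ are the same as the standard variational updates for LDA (Blei et al., 2003). In the update for the variational multinomial $\phi_{in}$, the gradients of the expected log probabilities of both the incoming links $x_{i\leftarrow j}$ and the outgoing links $x_{j\leftarrow i}$ contribute to the variational parameter. The first expectation can be rewritten as
$$E_q[\log p(x_{i\leftarrow j} \mid \bar{z}_i, \bar{z}_j, \pi_{i\leftarrow j}, \eta)] = E_q\Big[\log \sum_{a\in A_i} \pi_{i\leftarrow ja}\, \mathcal{N}\big(x_{i\leftarrow j} \mid \bar{z}_i^\top \mathrm{diag}(\eta_a)\, \bar{z}_j,\; c_{i\leftarrow j}^{-1}\big)\Big],$$
where $A_i$ is the set of authors of $i$. We take the lower bound of the expectation using Jensen's inequality:
$$E_q\Big[\log \sum_{a\in A_i} \pi_{i\leftarrow ja}\, \mathcal{N}(\cdot)\Big] \ge \sum_{a\in A_i} \pi_{i\leftarrow ja}\, E_q\big[\log \mathcal{N}\big(x_{i\leftarrow j} \mid \bar{z}_i^\top \mathrm{diag}(\eta_a)\, \bar{z}_j,\; c_{i\leftarrow j}^{-1}\big)\big].$$
The last term is approximated to first order by
$$E_q\big[\log \mathcal{N}\big(x_{i\leftarrow j} \mid \bar{z}_i^\top \mathrm{diag}(\eta_a)\, \bar{z}_j,\; c_{i\leftarrow j}^{-1}\big)\big] \approx \log \mathcal{N}\big(x_{i\leftarrow j} \mid \bar{\phi}_i^\top \mathrm{diag}(\eta_a)\, \bar{\phi}_j,\; c_{i\leftarrow j}^{-1}\big),$$
where diag is a diagonalization operator and $\bar{\phi}_i$ is $\frac{1}{N_i}\sum_n \phi_{in}$. We can compute the gradient with respect to the outgoing directions in the same way.

Algorithm 1: Posterior inference algorithm for the LTAI
  Initialize $\gamma$, $\lambda$, $\pi$, and $\eta$ randomly
  Set learning-rate parameter $\rho_t$ that satisfies the Robbins-Monro condition
  Set subsample sizes $S_V$, $S_E$, $S_S$, and $S_A$
  repeat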

Author Parameters: EM Step
We use the EM algorithm to update the author-related parameters $\pi$ and $\eta$ based on the lower bound computed by variational inference. In the E step, we compute the probability of each author's contribution to the link between documents $i$ and $j$.
In the M step, we optimize the authority parameter $\eta_a$ for each author. Given the other estimated parameters, taking the gradient of $\mathcal{L}$ with respect to $\eta_a$ and setting it to zero leads to the following update equation:
$$\eta_a = \big(\Psi_a^\top C_a \Psi_a + \alpha_\eta I_K\big)^{-1} \Psi_a^\top C_a X_a.$$
Let $D_a$ be the set of documents written by author $a$ and $D_a(i)$ be the $i$th document written by $a$. Then $\Psi_a$ is a vertical stack of $|D_a|$ matrices $\Psi_{D_a(i)}$, whose $j$th row is $\bar{\phi}_{D_a(i)} \circ \bar{\phi}_j$, the Hadamard product between $\bar{\phi}_{D_a(i)}$ and $\bar{\phi}_j$. Similarly, $C_a$ is a vertical stack of $|D_a|$ matrices $C_{D_a(i)}$ whose $j$th diagonal element is $c_{D_a(i)\leftarrow j}$, and $X_a$ is a vertical stack of $|D_a|$ vectors $X_{D_a(i)}$ whose $j$th element is $\pi_{D_a(i)\leftarrow ja} \times x_{D_a(i)\leftarrow j}$. Finally, we update $\pi_{D_a(i)a} = \sum_j \pi_{D_a(i)\leftarrow ja} / D$.
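Read as a regression problem, the M step for one author fits $\eta_a$ to the link indicators using the Hadamard-product rows as features, with per-link precisions as weights and the Gaussian prior acting as a ridge penalty. The sketch below assumes that this takes the usual closed form of precision-weighted ridge regression; `solve_linear` and `update_eta` are hypothetical helper names, and the small pure-Python solver stands in for a linear algebra library.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]

def update_eta(psi_rows, precisions, targets, alpha_eta):
    """Closed-form solution of (Psi^T C Psi + alpha_eta I) eta = Psi^T C X.

    psi_rows:   one K-vector per (cited, citing) pair, phi_bar_i * phi_bar_j elementwise
    precisions: c value of each pair (c_plus or c_minus)
    targets:    pi-weighted link indicator of each pair
    """
    K = len(psi_rows[0])
    A = [[alpha_eta if r == c else 0.0 for c in range(K)] for r in range(K)]
    b = [0.0] * K
    for psi, c_ij, x in zip(psi_rows, precisions, targets):
        for r in range(K):
            b[r] += c_ij * psi[r] * x
            for c in range(K):
                A[r][c] += c_ij * psi[r] * psi[c]
    return solve_linear(A, b)
```

For example, `update_eta([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], [2.0, 3.0], 1.0)` shrinks the unregularized solution `[2.0, 3.0]` to `[1.0, 1.5]`.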

Faster Inference Using Stochastic Optimization
To model topical authority, the LTAI considers the linkage information. If two papers are linked by citation, the topical authority of the cited paper's authors will increase while the negative link buffers the potential noise of irrelevant topics. This algorithmic design of the LTAI results in high model complexity.
To remedy this issue, we adopt the noisy gradient method from the stochastic approximation algorithm (Robbins and Monro, 1951) to subsample negative links for updating the per-document topic variational parameter φ and the authority parameter η. Subsampling negative links to reduce computational complexity was previously introduced by Raftery et al. (2012). We also describe how stochastic variational inference (Hoffman et al., 2013) is applied in our model to update the global per-topic-word variational parameter λ.

Updating φ and η
Updating $\bar{\phi}_i$ for document $i$ in the variational update requires iterating over every other document and computing the gradient of the link probability. This leads to time complexity $O(DK)$ for every $\bar{\phi}_i$. To apply the noisy gradient method, we divide the gradient of the expected log probability of the links into two parts:
$$\sum_j \nabla E_q[\log p(x_{i\leftarrow j})] = \sum_{j:\, x_{i\leftarrow j}=1} \nabla E_q[\log p(x_{i\leftarrow j})] + \sum_{j:\, x_{i\leftarrow j}=0} \nabla E_q[\log p(x_{i\leftarrow j})],$$
where the first and second terms on the right-hand side are the gradient sums over positive links ($x_{i\leftarrow j} = 1$) and negative links ($x_{i\leftarrow j} = 0$), respectively. Compared to positive links, the number of negative links is on the order of the total number of documents, and thus computing the second term is computationally inefficient. However, in our model, we reduce the importance of the negative links by assigning them a larger variance $c_{i\leftarrow j}^{-1}$ than the positive links, and the empirical mean of $\bar{\phi}_j$ for negative links follows the Dirichlet expectation due to the large number of negative links. Therefore, we approximate the expectation of the gradient for the negative links using the noisy gradient as follows:
$$\sum_{j:\, x_{i\leftarrow j}=0} \nabla E_q[\log p(x_{i\leftarrow j})] \approx \frac{D_i^-}{S_V} \sum_{s \in \mathcal{S}_V} \nabla E_q[\log p(x_{i\leftarrow s})],$$
where $D_i^-$ is the number of negative links (i.e., $x_{i\leftarrow j} = 0$) of document $i$, and $S_V$ is the size of the subsample $\mathcal{S}_V$ for the variational update. We randomly sample $S_V$ documents, compute gradients on the sampled documents, and then scale the average gradient to the number of negative links $D_i^-$. This noisy gradient method reduces the time complexity of the update from $O(DK)$ to $O(S_V K)$.
Now, we discuss how to approximate an author's topical authority based on Equation 7. When $K \ll D \times D_a$, the computational bottleneck is $\Psi_a^\top C_a \Psi_a$, which has time complexity $O(D D_a K^2)$. To alleviate this complexity, we once again approximate the large number of negative links using a smaller number of subsamples. Specifically, while keeping the positive link rows $\Psi_a^+$ intact, we approximate the negative link rows in $\Psi_a$ using a smaller matrix $\Psi_a^-$ that has $S_E$ rows, where $S_E$ is the size of the subsample for the EM step.
Using this approximation, we can represent $\Psi_a^\top C_a \Psi_a$ as
$$\Psi_a^\top C_a \Psi_a \approx c_+\, \Psi_a^{+\top} \Psi_a^{+} + c_-\, \frac{D_a^-}{S_E}\, \Psi_a^{-\top} \Psi_a^{-}$$
with time complexity $O(S_E K^2)$, where $D_a^-$ is the number of rows with negative links in $\Psi_a$. Although we do not provide a rigorous analysis of the performance of our model as a function of the subsample size, we confirm that a negative link sample size greater than 100 does not degrade the model performance in any of our experiments.
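The subsampling idea for negative links can be sketched as a generic estimator: positive links are summed exactly, while the sum over negative links is replaced by a scaled sum over a random subsample. The names `noisy_link_gradient` and `grad_fn` are hypothetical, and `grad_fn(j)` stands in for whatever per-link gradient the model supplies.

```python
import random

def noisy_link_gradient(grad_fn, n_topics, positives, negatives, sample_size, rng):
    """Estimate the sum of grad_fn(j) over all positive and negative links.

    positives, negatives: lists of document ids
    grad_fn(j): returns a list of n_topics floats, the gradient contribution of document j
    """
    total = [0.0] * n_topics
    for j in positives:                       # positive links: exact sum
        for k, g in enumerate(grad_fn(j)):
            total[k] += g
    if negatives:
        s = min(sample_size, len(negatives))
        sample = rng.sample(negatives, s)     # uniform subsample without replacement
        scale = len(negatives) / s            # rescale to the full number of negative links
        for j in sample:
            for k, g in enumerate(grad_fn(j)):
                total[k] += scale * g
    return total
```

The estimate is unbiased for the full sum, and its cost depends on the subsample size rather than on the number of documents in the corpus.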

Updating λ
In traditional coordinate-ascent-based variational inference, the global variational parameter $\lambda$ is updated infrequently because all the local parameters $\phi$ need to be updated beforehand. This problem is more noticeable in the LTAI since updating $\phi$ using equation 3 is slower than updating $\phi$ in vanilla LDA; moreover, the per-author topical authority variable $\eta$ is another local variable that the algorithm needs to update beforehand. However, using stochastic variational inference, the global parameters are updated after a small portion of the local parameters are updated (Hoffman et al., 2013). Applying stochastic variational inference to the LTAI is straightforward once we calculate the intermediate topic-word variational parameter
$$\hat{\lambda}_k = \alpha_\beta + \frac{D}{S_S} \sum_{d=1}^{S_S} \sum_{n=1}^{N_d} \phi_{dn}^{k}\, w_{dn}$$
from the noisy estimate of the natural gradient with respect to the subsampled local parameters, where $N_d$ is the number of words in document $d$, and $S_S$ is the subsample size for the minibatch stochastic variational inference. The final global parameter for the $t$th iteration is updated by $\lambda^{(t)} = (1 - \rho_t)\lambda^{(t-1)} + \rho_t \hat{\lambda}$, where $\rho_t$ is the learning rate. Posterior inference is guaranteed to converge to a local optimum when the learning rate satisfies the conditions $\sum_{t=1}^{\infty} \rho_t = \infty$ and $\sum_{t=1}^{\infty} \rho_t^2 < \infty$ (Robbins and Monro, 1951). In Figure 3, we confirm that stochastic variational inference is applicable to the LTAI and reduces the training time compared to its batch counterpart, while maintaining similar performance.
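One stochastic update of the topic-word parameter can be sketched as follows. The function names are hypothetical, and the learning-rate schedule $\rho_t = (\tau + t)^{-\kappa}$ with $\kappa \in (0.5, 1]$ is an assumption: it is a common choice that satisfies the Robbins-Monro conditions, not necessarily the schedule used in the experiments.

```python
def learning_rate(t, tau=1.0, kappa=0.7):
    """rho_t = (tau + t)^(-kappa); kappa in (0.5, 1] satisfies the Robbins-Monro conditions."""
    return (tau + t) ** (-kappa)

def intermediate_lambda(minibatch, phi, n_docs, n_topics, vocab_size, alpha_beta):
    """Noisy estimate of lambda from a minibatch, rescaled as if it were the whole corpus.

    minibatch: list of documents, each a list of word ids
    phi:       phi[d][n][k], the variational topic probabilities of word n in document d
    """
    scale = n_docs / len(minibatch)
    lam_hat = [[alpha_beta] * vocab_size for _ in range(n_topics)]
    for words, doc_phi in zip(minibatch, phi):
        for w, phi_n in zip(words, doc_phi):
            for k in range(n_topics):
                lam_hat[k][w] += scale * phi_n[k]
    return lam_hat

def svi_step(lam, lam_hat, rho):
    """Blend the previous global parameter with the noisy estimate."""
    return [[(1.0 - rho) * old + rho * new for old, new in zip(row_old, row_new)]
            for row_old, row_new in zip(lam, lam_hat)]
```

A full pass would repeat `intermediate_lambda` and `svi_step` on fresh minibatches while `learning_rate(t)` decays with the iteration counter `t`.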

Experimental Settings
In this section, we introduce the four academic corpora used to fit the LTAI, describe the comparison models, and provide information about the evaluation metric and parameter settings for the LTAI.

Datasets
We experiment with four academic corpora: CORA (McCallum et al., 2000), Arxiv-Physics (Gehrke et al., 2003), the Proceedings of the National Academy of Sciences (PNAS), and Citeseer (Lu and Getoor, 2003). The CORA, Arxiv-Physics, and PNAS datasets contain abstracts only, and the locations of the citations within each paper are not preserved, whereas the Citeseer dataset contains the citation locations. For CORA, Arxiv-Physics, and PNAS, we lemmatize words, remove stop words, and discard words that occur fewer than four times in the corpus. Table 1 describes the datasets in detail. Note that we obtain citation data from the entire document, not only from the abstract. Also, we consider within-corpus citations only, which leads to fewer than 13 citations per document on average for all corpora.
Figure 4: Word-level prediction result. We measured per-word log predictive probability on four datasets. As shown in the graphs, our model performs better than LDA.

Comparison Models
We compare the predictive performance of the LTAI with five other models. The comparison models differ in their expressive power, and each model addresses a certain type of prediction task: while RTM, ALTM, and DACTM predict citation structures, the topical h-index predicts authorship information. The baseline topic models are implemented based on the inference methods suggested in the corresponding papers; LDA, RTM, and the LTAI variants use variational inference, while ALTM and DACTM use collapsed Gibbs sampling. Finally, all implementation conditions, such as the choice of programming language and modules, are identical except for the parts that convey each model's unique assumptions; thus, the performance differences between models are due to their model assumptions and different degrees of data usage, rather than implementation technicalities.
Latent Dirichlet Allocation: LDA (Blei et al., 2003) discovers topics and represents each publication by mixture of the topics. Compared to other models, LDA only uses the content information.
LTAI-n%: In LTAI-n%, we remove n% of the actual citations and replace them with arbitrarily selected false connections. Note that the links are replaced rather than simply removed; if the citation links were just removed, the LTAI and LTAI-n% could not be fairly compared, as the density of the citation structure would be affected and each model would need different concentration values. The performance difference between the LTAI and this variant indicates that, under identical conditions, using the correct linkage information is indeed beneficial for prediction.
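The corruption procedure can be sketched as below, under the assumption that false connections are drawn uniformly from document pairs that are not cited in the original data; the function name and the `(cited, citing)` tuple representation are illustrative.

```python
import random

def corrupt_links(links, n_docs, fraction, rng):
    """Replace a fraction of true (cited, citing) pairs with random false pairs.

    The number of links is kept fixed so that the density of the citation
    structure is unchanged. Assumes the graph is sparse enough that unused
    pairs are easy to find by rejection sampling.
    """
    links = set(links)
    n_replace = round(fraction * len(links))
    removed = set(rng.sample(sorted(links), n_replace))
    added = set()
    while len(added) < n_replace:
        i, j = rng.randrange(n_docs), rng.randrange(n_docs)
        if i != j and (i, j) not in links and (i, j) not in added:
            added.add((i, j))
    return (links - removed) | added
```

For instance, with ten true links and `fraction=0.3`, the result still has ten links, seven of which are true citations.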
LTAI-C: In LTAI-C, the precision parameter $c_{i\leftarrow j}$ has a constant value, rather than being assigned different values according to $x_{i\leftarrow j}$ as discussed in section 3.
LTAI-SEP: LTAI-SEP has the same structure as the LTAI, but the topic and the authority variables are learned separately. Once the topic variables are learned using vanilla LDA, the authority and citation variables are then inferred. Thus, the performance edge of the LTAI over LTAI-SEP highlights the necessity of the LTAI's joint modeling, in which the topic- and authority-related variables reshape one another in an iterative fashion.
Relational Topic Model: RTM (Chang and Blei, 2010) jointly models content and citation, and thus, topic proportions of a pair of publications become similar if the pair is connected by citations. Compared to the LTAI, the author information is not considered, the link structure does not have directionality and the model does not consider negative links.
Author-Link Topic Model: ALTM (Kataria et al., 2011) is a variation of author topic model (ATM) (Rosen-Zvi et al., 2004) that models both topical interests and influence of authors in scientific corpora. The model uses content information of citing papers and names of the cited authors as word tokens. ALTM outputs per-topic author distribution that functions as author influence indices.
Dynamic Author-Citation Topic Model: DACTM (Kataria et al., 2011) is an extension of ALTM that requires publication corpora which preserve sentence structures. To model author influence, DACTM selectively uses words that are close to the point where the citation appears. In our corpora, only the Citeseer dataset preserves the sentence structure.
Figure 5: Citation prediction results. The task is to find out which paper is originally linked to a cited paper. We measure mean reciprocal rank (MRR) to evaluate model performance. For all cases, the LTAI performs better than the other methods.
Topical h-index: To compute the topical h-index, we separate the papers into several clusters using LDA and calculate the h-index within each cluster. The topical h-index is used for author prediction in the same manner as for our model, except that the topic proportions are replaced with LDA's result and η is replaced with the topical h-index values.
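A minimal sketch of the topical h-index baseline follows. It assumes each paper has already been assigned to a single cluster (for example, its most probable LDA topic) and that every author of a paper receives credit for it; both choices, like the function names, are assumptions made for the example.

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def topical_h_index(papers):
    """papers: iterable of (author, topic_cluster, citation_count) triples."""
    by_key = {}
    for author, cluster, count in papers:
        by_key.setdefault((author, cluster), []).append(count)
    return {key: h_index(counts) for key, counts in by_key.items()}
```

An author with three papers cited three times each in one cluster and a single paper cited once in another cluster gets a topical h-index of 3 and 1 for the two clusters, respectively.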

Evaluation Metric and Parameter Settings
We use mean reciprocal rank (MRR) (Voorhees, 1999) to measure the predictive performance of the LTAI and the comparison models. MRR is a widely used metric for evaluating link prediction tasks (Balog and De Rijke, 2007; Diehl et al., 2007; Radlinski et al., 2008; Huang et al., 2015). When the models output the ranks of the correct answers, MRR is the inverse of the harmonic mean of such ranks.
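For concreteness, MRR can be computed from per-query scores as in the sketch below. The helper `rank_of` and its optimistic handling of ties are assumptions of the example rather than details given in the text.

```python
def rank_of(scores, target):
    """1-based rank of target when candidates are sorted by descending score.

    Ties are resolved optimistically: only strictly better candidates count.
    """
    return 1 + sum(1 for cand, s in scores.items()
                   if cand != target and s > scores[target])

def mean_reciprocal_rank(ranks):
    """Average of 1 / rank over all test queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Three queries whose correct answers are ranked 1st, 2nd, and 4th.
mrr = mean_reciprocal_rank([1, 2, 4])
```

Here `mrr` equals (1 + 1/2 + 1/4) / 3, which is the reciprocal of the harmonic mean of the ranks 1, 2, and 4.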
We report the parameter values used for the evaluations. For all datasets, we set $c_-$ to 1. To predict citations, we set $c_+$ to 10,000, 100, 1,000, and 10, and to predict authorship, we set $c_+$ to 1,000, 1,000, 10,000, and 1,000 for the CORA, Arxiv-Physics, PNAS, and Citeseer datasets, respectively. These values are obtained through exhaustive parameter analysis. We set $\alpha_\theta$ to 1 and $\alpha_\beta$ to 0.1. We fix the subsample sizes to 500. For fair comparison, all the parameters that the LTAI and the baseline models share are set to the same values, and the parameters that belong only to the baseline models are exhaustively tuned, as done for the LTAI. Finally, we note that all parameters are tuned using the training set, and the test set is used only for testing.

Evaluation
We evaluate the LTAI with three quantitative tasks and one qualitative analysis. In the first task, we check whether using citation and authorship information in the LTAI helps increase the word-level predictive performance. In the second and third tasks, we measure how well the LTAI predicts missing publication-publication links and author-publication links; with these two tasks, we compare the predictive power of the LTAI with that of the comparison models, using MRR as the evaluation metric. Finally, we observe famous researchers' topical authority scores generated by the LTAI and investigate how these scores capture notable academic characteristics of the researchers.

Word-level Prediction
In the LTAI, citation and authorship information affects the per-document topic proportions, as can be confirmed in equation 3. This joint modeling of content and linkage structure, compared to vanilla LDA that uses content data only, yields better performance in predicting missing words in documents. In this task, we use log-predictive probability, a metric that is widely used in prior research for measuring model fit (Teh et al., 2006; Asuncion et al.; Hoffman et al., 2013). For each corpus, we hold out one third of the documents as a test set, and for every document in the test set, we use half of the words to fit the per-document topic proportion θ and predict the probability of occurrence of the remaining half. Specifically, we compute the predictive probability of a word $w_{new}$ in a test document with respect to the observed words $w_{obs}$ and the training documents $D_{train}$. Figure 4 illustrates the per-word log-predictive probability in each corpus. We confirm that with the LTAI, the log predictive probability converges to a higher value than with LDA. Also, when we corrupt the link structure from 10% to 30%, the predictive performance of the LTAI gradually decreases. Thus, the LTAI's superior predictive performance is attributed to its use of correct citations rather than to algorithmic bias.
Figure 6: Author prediction results. The task is to find out who the author of a cited paper is, given all the citing papers. For all cases, the LTAI performs better than the other methods.
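The per-word log predictive probability can be estimated as sketched below. The sketch assumes the common plug-in estimator in which the probability of a held-out word is the mixture of the expected topic-word distributions weighted by the expected topic proportions of the document; the function name and argument layout are illustrative.

```python
import math

def per_word_log_predictive(held_out_words, theta_mean, beta_mean):
    """Average log probability of held-out word ids under a fitted document.

    theta_mean: expected topic proportions of the document (length K),
                fitted on the observed half of its words
    beta_mean:  expected topic-word distributions (K lists of length V)
    """
    total = 0.0
    for w in held_out_words:
        prob = sum(t * topic[w] for t, topic in zip(theta_mean, beta_mean))
        total += math.log(prob)
    return total / len(held_out_words)

# Two equally weighted topics that each put all their mass on one word.
score = per_word_log_predictive([0, 1], [0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy case every held-out word has probability 0.5, so `score` equals log 0.5; higher (less negative) values indicate a better fit.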

Citation Prediction
We evaluate the model's ability to predict which publication originally cites a given publication. Specifically, we randomly remove one citation from each of the documents in the test set. To predict the citation link between publications, we first compute the probability that publication $j$ cites $i$ from $p(x_{i\leftarrow j} \mid z, A_i, \pi_i) \propto \sum_{a\in A_i} \pi_{i\leftarrow ja}\, \mathcal{N}\big(x_{i\leftarrow j} \mid \bar{z}_i^\top \mathrm{diag}(\eta_a)\, \bar{z}_j,\; c_+^{-1}\big)$. Given the topic proportion of the cited publication $\theta_i$ and the topical authorities of the authors $\eta_a$, we compute which publication is more likely to cite the publication. Based on our model assumption in subsection 3.2, using topical authority increases the performance of predicting linkage structure.
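Ranking candidate citing papers by this mixture can be sketched as follows. Because $c_+$ can be as large as 10,000, the Gaussian densities may underflow in floating point, so the sketch works with log densities and a log-sum-exp over authors; that numerical choice and all names are assumptions of the example.

```python
import math

def log_normal(x, mean, precision):
    """Log density of a normal distribution with the given mean and precision."""
    return 0.5 * math.log(precision / (2.0 * math.pi)) - 0.5 * precision * (x - mean) ** 2

def log_citation_score(zbar_i, zbar_j, author_etas, pi, c_pos):
    """Log of sum_a pi_a * N(1 | zbar_i^T diag(eta_a) zbar_j, 1 / c_pos)."""
    terms = []
    for weight, eta in zip(pi, author_etas):
        if weight > 0.0:
            mean = sum(e * a * b for e, a, b in zip(eta, zbar_i, zbar_j))
            terms.append(math.log(weight) + log_normal(1.0, mean, c_pos))
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def rank_citing_candidates(zbar_i, candidates, author_etas, pi, c_pos):
    """candidates maps a paper id to its zbar_j; returns ids, most likely citer first."""
    scores = {j: log_citation_score(zbar_i, zbar_j, author_etas, pi, c_pos)
              for j, zbar_j in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

The position of the held-out citing paper in the returned list is the rank that enters the MRR computation.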
In Figure 5, the LTAI yields better citation prediction performance than the other models for all datasets and for most numbers of topics. Since the LTAI incorporates topical authority for predicting citations, it performs better than RTM, which does not discover topical authority. We attribute the better performance of the LTAI compared to ALTM and DACTM to the LTAI's multiple model assumptions explained in section 3. We note that DACTM requires additional information such as citation location and sentence structure, and is thus applicable only to limited kinds of datasets.

Author Prediction
For author prediction, we randomly remove one of the authors from each document in the test set while preserving the citation structure. Similar to citation prediction, we predict which author is more likely to have written the cited publication based on the topic proportions of the cited publication $i$ and the set of citing publications $J$. We approximate the probability of researcher $a$ being an author of publication $i$ from $p(a \mid z, \eta_a, x_{i\leftarrow j}) \propto \prod_{j\in J} \mathcal{N}\big(x_{i\leftarrow j} \mid \bar{z}_i^\top \mathrm{diag}(\eta_a)\, \bar{z}_j,\; c_+^{-1}\big)$. Because the mixture proportion of an unknown author $\pi_{i\leftarrow ja}$ cannot be obtained during posterior inference, we assume the cited publication is written by a single author to approximate the probability. For author prediction, we choose the author that maximizes the above probability. In Figure 6, the LTAI outperforms the comparison models in most of the settings.
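Under the single-author simplification, choosing the most likely author can be sketched as below. The sketch assumes the citing papers contribute independently, so that their Gaussian log densities add; the function names and the dictionary of candidate authority vectors are illustrative.

```python
import math

def log_normal(x, mean, precision):
    """Log density of a normal distribution with the given mean and precision."""
    return 0.5 * math.log(precision / (2.0 * math.pi)) - 0.5 * precision * (x - mean) ** 2

def author_log_score(eta_a, zbar_i, citing_zbars, c_pos):
    """Sum over citing papers of log N(1 | zbar_i^T diag(eta_a) zbar_j, 1 / c_pos)."""
    total = 0.0
    for zbar_j in citing_zbars:
        mean = sum(e * a * b for e, a, b in zip(eta_a, zbar_i, zbar_j))
        total += log_normal(1.0, mean, c_pos)
    return total

def predict_author(zbar_i, citing_zbars, eta_by_author, c_pos):
    """Return the candidate author whose authority vector best explains the citations."""
    return max(eta_by_author,
               key=lambda a: author_log_score(eta_by_author[a], zbar_i, citing_zbars, c_pos))
```

Sorting the candidates by `author_log_score` instead of taking the maximum gives the ranking needed for MRR.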

Qualitative Analysis
To highlight characteristics of our model that are not captured by the quantitative analysis, we look at the assigned topical authority indices as well as other statistics of some researchers in the dataset. In the analyses, we set the number of topics to 100 and use the CORA dataset for demonstration. We first demonstrate the authoritative topics of famous authors that our model can unveil. In Table 2, we list the top 10 authors with the highest h-indices along with their number of citations, number of papers, and representative topics. An author's representative topics are the topics with the highest authority scores. In the table, we observe that all authors with top h-indices have written at least 18 papers and earned at least 207 citations, which are the top 0.8% and 0.2% values, respectively. However, the authoritative topics retrieved by the LTAI do not overlap for any of the authors. This table illustrates that each of the top authors exerts authority on different academic topics, which the LTAI can capture, even though these authors share the highest h-index scores and similarly high values of the other statistics.
We now highlight attributes of the topical authority index that distinguish it from topic-irrelevant statistics. In Tables 3 to 5, we show four example topics extracted by our model and list notable authors within each topic with their topical authority indices, h-indices, number of citations, and number of papers. In the tables, we first find that all four authors with the highest topical authority values, Monica Lam, Alex Pentland, Michael Jordan, and Mihir Bellare, are also listed in the topic-irrelevant authority rankings in Table 2. From this, we confirm that the authority score of the LTAI has a certain degree of correlation with other statistics, while it separates the authors by their authoritative topics.
At the same time, the topical authority score correlates less with topic-irrelevant statistics than those statistics correlate with one another. In Table 5, Oded Goldreich has a lower topical authority score for the computer security topic while having higher topic-irrelevant scores than the above four researchers, because his main research field is the theory of computation and randomness. Also, we can spot authors who exert high authority on multiple academic fields, such as Tomaso Poggio in Table 3 and in Table 4. Similarly, when comparing Federico Girosi and Tomaso Poggio in Table 4, the two researchers have similar authority indices for this topic, while Tomaso Poggio has higher values for the other three topic-irrelevant indices. This is a reasonable outcome when we investigate the two researchers' publication histories. Federico Girosi has a relatively focused academic interest, with his publication history skewed towards machine-learning-related subjects, while Tomaso Poggio has broader topical interests that include computer vision and statistical learning, and has also co-authored most of the papers that Federico Girosi wrote. Thus, Federico Girosi has a similar authority index for this topic but lower authority indices for other topics than Tomaso Poggio. Also, our model is able to capture topic-specific authoritative researchers that have relatively low topic-irrelevant scores. For example, researchers such as Stan Sclaroff and Kentaro Toyama are among the top 5 authoritative researchers in the computer vision topic according to the LTAI, but it is difficult to single out these researchers from many other authoritative authors using the topic-irrelevant scores.
Finally, the LTAI detects researchers' topical authority that is peripheral but not negligible. Mark Jones in Table 4, who has a high h-index, a high number of citations, and many papers, is a researcher whose academic interest lies in programming language design and application. However, while the main topics of most of his papers concern programming languages, he often uses inference techniques and algorithms from machine learning in his papers. Our model captures that tendency and assigns him some authority score for machine learning.

Conclusion and Discussion
We proposed Latent Topical-Authority Indexing (LTAI) to model the topical authority of academic researchers. Based on the hypothesis that authors play an important role in citation, we focus specifically on their authority and develop a Bayesian model to capture it. Along with the model assumptions that are necessary for extracting convincing and interpretable topical authority values for authors, we have proposed speed-up methods that are based on stochastic optimization. While prior research in topic modeling provides topic-specific indices when modeling the link structure, these do not extend to individual indices, and most previous citation-based indices are defined for each individual but without considering topics. In contrast, our model combines the merits of both topic-specific and individual-specific indices to provide topical authority information for academic researchers.
With four academic datasets, we demonstrated that the joint modeling of publication-related and author-related variables improves topic quality compared to vanilla LDA. We also showed quantitatively that including authority variables increases the predictive performance for both citation and author prediction. Lastly, we demonstrated qualitatively the interpretability of the topical authority outcomes of the LTAI on the CORA corpus.
Finally, there are issues that can be dealt with in future work. Our model does not consider time information, namely when papers are published and when pairs of papers are linked; datasets that incorporate timestamps could be used to enhance the capability of the model to predict future citations and authorships.