Comparing institutional-level bibliometric research performance indicator values based on different affiliation disambiguation systems

The present study evaluates three frequently used institution name disambiguation systems. The Web of Science normalized institution names and Organization Enhanced system, as well as the Scopus Affiliation ID system, are tested against a complete, independent institution disambiguation system for a sample of German public sector research organizations. The independent system is used as the gold standard in the evaluations that we perform. We study the coverage of the disambiguation systems and, in particular, the differences in a number of commonly used bibliometric indicators. The key finding is that, for the sample institutions, the studied systems provide bibliometric indicator values of only limited accuracy. Our conclusion is that for any use with policy implications, additional data cleaning for disambiguating affiliation data is recommended.


INTRODUCTION
Scientometric studies at the level of research institutions face the challenge of the correct attribution of publications to institutions. This task, here referred to as institution name disambiguation, comprises systematically standardizing the heterogeneous address data of the author-provided affiliation information present in publications and recorded in bibliographic databases. At present, institutional affiliation information in academic publications is not standardized, and unique identifiers for research institutions have not yet been adopted. Therefore, in order to generate valid primary data on publications for studies at the meso level, the assignment of address strings to known real institutional entities is crucial. Institution name disambiguation belongs to a class of problems known as named entity normalization, in which variant forms have to be matched to the correct preferred form. Another prominent member of this class is author name disambiguation. Disambiguated affiliation information can contribute to the performance of author name disambiguation systems that employ affiliations as background information. (Likewise, disambiguated author information could potentially be used as additional input information for institutional disambiguation; however, we are not aware of any literature on this approach.) In the recent past, a nearly complete institutional disambiguation for German research institutions was developed and implemented at the Institute for Interdisciplinary Studies of Science at Bielefeld University, as a major component of a national bibliometric data infrastructure for research and monitoring (Rimmert, Schwechheimer, & Winterhager, 2017).
The system has been tested and improved over a number of years and is now in production use. We are therefore in a position to study the degree to which the use of a sophisticated disambiguation system with near-complete national-scale coverage leads to different bibliometric indicator values compared to a situation in which no such system is available and simpler alternatives to the attribution problem have to be used. We consider here (a) the case where a simple unification strategy using ad hoc lexical searches in the address data fields of a bibliographic database is conducted in order to collect publications of the target institutions (based on vendor-preprocessed affiliation data in Web of Science [WoS]); and (b) the use of bibliographic database vendors' own institution disambiguation systems (in both WoS and Scopus). We believe that these two situations are common in practice outside of specialized research or evaluation units with access to the raw data of bibliographic databases. The performance and implications of these approaches are therefore relevant and of wide interest to the bibliometrics and research evaluation communities. Prominent examples, with profound science-policy consequences, of the use of bibliometric data of institutions derived from WoS or Scopus are the international university rankings, which generate much public attention and elicit considerable debate.
The remainder of the article is structured as follows. We begin by providing an overview of the prior work on institutional disambiguation. Next, we briefly outline the institution name disambiguation systems that we study and describe the publication data and institution samples that we use, as well as the comparison scenarios, the bibliometric indicators that are calculated for the institutions, and our approach to assessing the differences in indicator values. In the next section we present the results of our comparisons. In particular, we assess the distributions of errors in indicator values over institutions arising when applying alternative disambiguation systems, in contrast to the reference values obtained from the presented disambiguation system for German research institutions, which can be assumed to be complete and nearly error free for the data. The results and their implications are summarized in the Discussion section.

RELATED WORK
Unification of author affiliation information and its allocation to clearly identified research institutions has been recognized as a challenging task in the bibliometric research community and beyond. Accurate disambiguation of heterogeneous affiliation data is crucial for institution-level scientometric research and bibliometric evaluation. Disambiguation systems connecting heterogeneous author affiliations to known research institutions have been constructed in several projects, usually for project-specific purposes and not intended to be made publicly available. They may be roughly divided into rule-based and machine learning approaches. However, this division is not a strict one, as approaches often combine methods (e.g., rules and some manual work are used in addition to a machine learning approach to improve precision, especially for problematic cases).

Rule-based Approaches
A substantial amount of work on this topic has been done at the Centre for Science and Technology Studies (CWTS) at Leiden University. For the case of universities, this began with De Bruin and Moed (1990). They performed a unification of about 85,000 affiliations from 75 journals (data from SCISEARCH) using the first part of the addresses. Using structural information and reference books, they assigned units on lower hierarchical levels (e.g., departments), appearing in the first part of addresses, to the corresponding main organization. They found that many problems remained, and to solve these they used external information from encyclopedias, university handbooks, specialists, and staff lists of universities. This is a time-consuming method, and they only did this for selected countries (in particular the Netherlands). In a follow-up study, Moed, De Bruin, and Van Leeuwen (1995) reported on a bibliometric database constructed from all articles published by authors from the Netherlands, using data from the Science Citation Index. To store unified affiliations, they improved their earlier procedure for Dutch addresses by, among other things, adding a classification of institutions into research sectors: that is, types of organizations such as universities, hospitals, and firms. They noted problematic affiliations that could not be handled correctly by their procedure. CWTS has continued to maintain and improve its disambiguation system, in particular for its university ranking, for which all name variants that occur at least five times in the WoS are cleaned (Waltman et al., 2012). This system pays special attention to the way publications by academic hospitals are assigned to universities (Reyes-Elizondo, Calero-Medina, Visser, & Waltman, 2016).
The Swedish Research Council performed affiliation disambiguation for its bibliometric database, which was also constructed from WoS data (Kronman, Gunnarsson, & Karlsson, 2010; Swedish Research Council, 2017). They used a deterministic approach based on a catalog of string rules, mapping address strings to 600 known Swedish research organizations. Organizations were also classified by research sector. Their procedure was able to assign over 99% of Swedish address strings. A single address may be matched to more than one organization in the case of affiliations containing information on more than one organization, usually indicating collaborations.

Machine Learning Approaches
French, Powell, and Schulman (2000) described a number of institutional disambiguation experiments with different address string distance metrics and a one-pass heuristic clustering procedure. The clearly stated goal was not a complete automatic disambiguation, but rather a reduction of the manual reviewing of the most difficult cases. Among other things, they introduced a new, domain-specific affiliation comparison function, based on normalized and sorted words, minimizing edit distances between aligned words across possible permutations. Jonnalagadda and Topham (2010) reported on their disambiguation of institution names extracted from PubMed data. The presented approach utilized agglomerative clustering, for which the entity similarity is computed with an edit distance, building on the work of French et al. (2000). In particular, their approach was a hybrid of a sequence alignment measure over word sequences (Smith-Waterman algorithm) and the Levenshtein distance between single words. Furthermore, similar clusters were merged. They reported sample precision values of 99.5% (4,135 affiliation strings related to "Antiangiogenesis," only US organizations) and 97.9% (1,000 affiliation strings related to "Diabetes," organizations from any country) for organization normalization. Although these values are high, it is not possible to extrapolate them to less restricted data sets. Galvez and Moya-Anegón (2006, 2007) reported on a new approach using finite-state graphs, developed with WoS data and also tested on data from Inspec, Medline, and CAB Abstracts. Although this is a promising approach, the authors outlined the limits of automatic classification for problematic affiliations, which require expert knowledge to classify. Jiang, Zheng, Wang, Lu, and Wu (2011) discussed an experimental approach of agglomerative clustering of affiliations using string compression distance. Their evaluation of the method is questionable, as they use the publication affiliations of mostly students and staff from a single university. Their affiliation string pool is, therefore, dominated by name variants of that university, although the remainder are affiliations of coauthors. They extracted a reference corpus of 217 "affiliations" (variants) in 105 "categories" (true organizations). In any case, their clustering quality results are not encouraging. This also holds true for the application of supervised and semisupervised machine learning methods, tested by Cuxac, Lamirel, and Bonvallot (2013) on French CNRS addresses. Huang, Yang, Yan, and Rousseau (2014) proposed an algorithm using author information to classify affiliations that achieved high precision values but low recall.
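To make the word-sorted comparison idea of French et al. (2000) concrete, the following sketch groups affiliation variants with a one-pass clustering over a normalized string similarity. This is illustrative only: the function names, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.8 threshold are our own choices, not those of the cited studies.

```python
from difflib import SequenceMatcher

def normalize(affil: str) -> str:
    # Lowercase, strip punctuation, and sort the words, so that word
    # order does not affect the comparison (the word-sorting idea).
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in affil.lower())
    return " ".join(sorted(cleaned.split()))

def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1] between the normalized forms.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def cluster(affiliations, threshold=0.8):
    # One-pass heuristic clustering: assign each string to the first
    # cluster whose representative is similar enough, else open a new one.
    clusters = []  # list of (representative, members)
    for affil in affiliations:
        for rep, members in clusters:
            if similarity(affil, rep) >= threshold:
                members.append(affil)
                break
        else:
            clusters.append((affil, [affil]))
    return [members for _, members in clusters]

variants = [
    "Univ Bielefeld, Dept Sociol, Bielefeld, Germany",
    "Bielefeld Univ, Bielefeld, Germany",
    "Univ Leiden, CWTS, Leiden, Netherlands",
]
groups = cluster(variants)  # two Bielefeld variants grouped, Leiden separate
```

As with the approaches surveyed above, such a threshold-based procedure trades precision against recall, and difficult cases would still require manual review.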
We can conclude that the problem of institution name disambiguation is far from being solved. For the objective of achieving highly accurate disambiguation, it seems that simple methods have not yet been replaced by fully automatic methods, despite the experimental application of several sophisticated approaches with partly promising results on small scales. However, significant progress has been made on affiliation string similarity calculation methods. Both rule-based and machine learning methods can be used to minimize the necessary amount of manual human decisions. Nevertheless, the necessarily higher amount of labor required by rule-based methods means that they have only been applied to parts of all author affiliations, typically to those from one country or discipline. No standard evaluation data set is available for this task. Furthermore, none of the studies have investigated the effects of institutional disambiguation on the quality of bibliometric indicator scores.

Institution Disambiguation Systems
In this section we summarize the disambiguation system that was developed for German institutions. For a full description of the system we refer readers to Rimmert et al. (2017). The system, which we call the KB system, comprises (a) a set of known and uniquely identified German research institutions, (b) a mapping of institutions to affiliation records identified as belonging to each institution in the two data sources WoS and Scopus, (c) a hierarchical classification of the institutions into sectors, and (d) a change history of the institutions, which records the splitting, merging, and incorporation of institutions as well as sector changes. The KB system is thus built on the affiliation data provided in WoS and Scopus, respectively, and belongs to the category of rule-based systems. The tracking of structural changes affords the necessary flexibility in handling such changes required by different project contexts. In the KB system, two different analytical views are implemented (item d above). With Mode S (for synchronic allocation), we can perform analyses that take into account the institutional structures as they were at the time of publication for each paper. Institutions that have later come to be related to another institution through structural changes, such as mergers or splits, are treated as different entities. On the other hand, with Mode A (asynchronic, current perspective), we can analyze the publication records of institutions as they are constituted at present; that is, including publications of predecessor units. The mapping of institutions to affiliation records (item b above) is a deterministic, rule-based classification. The core of the institutional coding procedure is a mapping of author addresses to the corresponding uniquely identified research institutions and their subdivisions, using a large library of regular expressions. This library currently contains some 45,000 expressions and is continuously being expanded and improved.
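The principle of such a regular-expression mapping can be illustrated with a minimal sketch. The patterns and institution IDs below are hypothetical examples, not entries from the actual 45,000-expression KB library:

```python
import re

# Hypothetical miniature analogue of the KB system's expression library:
# each pattern maps matching address strings to one institution ID.
RULES = [
    (re.compile(r"\buniv\w*\s+bielefeld\b|\bbielefeld\s+univ\w*\b", re.I), "DE-0001"),
    (re.compile(r"\bmax[\s-]planck[\s-]inst\w*\b", re.I), "DE-0002"),
]

def code_address(address):
    # An address may mention several organizations (e.g., collaborations),
    # so all matching institution IDs are returned.
    return {inst_id for pattern, inst_id in RULES if pattern.search(address)}

code_address("Univ Bielefeld, Fac Phys, D-33501 Bielefeld, Germany")
```

Note that the patterns must cover abbreviation and word-order variants explicitly, which is why such rule libraries grow large and require continuous curation.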
The sector classification (item c above) contains the classes of higher education sectors (universities and universities of applied sciences), four major nonuniversity research organizations (Fraunhofer-Gesellschaft [FHG], Helmholtz Association [HGF], Leibniz Association [WGL], and the Max Planck Society [MPG]), private companies, registered associations, government laboratories, and academies of science.For the sector information, structural changes over time and multiple assignments of research institutions to these sectors are also available.
The version of the KB system used for this study contained 2,097 institutions, which also included placeholder records for unidentified institutions for which only the sector could be determined.An evaluation of the KB disambiguation system was conducted prior to the main study.We provide a detailed overview of the system evaluation in Appendix A for German research institutions.We conclude that, based on the good results of this evaluation, the KB system is a valid, gold standard benchmark for German institutional affiliation disambiguation data.This is not to say, however, that the KB system or its rule-based approach are superior in general.In fact, its scope is limited to a single country and it would be difficult to extend the method to global scope because of the large effort and unreasonable expense required.
We deliberately do not attempt to describe the workings of the proprietary institution disambiguation systems of WoS and Scopus and regard them as black boxes, of which we only analyze the results. The reason for this is that neither system is documented in any detail by its provider. What we can gather from the information on the platforms is that WoS Organizations Enhanced (OE) is based on lists of variant names mapped to preferred names. WoS OE can therefore be seen as a rule-based system. Regarding the Scopus Affiliation Identifiers (AFIDs), the documentation merely informs us that "the Affiliation Identifier distinguishes between affiliations by assigning each affiliation in Scopus a unique number, and grouping together all of the documents affiliated with an organization." No information is given about how the system works.

Data
The data used in the analyses are derived from the licensed sources of WoS and Scopus, obtained in spring 2017. The data were loaded into in-house relational databases, cleaned, and enhanced at the Competence Centre for Bibliometrics for Germany. The most important enhancement is the disambiguation of German author addresses to known German research institutions. This process is conducted separately for each data source using the KB disambiguation system described in the previous subsection. The units of analysis for this study are German academic institutions, in particular universities, universities of applied sciences, and nonuniversity research institutes. Publications are restricted to articles and reviews published between 1995 and 2017. To be included, an institution needed to have at least 50 such publications associated with it according to the KB disambiguation of the WoS data. These restrictions resulted in a study sample of 445 institutions. The same institutions are used to investigate both WoS and Scopus.

Scopus AFID
For the Scopus data, we compare the KB system-derived reference data to sets of publications that have one or more common assigned AFIDs (affiliation identifiers), as provided by Elsevier. Some preprocessing steps to align the Scopus and KB disambiguation systems were performed in order to make them comparable, as they are conceptually and structurally somewhat different. To match AFIDs to the KB system IDs, the AFID for each institution in our sample was obtained by searching Scopus's online platform. It is not clear whether and how exactly the definition of an institution in Scopus differs from the one the KB disambiguation is based on. One difference that we have noticed is that the AFID system typically has separate IDs for university hospitals and the universities they belong to, which is not the case in the KB system. We have therefore merged those AFIDs to create more comparable and consistent publication record sets. Furthermore, in some cases more than one AFID for the same institution exists in Scopus, for instance, for multiple branch locations. If these are logically linked in the hierarchical relations in the Scopus system, we also merged these linked AFIDs. If not, we took only the most commonly used AFID per institution.
We found that in the AFID system, publications with affiliations referring to predecessor units are grouped with their current unit. Based on this observation, we compare the AFID results with those from the KB system's Mode A.

Web of Science (WoS) Organizations Enhanced
The WoS OE system does not have unit identifiers but Preferred Names, which are additionally assigned as institution names to affiliations considered as belonging to one real institution. In order to identify the WoS Preferred Name for the institutions in our set, we started by identifying all the Preferred Names of records with German addresses that occur more than 20 times. From this list, we chose the Preferred Name matching the target institution and otherwise excluded the institution from this part of the study. In fact, for our sample set, it was not possible to retrieve the corresponding publications on the main institutional level in a majority of cases. Although many universities are recorded in OE, the institutions of FHG, HGF, WGL, and MPG are almost all grouped such that only all publications of each of the respective head organizations can be found, but rarely those of their member institutes. Similar to AFID, in the WoS OE system predecessor units are grouped under the Preferred Name of the current institution. In consequence, we also compare the WoS OE system with Mode A of the KB system.

WoS institution name search
In addition to the comparison of WoS OE data with the KB disambiguation, we also investigated the performance of a lexical search using the institution name in the WoS affiliation data. As pointed out above, the coverage of institutions in the WoS OE system is far from complete (since only head organizations are covered, not their member institutes), which supports the notion that such an alternative approach might often be required in practice. The institution name search method makes use of WoS disambiguation efforts, because institution names extracted from affiliation information in papers are not indexed identically to how they are given in the original publication but are normalized. Because the affiliations in Scopus are not transformed or normalized, we do not apply a similar search strategy to Scopus data. In fact, it is not possible to conduct comparable searches across these two databases, because WoS only contains normalized address strings, while Scopus only contains the original address strings.
In this scenario, we model a hypothetical user who has a list of the names of the German research institutions available, which is used as a basis for generating search terms. We also assume that the user is familiar with searching in WoS data to a sufficient degree. This scenario further requires a definition of the name list, the search terms, and the search parameters.
In order to generate a plausible name list, we begin by using the KB institutional disambiguation results to find the most common normalized name in the WoS data for each real institution in our initial set, because in principle there should be only one normalized name for each institution. We manually assess the lists side by side with the real names and discard any WoS name that cannot be deduced from the name list, using instead the next most common name variant, iteratively, until all WoS normalized names are mapped to KB system IDs based on the names in the two systems. This relates to our decision to go beyond a completely naïve and automatic procedure and include a realistic degree of user common sense and domain familiarity. We use the search term list thus obtained as retrieval input, ignoring capitalization and allowing truncation at the end of the term, and search the full address information field. This comes reasonably close to an informed, but nonspecialized, search for an institution on the online platform of WoS. It is general in the sense that all institutions are treated in the same way and no special knowledge of affiliation idiosyncrasies is included. It is limited in the sense that we only consider one name variant per institution.
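The search procedure just described can be approximated as a case-insensitive substring match of the search term against the full address field; right-truncation is implicit, since any continuation of the term in the address still matches. The following sketch uses invented example data, not actual WoS records:

```python
def retrieve(publications, term):
    """Return the IDs of publications whose address field contains the
    search term, ignoring case."""
    term = term.lower()
    return {pub_id for pub_id, address in publications if term in address.lower()}

pubs = [
    (1, "Univ Bielefeld, Fac Sociol, Bielefeld, Germany"),
    (2, "Bielefeld Univ, Dept Phys, Bielefeld, Germany"),
    (3, "Univ Hamburg, Hamburg, Germany"),
]
hits = retrieve(pubs, "Univ Bielefeld")  # matches publication 1 only
```

Note that publication 2, a name-order variant of the same institution, is missed; this illustrates why recall suffers when only one name variant per institution is searched.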
Because we directly use the normalized affiliation data as it is indexed in WoS, it is clear that we use the normalized versions of the institution names at the time of publication.Thus, we use Mode S of the KB system for comparison.

Methods
To assess the performance of the studied systems in terms of being able to identify the correct publications of the research institutions, we use the information retrieval measures of precision and recall. For this task, precision is calculated as the share of correctly retrieved publications among the total number of retrieved publications. Recall is the share of correctly retrieved publications among all relevant publications. The correct publications of an institution are those identified by the KB system.
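With publication sets represented as sets of IDs, these measures can be computed directly. This is a minimal sketch; the example counts are invented for illustration:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved publication set against the
    relevant (gold standard) set."""
    correct = retrieved & relevant
    precision = len(correct) / len(retrieved) if retrieved else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall

relevant = set(range(12))          # 12 relevant publications (gold standard)
retrieved = set(range(9)) | {100}  # 9 correct hits plus 1 false positive
p, r = precision_recall(retrieved, relevant)  # p = 0.9, r = 0.75
```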
In order to quantify the effect of the application of a specific institutional disambiguation on the scores of bibliometric indicators, we calculated the indicator values based on the publications of each institution as retrieved by the KB system, which we consider a validated gold standard. A number of commonly utilized bibliometric indicators are included in this study. We consider the three domains of publication output, collaboration, and citation impact. For the latter two domains we have selected indicators that are size dependent (absolute numbers) as well as size-independent indicators (ratios or averages). The citation indicators are all calculated for 5-year citation windows, which include the year of publication. The indicators are summarized in Table 1. It is clear that the size-dependent indicator values are directly related to the number of correctly identified publications. However, it might be hypothesized that the values of size-independent indicators are less affected when only a part of the correct publication set is used as their input, because errors may cancel each other out.
We compare two vendor-provided disambiguation system results and one search-based result with the KB system's results, which we take as the correct result providing reference values. We divide the system evaluation into two parts. First, for each institution in the evaluation set, we would like to find all its publications, without retrieving any publications it was not involved in. This is a basic information retrieval task, which can be measured with precision and recall. We also use retrieval performance, including the absolute number of retrieved institutions in the evaluation set, to analyze the coverage of the systems with respect to our sample of 445 institutions. The second component of the evaluation concerns the bibliometric indicator scores calculated from the retrieved institution publication sets. In general, the numerical discrepancy between the indicator values, using the KB disambiguation (reference values) and the other methods, will be expressed as relative deviation in percent, calculated as

deviation = (observed system score − KB system reference score) / KB system reference score × 100

The deviation has a lower bound at −100% and is unbounded in the positive direction. For example, let the reference MCS of a unit be 5.5 (calculated based on the KB disambiguated data), and the focal value obtained from a simple institution search in WoS be 4.2. Then the deviation as defined above is (4.2 − 5.5)/5.5 × 100 = −23.6%. In this case, the correct result would be underestimated by 23.6%.
For each indicator, the computed deviations for each institution are collected. Our main measure of accuracy is the percentage of values within a range of ±5% of the reference score.
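The deviation measure and the ±5% accuracy criterion can be sketched as follows. The function names are our own; the 4.2 versus 5.5 example is the MCS case discussed above:

```python
def deviation(observed, reference):
    """Relative deviation in percent; bounded below by -100%, unbounded above."""
    return (observed - reference) / reference * 100

def share_within(observed_scores, reference_scores, tol=5.0):
    """Share of institutions whose indicator score deviates from the
    reference by at most +/- tol percent."""
    devs = [deviation(o, r) for o, r in zip(observed_scores, reference_scores)]
    return sum(abs(d) <= tol for d in devs) / len(devs)

deviation(4.2, 5.5)  # about -23.6: the search-based MCS underestimates
```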

RESULTS
An overview of the coverage of German institutions in the WoS and Scopus institution disambiguation systems and via the lexical search method in WoS is provided in Table 2. We are able to find only 91 of our 445 (20%) evaluation sample institutions in the OE system. The coverage of OE is the lowest among the systems considered. To a significant extent, this is a consequence of the choice not to include the member institutes of nonuniversity research organizations in WoS OE. The set of covered institutions in WoS OE is comprised mostly of universities. However, even for the universities, and in particular the universities of applied sciences, a significant number of institutions are not covered in WoS OE. Using the search strategy, we can find one normalized form for each institution, achieving complete coverage of the institutions. The Scopus AFID system covers 376 (85%) of the institutions, with no conspicuous differences between sectors.

WoS Organizations Enhanced
We present the institution-level figures for precision and recall for WoS OE in Table 3 and Figure 1. All results should be interpreted with due caution because of the OE system's limited coverage of the selected institutions. The precision of WoS OE for these institutional publication sets is 0.95, on average, across institutions, weighted by publication numbers. Hence, typically about 5% of the returned publications in a result set of a specific Preferred Name will be false positives. The weighted mean of recall across institutions is 0.93, meaning that the result sets miss about 7% of relevant publications, on average. The contrast between the unweighted (0.87) and weighted mean for recall shows that the results for larger institutions (in terms of number of publications) are better than for smaller institutions. We found poor recall results for the four institutions presented in Table 4.
We now turn to the results of the comparison of the scores of bibliometric indicators between the WoS OE and the KB system. The results are presented in Table 5, in the form of summaries of the deviation score distributions, visualized in Figure 2. It can be seen that absolute indicator scores (number of publications, collaborative publications, and citations) are less often within the range of nearly correct values (±5%) than relative indicator scores.

WoS Institution Name Search
In this section, we compare the results of the WoS institution name search with those of the KB system. Note that the search makes use of the institution name normalization of WoS, and we have deliberately searched for the single most common WoS normalized name per institution, as mentioned above. Using this search method, we obtain vastly more institution publication sets than using WoS OE; in fact, full coverage of all sample institutions is achieved (see Table 2). The summary of the distributions of precision and recall is given in Table 6 and the values are displayed in Figure 3. We obtain rather poor results for precision: 0.61 on average when weighting institutions by the number of publications, and 0.67 as the arithmetic mean. Publication sets for this method will often contain many publications incorrectly assigned to the institutions in question. Recall is at 0.74 (weighted mean) and 0.55 (arithmetic mean), which means that the publication lists returned by these queries will commonly be incomplete, but less so for the larger institutions. Tables 7 and 8 provide the five institutions with the lowest recall and precision scores.
These results for recall suggest that the normalization procedure of WoS is often unable to group most of the relevant institution name variants under one normalized form.
The results of the comparison of the bibliometric indicator scores between the WoS institution name search approach and the KB system for Mode S are provided in Table 9 and the deviation distributions are displayed in Figure 4. The shares of institutions for which the scores obtained with the institution name search approach are within ±5% of the reference score are low, especially for the absolute indicators. Dispersion of the deviations is high. The ratio- and mean-based citation scores are comparatively less inaccurate. Evidently, the incomplete publication result sets of this method lead to substantially inaccurate scores for all indicators.

Scopus AFID

The results for precision and recall of the Scopus AFID system, under the Mode A condition, are summarized in Table 10 and displayed in Figure 5. Precision is quite high but, in contrast, recall is more moderate. Again, we find that the weighted mean precision and recall are slightly greater than the unweighted ones, suggesting that disambiguation quality is typically a little better for larger institutions. We also note that the coverage of our selected benchmarking institutions in the AFID system is 376 out of 445 (i.e., 85%) and therefore far from complete. Unlike the WoS OE system, the Scopus AFID system is not largely concentrated on universities (Table 2). Table 11 provides the five institutions with the lowest recall scores for the Scopus AFID system.
The direct comparison of the indicator scores calculated with the Scopus platform disambiguation system (AFID) on the one hand, and those calculated with the KB system on the other, in terms of distributions of percent deviation, is given in Table 12, and the deviation distributions are displayed in Figure 6. We find, on average for the absolute indicators, considerable shares of scores outside the range of accepted values. Relative indicator scores are less severely affected, but not within the accepted range often enough to be considered reliable. It is worth pointing out that the total number of citations (TCS) in particular is rarely within the allowed range, which, however, did not seem to overly affect the other citation indicators.

DISCUSSION
We have investigated the accuracy of bibliometric indicator values for German publicly funded research organizations that can be obtained through a search strategy on vendor-normalized data (for WoS) and through the use of the database vendors' proprietary institution disambiguation systems (for both WoS and Scopus). These indicator values were compared with results from a nearly complete and independent institutional disambiguation for which detailed performance characteristics were provided. During our study, we found that conceptual differences between the three institution disambiguation systems, and a lack of documentation of both the WoS OE system and the Scopus AFID system, were obstacles to making straightforward comparisons. In particular, the definition of the basic institutional entity, which is a crucial point for comparing disambiguation systems, varied among the systems. For example, in Scopus, university hospitals were kept separate from university entities: they had different AFIDs, which were not connected in any way. This inhibits evaluations for universities including their academic hospitals or medical faculties. For a comparison with the KB system, these entities, academic hospitals and the universities to which they belong, had to be aggregated manually. A further issue concerned the handling of predecessor institutions. In order to obtain valid results, we evaluated the systems on their own terms, adjusting the KB system as necessary to include predecessor institutions. In WoS OE, the level at which institutional entities are defined (e.g., MPG as one single institutional entity) largely rules out a comparison at the institutional level, as defined in the KB system, for some KB sectors. Furthermore, there is no clear documentation on the handling of structural changes over time, such as splits or mergers. For analyses at the institutional level, this is a major limitation.
We find that WoS OE has the smallest coverage of our institution sample, at 20%, and is mainly restricted to universities. This reflects the choice made in WoS OE not to include the member institutes of nonuniversity research organizations. The coverage of Scopus AFID, on the other hand, is not largely limited to one institution type, but at 85% it is also far from complete. These results show that the utility of the WoS and Scopus institution disambiguation systems for bibliometric analysis is limited, as they do not currently provide full coverage of disambiguated research organizations. In the WoS OE and Scopus AFID systems, precision of the obtainable publication sets was close to adequate levels at 0.95 and 0.96, respectively. However, neither system provided high recall rates (WoS: 0.93; Scopus: 0.86), which led to inaccurate indicator scores. Furthermore, we find substantial variation in precision and recall across institutions, indicating that within one system these values are not systematically similar across the covered institutions but differ on a case-by-case basis. As for the tested name search method on normalized WoS data, precision and recall scores are poor, so this approach does not constitute a viable alternative.
Our results show that indicator values will typically not be within tolerable error margins at the organizational level, which we have set at ±5% of the reference value. This holds both for size-dependent and size-independent indicators. Hence, bibliometric indicator values at the institutional level have only limited accuracy.
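The ±5% tolerance check applied throughout can be stated compactly as follows. This is a minimal sketch of the criterion, assuming signed percent deviation relative to the KB reference score; the score pairs below are invented for illustration, not results from the study.

```python
# Sketch: percent deviation of a system's indicator score from the KB
# reference score, and the share of institutions within the ±5% margin.
# The score pairs below are illustrative toy values.

def percent_deviation(system_score, reference_score):
    """Signed deviation of a system score from the reference score,
    in percent of the reference score."""
    return 100.0 * (system_score - reference_score) / reference_score

def share_within_margin(pairs, margin=5.0):
    """Share of institutions whose indicator score deviates from the
    reference by at most ±margin percent."""
    hits = [abs(percent_deviation(s, ref)) <= margin for s, ref in pairs]
    return sum(hits) / len(hits)

# Toy data: (system score, reference score) per institution, e.g. for
# a total citation count (TCS). Three of the four pairs fall within ±5%.
scores = [(950.0, 1000.0), (1980.0, 2000.0), (300.0, 500.0), (104.0, 100.0)]

share = share_within_margin(scores)
```

Because the deviation is normalized by the reference score, the same ±5% criterion applies to size-dependent indicators (e.g., publication or citation counts) and size-independent ones (e.g., mean-based citation scores) alike.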
Relying on vendor disambiguation systems may incur serious inaccuracies in indicator values at the institutional level. Therefore, we conclude that for any use with policy implications, additional data cleaning for disambiguating affiliation data is recommended. We stress that any study such as the one presented in this paper reflects only the current situation, and that disambiguation systems may improve over time. The lack of adequate documentation of vendor institution disambiguation systems, including performance figures, is, however, another barrier impeding their adoption in bibliometric studies.

Figure 2. Distributions of indicator score deviations of WoS OE from KB system in Mode A. Diagonal lines indicate the ±5% error margin for indicator values.

Figure 4. Distributions of indicator score deviations of WoS institution name search from KB system in Mode S. Diagonal lines indicate the ±5% error margin for indicator values.

Figure 6. Distributions of indicator score deviations of Scopus AFID from KB system in Mode A. Diagonal lines indicate the ±5% error margin for indicator values.

Table 1. Overview of selected bibliometric indicators

Table 2. Coverage of sample institutions by the studied disambiguation systems (columns: sector; number of institutions; covered in WoS OE; covered in WoS search; covered in Scopus AFID)

Table 3. Summary statistics of the distributions of precision and recall of retrieved publications per institution for WoS OE (n = 91)

Table 5. Deviation of indicator scores of WoS OE from KB system (n = 91)

Table 6. Summary statistics of the distributions of precision and recall of retrieved publications per institution for WoS institution name search (n = 445)

Table 7. Institutions with low recall for WoS institution name search

Table 8. Institutions with low precision for WoS institution name search

Table 9. Deviation of indicator scores of WoS institution name search from KB system (n = 445)

Table 10. Summary statistics of the distributions of precision and recall of retrieved publications per institution for Scopus AFID (n = 376)

Table 11. Institutions with low recall for Scopus AFID

Table 12. Deviation of indicator scores of Scopus AFID from KB system in Mode A (n = 376)