Impact Factor volatility to a single paper: A comprehensive analysis of 11639 journals

We study how a single paper affects the Impact Factor (IF) by analyzing data from 3,088,511 papers published in 11639 journals in the 2017 Journal Citation Reports of Clarivate Analytics. We find that IFs are highly volatile. For example, the top-cited paper of 381 journals caused their IF to increase by more than 0.5 points, while for 818 journals the relative increase exceeded 25%. And one in 10 journals had their IF boosted by more than 50% by their top three cited papers. Because the single-paper effect on the IF is inversely proportional to journal size, small journals are rewarded much more strongly than large journals for a highly-cited paper, while they are penalized more for a low-cited paper, especially if their IF is high. This skewed reward mechanism incentivizes high-IF journals to stay small, to remain competitive in rankings. We discuss the implications for breakthrough papers appearing in prestigious journals. We question the reliability of IF rankings given the high IF sensitivity to a few papers for thousands of journals.


Introduction & Motivation
The effect of a journal's scale (i.e., size) on its citation average cannot be overstated. Recently, we showed (Antonoyiannakis, 2018) that citation averages, such as IFs, are scale-dependent in a way that drastically affects their rankings, and which can be understood and quantified via the Central Limit Theorem: For a randomly formed journal of scale $n$, the range of its IF values (measured from the global citation average) scales as $1/\sqrt{n}$. While actual journals are not completely random, the Central Limit Theorem explains to a large extent their IF-scale behavior, and allows us to understand how the balance in IF rankings is tipped in two important ways: (a) Only small journals can score a high IF; and (b) large journals have IFs that asymptotically approach the global citation average as their size increases, via regression to the mean.
At a less quantitative level, the scale-dependence of citation averages has been noted earlier, by Amin & Mabe (2004), Campbell (2008), and Antonoyiannakis & Mitra (2009). Yet it is almost always neglected in practice. Journal size is thus never accounted or controlled for in Impact Factor rankings, whether by standardization of citation averages (a rigorous approach; [...]
2. How a single paper affects the IF: An example from four journals
We are now ready to analyze the effect of a single paper on a journal's IF. Initially, let the journal have an IF equal to $f_1$, which is the ratio of $C_1$ citations to the biennial publication count $N_1$. The additional paper causes the IF denominator to increase by 1, and the numerator by $c$.
Before we study the general case, let us first consider one example, to get an idea of what is going on. In Table 1 we list four journals whose sizes range from 50 to 50,000, but their IFs are the same. The numbers are fictitious but realistic: As one can confirm from the JCR, there are journals with size and IFs sufficiently close to the values in the table.

[Table 1; columns: Journal, Size, Citations, Initial IF, New paper]
Table 1: A hypothetical but realistic scenario. Four journals, A, B, C, and D, have the same IF but different sizes, when they each publish a paper that brings them $c = 100$ citations. The IF gain spans 4 orders of magnitude, both in absolute, $\Delta f(c)$, and relative, $\Delta f_r(c)$, terms, since it depends not only on the additional paper, but also on the size of each journal.
Journal B is 10 times larger than A. When a highly cited paper ($c = 100$) is published by A, the IF changes by $\Delta f(100) = 1.902$. When the same paper is published by B, the change is ten times smaller, i.e., $\Delta f(100) = 0.194$. Therefore, to compete with journal A (that is, to obtain the same IF increase $\Delta f(c)$), journal B needs to publish 10 equally highly cited papers. Likewise, for every paper of $c = 100$ that A publishes, C needs to publish 100 equally cited papers to obtain the same IF increase. And for every paper of $c = 100$ that journal A publishes, journal D needs to publish 1000 equally cited papers to compete.
To sum up, the IF increase is inversely proportional to journal size. Publication of the same highly cited paper in any of the journals A, B, C, or D produces widely disparate outcomes, as the corresponding IF increase spans four orders of magnitude, from 0.0019 to 1.902. With such a high sensitivity to scale, the comparison of IFs of these four journals is no level playing field: Small journals disproportionately benefit from highly cited papers.
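The four-journal comparison can be reproduced with a few lines of Python. This is a minimal sketch, not the author's calculation: it uses the exact single-paper change $(c - f_1)/(N_1 + 1)$, which follows from adding one paper with $c$ citations to a journal holding $f_1 N_1$ citations over $N_1$ papers. The function name `if_gain` is hypothetical, the sizes are taken as 50, 500, 5,000 and 50,000, and the common initial IF of 3 is an assumption, chosen because it reproduces the gains quoted above.

```python
def if_gain(c, f1, n1):
    """Exact IF change when one paper with c citations joins a journal
    whose citation average was f1 over n1 papers."""
    return (c - f1) / (n1 + 1)

# Assumptions: sizes 50 to 50,000 as stated in the text; the common
# initial IF of 3 is inferred because it reproduces the quoted gains.
JOURNAL_SIZES = {"A": 50, "B": 500, "C": 5_000, "D": 50_000}
INITIAL_IF = 3.0
NEW_PAPER_CITATIONS = 100

for name, n1 in JOURNAL_SIZES.items():
    gain = if_gain(NEW_PAPER_CITATIONS, INITIAL_IF, n1)
    print(f"{name}: N1={n1:>6}  gain={gain:.4f}  relative={gain / INITIAL_IF:.2%}")
```

Under these assumptions the gain for A rounds to 1.902 and the gain for B to 0.194, matching the values in the text, and each tenfold increase in size cuts the gain by roughly a factor of ten.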
The above example considers a highly cited paper. As we will shortly see, there is a sufficient number of highly cited papers to cause hundreds of journals every year to jump up considerably in IF rankings due to one paper. And even further: There are many journals of sufficiently small size and small IF that even a low- or moderately cited paper can produce a big increase in their IF. Therefore, IF volatility due to a single paper (or a handful of papers, in the more general case) is a much more common pattern than is widely recognized, which is why this behavior of IFs goes beyond academic interest. To understand this fully, let us now consider the general case.
3. How a single paper affects the IF: The general case. Introducing the IF volatility index.
The initial IF is
$$f_1 = \frac{C_1}{N_1}, \tag{1}$$
so that when the new paper is published by the journal, the new IF becomes
$$f_2 = \frac{C_1 + c}{N_1 + 1}. \tag{2}$$
The change (volatility) in the IF caused by this one paper is then
$$\Delta f(c) \equiv f_2 - f_1 = \frac{C_1 + c}{N_1 + 1} - \frac{C_1}{N_1}, \tag{3}$$
so that
$$\Delta f(c) = \frac{c - f_1}{N_1 + 1} \approx \frac{c - f_1}{N_1}, \tag{4}$$
where the approximation is justified for $N_1 \gg 1$, which applies for all but a few journals that publish only a few items per year. So, the IF volatility, $\Delta f(c)$, depends both on the new paper (i.e., on $c$) and on the journal (size $N_1$, and citation average $f_1$) where it is published.
We can also consider the relative change in the citation average caused by a single paper, which is probably a more pertinent measure of volatility. For example, if a journal's IF jumps from 1 to 2, then this is bigger news than if it jumped from 20 to 21. The relative volatility is
$$\Delta f_r(c) \equiv \frac{f_2 - f_1}{f_1} = \frac{c - f_1}{f_1 (N_1 + 1)} \approx \frac{c - f_1}{f_1 N_1}, \tag{5}$$
where, again, the approximation is justified when $N_1 \gg 1$. The above equation can be further simplified for highly cited papers ($c \gg f_1$) as
$$\Delta f_r(c) \approx \frac{c}{f_1 N_1}. \tag{6}$$
Let us now return to $\Delta f(c)$ and make a few remarks.
(a) For $c > f_1$, the additional paper is above-average with respect to the journal, and there is a benefit to publication: $\Delta f(c) > 0$ and the IF increases, i.e., $f_2 > f_1$.
(b) For $c < f_1$, the new paper is below-average with respect to the journal, and publishing it invokes a penalty: $\Delta f(c) < 0$ as the IF drops, i.e., $f_2 < f_1$.
(c) For $c = f_1$, the new paper is average, and publishing it makes no difference in the IF.
(d) The presence of $N_1$ in the denominator means that the benefit or penalty of publishing an additional paper decays rapidly with journal size. This has dramatic consequences.
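Remarks (a) to (c) can be checked numerically from the definitions alone. The sketch below computes the volatility as the difference between the citation average after and before the extra paper, and compares it with the closed form $(c - f_1)/(N_1 + 1)$. The function names and the journal with 300 citations to 100 papers are hypothetical choices made for illustration.

```python
def volatility(c, citations, papers):
    """Change in the citation average when one paper with c citations is added."""
    f1 = citations / papers
    f2 = (citations + c) / (papers + 1)
    return f2 - f1

def relative_volatility(c, citations, papers):
    """Volatility divided by the initial citation average f1."""
    return volatility(c, citations, papers) / (citations / papers)

# Hypothetical journal: 300 citations to 100 papers, so f1 = 3.
C1, N1 = 300, 100
F1 = C1 / N1
for c in (0, 3, 30):  # below average, average, above average
    exact = volatility(c, C1, N1)
    closed_form = (c - F1) / (N1 + 1)
    print(f"c={c:>2}  change={exact:+.4f}  closed form={closed_form:+.4f}")
```

The sign of the change follows the sign of $c - f_1$, and the two columns agree up to floating-point rounding.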
Let us now consider two special cases of interest:
CASE 1. The new paper is well above average relative to the journal, i.e., $c \gg f_1$. Here,
$$\Delta f(c) = \frac{c - f_1}{N_1 + 1} \approx \frac{c}{N_1 + 1} \approx \frac{c}{N_1}, \tag{7}$$
where the last step is justified since in realistic cases we have $N_1 \gg 1$. The volatility $\Delta f(c)$ depends on the paper itself and on the journal size. The presence of $N_1$ in the denominator means that publishing an above-average paper is far more beneficial to small journals than to large journals. For example, a journal A that is ten times smaller than a journal B will have a ten times higher benefit upon publishing the same highly cited paper, even if both journals had the same IF to begin with! The editorial implication here is that it pays for editors of small journals to be particularly watchful for high-performing papers. From the perspective of competing in IF rankings, small journals have two conflicting incentives: Be open to publishing risky and potentially breakthrough papers on the one hand, but not publish too many papers on the other, lest they lose their competitive advantage due to their small size.
For $c \ll N_1$, we get $\Delta f(c) \approx 0$ even for large $c$. So, when large journals publish highly cited papers, they have a tiny benefit in their IF. For example, when a journal with $N_1 = 2000$ publishes a paper of $c = 100$, the benefit is a mere $\Delta f(100) = 0.05$. For a very large journal of $N_1 = 20000$, even an extremely highly cited paper of $c = 1000$ produces a small gain $\Delta f(1000) = 0.05$.
But for small and intermediate values of $N_1$, the value of $\Delta f(c)$ can increase appreciably. This is the most interesting regime for journals, which tend to be rather small: Recall that "90% of all journals publish 250 or fewer citable items annually" (Antonoyiannakis, 2018).
CASE 2. The new paper is well below average relative to the journal, i.e., $c \ll f_1$. (For journals of, say, $f_1 \le 2$, the condition $c \ll f_1$ implies $c = 0$.) Here,
$$\Delta f(c) = \frac{c - f_1}{N_1 + 1} \approx -\frac{f_1}{N_1 + 1} \approx -\frac{f_1}{N_1}, \tag{8}$$
since in realistic cases we have $N_1 \gg 1$. The penalty $\Delta f(c)$ depends now only on the journal parameters ($N_1$, $f_1$), and it is greater for small-sized, high-IF journals. The editorial implication is that editors of small and high-IF journals need to be more vigilant in pruning low-performing papers than editors of large journals. Two kinds of papers are low-cited, at least in the IF citation window: (a) archival, incremental papers, and (b) some truly ground-breaking papers that may appear too speculative at the time and take more than a couple of years to be recognized.
For $f_1 \ll N_1$, we get $\Delta f(c) \approx 0$. Very large journals lose little by publishing low-cited papers. The take-home message from the above analysis is two-fold. First, with respect to increasing their IF, it pays for all journals to take risks. Because the maximum penalty for publishing below-average papers ($\approx f_1/N_1$) is smaller than the maximum benefit for publishing above-average papers ($\approx c/N_1$), it is better for a journal's IF that its editors publish a paper they are on the fence about, if what is at stake is the possibility of a highly influential paper. Some of these papers may reap enough citations to be worth the risk: recall that $c$ can lie in the hundreds or even thousands.
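The asymmetry between benefit and penalty is easy to see with numbers. The sketch below uses the large-size approximations, namely a gain of about $c/N_1$ for a highly cited paper and a loss of about $f_1/N_1$ for an uncited one. The function names are hypothetical; the first two calls use the large-journal examples from the text, and the small high-IF journal ($f_1 = 10$, $N_1 = 100$) is an invented illustration.

```python
def approx_benefit(c, n1):
    """Approximate IF gain from a paper with c >> f1, assuming n1 >> 1."""
    return c / n1

def approx_penalty(f1, n1):
    """Approximate IF loss from an uncited paper (c = 0), assuming n1 >> 1."""
    return -f1 / n1

# The two large-journal examples from the text.
print(approx_benefit(100, 2_000))     # 0.05
print(approx_benefit(1_000, 20_000))  # 0.05

# Hypothetical small, high-IF journal: f1 = 10, N1 = 100.
print(approx_penalty(10, 100))        # -0.1
print(approx_benefit(200, 100))       # 2.0
```

In the invented example, a single paper cited 200 times outweighs the penalty of about twenty uncited papers, which is the sense in which the penalty is bounded by $f_1$ while the benefit grows with $c$.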
However, the reward for publishing breakthrough papers is much higher for small journals. For a journal's IF to seriously benefit from ground-breaking papers, the journal must above all remain small, otherwise the benefit is much reduced due to its inverse dependence on size. To the extent that editors of elite journals are influenced by IF considerations, they have an incentive to keep a tight lid on their acceptance decisions and reject many good papers, and even some potentially breakthrough papers they might otherwise have published. We wonder whether the abundance of prestigious high-IF journals with small biennial sizes, $N_{2Y} < 400$, and especially their size stability over time, bears any connection to this realization. In other words, is "size consciousness" one reason why high-IF journals stay small? We claim yes. As Philip Campbell, former Editor-in-Chief of Nature, put it, "The larger the number of papers, the lower the impact factor. In other words, worrying about maximizing the impact factor turns what many might consider a benefit-i.e. more good papers to read-into a burden." (Campbell, 2008).
On a related note, Wang, Veugelers and Stephan (2017) reported on the increased difficulty of transformative papers to appear in prestigious journals. They found that "novel papers are less likely to be top cited when using short time-windows," and "are published in journals with Impact Factors lower than their non-novel counterparts, ceteris paribus." They argue that the increased pressure on journals to boost their IF "suggests that journals may strategically choose to not publish novel papers which are less likely to be highly cited in the short run." Our analysis adds size consciousness as another obstacle for a novel paper to be accepted in a prestigious journal: Not only may the novel paper itself be less likely to be cited in the short run, but publication of a lesser-cited paper decreases the IF gain from all prospective papers and thus pressures the journal to be even more conservative as it grows in size. Why worry about intellectually risky papers? Because they are more likely to lead to major breakthroughs (Fortunato et al., 2018). It was in this spirit that the Physical Review Letters Evaluation Committee recommended back in 2004 that steps be taken to "educate referees to identify cutting edge papers worth publishing even if their correctness cannot be definitively established," and that "[r]eferee training should emphasize that a stronger attempt be made to accept more of the speculative exciting papers that really move science forward." (Cornell et al., 2004). Granting agencies have reached a similar understanding. For example, an effort to encourage risk in research is the NIH Common Fund Program, established in 2004 and supporting "compelling, high-risk research proposals that may struggle in the traditional peer review process despite their transformative potential" (NIH News Release, 2018). 
These awards "recognize and reward investigators who have demonstrated innovation in prior work and provide a mechanism for them to go in entirely new, high-impact research directions." (Collins et al., 2014). Europe's flagship program for funding high-risk research, the European Research Council, was established in 2007 and "target[s] frontier research by encouraging high-risk, high-reward proposals that may revolutionize science and potentially lead to innovation if successful." (Antonoyiannakis, Hemmelskamp and Kafatos, 2009).

4. How the IF volatility index depends on its parameters
We now analyze graphically how $\Delta f(c)$ depends on its parameters, namely, the IF of the journal $f_1$, the biennial publication count $N_1$, and the annual citation count $c$ of a single paper.
First, let us briefly comment on the dependence of $\Delta f(c)$ on $f_1$. Impact Factors $f_1$ range typically from 0.001-200, but are heavily concentrated in low-to-moderate values (Antonoyiannakis, 2018): The most commonly occurring value (the mode) is 0.5, while 75% of all journals in the 2017 JCR have IF < 2.5. Since our chief aim here is to study the effects of a single paper on citation averages, we are mostly interested in high $c$ values ($c > 100$, say), in which case the effect of $f_1$ on $\Delta f(c)$ or on $\Delta f_r(c)$ can be usually ignored, as can be seen from Eq. (7) and Eq. (6) respectively. For smaller $c$ values relative to $f_1$, the effect of $f_1$ is to simply reduce the size of $\Delta f(c)$ by some amount, but is otherwise of no particular interest.
Let us now look at the dependence of $\Delta f(c)$ on $N_1$ and $c$. The journal biennial size $N_1$ ranges from 20-60,000 and is heavily centered at small sizes (Antonoyiannakis, 2018), which has important implications, as we shall see. As for $c$, it ranges from 0-5000 in any JCR year, and its distribution follows a power law characteristic of the Pareto distribution for $c \ge 10$ (Table 2). Figure 1 is a 3D surface plot of $\Delta f(c)$ vs. $N_1$ and $c$, for a fixed $f_1 = 10$. Figure 2 is a projection of Fig. 1 in 2D, i.e., a contour plot, for more visual clarity. The main features of the plots are:
1. For a given $c$ value, $\Delta f(c)$ decreases rapidly with $N_1$, as expected from Eq. (4), since the two quantities are essentially inversely proportional for $c \gg f_1$.
2. For realistic values of $N_1$, $c$, the volatility $\Delta f(c)$ can take high values. For example, for $20 \le N_1 \le 100$ and $20 \le c \le 500$ we have $0.5 < \Delta f(c) < 25$. Think about it: A single paper can raise the IF of these journals by several points! This is impressive.
Figure 1: 3D surface plot of the IF volatility $\Delta f(c)$ vs. (biennial) journal size $N_1$ and citation count $c$ of the new paper, for a journal whose IF was $f_1 = 10$ before publishing the paper. The range of $N_1$ values plotted here covers 90% of all journals, while 50% of all journals publish $\sim 130$ or fewer citable items biennially (Antonoyiannakis, 2018). So, for thousands of journals a paper cited $c \sim 100$ can cause $\Delta f(c) > 1$. The IF of the journal has little effect on $\Delta f(c)$ as long as $c \gg f_1$. See Eq. (4).
Figure 2: Same data as in Fig. 1, shown as a contour plot.
Why are these parameter values realistic? Because small journals abound, while there are thousands of sufficiently cited papers that can cause an IF spike. Indeed, 25% of the 11639 journals in our data set publish fewer than 68 items biennially ($N_1 < 68$), while 50% of journals publish fewer than 130 items, and 75% of journals publish fewer than 270 items. The range of $N_1$ values plotted here (10-500) spans 90% of all journals (Antonoyiannakis, 2018). At the same time, 6222 papers in our data set were cited at least 50 times, 1383 papers were cited at least 100 times, 302 papers were cited at least 200 times, etc. (Table 2). As these plots demonstrate, small journals ($N_1 \le 500$) enjoy a disproportionate benefit upon publishing a highly cited paper, compared to larger journals. Small journals are abundant. Highly cited papers are relatively scarce, but nevertheless exist in sufficient numbers to cause abrupt IF spikes for hundreds of small journals.
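One way to read the plots is to ask how large a journal can be and still gain a given amount from one paper. The sketch below inverts the exact relation $\Delta f(c) = (c - f_1)/(N_1 + 1)$ for $N_1$. The function name is hypothetical, and the inputs ($c = 100$, $f_1 = 10$, target gains of 1 and 0.5 IF points) are illustrative values chosen to match the fixed $f_1 = 10$ of the figures.

```python
import math

def largest_size_for_gain(c, f1, target):
    """Largest biennial size N1 for which (c - f1) / (N1 + 1) >= target."""
    return math.floor((c - f1) / target) - 1

print(largest_size_for_gain(100, 10, 1.0))  # 89
print(largest_size_for_gain(100, 10, 0.5))  # 179
```

With these inputs, any journal of biennial size up to 89 gains at least one full IF point from a paper cited 100 times; given that a quarter of the journals in the data set have $N_1 < 68$, this covers a large share of real journals.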
But an additional effect is also at work here, and it can cause IF spikes for thousands of journals: a medium-cited paper published in a small and otherwise little-cited journal. Given the high abundance of medium-cited papers (e.g., more than 176,000 papers in our data set are cited at least 10 times) and low-IF journals (e.g., 4046 journals have $f_1 \le 1$), journals that would otherwise have had a negligible IF can end up with a small or moderate IF. This is a much more commonly occurring effect than has been realized to date.
5. Systematic study of the volatility index, using data from 11,639 journals.
Now that we understand in theory the IF volatility, let us look at some real journal data. We have analyzed all journals listed in the 2017 JCR of Clarivate Analytics.
At this point, we could continue to study the effect of a hypothetical paper on the IFs of actual journals, using JCR data for IFs and journal sizes. For example, we could ask the question, "How does the IF of each journal change by incorporation of a paper cited $c = 100$ times?" and calculate the corresponding volatility $\Delta f(100)$. While such a calculation would be of value, we adopt a different approach, in order to stay firmly anchored on actual data from both journals and papers, and avoid hypotheticals. We ask the question "How did the IF (citation average) of each journal change by incorporation of its most cited paper, which was cited $c^*$ times in the IF 2-year time-window?" We thus calculate the quantity $\Delta f(c^*)$, where $c^*$ is no longer constant and set equal to some hypothetical value, but varies across journals.
First, a slight change in terminology to avoid confusion. We wish to study the effect of a journal's top-cited paper on its citation average $f$ when its biennial publication count is $N_{2Y}$. So, our journal's initial state has size $N_1 = N_{2Y} - 1$ and citation average $f_1$, which we will denote as $f^*$. Our journal's final state has $N_2 = N_{2Y}$ and $f_2 = f$, upon publication of the top-cited paper that was cited $c^*$ times. We study how $\Delta f(c^*)$ and $\Delta f_r(c^*)$ behave using JCR data.
Now, some technical details. The analysis was carried out in the second half of 2018. Among the 12,266 journals initially listed in the 2017 JCR, we removed the several hundred duplicate entries, as well as the few journals whose IF was listed as zero or not available. We thus ended up with a master list of 11639 unique journal titles that received a 2017 IF as of December 2018. For each journal in the master list we obtained its individual Journal Citation Report, which contained the 2017 citations to each of its citable papers (i.e., articles and reviews) published in 2015-2016. We were thus able to calculate the citation average $f$ for each journal, which approximates the IF and becomes identical to it when there are no "free" or "stray" citations in the numerator, that is, citations to front-matter items such as editorials, letters to the editor, commentaries, etc., or citations to the journal without specific reference of volume and page or article number. We will thus use the terms "IF" and "citation average" interchangeably, for simplicity. Collectively, the 11639 journals in our master list published 3,088,511 papers in 2015-2016, which received 9,031,575 citations in 2017 according to the JCR. This is our data set.
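The quantities defined above can be computed directly from a journal's list of per-paper citation counts. The following is a minimal sketch under stated assumptions, not the author's actual pipeline: the function name, the return order, and the sample counts are all hypothetical, and the input is assumed to contain only citable items, so that the citation average stands in for the IF.

```python
def top_paper_volatility(citations):
    """Effect of the top-cited paper on a journal's citation average.

    citations: per-paper citation counts for one journal (hypothetical input).
    Returns (f, f_star, delta, relative), where f_star is the average
    without the top-cited paper and delta = f - f_star.
    """
    n_2y = len(citations)
    if n_2y < 2:
        raise ValueError("need at least two papers")
    total = sum(citations)
    c_star = max(citations)
    f = total / n_2y
    f_star = (total - c_star) / (n_2y - 1)
    delta = f - f_star
    # The relative volatility is undefined when the top paper is the only cited one.
    relative = delta / f_star if f_star > 0 else float("inf")
    return f, f_star, delta, relative

sample = [0, 0, 1, 1, 2, 3, 4, 40]  # made-up counts for a tiny journal
f, f_star, delta, relative = top_paper_volatility(sample)
print(f"f={f:.3f}  f*={f_star:.3f}  delta={delta:.3f}  relative={relative:.0%}")
```

By construction, `delta` equals $(c^* - f^*)/N_{2Y}$, which is the exact single-paper change with $N_1 = N_{2Y} - 1$.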
For the record, for 26 journals the top cited paper was the only cited paper, in which case $f^* = 0$. Also, for 11 journals none of their papers received any citations, in which case $f = f^* = 0$! (These journals were however allocated an IF, so they did receive citations to the journal and year, or to their front matter.) None of these 37 journals is depicted in our log-log plots.
In Figs. 3 and 4 we plot the volatility $\Delta f(c^*)$ and relative volatility $\Delta f_r(c^*)$, respectively, vs. journal size $N_{2Y}$. In Fig. 5 we plot the volatility $\Delta f(c^*)$ vs. the journal citation average, $f$, in a bubble plot where bubble size is proportional to journal size. In Fig. 6 we plot the citation count of the top-cited paper, $c^*$, vs. journal citation average, $f$. In Tables 3 & 4 and 7 & 8 we identify the top 100 journals in decreasing volatility $\Delta f(c^*)$ and relative volatility $\Delta f_r(c^*)$, respectively. In Tables 5 and 6 we show the frequency distribution of $\Delta f(c^*)$ and $\Delta f_r(c^*)$, respectively.
Our key findings are as follows.
1. High volatilities are observed for hundreds of journals. For example: (a) $\Delta f(c^*) > 0.5$ for 381 journals, (b) $\Delta f(c^*) > 0.25$ for 1061 journals, etc. Relative volatilities are also high for hundreds of journals. For example: (c) $\Delta f_r(c^*) > 50\%$ for 231 journals, (d) $\Delta f_r(c^*) > 25\%$ for 818 journals, etc.
2. If we look at the top few cited papers per journal, as opposed to the single top cited paper, then the IF sensitivity to a handful of papers becomes even more dramatic. For instance, the IF was boosted by more than 50% by: (a) the top two cited papers for 710 journals, (b) the top three cited papers for 1292 journals, (c) the top four cited papers for 1901 journals, etc. So, 10% of journals had their IF boosted by more than 50% by their top three cited papers!
3. Highest volatility values occur for small journals. This agrees with our earlier finding that smaller journals benefit the most from a highly cited paper. By "small journals" we mean $N_{2Y} \le 500$. For example, 97 of the top 100 journals ranked by volatility (Tables 3 and 4), and all the top 100 journals ranked by relative volatility (Tables 7 and 8) publish fewer than 500 papers biennially ($N_{2Y} \le 500$).
Figure 5: Volatility $\Delta f(c^*)$ vs. journal citation average $f$, with bubble size proportional to journal size. The three parallel lines labeled "100%", "50%", and "25%" denote relative volatility values $\Delta f_r(c^*)$, i.e., the relative IF boost caused by the top-cited paper. Thus, data points above the 25% line describe the 818 journals whose top-cited paper boosted their IF by more than 25%. As expected from the Central Limit Theorem, increasing journal size causes the volatility to drop (larger bubbles "fall" to the bottom) and the IF to approach the global citation average $\mu = 2.9$.
4. Above the limit of $N_{2Y} \approx 500$, journal size starts to become prohibitively large for a journal's IF to profit from highly cited papers. Notice how the maximum values of $\Delta f(c^*)$ and $\Delta f_r(c^*)$ follow a downward trend with journal size above this limit.
5. For some journals, an extremely highly cited paper causes a large volatility $\Delta f(c^*)$. Consider the top 2 journals in Table 3. The journal CA-A Cancer Journal for Clinicians published in 2016 a paper that was cited 3790 times in 2017, accounting for almost 30% of its IF citations that year, with a corresponding $\Delta f(c^*) = 68.3$. Without this paper, the journal's citation average would have dropped from $f = 240.1$ to $f^* = 171.8$. Similarly, the Journal of Statistical Software published in 2015 a paper cited 2708 times in 2017, capturing 73% of the journal's citations that year. Without this paper, the journal's citation average would have dropped from $f = 21.6$ to $f^* = 5.8$. Although such extreme volatility values are rare, they occur every year.
6. A paper need not be exceptionally cited to produce a large IF boost, provided the journal is sufficiently small. Consider the journals ranked #3 and #4 in Table 3, namely, Living Reviews in Relativity and Psychological Inquiry. These journals' IFs were strongly boosted by their top-cited paper, even though the latter was much less cited ($c^* = 87$ and 97, respectively) than for the top 2 journals. This is because journal sizes were smaller also ($N_{2Y} = 6$ and 11). Such occurrences are common, because papers cited dozens of times are much more abundant than papers cited thousands of times, while there are also plenty of very small journals. Indeed, within the top 100 journals by volatility (Tables 3 and 4) there are 19 journals whose top-cited paper received fewer than 50 citations and yet caused a significant volatility $\Delta f(c^*)$ that ranged from 1.6 to 4.8. 
High values of relative volatility $\Delta f_r(c^*)$ due to low-cited or moderately cited papers are even more common. For 75 out of the 100 journals in Tables 7 and 8, the top-cited paper received fewer than 10 citations and yet caused $\Delta f_r(c^*)$ to range from 90% to 395%.
7. High volatilities are observed across the IF range. See Fig. 5. For example, $\Delta f(c^*) > 0.5$ for $f \sim 1$ to $40$. High relative volatilities ($\Delta f_r(c^*) > 25\%$) are also observed across the IF spectrum. However, as expected from the Central Limit Theorem, with increasing journal size the IF approaches the global citation average $\mu = 2.9$, is less sensitive to outliers and volatility drops: large bubbles "fall" to the bottom.
8. The top-cited paper captures a sizable fraction of the journal's citations for journals across the IF range. See Fig. 5. The dashed line with unity slope corresponds to the situation when the top-cited paper has all the journal's citations (so that $f^* = 0$ and $\Delta f(c^*) = f$). This line can never be reached in a log-log plot of data, although there are 26 journals with $f^* = 0$ and another 11 journals with $f = 0$, as we mentioned earlier. But note how many journals are close to that line and how they extend across the IF range. For example, 818 journals have $\Delta f_r(c^*) > 25\%$ (data points above the yellow line). Another example: Among the 142 journals whose top-cited paper captures more than 50% of the journal's citations, their IF ranges from $f = 0.1$ to $21.6$ while their size ranges from $N_{2Y} = 31$ to $477$.
9. The citation count of the top-cited paper correlates with the IF. See Fig. 6. But note how widely spread the highly cited papers are across journals. For example, papers with $c^* \ge 50$ appear in many journals of small-to-moderate IF, $0.5 < f < 2.5$.
10. Note the parallel lines of negative slope at the bottom left corner of Fig. 3. All these lines have slope equal to $-1$ in a log-log plot of $\Delta f(c^*)$ vs. $N_{2Y}$, a feature that is readily explained from Eq. (4), whence $\Delta f(c^*) \sim (c^* - f^*)/N_{2Y}$ (since $N_{2Y} \gg 1$ usually). The offset of the parallel lines is equal to $\log(c^* - f^*)$, which for $c^* \gg f^*$ is roughly equal to $\log(c^*)$. Therefore, the $\Delta f(c^*)$ data points for all journals whose highest cited paper was cited $c^*$ times must fall on the same line, irrespective of their IF, as long as $c^* \gg f^*$. The parallel lines are therefore simply lines of increasing $c^*$ value, starting from $c^* = 1, 2, 3$, etc., as we move from the bottom left to the top right of the figure. When the inequality $c^* \gg f^*$ no longer holds, a broadening of the parallel lines occurs and they overlap, exactly as we see in Fig. 3. Because of the highly skewed citation distribution of papers, the parallel lines become less populated as $c^*$ increases, i.e., for higher values of $\Delta f(c^*)$.
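The slope of the parallel lines in finding 10 can be checked with a short calculation. The sketch below takes the large-size form of the volatility, $(c^* - f^*)/N_{2Y}$, and measures the slope of its logarithm against the logarithm of journal size between two sizes. The function names and all numerical inputs ($c^* = 5$, two values of $f^*$, sizes 20 and 400) are hypothetical.

```python
import math

def volatility_top_paper(c_star, f_star, n_2y):
    """Large-size form of the volatility, (c* - f*) / N_2Y."""
    return (c_star - f_star) / n_2y

def log_log_slope(c_star, f_star, n_a, n_b):
    """Slope of log10(volatility) vs. log10(N_2Y) between sizes n_a and n_b."""
    dy = math.log10(volatility_top_paper(c_star, f_star, n_b)) - math.log10(
        volatility_top_paper(c_star, f_star, n_a)
    )
    dx = math.log10(n_b) - math.log10(n_a)
    return dy / dx

# Hypothetical journals whose top paper was cited c* = 5 times.
for f_star in (0.1, 0.5):
    slope = log_log_slope(5, f_star, 20, 400)
    offset = math.log10(5 - f_star)  # intercept of the line at N_2Y = 1
    print(f"f*={f_star}  slope={slope:.6f}  offset={offset:.3f}")
```

The slope comes out as $-1$ regardless of $f^*$, while the offset shifts slightly with $f^*$; this small shift is the broadening of the lines described above once $c^* \gg f^*$ stops holding.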

Conclusions
The above findings corroborate our earlier conclusion (Antonoyiannakis, 2018) that IFs are scale dependent in that they are particularly volatile for small journal sizes, as explained by the Central Limit Theorem. This point is pertinent for real journals because 90% of all journals are small, publishing no more than 250 citable items annually.
Compared to large journals, small journals have (a) much more to gain by publishing a highly cited paper, and (b) more to lose by publishing a little-cited paper. The penalty for a zero-cited paper can be easily exceeded by the reward of a highly cited paper. So, in terms of IF, it pays for a small journal to "fine-tune" its risk level: publish a few potentially groundbreaking papers, but not too many. This upper limit to how many risky papers an elite, high-IF journal can publish before it begins to compromise its IF imparts a conservative mindset to the editor: Reject all but a few of the intellectually risky and innovative submissions, and the journal's IF can still benefit massively if some of them pay off. Such an ulterior motive, where the editor is conscious of the journal's size while assessing an individual paper at hand, makes it even harder for transformative papers to appear in elite journals.
The reliability of IF rankings (and citation averages in general) is compromised by the high IF volatility to a handful of papers, observed for hundreds (if not thousands) of journals each year. Three examples: (a) In 2017, the top cited paper of 381 journals raised their IF by more than 0.5; (b) 818 journals had their IF boosted by more than 25% by their top cited paper; and (c) one in ten journals (1292 journals) had their IF boosted by more than 50% by their top three cited papers.
So, the volatility of IFs is not of academic but of practical interest. It is not an exclusive feature of a few journals or a statistical anomaly that we can casually brush off, but an everyday feature inherent in citation averages, affecting thousands of journals each year. It casts serious doubt on the suitability of the IF as a journal defining quantity, and on the merits of ranking journals by IF. And it is a direct consequence of the Central Limit Theorem.
It is therefore prudent to consider novel ways of comparing journals based on more solid statistical grounds. The implications may reach much further than producing ranked lists aimed at librarians (which was the initial objective of Eugene Garfield when he proposed the IF) and affect research assessment and the careers of scientists.

What to do?
Many alternatives to the IF have been proposed to date. Here we share our own recommendations for how to remedy the problem, along three lines of thought.
1. Use citation medians in place of citation averages.
A median shows the mid-point or "center" of the distribution. When statisticians wish to describe the typical value of a skewed distribution, they normally report the median (De Veaux, 2014), together with the interquartile distance (the distance between the 1st and 3rd quartile) as a measure of the spread. Note that citation distributions are typically highly skewed, which makes the use of medians more suitable for their description. Citation medians are far less sensitive to outlier papers and much less susceptible to gaming than citation averages (IFs). On a practical note, as of 2017, the JCR of Clarivate Analytics list the citation median per article type (research article and review article) for each journal, facilitating the wide dissemination of medians. (Cautionary note: On more occasions than we would have liked, the JCR citation medians contained errors in article type that needed correction before use.)
2. Use standardized averages to remove the scale dependence from 'bare' citation averages.
A 'bare' average is prone to fluctuations from outliers, but the Central Limit Theorem allows us to standardize it and remove the scale dependence. So, instead of the 'bare' citation average (or IF), $f$, we have proposed (Antonoyiannakis, 2018) the standardized average, or $\Phi$ index:
$$\Phi = \frac{f - \mu_s}{\sigma_s / \sqrt{N_{2Y}}}, \tag{9}$$
where $\mu_s$, $\sigma_s$ are the global average and standard deviation of the citation distribution of all papers in the subject of the journal in question. The quantities $\mu_s$ and $\sigma_s$ need to be found for each research subject before a journal's $\Phi$ index can be calculated. For example, if we were to treat, for simplicity, all 3,088,511 papers published in all journals in 2015-2016 as belonging to a single subject, then $\mu = 2.92$, $\sigma = 8.12$, and we can use Eq. (9) to standardize the citation average of any journal. Here, $f$ and $N_{2Y}$ are the journal's citation average (IF) and biennial size, as usual. The $\Phi$ index is readily applicable to all citation averages, for instance in university rankings. More details will be provided in a forthcoming publication.
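To illustrate how standardization removes the size advantage, here is a minimal sketch. It assumes the standardized average takes the usual Central Limit Theorem z-score form, i.e., the journal's average minus the global mean, divided by the global standard deviation over the square root of the biennial size. The function name `phi_index` and the two example journals are hypothetical; the defaults 2.92 and 8.12 are the single-subject values quoted in the text.

```python
import math

MU, SIGMA = 2.92, 8.12  # single-subject global mean and std. dev. quoted in the text

def phi_index(f, n_2y, mu=MU, sigma=SIGMA):
    """Standardized citation average, assuming the z-score form
    (f - mu) / (sigma / sqrt(N_2Y))."""
    return (f - mu) / (sigma / math.sqrt(n_2y))

# Two hypothetical journals with the same citation average but different sizes.
for n_2y in (100, 10_000):
    print(f"N_2Y={n_2y:>6}  f=5.0  phi={phi_index(5.0, n_2y):.2f}")
```

Under this assumed form, a large journal that sustains a given above-average IF scores higher than a small journal with the same IF, because the same average is statistically harder to reach at a larger size.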
3. Resist the one-size-fits-all mindset, i.e., the limitations of a single metric. Think of scholarly journals as distributions of widely varying papers, and describe them as such. In line with this thinking, Larivière et al. (2016) suggested that journals display their full citation distribution, a recommendation adopted by several publishers so far. In a welcome development, the Clarivate Analytics JCR now display citation distributions for all journals that receive an Impact Factor. However, plots of citation distributions can be overwhelming in practice (too much information) and do not allow easy comparison across journals. So, again we turn to statistical practice and ask how statisticians describe distributions. Typically, they use a 5-number summary of various percentiles, which is graphically displayed as a box plot and includes outlier information (De Veaux, 2014; Krzywinski & Altman, 2014; Spitzer et al., 2014). We believe that use of box plots and, more generally, percentiles, leads to responsible, informative, and practical comparisons of citation impact across journals and other collections of papers.
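A 5-number summary is straightforward to compute with the Python standard library. The sketch below is illustrative only: the function name and both citation lists are made up, and the "inclusive" quartile method of `statistics.quantiles` is one of several common conventions, chosen here as an assumption. The second list differs from the first only in its top-cited paper, which shows how the average moves while the median and quartiles stay put.

```python
import statistics

def five_number_summary(citations):
    """Return (min, Q1, median, Q3, max) for a list of citation counts."""
    q1, median, q3 = statistics.quantiles(citations, n=4, method="inclusive")
    return min(citations), q1, median, q3, max(citations)

# Made-up journals that differ only in their single top-cited paper.
base = [0, 0, 1, 1, 2, 3, 4, 40]
boosted = [0, 0, 1, 1, 2, 3, 4, 400]

for label, counts in (("base", base), ("boosted", boosted)):
    print(label, "mean =", statistics.mean(counts),
          "summary =", five_number_summary(counts))
```

Only the mean and the maximum change between the two lists; the quartiles and the median are identical, which is the robustness to outliers that motivates percentile-based descriptions.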

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Disclaimer
The author is an Associate Editor & Bibliostatistics Analyst at the American Physical Society. These opinions are his own.

Note
This is an extended version of an article (Antonoyiannakis, 2019) presented at the 17th International Conference of the International Society for Scientometrics and Informetrics, Rome, Italy.