In my previous post, I demonstrated how the citation distribution for a given journal could be fit to a function defined as the difference of two exponentials. This function is characterized by two parameters, k1 and k2. I showed how these two parameters could be used to derive the journal impact factor (JIF). However, for a variety of purposes, it will be useful to generate an approximate citation distribution given the JIF value, that is, to solve the inverse problem. This should be possible if we can discern relationships between the JIF value and the parameters k1 and k2.
The relationships between k1 and k2 and JIF
First, let us plot k1 values versus the JIF values calculated from the fits to the observed distributions. I chose to use the calculated JIF values rather than the actual JIF values because the latter are distorted by the small number of papers with more than 100 citations, which lie off the fitted distribution.
As might be anticipated, the relationship between k1 and the JIF value is approximately exponential. This can be confirmed by plotting k1 versus the logarithm of the JIF value. The results can be fit to a line.
Similarly, k2 is approximately linear in the logarithm of the JIF value, although there is somewhat more scatter.
Using these two linear fits, we can estimate the values of k1 and k2 given a value for the JIF.
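As a rough illustration of this step, the Python sketch below fits a straight line to k against log(JIF) by ordinary least squares and then uses the fitted coefficients to estimate k for a new JIF. The original analysis was done in R; the function names and the choice of the natural logarithm are assumptions made for illustration, and no fitted coefficients from the post are reproduced here.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_k_vs_log_jif(jifs, ks):
    """Fit k = intercept + slope * ln(JIF) (natural log is an assumption)."""
    return fit_line([math.log(j) for j in jifs], ks)


def estimate_k(jif, intercept, slope):
    """Estimate k1 or k2 for a given JIF from previously fitted coefficients."""
    return intercept + slope * math.log(jif)
```

The same pair of functions would be called twice, once with the fitted k1 values and once with the fitted k2 values, to obtain the two sets of coefficients.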
To ensure that the estimated values of k1 and k2 reproduce the input JIF value through the formula from the previous post, we can shift both values by the same constant. That is, we need to find delta(k) such that k1’ = k1 + delta(k) and k2’ = k2 + delta(k) satisfy JIF = (k1’ + k2’)/(k1’ k2’). The derivation of the required value of delta(k) is shown in the mathematical appendix (see note below). For JIF values from 2 to 10, this correction averages 24% for k1 and 4% for k2.
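The correction can be worked out directly from the JIF formula above. Writing d for delta(k), the condition JIF = 1/(k1 + d) + 1/(k2 + d) rearranges to a quadratic in d: JIF*d^2 + (JIF*(k1 + k2) - 2)*d + JIF*k1*k2 - (k1 + k2) = 0. The root that keeps both shifted parameters positive is d = (2 - JIF*(k1 + k2) + sqrt(JIF^2*(k2 - k1)^2 + 4)) / (2*JIF). The Python sketch below implements this rearrangement. The appendix may present the derivation differently, and the function names are hypothetical.

```python
import math


def jif_from_k(k1, k2):
    """JIF implied by k1 and k2: (k1 + k2) / (k1 * k2)."""
    return (k1 + k2) / (k1 * k2)


def delta_k(k1, k2, jif):
    """Constant shift d such that jif_from_k(k1 + d, k2 + d) equals jif.

    Solves jif*d^2 + (jif*(k1 + k2) - 2)*d + jif*k1*k2 - (k1 + k2) = 0
    and returns the root for which both shifted parameters stay positive.
    """
    if jif <= 0:
        raise ValueError("jif must be positive")
    disc = (jif * (k2 - k1)) ** 2 + 4.0  # discriminant, always positive
    return (2.0 - jif * (k1 + k2) + math.sqrt(disc)) / (2.0 * jif)


def adjust_k(k1, k2, jif):
    """Return (k1', k2'), the shifted parameters consistent with jif."""
    d = delta_k(k1, k2, jif)
    return k1 + d, k2 + d
```

Because both parameters are shifted by the same amount, the difference k2’ - k1’ is unchanged by the correction.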
Estimating a citation distribution from a JIF value
Consider the case of EMBO Journal, which has a reported JIF of 9.6 and a JIF calculated from the fit values of k1 and k2 of 9.0. The observed distribution is compared with the distribution predicted from the calculated JIF value below:
The agreement appears reasonable. Similar results are observed for other journals (not shown). Thus, we have developed a tool with which we can estimate the distribution of citations given only a JIF.
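For readers who want to experiment with this step, here is a minimal Python sketch of turning a pair (k1, k2) into a predicted distribution. It assumes the difference-of-exponentials function is normalized as a continuous density, p(c) = k1*k2/(k2 - k1) * (exp(-k1*c) - exp(-k2*c)) with 0 < k1 < k2, whose mean equals (k1 + k2)/(k1 k2). The exact form and normalization used in the previous post may differ, and the function names and the binning into whole citation counts are hypothetical.

```python
import math


def citation_density(c, k1, k2):
    """Assumed density: normalized difference of two exponentials."""
    if not 0 < k1 < k2:
        raise ValueError("requires 0 < k1 < k2")
    if c < 0:
        return 0.0
    norm = k1 * k2 / (k2 - k1)
    return norm * (math.exp(-k1 * c) - math.exp(-k2 * c))


def citation_cdf(c, k1, k2):
    """Integral of citation_density from 0 to c."""
    if not 0 < k1 < k2:
        raise ValueError("requires 0 < k1 < k2")
    if c <= 0:
        return 0.0
    tail = (k2 * math.exp(-k1 * c) - k1 * math.exp(-k2 * c)) / (k2 - k1)
    return 1.0 - tail


def bin_fractions(k1, k2, max_citations=100):
    """Predicted fraction of papers in each interval [c, c + 1), c = 0..max_citations."""
    return [
        citation_cdf(c + 1, k1, k2) - citation_cdf(c, k1, k2)
        for c in range(max_citations + 1)
    ]
```

Integrating over unit-wide bins is only one way to compare a continuous curve with integer citation counts. Evaluating the density at each integer and renormalizing would be another.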
In my next post, I will use this tool to address a key question:
If you select one paper randomly from a distribution associated with a journal impact factor JIF_1 and another paper randomly from a distribution associated with a journal impact factor JIF_2, what is the probability that the first paper has more citations than the second paper?
Available code and documents
The R Markdown file that generates this post, including the R code, is available. The data from Larivière et al. are provided as a .csv file. A mathematical appendix showing the derivation of a key formula is also available.