In my previous post, I demonstrated how the citation distribution for a given journal could be fit to a function defined as the difference of two exponentials. This function is characterized by two parameters, k_{1} and k_{2}. I showed how these two parameters could be used to derive the journal impact factor (JIF). However, for a variety of purposes, it will be useful to generate an approximate citation distribution given the JIF value, that is, to solve the inverse problem. This should be possible if we can discern relationships between the JIF value and the parameters k_{1} and k_{2}.

### The relationships between k_{1} and k_{2} and JIF

First, let us plot k_{1} values versus the JIF values calculated from the fits to the observed distributions. I chose to use the calculated JIF values rather than the actual JIF values because the latter are distorted by the small number of papers with more than 100 citations, which lie off the fitted distribution.

As might be anticipated, the relationship between k_{1} and the JIF value is approximately exponential. This can be confirmed by plotting k_{1} versus the logarithm of the JIF value. The results can be fit to a line.

Similarly, k_{2} is approximately linear in the logarithm of the JIF value, although there is somewhat more scatter.

Using these two linear fits, we can estimate the values of k_{1} and k_{2} given a value for the JIF.
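The fitting and estimation steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the natural logarithm is assumed (the base is not specified here), the function names are hypothetical, and the numbers in the usage example are synthetic rather than the fitted coefficients from the journal data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def fit_k_vs_log_jif(jifs, ks):
    """Fit k = intercept + slope * ln(JIF); natural log is an assumption."""
    return fit_line([math.log(j) for j in jifs], ks)

def estimate_k(jif, intercept, slope):
    """Estimate a k parameter for a given JIF from a fitted line."""
    return intercept + slope * math.log(jif)

# Synthetic example data (NOT taken from the journal data set).
example_jifs = [2.0, 4.0, 8.0]
example_k1 = [0.5 - 0.1 * math.log(j) for j in example_jifs]

intercept, slope = fit_k_vs_log_jif(example_jifs, example_k1)
k1_estimate = estimate_k(6.0, intercept, slope)
```

The same two functions would be applied separately to the k_{1} and k_{2} values, giving one intercept and slope for each parameter.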

To ensure that the deduced values of k_{1} and k_{2} reproduce the input JIF exactly when substituted into the formula from the previous post, we can shift both values by the same constant. That is, we need to find delta(k) such that, with k_{1}' = k_{1} + delta(k) and k_{2}' = k_{2} + delta(k), JIF = (k_{1}' + k_{2}')/(k_{1}' k_{2}'). The derivation of the optimal value of delta(k) is shown in the mathematical appendix (see note below). For JIF values from 2 to 10, this correction averages 24% for k_{1} and 4% for k_{2}.
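One way to obtain such a shift, which is not necessarily the route taken in the appendix, is to substitute the shifted values into the JIF formula. Writing d for delta(k), this gives a quadratic: JIF d^2 + (JIF (k_{1} + k_{2}) - 2) d + JIF k_{1} k_{2} - (k_{1} + k_{2}) = 0. Its discriminant simplifies to JIF^2 (k_{1} - k_{2})^2 + 4, which is always positive, and the root taken with the positive square root keeps both shifted parameters positive. The sketch below implements that root; the function names and the example numbers are illustrative assumptions.

```python
import math

def jif_from_k(k1, k2):
    """JIF implied by the two parameters: (k1 + k2) / (k1 * k2)."""
    return (k1 + k2) / (k1 * k2)

def delta_k(k1, k2, jif):
    """Constant shift d such that jif_from_k(k1 + d, k2 + d) equals jif.

    Solves jif*d**2 + (jif*(k1 + k2) - 2)*d + jif*k1*k2 - (k1 + k2) = 0
    and returns the root that keeps both shifted parameters positive.
    """
    discriminant = (jif * (k1 - k2)) ** 2 + 4.0
    return (2.0 - jif * (k1 + k2) + math.sqrt(discriminant)) / (2.0 * jif)

def adjust_k(k1, k2, jif):
    """Return the shifted pair (k1 + d, k2 + d)."""
    d = delta_k(k1, k2, jif)
    return k1 + d, k2 + d

# Hypothetical estimates from the linear fits, with a target JIF of 9.0.
k1_adj, k2_adj = adjust_k(0.15, 0.6, 9.0)
jif_check = jif_from_k(k1_adj, k2_adj)
```

If the estimated parameters already reproduce the target JIF, the shift returned by `delta_k` is zero.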

### Estimating a citation distribution from a JIF value

Consider the case of *EMBO Journal*, which has a reported JIF of 9.6 and a JIF calculated from the fit values of k_{1} and k_{2} of 9.0. The observed distribution is compared with the distribution predicted from the calculated JIF value below:

The agreement appears reasonable. Similar results are observed for other journals (not shown). Thus, we have developed a tool with which we can estimate the distribution of citations given only a JIF.
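As a rough illustration of how a predicted distribution can be generated once k_{1} and k_{2} are in hand, the sketch below assumes the difference of two exponentials is normalized to unit area, p(c) = k_{1} k_{2} (exp(-k_{1} c) - exp(-k_{2} c))/(k_{2} - k_{1}), which is consistent with a mean of (k_{1} + k_{2})/(k_{1} k_{2}). The exact scaling used in the previous post may differ, and the binning into unit-wide citation intervals, the function names, and the example parameter values are all assumptions made for illustration.

```python
import math

def citation_density(c, k1, k2):
    """Assumed normalized difference of two exponentials (requires k1 != k2)."""
    return k1 * k2 / (k2 - k1) * (math.exp(-k1 * c) - math.exp(-k2 * c))

def citation_cdf(c, k1, k2):
    """Integral of citation_density from 0 to c."""
    return 1.0 - (k2 * math.exp(-k1 * c) - k1 * math.exp(-k2 * c)) / (k2 - k1)

def expected_counts(k1, k2, n_papers, max_citations):
    """Expected number of papers in each unit-wide bin [c, c + 1)."""
    return [
        n_papers * (citation_cdf(c + 1, k1, k2) - citation_cdf(c, k1, k2))
        for c in range(max_citations)
    ]

# Illustrative parameter values only (not the fitted EMBO Journal values).
counts = expected_counts(k1=0.2, k2=0.5, n_papers=1000, max_citations=100)
```

Plotting `counts` against the bin index gives a curve that can be overlaid on an observed histogram of citations per paper.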

### Next post

In my next post, I will use this tool to address a key question:

#### If you select one paper randomly from a distribution associated with a journal impact factor JIF_1 and another paper randomly from a distribution associated with a journal impact factor JIF_2, what is the probability that the first paper has more citations than the second paper?

### Available code and documents

The R Markdown file that generates this post including the R code is available. The data from Larivière *et al.* is provided as a .csv file. A mathematical appendix showing the derivation of a key formula is also available.
