Although our analysis focused on PIs from one institute funded in a single year, NIH Deputy Director for Extramural Research Mike Lauer and his colleagues have extended the analysis across NIH over a much longer period of time. This group recently posted their analysis, including all of the underlying data, on bioRxiv. Publicly sharing this data set is a very good practice that allows others to examine the results more thoroughly and to extend the analysis.

A key graph from this analysis, and one that has attracted much attention, is shown below:

The graph shows a curve fit to data for research productivity versus grant support using recently developed measures for these parameters. Annual grant support is measured by the Grant Support Index (GSI). This measure was developed as an alternative to funding level in dollars, in an attempt to account for the fact that some types of research are more expensive than others. The GSI assigns point values to each grant type, with 7 points for an R01 grant with a single PI, 5 points for a more limited R21 grant with a single PI, and so on. Research productivity is measured with the Relative Citation Ratio (RCR), a citation-based metric developed to correct for differences in citation behavior between fields. Both metrics are presented on logarithmic scales in this graph.

The most noteworthy aspect of this curve is that it rises with a steeper slope at lower values of GSI than it does at higher levels. This suggests that, on average, the increase in productivity associated with funding an additional grant to an already well-funded investigator would be less than that for providing a grant to an investigator with no funding or providing a second grant to an investigator with only a modest amount of funding. The separation between this observed curve and a hypothetical straight line (with productivity strictly proportional to research support) has been referred to as “unrealized productivity.”

Before delving further into this point, let us take advantage of the data that were made available to plot the relationship, with two changes. First, rather than just plotting the curve fit to the data, we show the data points themselves (for all 71,936 investigators used in the analysis). Second, we plot the data with linear rather than logarithmic scales to avoid any distortion associated with this transformation. The results are shown below, with the top graph showing all of the data points and the bottom graph enlarging the region that includes almost all investigators and showing a “spline” curve fit to these data along with a linear fit for comparison.

These plots reveal that the underlying data show a large amount of scatter, consistent with my earlier observations with the NIGMS-only data set as well as with the intuitive sense that laboratories with similar amounts of funding can vary substantially in their output. The curve fit to these data again reveals that the slope of the productivity versus grant support relationship decreases somewhat at higher levels of grant support.

With these observations in hand, we can now examine some expected results of proposed NIH policies. Suppose an investigator with an annual GSI of 28 (corresponding to four R01 grants) is reduced to an annual GSI of 21 (corresponding to three R01 grants) and that these resources are used to fund a previously unfunded investigator (to move to GSI = 7). According to the fit curve, the expected annual weighted RCR values are 9.0 for GSI = 28, 7.1 for GSI = 21, and 2.6 for GSI = 7. The anticipated change in annual weighted RCR is (–9.0 + 7.1 – 0 + 2.6) = 0.7. Thus, the transfer of funding is predicted to increase productivity (measured by weighted RCR). This appears to be one of the primary foundations for the proposed NIH policy.
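The arithmetic behind this estimate can be laid out explicitly; a minimal sketch using only the fit-curve values quoted above:

```python
# Expected annual weighted RCR values read from the fit curve (quoted above).
rcr_at_gsi = {28: 9.0, 21: 7.1, 7: 2.6, 0: 0.0}

# One investigator drops from GSI 28 to 21; a previously unfunded
# investigator rises from GSI 0 to 7.
net_change = (rcr_at_gsi[21] - rcr_at_gsi[28]) + (rcr_at_gsi[7] - rcr_at_gsi[0])
print(round(net_change, 1))  # 0.7
```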

This approach depends on the accuracy of the fitted curve in representing the behavior of the population, which, as noted, shows considerable scatter. An alternative method involves directly simulating the effects of the proposed policy on the population. For example, one can take the 968 investigators with annual GSI values over 21 and reduce them to annual GSI values of 21, scaling each investigator’s weighted RCR output in proportion to the reduction in annual GSI. The total number of annual GSI points over the threshold of 21 for these investigators is 4709. This corresponds to the ability to fund an additional 672 R01 grants. If these grants are distributed to previously unfunded investigators, the anticipated weighted RCR output can be estimated by choosing a random set of 672 investigators with annual GSI values near 7 (say 6 to 8). Because of this random element, the simulation can be repeated many times to generate a population of anticipated outcomes. This results in the distribution shown below, with an average increase in weighted RCR of 0.3.
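The structure of such a simulation can be sketched as follows. This is a minimal stand-in, assuming a synthetic population in place of the released 71,936-investigator data set; the cap of 21, the 7-point R01 cost, the proportional scaling of output, and the sampling of newly funded investigators' outputs from those with GSI near 7 follow the description above, while the population itself is invented for illustration.

```python
import random

random.seed(1)

# Synthetic stand-in population of (GSI, weighted RCR) pairs; the actual
# analysis used the released 71,936-investigator data set.
population = [(gsi, 0.35 * gsi * random.uniform(0.2, 1.8))
              for gsi in (random.choice(range(0, 43, 7)) for _ in range(5000))]

CAP, R01_POINTS = 21, 7

def simulate_once(pop):
    """One realization of the cap-and-redistribute policy."""
    freed_points, total_rcr = 0.0, 0.0
    for gsi, rcr in pop:
        if gsi > CAP:
            freed_points += gsi - CAP
            total_rcr += rcr * CAP / gsi      # scale output down with the cut
        else:
            total_rcr += rcr
    n_new = int(freed_points // R01_POINTS)   # additional R01s the cuts buy
    # Newly funded investigators: output drawn at random from those already
    # holding a single R01 (GSI near 7), as in the simulation described above.
    one_r01_outputs = [rcr for gsi, rcr in pop if gsi == R01_POINTS]
    total_rcr += sum(random.choice(one_r01_outputs) for _ in range(n_new))
    return total_rcr

baseline = sum(rcr for _, rcr in population)
changes = [simulate_once(population) - baseline for _ in range(200)]
mean_change = sum(changes) / len(changes)
```

Because each repetition draws a different random set of newly funded investigators, the collection of `changes` forms the distribution of anticipated outcomes described above.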

For most simulations, there is an increase in average weighted RCR, although the average increase is somewhat smaller than that anticipated from the analysis based on the fit curve alone (0.3 versus 0.7). There are several possible explanations for this difference, including limitations in the ability of the fit curve to capture the features of the highly scattered distribution and the approach used to model the reduction in output from the well-funded investigators.

The same simulation method can be applied to funding an additional 672 PIs with one R01 so that they each have two R01s by selecting 672 random PIs with an annual GSI of ~7 (6 to 8), removing them from the population, and adding 672 chosen from the population with an annual GSI of ~14 (13 to 15). The results are shown below, with an average increase in weighted RCR of 0.4.

These simulations appear to confirm that, on average, transferring funding from very well-funded PIs to less well-funded PIs may result in a small increase in weighted RCR output.

**Conclusions**

I strongly favor the examination of appropriate data such as these to guide policy development. Understanding the relationship between grant support and research output is, of course, one of the most fundamental questions for any funding agency, and the attempts by NIH to tackle this issue are laudable. However, as discussed above, the presentation of simple curves fit to the data masks the considerable variation in output for PIs at all levels of funding.

The development of policies based on a hard cap at a particular level of GSI seems to me to be problematic. Well-funded investigators always have substantial histories of research accomplishments. NIH program officers and advisory councils should have access to data about previous research accomplishments and productivity when making recommendations about potentially funding additional grants and should be encouraged to examine such data critically, even when the application under consideration has an outstanding peer review score. The opportunity costs of providing additional funding to an already well-funded PI at the expense of an early- or mid-career PI with less funding are considerable.

In addition, the use of a hard cap amplifies the importance of the details of how the GSI is calculated, with the selection of particular parameters potentially discouraging collaboration, training, and other desirable outputs, as has been the topic of ongoing discussions. It seems unwise to convert the highly nuanced information contained in lists of grant support, publications, and other outputs into points on a graph rather than empowering the trained scientists who serve as program officials and advisory council members to use their judgment to help fulfill the mission of the NIH.

Note that this is a preprint that has not been peer reviewed. Nonetheless, the result is straightforward, and the remarkably good fit of this potentially complicated data set to a simple function has several important implications. First, the fit enables a simple forecasting approach. In this case, the forecast is that there will be over 300,000 opioid overdose deaths in the 5-year period from 2016 to 2020 across the United States. Such a high number highlights the importance of developing effective strategies for addressing the epidemic and slowing its exponential growth. Second, the observation of deviations from exponential behavior such as the acceleration from 2002 to 2006 provides clues about the changes that might be driving the growth of the epidemic.

This new result once again illustrates the importance of data in public health, as discussed in my recent editorial.

The project focused on Research Reports, Research Articles, and Reviews published in 2015. The primary goal was to compare the percentage of women among authors of published papers with the corresponding percentage for papers that were submitted during the same period but were not selected for publication. A major challenge in performing these analyses is determining the gender of the authors in question because such data are not collected as part of the submission process. The genders of authors were assigned through individual Internet searches.

The project focused on the authors in the first position and the last position in each author list. The individuals who were listed first and last in the author list were classified (again, based on Internet searches) as to whether they appeared to be in established positions (faculty or similar positions), hereafter referred to as “senior authors,” or if they were graduate students, postdoctoral fellows, or in similar temporary roles, hereafter referred to as “junior authors.” For the purposes of this analysis, senior authors in the first or last position were included, as were junior authors in the first position. Using this approach, 862 senior authors and 471 junior authors were identified and used in the subsequent analysis.

For comparison, a group of manuscripts from 2015 was randomly chosen from those that had not been selected for publication, matching the balance of Research Reports, Research Articles, and Reviews in the published set. The genders and levels of seniority were determined through Internet searches as described above, resulting in 883 senior authors and 434 junior authors for use in the subsequent analysis.

Among the published papers, 24.8% (117 out of 471) of the junior authors were women. In the comparison group of manuscripts that were not selected for publication, 30.0% (130 out of 434) of the junior authors were women. Although this suggests a trend disfavoring women authors, the difference does not reach statistical significance (p = 0.086).

For the published papers, 16.8% (145 out of 862) of the senior authors were women, while in the comparison group the proportion was 14.7% (130 out of 883). If anything, this trend favors women authors, although the difference is quite modest and not statistically significant (p = 0.237).
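The text does not state which statistical test was used; a two-proportion z-test, one plausible choice sketched here, reproduces p values close to those quoted (small differences may reflect a continuity correction or a different test in the original analysis).

```python
from math import erf, sqrt

def two_proportion_p(x1, n1, x2, n2):
    """Two-sided two-proportion z-test, no continuity correction."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # 2 * (1 - Phi(|z|))

p_junior = two_proportion_p(117, 471, 130, 434)   # ~0.08 (post quotes 0.086)
p_senior = two_proportion_p(145, 862, 130, 883)   # ~0.23 (post quotes 0.237)
```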

These data can be divided into three components corresponding to Research Reports (a total of 987 authors in the published paper group and 963 authors in the comparison group), Research Articles (228 authors in the published paper group and 206 authors in the comparison group), and Reviews (118 authors in the published paper group and 148 authors in the comparison group). In each of these sets, the same trends are observed as for the authors overall. The results are summarized below:

This preliminary study reveals two major sets of findings. First, as discussed above, the data do not reveal that the review and editorial processes at *Science* introduce substantial gender disparities. Second, the percentages of women among authors submitting to and published in *Science* are relatively low, ~27% for junior authors and 16% for senior authors. To put these values in context, I examined data from the United States National Science Foundation regarding the percentages of women in faculty positions and enrolled in graduate school. In 2010, the percentages of women in all faculty positions were 21% in the physical sciences, 42% in the life sciences, and 39% in the social sciences, whereas the percentages of women in senior faculty positions (associate professor and above) were 16% in the physical sciences, 34% in the life sciences, and 33% in the social sciences. In 2011, the percentages of women enrolled in graduate school were 33% in the physical sciences, 57% in the biological sciences, and 60% in the social sciences.

For papers submitted to *Science*, we estimate that ~40% are in the physical sciences, 55% are in the biological sciences, and 5% are in the social sciences. Using these as weighting factors, the anticipated percentage of women in all faculty positions submitting to *Science* would be (0.40)(21%) + (0.55)(42%) + 0.05(39%) = 33%. Similarly, for women in senior faculty positions, the anticipated percentage of women is 27% and the anticipated percentage of women among graduate students is 47%. In all cases, the percentages of women who submitted to *Science* are lower than these estimates. The estimates certainly could be inaccurate given that they are based on many assumptions that could influence the results, including assumptions about career stage, the use of data from the United States only despite the international authorship of *Science*, differences in the institutions in the general and authorship pools, and so on. Refinement of these estimates may reveal the sources of some aspects of the gender disparity, which could help guide additional analyses and, eventually, policy suggestions.
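The weighting arithmetic above is easy to reproduce; a small sketch using the discipline shares and NSF percentages quoted in the text (the results may differ slightly from the quoted values because the inputs are rounded):

```python
# Estimated discipline shares of Science submissions (from the text).
weights = {"physical": 0.40, "biological": 0.55, "social": 0.05}

# NSF percentages of women quoted above (2010 faculty; 2011 graduate students).
all_faculty    = {"physical": 21, "biological": 42, "social": 39}
senior_faculty = {"physical": 16, "biological": 34, "social": 33}
grad_students  = {"physical": 33, "biological": 57, "social": 60}

def expected_pct(pcts):
    """Submission-weighted anticipated percentage of women."""
    return sum(weights[k] * pcts[k] for k in weights)

print(expected_pct(all_faculty))     # ~33.4  (post quotes 33%)
print(expected_pct(senior_faculty))  # ~26.8  (post quotes 27%)
print(expected_pct(grad_students))   # ~47.6  (post quotes 47%)
```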

**Constructing the initial corpus of abstracts from Science**

To build the corpus, abstracts for more than 2200 research papers from *Science* from 2013 through 2015 were assembled. The first task is to read in the data set and convert the data into a corpus for analysis.

**Calculating similarities between all pairs of abstracts**

The next step is to calculate the similarity matrix. This can be done using TF-IDF (term frequency–inverse document frequency) weighting, which weights terms that occur rarely in the corpus more highly than common terms. The similarity index is the so-called “cosine similarity,” which will need to be converted to a distance metric in a subsequent step.

With this similarity matrix in hand, we calculate distances using the formula distance = 2*arccos(similarity)/pi.

**Representing the relative distances in two dimensions**

Finally, these distances are projected to two dimensions using this method based on multidimensional scaling, also known as principal coordinate analysis.
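The projection itself is classical multidimensional scaling (principal coordinate analysis). The original post used R; a NumPy sketch of the technique, applied to a hypothetical example (distances among the corners of a unit square, which the embedding recovers exactly):

```python
import numpy as np

def pcoa(dist, k=2):
    """Classical MDS / principal coordinate analysis: embed points in k
    dimensions so that Euclidean distances approximate `dist`."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]         # k largest eigenvalues
    return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))

# Example: pairwise distances among the corners of a unit square.
points = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
coords = pcoa(dist)   # recovers the square up to rotation/reflection
```

Because the embedding is defined only up to rotation and reflection, the orientation of the resulting plot is arbitrary, as noted below for the abstract maps.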

The results can then be plotted.

This plot reveals an interesting three-pointed structure. Note that only the shape of this figure is meaningful; the orientation is arbitrary. Examination of the abstracts at the three points reveals that they correspond to the biomedical sciences, the physical sciences, and the Earth sciences.

**Extending the analysis to the other Science family journals**

With this framework in place, we can now expand the corpus to include papers published in the other *Science* family journals. For this purpose, we will use most of the papers published in *Science Advances*, *Science Signaling*, *Science Translational Medicine*, *Science Immunology*, and *Science Robotics* in 2016.

We now plot each journal separately.

The orientations of these figures differ slightly from that of the plot produced by the initial *Science*-only corpus but, as noted above, the orientation is arbitrary. Several points emerge from examining these plots. First, the breadth of disciplines covered by *Science Advances* is essentially the same as that covered by *Science*. Comparison with more papers from *Science Advances* may reveal differences in emphasis between these two broad journals. Second, the papers from *Science Signaling*, *Science Translational Medicine*, and *Science Immunology* lie in the same general region in the biomedical arm of the plot. More detailed analysis should reveal more nuanced differences between the content of these journals.

This analysis represents a first step toward using these tools for unbiased analysis of the contents of the *Science* family of journals. More refined analysis is in progress.

**Additional documents and code**

The abstracts used in this analysis are available in six .csv files. The R Markdown file that generates this post, including the analysis, is also available.

We now turn to the second component, a model for the number of grant applications submitted and reviewed. The number of NIH research project grant applications reviewed each year from 1990 to 2015 is plotted below:

This curve is somewhat reminiscent of the curve for the NIH appropriation as a function of time shown in an earlier post. The drop in the number of applications that occurs in 2008–2009 is an artifact due to the effects of the American Recovery and Reinvestment Act (ARRA). The funding associated with the ARRA was not included in the appropriations data, and applications that were considered for ARRA funding were also removed.

The grant application number and appropriation curves are compared directly below, plotted as fractional changes since 1990 to facilitate comparison.

The curves are similar in shape, although the increase in the NIH appropriation curve is approximately twice as large as that in the grant application number curve. The curves, normalized so that they have the same overall height, are compared below.

Examination of the curves reveals that the grant application number curve is shifted to later years by ~2 years compared with the NIH appropriation curve. This makes mechanistic sense in that a relatively large increase in the NIH appropriation might cause institutions to hire more faculty who then apply for grants and might cause individual investigators to submit more applications. However, these responses do not take place instantaneously but require a year or more for the applications to be written and submitted.

A linear model can now be fit to predict the grant application number curve as a linear combination of the appropriation curves shifted by 1 and 2 years, including a constant term.

The number of grant applications can be calculated from the appropriation curves as m₁ × (appropriation, offset by 1 year) + m₂ × (appropriation, offset by 2 years) + b, where m₁ = –0.18, m₂ = 0.61, and b = 0.57.
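This lagged linear model can be sketched directly. The input here is hypothetical (the real curves are the fractional-change-since-1990 series plotted above), and padding the first two years with the initial value is an assumption for illustration:

```python
import numpy as np

m1, m2, b = -0.18, 0.61, 0.57   # fitted coefficients quoted above

def predict_applications(frac_approp):
    """Predict the grant application curve (fractional change since 1990)
    from the appropriation curve offset by 1 and 2 years."""
    a = np.asarray(frac_approp, dtype=float)
    lag1 = np.concatenate(([a[0]], a[:-1]))          # 1-year offset
    lag2 = np.concatenate(([a[0], a[0]], a[:-2]))    # 2-year offset
    return m1 * lag1 + m2 * lag2 + b

# Sanity check: a flat appropriation curve predicts a flat application
# curve, since -0.18 + 0.61 + 0.57 = 1.0.
flat = predict_applications(np.ones(10))
```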

The agreement is reasonable. The major differences occur in years 2008–2009 due to the impact of ARRA noted above. The overall Pearson correlation coefficient is 0.983.

A model has been developed that allows the prediction of the number of NIH grant applications from the appropriations history. This model can be used in conjunction with the previously described model for the number of grants awarded to predict grant success rates, for actual appropriation histories or for hypothetical ones.

The grant application number model was developed empirically, based on observed similarities between the grant application number curve and the appropriation curve. Although it is not truly mechanism-based, the model is consistent with a simple mechanistic interpretation as noted. It is interesting that grant application numbers increased more or less monotonically. Thus, it would have been difficult to develop a model from the inflation-corrected appropriation curve because this peaked in 2003 and has fallen almost every year since. This raises an interesting point: application numbers have risen both when the appropriation grew faster than inflation and when it grew more slowly. This could be interpreted in terms of two dynamic drivers. When the appropriation increases by more than inflation, institutions and investigators sense opportunity and submit more applications; when the appropriation increases by less than inflation, institutions and investigators sense tough times with lower success rates and submit more applications to increase their chances of competing successfully for funding.

It will be interesting to compare how well this empirical model does in predicting grant application numbers in future years.

An R Markdown file that generates this post, including the R code, is available.

To begin, I examine the appropriations history for NIH from 1990 to the present. NIH appropriations from 1990 to 2015 are shown below:

For comparison, success rates for grants (RPGs) are shown below:

These two parameters are negatively correlated with a correlation coefficient of -0.66. In other words, as the size of the appropriation increased, the success rate tended to decrease.

One possible adjustment that might improve the correlation involves correcting the appropriation data for inflation. Inflation is best measured in terms of the Biomedical Research and Development Price Index (BRDPI), a parameter calculated annually by the Department of Commerce on behalf of the NIH.

The NIH appropriation curves in nominal terms and in constant 1990 dollars are plotted below:

The constant dollar appropriations and success rate data are still negatively correlated with a correlation coefficient of -0.381.

Thus, the simple notion that the success rate should increase with increases in the NIH appropriation is empirically false over time.

There are two reasons why this is true. The first involves the manner in which NIH grants are funded. Grants average 4 years in duration and are almost always paid out in 4 consecutive fiscal years. Thus, if a 4-year grant is funded in a given fiscal year, the NIH is committed to paying the “out-years” for this grant over the next 3 fiscal years. Because of this, ~75% (in practice, more than 80% owing to other commitments) of the NIH appropriation for a given year is already committed to ongoing projects, and less than 20% of the appropriation is available for new and competing projects. This makes the size of the pool for new and competing projects very sensitive to the year-to-year change in the appropriation level.

The observed numbers of new and competing grants are plotted below:

To put these effects in quantitative terms, a model has been developed to estimate the number of grants funded each year, given NIH appropriation and BRDPI data over time.

The assumptions used in building the model are:

- NIH funds grants with an average length of 4.0 years.

For the purposes of this model, we will assume that 1/4 of the grants have a duration of 3 years, 1/2 have a duration of 4 years, and 1/4 have a duration of 5 years. Using a single pool of grants, all with 4-year durations, would be both contrary to fact and likely to introduce artifacts; it is unlikely, however, that the model depends significantly on the details of the distribution. When a grant completes its last year, the funds are freed up to fund new and competing grants in the next year.

- The average grant size increases according to the BRDPI on a year-to-year basis.

This assumption has been less true in recent years owing to highly constrained NIH budgets, but it is a reasonable approximation (and still represents good practice).

- Fifty percent of the overall NIH appropriation each year is invested in RPGs. This is consistent with the average percentage of RPG investments over time.
- The system begins with an equal distribution of grants at each stage (first, second, … year of a multiyear grant) ~10 years before the portion used for analysis.
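These assumptions can be sketched as a minimal simulation. The original model was built in R on actual NIH appropriation and BRDPI data; this Python stand-in uses a flat synthetic budget with zero inflation, and holding out-year payments at the award-year grant size is a further simplification:

```python
def simulate_grants(appropriations, inflation, grant_size0=1.0, rpg_share=0.5):
    """Number of new and competing grants fundable each year.

    Durations of 3, 4, and 5 years (weights 1/4, 1/2, 1/4) are modeled
    deterministically as fractional grants; each award commits payments
    for its out-years, which are then unavailable in later years."""
    size = grant_size0
    committed = [0.0] * 5        # committed[k]: funds promised k years ahead
    counts = []
    for approp, infl in zip(appropriations, inflation):
        size *= 1 + infl                          # grant size tracks BRDPI
        pool = rpg_share * approp - committed[0]  # funds free this year
        n_new = max(pool, 0.0) / size
        counts.append(n_new)
        committed = committed[1:] + [0.0]         # advance one year
        for dur, frac in ((3, 0.25), (4, 0.5), (5, 0.25)):
            for k in range(1, dur):               # commit the out-years
                committed[k - 1] += n_new * frac * size
    return counts

# Flat budget, no inflation: after a burn-in period, new awards settle
# around the steady state of rpg_share / (average duration) = 0.5 / 4.
counts = simulate_grants([1.0] * 60, [0.0] * 60)
```

Starting from an empty portfolio produces large transient swings in the first years, which is exactly why the model is run for ~10 years before the period used for analysis.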

We will start in 1990. The comparison between the actual numbers of grants funded and those predicted by the model is shown below:

The agreement between the observed and predicted curves is remarkable. The correlation coefficient is 0.894.

The largest difference between the curves occurs at the beginning of the doubling period (1998–2003), where the model predicts a large increase in the number of grants that was not observed. This is because NIH initiated a number of larger non–RPG-based programs when substantial new funding was available rather than simply funding more RPGs (although it did this to some extent). For example, in 1998, NIH invested $17 million through the Specialized Center–Cooperative Agreements (U54) mechanism. This grew to $146 million in 1999, $188 million in 2000, $298 million in 2001, $336 million in 2002, and $396 million in 2003. Note that the change each year matters for the number of new and competing grants that can be made because, for a given year, it does not matter whether funds have been previously committed to RPGs or to other mechanisms.

The second substantial difference occurs in 2013, when the budget sequestration led to a substantial drop in the NIH appropriation. To avoid having the number of RPGs that could be funded drop too precipitously, NIH cut noncompeting grants substantially. Noncompeting grants are grants for which commitments have been made and for which continued funding depends only on the submission of an acceptable progress report. The average size [in terms of total costs, that is, direct costs plus indirect (facilities and administration) costs] of a noncompeting R01 grant was $393,000 in 2011, grew to $405,000 in 2012, a 2.9% increase, and then dropped to $392,000 in 2013, a 3.3% decrease. Given that there are approximately three times as many noncompeting grants as there are new and competing grants, this swing from a 2.9% increase to a 3.3% decrease for noncompeting grants increased the pool of funds for new and competing grants by ~3(2.9 + 3.3) = 18.6%. However, cutting noncompeting grants means that existing programs with research underway and staff in place had to find ways to deal with unexpected budget cuts.
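The back-of-the-envelope step can be made explicit (percentage values as quoted above; this is an approximation in percentage points, not an exact budget calculation):

```python
# Swing in the year-over-year change of noncompeting award size:
# from a 2.9% increase (2012) to a 3.3% decrease (2013).
swing = 2.9 + 3.3                      # 6.2 percentage points
noncompeting_per_competing = 3         # ~3x as many noncompeting grants
freed_pool_pct = noncompeting_per_competing * swing
print(round(freed_pool_pct, 1))        # 18.6
```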

At this point, I have developed a reasonable model for estimating the number of new and competing awards that can be made given annual appropriation and BRDPI data. In the next post, I will examine a model for the number of grant applications submitted each year.

The R Markdown file that generates this post, including the R code, is available.

A substantial portion of fundamental scientific research is government-supported (see my editorial in *Science*). An important factor that affects the experiences of individual investigators and the efficiency of government funding programs is the likelihood that a given scientific application receives funding. Although this can seem to be a simple parameter, it depends on the systems properties of the scientific enterprise such as the fraction of appropriated funds available for new grants and the number of investigators competing for funding, both of which evolve over time.

Here I introduce a modeling approach that links historical data about funds appropriated to agencies for grants to the probabilities that grant applications are funded in a given year. I will initially focus on the US National Institutes of Health (NIH) because the relevant data are readily available and changes in the funds appropriated over time have varied substantially, introducing some important behavior that can, at first, seem hard to understand.

The levels of funds appropriated for NIH from 1990 to 2015 are shown below:

These data are in nominal dollars—that is, they are not corrected for the effects of inflation. The NIH budget was doubled from 1998 to 2003 in a coordinated effort by Congress, and these points are shown in red. Note that these data and the subsequent analysis exclude funds and applications associated with the American Recovery and Reinvestment Act, which affected budgets in 2009–2010.

One of the factors that has a great influence on the research and science policy communities, both directly and culturally, is the likelihood that a given grant proposal will be funded, usually measured in terms of the “success rate” or “funding rate.” The success rate is the number of new and competing grant applications that are awarded in one fiscal year divided by the total number of grant applications that were reviewed in that year.

Here the term “new and competing” grants refers to grants that have not been funded previously (new grants) and grants that have been funded for one or more multiyear terms but now are competing through full peer review prior to receiving additional funding (competing renewal grants).

Two major factors determine the success rate. The first is the amount of funding available for new and competing grants, as opposed to the overall annual appropriation. This, combined with the average grant size, determines the number of new and competing grants that can be awarded—that is, the numerator in the success rate calculation. The second is the number of grant applications that are submitted and reviewed in a given year. This is determined by the number of investigators that are submitting grant applications and the average number of applications submitted in a given year per investigator. This is the denominator of the success rate calculation.

Success rate data for Research Project Grants (RPGs) for NIH for 1990 to 2015 are shown below:

Note that the success rate fell dramatically immediately after the doubling and continued to fall for several additional years. This led to outcries from the research community and consternation in Congress, which had made funding biomedical research a high priority for a number of years.

Why did this dramatic drop in success rate occur? A major factor involves the manner in which NIH research project grants are funded. NIH grants average 4 years in duration and are almost always paid out in four consecutive fiscal years. Thus, if a 4-year grant is funded in a given fiscal year, the NIH is committed to paying the out-years for this grant over the next three fiscal years. Because of this, ~75% (in practice, closer to 80% or more because of other commitments) of the NIH appropriation for a given year is already committed to ongoing projects, and only ~20% of the appropriation is available for new and competing projects. This makes the size of the pool for new and competing projects very sensitive to the year-to-year change in the appropriation level.
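A small worked example of why the new-grant pool is so sensitive, assuming for simplicity that every grant lasts exactly 4 years and the budget has been flat:

```python
# Steady state with 4-year grants: each year's appropriation covers one new
# cohort plus the out-years of the three preceding cohorts.
duration = 4
committed_fraction = (duration - 1) / duration    # 0.75 of the budget
available_for_new = 1 - committed_fraction        # 0.25

# If the appropriation drops 5% while commitments stay fixed, the pool for
# new and competing grants falls fourfold faster.
budget_change = -0.05
new_pool_change = budget_change / available_for_new
print(new_pool_change)   # -0.2, i.e., a 20% cut in the new-grant pool
```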

Funds from grants that have ended are recycled to fund new and competing grants. This recycling is shown schematically below:

To put these effects in quantitative terms, I developed a model for the number of new and competing grants. This model will be described in detail in a subsequent post. Briefly, the model assumes that NIH funds grants with an average length of 4.0 years, with 1/4 of the grants having a duration of 3 years, 1/2 a duration of 4 years, and 1/4 a duration of 5 years, and that the average grant size increases annually according to the rate of biomedical research price inflation.

This model is combined with a model for the number of research project grant applications that are reviewed annually. The basis for this latter model is that the number of applications submitted rises in response to increases in the NIH appropriation with a lag of about 2 years. This model will also be described in a subsequent post.

The success rates predicted from the model are compared with the observed success rates below:

The agreement is reasonable, although certainly not perfect. The overall Pearson correlation coefficient is 0.866. However, the model does accurately predict the sharp drop in the success rate immediately following the doubling period. In addition, because the model assumes constant policies at NIH, the periods where the model results agree less well with the observed values suggest times when NIH did change policies in response to ongoing events. This will be explored in a subsequent blog post.

Several parameters can also be examined to characterize this and other funding scenarios. The first is the total amount of funds invested, both in nominal and constant dollars.

The total investment in nominal dollars was $557 billion.

The total investment in constant 1990 dollars was $334 billion.

The observed mean success rate was 0.248.

The mean success rate predicted from the model was 0.251.

The observed standard deviation in the success rate was 0.052.

The standard deviation in the success rate predicted from the model was 0.063.

Suppose that, instead of the doubling, Congress had committed to steady increases in the NIH appropriation beginning in 1998. To match the investment in constant 1990 dollars from the doubling and postdoubling era, this corresponds to annual increases of 7.55%.

We can now use the modeling tools that we have developed to estimate the consequences of such an appropriations strategy in terms of success rates and other parameters.
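The matching step, choosing a steady growth rate that reproduces the actual constant-dollar investment, can be illustrated with a simple bisection. The post's code is in R; this Python sketch uses a hypothetical appropriation series and a hypothetical constant inflation rate as stand-ins for the actual data.

```python
def constant_dollars(nominal, inflation):
    # Deflate year t back to year-0 dollars at a constant inflation rate.
    return [a / (1 + inflation) ** t for t, a in enumerate(nominal)]

def matching_growth_rate(actual, inflation, lo=0.0, hi=0.5, iters=100):
    """Growth rate g such that a series starting at actual[0] and growing
    by g per year matches the actual series' constant-dollar total.
    The total is monotonic in g, so bisection suffices."""
    target = sum(constant_dollars(actual, inflation))
    def total(g):
        series = [actual[0] * (1 + g) ** t for t in range(len(actual))]
        return sum(constant_dollars(series, inflation))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if total(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Sanity check: for a series that already grows steadily at 5% per year,
# the matched steady rate is recovered as 5%.
actual = [100.0 * 1.05 ** t for t in range(20)]
g = matching_growth_rate(actual, inflation=0.03)
print(round(g, 4))  # 0.05
```

Applied to the actual doubling-then-flat appropriation history, this procedure is the kind of calculation that yields the 7.55% figure quoted above.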

As might be anticipated, under the new scenario, the success rates vary much less dramatically. The standard deviation in the success rate predicted from the model for the new scenario was 0.022. This is smaller than the observed standard deviation by a factor of 2.4. Thus, the scenario with steady appropriation increases would substantially decrease the variability in, and hence the apparent capriciousness of, success rates.

The mean success rate predicted from the appropriation scenario with steady increases from 1998 on was 0.257. This is higher than the mean success rate based on the actual appropriations data by 2.6%.

Although this is a relatively modest change in mean success rate, it corresponds to a decrease in the number of unsuccessful applications from 702,000 under the actual scenario to 667,000 under the new scenario. Thus, the steady approach to funding would have reduced the number of unsuccessful applications by 35,000. With the conservative assumption that preparation of a grant application requires 1 month of work, this difference corresponds to the efforts of 111 investigators working full-time over the entire 26-year period.
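The effort estimate can be checked with back-of-envelope arithmetic. With the rounded 35,000 figure it comes out at roughly 112 full-time investigators; the post's 111 presumably reflects unrounded application counts.

```python
# Back-of-envelope check of the full-time-effort estimate in the text.
avoided_applications = 702_000 - 667_000      # 35,000 fewer applications
months_per_application = 1                    # conservative assumption
person_years = avoided_applications * months_per_application / 12
full_time_investigators = person_years / 26   # spread over the 26-year period
print(round(full_time_investigators))  # 112
```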

A modeling approach has been developed that allows estimation of NIH grant success rates given the history of appropriations. The model is used to demonstrate that alternatives to the “boom” in appropriations corresponding to the NIH budget doubling, followed by the “bust” of more than a decade of flat (or, when the effects of inflation are included, falling) appropriations, would have resulted in a 2.6% more efficient distribution of funds (measured by the number of applications needed to distribute the same amount of funds in constant dollars) and success rates less variable by a factor of 2.4. The model can be applied to other potential past or future appropriation scenarios, and the modeling approach can be applied to other agencies.

The next post will explore the component of the model focusing on the number of new and competing grants in more detail.

An R Markdown file that generates this post, including the code for the model, is available.

In the first post in this series, I demonstrated that normalized plots of the number of publications versus the number of citations could be fit to functions of the form:

P(c) = N(exp(-k_{1}c) – exp(-k_{2}c)),

where c is the number of citations, P(c) is the population of papers with c citations, and N is a normalization factor set so that the integrated total population is 1.
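This normalized distribution is easy to check numerically. The sketch below (Python; the post's own code is in R) verifies that the area under the curve is 1 and that the mean of the distribution reproduces JIF = 1/k_1 + 1/k_2, using the Science fit values (k_1 = 0.05, k_2 = 0.19) from the first post in the series.

```python
import math

def citation_pdf(c, k1, k2):
    """P(c) = N(exp(-k1*c) - exp(-k2*c)) with N = k1*k2/(k2 - k1)."""
    n = k1 * k2 / (k2 - k1)  # normalization so the total area is 1
    return n * (math.exp(-k1 * c) - math.exp(-k2 * c))

# Numerical check with the Science fit values (k1 = 0.05, k2 = 0.19):
# the area is ~1 and the mean is 1/k1 + 1/k2 ~ 25.26.
k1, k2 = 0.05, 0.19
dc = 0.01
grid = [i * dc for i in range(40000)]  # 0 .. 400 citations
area = sum(citation_pdf(c, k1, k2) for c in grid) * dc
mean = sum(c * citation_pdf(c, k1, k2) for c in grid) * dc
print(round(area, 3), round(mean, 2))  # 1.0 25.26
```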

Using the tool developed in the previous post, values for k_{1} and k_{2} can be estimated given only a value for the JIF.

We now turn to the question at hand. Suppose that a paper randomly selected from Journal_1 has x citations. The probability that this paper has more citations than a paper from Journal_2 is shown graphically below:

The blue shaded region represents the fraction of papers in Journal_2 that have x or fewer citations. Note that the citation distribution curves are presented as continuous functions for clarity (although in actuality, the numbers of citations must be integers). The use of continuous functions will also substantially simplify subsequent analysis and is unlikely to change any conclusions significantly.

In mathematical terms, the area of the blue shaded region is given by F(x) = ∫P_{2}(c)dc from 0 to x, where P_{2}(c) is the normalized citation curve for Journal_2, that is, P_{2}(c) = N_{2}(exp(-k_{12}c) – exp(-k_{22}c)) with k_{12} and k_{22} determined from JIF_2 as described in the previous post.

The integral can be readily solved analytically as shown in the mathematical appendix. In this way, it can be shown that the fraction of papers from Journal_2 with x or fewer citations is given by

F(x) = N_{2}((1/k_{12})(1 – exp(-k_{12}x)) – (1/k_{22})(1 – exp(-k_{22}x))).

This function is shown graphically below:

This fraction curve has the expected shape. For small values of x, only a small fraction of papers in the other journal have x or fewer citations. For larger values of x, this fraction increases. Finally, for the largest values of x, the fraction approaches 1.00, that is, almost all papers in Journal_2 have x or fewer citations.
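The closed form for F(x) can be spot-checked against direct numerical integration. This Python sketch uses illustrative k values (the post's own code is in R):

```python
import math

def pdf(c, ka, kb):
    n = ka * kb / (kb - ka)
    return n * (math.exp(-ka * c) - math.exp(-kb * c))

def cdf(x, ka, kb):
    """F(x): fraction of papers with x or fewer citations (closed form)."""
    n = ka * kb / (kb - ka)
    return n * ((1 - math.exp(-ka * x)) / ka - (1 - math.exp(-kb * x)) / kb)

# Compare the closed form against a direct numerical integral at one point,
# using illustrative rate constants.
ka, kb, x = 0.16, 0.65, 10.0
dc = 0.001
numeric = sum(pdf(i * dc, ka, kb) for i in range(int(x / dc))) * dc
print(abs(numeric - cdf(x, ka, kb)) < 1e-3)  # True
print(round(cdf(1000.0, ka, kb), 3))         # 1.0 (approaches 1 for large x)
```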

The probability that a paper with x citations is randomly chosen from Journal_1 is given by P_{1}(x) = N_{1}(exp(-k_{11}x) – exp(-k_{21}x)). Thus, to answer our question, we need only calculate the average of the fraction curve, F(x), weighted by the probability of different values of x. This is given by the integral Probability(JIF_1, JIF_2) = ∫P_{1}(x)F(x)dx from 0 to infinity.

The functions to be integrated, P_{1}(x)F(x), are plotted below for JIF_2 = 3 and JIF_1 = 1, 3, 5, 10, and 20.

The areas under these curves can be estimated. From the graph, the anticipated results are relatively clear. For JIF_1 = 1, the area is relatively small so that the probability is relatively low. For JIF_1 = 3, the area increases. Because JIF_1 = JIF_2 at this point, the area should be 0.50, that is, the probability that the number of citations from a paper from one journal is less than that for the other should be 50%. The areas continue to increase for JIF_1 = 5 through JIF_1 = 20, approaching 1.00.

The expression can be integrated analytically in a straightforward way (as shown in the mathematical appendix), although the algebra is a bit involved. Thus, the desired probability (the integrated value) is given by

Probability(JIF_1, JIF_2) = N_{1}N_{2}((1/k_{12})((1/k_{11}) – (1/k_{21}) – (1/(k_{11} + k_{12})) + (1/(k_{21} + k_{12}))) – (1/k_{22})((1/k_{11}) – (1/k_{21}) – (1/(k_{11} + k_{22})) + (1/(k_{21} + k_{22})))).

This function is plotted below for JIF_2 = 3 with JIF_1 values ranging from 1 to 30, with the values corresponding to the plot above indicated.

The curve represents the answer to our question for a journal with a JIF of 3. For example, a paper selected randomly from a journal with a JIF of 5 would be expected to have more citations than a randomly selected paper from a journal with a JIF of 3 only 65% of the time. This is only modestly different from the 50% expected if the outcome were completely random, illustrating the lack of justification for interpreting small (or even fairly large) differences in JIFs when judging individual papers.
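The closed-form probability can be verified numerically. The Python sketch below (the post's own code is in R, and the fit parameters here are illustrative values for two journals) checks the formula against direct integration, and confirms the expected result that two identical journals give a probability of exactly 0.5.

```python
import math

def norm(ka, kb):
    return ka * kb / (kb - ka)

def prob_closed(k11, k21, k12, k22):
    """Probability that a Journal_1 paper has more citations than a
    Journal_2 paper (closed form from the text)."""
    def term(k):
        return (1/k11 - 1/k21 - 1/(k11 + k) + 1/(k21 + k)) / k
    return norm(k11, k21) * norm(k12, k22) * (term(k12) - term(k22))

def prob_numeric(k11, k21, k12, k22, dx=0.01, xmax=300.0):
    """Same quantity by direct numerical integration of P1(x)*F(x)."""
    def pdf(c):
        return norm(k11, k21) * (math.exp(-k11 * c) - math.exp(-k21 * c))
    def cdf(x):
        n = norm(k12, k22)
        return n * ((1 - math.exp(-k12 * x)) / k12
                    - (1 - math.exp(-k22 * x)) / k22)
    return sum(pdf(i * dx) * cdf(i * dx) for i in range(int(xmax / dx))) * dx

j1 = (0.13, 0.66)   # illustrative journal, JIF = 1/k1 + 1/k2 ~ 9.2
j2 = (0.31, 2.0)    # illustrative journal, JIF ~ 3.7
print(abs(prob_closed(*j1, *j2) - prob_numeric(*j1, *j2)) < 1e-3)  # True
print(round(prob_closed(*j2, *j2), 6))  # 0.5 for identical journals
```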

To amplify this further, the analogous plot for JIF_2 = 10 is shown below:

This plot demonstrates that a paper randomly selected from a journal with JIF = 5 will have more citations than a randomly selected paper from a journal with JIF = 10 approximately 30% of the time. Similarly, a paper randomly selected from a journal with JIF = 10 will have fewer citations than a randomly selected paper from a journal with JIF = 20 approximately 75% of the time. These modest differences in probability for doubling JIFs highlight the folly of interpreting small differences in JIFs that are sometimes reported with three decimal places. From a scientific perspective, such false precision is utterly inappropriate. I hope that this analysis, along with the many other extant criticisms of the use and abuse of JIFs, will encourage scientists and administrators to use JIFs only in contexts for which they are appropriate.

The R Markdown file that generates this post including the R code is available. The parameters from the linear fits from the previous post are available as a .csv file. A mathematical appendix showing the derivation of key formulae is also available.

First, let us plot k_{1} values versus the JIF values calculated from the fits to the observed distributions. I chose to use the calculated JIF values rather than the actual JIF values because the latter are distorted by the small number of papers with more than 100 citations, which lie off the distribution.

As might be anticipated, the relationship between k_{1} and the JIF value is approximately exponential. This can be confirmed by plotting k_{1} versus the logarithm of the JIF value. The results can be fit to a line.

Similarly, k_{2} is also approximately related to the logarithm of the JIF value, although there is somewhat more scatter.

Using these two linear fits, we can estimate the values of k_{1} and k_{2} given a value for the JIF.

To ensure that the deduced values of k_{1} and k_{2} will generate the same value of the JIF using the formula from the previous post, we can adjust both values slightly by a constant offset. Thus, we seek delta(k) such that, with k_{1’} = k_{1} + delta(k) and k_{2’} = k_{2} + delta(k), JIF = (k_{1’} + k_{2’})/(k_{1’}k_{2’}). The derivation for the optimal value of delta(k) is shown in the mathematical appendix (see note below). For JIF values from 2 to 10, this correction averages 24% for k_{1} and 4% for k_{2}.
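This adjustment is easy to compute numerically. The post derives delta(k) analytically in its appendix; the Python sketch below instead finds it by bisection, using hypothetical regression estimates k_1 = 0.16 and k_2 = 0.40 and a reported JIF of 9.0.

```python
def jif_from_ks(k1, k2):
    # JIF = (k1 + k2)/(k1*k2) = 1/k1 + 1/k2, from the previous post.
    return 1 / k1 + 1 / k2

def delta_k(k1, k2, target_jif, iters=200):
    """Offset dk so that jif_from_ks(k1 + dk, k2 + dk) equals target_jif.
    JIF decreases as dk grows, so bisection on dk works."""
    lo, hi = -0.9 * k1, 10.0   # bracket keeping k1 + dk positive (assumed)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if jif_from_ks(k1 + mid, k2 + mid) > target_jif:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical example: regression estimates give JIF = 8.75, but the
# reported JIF is 9.0; shift both rate constants by the same offset.
dk = delta_k(0.16, 0.40, 9.0)
print(round(jif_from_ks(0.16 + dk, 0.40 + dk), 6))  # 9.0
```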

Consider the case of *EMBO Journal*, which has a reported JIF of 9.6 and a JIF calculated from the fit values of k_{1} and k_{2} of 9.0. The observed distribution is compared with the distribution predicted from the calculated JIF value below:

The agreement appears reasonable. Similar results are observed for other journals (not shown). Thus, we have developed a tool with which we can estimate the distribution of citations given only a JIF.

In my next post, I will use this tool to address a key question: how likely is it that a randomly selected paper from one journal has more citations than a randomly selected paper from another journal, given only the two journals' JIFs?

The R Markdown file that generates this post including the R code is available. The data from Larivière *et al.* is provided as a .csv file. A mathematical appendix showing the derivation of a key formula is also available.

Welcome to my new blog at *Science*. I began blogging when I was Director of the National Institute of General Medical Sciences (NIGMS) at the US National Institutes of Health (NIH). Our blog was called the *NIGMS Feedback Loop*. I found this to be a very effective way of sharing information and data with NIGMS stakeholders. A couple of years after leaving NIGMS, I started a new blog called *Datahound*. There, I have continued sharing data and analyses about programs of interest to the scientific community. I greatly appreciated those who took the time to comment, providing feedback and sometimes raising important questions. I am starting *Sciencehound* with the same intent, providing data and analyses and, importantly, initiating discussions with the readers of *Science* and the *Science* family of journals. Enjoy and join in!

Journal impact factors are used as metrics for the quality of academic journals. In addition, they are (ab)used as metrics for individual publications or individual scientists (see my editorial in *Science*). The journal impact factor is defined as the average number of times articles published in a given journal over the past 2 years are cited in a given year. This average is derived from a relatively broad distribution of publications with different numbers of citations. Recently, Larivière *et al.* posted on bioRxiv a proposal recommending sharing these full distributions. This manuscript includes 2015 distributions for 11 journals (in a readily downloadable format). The distribution for *Science* magazine is shown below:

Note that the point at 100 represents the sum of the numbers of all papers that received 100 or more citations.

This curve rises quickly and then falls more slowly. As a chemist, this reminded me of the curves representing the concentration of an intermediate B in a reaction of the form A → B → C.

The concentration of B rises when A is converted to B and then falls when B is transformed into C.

Solving the equations for the kinetics of this scheme results in a function that is the difference between two exponential functions with negative exponents, that is,

P(c) = N(exp(-k_{1}c) – exp(-k_{2}c)).

Here, c is the number of citations, P(c) is the population of papers with c citations, k_{1} and k_{2} are adjustable constants, and N is a scale factor. The curve rises with an initial slope of N(k_{2} – k_{1}) = k_{1}k_{2} and falls exponentially approximately as exp(-k_{1}c).

Before fitting the citation curve to this function, we first normalize the curve so that the area under the curve is 1.0 and the *y*-axis is the fraction of the total number of papers.

This normalized curve can now be fit to the difference of exponential functions. It is easy to show that the normalization constant for the difference of exponential functions is N = k_{1}k_{2}/(k_{2} – k_{1}) (see mathematical appendix).
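The fit itself can be sketched as follows. The post's R code presumably uses a proper nonlinear least-squares fit; the Python sketch below uses a crude grid search instead, applied to a distribution generated from known parameters so that the recovery can be checked.

```python
import math

def model(c, k1, k2):
    # Normalized difference of exponentials: N = k1*k2/(k2 - k1).
    n = k1 * k2 / (k2 - k1)
    return n * (math.exp(-k1 * c) - math.exp(-k2 * c))

# Synthetic "observed" normalized distribution from known parameters.
true_k1, true_k2 = 0.05, 0.19
data = [(c, model(c, true_k1, true_k2)) for c in range(1, 100)]

# Grid-search least squares over (k1, k2) with k1 < k2.
best, best_err = None, float("inf")
for i in range(1, 21):
    for j in range(i + 1, 101):
        k1, k2 = i / 100, j / 100
        err = sum((model(c, k1, k2) - p) ** 2 for c, p in data)
        if err < best_err:
            best, best_err = (k1, k2), err

print(best)                                 # (0.05, 0.19)
print(round(1 / best[0] + 1 / best[1], 1))  # 25.3, the calculated JIF
```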

The best fit occurs with k_{1} = 0.05 and k_{2} = 0.19.

The apparent journal impact factor can be calculated from these parameters (see mathematical appendix). It can be shown that the journal impact factor (JIF) is JIF = (k_{1} + k_{2})/(k_{1}k_{2}) = 1/k_{1} + 1/k_{2}.

The calculated JIF = 25.3.

Note that this value is smaller than the journal impact factor that is reported (34.7). This is because highly cited papers (with more than 100 citations) have a substantial effect on the journal impact factor but are not well fit by the difference of exponential functions.

With this fitting protocol in place, we can now fit the distributions for the other 10 journals.

The best fit occurs with k_{1} = 0.07 and k_{2} = 0.08.

The calculated JIF = 26.8.

The best fit occurs with k_{1} = 0.16 and k_{2} = 0.65.

The calculated JIF = 7.8.

The best fit occurs with k_{1} = 0.31 and k_{2} = 2.

The calculated JIF = 3.7.

The best fit occurs with k_{1} = 0.16 and k_{2} = 0.57.

The calculated JIF = 8.0.

The best fit occurs with k_{1} = 0.18 and k_{2} = 0.92.

The calculated JIF = 6.6.

The best fit occurs with k_{1} = 0.13 and k_{2} = 0.66.

The calculated JIF = 9.2.

The best fit occurs with k_{1} = 0.16 and k_{2} = 0.37.

The calculated JIF = 9.0.

The best fit occurs with k_{1} = 0.24 and k_{2} = 1.42.

The calculated JIF = 4.9.

The best fit occurs with k_{1} = 0.32 and k_{2} = 2.

The calculated JIF = 3.6.

The best fit occurs with k_{1} = 0.22 and k_{2} = 2.

The calculated JIF = 5.0.

The calculated journal impact factors are well correlated with the observed values as shown below:

A line with slope 1 is shown for comparison. The overall Pearson correlation coefficient is 0.999. Fitting all 11 data points to a line through the origin yields a slope of 0.746. The fact that this slope is substantially less than 1 is largely driven by the values for *Science* and *Nature*, which, as noted above, are lower than the reported values owing to the elimination of the effect of papers with more than 100 citations. If these two points are eliminated, the slope of a fitted line increases to 0.924.

We have demonstrated that a function formed as the difference of two exponential functions can be used to fit observed distributions of the numbers of papers with different numbers of citations. Fitting this functional form to data from 11 journals reproduces the curves well and generates journal impact factors that agree well with published values. The largest differences are in journals such as *Science* and *Nature* that have substantial numbers of papers with more than 100 citations over the 2-year period. This emphasizes again how these outlier papers can affect journal impact factor values.

In my next post, I will demonstrate how this model can be refined to produce an algorithm that will generate a unique citation distribution given a journal impact factor. This sets the stage for more interesting analyses.

The R Markdown file that generates this post including the R code for fitting the citation distributions is available. The data from Larivière *et al.* is provided as a .csv file. A mathematical appendix showing the derivation of some key formulae is also available.