
Research output as a function of grant support: The scatter matters

The announcement that the U.S. National Institutes of Health (NIH) is considering using the level of extant research funding, measured by the so-called Grant Support Index (previously called the Research Commitment Index), to cap grant support to individual principal investigators (PIs) has triggered much discussion (see, for example, Lauer, Kaiser, Lowe, Crotty, and Future of Research). This proposal is based on extensions of observations that I first made as director of the National Institute of General Medical Sciences (NIGMS) at NIH—namely, that graphs of average research productivity (measured by a variety of metrics) as a function of the amount of research grant support going to individual PIs are not linear but tend to decrease in slope at higher levels of support.

Although my analysis focused on PIs from one institute funded in a single year, NIH Deputy Director for Extramural Research Mike Lauer and his colleagues have extended the analysis across NIH over a much longer period of time. This group recently posted their analysis, including all of the underlying data, on bioRxiv. Publicly sharing this data set is a very good practice that allows others to examine the results more thoroughly and to extend the analysis.

A key graph from this analysis, and one that has attracted much attention, is shown below:

The graph shows a curve fit to data for research productivity versus grant support using recently developed measures for these parameters. Annual grant support is measured by the Grant Support Index (GSI). This measure was developed as an alternative to funding level in dollars to account for the fact that some types of research are inherently more expensive than others. The GSI assigns point values to each grant type, with 7 points for an R01 grant with a single PI, 5 points for a more limited R21 grant with a single PI, and so on. Research productivity is measured with the Relative Citation Ratio (RCR), a citation-based metric developed to correct for differences in citation behavior between fields. Both metrics are presented on logarithmic scales in this graph.

The most noteworthy aspect of this curve is that it rises with a steeper slope at lower values of GSI than it does at higher levels. This suggests that, on average, the increase in productivity associated with funding an additional grant to an already well-funded investigator would be less than that for providing a grant to an investigator with no funding or providing a second grant to an investigator with only a modest amount of funding. The separation between this observed curve and a hypothetical straight line (with productivity strictly proportional to research support) has been referred to as “unrealized productivity.”

Before delving further into this point, let us take advantage of the data that were made available to plot the relationship, with two changes. First, rather than just plotting the curve fit to the data, we show the data points themselves (for all 71,936 investigators used in the analysis). Second, we plot the data with linear rather than logarithmic scales to avoid any distortion associated with this transformation. The results are shown below, with the top graph showing all of the data points and the bottom graph enlarging the region that includes almost all investigators and showing a “spline” curve fit to these data along with a linear fit for comparison.

These plots reveal that the underlying data show a large amount of scatter, consistent with my earlier observations with the NIGMS-only data set as well as with the intuitive sense that laboratories with similar amounts of funding can vary substantially in their output. The curve fit to these data again reveals that the slope of the productivity versus grant support relationship decreases somewhat at higher levels of grant support.

With these observations in hand, we can now examine some expected results of proposed NIH policies. Suppose an investigator with an annual GSI of 28 (corresponding to four R01 grants) is reduced to an annual GSI of 21 (corresponding to three R01 grants) and that these resources are used to fund a previously unfunded investigator (to move to GSI = 7). According to the fit curve, the expected annual weighted RCR values are 9.0 for GSI = 28, 7.1 for GSI = 21, and 2.6 for GSI = 7. The anticipated change in annual weighted RCR is (–9.0 + 7.1 – 0 + 2.6) = 0.7. Thus, the transfer of funding is predicted to increase productivity (measured by weighted RCR). This appears to be one of the primary foundations for the proposed NIH policy.

This approach depends on the accuracy of the fitted curve in representing the behavior of the population which, as noted, shows considerable scatter. An alternative method involves directly simulating the effects of the proposed policy on the population. For example, one can take the 968 investigators with annual GSI values over 21 and reduce them to annual GSI values of 21, scaling each investigator’s weighted RCR output by the reduction in annual GSI. The total number of annual GSI points over the threshold of 21 for these investigators is 4709. This corresponds to the ability to fund an additional 672 R01 grants. If these grants are distributed to previously unfunded investigators, the anticipated weighted RCR output can be estimated by choosing a random set of 672 investigators with annual GSI values near 7 (say 6 to 8). Because of this random element, this simulation can be repeated many times to generate a population of anticipated outcomes. This results in the distribution shown below, with an average increase in weighted RCR of 0.3.

For most simulations, there is an increase in average weighted RCR, with the average being somewhat less than that anticipated from the analysis based on the fit curve alone (0.3 versus 0.7). There are several possible explanations for this difference, including the limited ability of the fit curve to capture the features of the highly scattered distribution and the approach used to model the reduction in the anticipated output from the well-funded investigators.
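
For readers who want to experiment with the posted data, a minimal sketch of this kind of resampling simulation is shown below. It is not the code used for the figures; it assumes a data frame pis with columns gsi and wrcr (annual GSI and weighted RCR for each PI), and the way the total change is summarized may differ from the normalization used in the post.

```r
# Sketch of the cap-and-redistribute simulation (assumes a data frame `pis`
# with columns gsi and wrcr, the annual GSI and weighted RCR for each PI,
# built from the posted data set)
set.seed(1)

simulate_cap <- function(pis, cap = 21, new_grant_gsi = 7) {
  over <- pis$gsi > cap
  # Scale the output of capped PIs in proportion to their reduction in GSI
  lost_wrcr <- sum(pis$wrcr[over] * (1 - cap / pis$gsi[over]))
  # GSI points freed up and the number of additional R01-equivalent grants
  freed_points <- sum(pis$gsi[over] - cap)
  n_new_grants <- floor(freed_points / new_grant_gsi)
  # Use PIs with GSI near 7 as proxies for the output of newly funded PIs
  proxy_pool <- pis$wrcr[pis$gsi >= 6 & pis$gsi <= 8]
  gained_wrcr <- sum(sample(proxy_pool, n_new_grants))
  gained_wrcr - lost_wrcr
}

# Repeat the random draw many times to build a distribution of outcomes
changes <- replicate(1000, simulate_cap(pis))
hist(changes, main = "Change in total weighted RCR", xlab = "Change")
mean(changes)
```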

The same simulation method can be applied to funding an additional 672 PIs with one R01 so that they each have two R01s by selecting 672 random PIs with an annual GSI of ~7 (6 to 8), removing them from the population, and adding 672 chosen from the population with an annual GSI of ~14 (13 to 15). The results are shown below, with an average increase in weighted RCR of 0.4.

These simulations appear to confirm that, on average, transferring funding from very well-funded PIs to less well-funded PIs may result in a small increase in weighted RCR output.

Conclusions

I strongly favor examining appropriate data of this sort to guide policy development. Understanding the relationships between grant support and research output is, of course, one of the most fundamental questions for any funding agency, and the attempts by NIH to tackle this issue are laudable. However, as I discussed above, the presentation of simple curves fit to the data masks the considerable variation in output for PIs at all levels of funding.

The development of policies based on a hard cap at a particular level of GSI seems to me to be problematic. Well-funded investigators always have substantial histories of research accomplishments. NIH program officers and advisory councils should have access to data about previous research accomplishments and productivity when making recommendations about potentially funding additional grants and should be encouraged to examine such data critically, even when the application under consideration has an outstanding peer review score. The opportunity costs of providing additional funding to an already well-funded PI at the expense of an early- or mid-career PI with less funding are considerable.

In addition, the use of a hard cap amplifies the importance of the details of how the GSI is calculated, with the selection of particular parameters potentially discouraging collaboration, training, and other desirable outputs, as has been the topic of ongoing discussions. It seems unwise to convert the highly nuanced information contained in lists of grant support, publications, and other outputs into points on a graph rather than empowering the trained scientists who serve as program officials and advisory council members to use their judgment to help fulfill the mission of the NIH.

Modeling the growth of opioid overdose deaths

In his recent editorial Forecasting the opioid epidemic, Don Burke discussed the rise in opioid addiction in the United States and pointed to the need for data openness and analysis in developing strategies and policies to help mitigate this epidemic. Burke and his co-workers have now posted a preprint on bioRxiv, Exponential Growth of the USA Overdose Epidemic, which reveals that the number of deaths reported as accidental poisonings that can reasonably be associated with opioid overdoses in the United States grew from 2475 cases in 1979 to over 44,000 in 2015. The growth curve over this 37-year period is very close to exponential, with an exponential fit yielding R2 = 0.99 and a doubling time in the number of deaths of approximately 8 years.

Note that this is a preprint that has not been peer reviewed. Nonetheless, the result is straightforward, and the remarkably good fit of this potentially complicated data set to a simple function has several important implications. First, the fit enables a simple forecasting approach. In this case, the forecast is that there will be over 300,000 opioid overdose deaths in the 5-year period from 2016 to 2020 across the United States. Such a high number highlights the importance of developing effective strategies for addressing the epidemic and slowing its exponential growth. Second, the observation of deviations from exponential behavior such as the acceleration from 2002 to 2006 provides clues about the changes that might be driving the growth of the epidemic.
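
As a rough illustration of this kind of fit (not the authors' analysis), the sketch below fits a log-linear model to annual counts in a hypothetical data frame od with columns year and deaths, then extracts the growth rate, doubling time, and a 2016–2020 extrapolation.

```r
# Sketch: log-linear fit of annual overdose deaths to an exponential curve
# (assumes a data frame `od` with columns year and deaths; not the authors' code)
fit <- lm(log(deaths) ~ year, data = od)

growth_rate   <- coef(fit)["year"]        # exponential rate per year
doubling_time <- log(2) / growth_rate     # ~8 years according to the preprint
r_squared     <- summary(fit)$r.squared

# Forecast 2016-2020 by extrapolating the fitted curve
future <- data.frame(year = 2016:2020)
sum(exp(predict(fit, newdata = future)))
```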

This new result once again illustrates the importance of data in public health, as discussed in my recent editorial.

 

Gender analysis of Science authors

When I started as editor-in-chief in July 2016, I was pleased to learn that the editorial team was working on a project to examine the gender distribution among individuals who published and submitted papers to Science. This project was initiated by our executive editor and our team of deputy editors with the help of two interns, Georgina Carter and Erica Vinson. I am grateful for their initiative and their efforts on this project.

The project focused on Research Reports, Research Articles, and Reviews published in 2015. The primary goal was to compare the percentages of women among individuals whose papers were published compared with those from papers that were submitted during the same period but were not selected for publication. A major challenge in performing these analyses is determining the gender of the authors in question because such data are not collected as part of the submission process. The genders of authors were assigned through individual Internet searches.

The project focused on the authors in the first position and the last position in each author list. The individuals who were listed first and last in the author list were classified (again, based on Internet searches) as to whether they appeared to be in established positions (faculty or similar positions), hereafter referred to as “senior authors,” or if they were graduate students, postdoctoral fellows, or in similar temporary roles, hereafter referred to as “junior authors.” For the purposes of this analysis, senior authors in the first or last position were included, as were junior authors in the first position. Using this approach, 862 senior authors and 471 junior authors were identified and used in the subsequent analysis.

For comparison, a group of manuscripts from 2015 were randomly chosen from those that had not been selected for publication, to match the balance of Research Reports, Research Articles, and Reviews in the published set. The genders and levels of seniority were determined through Internet searches as described, resulting in 883 senior authors and 434 junior authors for use in subsequent analysis.

Among the published papers, 24.8% (117 out of 471) of the junior authors were women. In the comparison group of manuscripts that were not selected for publication, 30.0% (130 out of 434) of the junior authors were women. Although this suggests a trend disfavoring women authors, the difference does not reach statistical significance (p = 0.086, above the conventional threshold of 0.05).

For the published papers, 16.8% (145 out of 862) of the senior authors were women, while in the comparison group, the proportion was 14.7% (130 out of 883). If anything, the trend favors women authors, although the difference is quite modest (p = 0.237).
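
The post does not state which statistical test was used; for readers who want to repeat comparisons of this kind, a standard two-proportion test applied to the reported counts takes one line of R per comparison:

```r
# Junior authors: 117 of 471 published vs. 130 of 434 not selected
prop.test(x = c(117, 130), n = c(471, 434))

# Senior authors: 145 of 862 published vs. 130 of 883 not selected
prop.test(x = c(145, 130), n = c(862, 883))
```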

These data can be divided into three components corresponding to Research Reports (a total of 987 authors in the published paper group and 963 authors in the comparison group), Research Articles (228 authors in the published paper group and 206 authors in the comparison group), and Reviews (118 authors in the published paper group and 148 authors in the control group). In each of these sets, the same trends are observed as for the authors overall. The results are summarized below:

This preliminary study reveals two major sets of findings. First, as discussed above, the data do not reveal that the review and editorial processes at Science introduce substantial gender disparities. Second, the percentages of women among authors submitting to and published in Science are relatively low, ~27% for junior authors and 16% for senior authors. To put these values in context, I examined data from the United States National Science Foundation regarding the percentages of women in faculty positions and enrolled in graduate school. In 2010, the percentages of women in all faculty positions were 21% in the physical sciences, 42% in the life sciences, and 39% in the social sciences, whereas the percentages of women in senior faculty positions (associate professor and above) were 16% in the physical sciences, 34% in the life sciences, and 33% in the social sciences. In 2011, the percentages of women enrolled in graduate school were 33% in the physical sciences, 57% in the biological sciences, and 60% in the social sciences.

For papers submitted to Science, we estimate that ~40% are in the physical sciences, 55% are in the biological sciences, and 5% are in the social sciences. Using these as weighting factors, the anticipated percentage of women in all faculty positions submitting to Science would be (0.40)(21%) + (0.55)(42%) + (0.05)(39%) = 33%. Similarly, for women in senior faculty positions, the anticipated percentage of women is 27%, and the anticipated percentage of women among graduate students is 47%. In all cases, the percentages of women who submitted to Science are lower than these estimates. The estimates certainly could be inaccurate given that they are based on many assumptions that could influence the results, including assumptions about career stage, the use of data from the United States only despite the international authorship of Science, differences in the institutions in the general and authorship pools, and so on. Refinement of these estimates may reveal the sources of some aspects of the gender disparity, which could help guide additional analyses and, eventually, policy suggestions.

Science family journal content analysis

Computational tools for extracting relationships from text (often referred to as “natural language processing” tools) are increasingly powerful. Here, I analyze the content of a set of abstracts from members of the Science family of journals using a natural language processing package in R called quanteda. The approach begins with a relatively large body (often referred to as a corpus) of abstracts. The frequency of different words within this corpus is analyzed. The similarity between every pair of abstracts is then calculated based on the fraction of words that the pair have in common, with words that are rare in the corpus weighted more heavily than common words. These similarity scores are then converted into “distances,” ranging from 0 (very similar) to 1 (no similarity). To visualize and analyze these distances, a two-dimensional space is constructed that maintains the relative distances between pairs of points as faithfully as possible. Points in this two-dimensional space can then be plotted and examined.

Constructing the initial corpus of abstracts from Science

To build the corpus, abstracts for more than 2200 research papers from Science from 2013 through 2015 were assembled. The first task is to read in the data set and convert the data into a corpus for analysis.
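
A minimal sketch of this step is shown below; the file name and the column holding the abstract text are assumptions rather than the actual names used in the analysis.

```r
# Sketch: read the abstracts and build a quanteda corpus
# (the file name and column name are assumptions)
library(quanteda)

abstracts <- read.csv("science_abstracts_2013_2015.csv", stringsAsFactors = FALSE)
abstract_corpus <- corpus(abstracts, text_field = "abstract")
summary(abstract_corpus, n = 5)
```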

Calculating similarities between all pairs of abstracts

The next step is to calculate the similarity matrix. This can be done using TF-IDF (term frequency–inverse document frequency) weighting, which weights terms that occur rarely in the corpus more highly than common terms. The similarity index used is the so-called “cosine similarity,” which will subsequently be converted into a distance metric.

With this similarity matrix in hand, we calculate distances using the formula distance = 2*arccos(similarity)/pi.
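
A sketch of these two steps, continuing from the corpus object above, is shown below; note that textstat_simil lives in the quanteda.textstats package in recent quanteda releases, so the exact calls in the original R Markdown file may differ.

```r
# Sketch: TF-IDF weighting, pairwise cosine similarities, and conversion to distances
library(quanteda)
library(quanteda.textstats)

dfm_abstracts <- dfm(tokens(abstract_corpus, remove_punct = TRUE))
dfm_weighted  <- dfm_tfidf(dfm_abstracts)     # rare terms weighted more heavily

sim <- as.matrix(textstat_simil(dfm_weighted, method = "cosine"))
sim[sim > 1] <- 1                             # guard against rounding slightly above 1

distances <- 2 * acos(sim) / pi               # 0 = very similar, 1 = no similarity
```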

Representing the relative distances in two dimensions

Finally, these distances are projected into two dimensions using multidimensional scaling, also known as principal coordinate analysis.

The results can then be plotted.
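
A sketch of the projection and plotting steps, using classical multidimensional scaling from base R, is shown below.

```r
# Sketch: classical multidimensional scaling (principal coordinate analysis)
# followed by a simple scatter plot
coords <- cmdscale(as.dist(distances), k = 2)

plot(coords, pch = 16, cex = 0.5, col = rgb(0, 0, 0, 0.3),
     xlab = "Dimension 1", ylab = "Dimension 2",
     main = "Science research abstracts, 2013-2015")
```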

science_2013_2015

This plot reveals an interesting three-pointed structure. Note that only the shape of this figure is meaningful; the orientation is arbitrary. Examination of the abstracts located at the three points reveals that they correspond to the biomedical sciences, physical sciences, and Earth sciences.

science_2013_2015_labels

Extending the analysis to the other Science family journals

With this framework in place, we can now expand the corpus to include papers published in the other Science family journals. For this purpose, we will use most of the papers published in Science Advances, Science Signaling, Science Translational Medicine, Science Immunology, and Science Robotics in 2016.

We now plot each journal separately.

all_science_sa

all_science_ss

all_science_stm

all_science_si

all_science_sr

The orientations of these figures are slightly different from that produced by the initial Science-only corpus but, as noted above, this orientation is arbitrary. Several points emerge from examining these plots. First, the breadth of disciplines covered by Science Advances is essentially the same as that covered by Science. Comparison with more papers from Science Advances may reveal differences in emphasis between these two broad journals. Second, the papers from Science Signaling, Science Translational Medicine, and Science Immunology lie in the same general region in the biomedical arm of the plot. More detailed analysis should reveal more nuanced differences between the content of these journals.

This analysis represents a first step toward using these tools for unbiased analysis of the contents of the Science family of journals. More refined analysis is in progress.

Additional documents and code

The abstracts used in this analysis are available in six .csv files. The R Markdown file that generates this post including the analysis is also available.

Modeling the annual number of NIH research grant applications

In an earlier post, I outlined a model for the success rates of NIH grant applications based on the history of NIH appropriations. Because the success rate is defined as the ratio of grants awarded to the number of grant applications reviewed, this model consists of two components. The first component is a model for the number of new and competing grants awarded that was developed in my previous post.

Grant application number data

We now turn to the second component, a model for the number of grant applications submitted and reviewed. The number of NIH research project grants reviewed each year from 1990 to 2015 is plotted below:

application_number_plot

This curve is somewhat reminiscent of the curve for the NIH appropriation as a function of time shown in an earlier post. The drop in the number of applications that occurs in 2008–2009 is an artifact due to the effects of the American Recovery and Reinvestment Act (ARRA). The funding associated with the ARRA was not included in the appropriations data, and applications that were considered for ARRA funding were also removed.

The grant application number and appropriation curves are compared directly below, plotted as fractional changes since 1990 to facilitate comparison.

fractional_change_plot

The curves are similar in shape, although the increase in the NIH appropriation curve is approximately twice as large as the increase in the grant application number curve. The curves, normalized so that they have the same overall height, are compared below.

scaled_fractional_change_plot

Examination of the curves reveals that the grant application number curve is shifted to later years by ~2 years compared with the NIH appropriation curve. This makes mechanistic sense in that a relatively large increase in the NIH appropriation might cause institutions to hire more faculty who then apply for grants and might cause individual investigators to submit more applications. However, these responses do not take place instantaneously but require a year or more for the applications to be written and submitted.

A model for grant application numbers based on appropriation history

A linear model can now be fit to predict the grant application number curve as a linear combination of the appropriation curves shifted by 1 and 2 years, including a constant term.

app_number_fit_plot

The number of grant applications can be calculated from the appropriation curves as m1 × (appropriation offset by 1 year) + m2 × (appropriation offset by 2 years) + b, where m1 = –0.18, m2 = 0.61, and b = 0.57.
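
A sketch of this fit is shown below; it assumes a data frame nih with columns year, approp_frac, and apps_frac holding the fractional-change curves plotted above (the column names are assumptions).

```r
# Sketch: fit the application number curve to the appropriation curve
# lagged by 1 and 2 years (assumes a data frame `nih` with columns
# year, approp_frac, and apps_frac holding the fractional-change curves)
nih$approp_lag1 <- c(NA, head(nih$approp_frac, -1))      # shifted by 1 year
nih$approp_lag2 <- c(NA, NA, head(nih$approp_frac, -2))  # shifted by 2 years

fit <- lm(apps_frac ~ approp_lag1 + approp_lag2, data = nih)
coef(fit)    # intercept and slopes corresponding to b, m1, and m2

cor(predict(fit, newdata = nih), nih$apps_frac, use = "complete.obs")
```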

The agreement is reasonable. The major differences occur in years 2008–2009 due to the impact of ARRA noted above. The overall Pearson correlation coefficient is 0.983.

Conclusions

A model has been developed that allows the prediction of the number of NIH grant applications from the appropriations history. This model can be used in conjunction with the previously described model for the number of grants awarded to predict grant success rates, for actual appropriation histories or for hypothetical ones.

The grant application number model was developed empirically, based on observed similarities between the grant application number curve and the appropriation curve. Although it is not truly mechanism-based, the model is consistent with a simple mechanistic interpretation as noted. It is interesting that grant application numbers increased more or less monotonically. Thus, it would have been difficult to develop a model from the inflation-corrected appropriation curve because this peaked in 2003 and has been falling almost every year since. This raises an interesting point: application numbers have gone up both when the appropriation increased by more than inflation and when it increased by less than inflation. This could be interpreted in terms of two dynamic drivers. When the appropriation increases by more than inflation, institutions and investigators sense opportunity and submit more applications; when the appropriation increases by less than inflation, institutions and investigators sense tough times with lower success rates and submit more applications to increase their chances of competing successfully for funding.

It will be interesting to compare how well this empirical model does in predicting grant application numbers in future years.

Additional files

An R Markdown file that generates this post, including the R code, is available.

Modeling the annual number of new and competing NIH research project grants

In my most recent post, I outlined a model that allows estimation of National Institutes of Health (NIH) grant success rates based on the NIH appropriation history. This model has two components: a model that estimates the number of new and competing grants and a model that estimates the number of grant applications submitted to compete for funding. The term “grant” in this case refers to NIH Research Project Grants (RPGs), that is, grant mechanisms such as R01 and R21 grants that are the most common grants for individual investigators or small groups of investigators. It excludes mechanisms for research centers and others that support larger groups of investigators. In this post, I present more details about the first component.

Historical appropriation and inflation data

To begin, I examine the appropriations history for NIH from 1990 to the present. NIH appropriations from 1990 to 2015 are shown below:

NIH appropriations history

For comparison, success rates for grants (RPGs) are shown below:

Success rate plot

These two parameters are negatively correlated with a correlation coefficient of -0.66. In other words, as the size of the appropriation increased, the success rate tended to decrease.

One possible adjustment that might improve the correlation involves correcting the appropriation data for inflation. Inflation is best measured in terms of the Biomedical Research and Development Price Index (BRDPI), a parameter calculated annually by the Department of Commerce on behalf of the NIH.

The NIH appropriation curves in nominal terms and in constant 1990 dollars are plotted below:

Appropriation nominal and constant

The constant dollar appropriations and success rate data are still negatively correlated with a correlation coefficient of -0.381.

Thus, the simple notion that the success rate should increase with increases in the NIH appropriation is empirically false over time.

There are two reasons why this is true. The first involves the manner in which NIH grants are funded. Grants average 4 years in duration and are almost always paid out in 4 consecutive fiscal years. Thus, if a 4-year grant is funded in a given fiscal year, the NIH is committed to paying the “out-years” for this grant over the next 3 fiscal years. Because of this, ~75% (actually more than 80% due to other commitments) of the NIH appropriation for a given year is already committed to ongoing projects, and less than 20% of the appropriation is available for new and competing projects. This makes the size of the pool for new and competing projects very sensitive to the year-to-year change in the appropriation level.

The observed numbers of new and competing grants are plotted below:

Number of competing awards

A model for the annual number of new and competing grants

To put these effects in quantitative terms, a model has been developed to estimate the number of grants funded each year, given NIH appropriation and BRDPI data over time.

The assumptions used in building the model are:

  1. NIH funds grants with an average length of 4.0 years.

For the purposes of this model, we will assume that 1/4 of the grants have a duration of 3 years, 1/2 have a duration of 4 years, and 1/4 have a duration of 5 years. Using a single pool of grants, all with 4-year durations, would be contrary to fact and would likely lead to artifacts, but it is unlikely that the model depends significantly on the details of the distribution. When a grant completes its last year, the funds are freed up to fund new and competing grants in the next year.

  2. The average grant size increases according to the BRDPI on a year-to-year basis.

This assumption has been less true in recent years owing to highly constrained NIH budgets, but it is a reasonable approximation (and still represents good practice).

  3. Fifty percent of the overall NIH appropriation each year is invested in RPGs. This is consistent with the average percentage of RPG investments over time.
  4. The system begins with an equal distribution of grants at each stage (first, second, … year of a multiyear grant) ~10 years before the portion used for analysis.
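
A minimal sketch of a cohort-based simulation that implements these assumptions is shown below. The vector names, the initial average grant size, and the decision to hold out-year costs at the award-year grant size are simplifying assumptions rather than the exact implementation in the available R Markdown file.

```r
# Sketch of a cohort-based grant simulation implementing the assumptions above
# (assumes numeric vectors approp and brdpi_rate aligned by year and starting
# ~10 years before 1990; the initial grant size is an arbitrary placeholder,
# and out-year costs are held at the award-year grant size for simplicity)
simulate_grants <- function(approp, brdpi_rate, rpg_share = 0.5,
                            init_grant_size = 2.5e5,
                            durations = c(3, 4, 5),
                            weights = c(0.25, 0.5, 0.25)) {
  n_years    <- length(approp)
  grant_size <- init_grant_size * cumprod(1 + brdpi_rate)  # grows with the BRDPI
  committed  <- numeric(n_years)   # funds already owed to ongoing grants
  new_grants <- numeric(n_years)

  for (t in seq_len(n_years)) {
    rpg_budget    <- rpg_share * approp[t]
    available     <- max(rpg_budget - committed[t], 0)
    new_grants[t] <- available / grant_size[t]
    # Commit out-year funding for this year's new grants
    for (i in seq_along(durations)) {
      out_years <- (t + 1):(t + durations[i] - 1)
      out_years <- out_years[out_years <= n_years]
      committed[out_years] <- committed[out_years] +
        weights[i] * new_grants[t] * grant_size[t]
    }
  }
  new_grants
}
```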

We will start in 1990. The comparison between the actual numbers of grants funded and the numbers predicted by the model is shown below:

Grant number comparison

The agreement between the observed and predicted curves is remarkable. The correlation coefficient is 0.894.

Differences between the actual numbers of grants and those predicted by the model

The largest difference between the curves occurs at the beginning of the doubling period (1998-2003), where the model predicts a large increase in the number of grants that was not observed. This is due to the fact that NIH initiated a number of larger non–RPG-based programs when substantial new funding was available rather than simply funding more RPGs (although it did this to some extent). For example, in 1998, NIH invested $17 million through the Specialized Center–Cooperative Agreements (U54) mechanism. This grew to $146 million in 1999, $188 million in 2000, $298 million in 2001, $336 million in 2002, and $396 million in 2003. Note that the year-to-year change in such commitments is what matters for the number of new and competing grants that can be made because, for a given year, it does not matter whether funds have been previously committed to RPGs or to other mechanisms.

The second substantial difference occurs in 2013, when the budget sequestration led to a substantial drop in the NIH appropriation. To avoid having the number of RPGs that could be funded drop too precipitously, NIH cut noncompeting grants substantially. Noncompeting grants are grants for which commitments have been made and the awarding of a grant depends only on the submission of an acceptable progress report. The average size [in terms of total costs, that is, direct costs as well as indirect (facilities and administration) costs] of a noncompeting R01 grant was $393,000 in 2011, grew to $405,000 in 2012, a 2.9% increase, and then dropped to $392,000 in 2013, a 3.3% drop. Given that there are approximately three times as many noncompeting grants as there are new and competing grants, this change from a 2.9% increase to a 3.3% decrease for noncompeting grants increased the pool of funds for new and competing grants by ~3 × (2.9% + 3.3%) ≈ 18.6%. However, cutting noncompeting grants means that existing programs with research underway and staff in place had to find ways to deal with unexpected budget cuts.

Available documents and code

The R Markdown file that generates this post, including the R code, is available.

Modeling success rates from appropriations histories

Introduction to modeling success rates from appropriations data

A substantial portion of fundamental scientific research is government-supported (see my editorial in Science). An important factor that affects the experiences of individual investigators and the efficiency of government funding programs is the likelihood that a given scientific application receives funding. Although this can seem to be a simple parameter, it depends on the systems properties of the scientific enterprise such as the fraction of appropriated funds available for new grants and the number of investigators competing for funding, both of which evolve over time.

Here I introduce a modeling approach that links historical data about funds appropriated to agencies for grants to the probabilities that grant applications are funded in a given year. I will initially focus on the US National Institutes of Health (NIH) because the relevant data are readily available and changes in the funds appropriated over time have varied substantially, introducing some important behavior that can, at first, seem hard to understand.

NIH appropriations history

The levels of funds appropriated for NIH from 1990 to 2015 are shown below:

Appropriations_plot

These data are in nominal dollars—that is, they are not corrected for the effects of inflation. The NIH budget was doubled from 1998 to 2003 in a coordinated effort by Congress, and these points are shown in red. Note that these data and the subsequent analysis exclude funds and applications associated with the American Recovery and Reinvestment Act, which affected budgets in 2009–2010.

NIH success rates

One of the factors that has a great influence on the research and science policy communities, both directly and culturally, is the likelihood that a given grant proposal will be funded, usually measured in terms of the “success rate” or “funding rate.” The success rate is the number of new and competing grant applications that are awarded in one fiscal year divided by the total number of grant applications that were reviewed in that year.

Success rate = number of grants awarded/number of grant applications reviewed

Here the term “new and competing” grants refers to grants that have not been funded previously (new grants) and grants that have been funded for one or more multiyear terms but now are competing through full peer review prior to receiving additional funding (competing renewal grants).

Two major factors determine the success rate. The first is the amount of funding available for new and competing grants, as opposed to the overall annual appropriation. This, combined with the average grant size, determines the number of new and competing grants that can be awarded—that is, the numerator in the success rate calculation. The second is the number of grant applications that are submitted and reviewed in a given year. This is determined by the number of investigators that are submitting grant applications and the average number of applications submitted in a given year per investigator. This is the denominator of the success rate calculation.

Success rate data for Research Project Grants (RPGs) for NIH for 1990 to 2015 are shown below:

Success_rate_plot

Note that the success rate fell dramatically immediately after the doubling and continued to fall for several additional years. This led to outcries from the research community and consternation from Congress, which had made funding biomedical research a high priority for a number of years.

The effects of multiyear funding

Why did this dramatic drop in success rate occur? A major factor involves the manner in which NIH research project grants are funded. NIH grants average 4 years in duration and are almost always paid out in four consecutive fiscal years. Thus, if a 4-year grant is funded in a given fiscal year, the NIH is committed to paying the out-years for this grant over the next three fiscal years. Because of this, ~75% (actually closer to 80% or more because of other commitments) of the NIH appropriation for a given year is already committed to ongoing projects and only ~20% of the appropriation is available for new and competing projects. This makes the size of the pool for new and competing projects very sensitive to the year-to-year change in the appropriation level.

Funds from grants that have ended are recycled to fund new and competing grants. This recycling is shown schematically below:

Grant recycling_small

The recycling of funds from year to year, with funds from grants that end moving into the pool for new and competing grants.

A model for estimating success rates based on appropriations history

To put these effects in quantitative terms, I developed a model for the number of new and competing grants. This model will be described in detail in a subsequent post. Briefly, the model assumes that NIH funds grants with an average length of 4.0 years (with 1/4 of the grants having a duration of 3 years, 1/2 a duration of 4 years, and 1/4 a duration of 5 years) and that the average grant size increases annually according to the rate of biomedical research price inflation.

This model is combined with a model for the number of research project grant applications that are reviewed annually. The basis for this latter model is that the number of applications submitted rises in response to increases in the NIH appropriation with a lag of about 2 years. This model will also be described in a subsequent post.

The success rates predicted from the model are compared with the observed success rates below:

Model_plot

The agreement is reasonable, although certainly not perfect. The overall Pearson correlation coefficient is 0.866. However, the model does accurately predict the sharp drop in the success rate immediately following the doubling period. In addition, because the model assumes constant policies at NIH, the periods where the model results do not agree as well with the observed values suggest times when NIH did change policies in response to ongoing events. This will be explored in a subsequent blog post.

Several parameters can also be examined to characterize this and other funding scenarios. The first is the total amount of funds invested, both in nominal and constant dollars.

The investment in nominal dollars was $557 billion.

The investment in constant 1990 dollars was $334 billion.

The observed mean success rate was 0.248.

The mean success rate predicted from the model was 0.251.

The observed standard deviation in the success rate was 0.052.

The standard deviation in the success rate predicted from the model was 0.063.

Modeling an alternate scenario with more consistent funding increases

Suppose that, instead of the doubling, Congress had committed to steady increases in the NIH appropriation beginning in 1998. To match the investment in constant 1990 dollars from the doubling and postdoubling era, this corresponds to annual increases of 7.55%.

We can now use the modeling tools that we have developed to estimate the consequences of such an appropriations strategy in terms of success rates and other parameters.

Success_rate_comparison

As might be anticipated, under the new scenario, the success rates vary much less dramatically. The standard deviation in the success rate predicted from the model for the new scenario was 0.022. This is smaller than the observed standard deviation by a factor of 2.4. Thus, the scenario with steady appropriation increases would substantially decrease the variability in, and hence the apparent capriciousness of, success rates.

The mean success rate predicted from the appropriation scenario with steady increases from 1998 on was 0.257. This is higher than the mean success rate based on the actual appropriations data by 2.6%.

Although this is a relatively modest change in mean success rate, it corresponds to a decrease in the number of unsuccessful applications from 702,000 under the actual scenario to 667,000 under the new scenario. Thus, the steady approach to funding would have reduced the number of unsuccessful applications by 35,000. With the conservative assumption that preparation of a grant application requires 1 month of work, this difference corresponds to the efforts of 111 investigators working full-time over the entire 26-year period.

Conclusions

A modeling approach has been developed that allows estimation of NIH grant success rates given the history of appropriations. The model is used to demonstrate that an alternative to the “boom” in appropriations corresponding to the NIH budget doubling followed by the “bust” of more than a decade of flat (or, when the effects of inflation are included, falling) appropriations would have resulted in a 2.6% more efficient distribution of funds (measured by the number of applications needed to distribute the same amount of funds in constant dollars) and success rates that were less variable by a factor of 2.4. The model can be applied to other potential past or future appropriation scenarios, and the modeling approach can be applied to other agencies.

Available code and documents

An R Markdown file that generates this post, including the code for the model, is available.

Comparing individual papers from journals with different journal impact factors

In the previous post, I developed a tool that could generate an approximate citation distribution for a journal given its journal impact factor (JIF). We can now use this tool to address a key question:

If you select one paper randomly from a distribution associated with a journal impact factor JIF_1 (Journal_1) and another paper randomly from a distribution associated with a journal impact factor JIF_2 (Journal_2), what is the probability that the first paper has more citations than the second paper?

In the first post in this series, I demonstrated that normalized plots of the number of publications versus the number of citations could be fit to functions of the form:

P(c) = N(exp(-k1c) – exp(-k2c)) with k1 < k2

where c is the number of citations, P(c) is the population of papers with c citations, and N is a normalization factor set so that the integrated total population is 1.

Using the tool developed in the previous post, values for k1 and k2 can be estimated given only a value for the JIF.

We now turn to the question at hand. Suppose that a paper randomly selected from Journal_1 has x citations. The probability that this paper has more citations than a paper from Journal_2 is shown graphically below:

Shaded comparison plot

The blue shaded region represents the fraction of papers in Journal_2 that have x or fewer citations. Note that the citation distribution curves are presented as continuous functions for clarity (although in actuality, the numbers of citations must be integers). The use of continuous functions will also substantially simplify subsequent analysis and is unlikely to change any conclusions significantly.

In mathematical terms, the area of the blue shaded region is given by F(x) = ∫ P2(c) dc from 0 to x, where P2(c) is the normalized citation curve for Journal_2, that is, P2(c) = N2(exp(-k12c) – exp(-k22c)), with k12 and k22 determined from JIF_2 as described in the previous post.

The integral can be readily solved analytically as shown in the mathematical appendix. In this way, it can be shown that the fraction of papers from Journal_2 with x or fewer citations is given by

F(x) = N2((1/k12)(1 – exp(-k12x)) – (1/k22)(1 – exp(-k22x))).

This function is shown graphically below:

Fraction plot

This fraction curve has the expected shape. For small values of x, only a small fraction of papers in the other journal have x or fewer citations. For larger values of x, this fraction increases. Finally, for the largest values of x, the fraction approaches 1.00, that is, almost all papers in Journal_2 have x or fewer citations.

The probability that a paper randomly chosen from Journal_1 has x citations is given by P1(x) = N1(exp(-k11x) – exp(-k21x)). Thus, to answer our question, we need only calculate the average of the fraction curve, F(x), weighted by the probability of different values of x. This is given by the integral Probability(JIF_1, JIF_2) = ∫ P1(x)F(x) dx from 0 to infinity.

The functions to be integrated, P1(x)F(x), are plotted below for JIF_2 = 3 and JIF_1 = 1, 3, 5, 10, and 20.

Integrand plot

The areas under these curves can be estimated. From the graph, the anticipated results are relatively clear. For JIF_1 = 1, the area is relatively small so that the probability is relatively low. For JIF_1 = 3, the area increases. Because JIF_1 = JIF_2 at this point, the area should be 0.50, that is, the probability that the number of citations from a paper from one journal is less than that for the other should be 50%. The areas continue to increase for JIF_1 = 5 through JIF_1 = 20, approaching 1.00.

The expression can be integrated analytically in a straightforward way (as shown in the mathematical appendix), although the algebra is a bit involved. Thus, the desired probability (the integrated value) is given by

Probability(JIF_1, JIF_2) = N1N2((1/k12)((1/k11) – (1/k21) – (1/(k11 + k12)) + (1/(k21 + k12))) – (1/k22)((1/k11) – (1/k21) – (1/(k11 + k22)) + (1/(k21 + k22)))).
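
This closed-form expression translates directly into a short function. The sketch below takes the four decay constants as inputs (k11 and k21 for Journal_1, k12 and k22 for Journal_2); estimating those constants from JIF values relies on the linear fits described in the previous post.

```r
# Sketch: probability that a random paper from Journal_1 (constants k11, k21)
# has more citations than a random paper from Journal_2 (constants k12, k22)
citation_win_probability <- function(k11, k21, k12, k22) {
  n1 <- k11 * k21 / (k21 - k11)   # normalization factor for Journal_1
  n2 <- k12 * k22 / (k22 - k12)   # normalization factor for Journal_2
  bracket <- function(k) 1/k11 - 1/k21 - 1/(k11 + k) + 1/(k21 + k)
  n1 * n2 * (bracket(k12) / k12 - bracket(k22) / k22)
}

# Sanity check: two identical journals should (and do) give 0.5
citation_win_probability(0.16, 0.65, 0.16, 0.65)
```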

This function is plotted below for JIF_2 = 3 with JIF_1 values ranging from 1 to 30, with the values corresponding to the plot above indicated.

JIF_2-3 plot

The curve represents the answer to our question for a journal with a JIF of 3. For example, a paper selected randomly from a journal with a JIF of 5 would be expected to have more citations than a randomly selected paper from a journal with a JIF of 3 only 65% of the time. In terms of odds, this is only about a twofold difference from the even odds expected if the outcome were completely random, illustrating the lack of justification for interpreting small (or even fairly large) differences in JIFs when judging individual papers.

To amplify this further, the analogous plot for JIF_2 = 10 is shown below:

JIF_2-10 plot

This plot demonstrates that a paper randomly selected from a journal with JIF = 5 will have more citations than a randomly selected paper from a journal with JIF = 10 only approximately 30% of the time. Similarly, a paper randomly selected from a journal with JIF = 10 will have fewer citations than a randomly selected paper from a journal with JIF = 20 approximately 75% of the time. These modest differences in probability for twofold differences in JIF highlight the folly of interpreting small differences in JIFs that are sometimes reported with three decimal places. From a scientific perspective, such false precision is utterly inappropriate. I hope that this analysis, along with the many other extant criticisms of the use and abuse of JIFs, will encourage scientists and administrators to use JIFs only in contexts for which they are appropriate.

Available code and documents

The R Markdown file that generates this post including the R code is available. The parameters from the linear fits from the previous post are available as a .csv file. A mathematical appendix showing the derivation of key formulae is also available.

Generation of predicted citation distributions from journal impact factor values

In my previous post, I demonstrated how the citation distribution for a given journal could be fit to a function defined as the difference of two exponentials. This function is characterized by two parameters, k1 and k2. I showed how these two parameters could be used to derive the journal impact factor (JIF). However, for a variety of purposes, it will be useful to generate an approximate citation distribution given the JIF value, that is, to solve the inverse problem. This should be possible if we can discern relationships between the JIF value and the parameters k1 and k2.

The relationships between k1 and k2 and JIF

First, let us plot the k1 values versus the JIF values calculated from the fits to the observed distributions. I chose to use the calculated JIF values rather than the reported JIF values because the latter are distorted by papers with more than 100 citations, which lie off the fitted distribution.

k1_vs_JIF_plot

As might be anticipated, the relationship between k1 and the JIF value is approximately exponential. This can be confirmed by plotting k1 versus the logarithm of the JIF value. The results can be fit to a line.

k1_vs_log(JIF)_plot

Similarly, k2 is also approximately related to the logarithm of the JIF value, although there is somewhat more scatter.

k2_vs_log(JIF)_plot

Using these two linear fits, we can estimate the values of k1 and k2 given a value for the JIF.

To ensure that the deduced values of k1 and k2 generate the same value of the JIF using the formula from the previous post, we can adjust the values of k1 and k2 slightly by a constant amount. That is, we need to find delta(k) such that, with k1’ = k1 + delta(k) and k2’ = k2 + delta(k), JIF = (k1’ + k2’)/(k1’k2’). The derivation of the optimal value of delta(k) is shown in the mathematical appendix (see note below). For JIF values from 2 to 10, this correction averages 24% for k1 and 4% for k2.
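
Because the adjusted constants must satisfy (k1’ + k2’)/(k1’k2’) = 1/k1’ + 1/k2’ = JIF, delta(k) can also be found numerically; the sketch below uses uniroot in base R and is an alternative to, not a reproduction of, the closed-form result in the appendix.

```r
# Sketch: numerically find delta(k) so that the adjusted constants reproduce
# the target JIF, using the identity (k1' + k2')/(k1'k2') = 1/k1' + 1/k2'
adjust_k <- function(k1, k2, jif_target) {
  f <- function(delta) 1 / (k1 + delta) + 1 / (k2 + delta) - jif_target
  delta <- uniroot(f, lower = -0.99 * min(k1, k2), upper = 100)$root
  c(k1 = k1 + delta, k2 = k2 + delta)
}
```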

Estimating a citation distribution from a JIF value

Consider the case of EMBO Journal, which has a reported JIF of 9.6 and a JIF calculated from the fit values of k1 and k2 of 9.0. The observed distribution is compared with the distribution predicted from the calculated JIF value below:

EMBO_J_JIF_plot

 

The agreement appears reasonable. Similar results are observed for other journals (not shown). Thus, we have developed a tool with which we can estimate the distribution of citations given only a JIF.

Journal impact factors – Fitting citation distribution curves

Introduction to Sciencehound

Welcome to my new blog at Science. I began blogging when I was Director of the National Institute of General Medical Sciences (NIGMS) at the US National Institutes of Health (NIH). Our blog was called the NIGMS Feedback Loop. I found this to be a very effective way of sharing information and data with NIGMS stakeholders. A couple of years after leaving NIGMS, I started a new blog called Datahound. There, I have continued sharing data and analyses about programs of interest to the scientific community. I greatly appreciated those who took the time to comment, providing feedback and sometimes raising important questions. I am starting Sciencehound with the same intent, providing data and analyses and, importantly, initiating discussions with the readers of Science and the Science family of journals. Enjoy and join in!

Journal impact factors

Journal impact factors are used as metrics for the quality of academic journals. In addition, they are (ab)used as metrics for individual publications or individual scientists (see my editorial in Science). The journal impact factor is defined as the average number of times articles published in a given journal over the past 2 years are cited in a given year. This average is derived from a relatively broad distribution of publications with different numbers of citations. Recently, Larivière et al. posted on bioRxiv a proposal recommending sharing these full distributions. This manuscript includes 2015 distributions for 11 journals (in a readily downloadable format). The distribution for Science magazine is shown below:

Science_Citation_Plot

Note that the point at 100 represents the sum of the numbers of all papers that received 100 or more citations.

Fitting citation curves as the difference of exponential functions

This curve rises quickly and then falls more slowly. As a chemist, I was reminded of the curves representing the concentration of an intermediate B in a reaction of the form

A -> B -> C.

The concentration of B rises when A is converted to B and then falls when B is transformed into C.

Solving equations for the kinetics of this scheme results in a function that is the difference between two exponential functions with negative exponents, that is,

P(c) = N(exp(-k1c) – exp(-k2c)) with k1 < k2.

Here, c is the number of citations, P(c) is the population of papers with c citations, k1 and k2 are adjustable constants, and N is a scale factor. The curve rises from zero with an initial slope of N(k2 – k1) and falls approximately exponentially as exp(-k1c) at large c.

Before fitting the citation curve to this function, we first normalize the curve so that the area under the curve is 1.0 and the y-axis represents the fraction of the total number of papers.

Science_Citations_Norm_Plot

This normalized curve can now be fit to the difference of exponential functions. It is easy to show that the normalization constant for the difference of exponential functions is N = k1k2/(k2 – k1) (see mathematical appendix).

Science_Citation_Fit_Plot_rev

The best fit occurs with k1 = 0.05 and k2 = 0.19.

The apparent journal impact factor can be calculated from these parameters (see the mathematical appendix). It can be shown that the journal impact factor (JIF) is:

JIF = (k1 + k2)/(k1k2).

The calculated JIF = 25.3.

Note that this value is smaller than the journal impact factor that is reported (34.7). This is because highly cited papers (with more than 100 citations) have a substantial effect on the journal impact factor but are not well fit by the difference of exponential functions.
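
For readers who want to reproduce this kind of fit, a sketch using nonlinear least squares is shown below; it assumes a data frame cites with columns citations and fraction holding the normalized distribution, and the starting values are rough guesses rather than those used for the post.

```r
# Sketch: fit the normalized citation distribution with nonlinear least squares
# (assumes a data frame `cites` with columns citations and fraction; the
# starting values are rough guesses)
fit <- nls(fraction ~ (k1 * k2 / (k2 - k1)) *
             (exp(-k1 * citations) - exp(-k2 * citations)),
           data  = cites,
           start = list(k1 = 0.05, k2 = 0.5))

k <- coef(fit)
unname((k["k1"] + k["k2"]) / (k["k1"] * k["k2"]))   # JIF = (k1 + k2)/(k1k2)
```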

Results for a collection of journals

With this fitting protocol in place, we can now fit the distributions for the other 10 journals.

Nature

Nature_Plot_rev

The best fit occurs with k1 = 0.07 and k2 = 0.08.

The calculated JIF = 26.8.

eLife

eLife_Plot_rev

The best fit occurs with k1 = 0.16 and k2 = 0.65.

The calculated JIF = 7.8.

PLOS ONE

PLOS_ONE_Plot_rev

The best fit occurs with k1 = 0.31 and k2 = 2.

The calculated JIF = 3.7.

PLOS Biology

PLOS_Biol_Plot_rev

The best fit occurs with k1 = 0.16 and k2 = 0.57.

The calculated JIF = 8.0.

PLOS Genetics

PLOS_Genet_Plot_rev

The best fit occurs with k1 = 0.18 and k2 = 0.92.

The calculated JIF = 6.6.

Nature Communications

Nature_Comm_Plot_rev

The best fit occurs with k1 = 0.13 and k2 = 0.66.

The calculated JIF = 9.2.

EMBO Journal

EMBO_J_Plot_rev

The best fit occurs with k1 = 0.16 and k2 = 0.37.

The calculated JIF = 9.0.

Proceedings of the Royal Society of London B

Proc_R_Soc_B_Plot_rev

The best fit occurs with k1 = 0.24 and k2 = 1.42.

The calculated JIF = 4.9.

Journal of Informetrics

J_Informatics_Plot_rev

The best fit occurs with k1 = 0.32 and k2 = 2.

The calculated JIF = 3.6.

Scientific Reports

Sci_Rep_Plot_rev

The best fit occurs with k1 = 0.22 and k2 = 2.

The calculated JIF = 5.0.

Analysis of calculated and observed journal impact factors

The calculated journal impact factors are well correlated with the observed values as shown below:

JIF_Comparison_Plot

A line with slope 1 is shown for comparison. The overall Pearson correlation coefficient is 0.999. Fitting all 11 data points to a line through the origin yields a slope of 0.746. The fact that this slope is substantially less than 1 is largely driven by the values for Science and Nature which, as noted above, are lower than the reported values owing to the elimination of the effect of papers with more than 100 citations. If these two points are eliminated, the slope of a fitted line increases to 0.924.

Conclusions

We have demonstrated that a function formed as the difference of two exponential functions can be used to fit observed distributions of the numbers of papers with different numbers of citations. Fitting this functional form to data from 11 journals reproduces the curves well and generates journal impact factors that agree well with published values. The largest differences occur for journals such as Science and Nature that have substantial numbers of papers with more than 100 citations over the 2-year period. This emphasizes again how these outlier papers can affect journal impact factor values.

Available code and documents

The R Markdown file that generates this post including the R code for fitting the citation distributions is available. The data from Larivière et al. is provided as a .csv file. A mathematical appendix showing the derivation of some key formulae is also available.