---
title: "Modeling the annual number of NIH research grant applications"
output: html_document
---
In an [earlier post](http://blogs.sciencemag.org/sciencehound/2016/08/25/modeling-success-rates/), I outlined a model for the success rates of NIH grant applications based on the history of NIH appropriations. Because the success rate is defined as the ratio of grants awarded to the number of grant applications reviewed, this model consists of two components. The first component is a model for the number of new and competing grants awarded that was developed in my [previous post](http://blogs.sciencemag.org/sciencehound/2016/09/01/modeling-the-annual-number-of-new-and-competing-nih-research-project-grants/).
###Grant application number data
We now turn to the second component, a model for the number of grant applications submitted and reviewed. The number of NIH research project grants reviewed each year from 1990 to 2015 is plotted below:
```{r Build dataframe with budget data and award model from previous post, include = FALSE, echo = FALSE}
library(ggplot2)
library(dplyr)
library(reshape2)
Year <- c(1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015)
NIH_appropriation <- c(7576352, 8276739, 8921687, 10335996, 10955773, 11299522, 11927562, 12740843, 13674843, 15629156, 17840587, 20458556, 23321382, 27166715, 28036627, 28594357, 28560417, 29178504, 29607070, 30545098, 31238000, 30916345, 30860913, 29315822, 30142653, 30203000)
Model_df <- data.frame(Year, NIH_appropriation)
# Add BRDPI data
BRDPI <- c(0.054, 0.048, 0.044, 0.034, 0.039, 0.035, 0.026, 0.028, 0.034, 0.032, 0.037, 0.033, 0.033, 0.035, 0.037, 0.039, 0.046, 0.038, 0.047, 0.029, 0.03, 0.029, 0.013, 0.019, 0.022, 0.022)
Model_df <- cbind(Model_df, BRDPI)
# Add success rate data
Success_rates_observed <- c(.245, .286, .294, .235, .254, 0.268, 0.279, 0.305, 0.311, 0.324,
0.315, 0.321, 0.312, 0.299, 0.246, 0.223, 0.200, 0.213, 0.218, .206, .206, .177, .176, .168, .181, .183)
Model_df <- cbind(Model_df, Success_rates_observed)
Appropriation_1990_dollars <- numeric(26)
for (i in 1:26) {
Appropriation_1990_dollars[i] <- Model_df$NIH_appropriation[i]
}
for (i in 2:26) {
for (j in 2:i) {
Appropriation_1990_dollars[i] <- Appropriation_1990_dollars[i] / (1 + Model_df$BRDPI[j])
}
}
Model_df <- cbind(Model_df, Appropriation_1990_dollars)
Competing_award_number_observed <- c(5267, 6127, 6380, 5546, 6474, 6758, 6653, 7388, 7518, 8556, 8765, 9098, 9396, 10393, 10052, 9599, 9128, 10100, 9460, 8881, 9455, 8765, 9032, 8310, 9241, 9540)
Model_df <- cbind(Model_df, Competing_award_number_observed)
# A function to initialize grant data
Initialize_grant_df <- function(Appropriation, Grant_no_total, Year) {
no_grants_3 <- (0.25 * Grant_no_total) / 4.0
no_grants_4 <- (0.5 * Grant_no_total) / 4.0
no_grants_5 <- (0.25 * Grant_no_total) / 4.0
ave_grant_size <- Appropriation / Grant_no_total
num_competing_grants <- no_grants_3 + no_grants_4 + no_grants_5
first_year <- c(Year, no_grants_3, no_grants_3, no_grants_3,
no_grants_4, no_grants_4, no_grants_4, no_grants_4,
no_grants_5, no_grants_5, no_grants_5, no_grants_5, no_grants_5,
ave_grant_size, num_competing_grants)
grant_df <- data.frame(matrix(first_year, nrow = 1, ncol = 15))
colnames(grant_df) <- c("Year", "Grants_3_1", "Grants_3_2", "Grants_3_3",
"Grants_4_1", "Grants_4_2", "Grants_4_3", "Grants_4_4",
"Grants_5_1", "Grants_5_2", "Grants_5_3", "Grants_5_4", "Grants_5_5",
"Average_grant_size", "Competing_award_number_predicted")
grant_df
}
# Approximately 50% of the NIH budget is devoted to funding RPGs
Approp_1990 <- 0.5 * Model_df$NIH_appropriation[1]
Grant_df <- Initialize_grant_df(Approp_1990, 20090, 1990)
# A function to update Grant_df with subsequent years
Increment_year <- function(grant_df, appropriation, inflation_factor) {
num_years <- nrow(grant_df)
Ave_grant_size_new <- grant_df$Average_grant_size[num_years] * (1 + inflation_factor)
Year_new <- grant_df$Year[num_years] + 1
Grants_3_2_new <- grant_df$Grants_3_1[num_years]
Grants_3_3_new <- grant_df$Grants_3_2[num_years]
Grants_4_2_new <- grant_df$Grants_4_1[num_years]
Grants_4_3_new <- grant_df$Grants_4_2[num_years]
Grants_4_4_new <- grant_df$Grants_4_3[num_years]
Grants_5_2_new <- grant_df$Grants_5_1[num_years]
Grants_5_3_new <- grant_df$Grants_5_2[num_years]
Grants_5_4_new <- grant_df$Grants_5_3[num_years]
Grants_5_5_new <- grant_df$Grants_5_4[num_years]
Total_commitment <- (Grants_3_2_new + Grants_3_3_new +
Grants_4_2_new + Grants_4_3_new + Grants_4_4_new +
Grants_5_2_new + Grants_5_3_new + Grants_5_4_new +
Grants_5_5_new)
Total_commitment <- Total_commitment * Ave_grant_size_new
Available <- appropriation - Total_commitment
Num_new_grants <- Available / Ave_grant_size_new
Grants_3_1_new <- 0.25 * Num_new_grants
Grants_4_1_new <- 0.5 * Num_new_grants
Grants_5_1_new <- 0.25 * Num_new_grants
Num_competing_grants <- Grants_3_1_new + Grants_4_1_new + Grants_5_1_new
new_row <- c(Year_new, Grants_3_1_new, Grants_3_2_new, Grants_3_3_new,
Grants_4_1_new, Grants_4_2_new, Grants_4_3_new, Grants_4_4_new,
Grants_5_1_new, Grants_5_2_new, Grants_5_3_new, Grants_5_4_new,
Grants_5_5_new, Ave_grant_size_new, Num_competing_grants)
grant_df <- rbind(grant_df, new_row)
grant_df
}
# Initialize by running model from 1980 to 1990
Year_init <- c(1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990)
NIH_appropriation_init <- c(3428935, 3569406, 3641875, 4023969, 4493588, 5149459, 5262211, 6182910, 6666693, 7144764, 7576352)
Model_df_init <- data.frame(Year_init, NIH_appropriation_init)
# Add BRDPI data
BRDPI_init <- c(.098, .104, .086, .062, .059, .056, .042, .053, .050, .052, .054)
Model_df_init <- cbind(Model_df_init, BRDPI_init)
Approp_1980 <- 0.5 * Model_df_init$NIH_appropriation_init[1]
Grant_df_init <- Initialize_grant_df(Approp_1980, 17000, 1980)
for (i in 2:11) {
new_appropriation_init <- 0.5 * Model_df_init$NIH_appropriation_init[i]
# Use BRDPI from the previous year to adjust grant sizes
new_inflation_init <- Model_df_init$BRDPI_init[(i - 1)]
Grant_df_init <- Increment_year(Grant_df_init, new_appropriation_init, new_inflation_init)
}
colnames(Model_df_init)[1] <- "Year"
Grant_df <- Grant_df_init
for (i in 2:26) {
new_appropriation <- 0.5 * Model_df$NIH_appropriation[i]
new_inflation <- Model_df$BRDPI[(i - 1)]
Grant_df <- Increment_year(Grant_df, new_appropriation, new_inflation)
}
Model_df <- merge(Model_df, Grant_df, byrow = "Year")
```
```{r Add application number data and plot, echo = FALSE}
RPG_applications_observed <- c(21509, 21416, 21734, 23626, 25510, 25224, 23821, 24221, 24151, 26408, 27798, 28368, 30068, 34710, 40861, 43069, 45688, 47455, 43467, 43142, 45983, 49592, 51313, 49581, 51073, 52190)
Model_df <- cbind(Model_df, RPG_applications_observed)
p_1 <- ggplot(Model_df, aes(x = Year, y = value))
p_1 <- p_1 + ylim(c(0, 55000))
p_1 <- p_1 + ylab("Number of Research Project Grant Applications")
p_1 <- p_1 + geom_point(aes(y = RPG_applications_observed), color = "red")
p_1 <- p_1 + geom_line(aes(y = RPG_applications_observed), color = "red")
# Save png of plot for blog
suppressMessages(ggsave("Application_number_plot.png", dpi = 600))
p_1
```
This curve is somewhat reminiscent of [the curve for the NIH appropriation as a function of time shown in an earlier post](http://blogs.sciencemag.org/sciencehound/2016/08/25/modeling-success-rates/). The drop in the number of applications that occurs in 2008-2009 is an artifact due to the effects of the American Recovery and Reinvestment Act (ARRA). The funding associated with the ARRA was not included in the appropriations data, and applications that were considered for ARRA funding were also removed.
The grant application number and appropriation curves are compared directly below, plotted as fractional changes since 1990 to facilitate comparison.
```{r Compute fractional changes in appropriations and applications and plot, echo = FALSE}
Model_df <- mutate(Model_df, NIH_appropriation_fractional_change_1990 = NIH_appropriation / Model_df$NIH_appropriation[1])
Model_df <- mutate(Model_df, RPG_applications_fractional_change_1990 = RPG_applications_observed / Model_df$RPG_applications_observed[1])
Model_df_sub <- select(Model_df, Year, NIH_appropriation_fractional_change_1990, RPG_applications_fractional_change_1990)
write.csv(Model_df_sub, file = "Model_df_frac_change.csv")
Model_df_sub_melt <- melt(Model_df_sub, id = c("Year"))
p_2 <- ggplot(Model_df_sub_melt)
p_2 <- p_2 + geom_line(aes(x=Year, y=value, color=variable))
p_2 <- p_2 + ylab("Fractional Change since 1990")
p_2 <- p_2 + scale_color_manual(values=c("black","red"))
# Save png of plot for blog
suppressMessages(ggsave("Fractional_change_plot.png", dpi = 600))
p_2
```
The curves are similar in shape although the increase in the NIH appropriation curve is larger by approximately a factor of 2 than is the grant application number curve. The curves, normalized so that they have the same overall height, are compared below.
```{r Compared normalized curves, echo = FALSE}
Model_df <- mutate(Model_df, NIH_appropriation_fractional_change_normalized = NIH_appropriation_fractional_change_1990 / max(NIH_appropriation_fractional_change_1990))
Model_df <- mutate(Model_df, RPG_applications_fractional_change_normalized = RPG_applications_fractional_change_1990 / max(RPG_applications_fractional_change_1990))
Model_df_sub <- select(Model_df, Year, NIH_appropriation_fractional_change_normalized, RPG_applications_fractional_change_normalized)
Model_df_sub_melt <- melt(Model_df_sub, id = c("Year"))
p_3 <- ggplot(Model_df_sub_melt)
p_3 <- p_3 + geom_line(aes(x=Year, y=value, color=variable))
p_3 <- p_3 + scale_color_manual(values=c("black","red"))
# Save png of plot for blog
suppressMessages(ggsave("Scaled_fractional_change_plot.png", dpi = 600))
p_3
```
Examination of the curves reveals that the grant application number curve is shifted to later years by ~2 years compared with the NIH appropriation curve. This makes mechanistic sense in that a relatively large increase in the NIH appropriation might cause institutions to hire more faculty who then apply for grants and might cause individual investigators to submit more applications. However, these responses do not take place instantaneously but require a year or more for the applications to be written and submitted.
###A model for grant application numbers based on appropriation history
A linear model can now be fit to predict the grant application number curve as a linear combination of the appropriation curves shifted by 1 and 2 years, including a constant term.
```{r Fit a linear model to predict RPG applications from appropriations, echo = FALSE, warning = FALSE}
NIH_appropriation_1 <- c(NA, Model_df$NIH_appropriation_fractional_change_1990[1:25])
NIH_appropriation_2 <- c(NA, NA, Model_df$NIH_appropriation_fractional_change_1990[1:24])
RPG_application_0 <- c(NA, NA, Model_df$RPG_applications_fractional_change_1990[3:26])
Model_df <- cbind(Model_df, RPG_application_0)
Model_df <- cbind(Model_df, NIH_appropriation_1)
Model_df <- cbind(Model_df, NIH_appropriation_2)
Fit_df <- select(Model_df, Year, RPG_application_0, NIH_appropriation_1, NIH_appropriation_2)
Fit_df <- Fit_df[3:26, ]
Year <- Fit_df$Year[3:26]
Fit_df <- select(Fit_df, RPG_application_0, NIH_appropriation_1, NIH_appropriation_2)
fit <- lm(RPG_application_0 ~ ., data = Fit_df)
RPG_applications_predicted <- predict(fit)
Fit_df <- cbind(Fit_df, RPG_applications_predicted)
Fit_df <- cbind(Year, Fit_df)
Fit_sub_df <- select(Fit_df, Year, RPG_application_0, RPG_applications_predicted)
write.csv(Fit_sub_df, file = "Application_fit_df.csv")
Fit_sub_df_melt <- melt(Fit_sub_df, id = c("Year"))
p_4 <- ggplot(Fit_sub_df_melt)
p_4 <- p_4 + geom_line(aes(x=Year, y=value, color=variable))
p_4 <- p_4 + scale_color_manual(values=c("black","red"))
# Save png of plot for blog
suppressMessages(ggsave("App_number_fit_plot.png", dpi = 600))
p_4
RPG_applications_predicted <- Model_df$RPG_applications_observed[1] * RPG_applications_predicted
RPG_applications_predicted <- c(NA, NA, RPG_applications_predicted)
Model_df <- cbind(Model_df, RPG_applications_predicted)
```
The number of grant applications can be calculated from the appropriation curves by m~1~*(appropriation-1 year offset) + m~2~*(appropriation-2 year offset) + b, where m~1~ = `r round(coefficients(fit)[2], 2)`, m~2~ = `r round(coefficients(fit)[3], 2)`, and b = `r round(coefficients(fit)[1], 2)`.
The agreement is reasonable. The major differences occur in years 2008-2009 due to the impact of ARRA noted above. The overall Pearson correlation coefficient is `r round(cor(Model_df$RPG_applications_observed[3:26], Model_df$RPG_applications_predicted[3:26]), 3)`.
###Conclusions
A model has been developed that allows the prediction of the number of NIH grant applications from the appropriations history. This model can be used in conjuction with [the previously described model](http://blogs.sciencemag.org/sciencehound/2016/09/01/modeling-the-annual-number-of-new-and-competing-nih-research-project-grants/) for the number of grants awarded [to predict grant success rates](http://blogs.sciencemag.org/sciencehound/2016/08/25/modeling-success-rates/), for actual appropriation histories or for hypothetical ones.
The grant application number model was developed empirically, based on observed similarities between the grant application number curve and the appropriation curve. Although it is not truly mechanism-based, the model is consistent with a simple mechanistic interpretation as noted. It is interesting that grant application numbers increased more or less monotonically. Thus, it would have been difficult to develop a model from the [inflation-corrected appropriation curve](http://blogs.sciencemag.org/sciencehound/2016/09/01/modeling-the-annual-number-of-new-and-competing-nih-research-project-grants/) because this peaked in 2003 and has been falling almost every year since. This raises an interesting point. Application numbers have gone up when the appropriation increases by more than inflation or when the appropriation increase is less than inflation. This could be interpreted in terms of two dynamic drivers. When the appropriation increases by more than inflation, institutions and investigators sense opportunity and submit more applications; when the appropriation increases by less than inflation, institutions and investigators sense tough times with lower success rates and submit more applications to increase their chances of competing successfully for funding.
It will be interesting to compare how well this empirical model does in predicting grant application numbers in future years.
###Additional files
An [R Markdown](http://rmarkdown.rstudio.com) file that generates this post, including the R code, is available.