In the previous post, I outlined a method based on the R “gender” package for estimating the gender distributions for populations of authors submitting papers to *Science* and applied this method to look at gender distributions for authors who submitted Reports to *Science*. Here, I will extend this analysis to look at correlations in the genders of first and corresponding authors. However, I will first characterize and refine the gender inference method further.

### Refinement of the gender inference method

The “gender” method is based on an examination of the frequency of genders associated with particular first names in large databases. The “gender” package yields a default gender assignment based on a cutoff of 0.50 for the frequencies of names within the database but also returns the actual frequencies. The default method produced reasonable accuracy for a test set of 4789 individuals with user-provided gender information. For 3403 of these individuals, gender could be inferred using the “gender” package. The accuracy of this gender inference was different for males and females, with an accuracy of 98.1% for males but only 84.0% for females. This suggests that potential *Science* authors are more likely to be male than would be expected based on the database used by the “gender” package and that gender inference accuracy could possibly be improved by adjusting the frequency cutoff used for gender inference.

A histogram of the name frequency data for the 3403 individuals is shown below.

This histogram shows clear separation between male and female individuals in terms of male frequency in the database. Almost all of the individuals on the right of the graph with male frequencies greater than 0.50 are male (blue), whereas almost all of the individuals on the left with male frequencies less than 0.50 are female (red).

The cutoff parameter of 0.50 can be adjusted. If a higher cutoff is used, the gender of more individuals will be inferred to be female. However, many of these individuals will actually be male. This will increase the accuracy of inference for males, but will decrease it for females. Conversely, if a lower cutoff is used, the gender of more individuals will be inferred to be male, with the opposite effect on accuracy.
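As a minimal sketch of how the cutoff drives the inference, the Python snippet below applies a cutoff to a few made-up name frequencies. It is not the R “gender” package itself. The function names, the toy records, the rule "male if the male frequency exceeds the cutoff," and the choice to score accuracy among the individuals inferred to be each gender are all assumptions made for illustration.

```python
# Toy records: (male frequency of the first name, self-reported gender).
# These values are invented for illustration; they are not Science data.
RECORDS = [
    (0.95, "male"), (0.90, "male"), (0.60, "male"), (0.30, "male"),
    (0.70, "female"), (0.20, "female"), (0.05, "female"), (0.02, "female"),
]


def infer_gender(male_frequency, cutoff=0.50):
    """Infer 'male' when the male frequency exceeds the cutoff (assumed rule)."""
    return "male" if male_frequency > cutoff else "female"


def accuracy_by_inferred_gender(records, cutoff=0.50):
    """Fraction of correct calls among individuals inferred male / female."""
    correct = {"male": 0, "female": 0}
    total = {"male": 0, "female": 0}
    for male_frequency, true_gender in records:
        inferred = infer_gender(male_frequency, cutoff)
        total[inferred] += 1
        correct[inferred] += inferred == true_gender
    # None marks a gender that was never inferred at this cutoff.
    return {g: correct[g] / total[g] if total[g] else None for g in total}


default_result = accuracy_by_inferred_gender(RECORDS, cutoff=0.50)
lowered_result = accuracy_by_inferred_gender(RECORDS, cutoff=0.25)
print(default_result)
print(lowered_result)
```

With these toy records, lowering the cutoff from 0.50 to 0.25 moves one more individual into the inferred-male group, which is the mechanism described above.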

The trade-off can be depicted through a receiver operating characteristic (ROC) curve. The ROC curve is a plot of the sensitivity (the rate of true positives) versus 1 minus the specificity (the rate of false positives). The ROC curve for inferring female gender is shown below.

The ROC curve would have a perfect right angle in the upper left-hand corner for a test that was perfectly accurate.
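To make the construction of the curve concrete, here is a small Python sketch that computes ROC points for inferring female gender from invented data. The records, the function name, and the rule "inferred female when the male frequency is at or below the cutoff" are assumptions for illustration, not the procedure used to draw the figure.

```python
# Toy records: (male frequency of the first name, known gender); invented values.
RECORDS = [
    (0.95, "male"), (0.90, "male"), (0.60, "male"), (0.30, "male"),
    (0.70, "female"), (0.20, "female"), (0.05, "female"), (0.02, "female"),
]


def roc_points(records, cutoffs):
    """Return (1 - specificity, sensitivity) for each cutoff.

    'Positive' means inferred female, i.e. male frequency <= cutoff
    (an assumed rule for this sketch).
    """
    n_female = sum(1 for _, gender in records if gender == "female")
    n_male = len(records) - n_female
    points = []
    for cutoff in cutoffs:
        true_pos = sum(1 for freq, gender in records
                       if freq <= cutoff and gender == "female")
        false_pos = sum(1 for freq, gender in records
                        if freq <= cutoff and gender == "male")
        points.append((false_pos / n_male, true_pos / n_female))
    return points


points = roc_points(RECORDS, cutoffs=[0.0, 0.25, 0.50, 1.0])
for fpr, tpr in points:
    print(f"1 - specificity = {fpr:.2f}, sensitivity = {tpr:.2f}")
```

Each cutoff contributes one point. A perfectly accurate test would pass through the point (0, 1), the upper left-hand corner.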

We can directly examine the variation in the accuracy of inferred genders as a function of the cutoff parameter.

As could be anticipated from the relatively clean separation shown in the first figure, the accuracy curve is relatively constant over a wide range of cutoff values. However, the maximum accuracy is found at a cutoff of 0.11 (shown with the red arrow) rather than 0.50. The accuracy for this cutoff is 95.2%, somewhat higher than for the earlier results with a cutoff of 0.50 (93.7%). Although this may seem like a small change, it represents a reduction in the gender inference error rate from 6.3 to 4.8%, a reduction in relative terms of 24%. With a cutoff of 0.11, the accuracy for males was 96.7% and for females, 91.0%. Thus, with the new cutoff, the accuracy for males decreased slightly (by 1.4 percentage points), but the accuracy for females increased by 7.0 percentage points. Using a cutoff of 0.11 both improved the overall accuracy and decreased the discrepancy in accuracy for females compared to males, suggesting that it may be more suitable than the default cutoff for use in further analyses.
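The cutoff search and the relative error reduction can be sketched in a few lines of Python. The toy records, function names, and the grid of candidate cutoffs are assumptions for illustration. Only the two accuracy figures passed to `relative_error_reduction` (93.7% and 95.2%) come from the text.

```python
# Toy records: (male frequency of the first name, known gender); invented values.
RECORDS = [
    (0.95, "male"), (0.90, "male"), (0.60, "male"), (0.30, "male"),
    (0.70, "female"), (0.20, "female"), (0.05, "female"), (0.02, "female"),
]


def overall_accuracy(records, cutoff):
    """Fraction of all records whose inferred gender matches the known gender."""
    hits = sum(
        ("male" if freq > cutoff else "female") == gender
        for freq, gender in records
    )
    return hits / len(records)


def best_cutoff(records, cutoffs):
    """Return the (cutoff, accuracy) pair with the highest accuracy.

    Ties go to the first cutoff listed, because max() keeps the first maximum.
    """
    return max(((c, overall_accuracy(records, c)) for c in cutoffs),
               key=lambda pair: pair[1])


def relative_error_reduction(old_accuracy_pct, new_accuracy_pct):
    """Relative drop in error rate when accuracy improves (inputs in percent)."""
    old_error = 100.0 - old_accuracy_pct
    new_error = 100.0 - new_accuracy_pct
    return (old_error - new_error) / old_error


cutoffs = [i / 100 for i in range(1, 100)]  # 0.01, 0.02, ..., 0.99
cutoff, accuracy = best_cutoff(RECORDS, cutoffs)
print(cutoff, accuracy)

# Error rate falls from 6.3% to 4.8%: (6.3 - 4.8) / 6.3 is about 0.24.
reduction = relative_error_reduction(93.7, 95.2)
print(round(reduction, 3))
```

On real data the accuracy curve can be flat near its maximum, as noted above, so the exact location of the best cutoff is less informative than the range over which accuracy stays high.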

We can apply this new cutoff to the dataset of 2182 (out of 3568) names from papers published and rejected in 2015 for which genders were determined from web searches and could be inferred using the “gender”-based tool. For this dataset, the accuracy with a cutoff of 0.11 is 95.7%, somewhat higher than for the earlier results with a cutoff of 0.50 (94.7%). With a cutoff of 0.11, the accuracy for males was 97.2% and for females, 90.6%. Again, with the new cutoff, the accuracy for males decreased slightly and the accuracy for females increased.

Treating the 2015 dataset as an independent dataset reveals that the maximum accuracy is found at a cutoff of 0.19 rather than 0.50. The accuracy for this cutoff is 95.8%, only very slightly improved over that obtained with a cutoff of 0.11. For future calculations, we will use the average of these cutoff values, 0.15.

### Correlations between the genders of corresponding and first authors of Reports

With the gender inference tool optimized, we now turn to questions regarding correlations between the genders of the corresponding authors of Reports in *Science* and the genders of the first authors. More specifically, we will examine Reports submitted to *Science* from 2010 to 2017 that have a single corresponding author and for which the genders of both the corresponding author and the first author could be inferred using the “gender”-based method. Of the 71,275 Reports with first and corresponding authors submitted over this period, 66,057 have a single corresponding author. Note that the percentage of Reports with a single corresponding author dropped from 100 to 76% from 2010 to 2017 because of changes in journal policy, data capture, and levels of collaboration in the scientific community.

Within this set of Reports, we first excluded Reports where the corresponding author and the first author were the same individual. Reports with different corresponding and first authors accounted for 59% of total submissions. This varied considerably between fields (Reports were divided among fields based on the editors who handled the submissions) from 77% in the life sciences, to 69% in the physical sciences, to 24% in other fields (including ecology, evolution, social sciences, and others), highlighting differences in both scientific processes and author-order practices in different disciplines.

For the submitted Reports with different corresponding and first authors (both of which had genders that could be inferred by the “gender”-based tool), we calculated the number of submissions with the four possible combinations of male and female first authors and male and female corresponding authors and divided each by the number of the combination expected if the genders of the first and corresponding authors were independent from one another. More explicitly, these paired-author ratios are defined as follows:

Ratio_FF = (Number of submissions with female corresponding author and female first author) / (Total number of submissions × fraction of females among corresponding authors × fraction of females among first authors)

Ratio_FM = (Number of submissions with female corresponding author and male first author) / (Total number of submissions × fraction of females among corresponding authors × fraction of males among first authors)

Ratio_MF = (Number of submissions with male corresponding author and female first author) / (Total number of submissions × fraction of males among corresponding authors × fraction of females among first authors)

Ratio_MM = (Number of submissions with male corresponding author and male first author) / (Total number of submissions × fraction of males among corresponding authors × fraction of males among first authors)
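The ratio definitions above can be sketched in Python as follows. The function name, the "F"/"M" coding, and the toy counts are assumptions for illustration; the counts are invented and are not the *Science* submission data.

```python
def paired_author_ratios(counts):
    """Observed / expected counts for each (corresponding, first) gender pair.

    counts maps (corresponding_gender, first_gender) -> number of submissions,
    with genders coded "F" or "M". The expected count assumes the two genders
    are independent of one another.
    """
    genders = ("F", "M")
    total = sum(counts.values())
    # Marginal fractions of each gender among corresponding and first authors.
    corr_fraction = {g: sum(counts[(g, f)] for f in genders) / total
                     for g in genders}
    first_fraction = {g: sum(counts[(c, g)] for c in genders) / total
                      for g in genders}
    return {
        (c, f): counts[(c, f)] / (total * corr_fraction[c] * first_fraction[f])
        for c in genders
        for f in genders
    }


# Invented counts: keys are (corresponding author gender, first author gender).
toy_counts = {("F", "F"): 30, ("F", "M"): 70, ("M", "F"): 90, ("M", "M"): 310}
ratios = paired_author_ratios(toy_counts)
for pair, ratio in ratios.items():
    print(pair, round(ratio, 3))
```

In this toy table, 20% of corresponding authors and 24% of first authors are female, so 24 female-female pairs would be expected out of 500 under independence. The 30 observed pairs give a ratio of 1.25.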

To estimate the uncertainties due to error rates for the gender inferences (approximately 3% for males and 9% for females), we performed simulations where inferred genders were randomly varied guided by these probabilities, and the paired-author ratios were calculated.
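One way such a simulation could be set up is sketched below in Python. The details of the actual simulations are not given here, so everything in this sketch is an assumption: each inferred gender is flipped independently with the approximate error rate quoted above, the ratios are recomputed, and the 2.5th and 97.5th percentiles are taken over the runs. The toy pairs are invented.

```python
import random

GENDERS = ("F", "M")
# Approximate inference error rates quoted in the text, keyed by inferred gender.
ERROR_RATE = {"M": 0.03, "F": 0.09}


def paired_author_ratios(pairs):
    """Observed / expected ratio for each (corresponding, first) gender pair."""
    total = len(pairs)
    counts = {(c, f): 0 for c in GENDERS for f in GENDERS}
    for pair in pairs:
        counts[pair] += 1
    corr = {g: sum(counts[(g, f)] for f in GENDERS) / total for g in GENDERS}
    first = {g: sum(counts[(c, g)] for c in GENDERS) / total for g in GENDERS}
    return {k: counts[k] / (total * corr[k[0]] * first[k[1]]) for k in counts}


def perturb(gender, rng):
    """Flip an inferred gender with the error rate assumed for that gender."""
    if rng.random() < ERROR_RATE[gender]:
        return "F" if gender == "M" else "M"
    return gender


def simulate_intervals(pairs, n_sims=200, seed=0):
    """Return (2.5th, 97.5th) percentiles of each ratio across simulations."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    samples = {(c, f): [] for c in GENDERS for f in GENDERS}
    for _ in range(n_sims):
        noisy = [(perturb(c, rng), perturb(f, rng)) for c, f in pairs]
        for key, value in paired_author_ratios(noisy).items():
            samples[key].append(value)
    intervals = {}
    for key, values in samples.items():
        values.sort()
        low = values[int(0.025 * n_sims)]
        high = values[int(0.975 * n_sims) - 1]
        intervals[key] = (low, high)
    return intervals


# Invented (corresponding, first) pairs; not the Science submission data.
toy_pairs = ([("F", "F")] * 30 + [("F", "M")] * 70
             + [("M", "F")] * 90 + [("M", "M")] * 310)
intervals = simulate_intervals(toy_pairs)
for pair, (low, high) in intervals.items():
    print(pair, round(low, 3), round(high, 3))
```

Because the flips in this sketch are independent of one another, they tend to pull the simulated ratios toward 1.0. It illustrates how the spread arises rather than reproducing the reported intervals.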

For the full dataset, the paired-author ratios were as follows:

Ratio_FF = 1.197 ± 0.028

Ratio_FM = 0.912 ± 0.014

Ratio_MF = 0.958 ± 0.006

Ratio_MM = 1.018 ± 0.003

where the uncertainties are reported as 95% confidence intervals based on the error rates for gender inference.

The paired-author ratio for female corresponding authors and female first authors is by far the largest of the four ratios. This indicates that there are 20% more Reports submitted by pairs of a female corresponding author and a female first author than would be expected if corresponding author gender and first author gender were uncorrelated. By contrast, the paired-author ratio for male corresponding authors and male first authors is barely different from 1.0. The paired-author ratios for female corresponding author and male first author pairs and male corresponding author and female first author pairs are less than 1.0, as must be the case because the weighted average of the four ratios must be 1.0.
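The constraint on the weighted average follows directly from the definitions. As a short check, using notation introduced here for illustration: let $n_{ij}$ be the observed number of submissions with corresponding-author gender $i$ and first-author gender $j$, $N$ the total number of submissions, and $p_i$ and $q_j$ the fractions of gender $i$ among corresponding authors and gender $j$ among first authors. Weighting each ratio by its expected share $p_i q_j$ gives

$$
\sum_{i,j} p_i q_j \,\mathrm{Ratio}_{ij}
= \sum_{i,j} p_i q_j \,\frac{n_{ij}}{N\, p_i q_j}
= \frac{1}{N} \sum_{i,j} n_{ij}
= 1 .
$$

So any ratio above 1.0 has to be balanced by at least one ratio below 1.0.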

The paired-author ratios also vary across fields. These ratios minus 1.0 (to better visualize the differences) are plotted for different fields below.

In all fields, the ratio_FF is substantially larger than the ratio_MM. These ratios also vary substantially from field to field, being largest in the “other” category (fields other than life and physical sciences) and smallest in the physical sciences. Indeed, in the physical sciences, none of the ratios are substantially different from 1.0 when the errors associated with gender inference are taken into account. There are a variety of possible explanations for these variations, but these cannot be evaluated from these data alone.

These results can also be examined over time (although the uncertainties due to errors in gender inference become larger owing to smaller sample sizes). The results for ratio_FF and ratio_MM are plotted below for the entire dataset and by field.

In general, the paired-author ratios are relatively stable from year to year, with no notable trends (given the remaining gender uncertainty). This uncertainty is quite dominant for ratio_FF for the physical sciences and other categories owing to the relatively low numbers of females among these authors.

As a final comparison, we looked at these ratios for published Reports compared to overall submissions. These values are shown below.

Overall submission paired-author ratios:

Ratio_FF = 1.197 ± 0.028

Ratio_FM = 0.912 ± 0.014

Ratio_MF = 0.958 ± 0.006

Ratio_MM = 1.018 ± 0.003

Published Report paired-author ratios:

Ratio_FF = 1.236 ± 0.091

Ratio_FM = 0.913 ± 0.033

Ratio_MF = 0.952 ± 0.019

Ratio_MM = 1.018 ± 0.007

The values for Ratio_FM, Ratio_MF, and Ratio_MM are essentially identical between the overall submissions and published Report datasets. The value of Ratio_FF appears to be very slightly larger in the published Report dataset, but the difference is well within the overlap of 95% confidence intervals, which are largest for this category because of the relatively smaller numbers of female authors.
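The overlap of the intervals can be checked with a few lines of Python. The values are the ones reported above; the helper names are made up for this sketch. Overlapping intervals are only an informal check, not a formal significance test.

```python
def interval(center, half_width):
    """Turn 'center ± half_width' into a (low, high) pair."""
    return (center - half_width, center + half_width)


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Reported ratios as (value, 95% half-width).
submitted = {"FF": (1.197, 0.028), "FM": (0.912, 0.014),
             "MF": (0.958, 0.006), "MM": (1.018, 0.003)}
published = {"FF": (1.236, 0.091), "FM": (0.913, 0.033),
             "MF": (0.952, 0.019), "MM": (1.018, 0.007)}

overlap = {
    key: intervals_overlap(interval(*submitted[key]), interval(*published[key]))
    for key in submitted
}
print(overlap)
```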

### Summary

We have refined the gender inference tool based on the “gender” package by adjusting the cutoff parameter. This increases the overall accuracy in gender inference from 93.7 to 95.2%, with a substantial increase in the accuracy of inferring female gender. Using this tool, we have examined a set of approximately 39,000 Reports submitted from 2010 to 2017 that have both corresponding and first authors for whom gender could be inferred. We found that Reports with a female corresponding author and a female first author are present at a level 20% higher than would be expected if the genders of the corresponding and first authors were uncorrelated, although this was not true to nearly the same extent for male authors. This phenomenon was most pronounced in fields other than life sciences or physical sciences and was of very low magnitude in the physical sciences. It has been relatively stable over the period of time studied.