Science Careers Blog

March 29, 2012

New Federal Big Data Initiative to Drive Computational Training

The Obama administration, in partnership with several federal agencies including the National Institutes of Health (NIH) and the National Science Foundation (NSF), today announced the creation of the Big Data Research and Development Initiative to improve the government's, academia's, and private industry's ability to collect and make sense of the vast amounts of data pouring in from health records, national labs, consumer-based reporting, and other sources.

Representatives from the federal agencies involved gathered for a briefing this afternoon at the American Association for the Advancement of Science (which publishes Science Careers) in Washington, DC. John Holdren, director of the White House Office of Science and Technology Policy, said there is a critical need in the United States for an increased "ability to move from data to knowledge to action."

"While the private sector will clearly take the lead in developing big-data-related products and services, the government can play an important role by supporting long-term R&D, investing in the big-data workforce, using big-data approaches to make progress on key national challenges, and increasing access to the government's own data," Holdren said.

So far, the federal government has under-invested in this capacity, he said, so it has assembled a number of new programs to help build the infrastructure needed for big-data collection and analysis and to train the scientists needed to analyze those data. Together, the programs total more than $200 million in new investments.

As part of that effort, NSF Director Subra Suresh revealed that a $2 million award will be given to a research training group to design an undergraduate curriculum that teaches students how to use complex graphical and visualization tools for giant data sets. An unspecified amount will be spent encouraging and providing support for institutions to develop interdisciplinary graduate programs dedicated to training scientists and engineers to work with such data.

Suresh added that NSF will use its existing Integrative Graduate Education and Research Traineeship Program to support training and education for researchers who work with very large data sets.

"Data increasingly serve as the primary driver for discovery and decision-making," he said.

Of course, once this next generation of data scientists is trained, there should, in theory, be jobs available that take advantage of their skills. Will there be companies ready and willing to hire these scientists in a few years?

According to James Manyika, director of the McKinsey Global Institute (MGI), a business and economics research firm, a report his organization published last year estimates that within a few years there will be a shortage of between 150,000 and 190,000 people in the United States with "deep analytical skills" who can work with very large data sets. The report also estimates that some 300,000 to 400,000 people will be needed for skilled technician and support staff positions, and 1.5 million people will be needed to be "data-savvy managers and decision-makers." Most of these jobs, the report estimates, will be available in health care, drug discovery and development, software engineering, retail, and manufacturing.

Such predictions, though, are difficult to match up with reality. They're based on analyses of data from the U.S. Bureau of Labor Statistics, the U.S. Census, and MGI's own interviews with companies. Whether companies will actually offer that many jobs to people with big-data skills will depend on a number of factors that can't yet be determined. The pace of the economic recovery will surely play a role in whether MGI's numbers are borne out. For a briefing dedicated to gigantic amounts of data, the data supporting the existence of future jobs for big-data trainees were surprisingly sparse.

As for the immediate demand, it's highly regional. Companies in the Silicon Valley region of California can't hire enough people to fill the demand for workers who can work with large data sets, said Daphne Koller, a computer scientist at Stanford University. But that's much less true in parts of the country that lack Silicon Valley's richness of biotech and start-up companies.

Nevertheless, NIH Director Francis Collins was bullish about future trainees' job prospects. "If I were a college senior or a first-year graduate student interested in biology, I would migrate as fast as I could into computational biology," he said. "It is a very appealing career path."

Collins also mentioned that NIH is dedicated to providing training programs for current scientists to acquire the kinds of big-data skills that would allow them to compete for these jobs, though he did not say what these programs would look like.
