Training for the Future
Adapting to the Flood of Data
This issue of Biomedical Computation Review (BCR) is about transforming the biomedical workforce into a biomedical Big Data workforce. It focuses on one of the most pressing problems facing biomedical science today: the evolution of the workforce in response to the flood of data. Because the rate of that evolution depends on the availability of appropriate training and education, this issue of BCR is a timely and welcome addition to the continuing conversation on the topic.
The biomedical Big Data workforce is diverse: it includes biomedical scientists who are users of Big Data, data scientists who develop methods, data engineers who build tools, and librarians who organize and manage data. Although each of these roles requires a different mix of skills, three core areas are common to all of them: 1) an understanding of the processes represented by the data (biomedical knowledge), 2) the facility to handle and manage the data (computational skills), and 3) the knowledge to conduct and interpret analyses and draw conclusions (statistical skills).
These three components (computer science, statistics, and an area of biomedical science) are each whole fields of study in their own right. Acquiring expertise in one area is a long process; acquiring expertise in all three is something very few people will achieve. A more realistic goal for this age of Big Data is to have many individuals with some knowledge of all three areas along with expertise in at least one.
Ideally, all scientists would know (and trainees would be taught) how to discover and conduct standard analyses of their data and, more importantly, how to determine when standard analyses are not appropriate. When standard analyses break down and new challenges are discovered, collaborations with method and tool developers—a group that has been collectively referred to as biomedical data scientists—are needed.
Method developers often recognize that a precise, principled analysis is computationally infeasible. Although a model-based analysis may have foundations in principles such as maximizing likelihood or minimizing Bayes risk, approximations must often be made to reduce the computational cost. Approximations made in a skillful, deliberate, and measured way, with attention to diagnostics, can maintain the interpretability and reliability of results. Tool developers bring skills that turn ideas and prototypes into hardened products through creative algorithm design based on a thorough understanding of the computational framework being used. Although method developers are often tool developers (and vice versa), describing the roles separately illuminates the tradeoff between time and accuracy that often exists. The competing demands of computational cost and confidence in results need to be weighed by the whole team, including biomedical scientists, method developers, and tool developers.
Intra-team communication is essential for a group of researchers to function well. Because each field may have its own specialized language, communication is aided by having enough overlapping expertise to translate from one field to another. Many departments are already making a conscious effort to build overlap between fields by, for example, adding more statistics training to bioinformatics programs and more computational training to biostatistics programs.
A goal of the NIH Big Data to Knowledge Initiative is to foster the development of training and education opportunities that enable trainees and scientists to gain the skills needed to contribute most effectively to biomedical Big Data teams. Awards issued in the past year—for Big Data training programs, courses, open educational resources, and career development—represent early efforts toward transforming the biomedical workforce into a biomedical Big Data workforce. Achieving this goal will bring challenges, some of which are illuminated in this issue of BCR. With challenges come opportunities, and the NIH is keen to seize those opportunities, and ultimately, to turn data into discovery into health.