Breathing Life Into Paper
The edict that academics must “publish or perish” serves not merely to advance careers, but also to underscore the importance of transmitting knowledge from scientist to scientist and from generation to generation. The work of many individuals and teams builds upon previous work and drives scientific progress, but the way in which scientific knowledge amasses is changing.
Traditionally, the scientific literature has consisted of peer-reviewed journal articles, books and other prose, and it is the interpretation of results and conclusions that usually conveys the most knowledge. While this time-honored tradition is certainly not going away, supplementary ways to transmit value beyond the written word are becoming increasingly important.
Particularly in computationally intensive fields, scientifically valuable information is transferred in various forms: computer programs and source code; raw data too voluminous to include in journal articles; multimedia illustrations (e.g., animations, 3D models); databases and knowledgebases (e.g., HapMap, PDB, LIDC, RCT Bank); and teaching resources (e.g., MIT OpenCourseWare, clinical teaching files). Integration of these computational products with the traditional scientific literature has begun to happen in some fields but has yet to become truly pervasive.
Green1 makes the case that computational methods have caused research to outgrow the page limits of traditional papers, making reproducibility unachievable without linkage to the methods themselves. Buckheit and Donoho2 have developed ways to disseminate live figures and tables in order to promote reproducibility of results. They even go so far as to suggest that journal articles are not true scholarship but are instead mere advertisements for the underlying accomplishments. Both Bourne3 and Gentleman4 argue that there is tremendous untapped potential in creating live documents that link to various types of electronic resources. They all envision text, multimedia, data and code seamlessly integrated as an interactive form of scientific dissemination that would far surpass the limitations of static paper-based publication.
Such ideas raise numerous questions about how to accrue academic credit for contributing computational products; how to permanently cite such contributions; how to integrate computational material with traditional publications; how to deal with intellectual property concerns; how to control the use or misuse of computational work; how to design a quality control program for an integrated multimedia system; and how to improve the reproducibility of results.
These are all important issues that do not have easy solutions. But perhaps the most important issue will be whether sharing and disseminating all of these computational products will come naturally to an academic research culture that is accustomed to sharing knowledge at a more conceptual level. I believe the shift to integrated scientific dissemination can and will take hold, but only when computational contributions receive the respect they are due. There is more than one way to stand on the shoulders of giants.
1 P. J. Green. Diversities of gifts, but the same spirit. The Statistician, pages 423–438, 2003.
2 J. Buckheit and D. L. Donoho. WaveLab and reproducible research. In A. Antoniadis, editor, Wavelets and Statistics. Springer-Verlag, 1995.
3 P. Bourne. Will a biological database be different from a biological journal? PLoS Computational Biology, 1(3):e34, 2005.
4 R. Gentleman. Reproducible research: a bioinformatics case study. Bioconductor Project Working Papers, Working Paper 3, 2004. http://www.bepress.com/bioconductor/paper3