Semantic Publishing and Scientific Journals
Keeping up with the literature is a challenge for all scientists. But some researchers are making it easier by enhancing the usability and understanding of an article’s contents in a variety of ways—an approach called “semantic publishing.” Recent efforts include a manual demonstration project published by the Public Library of Science (PLoS) as well as a number of automated tools being developed around the world. Combined, they provide an intriguing glimpse at scientific publishing’s possible future.
“It’s exciting to me that now there are the first stirrings of people who are doing this for real with semantic markup either manually or automatically,” says David Shotton, PhD, a reader in image bioinformatics at Oxford University and lead author of an April 2009 PLoS Computational Biology paper describing the demonstration project. “If researchers can find relevant papers faster and understand their import faster, that will assist their research.”
Shotton and his colleagues spent several weeks last year manually enhancing a paper (by Reis et al., 2008) published in PLoS Neglected Tropical Diseases (http://dx.doi.org/10.1371/journal.pntd.0000228.x001). Among other things, they added machine-readable data (Excel spreadsheets rather than static images), provided ways to highlight various important terms in the paper, and added hyperlinks. In addition, hovering over an in-text citation brings up a box showing the full citation as well as relevant text from the cited paper, so the reader can understand why it is cited without having to look it up.
“Many of the things we did are trivial, but cumulatively they make a difference. Perhaps a small difference, but a helpful difference,” Shotton says.
Shotton and his colleagues curated the paper by hand, a slow process that could be improved via automation. Some of Shotton’s manual tasks have already been automated through the Elsevier Grand Challenge (where Shotton served as a judge), a contest created to improve the way scientific information is communicated and used. One of this year’s runners-up, a team from Australia, built a tool that automatically creates the kind of citation hover boxes that Shotton’s group built by hand. It uses standard, reliable text-mining algorithms to extract words from the citing passage, looks in the cited paper for similar conjunctions of words, and pulls back the most relevant sentences. “And it works,” Shotton says.
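To make the idea concrete, here is a minimal sketch of word-overlap sentence retrieval in Python. It is not the Australian team's tool, whose internals are not described here; the function names, the short stopword list, and the simple overlap score are all assumptions chosen for illustration.

```python
import re

# Illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {
    "a", "an", "and", "are", "at", "by", "for", "in", "is", "of",
    "on", "that", "the", "to", "was", "were", "with",
}


def tokenize(text):
    """Lowercase the text and return its words, minus stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def most_relevant_sentences(citing_passage, cited_text, top_n=1):
    """Rank sentences of the cited text by words shared with the citing passage.

    Sentences with no shared words are dropped; ties go to the earlier sentence.
    """
    query = set(tokenize(citing_passage))
    scored = []
    for position, sentence in enumerate(split_sentences(cited_text)):
        overlap = len(query & set(tokenize(sentence)))
        if overlap:
            scored.append((overlap, -position, sentence))
    scored.sort(reverse=True)
    return [sentence for _, _, sentence in scored[:top_n]]
```

Given a citing sentence and the text of the cited paper, `most_relevant_sentences` returns the sentences a hover box might display. A production tool would likely add stemming, term weighting, and a more robust sentence splitter.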
This year’s Challenge winners (announced in April) developed a browser plug-in called Reflect (freely downloadable at http://reflect.ws). Clicking on the REFLECT button in any Web browser automatically marks up an online document to show instances of protein, gene and chemical names, all in just seconds. Clicking on a highlighted term then brings up a box with information about that gene, protein or chemical. Soon, the group hopes to add other categories, such as diseases and cell types.
The journal Nature is starting to implement some semantic publishing approaches, says Timo Hannay, PhD, the publishing director at Nature.com. Still, he says, there remains the question of which enhancements to implement first, given the state of technology; and how to get authors to buy in, especially if they will have to do extra work. “We’re just at the beginning, but I’d like to see as much of our information as possible provided in structured, standard, machine-readable form,” Hannay says.