ENCODE's Threads
A novel approach to publishing for large research projects
When a large research project generates lots of data over a long time, that data can tell many different stories. Such was the case when the ENCODE (Encyclopedia of DNA Elements) project geared up to publish its first wave of results. “They had to decide which stories were the most prominent and most complete to be told within the confines of traditional research papers,” says Magdalena Skipper, the Nature editor who worked with the ENCODE project’s authors.
Unfortunately, by choosing a set number of topics, she says, “other stories became fragmented and told across multiple papers.” To address that problem, the researchers created a set of “threads” that pull together 13 of these otherwise fragmented stories. They then manually collected the relevant portions for each thread, a process akin to highlighting the portions (including figures and tables) of 30 papers that relate to a specific topic. For example, machine learning approaches to genomics became one thread, and three-dimensional connections across the genome became another.
The threads don’t have a classic identity. “They aren’t indexed in PubMed,” Skipper says. But they provide a tool for exploring the published information through a different lens. “In an ideal world,” Skipper says, “one would be able to generate these threads automatically on any topic.” But current text-mining tools lag a bit: they can’t, for example, adequately extract relevant figures or other display items.
Skipper says she hopes Nature will do something like threads again to maximize the utility of a group of related papers. “Researchers appreciate it: it’s visually appealing and the content is useful.”