Computation Competitions Take Off!
Contests involving algorithms for protein structure prediction, natural language processing, and computer-aided disease detection are giving researchers a jolt of adrenalin and moving these fields forward
From all parts of the computational spectrum, researchers are duking it out: They are throwing their algorithms into the ring to see which one will outperform all others on a particular task.
“When you have a field with a quantitative basis and competing approaches in which high performance is one of the main outcomes, it seems like a natural setting for having a competition,” says Ron Summers, MD, PhD, senior investigator and staff radiologist in the department of radiology at NIH. “It’s also beneficial to the field. The spirit of competition encourages hard work to solve difficult problems.”
Protein-structure prediction has been competitive since 1994, when 34 groups registered for the first CASP (Critical Assessment of Techniques for Protein Structure Prediction) contest. Since then, the biennial event has steadily grown in popularity: 263 groups are registered for the 2006 bout, including several that will rely only on in silico tools, without help from human instinct (see the Human vs. Machine feature story in this issue).
This year, competitive natural language processing (NLP) gets a boost from one of the National Centers for Biomedical Computation. In conjunction with the fall meeting of the American Medical Informatics Association, i2b2 (Informatics for Integrating Biology and the Bedside) is extending an open invitation to anyone who wants to challenge their own NLP tools using real clinical records.
“Clinical data is not easily accessible to a lot of people who want to work on this type of data,” says Ozlem Uzuner, PhD, assistant professor of information studies at the State University of New York at Albany. “I2b2 and its partners have put together these data and that’s what makes this a unique opportunity.”
The competition is two-pronged. In the first task, researchers will compete to remove patients’ identifying information from clinical data. (Note: I2b2 has already removed the real information and replaced it with fictional data to protect patient privacy.) In the second, they will parse hospital discharge summaries to extract information on patients’ smoking status. The work will help set the stage for researchers to work with clinical data without violating patient privacy.
A computer-aided detection (CAD) “bake-off” for colon polyps is also on the horizon. In a traditional bake-off, says Summers, the cooks are given the ingredients and they compete to produce the best cake. In the CAD polyp bake-off, the American College of Radiology Imaging Network (ACRIN) provides researchers with a data set consisting of CT colonoscopy scans from about 200 patients. The researchers then run their CAD systems on these data. About a dozen academic and commercial researchers have expressed interest in participating.
“Various researchers have been producing systems and claiming outstanding performance on very small data sets,” says Summers. “It was competitive but not fair. It was like everyone deciding the terms of their own race.” Since the ultimate goal is to help patients, results need to be standardized, Summers says. “We need to know which approaches are better so everyone can move toward that and improve their systems.” Hence the CAD competition, which Summers hopes will be underway by November.
Challenges in Natural Language Processing for Clinical Data (sponsored by i2b2 in conjunction with AMIA): http://www.i2b2.org/NLP/Main.php
Virtual Colonoscopy CAD Bake-Off: For more information, contact Ron Summers: firstname.lastname@example.org.