Computer Vision that Mimics Human Vision
Computer vision program rivals the human ability to rapidly recognize objects in a complex picture
Our brains can recognize most of the things we pass on an evening stroll: cars, buildings, trees, and people all register even at a great distance or from an odd angle. Now, a new computer vision program can do the same thing. It rivals the human ability to rapidly recognize objects in a complex scene because it mimics how information flows during the initial stages of visual perception.
“We’ve built a model to be as close as possible to what is known about the human visual system,” explains Thomas Serre, PhD, a postdoctoral associate in the Center for Biological and Computational Learning at MIT and lead author of two papers recently published out of the lab run by Tomaso Poggio, PhD, at MIT’s McGovern Institute for Brain Research.
For decades, scientists have struggled to create computer programs that can recognize visual objects as well as humans can. Some computer systems excel at recognizing one particular object, but none are anywhere close to recognizing the wide range of objects observed by the human brain. Visual recognition is complicated by two conflicting goals: a program must be specific enough to discriminate between different objects, such as a person or a car, yet flexible enough to recognize the same type of object in different sizes, poses, and lighting.
To achieve these goals, Serre and colleagues used data recorded from real neurons in the visual system to program two fundamentally different kinds of virtual neurons called S (simple) and C (complex) units. S units recognize specific features of an image; C units monitor a range of S units in one area and allow for variation in position and size.
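The division of labor between S and C units can be illustrated with a toy sketch. The code below is not the team’s actual model; it is a minimal illustration, assuming a Gaussian tuning function for S units and max pooling for C units (both standard choices in models of this kind), applied to a one-dimensional “image”:

```python
import numpy as np

def s_unit(patch, template, sigma=1.0):
    # S (simple) unit: tuned to one specific feature. Response peaks
    # at 1.0 when the input patch exactly matches the stored template
    # and falls off (Gaussian tuning, an assumed form) as they differ.
    d = np.linalg.norm(patch - template)
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

def c_unit(s_responses):
    # C (complex) unit: max-pools over a neighborhood of S units, so
    # the feature is reported regardless of its exact position.
    return max(s_responses)

# Toy 1-D image with a target feature embedded at one position.
template = np.array([1.0, -1.0, 1.0])
image = np.zeros(10)
image[4:7] = template  # feature sits at offset 4

# A bank of S units, one per image position; a C unit pools them all.
s_layer = [s_unit(image[i:i + 3], template) for i in range(len(image) - 2)]
response = c_unit(s_layer)  # near 1.0: the feature was found somewhere
```

Because the C unit only reports the strongest S response in its pool, shifting the feature to a different offset leaves `response` unchanged, which is the position tolerance the article describes; stacking such layers extends the same tolerance to size and pose.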
The researchers were surprised to find that a simple system, consisting of four alternating layers of S and C units, was able to classify pictures of a busy street scene as well as other leading mathematics-based computer vision systems, as described in the March 2007 issue of IEEE Transactions on Pattern Analysis and Machine Intelligence.
Serre’s team then built a more complex system, consisting of many S and C layers designed to closely match the flow of information in a human brain during the first 100-200 milliseconds of perception. This enhanced system performed as well as humans on a rapid object recognition task: distinguishing animals from non-animals when images were flashed in front of humans and computers. The work appeared in the April 2007 issue of the Proceedings of the National Academy of Sciences. The computer system even made errors similar to the errors made by humans, suggesting that the model recapitulates the early processes of the human visual system.
The model will be used as a tool by neuroscientists to better understand the human visual system, and it also has practical applications in surveillance, driving assistance, and autonomous robotics. According to Poggio, the team’s next goal is to extend the model to include the “back projections” from other parts of the brain that allow feedback processing of visual information after 200 milliseconds.
“This is the first demonstration that a purely bottom-up approach to visual object recognition, inspired by recordings from the neurons in the brain, is effective as a practical computer vision system,” says Terry Sejnowski, PhD, head of the Computational Neurobiology Lab at the Salk Institute. “There is much more work to do, both to improve its performance, and also to use it to better understand how our own visual system works.”