Peripheral vision allows humans to see shapes that are not in their direct line of sight, albeit with less detail. This feature increases your field of view and is useful in a variety of situations, such as detecting vehicles approaching from the side.
Unlike humans, AI does not have peripheral vision. Equipping computer vision models with this capability could help them detect approaching hazards more effectively, or predict whether a human driver would notice an oncoming object.
Taking a step in this direction, researchers at MIT have developed an image dataset that allows machine learning models to simulate peripheral vision. They found that training a model with this dataset improved its ability to detect objects in the visual periphery, although the models still performed worse than humans.
Their results also revealed that, unlike humans, neither the size of the objects nor the amount of visual clutter in the scene had a strong impact on the AI’s performance.
“There’s something fundamental going on here. We’ve tested so many different models, and even with training they get a little bit better, but they’re not quite human-like. So the question is: What is missing in these models?” says Vasha DuTell, a postdoc and co-author of the paper detailing this work.
Answering this question may help researchers build machine learning models that see the world the way humans do. In addition to improving driver safety, such models could be used to develop displays that are easier for people to view.
Additionally, a better understanding of peripheral vision in AI models could help researchers predict human behavior more accurately, adds lead author Anne Harrington MEng ’23.
“If we can truly capture the essence of what is represented in the periphery by modeling peripheral vision, it will help us understand which features of a visual scene prompt the eyes to move to gather more information,” she explains.
Their co-authors include Mark Hamilton, a graduate student in electrical engineering and computer science; Ayush Tewari, a postdoc; Simon Stent, research manager at the Toyota Research Institute; and senior authors William T. Freeman, the Thomas and Gerd Perkins Professor of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Ruth Rosenholtz, a senior researcher in the Department of Brain and Cognitive Sciences and a member of CSAIL. The research will be presented at the International Conference on Learning Representations.
“Whenever a human interacts with a machine, such as a car, robot, or user interface, it is critical to understand what the person is seeing. Peripheral vision plays a key role in that understanding,” Rosenholtz says.
Peripheral vision simulation
Stretch your arms out in front of you and point your thumbs up. A small area around the thumbnail is visible through the fovea, a small depression in the center of the retina that provides the clearest vision. Everything else you see is on the periphery of your vision. The visual cortex represents a scene with less detail and reliability as you move away from sharp focus.
Many existing approaches to modeling peripheral vision in AI represent this worsening detail by blurring the edges of the image, but the information loss that occurs in the optic nerve and visual cortex is much more complex.
To achieve a more accurate approach, the MIT researchers started with a technique used to model human peripheral vision. This method, known as a texture tiling model, transforms images to represent the loss of human visual information.
They modified this model so it could transform images in a similar way, but more flexibly, without needing to know in advance where the person or AI will point their eyes.
“This allows us to faithfully model peripheral vision in the same way that human vision research is done,” Harrington says.
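To make the idea concrete, here is a minimal Python sketch of one way an eccentricity-dependent transform can work. It is not the authors’ texture tiling model, which matches local texture statistics in pooling regions that grow with eccentricity; this stand-in simply blends progressively blurred copies of an image based on distance from a chosen fixation point. The function name and the `blur_rate` parameter are illustrative assumptions.

```python
# A crude stand-in for eccentricity-dependent detail loss (NOT the texture
# tiling model): pixels farther from the fixation point are drawn from more
# heavily blurred copies of the image.
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(image: np.ndarray, fixation: tuple[float, float],
            blur_rate: float = 0.02) -> np.ndarray:
    """image: (H, W) or (H, W, 3) float array; fixation: (row, col)."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(ys - fixation[0], xs - fixation[1])  # eccentricity in pixels

    # Precompute a stack of increasingly blurred copies of the image.
    sigmas = [0.0, 1.0, 2.0, 4.0, 8.0]
    if image.ndim == 3:  # don't blur across color channels
        stack = [gaussian_filter(image, sigma=(s, s, 0)) for s in sigmas]
    else:
        stack = [gaussian_filter(image, sigma=s) for s in sigmas]

    # Map each pixel's eccentricity to a blur level and pick from that copy.
    idx = np.searchsorted(sigmas, np.clip(ecc * blur_rate, 0, sigmas[-1]))
    out = np.empty_like(image)
    for i, blurred in enumerate(stack):
        out[idx == i] = blurred[idx == i]
    return out
```

As noted above, simple blur like this discards far less structure than the texture-based transform the researchers actually used; it is shown only to illustrate the idea of a fixation-dependent image transform.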
The researchers used this modified technique to generate a huge dataset of transformed images that appear more textural in certain areas, representing the loss of detail that occurs when a human looks further into the periphery.
They then used that dataset to train several computer vision models and compared them to human performance on object detection tasks.
“We had to be very clever in how we set up the experiments so we could also test the machine learning models. We didn’t want to have to retrain the models on a toy task they weren’t meant to do,” she says.
Unique performance
The humans and the models were shown pairs of transformed images that were identical, except that one image had a target object located in the periphery. Then, each participant was asked to pick the image containing the target object.
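To illustrate this setup, here is a hedged sketch of how such a two-alternative forced-choice (2AFC) trial might be scored for a model, assuming a torchvision-style object detector; `detector`, `pairs`, and `target_class` are hypothetical placeholders, not the authors’ actual evaluation code. The model “chooses” whichever image it assigns higher confidence of containing the target.

```python
# Sketch of a two-alternative forced-choice (2AFC) trial for a model,
# assuming a torchvision-style detector that returns a dict with
# "labels" and "scores" for each input image.
import torch

@torch.no_grad()
def target_confidence(detector, image: torch.Tensor, target_class: int) -> float:
    """Highest confidence the detector assigns to target_class in the image."""
    pred = detector([image])[0]
    scores = pred["scores"][pred["labels"] == target_class]
    return scores.max().item() if scores.numel() else 0.0

def run_2afc(detector, pairs, target_class: int) -> float:
    """pairs: list of (image_with_target, image_without_target) tensors.
    Returns the fraction of trials the detector answers correctly."""
    correct = 0
    for with_target, without_target in pairs:
        correct += int(target_confidence(detector, with_target, target_class)
                       > target_confidence(detector, without_target, target_class))
    return correct / len(pairs)
```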
“What really surprised us was how good people were at detecting objects in their periphery. We went through at least 10 different sets of images that were just too easy. We kept needing to use smaller and smaller objects,” Harrington adds.
The researchers found that training models from scratch with their dataset produced the greatest performance gains, improving the models’ ability to detect and recognize objects. Fine-tuning a model with the dataset, a process that involves adjusting a pretrained model so it can perform a new task, resulted in smaller gains.
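The difference between the two regimes can be sketched briefly in Python, using a torchvision classifier as a stand-in (this is not the paper’s actual training setup; `periph_loader` is a hypothetical DataLoader over the transformed images):

```python
# Training from scratch vs. fine-tuning, sketched with a torchvision ResNet.
import torch
from torchvision.models import resnet50, ResNet50_Weights

def make_model(from_scratch: bool, num_classes: int) -> torch.nn.Module:
    if from_scratch:
        # From scratch: random initialization; every weight is learned from the data.
        return resnet50(weights=None, num_classes=num_classes)
    # Fine-tuning: start from pretrained weights, swap in a new classifier head.
    model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
    model.fc = torch.nn.Linear(model.fc.in_features, num_classes)
    return model

def train(model, periph_loader, epochs: int = 10, lr: float = 1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in periph_loader:
            opt.zero_grad()
            loss_fn(model(images), labels).backward()
            opt.step()
```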
But in both cases, the machines were no match for humans, and they were especially bad at detecting objects in the far periphery. Their performance also did not follow the same patterns as that of humans.
“That might suggest the models aren’t using context in the same way humans do to perform these detection tasks. The models’ strategies may be different,” Harrington says.
The researchers plan to continue investigating these differences, with the goal of finding a model that can predict human performance in the visual periphery. This could, for example, enable AI systems that warn drivers of hazards they might not notice. They also hope to encourage other researchers to conduct additional computer vision studies with their publicly available dataset.
“This work is important because it contributes to our understanding that human peripheral vision should not be considered just impoverished vision due to the limited number of photoreceptors we have, but rather a representation that is optimized for us to perform tasks of real-world consequence,” says Justin Gardner, an associate professor of psychology at Stanford University who was not involved with this work. “Moreover, the work shows that neural network models, despite recent advances, are unable to match human performance in this regard, which should lead to more AI research that learns from the neuroscience of human vision. This future research will be aided significantly by the database of images provided by the authors to mimic peripheral human vision.”
This research was supported in part by the Toyota Research Institute and the MIT CSAIL METEOR Fellowship.