Imagine seeing the world through the eyes of a 6-month-old child. You have no words to explain anything. How can you come to understand language when every sound coming out of the mouths of those around you has an almost infinite number of potential meanings?
This question has led many scientists to hypothesize that humans must be born with some unique language faculty that kick-starts language acquisition. However, a paper published this week in Science revealed that a relatively simple AI system, fed data captured from a baby's perspective, began learning words.
The paper builds on a dataset of footage captured over 18 months by a helmet-mounted camera worn by an Australian baby between the ages of 6 and 25 months. From 61 hours of video, research assistants carefully went through and annotated 375,000 utterances made by the parents while the baby played, for example with a toy block set, such as "You can see that this blocks triangles." A clip shared by TIME shows the baby fumbling through a toy set before turning its attention to an uninterested cat.
Researchers from New York University's Center for Data Science and Department of Psychology fed this dataset into a multimodal AI system that can ingest both text and images. They found that the model could distinguish between many different objects, albeit with limited accuracy, both when tested on data from the head-mounted camera and on a dataset of idealized images of various objects.
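The article does not spell out the model's architecture, but one common way to pair images with utterances is contrastive learning: embed each camera frame and each utterance into the same vector space and train matching pairs to land close together while mismatched pairs are pushed apart. The Python sketch below illustrates that general idea; FrameEncoder, UtteranceEncoder, and all hyperparameters are illustrative assumptions, not details taken from the study.

# A minimal sketch (not the authors' code) of contrastive word-object learning
# from paired camera frames and parent utterances. All names and numbers here
# are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameEncoder(nn.Module):
    """Maps a head-camera frame to a normalized embedding vector."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, images):            # images: (batch, 3, H, W)
        return F.normalize(self.net(images), dim=-1)

class UtteranceEncoder(nn.Module):
    """Averages learned word embeddings over a tokenized utterance."""
    def __init__(self, vocab_size=1000, embed_dim=128):
        super().__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")

    def forward(self, token_ids):         # token_ids: (batch, words) of indices
        return F.normalize(self.embedding(token_ids), dim=-1)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Pull matching frame/utterance pairs together, push mismatches apart."""
    logits = img_emb @ txt_emb.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(len(img_emb))              # diagonal entries are true pairs
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy training step on random stand-in data.
frames = torch.randn(8, 3, 64, 64)                    # 8 fake camera frames
utterances = torch.randint(0, 1000, (8, 6))           # 8 fake six-word utterances
img_enc, txt_enc = FrameEncoder(), UtteranceEncoder()
loss = contrastive_loss(img_enc(frames), txt_enc(utterances))
loss.backward()                                       # gradients flow to both encoders
print(f"contrastive loss: {loss.item():.3f}")

After training on real frame-utterance pairs, a model like this could be probed the way the researchers describe: embed a word and a set of candidate images, and check whether the matching object's image is the nearest neighbor.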
The AI system was best at naming objects it saw frequently, such as apples (which often appear in children's books) and cribs. It was also better at identifying objects that were not obscured in the head-camera images. According to Wai Keen Vong, one of the study's authors, the model was particularly bad at recognizing knives.
Some psychologists and linguists believe that, without innate language abilities, children would not be able to form associations between words and objects. But the fact that a relatively simple AI model can begin to learn word associations from such a small dataset calls this view into question, Vong says.
However, it is important to note that the footage records a baby interacting with the world and its parents reacting to it. Andrei Barbu, a researcher at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory, said this means the AI model is "gathering what the child knows," which gives it an advantage in forming word associations. "If you put this model on a robot and let it run for 61 hours, you wouldn't get the kind of data that we have here. That data will help us update models like this," he said.
Having compiled their results, the New York University researchers plan to transcribe more than four times as much of the head-camera footage and feed it into the model. They want to investigate how much more the model learns when given more data, Vong said. They also want to test whether it can begin to learn more difficult words and linguistic behaviors that tend to develop later.
These experiments could shed more light on how babies learn to speak, and could help researchers understand the differences between humans and artificial intelligence. "There's a lot to be gained by studying how humans acquire language," Vong says. "And how can we do it so efficiently compared to current machines?"