This week, a startup called Cognition AI caused a bit of a stir with the release of a demo showing an artificial intelligence program called Devin performing tasks normally done by highly paid software engineers. Chatbots like ChatGPT and Gemini can generate code, but Devin goes further, planning how to solve a problem, writing the code, testing it, and implementing it.
Devin’s creators bill it as an “AI software developer.” When asked to test how Meta’s open source language model Llama 2 performs when accessed through the various companies hosting it, Devin produced a step-by-step plan for the project, generated the code needed to access the APIs, ran benchmark tests, and summarized the results.
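The workflow described above, querying each provider's hosted endpoint, timing the responses, and summarizing the results, can be sketched in a few lines. Everything here is a hypothetical stand-in, not Cognition's actual code: `benchmark`, `fake_llama2`, and the provider name are all invented for illustration, and a real test would swap in each hosting company's own client call.

```python
import time
import statistics

def benchmark(provider_name, query_fn, prompts):
    """Time each query against one provider and summarize latency.

    query_fn is a placeholder for whatever client call a given
    hosting provider actually exposes; real providers each have
    their own SDKs and request formats.
    """
    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        query_fn(prompt)  # send the prompt to the hosted model
        latencies.append(time.perf_counter() - start)
    return {
        "provider": provider_name,
        "mean_s": statistics.mean(latencies),
        "p95_s": sorted(latencies)[int(0.95 * (len(latencies) - 1))],
    }

# Dummy stand-in for a hosted Llama 2 endpoint, just to show the shape.
def fake_llama2(prompt):
    return f"echo: {prompt}"

results = [benchmark("example-host", fake_llama2, ["hello"] * 20)]
for r in results:
    print(r["provider"], round(r["mean_s"], 4))
```

A summary step would then compare the per-provider dicts, which is roughly the kind of multi-step plan the demo showed Devin generating on its own.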
It’s always difficult to judge a staged demo, but Cognition shows Devin handling an impressive range of tasks. The demo surprised investors and engineers on X, drawing plenty of praise and inspiring more than a few memes, including some predicting that Devin will soon be blamed for a fresh wave of layoffs in the tech industry.
Devin is just the latest and most sophisticated example of a trend I’ve been following for a while: the emergence of AI agents that can not only provide answers and advice about problems posed by humans, but also take action to solve them. A few months ago I test-drove Auto-GPT, an open source program that attempts to perform useful chores by taking actions on a user’s computer and on the web. Recently, I tested another program called vimGPT to see how the visual skills of newer AI models can help these agents browse the web more efficiently.
I’ve been impressed by my experiments with these agents. But for now, like the language models that power them, they make quite a few errors. And when software performs actions rather than just generating text, one mistake can mean total failure, with potentially costly or dangerous consequences. Narrowing the range of tasks an agent can attempt to, say, specific software-engineering chores seems like a smart way to reduce error rates, but there are still plenty of ways to fail.
Startups aren’t the only ones building AI agents. Earlier this week, I wrote about an agent called SIMA, developed by Google DeepMind, which plays video games including the truly wild title Goat Simulator 3. SIMA learned to perform more than 600 fairly complicated tasks, such as chopping down trees and shooting asteroids, by watching human players. Most importantly, it can successfully perform many of these actions even in unfamiliar games. Google DeepMind calls it a “generalist.”
Google likely hopes these agents will eventually have jobs outside of video games, perhaps using the web on your behalf or operating software for you. In the meantime, video games offer a complex environment in which agents can be tested and improved, making them an ideal sandbox for agent development. “Making them more accurate is something we’re actively working on,” Tim Harley, a research scientist at Google DeepMind, told me. “We have a lot of ideas.”
Expect more news about AI agents in the coming months. Demis Hassabis, CEO of Google DeepMind, recently told me that the company plans to combine large language models with its earlier work training AI programs to play video games in order to develop more capable and reliable agents. “This is definitely a huge area. We’re investing heavily in that direction, and I imagine other companies are as well,” Hassabis said. “As these kinds of systems start to become more agent-like, we’ll see a step change in their capabilities.”