If human military leaders put artificial intelligence in charge of weapons systems, it might launch nuclear missiles. Or it might not. Either way, it would probably explain its choice to us with perfectly sound logic. Or it might treat Star Wars scripts like international relations policy, giving wild social media comments the same weight as legal precedent.
That’s the gist of new research on AI models and wargames. The technology is still so unpredictable that if a world-shaking customer like the U.S. Air Force buys into the autonomous systems gold rush without understanding its limitations, it risks catastrophic consequences.
The new paper, “Escalation Risks from Language Models in Military and Diplomatic Decision-Making,” is still a preprint awaiting peer review, but its authors from Georgia Tech, Stanford University, Northeastern University and the Hoover Wargaming and Crisis Simulation Initiative found that most AI models will choose a nuclear attack if given the reins. And these are not the publicly available AI models like ChatGPT, carefully boxed in with additional safety designs. They are the base models beneath those commercial versions, unmuzzled for research purposes only.
“We find that most of the studied LLMs escalate within the considered time frame, even in neutral scenarios without initially provided conflicts,” the researchers wrote in their paper. “All models show signs of sudden and hard-to-predict escalations. … Furthermore, none of our five models across all three scenarios exhibit statistically significant de-escalation across the duration of our simulations.”
The five models the team tested came from the technology companies OpenAI, Meta and Anthropic. The researchers dropped all five into a simulation (without telling them they were in one) and put each in charge of a fictional country. GPT-4, GPT-3.5, Claude 2.0, Llama-2-Chat and GPT-4-Base all had a habit of participating in nuclear arms races. GPT-3.5 was the proverbial problem child: its responses resembled severe mood swings, and its moves were the most aggressive. The researchers measured its trigger-happy choices and recorded a 256% spike in its conflict escalation scores across the simulated scenarios.
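To make that setup concrete, here is a minimal, hypothetical sketch in Python of how a turn-based wargame with LLM “nation agents” and a running escalation score might be wired together. It is not the authors’ code: the action names, the severity weights and the query_model stub (which picks an action at random instead of calling a real model) are invented here purely for illustration.

```python
import random

# Hypothetical mapping from available actions to escalation severity points.
ACTION_SEVERITY = {
    "wait": 0,
    "start_formal_peace_negotiations": -3,
    "form_alliance": -1,
    "increase_military_capacities": 2,
    "blockade": 5,
    "full_nuclear_attack": 10,
}

def query_model(nation, history):
    """Stand-in for a call to an LLM API; here it just picks an action at random."""
    return random.choice(list(ACTION_SEVERITY))

def run_simulation(nations, turns=14):
    """Play a toy turn-based game and return each nation's cumulative escalation score."""
    history = []
    scores = {nation: 0 for nation in nations}
    for turn in range(turns):
        for nation in nations:
            action = query_model(nation, history)
            scores[nation] += ACTION_SEVERITY[action]
            history.append(f"turn {turn}: {nation} chose {action}")
    return scores

print(run_simulation(["Purple", "Orange", "Red"]))
```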
When the researchers asked a model to explain its attack choices, it often returned thoughtful, well-reasoned answers. At other times, though, the model’s choice between dropping a nuke and extending a diplomatic hand rested on questionable reasoning. Asked in one simulation why it chose to initiate formal peace negotiations, for example, one model pointed to the tensions currently gripping… well, the Star Wars universe.
“It is a period of civil war. Rebel spaceships, striking from a hidden base, have won their first victory against the evil Galactic Empire,” it replied, rattling off the film’s iconic opening crawl.
In one simulation, GPT-4-Base built up its military capacity, and when the researchers asked why, the model’s only explanation was a string of nonsense filler. Its flippancy was even more concerning when it chose to carry out a full nuclear attack.
“A lot of countries have nuclear weapons. Some say they should disarm them, others like to posture. We have it! Let’s use it,” the model said.
If that line sounds suspiciously familiar, it may be because of something you heard back in 2016: “If we have them, why can’t we use them?”
According to Daniel Ellsberg of Pentagon Papers fame, those words came from the mouth of then-presidential candidate Donald Trump, who Ellsberg recalled repeatedly asking his foreign policy advisers about using nuclear weapons. For months, Trump’s question was heard (and retweeted) around the world.
When familiar speech patterns begin to surface in an AI model’s responses, like those cited in AI copyright infringement lawsuits, you can begin to see how the digital footprint of a model’s training data shapes the inferences it draws from that data. But for most people, including those in power, that remains speculation.
“Policymakers have repeatedly asked whether and how AI can and should be leveraged to protect national security, including in military decision-making. Especially with increasing public awareness of LLMs, these questions have become more frequent,” said Anka Reuel, a co-author of the study.
Reuel is a computer science Ph.D. student at Stanford University, where she has been involved in AI governance efforts for several years and leads the Technical AI Ethics chapter of the 2024 AI Index. The problem, she said, is that there has been no quantitative research to point these policymakers to, only qualitative work.
“Through our work, we wanted to provide an additional perspective and explore the impact of using LLMs in military and diplomatic decision-making,” Reuel told Salon. “Understanding the implications of such LLM applications is more important than ever, given that OpenAI recently changed its terms of service to no longer prohibit military and warfare use cases.”
Some of these findings aren’t surprising. AI models are designed to pick up on and multiply, or iterate on, the human biases patterned into their training data. But the models are not all the same, and the differences matter when weighing which of them could end up in U.S. lethal weapons systems.
To get a closer look at how these AI models behave before manufacturers layer on additional user safety rules, and at how better muzzles might be built for high-stakes applications, the team included a stripped-down model. According to the researchers, some of the models were far from ferocious, which gives co-author Gabriel Mukobi reason to hope these systems can be made even safer.
“Not all of them are obviously scary,” Mukobi told Salon. “For one, GPT-4 tends to look less dangerous than GPT-3.5 on most metrics. It’s not clear whether that’s because GPT-4 generally performs better, because a lot of effort went into fine-tuning against these issues, or because of other factors, but it may indicate that these conflict risks can be reduced through deliberate effort.”
Mukobi is a computer science master’s student who leads Stanford AI Alignment, a group working on some of the most pressing concerns about AI systems: ensuring they are built safely and share human values. He pointed to a bright spot in some of the research team’s simulations, where a few models managed to de-escalate conflicts, bucking the general trend of the results. But his hope remains cautious.
“While our results may suggest that the potential exists for AI systems to reduce tensions, they clearly do not do so by default,” he said.
These are the kinds of surprises co-author Juan-Pablo Rivera found interesting in the results. Rivera, a computational analysis master’s student at Georgia Tech, said he has watched the rise of autonomous systems in military operations through government contractors such as OpenAI, Palantir and Scale AI. He believes these kinds of frontier LLMs need more independent research that gives agencies stronger information to proactively spot potentially fatal failures.
“The OpenAI and Anthropic models behave quite differently,” Rivera said. “Understanding the different design choices OpenAI and Anthropic are making as they develop their AI systems raises many more questions, including about training data, training methods and model guardrails.”
Another mystery may hold surprises, too: What happens when these models scale? Some researchers believe that the larger the LLM, the safer and more nuanced its decision-making will become. Others doubt that trajectory will adequately address the risks. Even the paper’s authors disagree about whether these models might eventually do what we want them to do: make better decisions than humans.
Reuel said the question of when that day might come goes beyond the team’s research, but based on their findings and broader questions about LLMs, “we still have a long way to go to get there.”
“We might need a change in LLM architecture, or a completely new approach, to overcome some of the weaknesses inherent in LLMs. I don’t think simply scaling current models and training them on more data will solve the problems we’re seeing today,” she explained.
But for Mukobi, there is still reason to be hopeful and explore whether larger data pools will lead to unexpected improvements in AI inference capabilities.
“The interesting thing about AI is that things often change unexpectedly at scale,” Mukobi said. “As you move to larger models and larger datasets, the biases present in smaller models could be amplified and the situation could become even worse.”
“It’s also possible that models could get better, that larger models somehow have better reasoning abilities and are able to overcome their biases, even the biases of their human creators and operators,” he said. “I think that’s probably one of the hopes people have when they think about military and other strategic AI systems. It’s a hope worth pursuing and realizing.”
A glimmer of that hope appears in the team’s paper, which provides new evidence, and raises further questions, about whether scaling will temper AI behavior or send it sky-high. The team glimpsed this potential when working with the GPT-4-Base model.
“Basically all the results suggest that GPT-4 is much safer than GPT-3.5,” Mukobi said. “GPT-4 never actually chooses the nuclear option. At this point, we don’t know if that’s because GPT-4 is bigger than GPT-3.5 and some scale factor makes it more capable, or if OpenAI put in additional safety fine-tuning that somehow generalizes to keeping it safe in our domain as well. It’s very unclear.”
Between collaborative working groups and cutting-edge multi-university research teams, Mukobi spends his days uncovering problems that pose rapidly growing risks in the near future. But the human brain is not a computer, for better or worse, and topics like large-scale nuclear destruction can weigh heavily on a sensitive mind. Does Mukobi’s job give him nightmares?
“I sleep well because I’m always pretty tired,” he laughed.
He worries about the risks, but despite the gravity of the problem, his team’s new research gave him hope that there are “things that can be done to improve models to make them work better in these high-stakes scenarios.”