Since ChatGPT’s release in the fall of 2022, everyone has been trying their hand at prompt engineering: finding clever ways to phrase queries to large language models (LLMs) or AI art and video generators to get the best results, or to sidestep their protections. The Internet is full of prompt-engineering guides, cheat sheets, and advice threads to help you get the most out of an LLM.
In the commercial sector, companies are now using LLMs to build product copilots, automate tedious work, create personal assistants, and more, says Austin Henley, a former Microsoft employee who conducted a series of interviews with people developing LLM-powered copilots. “Every company is trying to use it for just about every use case you can imagine,” Henley says.
“The only real trend may be the absence of a trend. What works best for a particular model, dataset, and prompting strategy may be specific to the particular combination at hand.” —Rick Battle and Teja Gollapudi, VMware
To help with these efforts, many companies have sought out professional prompt engineers.
But new research suggests that prompt engineering is best done by the model itself, not by a human engineer. This casts doubt on the future of prompt engineering, at least as the field is currently imagined, and raises the suspicion that a fair portion of prompt-engineering jobs may be a passing fad.
Autotuned prompts are strange but effective
Rick Battle and Teja Gollapudi of the California-based cloud-computing company VMware were perplexed by how finicky and unpredictable LLM performance was in response to odd prompting techniques. For example, asking a model to explain its reasoning step by step (a technique called chain of thought) has been shown to improve its performance on a range of math and logic problems. Even weirder, Battle found that giving a model positive prompts such as “This is going to be fun” or “You are as smart as ChatGPT” sometimes improved performance.
Battle and Gollapudi decided to systematically test how different prompt-engineering strategies affected an LLM’s ability to solve grade-school math problems. They tested three different open-source language models with 60 different prompt combinations each. What they found was a surprising lack of consistency: chain-of-thought prompting sometimes helped performance and other times hurt it. “The only real trend may be the absence of a trend,” they write. “What works best for a particular model, dataset, and prompting strategy may be specific to the particular combination at hand.”
According to one research team, humans should never manually optimize prompts again.
There is an alternative to the trial-and-error style of prompt engineering that yielded such inconsistent results: ask the language model to devise its own optimal prompt. Recently, new tools have been developed to automate this process. Given a few examples and a quantitative success metric, these tools iteratively find the optimal phrase to feed into the LLM. Battle and his collaborators found that in almost every case, the automatically generated prompt did better than the best prompt found through trial and error. The process was also much faster, taking a couple of hours rather than several days of searching.
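The core loop behind such tools is straightforward to sketch: generate candidate prompts, score each against a small labeled dataset with a quantitative metric, and keep the best. The Python below is a minimal illustration of that idea, not any particular tool’s implementation; the `llm_answer` stub and the candidate prompts are invented stand-ins for a real model API and a real search space.

```python
import random

# Toy stand-in for a real LLM call. Here the "model" simply answers more
# accurately when the prompt mentions step-by-step reasoning, giving the
# search something to find. A real tool would query an actual model API.
def llm_answer(prompt: str, question: str, answer: int) -> int:
    accuracy = 0.9 if "step by step" in prompt else 0.5
    return answer if random.random() < accuracy else answer + 1

def score(prompt: str, dataset) -> float:
    """Quantitative success metric: fraction of questions answered correctly."""
    correct = sum(llm_answer(prompt, q, a) == a for q, a in dataset)
    return correct / len(dataset)

def optimize_prompt(candidates, dataset):
    """Keep the candidate prompt with the best score -- the selection step of
    automatic prompt search (LLM-driven candidate generation is omitted)."""
    return max(candidates, key=lambda p: score(p, dataset))

dataset = [(f"What is {n} + {n}?", 2 * n) for n in range(50)]
candidates = [
    "Answer the question.",
    "Think step by step, then answer.",  # chain-of-thought phrasing
    "You are as smart as ChatGPT.",      # a 'positive' prompt
]
random.seed(0)
best = optimize_prompt(candidates, dataset)
print(best)
```

In a real system, the candidate list would itself be produced by a language model and refined over several rounds, but the scoring-and-selection skeleton stays the same.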
The optimal prompts the algorithm spat out were so bizarre that no human would likely have come up with them. “Some of the stuff it generated was literally unbelievable,” Battle says. In one case, the prompt was essentially an extended Star Trek reference: “Commander, plot a course through this turbulence and locate the source of the anomaly. Use all available data and expertise to guide us through this challenging situation.” Apparently, thinking it was Captain Kirk helped this particular LLM do better on grade-school math problems.
Battle says that, given that a language model is just that, a model, it makes fundamental sense to optimize prompts algorithmically. “A lot of people anthropomorphize these things because they ‘speak English.’ No, they don’t,” Battle says. “They don’t speak English. They do a lot of math.”
In fact, based on his team’s results, Battle says humans should never manually optimize prompts again.
“You’re just sitting there trying to figure out the magic combination of words that will get you the best possible performance for your task,” Battle says. “Instead, just develop a scoring metric so the system itself can determine whether one prompt is better than another, and then let the model optimize itself.”
Autotuned prompts make better pictures
Image-generation algorithms can benefit from automatically generated prompts as well. Recently, a team at Intel Labs led by Vasudev Lal set out on a similar quest to optimize prompts for the image-generation model Stable Diffusion. “The need for expert prompt engineering seems more like a bug than a feature of LLMs and diffusion models,” Lal says. “So we wanted to see if we could automate this kind of prompt engineering.”
“Now we have this full machinery, the full loop, complete with reinforcement learning. …This is why we are able to outperform human prompt engineering.” —Vasudev Lal, Intel Labs
Lal’s team created a tool called NeuroPrompts that takes a simple input prompt, such as “boy on a horse,” and automatically enhances it to produce a better picture. To do this, they started with a range of prompts generated by human prompt-engineering experts. They then trained a language model to transform simple prompts into these expert-level prompts. On top of that, they used reinforcement learning to optimize the prompts to produce more aesthetically pleasing images, as judged by yet another machine-learning model, PickScore, a recently developed image-evaluation tool.
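The selection half of such a pipeline can be sketched in a few lines. Everything below is a toy stand-in: `aesthetic_score` substitutes a crude word-variety heuristic for a learned judge like PickScore, and the enumerated candidate list replaces the trained language model that would actually generate expert-style variants.

```python
# Stand-in aesthetic scorer. The real pipeline uses a learned model
# (PickScore); this toy heuristic just rewards more varied, specific prompts.
def aesthetic_score(prompt: str) -> float:
    return len(set(prompt.split()))

STYLES = ["impressionist", "photorealistic"]
MODIFIERS = ["golden hour lighting", "highly detailed", "4k"]

def enhance(base_prompt: str) -> str:
    """Expand a simple prompt with style/modifier combinations and keep the
    highest-scoring variant. (The learned policy that generates candidates,
    and the RL training loop, are omitted from this sketch.)"""
    candidates = [
        f"{base_prompt}, {style}, {', '.join(MODIFIERS[:r])}"
        for style in STYLES
        for r in range(1, len(MODIFIERS) + 1)
    ]
    return max(candidates, key=aesthetic_score)

out = enhance("a boy on a horse")
print(out)
```

Swapping the heuristic for a real scoring model, and the fixed modifier list for a generator fine-tuned on expert prompts, is what turns this skeleton into the kind of full loop Lal describes.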
Here again, the automatically generated prompts did better than the expert prompts used as a starting point, at least according to the PickScore metric. Lal found this unsurprising. “Humans have no choice but to use trial and error,” Lal says. “But now we have this full machinery, the full loop, complete with reinforcement learning. … This is why we are able to outperform human prompt engineering.”
Since aesthetic quality is notoriously subjective, Lal and his team wanted to give users some control over how their prompts were optimized. The tool lets users specify not only the original prompt (for example, “boy on a horse”) but also an artist, style, format, and other modifiers to emulate.
Lal believes that as generative AI models evolve, be they image generators or large language models, the strange quirks of prompt dependence should go away. “I think it’s important to explore these kinds of optimizations and ultimately incorporate them into the base model itself, so that you don’t really need a complicated prompt-engineering step.”
Prompt engineering will survive, by some name
Tim Cramer, Red Hat’s senior vice president of software engineering, says that even if autotuning prompts becomes the industry norm, prompt-engineering jobs in some form will not go away. Adapting generative AI to industry needs is a complicated, multistage endeavor that will continue to require human involvement for some time to come.
“Today we might call them prompt engineers. But as AI models continue to change, I think the nature of the interaction will continue to change as well.” —Vasudev Lal, Intel Labs
“I think there are going to be prompt engineers and data scientists for quite some time,” Cramer says. “It’s not just about asking an LLM a question and seeing whether the answer is good. There are a lot of things that prompt engineers really need to be able to do.”
“It’s very easy to prototype,” Henley says. “It’s very hard to commercialize.” Prompt engineering seems like a big piece of the puzzle when building a prototype, but many other considerations come into play when making a commercial-grade product, Henley says.
The challenges of making a commercial product include ensuring reliability (for example, failing gracefully when the model goes offline); adapting the model’s output to the appropriate format, since many use cases require outputs other than text; testing to make sure the AI assistant won’t do anything harmful in even a small number of cases; and ensuring safety, privacy, and compliance. Testing and compliance are particularly difficult, Henley says, because traditional software-development testing strategies are ill suited to nondeterministic LLMs.
To perform these myriad tasks, many large companies are advertising a new role: large language model operations, or LLMOps. The LLMOps life cycle includes prompt engineering, but also all the other tasks needed to deploy a product. Henley says that LLMOps’ predecessors, machine learning operations (MLOps) engineers, are best positioned to take on these jobs.
Whether the job title becomes “prompt engineer,” “LLMOps engineer,” or something entirely new, the nature of the work will continue to evolve quickly. “Today we might call them prompt engineers, but as AI models continue to change, I think the nature of the interaction will continue to change as well,” Lal says.
“I don’t know if we’re going to combine this with another type of position or function,” Cramer says. “But I don’t think these things are going away anytime soon. The landscape is just so crazy right now; everything’s changing so much. We’re not going to figure it all out in a few months.”
At this early stage of the field, Henley says, the only overriding rule seems to be the absence of rules. “It’s the wild west right now,” he says.