Researchers from Microsoft and GitHub Inc. conducted an in-depth study of the challenges, opportunities, and needs associated with building an AI-powered product copilot. The study involved interviews with 26 professional software engineers from various companies responsible for developing these advanced tools.
The race to incorporate advanced AI features into products continues, with virtually every technology company looking to add these capabilities to its software. However, many issues remain. Coordinating multiple data sources and prompts increases the risk of failure, and LLMs are difficult to test because of the inherent variability of their outputs. Developers also struggle to keep up with best practices in this rapidly evolving field, often turning to social media and academic papers for guidance. Safety, privacy, and compliance are major concerns and require careful management to avoid potential harm or breaches.
A one-stop shop for integrating AI into projects remains a challenge. Developers want to get started quickly, move from playgrounds to MVPs, and connect different data sources and AI components to their prompts. “I’m looking for a place where I can do that, and my existing codebase will work efficiently,” writes Austin Henry. “Prompt linters could provide quick feedback, and developers could use libraries and ‘toolbox’ prompt snippets for common tasks. Additionally, tracking the impact of prompt changes would be extremely valuable,” Henry continued.
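The paper itself does not specify how such a prompt linter would work, but the idea can be illustrated with a minimal sketch: a set of rule-based checks that flag common prompt-writing problems before the prompt ever reaches a model. The rules below (length limit, unbalanced template braces, empty placeholders) are illustrative assumptions, not an established standard.

```python
import re

# Hypothetical lint rules: (regex pattern, warning message).
RULES = [
    (r"\{\{\s*\}\}", "empty template placeholder"),
]


def lint_prompt(prompt: str) -> list[str]:
    """Return a list of warnings for the given prompt text."""
    warnings = []
    # Very long prompts burn context-window tokens and are harder to debug.
    if len(prompt.split()) > 500:
        warnings.append("prompt is very long; consider trimming context")
    # A '{{' without a matching '}}' usually means a broken template.
    if "{{" in prompt and "}}" not in prompt:
        warnings.append("unbalanced template braces")
    for pattern, message in RULES:
        if re.search(pattern, prompt, flags=re.IGNORECASE):
            warnings.append(message)
    return warnings


print(lint_prompt("Summarize the article: {{ }}"))
print(lint_prompt("Summarize the article in one sentence."))
```

A real linter would plug into an editor or CI pipeline, which is exactly the kind of fast feedback loop Henry describes.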
One of the key challenges identified was prompt engineering, the process of crafting the prompts that shape a model’s responses. “These large language models are often very fragile in terms of responses, so there’s a lot of control and manipulation of behavior through prompts,” said one participant (P7). This unpredictability makes prompt engineering more of an art than a science, forcing developers into a time-consuming process of trial and error.
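That trial-and-error loop can be sketched as code: generate small variants of a prompt, send each to the model, and score the outputs against the desired behavior. Here `call_model` is a stub standing in for a real LLM endpoint, and the JSON-formatting goal is an invented example; both are assumptions for illustration only.

```python
def call_model(prompt: str) -> str:
    """Stub: a real system would call an LLM API here.

    The stub 'refuses' structured output unless the prompt asks for JSON,
    mimicking the brittleness participants describe.
    """
    return '{"answer": 42}' if "json" in prompt.lower() else "REFUSED"


def score(output: str) -> int:
    """Reward outputs that look like the desired JSON format."""
    return 1 if output.strip().startswith("{") else 0


variants = [
    "Answer the question.",
    "Answer the question. Respond in JSON.",
    "Answer the question as JSON with an 'answer' key.",
]

# Pick the variant whose output scores best.
best = max(variants, key=lambda p: score(call_model(p)))
print(best)
```

In practice the scoring step is the hard part; teams often replace `score` with human review or a second model acting as a judge.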
Another issue raised was testing. With generative models like LLMs, it is difficult to write assertions when every response can differ from the last; it is as if every test case were a flaky test. One participant explained that this is why they run each test 10 times (P1), and another added, “If you don’t have the right tools, experiments take the longest” (P12).
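P1’s run-it-10-times workaround amounts to replacing a single pass/fail check with a pass-rate threshold. The sketch below simulates a nondeterministic check with a seeded random stub (`flaky_check` is an assumption standing in for “did the model’s response satisfy the assertion?”), then requires a minimum fraction of the runs to pass.

```python
import random


def flaky_check(rng: random.Random) -> bool:
    """Stand-in for a nondeterministic LLM assertion.

    Simulates a check that passes about 90% of the time.
    """
    return rng.random() < 0.9


def repeated_test(runs: int = 10, min_pass_rate: float = 0.7,
                  seed: int = 0) -> bool:
    """Run the flaky check `runs` times; pass if enough runs succeed."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    passes = sum(flaky_check(rng) for _ in range(runs))
    return passes / runs >= min_pass_rate


print(repeated_test())
```

Choosing the threshold is itself a judgment call: too low and real regressions slip through, too high and the suite stays flaky, which is the tooling gap P12 points at.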
Additionally, participants expressed concerns about safety, privacy, and compliance when integrating AI into products. One asked, “Do we want this to affect real people? Is this going to be used in a nuclear power plant?”, highlighting the potential risks of deploying such technology without proper safety measures. Others described the effort of learning new skills and tools: “This is something completely new to us. We learn as we go. There is no particular path to doing it the right way.” (P1)
The study arrives as Microsoft recently launched a redesign and upgrade of its own Copilot. For example, all English-speaking Copilot users in the US, UK, Australia, India, and New Zealand can now edit images within the chat flow. The update followed reports of performance issues from multiple Microsoft Copilot Pro subscribers.
“I tested the experience on the latest Windows 11 Canary build, and it worked well for summarizing text I copied. The icon also animates when you copy an image, but that feature isn’t ready for testing yet.” – Tom Warren
Developers interested in learning more about this research can read the summary or the full article.