Just as AI tools like ChatGPT and Copilot have transformed the way people work in all kinds of roles around the world, they’ve also reshaped what’s known as “red teams.” These are groups of cybersecurity experts who think like hackers to help keep technology safe and secure.
Generative AI’s ability to communicate in multiple languages, write stories, and even create photorealistic images introduces new potential risks. These range from providing biased or inaccurate results to giving attackers new ways to stir up discord. These new risks have prompted a different and broader approach to how Microsoft’s AI Red Team needs to work to identify and mitigate potential harm.
“We believe that safety, responsible AI, and the broader notion of AI security are different sides of the same coin,” says Ram Shankar Siva Kumar, principal research leader at Microsoft. “It’s important to have a comprehensive view of all the risks of an AI system before it reaches a customer’s hands. This is an area that will have massive sociotechnical implications.”
The term “red teaming” was coined during the Cold War, when the U.S. Department of Defense conducted simulation exercises with “red teams” that played the Soviets and “blue teams” that played the U.S. and its allies. The cybersecurity community then adopted the language a few decades ago, creating red teams to act as adversaries trying to break, corrupt, or misuse technology—with the ultimate goal of finding and fixing potential damage before real problems arise.
When Siva Kumar formed Microsoft’s AI Red Team in 2019, he followed the traditional model of bringing together cybersecurity experts to proactively investigate weaknesses, much like the company already does with all its products and services.
At the same time, Forough Poursabzi was leading researchers across the company who were examining responsible AI through a different lens: whether generative technology could cause harm, either intentionally or because of systemic issues in models that were overlooked during training and evaluation. This was not something red teams had dealt with before.
The two groups quickly realized they would be stronger together and joined forces to create a broader red team that assesses security risks and social harms side by side, adding a neuroscientist, a linguist, a national security expert, and other specialists from a range of fields.
“We need a broad range of perspectives to do responsible AI red teaming well,” says Poursabzi, a senior program manager on Microsoft’s Aether team (AI, Ethics and Effects in Engineering and Research), which explores the broader responsible AI ecosystem at Microsoft and analyzes emerging risks and long-term considerations for generative AI technologies.
The dedicated AI Red Team, led by Siva Kumar, is separate from the teams building the technology. Its expanded scope covers adversaries who might try to force a system to hallucinate, as well as harmful, offensive, or biased outputs stemming from inadequate or inaccurate data.
Team members take on a variety of personas, from a creative teenager playing a prank to a known adversary trying to steal data, to uncover blind spots and hidden risks. They live around the world and collectively speak 17 languages, from Flemish to Mongolian to Telugu, which helps them address diverse cultural contexts and region-specific threats.
And they don’t just probe systems by hand: they also use large language models (LLMs) to run automated attacks against other LLMs.
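To make that pattern concrete, here is a minimal sketch of what LLM-vs-LLM probing can look like: an attacker model iteratively crafts prompts for a target model, and a classifier flags any responses that cross a line. Every name here (automated_probe, flags_harm, and so on) is an illustrative assumption, not Microsoft’s internal tooling and not the PyRIT API.

```python
# Minimal sketch of LLM-vs-LLM automated red teaming.
# All names are hypothetical stand-ins for exposition only.
from typing import Callable

LLM = Callable[[str], str]  # any function mapping a prompt to a completion


def automated_probe(attacker_llm: LLM, target_llm: LLM,
                    flags_harm: Callable[[str], bool],
                    objective: str, turns: int = 5) -> list[dict]:
    """Let an attacker model iteratively craft prompts that try to push a
    target model toward a stated objective, and record flagged responses."""
    findings = []
    last_response = ""
    for turn in range(turns):
        # Ask the attacker model for the next adversarial prompt,
        # conditioning on the target's previous reply.
        attack_prompt = attacker_llm(
            f"Objective: {objective}\n"
            f"Target's last reply: {last_response}\n"
            "Write the next prompt to send to the target."
        )
        last_response = target_llm(attack_prompt)
        if flags_harm(last_response):
            findings.append({"turn": turn,
                             "prompt": attack_prompt,
                             "response": last_response})
    return findings
```

The appeal of this kind of automation is scale: a handful of specialists can point many attacker instances at a system at once, then spend their time reviewing the flagged findings rather than writing every probe by hand.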
The group has also extended its reach by releasing open-source frameworks such as Counterfit and, earlier this year, the Python Risk Identification Toolkit for generative AI, or PyRIT. These are designed to help security professionals and machine learning engineers outside Microsoft map potential risks, and they help red team specialists, a limited resource, work more efficiently and productively. The team has also published best practices from its experience to help others get started.
Once Microsoft’s AI Red Team finds an issue, it sends it to the responsible AI measurement team, which assesses how much of a threat the issue poses. Other experts and internal groups then address the issue to complete the three-step approach to safe AI: mapping, measuring, and managing risks.
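One rough way to picture that hand-off, under the simplifying assumption that a finding just flows through three stages, is sketched below. The class and function names are hypothetical and chosen only for exposition; they do not reflect Microsoft’s actual tracking systems.

```python
# Illustrative sketch of the map -> measure -> manage flow described above.
# Names and fields are assumptions for exposition, not an internal schema.
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Stage(Enum):
    MAPPED = "mapped"      # red team has identified the issue
    MEASURED = "measured"  # measurement team has assessed the threat
    MANAGED = "managed"    # mitigation owners have addressed it


@dataclass
class Finding:
    description: str
    severity: Optional[float] = None   # filled in during measurement
    mitigation: Optional[str] = None   # filled in during management
    stage: Stage = Stage.MAPPED


def measure(finding: Finding, assess: Callable[[str], float]) -> Finding:
    """Measurement step: quantify how much of a threat the issue poses."""
    finding.severity = assess(finding.description)
    finding.stage = Stage.MEASURED
    return finding


def manage(finding: Finding, mitigate: Callable[[Finding], str]) -> Finding:
    """Management step: record the mitigation applied by product owners."""
    finding.mitigation = mitigate(finding)
    finding.stage = Stage.MANAGED
    return finding
```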
“Our practice covers a wide range of harms that we probe for,” says Siva Kumar. “We adapt and retool quickly, and that has been the recipe for our success—not waiting for the forces of change to mount, but anticipating them.”
Learn more about Microsoft's Responsible AI Work.
This post is part of Microsoft’s Building AI Responsibly series, which explores key concerns around AI deployment and how the company is addressing them with its responsible AI practices and tools.