Microsoft this week released a Python tool that probes AI models to see if they can be hoodwinked by malicious input data.
And by that, we mean investigating whether, say, an airport’s object-recognition system can be fooled into thinking a gun is a hairbrush, or a bank’s machine-learning-based anti-fraud code can be made to approve dodgy transactions, or a web forum moderation bot can be tricked into allowing through banned hate speech.
The Windows giant’s tool, dubbed Counterfit, is available on GitHub under the MIT license, and is command-line controlled. Essentially, the script can be instructed to delve into a sizable toolbox of programs that automatically generate thousands of adversarial inputs for a given AI model under test. If the output from the model differs from what was expected from the input, then this is recorded as a successful attack.
For example, if a model is shown a slightly altered picture of a car and it predicts it’s a pedestrian, then that’s a win for Counterfit and a vulnerability in the model identified. The goal is to reveal any weak spots in the machine-learning system under test.
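To make that loop concrete, here is a minimal sketch of the idea in plain Python. It is not Counterfit's code or API: the `toy_model` classifier, the `probe` helper, and the 16-value "image" are all hypothetical stand-ins, and random noise is used in place of the real attack algorithms the tool draws on.

```python
import random

def toy_model(pixels):
    """Hypothetical stand-in classifier: labels an image by its mean brightness."""
    return "car" if sum(pixels) / len(pixels) >= 0.5 else "pedestrian"

def probe(model, sample, expected, trials=1000, epsilon=0.05, seed=0):
    """Apply small random perturbations and record every input that flips the label."""
    rng = random.Random(seed)
    successes = []
    for _ in range(trials):
        # Nudge each pixel by at most epsilon, keeping values inside [0, 1].
        candidate = [min(1.0, max(0.0, p + rng.uniform(-epsilon, epsilon)))
                     for p in sample]
        predicted = model(candidate)
        if predicted != expected:
            # The output differs from what was expected: log it as a successful attack.
            successes.append({"input": candidate, "predicted": predicted})
    return successes

# A sample the toy model classifies correctly, but only just.
sample = [0.51] * 16
hits = probe(toy_model, sample, expected="car")
print(f"{len(hits)} of 1000 perturbed inputs fooled the toy model")
```

Each entry in `hits` keeps the offending input alongside the wrong prediction, which is the kind of record a developer would want when working out where a model is brittle.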
You can test models you’ve trained yourself, or black-box models you’ve acquired from network edge devices, mobile applications, or academic projects. Inputs can be text – for probing, say, sentiment analysis systems – or images for computer-vision apps or audio for, say, transcription services.
For instance, Counterfit’s documentation and code include a tutorial involving a pretrained model taught to identify handwritten numbers from the MNIST database of scribbles. The tutorial shows you how to set up Counterfit to use the so-called Hop-Skip-Jump technique, implemented in the Adversarial Robustness Toolbox, to slightly modify a picture of the number 5 so that the model thinks it’s a 3.
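Hop-Skip-Jump is a decision-based attack, meaning it only needs the label a model returns rather than its internals. One of its building blocks is a binary search that slides a misclassified input back toward the original until it sits just over the decision boundary. The sketch below illustrates that single step only, not the full algorithm and not the Adversarial Robustness Toolbox implementation; `toy_digit_model` and `boundary_search` are hypothetical names invented for this illustration.

```python
def toy_digit_model(pixels):
    """Hypothetical stand-in for a digit classifier: 5 for bright images, 3 otherwise."""
    return 5 if sum(pixels) / len(pixels) >= 0.5 else 3

def blend(original, adversarial, fraction):
    """Mix two images; fraction 0.0 is the original, 1.0 is the adversarial one."""
    return [(1 - fraction) * o + fraction * a for o, a in zip(original, adversarial)]

def boundary_search(model, original, adversarial, tolerance=1e-3):
    """Find the blend closest to `original` that the model still mislabels,
    using nothing but the labels the model returns."""
    true_label = model(original)
    assert model(adversarial) != true_label, "starting point must be misclassified"
    low, high = 0.0, 1.0  # invariant: the blend at `high` is always misclassified
    while high - low > tolerance:
        mid = (low + high) / 2
        if model(blend(original, adversarial, mid)) != true_label:
            high = mid  # still fooled, so move closer to the original
        else:
            low = mid   # label is correct again, so back off
    return blend(original, adversarial, high)

original = [0.6] * 16  # the toy model reads this as a 5
start = [0.0] * 16     # an all-black image the toy model reads as a 3
result = boundary_search(toy_digit_model, original, start)
print(toy_digit_model(original), "->", toy_digit_model(result))
print("largest pixel change:", max(abs(o - r) for o, r in zip(original, result)))
```

The real technique goes further, alternating this search with an estimate of the boundary's direction so that each round shrinks the perturbation again.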
“This tool was born out of our own need to assess Microsoft’s AI systems for vulnerabilities with the goal of proactively securing AI services, in accordance with Microsoft’s responsible AI principles and Responsible AI Strategy in Engineering (RAISE) initiative,” Redmond’s Will Pearce and Ram Shankar Siva Kumar said in a blog post.
“Counterfit started as a corpus of attack scripts written specifically to target individual AI models, and then morphed into a generic automation tool to attack multiple AI systems at scale.”
Successful attempts to fool the model under test are logged so developers can inspect the inputs and see where they need to shore up their software – or so attackers and penetration testers can identify where and how to hit a program.
Microsoft said its own AI red team uses Counterfit to probe the tech giant’s algorithms in production, and is looking for ways to adapt the tool so it can automatically scan models for vulnerabilities before they’re deployed. ®