James Mickens is a computer scientist well-known for his hilarious conference presentations. I'll assume you've seen the classics but maybe not a certain 2018 speech of his titled:
Q: Why Do Keynote Speakers Keep Suggesting That Improving Security Is Possible?
A: Because Keynote Speakers Make Bad Life Decisions and Are Poor Role Models
which, despite the title, is actually a crash course in AI with a focus on ethics and real-world consequences. Plus security. The thesis is something like: it is a terrible idea to take a system whose inputs and logic you at best vaguely understand and put it in charge of important decisions. Further, if you're feeding it data from humans, both when you train it and when you use it for inference, then the inputs and logic can be compromised (for various definitions of compromised). Presented with the relentless malice of the internet, or, worse, the real physical world, the best-case version of the machine's learned logic may well not come to pass.
This really hit home for me as something industrial ML efforts are doing a terrible job with. It's quite concerning when you consider things like autonomous vehicles, but it also describes a baseline foolishness in "the rush to deploy AI", in Mickens' phrase.
The rush to deploy AI is related to the pressure to collect data. As you collect more data and apply increasingly automatic modeling methods and computing power, you can start vacuuming up lots of signal and producing some sort of predictive output. But the less you know about your data -- and knowing is not keeping pace with collection or modeling -- the less sure you can be that the industrial probability sensor you're wiring into your important decision process will actually end up agreeing with your values. We're typically using these things for social, human applications, not for modeling unchanging laws of thermodynamics. The legitimacy of generalizing from your training data to your application is nowhere near as clear.
Let me give some examples:
Suppose you run a credit card fraud detection model. You have lots of data about transactions, where and when and for what, as well as lots of data about your account holder. You have also pulled in data from your customer service systems, and are tracking the contacts customers are making with you, about what topics, how recently, etc.
Your decision trees find a little pattern: when suspicious transactions occur after a customer has contacted you about traveling, they are much less likely to actually be fraud. Perhaps your analysis of the model turns this up: you find the importance of the date_diff_contact_txn value under the contact=travel notice constraint. You're actually proud of this, since it proves that the extra data you brought in turned out to be useful!
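To make that concrete, here's a minimal sketch of the sort of feature-importance check that might surface this pattern. The data is synthetic, and every column name besides date_diff_contact_txn is a hypothetical stand-in; scikit-learn is used only because it's a convenient example, not because the original model looked like this.

```python
# Hypothetical sketch: synthetic data and stand-in column names.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "txn_amount": rng.exponential(80, n),
    "txn_hour": rng.integers(0, 24, n),
    "contact_travel_notice": rng.integers(0, 2, n),   # did the account file a travel notice?
    "date_diff_contact_txn": rng.integers(0, 60, n),  # days between last contact and transaction
})
# Synthetic label: large transactions are rarely fraud when a recent travel notice exists.
fraud = (df["txn_amount"] > 200) & ~((df["contact_travel_notice"] == 1) & (df["date_diff_contact_txn"] < 14))
y = fraud.astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(df, y)

# Global importances suggest the customer-contact features carry real signal.
for name, imp in sorted(zip(df.columns, model.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:25s} {imp:.3f}")
```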
But where did those important features really come from? You used the phone number from the customer contact records and looked it up against your customer account records. Perhaps the categorization is input by the caller in your phone support automation system.
If that were true, you'd have a sadly insecure fraud detection system! The villain could easily spoof the customer's phone number and file a fake travel notice between obtaining the stolen card and using it.
What has gone wrong is that you enlisted systems that normally have low-security applications as inputs into a high-security operation. The question to ask when integrating that extra peripheral data is: how could this be used against my model at inference time? Sure, the attack vector is small, but making a "kitchen sink" model with features from dozens of not-to-be-trusted systems and user-supplied inputs multiplies that vulnerability. Worse, it's typically treated as a 'modeling' problem and accommodated within often-permissive error-rate goals, rather than being identified as a security concern.
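One low-tech way to start asking that question is to tag every feature with the system it comes from and how much you trust that system at inference time. The sources, feature names, and trust labels below are entirely hypothetical; this is a sketch of the audit, not a real pipeline.

```python
# Hypothetical feature-provenance audit: which inputs can an attacker influence?
FEATURE_SOURCES = {
    "txn_amount":            "payment_network",   # hard for an attacker to falsify arbitrarily
    "merchant_category":     "payment_network",
    "date_diff_contact_txn": "ivr_self_service",  # derived from caller-supplied contact
    "contact_travel_notice": "ivr_self_service",  # caller-selected topic, easily spoofed
    "account_age_days":      "core_banking",
}

UNTRUSTED_SOURCES = {"ivr_self_service", "web_form", "client_device"}

untrusted = [f for f, src in FEATURE_SOURCES.items() if src in UNTRUSTED_SOURCES]
print("Features an adversary can influence at inference time:", untrusted)
```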
We're now seeing, across more and more applications, the power of adversarial networks: opposing learners that are trying to competitively fool each other. However, the framework for much commercial machine learning assumes a relatively consistent and obedient adversary. Rather than competing against an equal, we assume we are simply trying to outdo a static baseline.
In digital advertising, it's routine to trust your client to identify itself accurately and to answer truthfully about behavioral data that you trust it to store itself. We imagine that we, the data collector, sit at the center of a sort of panopticon, with clients unaware of each other's outcomes.
Suppose you run a digital 'retargeting' ad campaign, in which near-miss visitors who didn't purchase on your site are offered a deal that you wouldn't present to the public. Rather than provide a coupon code, you attach some information to the links in these ads. Maybe you never even planned the special deal, because you'd wired your pricing system up to an AI price discrimination system that maximized profit given each visitor's information.
But suppose that your visitors are all running the same browser extension, which inspects the inputs their browsers are being asked to provide you and watches the price outcomes. This system could learn the lowest valleys in your price discrimination space, and, looking only at your model's own metrics, you might never notice an issue.
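Here is a minimal sketch of that probing, with quote_price() as a hypothetical stand-in for the deployed pricing model; the signals and discounts are invented. The point is only that when the client controls every input, finding the cheapest corner of your pricing space is a small enumeration problem, not a sophisticated attack.

```python
# Hypothetical sketch: coordinated clients enumerate the signals they control
# and learn where the quoted price bottoms out.
import itertools

def quote_price(signals: dict) -> float:
    # Stand-in for the server-side price-discrimination model.
    base = 100.0
    if signals["visits"] >= 3 and not signals["purchased_before"]:
        base *= 0.7            # "near-miss" discount the model has learned
    if signals["referrer"] == "retargeting_ad":
        base *= 0.9
    return round(base, 2)

search_space = {
    "visits": [0, 1, 3, 10],
    "purchased_before": [False, True],
    "referrer": ["direct", "search", "retargeting_ad"],
}

# The clients control every one of these signals, so they can simply enumerate.
best = min(
    (dict(zip(search_space, combo)) for combo in itertools.product(*search_space.values())),
    key=quote_price,
)
print(best, quote_price(best))
```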
A common thread in thinking about your ML inputs in security terms is to assume that your adversaries have a copy of your model. If they can collect all the same data that you can -- since you're asking them to provide the inputs and then sending them the output -- they could very well catch up with you. Obscurity is not a sufficient security strategy for the real-world processes we are now thinking of turning over to ML.
An important but often neglected part of modeling is 'error analysis', where you actually look at the examples that your ML is getting wrong and try to understand what it's missing.
A counterpart for ML security might be 'induced error analysis', where you take a correct decision and then fuzz the input until the decision becomes wrong. Give a hearty fuzz to any input that might be user-supplied or easily faked. Give a fuzz to inputs that you're likely to get wrong in measurement. Look at the altered example when the model starts to get it wrong.
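A minimal sketch of what that loop might look like, with a toy stand-in model and hypothetical field names; the only real requirement is knowing which fields are fuzzable because a user controls them or a measurement could get them wrong.

```python
# Hypothetical sketch of induced error analysis: start from an example the model
# gets right, perturb only attacker-controlled or noisy fields, and watch for flips.
import numpy as np

def induced_errors(model, x, fuzzable_fields, n_trials=200, rng=None):
    """Return perturbed copies of x (a dict of features) that flip the prediction."""
    rng = rng or np.random.default_rng(0)
    original = model(x)
    flips = []
    for _ in range(n_trials):
        fuzzed = dict(x)
        for field, candidates in fuzzable_fields.items():
            fuzzed[field] = rng.choice(candidates)   # attacker-chosen or mismeasured value
        if model(fuzzed) != original:
            flips.append(fuzzed)
    return flips

# Toy stand-in model: flags large transactions unless a recent travel notice exists.
def toy_model(x):
    return int(x["txn_amount"] > 200 and not (x["contact_travel_notice"] and x["date_diff_contact_txn"] < 14))

example = {"txn_amount": 900.0, "contact_travel_notice": 0, "date_diff_contact_txn": 45}
flips = induced_errors(
    toy_model,
    example,
    fuzzable_fields={"contact_travel_notice": [0, 1], "date_diff_contact_txn": list(range(60))},
)
print(f"{len(flips)} of 200 fuzzes flipped a correct 'fraud' call to 'not fraud'")
```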
If you have an error budget, or a performance benchmark that's required to justify deploying the model, you might apply this fuzzing to your test set and count against yourself any observation that can be fuzzed into a wrong answer. You're only getting it right if the model is right across the entire user-controlled space around the input.
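Scored that way, the metric might look something like the hypothetical function below: an observation only counts as correct if the model stays right for every combination of the user-controlled fields, not just the one you happened to observe.

```python
# Hypothetical fuzz-adjusted accuracy: an example is correct only if the model is
# right everywhere in the user-controlled corner of feature space around it.
import itertools

def fuzz_adjusted_accuracy(model, test_set, labels, fuzzable_fields):
    correct = 0
    for x, y in zip(test_set, labels):
        if model(x) != y:
            continue                      # plainly wrong, no fuzzing needed
        # Exhaustively enumerate the user-controlled alterations of this example.
        robust = all(
            model({**x, **dict(zip(fuzzable_fields, combo))}) == y
            for combo in itertools.product(*fuzzable_fields.values())
        )
        correct += robust
    return correct / len(test_set)
```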
If you have only a few user-controlled fields, then perhaps your metrics haven't suffered much. But if you have many, you'll find that the possible alterations are too drastic, and you can't really go out there with that model.
Which goes back to Mickens' talk. One of the axioms he observed underlying the current AI craze is: "History has nothing interesting to teach us." Wouldn't something as simple and well known in computer security as fuzzing potentially have something to teach us after all?