Eric Rescorla wrote recently about three people who must have lots of trouble getting their email through spam filters: Jose Viagra, Julia Cialis, and Josh Ambien. I feel especially sorry for poor Jose, who through no fault of his own must get nothing but smirks whenever he says his name.
Anyway, this reminded me of an interesting problem with Bayesian spam filters: they’re trained by the bad guys.
[Background: A Bayesian spam filter uses human advice to learn how to recognize spam. A human classifies messages into spam and non-spam. The Bayesian filter assigns a score to each word, depending on how often that word appears in spam vs. non-spam messages. Newly arrived messages are then classified based on the scores of the words they contain. Words used mostly in spam, such as “Viagra”, get negative scores, so messages containing them tend to get classified as spam. Which is good, unless your name is Jose Viagra.]
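To make the scoring concrete, here is a minimal sketch in Python of how such a filter might work. This is an illustration, not any real filter’s implementation; the Laplace smoothing and the sign convention (negative = spammy, matching the description above) are my own choices.

```python
import math
from collections import Counter

def word_scores(spam_msgs, ham_msgs):
    """Score each word: negative means spam-indicating, positive means
    legitimate-indicating, matching the sign convention used above."""
    spam_counts = Counter(w for m in spam_msgs for w in m.lower().split())
    ham_counts = Counter(w for m in ham_msgs for w in m.lower().split())
    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    scores = {}
    for word in vocab:
        # Laplace (add-one) smoothing keeps unseen words from zeroing out.
        p_spam = (spam_counts[word] + 1) / (spam_total + len(vocab))
        p_ham = (ham_counts[word] + 1) / (ham_total + len(vocab))
        # Log-likelihood ratio: below zero when the word is spammier.
        scores[word] = math.log(p_ham / p_spam)
    return scores

def classify(message, scores):
    """Sum the scores of a message's words; a negative total means spam."""
    total = sum(scores.get(w, 0.0) for w in message.lower().split())
    return "spam" if total < 0 else "not spam"
```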
Many spammers have taken to lacing their messages with sections of “word salad” containing meaningless strings of innocuous-looking words, in the hope that the innocuous words’ positive scores will outweigh the spammy words’ negative ones and slip the message past the recipient’s Bayesian filter.
Now suppose a big spammer wanted to poison a particular word, so that messages containing that word would be (mis)classified as spam. The spammer could sprinkle the target word throughout the word salad in his outgoing spam messages. When users classified those messages as spam, the targeted word would develop a negative score in the users’ Bayesian spam filters. Later, messages with the targeted word would likely be mistaken for spam.
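The poisoning is easy to simulate with the sketch above. The messages are made up, and the target word (“fahrenheit”, anticipating the example below) is hypothetical:

```python
# A spammer laces every outgoing spam message with the target word.
spam = ["buy cheap pills now fahrenheit",
        "cheap pills fast delivery fahrenheit"] * 50
ham = ["lunch meeting tomorrow at noon",
       "draft of the report attached"] * 50

# The victim dutifully trains the filter on the spammer's messages.
scores = word_scores(spam, ham)
print(round(scores["fahrenheit"], 2))  # about -4.6: strongly spam-indicating

# An innocent message mentioning the word is now misclassified.
print(classify("fahrenheit opens in theaters friday", scores))  # "spam"
```

The key point is that the victim does everything right: every message marked as spam really was spam. The filter still learns exactly what the spammer wanted it to learn.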
This attack could even be carried out against a particular targeted user. By feeding that user a steady diet of spam (or pseudo-spam) containing the target word, a malicious person could build up a highly negative score for that word in the targeted user’s filter.
Of course, this won’t work, or will be less effective, for words that have appeared frequently in a user’s legitimate messages in the past. But it might work for a word that is about to become more frequent, such as the name of a person in the news, or a political party. For example, somebody could have tried to poison “Fahrenheit” just before Michael Moore’s movie was released, or “Whitewater” in the early days of the Clinton administration.
There is a general lesson here about the use of learning methods in security. Learning is attractive because it can adapt to the bad guys’ behavior. But when the bad guys supply part of the training data, they get to teach the system how to behave, and that can be a serious drawback.