Microsoft Researching Anti-Spam Technique
Tim C writes "Microsoft's Research group are working on a technique to combat spam. Dubbed the 'Penny Black project', it involves making email senders perform a computation taking around 10 seconds, which their recipients can then check for. This delay would limit bulk emailing speeds to around 8000 a day, meaning that to spam all of those 'fresh, guaranteed 25 million addresses' would take approximately 8.5 years." We've reported on this before.
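The arithmetic in the summary is easy to check. A quick sketch follows: the 10-second cost, the 8000-a-day figure, and the 25 million addresses come from the summary, while the function name and the 365.25-day year are just assumptions for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def years_to_send(addresses, per_day):
    """Years needed to mail `addresses` recipients at `per_day` messages a day."""
    return addresses / per_day / 365.25


# One machine, one 10-second computation per message, running nonstop.
max_per_day = SECONDS_PER_DAY // 10

print(max_per_day)                                        # 8640
print(round(years_to_send(25_000_000, 8000), 1))          # 8.6
print(round(years_to_send(25_000_000, max_per_day), 1))   # 7.9
```

So "around 8000 a day" is the 8640 ceiling rounded down, and the years figure only holds for a spammer with a single machine.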
I RTFA, but what exactly is it? (Score:4, Interesting)
Involves calculating hashes (Score:5, Interesting)
Re:Question... (Score:3, Interesting)
You really want to email me [or get priority over other emails] you will do as I say.
Of course you can get to the point where it's too much hassle. I think MSFT is seeking to have this built into OE [e.g. integrated]
Tom
Why not charge per message? (Score:2, Interesting)
Re:not a solution (Score:2, Interesting)
The idea though is that you can automate the process. E.g. unless the email has a tag on it that's valid you delete/filter the message.
Tom
Comment removed (Score:5, Interesting)
GPU's? (Score:2, Interesting)
But they soon realised it was better to use memory latency - the time it takes for the computer's processor to get information from its memory chip - than CPU power.
Don't GPUs have much lower memory latency?
Hmm, what's this?
BrookGPU: General Purpose Programming on GPUs [slashdot.org]
Re:Oh yeah they invented this... (Score:3, Interesting)
This memory-bound one doesn't have such a nice reduction but it's conjectured to be similar.
So you can't "fake the method". Sure they could put a fake header in there, e.g.
X-MBHC: BLAH
But the verifier could trivially see it was faked.
Tom
Re:not a solution (Score:3, Interesting)
The second point that I have is that the whining is interesting, and this is a big part of the problem. We, the lazy users, will absolutely have to get used to taking some sort of action ourselves as part of whatever the SPAM solution turns out to be. Right now we like the very low barrier to entry into the e-mail community, but that is exactly what makes SPAM possible.
I have taken a couple of very small steps in the direction of participation in the solution. I decided to start signing all of my e-mail with my PGP signature. It is ignored by many and it confuses many, and it probably makes some roll their eyes (it's quite a geek fashion statement). But it damn sure identifies the message as one that I wrote, and it (sort of, except without a CA) identifies me as a person and not a spammer. I feel that PGP signatures might very well be a part of the SPAM solution. Everybody could sign all of their e-mail, which is getting easier for non-geeks every day, and we could all start rejecting e-mail that is not signed. We could even all get real keys from real CAs and reject all mail from users that have not been independently verified. Send whatever you want in your e-mail, even Viagra ads, but make sure I can trace it back to YOU.
The second step I have taken is to install and use SpamAssassin on my mail server. It's something that is making the situation more tolerable, although it's still costing me a little in terms of bandwidth of the messages I never see and don't want to see being sent to my server. It also minimizes the impact of SPAM on me, which could be a bad thing because my SPAM problem is actually bigger than I regularly realize. But my point is that it required some effort on my part. It wasn't enough for me to bitch about SPAM. I had to take an action.
SPAM is more like terrorism (bear with me) than is initially obvious. Do you check under your car for a bomb before you get in? Neither do I. But I did when I lived in a place where car bombs against my demographic were a reality. I altered my behavior to counter the threat. I could have said, "I shouldn't have to check under my car," but instead I got down on the ground and took a look. I could also say, "Airport security is an inconvenience, " or "Do I look like a terrorist?" or "SPAM should just go away or be 'fixed' by the government or somebody like Microsoft, but not in a way that I have to participate." But the problem is here and it's staring us in the face. We must change our behavior in order to fix the problem. Once we're all on board with the fact that we are all a part of the solution, we can be free of it.
This MS Research stuff is all very interesting, and all ideas are welcome at the table of solutions, but the neat thing is that the technology to remove SPAM from our lives already exists. But it's a little strange and uncomfortable. It would be great if we could all pull together on some sort of e-mail signing solution and work together to get the word out to the world that we can take our e-mail system back.
First, though, we have to get over the fact that we MUST change our assumptions and we must raise the barrier to entry -- not much, but some.
Finally, I'm sure I probably misunderstood the spirit of your reply. It got me started on a vent, and that's not a bad thing.
RP
Re:what's your point? (Score:4, Interesting)
On the other hand, IBM Research has done pretty well, though it too has gone through hard times. Its contributions to open-source are substantial, and at the same time, it's much more in touch with the demands of the company.
Now, if someone had beaten me to it and moderated my parent as flamebait perhaps I'd have kept quiet....
Re:Oh yeah they invented this... (Score:3, Interesting)
However, 8MB of what essentially amounts to cache is expensive. This means that for a spammer to spam in volume, they now have to buy a $20,000 CPU.
The catch, though, is that in the original HC, to slow spammers down you also have to slow down the lower-end users.
MSFT research realized that if you make the memory bus the major limitation you can level most desktops. E.g. a P4-3000 is only 4 times faster than a P2-233 in terms of tag generation.
RAM is relatively cheap [even in older desktops] so you can step this up to [say] 32MB buffers. They will only be required to send an email but will totally prevent "zero-wait state 32MB cells" since they would cost a shit load of money.
Of course this makes the system useless for portables since they often have little memory to spare. At the conference the speakers suggested that the ISP would then generate tags [at a cost] for the users.
Tom
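A back-of-envelope sketch of why the 4x figure matters. The 4x speedup and the two machines are from the post above; tuning the tag to cost 10 seconds on the slow machine, and using the raw clock ratio (3000/233) as a stand-in for a CPU-bound puzzle, are assumptions. Real CPU-bound gaps depend on far more than clock speed.

```python
SECONDS_PER_DAY = 86_400


def tags_per_day(slow_machine_seconds, speedup):
    """Tags per day for a machine `speedup` times faster than the slow baseline."""
    return SECONDS_PER_DAY * speedup / slow_machine_seconds


# Assumption: the puzzle is tuned to cost 10 seconds on the P2-233.
baseline = tags_per_day(10, 1)
memory_bound = tags_per_day(10, 4)           # the 4x figure quoted above
cpu_bound = tags_per_day(10, 3000 / 233)     # clock ratio only, a rough stand-in

print(round(baseline), round(memory_bound), round(cpu_bound))
```

Under these assumptions the fast box gets roughly 35 thousand tags a day with the memory-bound puzzle versus over 110 thousand with a clock-bound one, which is the "leveling" effect in numbers.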
Why not just.... (Score:5, Interesting)
Re:Oh yeah they invented this... (Score:5, Interesting)
I believe you 100%, only Microsoft would come up with a solution that artificially induces inefficiency.
I'm no fan of Microsoft, but this is silly. Lots of security tools "artificially induce" inefficiency. One relatively early example that comes to mind is Unix crypt, the function originally used to hash passwords. It runs a DES-like algorithm many times to produce its results, not because that improves the quality of the hashing, but because it takes longer, which makes brute force attacks harder. The Unix login program also deliberately introduces an artificial delay after every failed login attempt, and it's not to give you time to remember your password.
There are many instances in which slowing down legitimate users a little is an effective mechanism for deterring abuse.
That said, I still think this particular idea is stupid, since there are plenty of people who have a legitimate reason to send large volumes of e-mail, and this would cause them more pain than it would cause spammers.
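For anyone who hasn't seen deliberate slowness in a hash, here is a minimal sketch of the same idea with a modern primitive. This is not Unix crypt; it uses PBKDF2 from Python's standard library as an analogue, and the iteration count and function names are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; raise it as hardware gets faster


def hash_password(password: str, salt: bytes) -> bytes:
    """Stretch the password by running HMAC-SHA256 many times (PBKDF2)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def check_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the stretched hash and compare in constant time."""
    return hmac.compare_digest(hash_password(password, salt), expected)


salt = os.urandom(16)
stored = hash_password("correct horse", salt)
print(check_password("correct horse", salt, stored))  # True
print(check_password("wrong guess", salt, stored))    # False
```

A legitimate login pays the cost once; someone guessing passwords pays it on every single guess.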
Re:Mailing list operators do use their own compute (Score:3, Interesting)
Well, maybe. There still could be a white list for cases like this.
I think that high volume mailing lists should probably actually be newsgroups anyway. But what it does do is put a crimp in people who host a lot of low volume mailing lists.
Re:what's your point? (Score:5, Interesting)
And my point is that your comment is both insulting to MSR and misses the point.
Your comment is insulting to MSR because anybody who knows anything about CS research knows that MSR has top people. They have produced hundreds of first tier journal publications over the years. This is just a minor publication among many good things MSR has done.
It's meaningless because you are missing the main problem that all industrial research labs share: making the connection between research and products. MSR has been as unsuccessful at that as any other of the big industrial computer research labs before. Microsoft's problem is the quality and lack of innovation in their products, not their research labs.
mod parent offtopic.
I suppose when your points are weak, you have to fall back on calling on moderators. Why don't you engage your brain instead of falling back on such underhanded tactics?
Why bother with the computation? (Score:3, Interesting)
So why bother with all the computation and hashing, and just refuse to accept connections from a given IP except every 10 seconds? So if an email was sent from AAA.BBB.CCC.DDD at 00:00.00, don't accept another from that IP until 00:00.10.
This makes it happen entirely at the recipient server side, so you're not breaking SMTP, and it's backwards compatible with everyone else.
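A minimal sketch of that per-IP rule, assuming the server passes in the connecting address and a timestamp in seconds. The class and method names are hypothetical, and the addresses are documentation ranges.

```python
class PerIpThrottle:
    """Accept at most one message per IP address every `min_interval` seconds."""

    def __init__(self, min_interval: float = 10.0):
        self.min_interval = min_interval
        self._last_accepted = {}  # ip -> timestamp of the last accepted message

    def allow(self, ip: str, now: float) -> bool:
        last = self._last_accepted.get(ip)
        if last is not None and now - last < self.min_interval:
            return False  # too soon; a real server might answer with a temporary failure
        self._last_accepted[ip] = now
        return True


throttle = PerIpThrottle()
print(throttle.allow("192.0.2.1", now=0.0))    # True
print(throttle.allow("192.0.2.1", now=5.0))    # False
print(throttle.allow("192.0.2.1", now=10.0))   # True
print(throttle.allow("192.0.2.2", now=5.0))    # True
```

Note the last line: a second address is not slowed down at all, so the limit is per source, not per sender.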
On the other hand, if it's 10 sec per email, it doesn't sound like this would be feasible to implement:
Re:This not only isn't going to work, it's a disas (Score:2, Interesting)
Micro$oft's proposal has several issues. First, the proposal itself:
"If I don't know you, I have to prove to you that I have spent a little bit of time in resources to send you that e-mail."
This shifts the effort to convincing the system that I know you, so that we can bypass all of this. Microsoft's track record tells me that this will be accomplished quickly (likely before the software even reaches final release.)
"...use memory latency
No, it relies on bus speeds and memory speeds, not to mention caching schemes. These change almost as rapidly as processor speeds these days.
All of that is meaningless when you look at the greater problem:
"For this scheme to work, it would want to be something all mail agents would want to do,"
There are 2 ways to implement such a solution; on the server side and on the client. As for the server:
Not just want to do but be able to do. Since SMTP servers began requiring authentication (several years ago), most spammers have turned to using old servers still alive on the net. These would not have new schemes implemented. Shutting them out unless they update would kill several servers (including several universities).
As for the client:
Anyone who can say "HELO" can send a mail (see RFC 821, RFC 1123, RFC 2821). This means that any decent coder can write a mail SMTP client in about 30 minutes. We will never be able to assume all spammers are using any e-mail client.
"It is certainly not going to stop all spam for good"
And in the aftermath, we will all have slowed our systems with no effect on spam levels.
Re:Question... (Score:3, Interesting)
How do you know if the key is valid?
Why can't a spammer just make up a false key? Does the client check it mathematically? How long does that take? Why not just delete the spam manually (like we all do now) if it's still going to take time to filter it out?
LK
Re:Spammers don't use their own computers (Score:2, Interesting)
Kleedrac
Re:Mailing list operators do use their own compute (Score:3, Interesting)
I think that high volume mailing lists should probably actually be newsgroups anyway. But what it does do is put a crimp in people who host a lot of low volume mailing lists.
As somebody who hosts low-volume mailing lists, I have to agree.
Whitelists are nifty (we use them extensively), but what worries me on that score is that if they become frequent, I suspect we'll just see spammers hijacking address books along with machines, and forging "trusted" From lines.
Re:not a solution (Score:5, Interesting)
First, set up a whitelist and make it your first spam check. On the whitelist? The email goes through without being checked against any other spam criteria. (Mailing lists should be accepted here.)
For mail that doesn't pass the whitelist check, we can check for the header created by the MS program. We verify that the computationally intense header is correct, and maybe we let that mail through if we want, or maybe I let emails with this tag pass through my spam checker with a higher spam-score cutoff.
If we decide to accept mail with the header, we then check the remaining email with a very thorough spam checker and use a very low score cutoff.
No matter how many computers they have, it will lower the number of emails that are able to be sent, if people filter on this criterion.
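The three checks can be sketched as one function. Everything here is hypothetical: the function name, the threshold values, and the idea that a spam checker hands back a numeric score where higher means spammier. Reading "higher score" for stamped mail as a more lenient cutoff is an interpretation of the post, not something it spells out.

```python
def classify(sender, has_valid_stamp, spam_score, whitelist,
             stamped_threshold=8.0, unstamped_threshold=3.0):
    """Return 'accept' or 'reject' following the three checks described above."""
    if sender in whitelist:
        return "accept"  # step 1: whitelisted mail skips every other check
    # steps 2 and 3: stamped mail gets a lenient cutoff, everything else a strict one
    threshold = stamped_threshold if has_valid_stamp else unstamped_threshold
    return "accept" if spam_score < threshold else "reject"


whitelist = {"list@example.org"}
print(classify("list@example.org", False, 9.5, whitelist))      # accept
print(classify("stranger@example.com", True, 5.0, whitelist))   # accept
print(classify("stranger@example.com", False, 5.0, whitelist))  # reject
```

The same borderline message (score 5.0) gets through with a valid stamp and is dropped without one, which is the whole incentive to do the computation.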
Re:Why bother with the computation? (Score:3, Interesting)
The idea is not to take longer sending one email. Spammers don't send spam one at a time and wait for the first one to be finished before sending the second one. The idea is to force the spammer to spend something, specifically in this case 10-20 seconds of CPU time, per message. If all you are doing is sleeping 10 seconds, the spammers can out multithread you and just wait, while making 10000 other SMTP connections in parallel doing the same thing. The rate of messages will ultimately be the same but it will just take 10 seconds longer for the rate to reach the peak. Imagine what work the spammer's spam engine is doing while 10000 victims are sleeping for 10 seconds ... nothing at all ... then as soon as those sleeps are done, the spam flows. The spammer just has to raise the number of concurrent connections that are done. RAM is cheap.
Your proposal would affect how many spams you get from that one spammer, but not how much total the spammer can get through. If you get more than 8640 spams per day from the same one IP address, then your proposal will be effective. But many spammers have 1000 servers, and some have 1000000 or more cracked Windows machines at their disposal. Even the crypto idea is weak against the latter situation.
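The difference between waiting and working comes down to a couple of divisions. A rough sketch, using the 10000 connections, 10 seconds, and 1000000 machines mentioned above; the function names and the one-core-per-message model are simplifying assumptions.

```python
def rate_with_sleep(concurrent_connections, delay_seconds):
    """Messages per second when the delay costs the sender nothing but waiting."""
    return concurrent_connections / delay_seconds


def rate_with_work(cpu_cores, work_seconds):
    """Messages per second when each message ties up one core for `work_seconds`."""
    return cpu_cores / work_seconds


print(rate_with_sleep(10_000, 10))              # 1000.0
print(rate_with_work(1, 10))                    # 0.1
print(rate_with_work(1_000_000, 10) * 86_400)   # 8640000000.0
```

One box sleeping on 10000 sockets still pushes 1000 messages a second, while real work caps it at one message per 10 seconds per core. The last line is the botnet case: a million machines doing the work honestly can still mint billions of stamps a day.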
Re:Proposed "Sender do Something" technique. (Score:3, Interesting)
As a matter of policy, I do not respond to whitelisting requests because the sender of the whitelisting request has already accused me, with zero basis in fact, of being a spammer...
If you got a whitelisting request from him, it would have been because your message looks like spam. That is not a zero basis in fact from his perspective.
In fact it would be because you did something in your email to total a high bayesian filtering score.
As the sender *I* would not be insulted if that were to happen. In fact, it would be great to know that the mail I send is not being silently trashed. How unimportant is your message that the perceived insult is of greater importance?
I always wonder these days whether a mail got through, when it is not answered. I find I end up on the phone more often than not, because mail is no longer a reliable method of communication due to spam.
If you continue to get a lot of whitelist requests after such a system is implemented, it would behoove you to make your mail look less like spam. For instance, not using Base-64 encoding, or sending purely HTML mail, or including trademarked names of pharmaceuticals, or including random strings of characters, linking to spam domains, putting lookalike accented characters or too much punctuation in the subject line, or cc'ing or bcc'ing everyone in your mail.
research? microsoft? (Score:4, Interesting)
Yahoo works better with regards to spam though I wish it would empty the bulk mail folder more often.
And my pop3 acct has something called greylisting and that alone cuts 95% of spam. Plus black and white listing IPs and domains helps too (for instance, only allowing email from hotmail.com if it originates from one of hotmail's servers, etc.) and blocking known spam-haven Class C ranges (eg x.x.x.*).
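Blocking a whole Class C range is a one-liner with Python's `ipaddress` module. A small sketch; the blocklist is hypothetical and uses reserved documentation ranges rather than any real spam source.

```python
import ipaddress

# Hypothetical blocklist; these are documentation ranges, not real spam sources.
BLOCKED_RANGES = [ipaddress.ip_network("192.0.2.0/24"),
                  ipaddress.ip_network("198.51.100.0/24")]


def is_blocked(ip: str) -> bool:
    """True if the connecting address falls inside any blocked /24 range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_RANGES)


print(is_blocked("192.0.2.77"))    # True
print(is_blocked("203.0.113.5"))   # False
```

A /24 covers the 256 addresses matching x.x.x.*, which is what a Class C block amounts to.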
Re:This not only isn't going to work, it's a disas (Score:2, Interesting)
First, the protocol is overly complex. The receiver sets the puzzle. How does the receiver do this? By sending the puzzle before receiving the email? That is complex, perhaps involving connections that must remain open for tens of seconds, or lists that correlate puzzles to particular senders, and the sender must match the answer. How will the puzzle be generated? Will it be pseudorandom or from a pad? How will we gauge the strength of the puzzle? I do not see how this is superior to current filtering.
Second, alternate filtering methods will still be needed. Whitelists will have to be kept so that friends, interoffice mail, and current customers will not be challenged. Email that does not meet the challenge will still have to be accepted and filtered. The only advantage is that certain email will be tagged as 'safe' because the sender solved your puzzle. This 'safe' email will still often have to be filtered to meet the specific needs of the receiver. For instance, a 'safe' email may still contain graphic sexual content unsuitable for the office.
Third, there may be no way to know whether the calculation was done. If the puzzle is pseudo-random, the sender may exploit some weakness. If the puzzle is off a standard one-time pad, and the number of puzzles is finite, or can be cataloged into a finite number of sets, the sender may have a database that already contains complete or partial answers. So, even if the spammer is not using owned hardware, there is no way to know that each email is in fact generating any specific liability.
Again, this is a ploy for MS to sell servers to advertisers. The number of machines, and related number of MS licenses, is going to be non-trivial. The client will be built into outlook and the marketing will convince consumers that anything marked safe is legitimate advertising and not spam. This does nothing to solve the spam problem.
Just hashcash - wasteful, impotent, and harmful. (Score:3, Interesting)
Hashcash is wasteful... it just runs processes at full blast for tens of seconds to tens of minutes at a time, which is a small energy waste but overall a loss.
Hashcash is impotent... any hashcash scheme cheap enough to let someone with an older computer send mail in less than minutes won't slow down a P4-3GHz at all.
Hashcash is harmful, because it makes no distinction between solicited and unsolicited mail. How would you subscribe to Slashdot without whitelisting it?
And once you're whitelisting senders, you might as well just whitelist everyone you get mail from, and now you only need to discourage unknown senders. And hashcash is still a silly solution there, how about real cash?
Here's one way to do that. Whitelist not a sender, but a server. A server at a company that simply charges a few pennies to a few dollars to forward mail (you pick the level of unsolicited mail you want), or one that requires other hoops...
Much simpler, doesn't require new proprietary Microsoft technology, and allows all kinds of alternatives...