Gmail Spam Filter Changes Bite Linus Torvalds
An anonymous reader points out The Register's story that recent changes to the spam filters Google uses to pare down junk in Gmail are evidently a bit overzealous. Linus Torvalds, who famously likes to manage by email and whose email flow includes a lot of mailing lists, isn't happy about it.
Ironically perhaps, it was only last week that the Gmail team blogged that its spam filter's rate of false positives is down to less than 0.05 per cent.
In his post, Torvalds said his own experience belies that claim, and that around 30 per cent of the mail in his spam box turned out not to be spam.
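The two figures need not contradict each other, since they measure different things: Google's number is a share of legitimate mail, while Torvalds's 30 per cent is a share of the spam folder. A minimal sketch of the arithmetic follows, using made-up counts (none of these numbers come from the article or from Google):

```python
# Hypothetical counts, chosen only to illustrate the arithmetic.
legit_total = 100_000   # legitimate messages received
legit_flagged = 45      # legitimate messages wrongly sent to the spam folder
spam_flagged = 105      # real spam correctly sent to the spam folder

# False positive rate: share of legitimate mail that was flagged as spam.
false_positive_rate = legit_flagged / legit_total

# Spam-folder contamination: share of the spam folder that is not spam.
spam_folder_contamination = legit_flagged / (legit_flagged + spam_flagged)

print(f"false positive rate:       {false_positive_rate:.3%}")
print(f"spam folder contamination: {spam_folder_contamination:.0%}")
```

With these assumed counts the false positive rate stays under 0.05 per cent while 30 per cent of the spam folder is legitimate mail, which can happen when most real spam is rejected before it ever reaches the folder.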
"It's actually at the point where I'm noticing missing messages in the email conversations I see, because Gmail has been marking emails in the middle of the conversation as spam. Things that people replied to and that contained patches and problem descriptions," Torvalds wrote.
Not sure if this is first comment (Score:5, Funny)
Re:Not sure if this is first comment (Score:4, Funny)
No, you weren't the first as I posted a reply to this thread
two days ago and it wound up in Linus' inbox as a kernel bug-fix.
This Just In (Score:5, Insightful)
Individual that differs more than 6-sigma from the population's mean has trouble with automated tools designed for the average person.
Gmail's spam filter is why email is still useful.
Works for me - whatever that is worth (Score:5, Insightful)
Individual that differs more than 6-sigma from the population's mean has trouble with automated tools designed for the average person.
Exactly. I use Gmail and I honestly haven't had a false positive (flagged as spam when it isn't) in over two years. I still get the occasional false negative (spam that isn't flagged) at a frequency of a few per week. It's good enough that I don't even bother to routinely check my spam filter. It is also pretty good at training: once you've spent a little time telling it what is and isn't spam for you, in my experience it is pretty good after that. Frankly, if you have to check your spam filter often, it isn't a very good spam filter.
I suspect Linus has rather unusual email requirements. Perhaps Gmail isn't the ideal solution for him. Very few tools are perfect for everyone. I'm a little surprised he's having that much trouble but stranger things have happened.
Re:Works for me - whatever that is worth (Score:5, Interesting)
I read about this a few days ago on The Register. According to one user there, this particular issue has to do with DKIM and mailing lists (the messages Linus had issues with were all Linux kernel mailing list messages):
bhtooefr "Basically, Google's enforcing DKIM from certain domains, and if a message is "from" someone whose e-mail host provides proper DKIM, but it's missing it, Google (and Yahoo) servers reject it. Mailing lists aren't usually set up to properly handle DKIM (being, effectively, a relay), and therefore get rejected.
The workaround that I saw one mailing list use was to resend the e-mail from the mailing list's address, append "via (mailing list name)" to the name on the from field, and just have both the mailing list and the original author in reply-to."
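The workaround described in the quote can be sketched with Python's standard `email` package. This is only an illustration under assumptions: the function name `rewrite_for_list`, the list name, and all addresses are hypothetical, and a real list manager would also have to deal with signing the rewritten message and with other headers.

```python
from email.message import EmailMessage
from email.utils import formataddr, parseaddr


def rewrite_for_list(msg, list_name, list_addr):
    """Re-send msg from the list's own address while keeping the author reachable.

    Hypothetical helper: From becomes "<author> via <list>" at the list
    address, and Reply-To carries both the list and the original author.
    """
    author_name, author_addr = parseaddr(str(msg["From"]))
    display = f"{author_name or author_addr} via {list_name}"
    del msg["From"]       # deleting a missing header is a no-op
    del msg["Reply-To"]
    msg["From"] = formataddr((display, list_addr))
    msg["Reply-To"] = ", ".join([list_addr, author_addr])
    return msg


# Usage with made-up addresses.
original = EmailMessage()
original["From"] = "Alice <alice@example.com>"
original["Subject"] = "[PATCH] fix typo"
original.set_content("patch body")

relayed = rewrite_for_list(original, "Example List", "list@example.org")
print(relayed["From"])
print(relayed["Reply-To"])
```

Because the visible sender is now the list's own domain, the receiving server has no reason to expect the original author's DKIM signature on the relayed copy.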
Seems like people running mailing lists need to take a look at how spam filters work, rather than mail providers changing anything. If I understand correctly, the policy is sensible and blocks a likely spam vector, and legit mailing lists could easily be set up to not fail that particular check.
For regular mail, I'm like you guys, Google's spam filtering does a fantastic job. I never check my spam folder any more, unless I'm expecting an email and it doesn't arrive, but it's been ages since I had a false positive.
Re: Works for me - whatever that is worth (Score:3, Interesting)
I've had this problem with small websites I run. A lot of contact forms still default to using the submitter as the From address, so I have to edit the code that sends the mail in the module to send from the site's domain and use Reply-To.
I started having to do this years ago, yet very few modules let you take advantage of Reply-To even now. Very annoying.
Re: (Score:2, Insightful)
No, you're backwards. It's up to spam filter developers to understand how mailing lists work and not falsely flag legitimate traffic. If your filter breaks a mailing list, your filter is broken.
Re: (Score:1)
Except that you have it exactly backwards since almost all spam is just bulk mail exactly like any mailing list.
If you are sending bulk mail it is your responsibility to know the best practices for maintaining unsubscribe, for physical contact info, for sending mail only to people that have double opted in. It is also your responsibility to set up SPF, reverse DNS, and increasingly DKIM if you want your bulk mail to make it to its destination. This is why companies like Dyn are around as they take all the g
Re: (Score:2)
The reality is that most email list servers were set up a decade ago by somebody who hasn't touched it since. Expecting them to all upgrade their software to accommodate some new scheme is ridiculous enough to qualify for one of those "Your solution is impractical because" posts with the "it requires broad deployment on hundreds of thousands of servers, performed by tens of thousands of unpaid volunteers" checkbox checked.
Unlike the people who set up most of those email list servers, the Google employees
Re: (Score:3)
Or perhaps you don't understand how mail spam filtering works?
If I send a message to a mailing list server, and it resends the message claiming that it is me, this is wrong.
C22@mail.com -> ML@MAILlist.com
ML@MAILlist.com -> user@list.com (as C22@mail.com)
list.com will always say MAILlist.com isn't mail.com, why is it sending me mail. This is a misconfigured mail relay problem, not a spam problem. MAILlist.com is not an authorized mail server of mail.com, it is completely valid to reject it as a spoo
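The "authorized mail server" idea in this comment is what SPF adds on top of SMTP through DNS records. A rough sketch of that check, assuming a simplified record that only uses `ip4:` and `all` mechanisms; the function name, the record, and the IP addresses are made up for illustration, and real SPF evaluation (DNS lookups, `include:`, `mx`, and so on) is more involved:

```python
import ipaddress

# SPF qualifiers and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_ip4_check(spf_record, sender_ip):
    """Evaluate a simplified SPF record against the connecting IP address.

    Hypothetical helper: handles only "ip4:" and "all" terms.
    """
    ip = ipaddress.ip_address(sender_ip)
    for term in spf_record.split()[1:]:  # skip the leading "v=spf1"
        qualifier = "+"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term.startswith("ip4:"):
            if ip in ipaddress.ip_network(term[4:], strict=False):
                return QUALIFIERS[qualifier]
        elif term == "all":
            return QUALIFIERS[qualifier]
    return "neutral"


# Made-up record for mail.com: only 192.0.2.0/24 may send on its behalf.
record = "v=spf1 ip4:192.0.2.0/24 -all"
print(spf_ip4_check(record, "192.0.2.10"))    # one of mail.com's own servers
print(spf_ip4_check(record, "198.51.100.7"))  # the mailing list's relay
```

Note that SPF is evaluated against the envelope sender domain, so a list that relays with its own envelope sender can pass SPF even when the visible From header still names the original author.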
Re: Works for me - whatever that is worth (Score:1)
SMTP doesn't have a concept of "authorised mail server". Hope this helps.
Re: (Score:2)
Re: (Score:2)
For regular mail, I'm like you guys, Google's spam filtering does a fantastic job. I never check my spam folder any more, unless I'm expecting an email and it doesn't arrive, but it's been ages since I had a false positive.
I will check my spam folder from time to time just to see what kind of spam is out there. I always like the 419 style scams. I rarely, very rarely, find any real mail in the spam folder. In fact, the few times I have found real mail in the spam folder, it was spammy in nature not a true communication that I valued.
Really, kudos to the Google spam team.
Re:Works for me - whatever that is worth (Score:5, Informative)
GMail started flagging Youtube newsletters as SPAM but gets confused by another filter I added manually to all e-mails from Youtube. I had created a filter which adds a label to that type of e-mail, and now GMail says "This message was not sent to Spam because of a filter you created." every time I am getting an e-mail from Youtube.
Funny, 'cause Youtube is owned by Google.
Re: (Score:3)
While I might agree with most of that, there's really no reason to flag a mid-thread reply as spam.
Re: Works for me - whatever that is worth (Score:1)
Maybe Linus should just stop accepting patches from that Nigerian prince?
Re: (Score:2)
I get false positives regularly. Usually the "confirm your email address" messages from services whose sign-up processes are stuck in 2010.
Re: (Score:2)
I have. From this filter update no less.
I receive mail on two different accounts from different addresses, with several hundred mail messages in the one account and several thousand in the other sitting in my inbox. As of last week messages started to go into the spam folder on both accounts until flagged as not spam.
As an added bonus to stupidity, one of the addresses marked as spam is a prominent .edu address.
Re: (Score:2)
Re: (Score:2)
Re:This Just In (Score:5, Informative)
Gmail's spam filter is why email is still useful.
I might not be six sigmas from the population mean, but the aggressive filtering of Google's mail service is annoying me more and more. I don't use it myself, but quite a few of my recurring professional contacts do, often behind their own domains so there's no way to know until it breaks. Aside from the privacy implications of that, I'm getting awfully bored of finishing a day's work, e-mailing the results to wherever they need to go, and getting in the next morning to find a nasty note from Google that was sent back after I'd left saying my mail had been blocked because they considered something in the attached file a security risk. This is particularly infuriating if I'm working in the UK and sending the results to a contact on US time, because it costs between half a day and an entire day to catch up.
Re: (Score:2)
Re: (Score:2)
Yes, that's a possible workaround, but now we have a bunch more problems. E-mail is simple, standardised, and time-tested. There are plenty of tools that will let us transfer a file another way, but very few that will then keep that file associated with all the other relevant messages, and very few that literally everyone will have, and very few that don't require more effort to set up.
Alternative proposal: Don't use e-mail services that don't do e-mail properly.
Re:This Just In (Score:5, Interesting)
Individual that differs more than 6-sigma from the population's mean has trouble with automated tools designed for the average person.
Gmail's spam filter is why email is still useful.
In my experience it is crap. Not as bad as Linus's experience, but it still got 1 in 200 emails wrong, just like Google says, and that is COMPLETELY UNACCEPTABLE. Having to find important emails among thousands of spam emails is a problem, and I haven't seen any other spam filter with that many false positives.
Re: (Score:3)
Maybe it wouldn't be such a problem if you deleted everything in your spam folder daily instead of just letting it sit there until it's 30 days old and gets removed automatically.
Re: (Score:1)
But that's the same as stealing from Google!
Re: (Score:2)
The main problem that I have experienced with Gmail's spam filter is that it is overly aggressive. I was on a Yahoo! group mailing list. It got hit by a spammer. I flagged the email as spam. Then a bunch of yahoos responded to the spam or to the people responding to the spam. All of those people's posts to that group started getting flagged as spam. I started noticing holes in conversations.
Now, I kind of appreciate what Google is doing here. If you keep that behavior in place, you disincentivize the
Re:This Just In (Score:5, Informative)
When setting up the filter, you make sure to check the "Never send it to spam" option.
Re: (Score:3)
He was complaining because the behavior changed without notice and he started missing valid emails from addresses he had previously been responding to, partly without rhyme or reason: he started missing emails in the middle of threads, and would sometimes get a later email and only then see that earlier mail in the thread had been caught by the spam filter.
The point is, Gmail changed the spam filter without notice (it could have started marking mail with something like "this would go to spam next week") or whatever.
Re: (Score:2)
Re: (Score:2)
Because my mother-in-law was also fuming against it just a day or two ago.
State the Obvious (Score:5, Funny)
Re: (Score:2)
Yeah, like Microsoft or Apple!
Re: (Score:3)
Yeah, like Microsoft or Apple!
No.
Like any national, free, reasonable email provider (in my country, the post office does this).
Like *all* our hosting service providers do, too, at no delta-cost, and in a well controlled manner if you choose an associative hosting. Many come to mind in Europe, like the Belgian All2all, the French Ouvaton
Sorry for the bluntness, it's not you but the mere idea of Torvalds registering at Google that shocked me...
Re: (Score:1)
Maybe someone could explain to him how to set up his own mail server. IIRC there are free open source mail servers with spam filtering. I think there is even an open source OS to run it all if cost is an issue. He ought to look into it.
Re: (Score:2)
That was my first reaction as to how he should respond. But perhaps it would be better if he talked someone else into hosting the lists (and maintaining the server). Linus is probably quite busy with other business.
Still, Canonical might take the job, or the FSF. Perhaps the openSUSE people. I'm not really sure I'd want Red Hat to have that much leverage.
Re: (Score:2)
Me too, that's why I mockingly replied with "Microsoft and Apple" as alternatives.
other providers (Score:3)
I for one am extremely shocked that the above post ('use other providers') was flagged as funny.
Torvalds is the last person I'd imagine registering an email address @ Google.
Wise as he may have been, sorry, but to me he's a moron now just because of this.
I just hope I won't evolve his way when getting older.
Re: (Score:3)
Torvalds is the last person I'd imagine registering an email address @ Google... I just hope I won't evolve his way when getting older.
Age has nothing to do with it. I'm older than Torvalds, and I refuse to use Gmail.
Filters will do this (Score:3)
As I have said before, spam is an economic problem. We won't solve the problem with filters, or with any kind of punishment (legal or otherwise); we need to look at this rationally as an economic problem.
Re:Filters will do this (Score:4, Interesting)
Re: (Score:3)
One of the ways of combating it economically is to make it require more effort to successfully deliver spam to the target recipients. i.e. using a filter.
The problem is that the spammers can acquire more opportunity to get past filters (by taking over more computers for their botnets, to send more spam from with more permutations designed to confuse filters) with more ease than the time it takes to train the filters on what is spam and what is not. When using filters, it becomes an arms race - and only the spammers can win.
In other words it already costs the spammers almost nothing to send out a deluge of billions of spam emails. They already know how
Re: (Score:2)
Re: (Score:2)
So increasing the cost of sending spam should reduce the incentive (positive profit) to spam. Key-based authenticated email with some nominal fee is one way (with its own shortcomings of course)
That is one way to do it, but not my preference. The problem with that method in particular is that it does require everyone who uses email to switch to it immediately in order to be of value.
My preference is to actually interrupt the flow of money to the spamm
Re: (Score:2)
Curious (Score:2)
I've noticed an increase in spam on my Gmail account.
From one email every few weeks to a couple per day.
Re: (Score:2)
Do you mean you've noticed an increase in spam getting into your Inbox, or an increase of spam showing up in your spam folder?
If the latter, I've noticed it comes and goes in waves. Sometimes it'll be days between getting a single spam in my spam folder, and sometimes I'll get two or three in one day.
I want to say it's like a sine wave, but I haven't kept a record that would back up that claim.
Re: Curious (Score:2)
Re: (Score:2)
Ha! I'd mod you up if I could.
Re: (Score:2)
Graph [abstrusegoose.com]
Re: (Score:3)
The reason for the wave effect is at least in part because a relatively large proportion of the spam that gets sent actually comes from a very small number of sources. Someone figures out a formula for defeating the current spam filters on enough major systems to be viable and then exploits it heavily for as long as they can. The mail services note the changes in traffic, adapt, and block that traffic. On a really good day, a major spammer actually gets taken to court and removed from the system altogether
Re: (Score:2)
90% of spam never gets to your email account at all. Those are the messages the server is 100% sure are spam.
I get more than my share of spam... but that's because I do things that confuse the mail server... such as flagging as not spam receipt emails from the viagra I bought from an online pharmacy. That right there pretty much fucks you for having any sort of reliable spam filtering.
Re: (Score:2)
My favorite are all the fucking goomoji in the subjects these days. "||mail.google.com/mail/e/" made a welcome addition to my uBlock filters...
Netflix's movie selection pisses off Alan Cox (Score:1, Funny)
more at 10
Poor Linus needs a gofundme (Score:5, Funny)
Apparently he can't afford to purchase decent email service. Maybe someday he'll create something important and then he can get off the crap freemail.
History repeating?... (Score:3)
Maybe someday he'll create something important and then he can get off the crap freemail.
Yup. Given his past success with both Linux kernel and with GIT distributed source management, I too think that out of anyone Linus Torvalds might be the only guy able to effectively solve the SPAM problem.
Re: (Score:2)
He did. But nobody could ever remember the name of the command.
GUI (Score:2)
But nobody could ever remember the name of the command.
And the Gnome guys promised to make a userfriendly GUI to help against that.
But they are still arguing about how to make it follow HIG, and how to adapt to the upcoming GTK4.
Boolean filters are wrong (Score:1)
Every sane spam filter returns three possible results:
When you have three categories it will reduce the FP rate very hugely. And the most important fact is that a spam filter should never throw away spam. It can be illegal to do so, or at some time it is going to be illegal. You should keep all your spam for documentation, for this reason. Also, you can initialize (learn) your filters very quickly when migrating the system. Spam is a valuable resou
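The list of three results did not survive in this comment. A common three-way scheme, assumed here purely for illustration, is ham / unsure / spam with two cutoffs on a spam score between 0 and 1. The function name and cutoff values below are hypothetical:

```python
def classify(spam_score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Map a spam score in [0, 1] to one of three categories.

    Assumed scheme: scores at or below ham_cutoff are ham, scores at or
    above spam_cutoff are spam, and everything in between is "unsure"
    and left for a human to review.
    """
    if spam_score >= spam_cutoff:
        return "spam"
    if spam_score <= ham_cutoff:
        return "ham"
    return "unsure"


for score in (0.05, 0.5, 0.95):
    print(score, classify(score))
```

The middle band is what lowers the false positive rate: borderline messages are held for review instead of being forced into the spam bucket.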
Re:Boolean filters are wrong (Score:4, Insightful)
WTF are you smoking, and can I haz some?
No amendment, not even the first, makes it illegal for me to throw away shit that people decide to send to me.
Pigshit is a valuable resource. Spam is spam. The fact that you can look for similarities in it in order to trash more of it doesn't make it a valuable resource.
Re: (Score:2)
No, but if your ISP incorrectly classifies a job offer sent to you as spam, and summarily deletes it, you're probably going to sue them.
Re: (Score:2)
Re: (Score:2)
"message": "552 5.7.0 This message was blocked because its content presents a potential\n5.7.0 security issue. Please visit\n5.7.0 https://support.google.com/mai... [google.com] to review our message\n5.7.0 content and attachment content guidelines. k3si2092734igx.18 - gsmtp",
Re: (Score:2)
Re: (Score:2)
This is not rocket science, but too many people running mail servers don't understand the bac
Re: (Score:2)
And in this case, it should be marked as spam, and either a) held by the ISP for some period of time, per the ToS that the user agreed to, or b) delivered to the user, marked as spam, for them to do with as they see fit.
The ONLY situation that anybody here has described that MUST NOT HAPPEN is this chain of three steps:
1) Recipient's ISP SMTP server accepts a message
2) Recipient's ISP SMTP server decides the message is spam
3) Recipient's ISP SMTP server deletes the message with no notification to anybody
The
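The rule in the three steps above can be written down as a small decision function. This is a sketch under assumptions, not any real server's logic; the names `Disposition` and `choose_disposition` are made up for illustration:

```python
from enum import Enum


class Disposition(Enum):
    DELIVER_TO_INBOX = "inbox"
    REJECT_AT_SMTP = "reject"             # refuse in-session, so the sender's server learns of it
    DELIVER_TO_SPAM_FOLDER = "spam-folder"  # accepted, but the recipient can still find it


def choose_disposition(is_spam, still_in_smtp_session):
    """Pick an outcome that never combines 'accepted' with 'silently deleted'."""
    if not is_spam:
        return Disposition.DELIVER_TO_INBOX
    if still_in_smtp_session:
        # Not yet accepted: rejecting now notifies the sending side.
        return Disposition.REJECT_AT_SMTP
    # Already accepted: keep the message somewhere the recipient can see it.
    return Disposition.DELIVER_TO_SPAM_FOLDER


print(choose_disposition(is_spam=True, still_in_smtp_session=True))
print(choose_disposition(is_spam=True, still_in_smtp_session=False))
```

There is deliberately no "drop" member in the enum, so the accept-then-silently-delete path cannot be expressed at all.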
Re: (Score:2)
Citation needed. Seriously. I looked. And even if there was a suit, did the idiot win?
Re: (Score:2)
Re: (Score:2)
She settled out-of-court for an undisclosed amount (she probably didn't have to pay them for all the defamation she threw their way), and life goes on.
One case of unknown outcome 13 years ago in an area that would seem, on the surface, to be ripe for litigation, doesn't seem
Re: (Score:2)
That would be fine. Again, it's the 'accept, then silently delete' that's the problem.
Re: (Score:2)
I agree with the logic of your argument, but why be so rude about it?
Re: (Score:2)
Additionally, the argument being logical doesn't imply that it is true. It is based on some incorrect premises. For one thing the law is often illogical. For another the EULA cannot bind you to something that the law doesn't allow it to bind you to.
Additionally, being able to win a suit, even easily, doesn't prevent you from being sued.
That said, they might well be able to win the suit easily, at least in some jurisdictions.
Already resolved (Score:2, Informative)
You can tell this is old news given that it was this past weekend that Linus posted an update on Google Plus stating that false positive rates were back down to normal for him.
https://plus.google.com/+LinusTorvalds/posts/dJdkRxUCRmK
Karma is a bitch (Score:1)
That's what you get for unleashing your half-baked Git solution on the world. I've used it every day for two years and it is just a horrid mess. How many times I've had to do a "reset --hard" because a branch just shits itself for no good reason.
And what if I want to bring up two separate branches side-by-side to do some copying? Can't fucking do it in Git.
Linus, bless you for Linux. But curse you for Git. Git is Shit.
Re: (Score:3)
Re: (Score:2)
Oh come now. I quite like git (it's my go-to choice these days for everything---Darcs, I hardly knew ye) and I have occasionally had to reset --hard after fucking something up. I've found fuckups a little easier than I'd prefer.
Besides, a generalised Stroustrup is right: there are two kinds of tools, those people complain about and those nobody uses. I would therefore expect many complaints about git.
(and many are not unjustified either)
Re: (Score:2)
And what if I want to bring up two separate branches side-by-side to do some copying? Can't fucking do it in Git.
For side-by-side comparisons you can always just do a lightweight clone, which pretty much should happen automatically if you clone to another directory within the same filesystem, i.e.
git clone -b branch orig_repo branch_repo
Re: (Score:2)
Interesting, thanks!
Re: (Score:1)
Git as source control = shit
Git as a tool for Linus to merge kernel patch sets = amazing.
Git is fantastic at what it was designed for.
Re: (Score:2)
I totally agree. Git is focused far more on the repository maintainer than on the day-to-day developer who just wants to check in code and not deal with the esoterica of the source control system. Even with Sourcetree it's just weird.
The bitching is slower than the fixing (Score:4, Informative)
He posted about it in G+, a googler noticed and offered to look into it. One day later The Register is feeding off the echoes and the story is slashdotted.
"Much better now.
Of the 100+ messages caught as spam over-night, only two were false positives (and I reported them). My email is getting back to normal."
https://plus.google.com/+LinusTorvalds/posts/dJdkRxUCRmK
Already Solved (Score:4, Informative)
On the next day, Linus wrote "My email is getting back to normal."
https://plus.google.com/+LinusTorvalds/posts/dJdkRxUCRmK
If you're missing an email... (Score:2)
I'm sure the NSA has a copy. All you need is to fill out a FOIA request and interrogate Michael Hayden until he admits it.
he's using gmail? (Score:4, Funny)
Somebody should tell Linus about this great new operating system I run at home. I have sendmail running on my machine, and it lets me control my spam filters and everything.
It's called "Linux". I highly recommend it.
Re: (Score:1)
Re: (Score:2)
The article says "around 30 per cent of the mail in his spam box turned out not to be spam", and you call that "the world's best spam filter"?
domain issues (Score:5, Interesting)
From his original post, there is a clear date he claims the FP rate to have gone up... so this isn't a blanket Gmail FP rate issue, but rather a Gmail or spam blacklist incident, which is quite different from what the summary would suggest. As of right now:
http://mxtoolbox.com/SuperTool.aspx?action=blacklist%3aLKML.ORG&run=toolpage
lkml.org Added to UCEPROTECTL2
Uceprotectl2 Automatically Delists Entries
This blacklist does not offer any form of manual request to delist. Your IP Address will either automatically expire from listing after a given timeframe, or after time expires from the last receipt of spam into their spamtraps from your IP Address.
Uceprotectl2 Accepts Payments Or Donations
This blacklist does support a manual request to remove, delist, or expedite your IP Address from their database upon Payment or Donation of fees to their organization. Please note the following; 1) MxToolBox does not in any way advocate the paying of removal from any blacklists. 2) Removal requests that are submitted without addressing the core problem will likely result in your IP Address being relisted in the database which can cause subsequent problems and extended listing periods without release.
More information about UCEPROTECTL2 can be found at their website: http://www.uceprotect.net/ [uceprotect.net]
Reason for listing - Net 146.185.176.0/21 is UCEPROTECT-Level2 listed because 36 abusers are hosted by RCN-ASN - Reality Check Network Corp./AS46652 there. See: http://www.uceprotect.net/rblc... [uceprotect.net]
UCEPROTECTL2 seems a bit shady, but I am not a blacklist expert.
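For anyone wondering how a lookup like the one above works: DNS blacklists are conventionally queried by reversing the IPv4 octets and prepending them to the list's zone. A minimal sketch, where the zone name `dnsbl.example` and the sample host addresses are placeholders rather than the real UCEPROTECT zone or real listed hosts; only the /21 network comes from the listing reason quoted above:

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def in_listed_network(ip, network="146.185.176.0/21"):
    """Check whether an address falls inside the network named in the listing."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(network)


print(dnsbl_query_name("146.185.176.5", "dnsbl.example"))
print(in_listed_network("146.185.180.1"))
print(in_listed_network("146.185.184.1"))
```

A mail server would then resolve that name; by convention an answer in 127.0.0.0/8 means "listed" and a nonexistent name means "not listed". A network-level listing like this one catches every address in the /21, which is why a host can be listed without having sent any spam itself.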
Also as a side note, any spam filter that attempts context evaluation has a tendency to mark emails with code or special character formatting as spam. Even emails with links. So for someone like Linus to have a higher blanket spam FP rate is also not surprising.
The best gmail feature is the "never treat as spam" filter.
Checking the wrong thing in a not great place? (Score:1)
First up lkml.org [lkml.org] is a third party site that hosts Linux kernel mailing list archives on a website. Regular Linux kernel mail isn't actually sent from it (I believe that's done by vger [kernel.org]) so we're looking up the email reputation for the wrong IP...
Secondly UCEPROTECT is a very aggressive blacklist which states upfront they will block people who they believe are in the vicinity of people who they judge to be sending them spam. It's not the be-all and end-all though and on one server I looked at some time ago it's effe
Political filters (Score:2)
I just checked my spam folder again. I reserved a U-Haul, and the email confirming that reservation went to spam. One of the few false positives I get, but there are others.
What's interesting is all the political fund-raising emails. Only the conservative ones end up in Spam (Campaign for Liberty, Conservative Senate committee, Scott Walker, Rand Paul, Jeb Bush, Mark Mix, etc.). That's fine - I don't really want to see those emails anyway. Yet, I get lots of the same emails from Hillary Clinton, Obama,
As usual, google is way overrated (Score:2)
Moreover, any time I access gmail via ipv6 (I have dual-stack) my messages are marked as spam and I receive terrible warnings about someone else trying to break into my e-mail, and that despite using the same system and browser and the IPv6 whois reco
Yes, and for many months now (Score:2)
I have the same experience. For many months now, Gmail has been overzealous in marking stuff as spam. Stuff like daily emails from servers I manage with log digests. Emails about pending security package upgrades. Even when I specifically say that a certain subject string (e.g. "logwatch") is to be excluded, Gmail ignores that rule. It has been very frustrating trying to exclude stuff via filters in Gmail.
Suggestion (Score:2)
automatically reject email failing its SPF or DKIM checks. If it's forged, by definition it's spam.
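A sketch of what this suggestion might look like on the receiving side, assuming the SPF and DKIM verdicts have already been recorded in an `Authentication-Results` header by an upstream check. The helper names and the sample header values are made up, and the parsing is deliberately simplistic:

```python
import re


def auth_results(header_value):
    """Collect spf= and dkim= verdicts from an Authentication-Results value."""
    results = {}
    pattern = r"(?:^|;)\s*(spf|dkim)=(\w+)"
    for method, verdict in re.findall(pattern, header_value, flags=re.IGNORECASE):
        results.setdefault(method.lower(), []).append(verdict.lower())
    return results


def should_reject(header_value):
    """Reject when any recorded SPF or DKIM verdict is an outright 'fail'."""
    verdicts = auth_results(header_value)
    return any("fail" in values for values in verdicts.values())


relayed = "mx.example.net; spf=pass smtp.mailfrom=list.example.org; dkim=fail header.d=example.com"
direct = "mx.example.net; spf=pass smtp.mailfrom=example.com; dkim=pass header.d=example.com"
print(should_reject(relayed))
print(should_reject(direct))
```

The first sample is roughly the mailing-list case discussed further up the thread: the relay passes SPF under its own domain, but the author's DKIM signature no longer verifies, so a strict reject-on-fail policy would bounce legitimate list traffic.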
Here's the use case (Score:1)
What is probably causing Linus's problem is the open subscription nature of the LKML. I'm guessing that a lot of people are subscribing, then flagging messages as spam rather than deleting them. Once a certain threshold is hit, that and similar messages are then flagged as spam.
DSN's and NDR's too (Score:2)
Was helping a guy having trouble posting to our LUG list the other day.
He had DSN's being delivered to his GMail spam folder. I thought, "golly, how does Gmail figure those could be spam?" Nobody is going to sneak a Viagra ad through underneath a 550 report.
Of course his other problem was he was using Comcast as an outbound relay. Their new relay retries a message once every second five times and then gives up forever. Totally breaks greylisting, or even temporary outages. Didn't even try my backup MX.
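To see why that retry schedule breaks greylisting, here is a toy model. The 300-second greylist delay and the backoff schedule are assumptions picked for illustration; only the "once every second, five times" pattern comes from the description above:

```python
def delivered_after(retry_times, greylist_delay):
    """Return the first retry time (seconds after the initial attempt) that
    a greylisting server would accept, or None if every retry is too early."""
    for t in retry_times:
        if t >= greylist_delay:
            return t
    return None


GREYLIST_DELAY = 300  # assumed: server refuses retries for the first 5 minutes

one_per_second = [1, 2, 3, 4, 5]              # retries once a second, five times
backoff = [60 * 2 ** n for n in range(6)]     # 60, 120, 240, 480, 960, 1920

print(delivered_after(one_per_second, GREYLIST_DELAY))
print(delivered_after(backoff, GREYLIST_DELAY))
```

Every one of the rapid retries lands inside the greylist window, so the message is never delivered, while a sender that backs off gets through on a later attempt. The same reasoning applies to a short outage on the receiving side.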