95% of User-Generated Content Is Bogus
coomaria writes "The HoneyGrid scans 40 million Web sites and 10 million emails, so it was bound to find something interesting. Among the things it found was that a staggering 95% of User Generated Content is either malicious in nature or spam." Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.
Nothing to see here. Move along. (Bad summary) (Score:1, Informative)
BS in the summary. TFA says:
"95% of user-generated posts on Web sites are spam or malicious."
The user-generated content is valid; it's just the "comments" sections that are getting hit by spambots. If this is front page news, then the fact that 95% of email is spam is news as well. Nothing to see here. Move along.
71% of statistics are useless (Score:1, Informative)
71% of statistics are useless ...
Re:So many floating ads in the first link (Score:1, Informative)
"Print version" (less cruft): http://www.daniweb.com/forums/printthread258407.html
Adblock Plus seems to get rid of most of the cruft on the initial page, by the way.
Re:It might be true, but it's also irrelevent. (Score:5, Informative)
It seems that at least as well as anyone can estimate, the current population really is [wikipedia.org] about 5% of the total humans who've ever lived.
Re:It might be true, but it's also irrelevent. (Score:2, Informative)
Sturgeon's Law comes into play, as always: 90% of everything is crud.
-uso.
Re:It might be true, but it's also irrelevent. (Score:3, Informative)
Depending on what you assume about Paleolithic populations, about 15%-25% of all the humans who ever lived are alive today. That means that roughly one out of every five people who ever walked the Earth has the potential to post to Slashdot.
Re:The message... (Score:3, Informative)
You're reading too much into it, and you are also misled by the misquote in the /. title. The article said "95% of user-generated posts on Web sites are spam or malicious", probably meaning postings in forums, "comments" and stuff like that. They're not saying that plain web pages by *authors* who aren't faceless corporation drones are crap.
Re:Replace "UGC" with "Usenet" (Score:2, Informative)
BBSes? Really? I don't remember a single instance of "spam" on any BBS during the golden years. Perhaps that's because individual systems were far easier to control and moderate.
USENET fell because it was never designed with any real moderation or control in mind. Which was great as long as the users played nicely together. But after the Eternal September and the coming of gold diggers like Cantor & Siegel, the whole system fell apart.
If you want the flood of garbage to stop, you need someone standing at the door with a baseball bat. The days of the internet "playing nicely together" ended back in 1995.
Re:It might be true, but it's also irrelevent. (Score:4, Informative)
It's plausible that the past 100 years has had more people alive than all of human history.
And that would still make the current population only a little more than 50% of all the people who have ever been alive.
Except considering that homo sapiens have been around for several hundred thousand years, I think your estimates for the number of humans that have ever walked the planet may be a bit low.
Re:This just in (Score:5, Informative)
It is no different than domain names. Type a random sequence of 4 characters .com, and the vast majority of times you will get some fairly innocuous spam site, e.g. dneo.com [dneo.com] (picked at random), with no real content.
But it doesn't interfere much with most people's use of the web.
95% of the story is bogus (Score:5, Informative)
It matters a lot how they get their "sample": honeypots, honeyclients, reputation systems and "advanced grid computing systems" (whatever that is). What is feeding information into that sample? Not old sites with legitimate content that have been sitting around for years, but in large part spammers, botnets, and people who want your PC to become part of one. And it is already well known that email is 95% spam. The sample is just too rigged to relate at all to what is really on the internet or what you have any chance of seeing.
Re:It might be true, but it's also irrelevent. (Score:3, Informative)
Thankfully, I suspect most of the 'user generated content spam' doesn't show up on the first couple of search result pages.
That's what I was going to say. Unless people are searching for cialis or real replica watches or VIaGrA, they shouldn't see the spam itself. I spend a lot of time browsing all sorts of different sites and it's very rare for me to ever see spam*. How have I avoided the 95% of the web that is spam? I must have some hidden talent, who knows.
*The exception being the occasional Google search where, instead of information about a thing, I get three pages of people trying to sell the thing (try "lp gas generator").
-b
Re:This just in (Score:3, Informative)
But don't watch his Last Lecture for just that...
The actual new vulnerabilities (Score:3, Informative)
First, here's the actual report [websense.com], without any form to fill out. (Backup copy at WebCitation. [webcitation.org]) Amusingly, the report is clearly written for a target audience who prints out PDF files on paper. It contains charts in tiny type.
The report covers the usual email issues, which will be familiar to Slashdot readers. New issues for 2009 are the following:
The report identifies Google's weak security in their search engine as a problem. Microsoft's Internet Explorer remains a problem, of course, but Google is now the attack target of choice to drive traffic to a site that can attack the browser. Google still, apparently, hasn't figured out a good way to prevent link farms from driving up search position.
Re:Scribd (Score:1, Informative)
The only sense in which it's scribd's problem, is that scribd's chosen to use Flash/PDF as a DRM mechanism. (So yeah, it's 100% scribd's problem, but sometimes scribd's the only place on which certain content can be found.)
All Scribd (and docstoc, for that matter) does is take someone else's PDF, wrap it in a bucket of Flash DRM shit, and publish it.
(Proof that it's DRM? I tried to print an 80-page manual out of it -- turns out that if it takes more than 60 seconds for scribd [slashdot.org] to print it out, the scribd Flash fucklet autokills the print job, just in case you had the audacity to do something like "print to PDF". 20-25 pages off a slow PC. 60-70 pages off a fast PC. I killed half a deciduous forest before I figured out WTF was going on.
All so that scribd [blogspot.com] could prevent you from doing a "Save As". Scribd, and all services that merely wrap Flash around otherwise-downloadable content formats, and I'm looking at You,Tube, are teh suck, even when they're the only means by which the content can be found.)
Re:This just in (Score:3, Informative)
I think you've slightly missed the point. When they say bogus they don't mean the content on a site like Wikipedia, although that site provides a useful example to explain my point. Try to go to Wikipedia, but make a typo.
http://www.wikapedia.org/ [wikapedia.org]
http://www.wikipeedia.org/ [wikipeedia.org]
http://www.wickipedia.org/ [wickipedia.org]
http://www.wikepedia.org/ [wikepedia.org]
I imagine this is likely to be what they're talking about when they say bogus or a scam. Take any of your favourite websites and slightly misspell the URL. Then extrapolate out over everyone's favourite, popular websites. Then realise that there are probably dozens of variations for each one.
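To get a feel for how fast the variations pile up, here is a minimal Python sketch that enumerates single-edit misspellings of a site name. The helper `typo_variants` is hypothetical, and the four edit types it uses (dropping a letter, doubling a letter, swapping two neighbours, swapping one vowel for another) are only an assumption about common typing mistakes. It reproduces wikapedia, wikipeedia and wikepedia from the list above, but not arbitrary insertions like wickipedia.

```python
VOWELS = "aeiou"


def typo_variants(name: str) -> set[str]:
    """Return single-edit misspellings of a domain label (hypothetical helper).

    Assumed edit types: omission, duplication, adjacent transposition,
    and vowel substitution. Arbitrary letter insertions are not generated.
    """
    variants = set()
    for i in range(len(name)):
        # Omission: drop one character ("wikpedia").
        variants.add(name[:i] + name[i + 1:])
        # Duplication: type one character twice ("wikipeedia").
        variants.add(name[:i] + name[i] + name[i:])
        # Transposition: swap two adjacent characters ("wikiepdia").
        if i + 1 < len(name):
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
        # Vowel substitution: replace one vowel with another ("wikapedia").
        if name[i] in VOWELS:
            for v in VOWELS:
                variants.add(name[:i] + v + name[i + 1:])
    # The correct spelling is not a typo (e.g. swapping two equal letters).
    variants.discard(name)
    return variants


if __name__ == "__main__":
    for variant in sorted(typo_variants("wikipedia")):
        print(variant + ".org")
```

Even with only these four edit types, one nine-letter name yields dozens of candidates, before counting other TLDs or keyboard-neighbour slips.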
Re:This just in (Score:5, Informative)