95% of User-Generated Content Is Bogus
coomaria writes "The HoneyGrid scans 40 million Web sites and 10 million emails, so it was bound to find something interesting. Among the things it found was that a staggering 95% of User Generated Content is either malicious in nature or spam." Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.
This just in (Score:5, Funny)
Animals shit in ~95% of their habitat...
Re:This just in (Score:5, Funny)
a staggering 95% of User Generated Content is... ...spam. Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.
Give up your name, rank, email... so we can enlighten you with valuable information from our partners.
Re:This just in (Score:4, Funny)
you mean, fill in their User Generated statistics maliciously?
Re: (Score:2)
fill in their User Generated statistics maliciously?
No, no, no. Fill it out properly so you can get more highly valuable information. They would never use that info to send more spam your way.
---------
News Flash! World ends! Tune in at 10:00 for details.
Re: (Score:3, Funny)
You have to average those two numbers to get the 95% figure. Don't be so lazy next time.
Re: (Score:3, Interesting)
Anonymity comes into play, I suspect. I'm not a psychologist, though. It makes me wonder whether there will be any attempt (or, more accurately, whether anyone will have the compute power and gumption) to fact-check Wikipedia. I'm rather curious how that would turn out if it were done in an unbiased and thoroughly in situ way. I imagine it would take a great deal of work, and then there are people who will claim it is constantly changing, but the point that I'm considering is what is the accuracy level at a p
Re: (Score:2)
I agree. But imagine what a difficult task that would be. According to Wikipedia itself, it contains 14 million articles. You would have to find experts in each of the fields to check each article, who are supposedly the people who wrote them in the first place. Hopefully, anyway.
Re:This just in (Score:5, Informative)
It is no different than domain names. Type a random sequence of 4 characters .com, and the vast majority of times you will get some fairly innocuous spam site, e.g. dneo.com [dneo.com] (picked at random), with no real content.
But it doesn't interfere much with most people's use of the web.
Re: (Score:3, Informative)
Re: (Score:3, Informative)
But don't watch his Last Lecture for just that...
Re: (Score:3, Interesting)
I'm not surprised. Wikipedia is great for niche articles, like finding out what happened to Star Trek: The Experience [wikipedia.org]. Such niche information wouldn't be viable for Britannica to cover, but anyone with an interest can put up an article about it. If you want real articles on things like science, DON'T GO TO AN ENCYCLOPEDIA. They're about as good at teaching you usable science as they are at teaching you how to play the flute.
Re: (Score:3, Informative)
Re: (Score:3, Informative)
I think you've slightly missed the point. When they say bogus they don't mean the content on a site like Wikipedia, although that site provides a useful example to explain my point. Try to go to Wikipedia, except make a typo.
http://www.wikapedia.org/ [wikapedia.org]
http://www.wikipeedia.org/ [wikipeedia.org]
http://www.wickipedia.org/ [wickipedia.org]
http://www.wikepedia.org/ [wikepedia.org]
I imagine this is likely to be what they're talking about when they say bogus or a scam. Take any of your favourite websites and slightly misspell the URL. Then extrapolate out over every
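The misspelling game above is easy to automate. A minimal Python sketch, assuming only single-character typos over a-z (one letter deleted, inserted, or substituted, or two neighbouring letters swapped); the function name `typo_variants` is hypothetical. It only generates candidate names; checking which of them are actually registered, and what they serve, is left out.

```python
import string

# Assumption: only lowercase a-z typos are considered.
ALPHABET = string.ascii_lowercase


def typo_variants(name: str) -> set[str]:
    """Return every string one typo away from `name`: a deletion, a swap of
    adjacent letters, a substitution, or an insertion."""
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    variants = deletes | swaps | replaces | inserts
    variants.discard(name)  # substituting a letter with itself gives back the original
    return variants


if __name__ == "__main__":
    candidates = typo_variants("wikipedia")
    for typo in ("wikapedia", "wikipeedia", "wickipedia", "wikepedia"):
        print(f"www.{typo}.org", typo in candidates)
    print(len(candidates), "single-typo variants of 'wikipedia'")
```

All four misspellings listed above fall within this single-edit set. Real typosquatting also uses other TLDs and look-alike characters, which this sketch ignores.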
Re: (Score:2)
Re:This just in (Score:5, Informative)
Want to get ripped? (Score:5, Funny)
I got ripped in 2 weeks. learn how with secret juice formula.
Re: (Score:3, Funny)
Speaking of juice, there's nothing better than a cold glass of Fanta [fanta.com]!
Re: (Score:2)
Don't Cha Wanna Wanna?
Re: (Score:2)
Fanta went downhill after real juice was introduced. If I wanted juice in my soda, I'd get Sunkist.
Re: (Score:3, Funny)
If I wanted juice in my soda, I'd steal it from Mark McGwire.
Re:Want to get ripped? (Score:5, Funny)
If I want real juice, I just drink Florida Orange Juice®. It's not just for breakfast anymore!
Re: (Score:2)
I love how these ads are being modded informative as opposed to funny.
Did you ever consider that *maybe* they were under the mistaken impression that orange juice *was* just for breakfast, but the post informed them that this is no longer the case? ;)
And it only took having my same sig for most of 10 years for it to actually be topical.
Re: (Score:2)
Spam replying to spam?
Where can I file a patent?!
Let me be the first to post that this is BS. (Score:5, Funny)
Re: (Score:2, Interesting)
This is slashdot (Score:5, Funny)
We know.
Re: (Score:3, Insightful)
Re: (Score:2)
I'm a prince from the far lands of Absurdistan and would like to ask if you would like to [insert random passage of text here]
You'll have a better chance of getting me to insert something if you said you were a princess.
I'm sorry, that's Valentine's day anticipation talking.
Re: (Score:2)
Re: (Score:2, Funny)
No..... this is SPARTA!!!!
It might be true, but it's also irrelevent. (Score:5, Insightful)
The fact is that there are millions of old blogs, unused forums, ancient guestbooks, etc that are easy to spam automatically. While it might very well be true that 95% of comments on the internet are spam of some sort, they're probably read by a tiny fraction of internet users. People tend to stick to about a dozen big sites that get very little rubbish posted on them at all.
Car analogy: 95% of cars are rusty old heaps of crap that can't move. Thankfully they're in scrapyards and not on the roads.
Re:It might be true, but it's also irrelevent. (Score:5, Funny)
95% of humans are over 100 years old. Most of them are dead.
Re:It might be true, but it's also irrelevent. (Score:5, Funny)
That should be on Fox News.
"Number of dead people reaches all time high!"
Re:It might be true, but it's also irrelevent. (Score:5, Funny)
Well, then MSNBC would just rip into Fox for implying these unfortunate individuals should no longer vote. CNN would chime in and blame the lack of universal health care for the deaths.
Re: (Score:3, Interesting)
Re: (Score:2)
But their atoms live on.
Re: (Score:2, Funny)
Oh god, are they okay?
Re:It might be true, but it's also irrelevent. (Score:5, Informative)
It seems that at least as well as anyone can estimate, the current population really is [wikipedia.org] about 5% of the total humans who've ever lived.
Re: (Score:3, Interesting)
There is evidence that there were some advanced civilizations prior to the theorized comet incident. They might hav
Re: (Score:3, Interesting)
There were, but not many. Nowhere near the scads of people roaming the planet today. I've read that there have been several times in known history where there were fewer than a couple hundred thousand people; it's plausible that the past 100 years has had more people alive than all of human history, considering the multiple near-extinction events which have supposedly occurred.
Re:It might be true, but it's also irrelevent. (Score:4, Informative)
It's plausible that the past 100 years has had more people alive than all of human history
And that would still make the current population only a little more than 50% of all the people that have been alive.
Except, considering that Homo sapiens have been around for several hundred thousand years, I think your estimates for the number of humans that have ever walked the planet may be a bit low.
Re: (Score:3, Informative)
Depending on what you assume about paleolithic populations, about 15%-25% of all the humans who ever lived are alive today. That means that roughly one out of every five people who ever walked the Earth have the potential to post to Slashdot.
Re: (Score:2)
I did a little more reading on this today and it looks like a better estimate is 5%-10% of all humans who ever lived are alive today. However, the estimate depends critically on what one assumes for life expectancy, and we do not have very good information about that for most of the time that modern humans have been around.
Re: (Score:2)
You really think the 5% of population that has been alive in the last 100 years counts for that much population in history?
We might not have the numbers, but we got nukes.
Re: (Score:2, Interesting)
I assume they didn't include Wikipedia in the "user generated" category; otherwise that much non-bogus content would definitely have tipped the scale a bit.
In my personal experience however, even without wikipedia, I have not come across that much bogus stuff on forums and random comments.
Re: (Score:3, Funny)
Are you implying that Wikipedia is not bogus content?
Re: (Score:3, Interesting)
Re:It might be true, but it's also irrelevent. (Score:5, Insightful)
A lot of forum software works well, until it gets "behind the curve", and then the site maintainer pulls the site*.
By "behind the curve" I mean any of the following can/does happen:
1) Forum software gets out of date and user fails to upgrade due to modifications or similar, resulting in spam.
2) Forum software gets popular without having a good security model and/or update cycle, resulting in exploits.
3) The forum (or blog) gets inundated with comments awaiting approval and gets ignored or set to auto-allow out of frustration.
* By "pulls the site" I mean "abandons it but doesn't take it down". That's typically the end result.
It's a lot of work to maintain your own forum and/or blog: managing spam can and will take hours+ from your day if you've not got a good automated and/or textual way to deal with it: web interfaces are clumsy.
Car analogy: 95% of cars are rusty old heaps of crap that can't move. Thankfully they're in scrapyards and not on the roads.
Yet, unlike most of those cars, the actual blog content is not necessarily useless. I have seen quite a few abandoned blogs and forums with 3- to 10-year-old information on them that is by no means useless; it's just getting buried.
Digital archeologists of the future will probably have to figure out an automated way to prune back the spam to find the actual Internet, the way things are going.
Consider: if spam accounts for 95% of all user-generated content, and said user-generated content is actually a non-trivial percentage of all actual content online (believable), think how much bandwidth gets wasted by these spammers. (Thankfully, I suspect most of the 'user generated content spam' doesn't show up on the first couple search page results, so it's not likely to be perused with regularity, unless it's more heavily seeded on topics common folks search for.)
Re: (Score:3, Informative)
Thankfully, I suspect most of the 'user generated content spam' doesn't show up on the first couple search page results
That's what I was going to say. Unless people are searching for cialis or real replica watches or VIaGrA, they shouldn't see the spam itself. I spend a lot of time browsing all sorts of different sites and it's very rare for me to ever see spam*. How I've avoided the 95% of the web that is spam? I must have some hidden talent, who knows.
*The exception being the occasional Google search where instead of information about a thing, I get three pages of people trying to sell the thing (try "lp gas generator").
Re:It might be true, but it's also irrelevent. (Score:5, Interesting)
How much of it is user generated content that's copied from one site onto a zillion others?
Re: (Score:2)
How much of it is user generated content that's copied from one site onto a zillion others?
Or onto the same site. It amazes me how many YouTube videos people rip and then upload back to YouTube as their own. I like to think of this need to be the person who provides the video as "Insufficient Attention Disorder".
Re: (Score:2, Informative)
Sturgeon's Law comes into play, as always: 90% of everything is crud.
-uso.
Re:It might be true, but it's also irrelevent. (Score:5, Funny)
And when they want a change from that, they come here.
Re: (Score:2)
Irrelevent [ir'-rel-e-vent] - Adjective:
The wasteful use or application of a cooling device when not strictly necessary.
USAGE: "Larry left the air conditioning unit on all throughout winter; its power consumption was irrelevent."
ORIGIN: Teh Intarwebz.
Re: (Score:2)
whoosh!
Here's a hint: Read the parent's subject line again, this time without the spell-checker.
-dZ.
Re: (Score:2)
That makes it sound a little too innocuous for my tastes. It's not like 95% of emails are spam, but they're all sitting on a server somewhere and no one has to deal with them, so it's fine. For your car analogy to work for me, it would have to be more like "95% of cars are rusty old heaps of crap that can't move. They're littering the highways, but we can steer around them."
My mail server is seeing a little less than this-- only 85% of incoming email is spam. Still, that means that I have to filter all
What about "numbers posts"? (Score:2)
So many floating ads in the first link (Score:2, Offtopic)
That's a little too much cruft for me. They can keep their content, I don't want it.
Re: (Score:2)
Re: (Score:2)
For me in Konqueror, the site rendered as text that was overwritten within a few seconds by a pure black page with a couple of itsy white boxes with green text, which then morphed into a pure featureless white page with no scrollbars. Does that count as "bogus and/or spam"?
Was going to RTFA but it's probably bogus (Score:5, Funny)
...95% probability actually. So I didn't bother.
just a cheap shot (Score:4, Funny)
I guess that goes hand in hand with 95% of kdawson's submissions being crap and not worth the time.
just another cheap shot (Score:2)
You missed your assessment by ~5%.
-dZ.
40 000 000 sites per hour? (Score:2, Interesting)
Every single hour the Internet HoneyGrid scans some 40 million websites for malicious code as well as 10 million emails for unwanted content and malicious code.
So 40 million sites per hour is 960 million sites per day. Wikipedia says there are over 25 billion pages [wikipedia.org], but can that number be accurate?
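The arithmetic in that comment can be checked in a few lines. This is a rough sketch that assumes, purely for illustration, that each scanned "site" counts as one page; the report's own unit of measure isn't spelled out in the quote.

```python
SITES_PER_HOUR = 40_000_000        # figure quoted from the report
PAGES_ON_WEB = 25_000_000_000      # "over 25 billion pages", per the figure cited above

# Assumption: one scanned "site" is treated as one page.
sites_per_day = SITES_PER_HOUR * 24
days_to_cover = PAGES_ON_WEB / sites_per_day

print(f"{sites_per_day:,} sites per day")                           # 960,000,000 sites per day
print(f"~{days_to_cover:.1f} days to touch 25 billion pages once")  # ~26.0 days
```

In other words, at the claimed rate the whole 25-billion-page figure would be covered roughly once every 26 days.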
The message... (Score:4, Insightful)
The subtext of this article is that you should forget about letting users create content on the Internet, because all they do is create junk and try to scam good honest people. Just leave the content creation to the institutions, and media conglomerates who know how to do it. It's safer that way, and you'll like it.
Well, I don't care if 99% of user-generated content is crap; people need to be free to create it, because some individual in the other 1% may just come up with the cure for cancer, and despite whatever it does to Big Pharma's profits, everyone needs to be able to hear about it.
Re:The message... (Score:4, Interesting)
Re: (Score:3, Informative)
You're reading too much into it, and you're also being misled by the misquote in the /. title.
The article said "95% of user-generated posts on Web sites are spam or malicious",
probably meaning posti
can be adequately explained by stupidity (Score:2, Insightful)
"Never attribute to malice that which can be adequately explained by stupidity"
So I read it as "95% of User Generated Content is stupid." I agree; count me in.
So Sturgeon was right (Score:5, Interesting)
"Ninety percent of everything is crud."
http://en.wikipedia.org/wiki/Sturgeon's_Law [wikipedia.org]
Re:So Sturgeon was right (Score:5, Funny)
Re: (Score:2)
And don't forget the important corollary (trivial to prove): 111.1111% of crud is everything.
So, if you spew more crud than your share, you'll get everything you want! At least, it seems to work that way for a lot of political figures.
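The "corollary" is just Sturgeon's 90% figure turned around. If crud is 90% of everything, then:

```latex
\text{crud} = 0.9 \cdot \text{everything}
\quad\Longrightarrow\quad
\text{everything} = \frac{\text{crud}}{0.9} = \tfrac{10}{9}\,\text{crud} \approx 1.111111 \cdot \text{crud}
```

so everything is about 111.1111% of crud.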
Calling spam email UGC is... disingenuous. (Score:5, Insightful)
I would say that 95% of email is commercial in nature, and not "user generated content". To me "UGC" is something that people who are actually active users (consumers as well as creators) of a service generate... not something injected into the service from outside by predators.
And of the rest... (Score:3, Funny)
Out of the 5% that are not generated by spambots, 99% is still generated by idiots.
Not so staggering... (Score:3, Insightful)
... a staggering 95% of User Generated Content is either malicious in nature or spam.
Considering 95% of internet users are malicious (see GIFT [penny-arcade.com]), it's hardly staggering that 95% of user generated content is malicious too. :p
Having read some blogs... (Score:2)
Domain hijacking (Score:2)
If you use an ISP that hijacks unregistered domains, such as Virgin, to land you on their search page then that statistic goes up to 99.99%
Phillip.
Well there are a lot of sites within sites (Score:2)
As I discovered with one of my sites a few years ago, someone had installed a site within mine, and while investigating it I discovered there are plenty of other sites with the same issue, many even on SourceForge.
My advice is to do an inventory of the files on your site, to see if you too have such a problem.
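One way to take that inventory, sketched in Python under some assumptions: record a baseline of every file under your site's document root, with its SHA-256 hash, while the site is known to be clean, then diff later inventories against it. The script name `site_inventory.py`, its command-line arguments, and the baseline JSON file are hypothetical, and the sketch reads each file fully into memory, which is fine for a typical small site.

```python
import hashlib
import json
import sys
from pathlib import Path


def inventory(root: Path) -> dict[str, str]:
    """Map every file under root (as a path relative to root) to its SHA-256 hex digest."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[str(path.relative_to(root))] = digest
    return result


def diff(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """List files added, removed, or modified since the baseline inventory was taken."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "modified": sorted(
            name for name in current.keys() & baseline.keys()
            if current[name] != baseline[name]
        ),
    }


def main(web_root: Path, baseline_file: Path) -> None:
    current = inventory(web_root)
    if baseline_file.exists():
        # Compare against the known-good snapshot; anything "added" deserves a look.
        baseline = json.loads(baseline_file.read_text())
        print(json.dumps(diff(baseline, current), indent=2))
    else:
        # First run: store the snapshot that future runs are compared against.
        baseline_file.write_text(json.dumps(current, indent=2))
        print(f"Baseline of {len(current)} files written to {baseline_file}")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: site_inventory.py WEB_ROOT BASELINE_JSON")
    else:
        main(Path(sys.argv[1]), Path(sys.argv[2]))
```

Keep the baseline file outside the web root; otherwise it shows up in its own diff, and anyone who can write to the site could also edit it.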
Replace "UGC" with "Usenet" (Score:5, Insightful)
We've seen this before, with Usenet, BBSes, MUDs, and email. The advertisers, and the trolls, find it easy to spew their material across many thousands of targets, and get enough money or gratification from doing so that it funds their efforts. It doesn't even have to make money: they just have to believe that it _can_ make money, and the professionals will simply continue.
Whatever would make anyone think that "User Generated Content" forums would be any different?
Re: (Score:2, Informative)
BBSes? Really? I don't remember a single instance of "spam" on any BBS during the golden years. Perhaps that's because individual systems were far easier to control and moderate.
USENET fell because it was never designed with any real moderation or control in mind. That was great as long as the users played nicely together, but after the Eternal September and the coming of gold diggers like Canter & Siegel, the whole system fell apart.
If you want the flood of garbage to stop, you need someone standi
95%? (Score:2)
More like 99% if you include the non malicious stupidity into the mix.
95% of the story is bogus (Score:5, Informative)
It matters a lot how they get their "sample": honeypots, honeyclients, reputation systems and "advanced grid computing systems" (whatever those are). What is feeding information into that sample? Not old sites with legitimate content that have been sitting around for years, but in good part spammers, botnets, and people who want your PC to become part of one. And it is already known that mail is 95% spam. The sample is just too rigged to say anything about what is really on the Internet, or what you have any chance of seeing.
Google's fault for their dependence on linking (Score:4, Insightful)
Email spam aside, I would say that most of that is Google's fault. The other 95% of content created on the internet is an attempt to SEO web sites in the 5% of the internet that people actually read or visit. Google encourages webmasters to get inbound links, hence the whole industry of spamming sites, directories, blog feed sites, and so on that have one purpose and one purpose only: getting as many anchor-text links pointed at sites as possible so they will rank higher in Google for key terms.
Re: (Score:2)
Re: (Score:2)
I totally agree. My point was more about how Google is encouraging the creation of a mess of content designed only for the consumption of Google Bots, and in fact most are never visited or seen by humans.
Yes, links are the fundamental core of how the internet works. The problem is rewarding sites for producing as many links as possible. If Google, say, decided to use a different method for valuing a site's outbound links, such as how often the site was visited, in fairly short order millions of link farming sites wo
95% chance (Score:4, Funny)
I take it that means there is a 95% chance that this report is bogus, or malicious?
Re: (Score:2)
Sturgeon's Law (Score:2)
Once again shown to be overly optimistic.
http://en.wikipedia.org/wiki/Sturgeon's_Law#.E2.80.9CNinety_percent_of_everything_is_crud.E2.80.9D [wikipedia.org]
Looks like I'll have to change my sig (Score:3, Funny)
Amazingly enough.... (Score:2)
it turns out that 95% of the Slashdot users think the report was about all internet content instead of just user generated content and they responded to that instead.
No big surprise there, huh?
95% of statistics are bogus... (Score:2)
95% is SPAM and ... (Score:2)
The actual new vulnerabilities (Score:3, Informative)
First, here's the actual report [websense.com], without any form to fill out. (Backup copy at WebCitation. [webcitation.org]) Amusingly, the report is clearly written for a target audience who prints out PDF files on paper. It contains charts in tiny type.
The report covers the usual email issues, which will be familiar to Slashdot readers. New issues for 2009 are the following:
The report identifies Google's weak security in its search engine as a problem. Microsoft's Internet Explorer remains a problem, of course, but Google is now the attack target of choice for driving traffic to a site that can attack the browser. Google still, apparently, hasn't figured out a good way to prevent link farms from driving up search position.
They include Spam? (Score:2)
Scribd (Score:2)
I'm looking at you, Scribd. Why Google can't figure out how to push your spam results off the front results page puzzles me, since they have a method to keep the Wikipedia clones off the front page. I can't wait for you to experience the same fate.
Therefore... (Score:2)
This article has a 95% chance of being bogus.
Non sequitur (Score:2)
Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.
This being Slashdot - how was that sentence even relevant?
Spammers create their own forums (Score:2)
I've seen cases where spammers, unable to reliably defeat the administrators of a popular forum, will simply copy the information on that forum onto another forum and then spam the hell out of it. Forums on the use of Microsoft tools seem to be particularly popular targets.
Re:Nothing to see here. Move along. (Bad summary) (Score:5, Insightful)
And in addition, the report itself doesn't even explain the result. It's a bullet point at the beginning of the report, but there's no explanation or analysis.
Obligatory (Score:2, Funny)
In human terms, the majority of computers have AIDS. And we all know where they caught it.
Your mom?
Re: (Score:2)
"Oh, people can come up with statistics to prove anything. 14% of people know that." -Homer