Security IT

95% of User-Generated Content Is Bogus

Posted by kdawson
from the sturgeon's-law-applies dept.
coomaria writes "The HoneyGrid scans 40 million Web sites and 10 million emails, so it was bound to find something interesting. Among the things it found was that a staggering 95% of User Generated Content is either malicious in nature or spam." Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.

  • by Shadow of Eternity (795165) on Sunday February 07, 2010 @06:26AM (#31051390)

    Animals shit in ~95% of their habitat...

    • by Smegly (1607157) on Sunday February 07, 2010 @07:40AM (#31051640)

      a staggering 95% of User Generated Content is... ...spam. Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.

      Give up your Name, rank, email... so we can enlighten you with valuable information from our partners.

    • Re: (Score:3, Interesting)

      by KGIII (973947) *

      Anonymity comes into play I suspect. I'm not a psychologist though. It makes me wonder if there will be any attempt (or anyone with the compute power and gumption is more accurate I suppose) to fact check Wikipedia. I'm rather curious as to how that will turn out if it is done in a non-biased and total in situ way. I imagine it would take a great deal of work and then there are people who will lay claim as to it being constantly changed but the point that I'm considering is what is the accuracy level at a p

      • I agree. But imagine what a difficult task that would be. According to Wiki itself, it contains 14 million articles. You would have to find experts in each of the fields to check each article, which are supposedly the people who wrote them in the first place. Hopefully, anyway.

        • Re:This just in (Score:5, Informative)

          by timeOday (582209) on Sunday February 07, 2010 @11:25AM (#31052540)
          This has almost nothing to do with websites like Wikipedia, which people actually look at. Spammers create huge sets of keyword-laden wikis and other web pages, which all link to each other, for the purpose of fooling search engines that use PageRank and similar algorithms. To search engines, it's hard to differentiate this from a popular site with lots of users. But when you see these pages you know it immediately, like spam in your inbox.

          It is no different than domain names. Type a random sequence of 4 characters .com, and the vast majority of times you will get some fairly innocuous spam site, e.g. dneo.com [dneo.com] (picked at random), with no real content.

          But it doesn't interfere much with most people's use of the web.
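The link-farm effect described above can be sketched with a toy PageRank. This is a minimal illustration, not any search engine's actual ranking code: the `pagerank` function, the damping factor of 0.85, and the page names (`honest`, `spam`, `farm0` and so on) are all assumptions made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outbound links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: one honest site with two real inbound links.
honest = {"a": ["honest"], "b": ["honest"], "honest": ["a"]}

# Same web plus a link farm: 20 generated pages that all point at "spam".
farm = dict(honest)
farm["spam"] = ["farm0"]
for i in range(20):
    farm[f"farm{i}"] = ["spam"]

ranks = pagerank(farm)
print(f"honest: {ranks['honest']:.3f}  spam: {ranks['spam']:.3f}")
```

In this toy graph the farmed page ends up ranked well above the honest one, even though no real user ever linked to it. A ranking based purely on link structure cannot tell the two apart.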

      • Re: (Score:3, Informative)

        by gumbi west (610122)
        Nature did a study [cnet.com] and found Wikipedia was slightly less reliable than Britannica. The editors of Britannica objected to the methods, and I'm not sure I like them either, but I think it was an honest attempt. I think all of the articles were science articles and this is from 2005, so it is not exactly what you were asking for (it's not 2010).
        • Re: (Score:3, Informative)

          by ChipMonk (711367)
          Randy Pausch, after writing for the World Book Encyclopedia, declared that he had no problem with Wikipedia's quality controls.

          But don't watch his Last Lecture for just that...
        • Re: (Score:3, Interesting)

          by cgenman (325138)

          I'm not surprised. Wikipedia is great for niche articles like finding out what happened to Star Trek, The Experience [wikipedia.org]. Such niche information wouldn't be viable for Britannica to cover, but anyone with an interest can put up an article about it. If you want real articles on things like science, DON'T GO TO AN ENCYCLOPEDIA. They're about as good at teaching you usable science as they are teaching you how to play the flute.

      • Re: (Score:3, Informative)

        by justin12345 (846440)
        I seem to remember that a while back someone (as they say on Fark.com, I'm too drunk to look it up) did a comparison of Encyclopedia Britannica to Wikipedia. Their conclusions were based on a random sampling of 500 topics, with the wiki compared to the Brit article of the same subject. The conclusion was that Britannica contained slightly fewer errors per entry, but significantly less data per entry as well. The study didn't address the issue of Wikipedia's comparatively massive number of entries, and it did
      • Re: (Score:3, Informative)

        by PaganRitual (551879)

        I think you've slightly missed the point. When they say bogus they don't mean the content on a site like Wikipedia, although that site provides a useful example to explain my point. Try to go to Wikipedia, except do a typo.

        http://www.wikapedia.org/ [wikapedia.org]
        http://www.wikipeedia.org/ [wikipeedia.org]
        http://www.wickipedia.org/ [wickipedia.org]
        http://www.wikepedia.org/ [wikepedia.org]

        I imagine this is likely to be what they're talking about when they say bogus or a scam. Take any of your favourite websites and slightly misspell the URL. Then extrapolate out over every
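To get a feel for how many near-miss domains a single name has, here is a small sketch that enumerates the single-typo variants of a label. The `typo_variants` helper is hypothetical and written only for illustration. It does no DNS lookups, so it says nothing about which variants are actually registered.

```python
def typo_variants(name):
    """Return single-edit misspellings of a domain label (no TLD)."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    variants = set()
    for i in range(len(name) + 1):
        # Insert one extra letter at position i.
        for c in letters:
            variants.add(name[:i] + c + name[i:])
    for i in range(len(name)):
        # Drop the letter at position i.
        variants.add(name[:i] + name[i + 1:])
        # Replace the letter at position i.
        for c in letters:
            variants.add(name[:i] + c + name[i + 1:])
    for i in range(len(name) - 1):
        # Swap two neighbouring letters.
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # The original spelling is not a typo of itself.
    variants.discard(name)
    return variants


candidates = typo_variants("wikipedia")
print(len(candidates), "single-typo variants of wikipedia")
for example in ("wikapedia", "wikipeedia", "wickipedia", "wikepedia"):
    print(example, example in candidates)
```

All four misspellings linked above are one edit away from the real name, so they fall inside this candidate set.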

    • This comment is bogus.
    • Re:This just in (Score:5, Informative)

      by VoltageX (845249) on Sunday February 07, 2010 @07:06PM (#31055858)
      Sorry to hijack this, but http://securitylabs.websense.com/content/Assets/WSL_ReportQ3Q4FNL.PDF [websense.com] seems to be the direct link to the paper.
  • by Anonymous Coward on Sunday February 07, 2010 @06:26AM (#31051392)

    I got ripped in 2 weeks. learn how with secret juice formula.

  • by nicknamenotavailable (1730990) on Sunday February 07, 2010 @06:26AM (#31051394)
    That is so untrue. There is value in what I write.
  • by Junior J. Junior III (192702) on Sunday February 07, 2010 @06:31AM (#31051412) Homepage

    We know.

    • Re: (Score:3, Insightful)

      by Dilligent (1616247)
      +5 Insightful, not Funny, nope.. insightful, only on slashdot could such a thing happen. Part of the reason i love it as much as i do, oh and while you're here: I'm a prince from the far lands of absurdistan and would like to ask if you would like to [insert random passage of text here]
      • by Culture20 (968837)

        I'm a prince from the far lands of absurdistan and would like to ask if you would like to [insert random passage of text here]

        You'll have a better chance of getting me to insert something if you said you were a princess.

        I'm sorry, that's Valentine's day anticipation talking.

      • The reason people mod funny topics Insightful or Informative is that those are both karmically positive mods. Funny is neutral. Off topic, Troll, and Flamebait are karmically negative. I'm not sure about Overrated and Underrated, they might be karmically neutral too. So when some mod wants to give someone good karma for telling a funny joke, or if modding a joke "Insightful" increases the funniness of the joke, they tend to do so.
    • Re: (Score:2, Funny)

      by Anonymous Coward

      No..... this is SPARTA!!!!

  • by onion2k (203094) on Sunday February 07, 2010 @06:32AM (#31051416) Homepage

    95% of user-generated posts on Web sites are spam or malicious.

    The fact is that there are millions of old blogs, unused forums, ancient guestbooks, etc that are easy to spam automatically. While it might very well be true that 95% of comments on the internet are spam of some sort, they're probably read by a tiny fraction of internet users. People tend to stick to about a dozen big sites that get very little rubbish posted on them at all.

    Car analogy: 95% of cars are rusty old heaps of crap that can't move. Thankfully they're in scrapyards and not on the roads.

    • by mwvdlee (775178) on Sunday February 07, 2010 @06:43AM (#31051454) Homepage

      95% of humans are over 100 years old. Most of them are dead.

    • Re: (Score:2, Interesting)

      I don't assume they included Wikipedia in the "user generated" category, otherwise that much non-bogus content would have definitely tipped the scale a bit.

      In my personal experience however, even without wikipedia, I have not come across that much bogus stuff on forums and random comments.

      • Re: (Score:3, Funny)

        by Anonymous Coward

        Are you implying that Wikipedia is not bogus content?

    • Re: (Score:3, Interesting)

      by Yaur (1069446)
      More likely they are generalizing the activity they are seeing on their fake/honey pot sites on the internet as a whole.
    • by CAIMLAS (41445) on Sunday February 07, 2010 @07:11AM (#31051542) Homepage

      A lot of forum software works well, until it gets "behind the curve", and then the site maintainer pulls the site*.

      By "behind the curve" I mean any of the following can/does happen:
      1) Forum software gets out of date and user fails to upgrade due to modifications or similar, resulting in spam.
      2) Forum software gets popular without having a good security model and/or update cycle, resulting in exploits.
      3) Gets inundated with comment approvals and the forum (or blog) gets ignored or set to auto-allow out of frustration.

      * By "pulls the site" I mean "abandons it but doesn't take it down". That's typically the end result.

      It's a lot of work to maintain your own forum and/or blog: managing spam can and will take hours+ from your day if you've not got a good automated and/or textual way to deal with it: web interfaces are clumsy.

      Car analogy: 95% of cars are rusty old heaps of crap that can't move. Thankfully they're in scrapyards and not on the roads.

      Yet, unlike most of those cars, the actual blog content is not necessarily useless. I have seen quite a few abandoned blogs and/or forums which have 3-10 year old information on them which is by no means useless; it's just getting buried.

      Digital archeologists of the future will probably have to figure out an automated way to prune back the spam to find the actual Internet, the way things are going.

      Consider: if spam accounts for 95% of all user-generated content, and said user-generated content is actually a non-trivial percentage of all actual content online (believable), consider how much bandwidth gets wasted by these spammers. (Thankfully, I suspect most of the 'user generated content spam' doesn't show up on the first couple search page results so it's not going to likely be perused with regularity - unless it's more heavily seeded on topics common folks search.)

      • Re: (Score:3, Informative)

        Thankfully, I suspect most of the 'user generated content spam' doesn't show up on the first couple search page results

        That's what I was going to say. Unless people are searching for cialis or real replica watches or VIaGrA, they shouldn't see the spam itself. I spend a lot of time browsing all sorts of different sites and it's very rare for me to ever see spam*. How I've avoided the 95% of the web that is spam? I must have some hidden talent, who knows.

        *The exception being the occasional google search where instead of information about a thing, I get three pages of people trying to sell the thing (try "lp gas generator" )

        -

    • by Kugrian (886993) on Sunday February 07, 2010 @07:18AM (#31051568) Homepage

      How much of it is user generated content that's copied from one site onto a zillion others?

      • How much of it is user generated content that's copied from one site onto a zillion others?

        Or onto the same site. It amazes me at the number of YouTube videos which people rip then upload back to YouTube as their own. I like to think of this need to be the person who provides the video as "Insufficient Attention Disorder".

    • Re: (Score:2, Informative)

      by dosius (230542)

      Sturgeon's Law comes into play, as always. 90% of everything is crud

      -uso.

    • by Hognoxious (631665) on Sunday February 07, 2010 @08:47AM (#31051858) Homepage Journal

      People tend to stick to about a dozen big sites that get very little rubbish posted on them at all.

      And when they want a change from that, they come here.

    • by dzfoo (772245)

      Irrelevent [ir'-rel-e-vent] - Adjective:
              The wasteful use or application of a cooling device when not strictly necessary.

              USAGE: "Larry left the air conditioning unit on all throughout winter; its power consumption was irrelevent."
              ORIGIN: Teh Intarwebz.

    • That makes it sound a little too innocuous for my tastes. It's not like 95% of emails are spam, but they're all sitting on a server somewhere and no one has to deal with them, so it's fine. For your car analogy to work for me, it would have to be more like "95% of cars are rusty old heaps of crap that can't move. They're littering the highways, but we can steer around them."

      My mail server is seeing a little less than this-- only 85% of incoming email is spam. Still, that means that I have to filter all

    • By the way, what about "numbers posts"? There are cases of spam posts being made that are very similar in style to the transmissions of numbers stations [wikipedia.org] - just strings of short blocks of numbers. Has anyone ever found out what those are about? My guess is that it's some botnet's C&C channel but that's just a guess.
  • I don't think I've seen so many floating ads in a theoretically-legitimate site before. When I opened it, it grayed out the window and popped up trying to get me to fill out something...scrolling around, the mouse runs into these little green underlined words that pops up an ad thing you have to click to close...and after about twenty seconds, another floating window scrolled down the screen and parked in the middle.

    That's a little too much cruft for me. They can keep their content, I don't want it.
    • by kvezach (1199717)
      It's just proving its own point.
    • by vtcodger (957785)

      For me in konqueror, the site rendered in text that was overwritten in a few seconds by a pure black page with a couple of itsy white boxes with green text which then morphed into a pure featureless white page with no scrollbars. Does that count as "bogus and/or spam?"

  • by syousef (465911) on Sunday February 07, 2010 @06:49AM (#31051472) Journal

    ...95% probability actually. So I didn't bother.

  • by Nyder (754090) on Sunday February 07, 2010 @06:55AM (#31051496) Journal

    I guess that goes in hand with 95% of kdawson's submissions being crap and not worth the time.

  • Every single hour the Internet HoneyGrid scans some 40 million websites for malicious code as well as 10 million emails for unwanted content and malicious code.

    So 40 million sites per hour is 960 million sites per day. Wikipedia says that there are over 25 billion pages [wikipedia.org], but can that number be accurate?
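A quick back-of-the-envelope check of those numbers, as a sketch. It assumes one "scan" covers one page, which the report does not actually say (it speaks of sites, not pages), so treat the result as a rough order of magnitude only.

```python
SITES_PER_HOUR = 40_000_000       # HoneyGrid figure quoted above
PAGES_ON_WEB = 25_000_000_000     # the Wikipedia estimate linked above

# Assumption: one scan equals one page visited.
scans_per_day = SITES_PER_HOUR * 24
days_for_one_pass = PAGES_ON_WEB / scans_per_day

print(f"{scans_per_day:,} scans per day")
print(f"{days_for_one_pass:.1f} days to touch every page once")
```

Under that assumption, one full pass over 25 billion pages would take a bit under a month, so the two figures are at least not wildly inconsistent with each other.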

  • The message... (Score:4, Insightful)

    by Anonymous Coward on Sunday February 07, 2010 @07:01AM (#31051516)

    The subtext of this article is that you should forget about letting users create content on the Internet, because all they do is create junk and try to scam good honest people. Just leave the content creation to the institutions, and media conglomerates who know how to do it. It's safer that way, and you'll like it.

    Well, I don't care if 99% of user-generated content is crap; people need to be free to create it, because some individual in the other 1% may just come up with the cure for cancer, and despite whatever it does to Big Pharma's profits, everyone needs to be able to hear about it.

    • Re:The message... (Score:4, Interesting)

      by Yaur (1069446) on Sunday February 07, 2010 @07:30AM (#31051612)
      the subtext is, the internet is dangerous so you need to buy their product.
    • Re: (Score:3, Informative)

      by jgrahn (181062)

      The subtext of this article is that you should forget about letting users create content on the Internet, because all they do is create junk and try to scam good honest people. Just leave the content creation to the institutions, and media conglomerates who know how to do it. It's safer that way, and you'll like it.

      You're reading too much into it, and you are also misled by the misquote in the /. title. The article said "95% of user-generated posts on Web sites are spam or malicious", probably meaning posti

  • "95% of User Generated Content is either malicious in nature or spam"

    "Never attribute to malice that which can be adequately explained by stupidity"

    So I read "95% of User Generated Content is stupid". I agree, count me in.
  • by Aussie (10167) on Sunday February 07, 2010 @07:36AM (#31051624) Journal

    "Ninety percent of everything is crud."

    http://en.wikipedia.org/wiki/Sturgeon's_Law [wikipedia.org]

  • I would say that 95% of email is commercial in nature, and not "user generated content". To me "UGC" is something that people who are actually active users (consumers as well as creators) of a service generate... not something injected into the service from outside by predators.

  • by Arancaytar (966377) <arancaytar.ilyaran@gmail.com> on Sunday February 07, 2010 @07:38AM (#31051634) Homepage

    Out of the 5% that are not generated by spambots, 99% is still generated by idiots.

  • by osu-neko (2604) on Sunday February 07, 2010 @07:44AM (#31051652)

    ... a staggering 95% of User Generated Content is either malicious in nature or spam.

    Considering 95% of internet users are malicious (see GIFT [penny-arcade.com]), it's hardly staggering that 95% of user generated content is malicious too. :p

  • 95% is intentionally bad, the other 5% is just shit
  • If you use an ISP that hijacks unregistered domains, such as Virgin, to land you on their search page then that statistic goes up to 99.99%

    Phillip.

  • As I discovered with one of my sites a few years ago, someone had installed a site within mine, and in investigating it I discovered there are plenty of other sites with the same issue, many even on SourceForge.

    My advice is to do an inventory of the files on your site, to see if you too have such a problem.
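One way to do such an inventory, as a minimal sketch: hash every file under the web root and compare the result with a snapshot taken when the site was known to be clean. The function names `inventory` and `compare` are made up for this example, and the web root path in the usage note is hypothetical.

```python
import hashlib
import os


def inventory(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            with open(full_path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            result[os.path.relpath(full_path, root)] = digest
    return result


def compare(known_good, current):
    """Return (added, removed, changed) paths between two inventories."""
    added = sorted(set(current) - set(known_good))
    removed = sorted(set(known_good) - set(current))
    changed = sorted(
        path for path in set(known_good) & set(current)
        if known_good[path] != current[path]
    )
    return added, removed, changed
```

Usage would look something like `baseline = inventory("/var/www/mysite")` on a clean copy, then `compare(baseline, inventory("/var/www/mysite"))` later on. Anything in the "added" list that you did not upload yourself deserves a closer look. This reads whole files into memory, which is fine for a typical small site but not for very large files.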

  • by Antique Geekmeister (740220) on Sunday February 07, 2010 @09:25AM (#31051970)

    We've seen this before, with Usenet, BBS's, MUD's, and Email. The advertisers, and the trolls, find it easy to spew their material across many thousands of targets, and get enough money or gratification from doing so that it funds their efforts. It doesn't even have to make money: they just have to believe that it _can_ make money, and the professionals will simply continue.

    Whatever would make anyone think that "User Generated Content" forums would be any different?

    • Re: (Score:2, Informative)

      by Anonymous Coward

      BBS's? Really? I don't remember a single instance of "spam" on any BBS during the golden years. Perhaps that's because individual systems were far easier to control and moderate.

      USENET fell because it was never designed with any real moderation or control in mind. Which was great as long as the users played nicely together. But after the Eternal September and the coming of gold diggers like Cantor & Siegel, the whole system fell apart.

      If you want the flood of garbage to stop, you need someone standi

  • More like 99% if you include the non malicious stupidity into the mix.

  • by gmuslera (3436) on Sunday February 07, 2010 @11:26AM (#31052552) Homepage Journal
    The original article [daniweb.com] says that they scan 40 million sites and 10 million emails each hour, and they are referring to this report [slashdot.org] (which also links to the full info and a video of the presentation of that info).

    How they get their "sample" matters a lot: honeypots, honeyclients, reputation systems and "advanced grid computing systems" (whatever that is). What is feeding information to that sample? Not old sites with legitimate content that have been sitting around for years, but in good part spammers, botnets, and people who want your PC to become part of one. And mail is already known to be 95% spam. The sample is just too rigged to be at all related to what is really on the internet or what you have some chance of seeing.

  • by cenc (1310167) on Sunday February 07, 2010 @11:38AM (#31052604) Homepage

    Email spam aside, I would say that most of that is Google's fault. The other 95% of content created on the internet is an attempt to SEO web sites in the other 5% of the internet that people do potentially read or visit. Google encourages webmasters to get inbound links, thus the whole industry of spamming sites, directories, blog feed sites, and so on that have one purpose and one purpose only: getting as many anchor text links pointed to sites as possible so they will rank higher in Google for key terms.

    • by pclminion (145572)
      IMHO, people who run into link farms are searching for really spammy shit in the first place. The idea of basing page ranking on the link structure of the web is so fundamentally correct that there's no real alternative. If your results are useless because they're filled with spam, then you are searching for some really stupid shit.
      • by cenc (1310167)

        I totally agree. My point was more about how Google is encouraging the creation of a mess of content designed only for the consumption of Google Bots, and in fact most are never visited or seen by humans.

        Yes, links are the fundamental core of how the internet works. It is just the rewarding of sites for producing the most links possible. If somehow Google say decided to use a different method, say how often a site was visited for valuing the links out, in fairly short order millions of link farming sites wo

  • 95% chance (Score:4, Funny)

    by kylben (1008989) on Sunday February 07, 2010 @12:04PM (#31052732) Homepage

    I take it that means there is a 95% chance that this report is bogus, or malicious?

  • by RudeIota (1131331) on Sunday February 07, 2010 @12:36PM (#31052874) Homepage
    I'll have to change it from "Everything" to "95% of everything". :-(
  • it turns out that 95% of the Slashdot users think the report was about all internet content instead of just user generated content and they responded to that instead.

    No big surprise there, huh?

  • ....malicious and as useless as spam.
  • of the remaining 5%, 95% of that is also SPAM, or malicious or something? We already know about SPAM percentages, so I assume this is measuring something new, like non-automated emails contain huge amounts of things that people consider SPAM.
  • by Animats (122034) on Sunday February 07, 2010 @01:59PM (#31053470) Homepage

    First, here's the actual report [websense.com], without any form to fill out. (Backup copy at WebCitation. [webcitation.org]) Amusingly, the report is clearly written for a target audience who prints out PDF files on paper. It contains charts in tiny type.

    The report covers the usual email issues, which will be familiar to Slashdot readers. New issues for 2009 are the following:

    • Anti-virus companies are slowing down. Average time to "patch" (really, release a new identifying signature) has increased from 22 hours to 46 hours. By the time the anti-virus companies catch up, the attack has changed. This indicates the uselessness of signature-based attack detection.
    • More attacks are successfully targeting search engines. Google is more vulnerable to hacked SEO than previously thought. Google Trends, which drives Google Suggest (the command completion in Google search boxes) is extremely vulnerable. (I've commented on that before.) "The average number of malicious sites in any Google search using hot/trending topics (as ranked by Google) by the end of the year stood at 13.7% for the top 100 results."
    • The "long tail" of the Web is becoming less important as more user generated content moves to the top 100 sites. More attacks now involve injection of hostile code into user generated content on major sites.

    The report identifies Google's weak security in their search engine as a problem. Microsoft's Internet Explorer remains a problem, of course, but Google is now the attack target of choice to drive traffic to a site that can attack the browser. Google still, apparently, hasn't figured out a good way to prevent link farms from driving up search position.

  • I think that figure is way too low if they include spam in the equation. I don't think that Spam is 'user generated content' - it is more likely 'user targeted content'. Maybe I need to frag M$ into this as an example: 'Microsoft dominates 100% of the Windows Desktop Market'...
  • I'm looking at you Scribd. Why Google can't figure out how to push your spam results off the front result page puzzles me since they have a method to keep the Wikipedia clones off the front page. I can't wait for you to experience the same fate.

  • This article has a 95% chance of being bogus.

  • Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.

    This being Slashdot - how was that sentence even relevant?

  • I've seen cases where spammers, unable to reliably defeat the administrators of a popular forum, will simply copy the information on that forum onto another forum and then spam the hell out of it. Forums on the use of Microsoft tools seem to be particularly popular targets.
