Security IT

95% of User-Generated Content Is Bogus

coomaria writes "The HoneyGrid scans 40 million Web sites and 10 million emails, so it was bound to find something interesting. Among the things it found was that a staggering 95% of User Generated Content is either malicious in nature or spam." Here is the report's front door; to read the actual report you'll have to give up name, rank, and serial number.
  • by Anonymous Coward on Sunday February 07, 2010 @06:32AM (#31051420)

    BS in the summary. TFA says:

    "95% of user-generated posts on Web sites are spam or malicious."

    The user-generated content itself is valid; it's just the comments sections that are getting hit by spambots. If this is front-page news, then the fact that 95% of email is spam is news as well. Nothing to see here. Move along.

  • by Anonymous Coward on Sunday February 07, 2010 @06:48AM (#31051468)

    71% of statistics are useless ...

  • by Anonymous Coward on Sunday February 07, 2010 @07:08AM (#31051536)

    "Print version" (less cruft): http://www.daniweb.com/forums/printthread258407.html

    Adblock Plus seems to get rid of most of the cruft on the initial page, by the way.

  • It seems that at least as well as anyone can estimate, the current population really is [wikipedia.org] about 5% of the total humans who've ever lived.

  • by dosius ( 230542 ) <bridget@buric.co> on Sunday February 07, 2010 @08:30AM (#31051822) Journal

    Sturgeon's Law comes into play, as always: 90% of everything is crud.

    -uso.

  • by mrsquid0 ( 1335303 ) on Sunday February 07, 2010 @09:37AM (#31052018) Homepage

    Depending on what you assume about paleolithic populations, about 15%-25% of all the humans who ever lived are alive today. That means that roughly one out of every five people who ever walked the Earth has the potential to post to Slashdot.

  • Re:The message... (Score:3, Informative)

    by jgrahn ( 181062 ) on Sunday February 07, 2010 @10:09AM (#31052142)

    The subtext of this article is that you should forget about letting users create content on the Internet, because all they do is create junk and try to scam good, honest people. Just leave content creation to the institutions and media conglomerates who know how to do it. It's safer that way, and you'll like it.

    You're reading too much into it, and you are also misled by the misquote in the /. title. The article said "95% of user-generated posts on Web sites are spam or malicious", probably meaning postings in forums, "comments" sections, and the like. They're not saying plain web pages by *authors* who aren't faceless corporation drones are crap.

  • by Anonymous Coward on Sunday February 07, 2010 @10:12AM (#31052160)

    BBSes? Really? I don't remember a single instance of "spam" on any BBS during the golden years. Perhaps that's because individual systems were far easier to control and moderate.

    USENET fell because it was never designed with any real moderation or control in mind, which was great as long as the users played nicely together. But after the Eternal September and the coming of gold diggers like Canter & Siegel, the whole system fell apart.

    If you want the flood of garbage to stop, you need someone standing at the door with a baseball bat. The days of the internet "playing nicely together" ended back in 1995.

  • Re:This just in (Score:3, Informative)

    by gumbi west ( 610122 ) on Sunday February 07, 2010 @10:44AM (#31052324) Journal
    Nature did a study [cnet.com] and found Wikipedia was slightly less reliable than Britannica. The editors of Britannica objected to the methods, and I'm not sure I like them either, but I think it was an honest attempt. I think all of the articles were science articles, and this is from 2005, so it is not exactly what you were asking for (it's not 2010).
  • by steelfood ( 895457 ) on Sunday February 07, 2010 @10:52AM (#31052364)

    It's plausible that the past 100 years has had more people alive than all of human history

    And that would still make the current population only a little more than 50% of all the people who have ever been alive.

    Except that, considering Homo sapiens has been around for several hundred thousand years, I think your estimates for the number of humans who have ever walked the planet may be a bit low.

  • Re:This just in (Score:5, Informative)

    by timeOday ( 582209 ) on Sunday February 07, 2010 @11:25AM (#31052540)
    This has almost nothing to do with websites like Wikipedia, which people actually look at. Spammers create huge sets of keyword-laden wikis and other web pages, which all link to each other, for the purpose of fooling search engines that use PageRank and similar algorithms. To search engines, it's hard to differentiate this from a popular site with lots of users. But when you see these pages you know it immediately, like spam in your inbox.

    It is no different from domain names. Type a random sequence of four characters followed by .com, and the vast majority of the time you will get some fairly innocuous spam site, e.g. dneo.com [dneo.com] (picked at random), with no real content.

    But it doesn't interfere much with most people's use of the web.
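
To make the link-farm point concrete, here is a minimal sketch, using an invented toy graph and the conventional 0.85 damping factor, of how a handful of pages that all link to one another and to a promoted page can inflate that page's PageRank-style score relative to an ordinary page:

    # Minimal sketch: a toy link graph and plain power-iteration PageRank
    # (damping factor 0.85). The graph is invented for the example.

    def pagerank(links, damping=0.85, iterations=50):
        """PageRank over a dict mapping page -> list of outbound links."""
        pages = list(links)
        rank = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iterations):
            new = {p: (1.0 - damping) / len(pages) for p in pages}
            for page, outs in links.items():
                if not outs:                      # dangling page: spread its rank evenly
                    for p in pages:
                        new[p] += damping * rank[page] / len(pages)
                else:
                    for out in outs:
                        new[out] += damping * rank[page] / len(outs)
            rank = new
        return rank

    # One honest page, one promoted page, and a five-page farm whose members
    # all link to each other and to the promoted page.
    farm = [f"farm{i}" for i in range(5)]
    links = {"honest": [], "promoted": list(farm)}
    for f in farm:
        links[f] = [p for p in farm if p != f] + ["promoted"]

    ranks = pagerank(links)
    for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(f"{page:10s} {score:.3f}")   # the promoted page and its farm outrank "honest"

Real search engines use many more signals than raw link structure, but this shows why a dense, self-referential cluster can look "popular" to a pure link-counting algorithm.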

  • by gmuslera ( 3436 ) on Sunday February 07, 2010 @11:26AM (#31052552) Homepage Journal
    The original article [daniweb.com] says that they scan 40 million sites and 10 million emails each hour, and that they are referring to this report [slashdot.org] (which also links to the full report and a video of its presentation).

    It matters a lot how they get their "sample": honeypots, honeyclients, reputation systems, and "advanced grid computing systems" (whatever that is). What is feeding information into that sample? Not old sites with legitimate content that have been sitting around for years, but in good part spammers, botnets, and people who want your PC to become part of one. And email is already known to be 95% spam. The sample is just too rigged to bear any relation to what is really on the Internet, or to what you have any chance of actually seeing.
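
To illustrate that sampling-bias objection with invented numbers (nothing here comes from the Websense report), a quick Bayes-rule calculation shows how honeypot-heavy sensors can report a roughly 95% spam rate even if the true rate across the web is far lower:

    # Invented numbers, purely to illustrate selection bias in a honeypot-heavy sample.
    true_spam_fraction = 0.20   # assumed fraction of all posts/pages that are spam
    p_seen_if_spam = 0.90       # honeypots are built to attract attacks, so they see most spam
    p_seen_if_legit = 0.01      # they rarely encounter ordinary, legitimate content

    # Fraction of the *observed* sample that is spam (Bayes' rule):
    observed = (true_spam_fraction * p_seen_if_spam) / (
        true_spam_fraction * p_seen_if_spam
        + (1 - true_spam_fraction) * p_seen_if_legit
    )
    print(f"observed spam fraction: {observed:.0%}")  # ~96%, despite a 20% true rate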

  • by greyhueofdoubt ( 1159527 ) on Sunday February 07, 2010 @12:19PM (#31052786) Homepage Journal

    Thankfully, I suspect most of the 'user generated content spam' doesn't show up on the first couple search page results

    That's what I was going to say. Unless people are searching for cialis or real replica watches or VIaGrA, they shouldn't see the spam itself. I spend a lot of time browsing all sorts of different sites and it's very rare for me to ever see spam*. How have I avoided the 95% of the web that is spam? I must have some hidden talent; who knows.

    *The exception being the occasional Google search where, instead of information about a thing, I get three pages of people trying to sell the thing (try "lp gas generator").

    -b

  • Re:This just in (Score:3, Informative)

    by ChipMonk ( 711367 ) on Sunday February 07, 2010 @01:00PM (#31053014) Journal
    Randy Pausch, after writing for the World Book Encyclopedia, declared that he had no problem with Wikipedia's quality controls.

    But don't watch his Last Lecture for just that...
  • by Animats ( 122034 ) on Sunday February 07, 2010 @01:59PM (#31053470) Homepage

    First, here's the actual report [websense.com], without any form to fill out. (Backup copy at WebCitation. [webcitation.org]) Amusingly, the report is clearly written for a target audience that prints out PDF files on paper: it contains charts in tiny type.

    The report covers the usual email issues, which will be familiar to Slashdot readers. New issues for 2009 are the following:

    • Anti-virus companies are slowing down. The average time to "patch" (really, to release a new identifying signature) has increased from 22 hours to 46 hours. By the time the anti-virus companies catch up, the attack has changed. This indicates the uselessness of signature-based attack detection (see the sketch after this comment).
    • More attacks are successfully targeting search engines. Google is more vulnerable to hacked SEO than previously thought. Google Trends, which drives Google Suggest (the command completion in Google search boxes), is extremely vulnerable. (I've commented on that before.) "The average number of malicious sites in any Google search using hot/trending topics (as ranked by Google) by the end of the year stood at 13.7% for the top 100 results."
    • The "long tail" of the Web is becoming less important as more user-generated content moves to the top 100 sites. More attacks now involve injection of hostile code into user-generated content on major sites.

    The report identifies Google's weak search-engine security as a problem. Microsoft's Internet Explorer remains a problem, of course, but Google is now the attack target of choice for driving traffic to a site that can attack the browser. Google still, apparently, hasn't figured out a good way to prevent link farms from driving up search position.
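
The first bullet above argues that signature-based detection is losing ground. A minimal sketch of why, using hypothetical byte patterns rather than anything from a real antivirus engine: detection is essentially substring matching against published signatures, so a trivially altered variant goes undetected until a new signature ships.

    # Minimal illustration of signature-based detection. The byte patterns below
    # are hypothetical placeholders, not real malware signatures.
    SIGNATURES = {
        "marker-v1":  b"FAKE-MALWARE-MARKER-v1",
        "downloader": b"GET /payload.bin",
    }

    def scan(sample: bytes) -> list[str]:
        """Return the names of all known signatures found in the sample."""
        return [name for name, pattern in SIGNATURES.items() if pattern in sample]

    old_variant = b"...FAKE-MALWARE-MARKER-v1..."
    new_variant = b"...FAKE-MALWARE-MARKER-v2..."  # trivially repacked, no signature yet

    print(scan(old_variant))  # ['marker-v1']  -> caught
    print(scan(new_variant))  # []             -> missed until a new signature ships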

  • Re:Scribd (Score:1, Informative)

    by Anonymous Coward on Sunday February 07, 2010 @05:20PM (#31054910)

    I'm looking at you, Scribd. Why Google can't figure out how to push your spam results off the front results page puzzles me, since they have a method to keep the Wikipedia clones off the front page. I can't wait for you to experience the same fate.

    The only sense in which it's scribd's problem is that scribd's chosen to use Flash/PDF as a DRM mechanism. (So yeah, it's 100% scribd's problem, but sometimes scribd is the only place where certain content can be found.)

    All Scribd (and docstoc, for that matter) does is take someone else's PDF, wrap it in a bucket of Flash DRM shit, and publish it.

    (Proof that it's DRM? I tried to print an 80-page manual out of it -- turns out that if it takes more than 60 seconds for scribd [slashdot.org] to print it out, the scribd Flash fucklet autokills the print job, just in case you had the audacity to do something like "print to PDF". 20-25 pages off a slow PC. 60-70 pages off a fast PC. I killed half a deciduous forest before I figured out WTF was going on.

    All so that Scribd [blogspot.com] could prevent you from doing a "Save As". Scribd, and all services that merely wrap Flash around otherwise-downloadable content formats -- and I'm looking at you, YouTube -- are teh suck, even when they're the only means by which the content can be found.)

  • Re:This just in (Score:3, Informative)

    by justin12345 ( 846440 ) on Sunday February 07, 2010 @06:23PM (#31055490)
    I seem to remember that a while back someone (as they say on Fark.com, I'm too drunk to look it up) did a comparison of Encyclopedia Britannica to Wikipedia. Their conclusions were based on a random sampling of 500 topics, with each wiki article compared to the Britannica article on the same subject. The conclusion was that Britannica contained slightly fewer errors per entry, but significantly less data per entry as well. The study didn't address Wikipedia's comparatively massive number of entries, and it didn't address the fact that a large number of the wiki articles are about topics Britannica would be foolish to waste the paper to print. [wikipedia.org]
  • Re:This just in (Score:3, Informative)

    by PaganRitual ( 551879 ) <splaga@nOSpam.internode.on.net> on Sunday February 07, 2010 @06:48PM (#31055712)

    I think you've slightly missed the point. When they say bogus, they don't mean the content on a site like Wikipedia, although that site provides a useful example to explain my point. Try to go to Wikipedia, except make a typo.

    http://www.wikapedia.org/ [wikapedia.org]
    http://www.wikipeedia.org/ [wikipeedia.org]
    http://www.wickipedia.org/ [wickipedia.org]
    http://www.wikepedia.org/ [wikepedia.org]

    I imagine this is likely to be what they're talking about when they say bogus or a scam. Take any of your favourite websites and slightly misspell the URL. Then extrapolate out over everyone's favourite popular websites. Then realise that there are probably dozens of variations for each one.
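
As a rough illustration of how many typo domains a single name can spawn, here is a small sketch, my own and limited to one-character edits (deletions, transpositions, substitutions, insertions), that enumerates candidate misspellings of a label; real typosquatters also use doubled letters, keyboard-adjacency swaps, alternate TLDs, and more.

    import string

    def typo_variants(name: str) -> set[str]:
        """One-edit typo variants of a domain label: deletions, transpositions,
        substitutions, and insertions."""
        letters = string.ascii_lowercase
        variants = set()
        for i in range(len(name)):
            variants.add(name[:i] + name[i + 1:])                              # deletion
            if i + 1 < len(name):
                variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])  # transposition
            for c in letters:
                variants.add(name[:i] + c + name[i + 1:])                      # substitution
                variants.add(name[:i] + c + name[i:])                          # insertion
        for c in letters:
            variants.add(name + c)                                             # trailing insertion
        variants.discard(name)
        return variants

    candidates = typo_variants("wikipedia")
    print(len(candidates), "candidate labels, e.g.", sorted(candidates)[:5])
    # Each candidate, paired with .org or .com, is a potential typosquat like the examples above.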

  • Re:This just in (Score:5, Informative)

    by VoltageX ( 845249 ) on Sunday February 07, 2010 @07:06PM (#31055858)
    Sorry to hijack this, but http://securitylabs.websense.com/content/Assets/WSL_ReportQ3Q4FNL.PDF [websense.com] seems to be the direct link to the paper.
