Security

The Problem of Search Engines and "Sekrit" Data

Nos. writes: "CNet is reporting that not only Google but other search engines as well are finding passwords and credit card numbers while doing their indexing. An interesting quote from Google in the article: 'We define public as anything placed on the public Internet and not blocked to search engines in any way. The primary burden falls to the people who are incorrectly exposing this information. But at the same time, we're certainly aware of the problem, and our development team is exploring different solutions behind the scenes.'" As the article outlines, this has been a problem for a long time -- and with no easy solution in sight.
  • by Bonker ( 243350 ) on Monday November 26, 2001 @12:48PM (#2613791)
    I don't see what's so hard about this problem. It's very simple... don't keep data of any kind on the web server. That's what firewalled, password/encryption protected DB servers are for.
  • by posmon ( 516207 ) on Monday November 26, 2001 @12:50PM (#2613805) Homepage
Just because Google is only picking them up now doesn't mean they haven't been there for years!

    How can someone be so blatantly stupid as to store anything other than their web content, never mind credit card details, in their published folders? How? Did they redirect My Documents to c:\inetpub\wwwroot\%username%\...???

  • by Kr3m3Puff ( 413047 ) <me&kitsonkelly,com> on Monday November 26, 2001 @12:51PM (#2613813) Homepage Journal
The big complaint of the article is that Google is now indexing new types of files, instead of just HTML. If some goofball left a link to a Word document with his passwords in it, he gets what he deserves.

    The quote from the article about Google not thinking this through before they put it forward is idiotic. How can Google be responsible for documents that are in the public domain, that anyone can get to by typing a URL into a browser? It isn't insecure software, just dumb people...

  • by tomblackwell ( 6196 ) on Monday November 26, 2001 @12:51PM (#2613815) Homepage
...obey the Robot Exclusion Standard [robotstxt.org]. This is not a big secret, and it is linked to by all major search engines. Anyone wishing to exclude a well-behaved robot (like those of the major search engines) can place a small file on their site which controls the behaviour of the robot. Don't want a robot in a particular directory? Then set your robots.txt up correctly (a minimal example follows this comment).

    P.S. Anyone keeping credit card info in a web directory that's accessible to the outside world should really think long and hard about getting out of doing business on the internet.
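    For illustration, a minimal robots.txt along these lines (the paths here are hypothetical) would sit at the site root and look like:

        # Applies to all compliant crawlers
        User-agent: *
        # Well-behaved robots will stay out of these directories
        Disallow: /reports/
        Disallow: /cgi-bin/

    Note that the file itself is necessarily world-readable, so it advertises the very paths it names; it keeps polite robots out, nothing more.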
  • by Karma 50 ( 538274 ) on Monday November 26, 2001 @12:56PM (#2613856) Homepage
Google has just added the ability to index PDFs, Word docs, etc. So yes, the information was there before, but now it is much easier to find.
  • by Xerithane ( 13482 ) <xerithane@@@nerdfarm...org> on Monday November 26, 2001 @12:56PM (#2613857) Homepage Journal
It is a burden, but the responsibility does not lie with a crawling engine. You could screen any 16-digit number with a Luhn check (and validate the expiration date if available), but with all the different formats credit card numbers appear in (XXXX-XXXX-XXXX-XXXX, XXXXXXXXXXXXXXXX, etc.) the algorithm could get ugly to maintain (a sketch of the check follows this comment).

    I don't see why Google or any other search engine has to even acknowledge this problem; it's simply Someone Else's Problem. If I were paying a web team/master/monkey any money at all and found out about this, heads would roll. Pointing a finger at Google is the same tactic Microsoft uses against those "irresponsible" individuals who point out security flaws.

    If anything, Google is providing them a service by telling them about the problem.
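    For reference, the Luhn check mentioned above is simple to implement; here is a minimal Python sketch (formatting-tolerant, and purely illustrative):

        def luhn_valid(number: str) -> bool:
            """Return True if the digit string passes the Luhn checksum."""
            digits = [int(c) for c in number if c.isdigit()]  # tolerate dashes/spaces
            total = 0
            # Walk from the rightmost digit; double every second digit,
            # subtracting 9 whenever the doubled value exceeds 9.
            for i, d in enumerate(reversed(digits)):
                if i % 2 == 1:
                    d *= 2
                    if d > 9:
                        d -= 9
                total += d
            return total % 10 == 0

    luhn_valid("4111-1111-1111-1111") returns True for the well-known Visa test number, but roughly one in ten random digit strings also passes, which is part of why a crawler-side filter would be noisy.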
  • by Neon Spiral Injector ( 21234 ) on Monday November 26, 2001 @12:57PM (#2613863)
In published folders? How about on machines that are on the Internet at all?

    In an ideal setup, the machine storing credit card information wouldn't have a network card, or speak any networking protocol. You'd have a secure front-end webserver, which would pass the credit card information to the backend across a serial link. The backend machine would process the card and return the status. The CC data transfer would be strictly one-way, with no way of retrieving the numbers back off of that machine.
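    As a rough illustration of that one-way hand-off (the port name, baud rate, and message format are all invented for the example), using Python with the pyserial package:

        import serial  # pyserial

        # Front-end web server side: hand the charge request to the isolated box.
        link = serial.Serial("/dev/ttyS0", 9600, timeout=30)
        link.write(b"CHARGE 4111111111111111 12/03 19.95\n")  # card data goes in...
        status = link.readline()  # ...only an approve/decline code comes back
        link.close()
        print(status.decode().strip())  # e.g. "APPROVED"

    The point of the design is that no message on the serial protocol ever returns a stored card number, so compromising the front end gains an attacker nothing.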
  • by KyleCordes ( 10679 ) on Monday November 26, 2001 @01:01PM (#2613900) Homepage
    [know how Basic Authentication works before hosting web sites]

    ... and know that it's a wholly inadequate way of "protecting" credit card numbers!
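    Worth spelling out why: Basic Authentication sends the credentials merely base64-encoded, not encrypted, so over plain HTTP anyone watching the wire can read them. A quick Python illustration, using the canonical "user:password" example header:

        import base64

        # The Authorization header a browser sends for user "user", password "password":
        header = "Basic dXNlcjpwYXNzd29yZA=="
        print(base64.b64decode(header.split()[1]))  # b'user:password' -- no secrecy at all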
  • by Milican ( 58140 ) on Monday November 26, 2001 @01:06PM (#2613942) Journal
    "Webmasters should know how to protect their files before they even start writing a Web site"

That quote sums up the exact problem. It's not Google's fault for finding out what an idiot the web merchant was. As a matter of fact, I thank Google for exposing this problem. It is nothing short of gross negligence on the part of any web merchant to have any credit card numbers publicly accessible in any way. There is no reason this kind of information should not be under strong security.

    To have a search engine discover this kind of information is despicable, unprofessional, and just plain idiotic. As others have mentioned, these guys need to get a firewall, use some security, and quit being such incredible fools with such valuable information. Any merchant who exposes credit card information through the stupidity of Word documents or Excel spreadsheets on their public web server, or any non-secure server of any kind, deserves to get sued into oblivion. Although people usually don't like lawyers, I'm really glad we have them in the US, because they help stop this kind of stuff. Too many lazy people don't think it's in their best interest to protect the identity or financial security of others. I'm glad lawyers are here to show them the light :)

    John
  • by Anonymous Coward on Monday November 26, 2001 @01:08PM (#2613953)
What ignorance of security. Security is a problem that cannot be solved with technology alone. If you think encryption and/or firewalls will prevent this sort of issue, you totally misunderstand the purpose and capabilities of these tools. In this case, privacy is better protected through people (education) and process (security policy). If I write bad code that exposes credit card numbers (regardless of whether I store data on the web server, use encryption, or use firewalls), the numbers will still be disclosed.
  • by ryanvm ( 247662 ) on Monday November 26, 2001 @01:12PM (#2613976)
The Robot Exclusion Standard (i.e., robots.txt) is mainly useful for making sure that search engines don't cache dynamic data on your web site. That way users don't get a 404 error when clicking on your links in the search results.

    You should not be using robots.txt to keep confidential data out of caches. In fact, most semi-intelligent crackers would actually download the robots.txt with the specific intention of finding ill-hidden sensitive data.
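    Indeed, the file has to be world-readable for robots to use it, so anyone can pull it up and read off the "hidden" paths directly (example.com is a placeholder here):

        curl http://example.com/robots.txt    # every Disallow: line doubles as a signpost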

  • by hattig ( 47930 ) on Monday November 26, 2001 @01:14PM (#2613986) Journal
It is a simple rule of the web: any directory, or subdirectory thereof, that is configured to be accessible via the internet (HTML root directories, FTP root directories, Gnutella shared directories, etc.) should be assumed to be publicly accessible. Do not store anything that should be private in these areas.

    Secondly, it appears that companies are storing credit card numbers (a) in the clear and (b) in these public areas. These companies should not be allowed to trade on the internet! It is inexcusable, when learning how to use pgp/gpg takes no time at all, and storing the PGP-encrypted files outside the publicly accessible filesystem is just a matter of changing the line of code that writes to "payments/ordernumber.asc" so it writes to "~/payments/ordernumber.asc" (or whatever). Of course, the PGP secret key is not stored on a publicly accessible computer at all.

    But I shouldn't be giving a basic course on how to secure website payments to you lot -- you know it, or could work it out (or a similar method) pretty quickly. It is those dumb administrators who don't have a clue about security that are to blame (or their PHBs).
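    For the record, the sort of recipe the parent describes looks roughly like this (the key ID, paths, and filenames are placeholders):

        # On the web server: encrypt each order to the payments key and write it
        # outside the web root. Only the *public* key lives on this machine.
        gpg --armor --encrypt --recipient payments@example.com \
            --output ~/payments/ordernumber.asc order.txt

        # On the offline machine that holds the secret key:
        gpg --decrypt ordernumber.asc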

  • by devnullkac ( 223246 ) on Monday November 26, 2001 @01:16PM (#2614001) Homepage
    Near the end of the article, there's a quote from Gary McGraw:
    The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief.
    I must say I couldn't disagree more. To suggest that web site administrators can somehow entrust Google to implement the "obscurity" part of their "security through obscurity" plan is unrealistic. As an external entity, Google is really just another one of those "bad guys" and the fact that they're making your mistakes obvious without actually exploiting them is what people where I come from call a Good Thing.
  • Hint, Hint. (Score:2, Insightful)

    by A_Non_Moose ( 413034 ) on Monday November 26, 2001 @01:22PM (#2614035) Homepage Journal
    "The underlying issue is that the infrastructure of all these Web sites aren't protected."

Agreed. Think of the lax security that comes from using FrontPage, IIS, ASP, and VBS in webpages.
    You might as well do an impression of Donkey in the movie Shrek: "Ooo! Ooo! Pick me! Pick me!"

    Webmasters queried about the search engine problem said precautions against overzealous search bots are of fundamental concern.

    Uhh...they are "bots"...they don't think, they do.
    Does the bot say "Oh, look, these guys did something stupid...let's tell them about it."

    No, they search, they index and they generate reports.

I've seen this problem crop up before, when a coworker was looking for something totally unrelated on Google.
    The sad part was that it was an ISP I had respect for, despite having moved from them to broadband.
    What killed my respect was that at the very top of the pages was "Generated by FrontPage Express"... gack!

    I don't recall if it was a user account or one of their admin accounts... but for modem access I kind of stopped recommending them, or at least pointed out my observations.

    I have to parrot, and agree with, the "human error" diagnosis, but add "computer-accelerated and amplified".

    It happens, but that does not mean we have to like it, much less let it keep happening.
  • by Anonymous Coward on Monday November 26, 2001 @01:43PM (#2614155)
It is a simple rule of the web: any directory, or subdirectory thereof, that is configured to be accessible via the internet (HTML root directories, FTP root directories, Gnutella shared directories, etc.) should be assumed to be publicly accessible. Do not store anything that should be private in these areas.

Am I misreading this, or are you suggesting that no private information should ever be accessible via the web, regardless of precautions taken during implementation?

    Personally I think that's going a bit too far. For example, I'm fairly confident that my banking information, accessible online at my bank's website over HTTPS and protected by a password, is safe. And if it's not? Well, that's why the bank is insured.

  • by EccentricAnomaly ( 451326 ) on Monday November 26, 2001 @01:51PM (#2614193) Homepage
    C|Net seems to think the security problem is with Google:

    "The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."

    This is crazy. Google isn't doing anything wrong. The problem is with the idiots who don't spend five minutes to check that their secret data is really hidden.

This is like blaming a dog owner when his dog bites a burglar... er, uh, never mind.
  • Re:Oh Yeah? (Score:2, Insightful)

    by morcego ( 260031 ) on Monday November 26, 2001 @01:59PM (#2614243)
Yes, it could. Actually, it's very trivial to do.
    I actually tried to search for my own credit card number, but only searched for 8 digits, in various forms (always the same digits, mind you), like:
    "XXXX XXXX"
    "XXXX-XXXX"
    "XXXXXXXX"

    Thank god, nothing...
    This is something I suggest you all do. I would suggest using the last 8 digits, since the "last 4 digits" are commonly printed anyway; that way you won't be exposing anything that isn't probably already everywhere.
  • Directory searches (Score:4, Insightful)

    by wytcld ( 179112 ) on Monday November 26, 2001 @02:03PM (#2614273) Homepage
    Some search engines don't just check the pages linked from other pages on the server, but also look for other files in the subdirectories presented in links.

    So if http://credit.com/ has a link to http://credit.com/signin/entry.html then these engines will also check http://credit.com/signin/ - which will, if directory indexes are on and there is no index.html page there, show all the files in the directory. In which case http://credit.com/signin/custlist.dat - your flatfile list including credit cards - gets indexed.

    So if you're going to have directory indexing on (which there can be valid reasons for) you really need to create an empty index.html file as the very next step each time you set up a subdirectory, even if you only intend to link to files within it.
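    Alternatively (and less error-prone than remembering an empty index.html every time), Apache can refuse to generate listings outright; a hypothetical per-directory config or .htaccess entry (assuming overrides are permitted):

        # Never auto-generate a file listing for directories under this one
        Options -Indexes

    With that set, a request for a directory with no index file returns 403 Forbidden instead of an inventory of your files.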
  • by Webmonger ( 24302 ) on Monday November 26, 2001 @03:24PM (#2614911) Homepage
    Umm, I don't think that's how it happens. I think Google indexes the page and THEN the idiots put on the password protection.

    If Google accessed it via a special link, then Google would store that link, and you'd use that link, and you'd see it yourself.

    (another form of not-secret link:
    http://user:password@domain/path/file)
  • by Anonymous Coward on Monday November 26, 2001 @04:44PM (#2615460)
Years ago cable companies cried foul that ordinary citizens were grabbing satellite communications off the air with their fancy 6' dishes and watching whatever they wanted for free. The companies raised a big stink and tried to get people to pay for the content. The FCC said "tough luck, buddy. If you put it out there, then people have a perfect right to grab it." Since that time most satellite traffic has been encrypted.

    If you run a web site on the public internet, then you should be paying attention to this basic fact: if you put it out there, then people have a perfect right to grab it, even if you don't specifically tell them it's there. (I know FCC rulings don't apply, but the principle is the same.) You should encrypt EVERYTHING you don't want people to see.

    Encryption is like your pants: it keeps people from seeing your privates. Hiding your URLs and hoping is like running really, really fast with no pants on -- most people won't see your stuff, but there's always some bastard with a handy-cam.
  • Erm...so you're just going to magically verify them without knowing them?

    Here's a big hint: Not everyone is running some sort of completely automated, completely external validation service, and, duh, if they aren't, they need to know the numbers so they can actually charge the people.

About the only reason they shouldn't be in your computers somewhere is if you're using a third party to handle all that stuff... and then they will be in their computers. They, rather obviously, have to exist somewhere in order to be sent to the CC companies.
