The Problem of Search Engines and "Sekrit" Data
Nos. writes: "CNet is reporting that not only Google but other search engines are finding password and credit card numbers while doing their indexing. An interesting quote from the article by Google: 'We define public as anything placed on the public Internet and not blocked to search engines in any way. The primary burden falls to the people who are incorrectly exposing this information. But at the same time, we're certainly aware of the problem, and our development team is exploring different solutions behind the scenes.'" As the article outlines, this has been a problem for a long time -- and with no easy solution in sight.
A symptom of poor programming... (Score:4, Insightful)
how the FUCK is this possible? (Score:2, Insightful)
how can someone be so blatantly stupid as to store anything other than their web content, never mind credit card details, in their published folders? how? they redirected my documents to c:\inetpub\wwwroot\%username%\...???
Stopping Google won't stop the problem... (Score:5, Insightful)
The quote from that article about Google not thinking about this before they put it forward is idiotic. How can Google be responsible for documents that are in the public domain, that anyone can get to by typing a URL into a browser? It isn't insecure software, just dumb people...
Well Behaved Crawlers (Score:4, Insightful)
P.S. Anyone keeping credit card info in a web directory that's accessible to the outside world should really think long and hard about getting out of business on the internet.
Re:how the FUCK is this possible? (Score:2, Insightful)
Re:Simple but burdensome solution (Score:5, Insightful)
I don't see why Google or any other search engine has to even acknowledge this problem; it's simply Someone Else's Problem. If I were paying a web team/master/monkey any money at all and found out about this, heads would roll. It seems that even thinking of pointing a finger at Google is the same tactic Microsoft uses against those "irresponsible" individuals pointing out security flaws.
If anything Google is providing them a service by telling them about the problem.
Re:how the FUCK is this possible? (Score:2, Insightful)
In an ideal setup the machine storing credit card information wouldn't have a network card, or speak any networking protocol. You'd have a front-end secure webserver. That machine would pass the credit card information to the backend across a serial link. The backend machine would process the card and return the status. The CC data would only be a one-way transfer, with no way of retrieving it back off of that machine.
Basic Authentication (Score:3, Insightful)
... and know that it's a wholly inadequate way of "protecting" credit card numbers!
Bring out the legal eagles (Score:4, Insightful)
That quote sums up the exact problem. It's not Google's fault for finding out what an idiot the web merchant was. As a matter of fact, I thank Google for exposing this problem. It is nothing short of gross negligence on the part of any web merchant to have any credit card numbers publicly accessible in any way. There is no reason this kind of information should not be under strong security.
For a search engine to be able to discover this kind of information is despicable, unprofessional, and just plain idiotic. As others have mentioned, these guys need to get a firewall, use some security, and quit being such incredible fools with such valuable information. Any merchant who exposes credit card information through the stupidity of Word documents or Excel spreadsheets on their public web server, or any non-secure server of any kind, deserves to get sued into oblivion. Although people usually don't like lawyers, I'm really glad we have them in the US because they help stop this kind of stuff. Too many lazy people don't think it's in their best interest to protect the identity or financial security of others. I'm glad lawyers are here to show them the light.
John
Re:A symptom of poor programming... (Score:1, Insightful)
Re:Well Behaved Crawlers (Score:5, Insightful)
You should not be using robots.txt to keep confidential data out of caches. In fact, most semi-intelligent crackers would actually download the robots.txt with the specific intention of finding ill-hidden sensitive data.
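To make the point above concrete, here is a minimal sketch of what anyone can do with a robots.txt file once it has been downloaded: pull out the Disallow lines and get a ready-made list of places the site owner would rather you not look. The function name `disallowed_paths` and the sample file are invented for illustration, and the parsing is deliberately simplified rather than a full implementation of the robots exclusion rules.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return every path named in a Disallow line, in order of appearance."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow means "nothing is disallowed"
                paths.append(value)
    return paths


# Hypothetical robots.txt, made up for this example.
EXAMPLE = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /orders/   # please don't look here
Disallow:
"""

if __name__ == "__main__":
    for path in disallowed_paths(EXAMPLE):
        print(path)
```

A well-behaved crawler uses that list to stay away; a cracker uses the same list as a to-do list. Either way the file itself is public, so nothing named in it is hidden.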
Web Sites are public by definition (Score:4, Insightful)
Secondly, it appears that companies are storing credit card numbers (a) in the clear and (b) in these public areas. These companies should not be allowed to trade on the internet! That is so inept when learning how to use pgp/gpg takes no time at all, and simply storing the PGP encrypted files outside the publicly accessible filesystem is just changing the line of code that writes to "payments/ordernumber.asc" to "~/payments/ordernumber.asc" (or whatever). Of course, the PGP secret key is not stored on a publicly accessible computer at all.
But I shouldn't be giving a basic course on how to secure website payments, etc, to you lot - you know it or could work it out (or a similar method) pretty quickly. It is those dumb administrators that don't have a clue about security that are to blame (or their PHB).
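For what it's worth, the "write it outside the publicly accessible filesystem" half of that advice is easy to sanity-check in code. The sketch below only tests whether a file path ends up under the web root; it does not do the PGP part. The helper name `is_publicly_served` and the example paths are assumptions made up for illustration, and which directory your server actually publishes depends on its configuration.

```python
from pathlib import Path


def is_publicly_served(target: Path, web_root: Path) -> bool:
    """True if target lies inside web_root once '..' and symlinks are resolved."""
    target = target.resolve()
    web_root = web_root.resolve()
    return target == web_root or web_root in target.parents


if __name__ == "__main__":
    # Hypothetical layout: the web server publishes /var/www/html.
    web_root = Path("/var/www/html")
    for candidate in (
        Path("/var/www/html/payments/1001.asc"),  # inside the web root
        Path("/home/shop/payments/1001.asc"),     # outside the web root
    ):
        print(candidate, is_publicly_served(candidate, web_root))
```

A check like this could run just before the order file is written and refuse to continue if the answer is True. It says nothing about aliases or other mappings configured in the web server, so treat it as a first line of defence only.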
Disagree With Gary McGraw (Score:4, Insightful)
Hint, Hint. (Score:2, Insightful)
Agreed. Such lax security via the use of Frontpage, IIS,
You might as well do an impression of Donkey in the movie Shrek: "Ooo! Ooo! Pick me! Pick me!"
Webmasters queried about the search engine problem said precautions against overzealous search bots are of fundamental concern.
Uhh...they are "bots"...they don't think, they do.
Does the bot say "Oh, look, these guys did something stupid...let's tell them about it."
No, they search, they index and they generate reports.
I've seen this problem crop up before when a coworker was looking for something totally unrelated on google.
Sad part was it was an ISP I had respect for, despite moving from them to broadband.
What killed my respect was at the very top of the pages was "Generated by Frontpage Express"...gack!
I don't recall if it was a user account or one of their admin accounts...but for modem access I kind of stopped recommending them, or pointed out my observations.
I have to parrot, and agree, with the "Human Error" but add "Computer accelerated and amplified".
It happens, but that does not mean we have to like it, much less let it keep happening.
Re:Web Sites are public by definition (Score:1, Insightful)
am i misreading this, or are you suggesting that no private information should ever be accessible via the web, regardless of precautions taken during implementation?
personally i think that's going a bit too far. for example, i'm fairly confident that my banking information, accessible online at my bank's website over https and protected by password, is safe. and if it's not? well, that's why the bank is insured.
Re:This is what happens when you use frontpage... (Score:2, Insightful)
"The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."
This is crazy. Google isn't doing anything wrong. The problem is with the idiots who don't spend five minutes to check that their secret data is really hidden.
This is like blaming a dog owner when his dog bites a burglar... er uh, never mind.
Re:Oh Yeah? (Score:2, Insightful)
I actually tried to search for my credit card number, but only searched for 8 digits, in various forms (always the same digits, mind you), like:
"XXXX XXXX"
"XXXX-XXXX"
"XXXXXXXX"
Thank god, nothing.
This is something I suggest you people do. I would suggest using the last 8 digits, since the "last 4 digits" are commonly used, but you won't be exposing something that is probably already everywhere.
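The three search forms listed above are simple to generate, as in this rough sketch. The function name `search_variants` is made up for illustration, and the number in the example is a dummy, not a real card fragment.

```python
def search_variants(digits: str) -> list[str]:
    """Build the quoted search phrases for an 8-digit fragment."""
    if len(digits) != 8 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly 8 digits")
    first, second = digits[:4], digits[4:]
    return [
        f'"{first} {second}"',  # groups separated by a space
        f'"{first}-{second}"',  # groups separated by a hyphen
        f'"{digits}"',          # all eight digits run together
    ]


if __name__ == "__main__":
    # Dummy fragment for illustration only.
    for phrase in search_variants("12345678"):
        print(phrase)
```

Keep in mind that whatever you type into a search box is itself sent to the search engine, which is one reason to stick to a fragment rather than the whole number.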
Directory searches (Score:4, Insightful)
So if http://credit.com/ has a link to http://credit.com/signin/entry.html then these engines will also check http://credit.com/signin/ - which will, if directory indexes are on and there is no index.html page there, show all the files in the directory. In which case http://credit.com/signin/custlist.dat - your flatfile list including credit cards - gets indexed.
So if you're going to have directory indexing on (which there can be valid reasons for) you really need to create an empty index.html file as the very next step each time you set up a subdirectory, even if you only intend to link to files within it.
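If you would rather check than trust your memory, something like the following sketch walks a web root and reports every directory that has no index page, which are the ones that would show a file listing when directory indexing is on. The name `dirs_without_index` and the list of index filenames are assumptions for illustration; the filenames your server treats as index pages depend on its configuration.

```python
import os
from pathlib import Path

# Assumed index filenames; adjust to match your server's settings.
INDEX_NAMES = ("index.html", "index.htm")


def dirs_without_index(web_root) -> list[Path]:
    """Return every directory under web_root (inclusive) lacking an index file."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        if not any(name in filenames for name in INDEX_NAMES):
            missing.append(Path(dirpath))
    return sorted(missing)


if __name__ == "__main__":
    # "." is a placeholder; point this at your own document root.
    for directory in dirs_without_index("."):
        print("no index page:", directory)
```

Run it after adding any subdirectory. Anything it prints is a directory where a crawler following the trick described above could get a listing.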
Re:Stopping Google won't stop the problem... (Score:3, Insightful)
If Google accessed it via a special link, then Google would store that link, and you'd use that link, and you'd see it yourself.
(another form of not-secret link:
http://user:password@domain/path/file)
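As a small illustration of why that form of link is not secret: the credentials are ordinary parts of the URL, and anything that stores or parses the link can read them back out. This sketch uses Python's standard `urllib.parse.urlsplit`; the helper name `embedded_credentials` is made up for the example, and the URL is the placeholder from the comment above.

```python
from urllib.parse import urlsplit


def embedded_credentials(url: str):
    """Return (username, password) found in the URL; each is None if absent."""
    parts = urlsplit(url)
    return parts.username, parts.password


if __name__ == "__main__":
    user, password = embedded_credentials("http://user:password@domain/path/file")
    print("username:", user)
    print("password:", password)
```

So if such a link ever ends up on a page a crawler can reach, the password is indexed right along with it.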
Re:Stopping Google won't stop the problem... (Score:4, Insightful)
If you run a web site on the public internet then you should be paying attention to this basic fact: If you put it out there then people have a perfect right to grab it, even if you don't specifically tell them it's there. (I know FCC rulings don't apply, but the principle is the same). You should encrypt EVERYTHING you don't want people to see.
Encryption is like your pants: it keeps people from seeing your privates. Hiding your URLs and hoping is like running really, really fast with no pants on - most people won't see your stuff, but there's always some bastard with a handy-cam.
Re:To test your credit-card ordering site... (Score:2, Insightful)
Here's a big hint: Not everyone is running some sort of completely automated, completely external validation service, and, duh, if they aren't, they need to know the numbers so they can actually charge the people.
About the only reason they shouldn't be in your computers somewhere is if you're using a third party to handle all that stuff...and then they will be in their computer. They, rather obviously, have to exist somewhere to be sent to the CC companies.