Online Search Engines Lift Cover Of Privacy
Rican writes "MSNBC has an interesting article about how 'Googledorks' are using the powerful search engine to do searches across the web for sensitive and/or private information. Some of this information includes 'Medical records, bank account numbers, students' grades, and the docking locations of 804 U.S. Navy ships, submarines and destroyers.'"
Um. (Score:5, Insightful)
Re:Um. (Score:5, Interesting)
Here's how it works. Let's say you put a page on your site called
http://yoursite.com/temporary/hidden/dontreadth
And it is not linked to ever.
If you send that URL to someone using Opera with the right settings (but you don't know that) and they read the private document, within minutes GOOGLE WILL CRAWL THAT DOCUMENT!
Nothing is private any more under situations like that. Let's say that private document then links to all your older private documents. Google can then freely crawl its way in to read the rest.
Who's to blame for this then? Not you. You've already ensured you hadn't linked to it. Not the Opera user, as they have read the document, and respecting your privacy they've not mentioned it to anyone else.
However, underhanded tactics like sneaking in a Google crawl in this manner are unacceptable to me. My firewall blocks all Google crawler bots for this very reason.
Re:Um. (Score:5, Insightful)
Fuck that shit (Score:4, Insightful)
No one has any right to complain if their page is in a search engine unless they followed the robots.txt protocol and the search engine did not.
Re:Fuck that shit (Score:3, Insightful)
Re:Fuck that shit (Score:4, Interesting)
Not really. I mean, you're not really giving much away with
Disallow:
unless going to http://mysite.com/personal/ returns a directory listing.
The general point is that yes, you do have to trust people to respect the robots.txt. The problem we're talking about is Google, though, and we know they do respect it.
Re:Fuck that shit (Score:5, Insightful)
Not if the robots.txt file prevents you from accessing that data, which it does.
The robots.txt file prevents nothing. It's merely a request that the spider "not go here." It's not a lock on the door. It's a sign that says, "please do not enter my house."
Re:Fuck that shit (Score:5, Funny)
The safe, however, should be locked.
Re:Fuck that shit (Score:3, Informative)
Sure, you could do
Disallow:
Disallow:
Or, if you want to be simpler, you could just do
Disallow:
Comment removed (Score:4, Informative)
Re:Um. (Score:5, Informative)
.htaccess anyone?
That, along with an appropriate robots.txt file should be all you would need to prevent a crawl, right?
Re:Um. (Score:4, Insightful)
No, it's not worth a damn if you're talking about actually sensitive data.
Re:Um. (Score:4, Interesting)
Re:Um. (Score:5, Funny)
Dude, if you think writing "htttp" with three t's and putting a space in the URL is gonna stop people from finding that document, you're pretty behind, to tell you the truth.
I do wonder, however, how YOU knew the location of locationsOfAllAgentsInTheWorld.xls? That's supposed to be a secret!
Re:Um. (Score:3, Insightful)
There's a lot of this going around lately, whether we're talking webservers or configuring sendmail: a lot of folks with their shiny new CS degrees telling the rest of us that our tools are broken and asking us to trust Mr. Bill to set us straight. I'd be a lot more confident in their advice if they would at least give the impression that they had ever configured the tools they are so ready to throw aside.
Re:Um. (Score:5, Informative)
http://yoursite.com/temporary/hidden/dontreadth
And it is not linked to ever.
I realize this is redundant, and you were likely trolling, but Google will leave you right the fuck alone, so long as you put another little file at:
http://yoursite.com/robots.txt
That contains the text:
User-agent: *
Disallow:
I realize this is opt-out rather than opt-in, but there's just one place you have to opt, and there isn't another way that Google could possibly do their job. Everybody else seems to understand that the internet is a publicly accessible network.
So who's to blame? You. You put a sensitive document in a publicly accessible location on the internet, and took no precautions to keep it secure. Not linking to it is not a precaution.
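For anyone who wants to see how a polite crawler reads that file, here is a minimal Python sketch using the standard library's robots.txt parser. The rule sets and the example URL are assumptions made up for illustration, not the poster's actual file. Note that a `Disallow:` line with nothing after it disallows nothing, so the path matters.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if a crawler that honours robots.txt may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Assumed rule set: ask every crawler to stay out of the whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# An empty Disallow value means "nothing is disallowed".
BLOCK_NOTHING = "User-agent: *\nDisallow:\n"

# Hypothetical unlinked page, used only for this example.
URL = "http://yoursite.com/temporary/hidden/example.html"

print(allowed(BLOCK_ALL, URL))
print(allowed(BLOCK_NOTHING, URL))
```

This only models a crawler that chooses to obey the file. Nothing here stops a client that ignores robots.txt, which is the point made elsewhere in the thread.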
wait... (Score:4, Interesting)
All it takes is one cross-link from a site that links, and a number of hits, and google will advertise the cross-link, robots.txt or not.
Re:wait... (Score:3, Insightful)
Re:wait... (Score:5, Insightful)
Re:Um. (Score:5, Insightful)
<snip>
And it is not linked to ever.
Then you have still put it in a publicly accessible place, and bear full blame for others finding it.
For a physical-world analogy, let's say that you want to give a note to a friend (which, for some reason, requires a non-conventional mode of delivery). You could leave it at page 416 of "The complete minutes of the Town of Dullsville, 1853 to 1862", which no one had checked out in the past 30 years. Tell your friend where to find it, and 999 times out of 1000, you'd have no problems.
If you one day used that same method of sending a note, only to discover someone checked out the book and removed the note, would you actually have the gall to blame anyone but yourself?
Slashdotters, of all people, have heard this over and over and over... Security through obscurity may help in addition to some form of "real" security, but it almost never works by itself. The web counts as a very public place. If you place sensitive information on it with no security beyond a "hidden" URL, don't act surprised when the NYT has it as a headline the next week.
And for reference, yeah, I too have stuck random files up on my site for a friend to grab. But never when it would have mattered if someone else randomly found those files.
Get a clue (Score:5, Informative)
Re:Um. (Score:3, Insightful)
Where I work we have a few servers that are addressable from the internet in a DMZ. Everything else is untouchable, so the Opera trick doesn't work. The next block we have is that we use Netegrity for corporate-wide single-sign-on. Every non-public webserver has a Netegrity client installed. To get any document, you need to first authenticate against the Netegrity policy server over SSL.
There is also
Re:Um. (Score:3, Insightful)
Absolutely you, because you assumed that not linking to a document would make it private. Bad assumption. Even without Opera's "feature", someone could stumble upon the proper URL by blind luck, or as part of a dictionary attack, or by sniffing HTTP header traffic.
If you want to keep something private, don't put it on a public web site. Period.
Enough of the bullshit! (Score:4, Informative)
Re:Enough of the bullshit! (Score:5, Informative)
Opera's interaction with the Google ad system:
The Opera browser sends Google the URL of the web page you are visiting and your IP address (with the exceptions Opera filters out -- see below)
IP address, to better target the ads
is on that page
and the Web page accessed
It's quite clear if you actually read properly (Score:3, Informative)
Re:Uh-huh. (Score:5, Informative)
> existance of that page get from Opera to Google such that it
> could pin-point (not crawl) that page?
Opera submits URLs browsed to by users, to google, when advert support is turned on.
http://www.opera.com/adsupport/ [opera.com]
From that page:
--------
What is the connection between the Web page and the relevant ad displayed by Google?
Opera's interaction with the Google ad system:
The Opera browser sends Google the URL of the web page you are visiting and your IP address (with the exceptions Opera filters out -- see below)
--------
Exceptions are https, forms, passwords, cgi, and non-http URLs.
As an example from my apache log file last night, when I gave a friend a URL to a photo:
It's surprising how many Opera users will deny this happens, despite the evidence. That's a 5 minute delay; Google is pretty quick with its crawling.
Personally, I don't mind. I put things up in my temporary directory and pull them down fairly soon after. I know nothing is secure if it's just an unprotected URL, so I'm not worried like the grandparent poster. However, Opera does send URLs to Google, and Google does come back and check them out.
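For anyone who wants to check their own logs for this pattern, here is a rough Python sketch. It assumes Apache's combined log format. The sample lines, IP addresses, path, and user-agent strings are made up for illustration and are not the log excerpt from the post above.

```python
import re
from datetime import datetime

# Assumed layout: Apache "combined" log format.
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def crawl_delays(lines, browser="Opera", crawler="Mediapartners-Google"):
    """Pair each path first fetched by `browser` with later `crawler` hits.

    Returns a list of (path, timedelta) tuples.
    """
    first_seen = {}
    delays = []
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # skip lines that are not in the assumed format
        when = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        path, agent = m["path"], m["agent"]
        if browser in agent:
            first_seen.setdefault(path, when)
        elif crawler in agent and path in first_seen:
            delays.append((path, when - first_seen[path]))
    return delays


# Hypothetical sample lines, invented for this example.
SAMPLE = [
    '192.0.2.10 - - [10/Feb/2004:21:00:00 +0000] "GET /tmp/photo.jpg HTTP/1.1" '
    '200 51234 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1) Opera 7.23 [en]"',
    '192.0.2.99 - - [10/Feb/2004:21:05:00 +0000] "GET /tmp/photo.jpg HTTP/1.0" '
    '200 51234 "-" "Mediapartners-Google/2.1"',
]

for path, delay in crawl_delays(SAMPLE):
    print(path, delay)
```

A match only shows that the two requests hit the same path in that order. It does not prove one caused the other, and it says nothing about whether the page ends up in the search index.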
Some clues for you (Score:4, Informative)
b) Opera always has the name "Opera" in its UA string, even when masquerading as IE.
c) Mediapartners-google doesn't feed the Google search engine. It is only used for Google adverts.
Re:Um. (Score:5, Insightful)
I think that this is somewhat an issue of bad management and somewhat (maybe more) an issue of the weakness of web service security (compared to something like local Novell services).
eric
Kazaa and Gnutella are cooler (Score:5, Interesting)
It's surprising what people will leave sitting in their Kazaa upload directory, using it like a documents dump. Legal papers, company employee policy documents, employee records, sensitive stuff, medical records.
Taken straight from people's HDs, no hacking, cracking or other media-unfriendly terms needed, just the ignorance of the people who leave this stuff open is needed.
Re:Kazaa and Gnutella are cooler (Score:4, Interesting)
Re:Kazaa and Gnutella are cooler (Score:5, Informative)
Other examples are ".dbx", the file name extension for mail folders in Outlook Express. Or ".pwl", the Windows 9x system password file (supposedly easily crackable with the correct tool).
There are unfortunately clueless users who share their whole hard drive. File sharing programs have, however, started getting better at discouraging or preventing users from doing this.
Re:Kazaa and Gnutella are cooler (Score:4, Interesting)
Here, you can get registered names, phone numbers, software keys, and all kinds of other scary stuff...
I tried it once, and was shocked at how many I found in just a few seconds...
What I like (Score:5, Informative)
What I like to do is go on gnutella or kazaa and search for "DSN" or one of a number of similar prefixes. Why? Because most digital cameras save their files in a specific hardwired format, and the kind of people who leave their entire hard drive shared on kazaa are the kind of people who don't rename their digital camera files.
You can find the most random, interesting, occasionally personal shit that way.
I'm trying to remember the other common prefixes besides DSN and failing.
-- Super ugly ultraman
Re:What I like (Score:5, Interesting)
Click on the "show me some pictures" button at the upper-right.
Re:Kazaa and Gnutella are cooler (Score:3, Flamebait)
Re:Kazaa and Gnutella are cooler (Score:3, Interesting)
Re:Kazaa and Gnutella are cooler (Score:3, Funny)
I don't know about you but the more people that see my resume the better.
JV
Cover of "Privacy" (Score:5, Insightful)
Re:Cover of "Privacy" (Score:3, Interesting)
Re:Cover of "Privacy" (Score:5, Insightful)
Perhaps a more accurate title would have been "Online Search Engines Remove Delusion Of Privacy."
Cheers,
IT
I've heard of "cow orkers"... (Score:5, Funny)
...but what the heck are "googled orks"?
Re:I've heard of "cow orkers"... (Score:4, Funny)
It's the technical term for searching the web for the name of an extra in the big fight scenes in The Lord of the Rings movies.
This is a very popular pastime in New Zealand, where 95% of the country's population was used in the Minas Tirith scene.
Re:I've heard of "cow orkers"... (Score:5, Insightful)
How come Homer and Krusty look like clones?
It's intentional. MG originally intended it to be a joke; Bart didn't respect his dad, but he worshiped a clown who looked exactly like his dad. He mentioned this on an NPR interview last week.
Why Google? (Score:5, Insightful)
Re:Why Google? (Score:5, Insightful)
Re:Why Google? (Score:5, Interesting)
The same as a metal detector or store directory leaflet - these are tools used for information retrieval.
Re:Why Google? (Score:3, Insightful)
Re:Why Google? (Score:5, Informative)
2) This is an article from MSN. This information was available long before Google, but it is, at the very least, curious to see this sort of article from Microsoft when they have been going to the press lately about how Microsoft intends to develop their own search technology...
There's good stuff out there not on Google (Score:5, Interesting)
I don't know why Google never indexes this stuff, it's clearly public record and can be of interest to a lot of people, but they never did (I checked them many times, including just now, and they show no indication of the document). I wonder what other good government documents are out there if you only know where to look for them.
Re:There's good stuff out there not on Google (Score:3, Informative)
User-agent: *
Disallow: /Archives
Disallow: /Archives/bin
Disallow: /Archives/dev
Disallow: /Archives/etc
Disallow: /Archives/ftp
Disallow: /Archives/gopher
Disallow: /Archives/tmp
Disallow: /Archives/usr
Disallow: /cgi-bin
Disallow: /bin
Disallow: /oursite/previews
SS Minnow (Score:5, Funny)
The worst example.. (Score:5, Informative)
Re:The worst example.. (Score:4, Interesting)
Most of the codes are actually to enter stolen property. To query a CCH on a person you need a name, sex and DOB. You can also use a SSN.
Most of the info you get back is kinda boring. With the exception of juvenile arrest data, it's all public record. But you'd have to know what court house to go to. The NCIC CCH file brings it all into one place.
You'd get, name, race, sex, dob, ssn and dl info, along with height, weight, hair and eye color, fingerprint classification along with a listing of arrests, and court dispositions of those arrests.
If you are going to steal someone's identity, you could do better than stealing a crook's.
If you know someone has been arrested by the Anytown Police Department, go to their records section and do an open records act request for the last arrest's booking sheet. Most likely you'll get most of their identifying info except the SSN.
But whatever you do, don't ever run the President's DL. The Secret Service gets real nasty about that!
You can do this on KaZaA too. (Score:5, Interesting)
Interestingly, I found a text file with all the user names and passwords for brokerage firms, and bank accounts, of the IT director at the firm I was working in. Scary, considering he was supposed to have "15 years in the IT industry".
Could happen to you (Score:5, Interesting)
A while back I Googled my credit card number for a laugh. I was shocked to find it in an indexed webserver log for a site I had previously 'tried' to purchase from. (the form timed-out and I gave up).
A quick call to the bank and a few angry calls to the company sorted it, but I was not impressed.
Perhaps a tool to search for one's own private details should be developed to keep an eye on this?
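One piece of such a tool can be sketched from the server side: scanning text you already hold (say, your own web server log) for digit runs that look like card numbers, without sending anything to a search engine. This is only an illustration in Python. The regular expression, the function names, and the use of the Luhn checksum as a filter are choices made for this sketch, and the number in the example is the widely published test number, not a real card.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_like(text: str) -> list:
    """Return digit strings in `text` that look like card numbers."""
    hits = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_ok(digits):
            hits.append(digits)
    return hits


# Hypothetical log fragment using the well-known test number.
print(find_card_like("GET /order?card=4111111111111111&step=2"))
```

Passing the Luhn check does not mean a string is a real card number, and failing it does not mean the log is clean, so treat any hit as a prompt to look closer.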
Re:Could happen to you (Score:5, Interesting)
You say you typed your CC# into Google. Unless I missed something, this means that...
1. It was transmitted over an unsecure connection
2. It may have been logged as part of regular access logs
and for the paranoid
3. It may have been logged specifically as a potential CC# at Google (either due to the company having such a dubious programme, or a rogue employee / group of employees).
For all you know now, if you searched Yahoo in the future (for whatever reason), your search query with Google may pop up
Re:Could happen to you (Score:5, Funny)
I wonder if google has a feature where I can view recent search terms...? You had a laugh, I get a giggle, we're all having fun!
Re:Could happen to you (Score:3, Interesting)
A while back I Googled my credit card number for a laugh
You therefore sent your credit card number, unencrypted, over the Internet. Along the way it would have probably been logged at a proxy cache and would have certainly been logged at Google. You sure are a trusting fellow.
Cue Dr. Evil (Score:5, Funny)
Nothing new (Score:4, Informative)
Re:Nothing new (Score:5, Funny)
Re:Nothing new (Score:5, Insightful)
Good Circle of Reasoning. (Score:3, Funny)
If Bill Gates is using the same browser that he pushed in 1995, then he is a total moron. He is not a moron. Therefore he is not using the same browser that he pushed in 1995, IE. QED
dumb, de-dumb, dumb.
Nice of MSNBC to malign the thing M$ can neither match nor buy.
FUD Story to pump MSN Search? (Score:3, Interesting)
1. Microsoft has stated it wants to win the search engine war.
2. MSNBC (Microsoft owned) puts out story calling Google insecure because it invades your privacy.
3. MSN Search comes out with "secure, private searching" for only $9.95 a month.
4. Profit???
Conclusion: This is nothing more than a FUD story designed to sow the seeds of doubt about Google.
Re:FUD Story to pump MSN Search? (Score:3, Informative)
Homework answers (Score:4, Funny)
Let's pretend each week I have a program to code.
You see if you pretend, of course, I put the filename into google, and clicked search. In pretend, you know what came up?
The source code to the program I had to write for my university.
But remember, this is in pretend land.
Re:Homework answers (Score:4, Funny)
Re:That's good to avoid cheaters (Score:3, Interesting)
Hard to hide (Score:5, Insightful)
Re:Hard to hide (Score:3, Interesting)
web servers for morons (Score:5, Insightful)
The real story here is that companies and other organizations and institutions are setting machines up as servers and are too stupid to create an appropriate robots.txt file and/or keep their confidential information elsewhere. Google doesn't just drop in, even on networked machines. I have some sympathy for individuals who don't understand what they are doing when they make their machine a server, but surely any professional sysadmin, even one with limited training and experience, should know better than this. It's the same as leaving your briefcase on the front seat of an unlocked car.
so who owns it, how can we stop it? (Score:5, Insightful)
Part of this problem comes out of who owns the daggoned data. For example, let's say a hospital, instead of using clipboards, uses smartcards to hocket about patient records.
Who owns the data? The hospital, the insurance company paying the bill, or the poor schmuck on the business end of a colonoscopy?
I ask because without the individual having the right to own the data, there seems to me little that can be done to protect oneself other than go through expensive and tedious legal channels.
And if someone else can own sensitive data about me, then what can we do, as private citizens with limited resources, to make sure larger entities such as insurance companies play by rules like HIPAA?
Read this once... (Score:3, Interesting)
Geez (Score:5, Insightful)
Peeps nowadays...
nothing new (Score:5, Funny)
Err, not me of course ;-)
docking locations of 804 ships? (Score:5, Insightful)
Also, these are not precise locations. Yeah, you can find that the USS Roosevelt (DDG-80) is homeported in Mayport, Florida but you're not going to find the precise pier number.
As for ships on deployment, one can find their general locations just by looking at the latest issue of the Navy Times and by reading the newspaper of the town that the ship and its battlegroup are from.
The Navy really tightened up on what gets posted on official ships' websites after 9/11. If there is sensitive information still out there, Google is not at fault, but rather the unit's webmaster, Commanding Officer, and the Operational Security people who are supposed to be looking out for that sort of thing.
Primary issue is the historical data problem (Score:4, Insightful)
This applies to any information that's ever been stored electronically; I call it the "backup tape problem". Someday, that information may (will?) find its way online, a public service will index it, and the genie will be out of the bottle forever.
This could be earth shattering for google? (Score:3, Insightful)
Wouldn't that be interesting?
/. google! (Score:3, Funny)
this is news? (Score:3)
Seems someone in the mainstream press got a clue and has decided that the other 98% of the people should join in on the fun... if they can figure out how to use Google that is.
Who knows, maybe they'll even teach the clueless about Google image search... which came in handy this last weekend when a girl who wanted to model couldn't figure out how to send me a pic attached in an email... Curious as to what she looked like, I googled and found her [google.com].
As you can see, the stuff you can find on image search sure as hell beats those top-secret pentagon word documents any day.
Google can't always hack it (Score:5, Interesting)
old skool trick (Score:5, Insightful)
"http://*:*@" member
and you would get a bunch of sites with direct links into passworded member sites. Microsoft will put a stop to this with their latest update to IE however.
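The pattern being searched for there is the user:password part that a URL can carry before the host name. Here is a small Python sketch of how a site owner might spot and strip it before a link gets published or logged. The function names and example URLs are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def embedded_credentials(url):
    """Return (username, password) if the URL carries them, else None."""
    parts = urlsplit(url)
    if parts.username is None:
        return None
    return parts.username, parts.password


def without_credentials(url):
    """Rebuild the URL without the user:password@ prefix.

    Sketch only: it does not handle IPv6 literal hosts.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


# Hypothetical member-area link.
link = "http://alice:s3cret@members.example.com/index.html"
print(embedded_credentials(link))
print(without_credentials(link))
```

Any link of this form that ends up on a crawlable page hands the login to whoever finds it, which is why the search above worked at all.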
Tin Foil people, please observe (Score:5, Funny)
Just gotta watch out for the honey pots (Score:5, Interesting)
They have some Webalizer stats [gray-world.net] for the honey pot too.
Now to use it for good (Score:3, Interesting)
You're an evil badguy and go nuts on Google... Credit Cards... Hooray... Now to go nutz.
Leave it to MS NBC to neglect to mention that this is also a tool for good.
You're a credit card holder..... Now go google your credit cards... DO IT NOW.
Did you find it? I didn't.
I've got 4 credit cards.. two store cards one business visa and one personal mastercard.
(Oh yeah hackers the name on the card is Felinoid) Yeah they'll buy that.. not...
Don't need to use Google BTW... Use Alta Vista.. or Microsoft search.. or Lycos...
Oh yeah and when you're done put your credit cards away (I had to leave desk while entering post and left my wallet on desk... Now my credit cards are gone and I think I saw a stuffed teddy bear running down the street yelling "Charge it"... Just kidding got all my cards..).
(Oh yeah if you do see a teddy bear running down the street your missing credit cards are the least of your concerns)
Now to set up a bot to trap all these searches on Google....
(Oh come on it had to be said)
Re:Now to use it for good (Score:3, Informative)
Good! (Score:5, Insightful)
This isn't some hardened criminal mastermind at work. It's not a seasoned cracker attacking military targets. This isn't even some script kiddie poking at IIS. It's a MACHINE. A machine that respects robots.txt for Eris' sake!
If medical records and other "real" secrets are this visible, something is terribly wrong and I want to see public floggings. Seriously, this is not a case of weak security, or poor security, or incompetent security. It's a case of there not being so much as a screen door between the public and sensitive information.
This is actually a case where I think the government (or at least the courts) can do some good. You'll notice banks don't get hacked on a daily basis. That's because they'd lose squintillions of dollars if it happened. But nobody cares about my medical records because it costs money not to have incompetent asses running things. On the other hand, if revealing to without were punishable by a $1000 fine per person, per offense, you'd notice a severe tightening of security in a mighty big hurry.
It's a shame that suing people is sometimes the only way to get their attention, but with the decline of basic civil responsibility it might be inevitable.
stop right there (Score:3, Interesting)
This isn't "happening to the government", as if the government is some innocent victim. Rather, "the government screwed up big time". Likewise, if some company has sensitive personal information lying around on a public web server, the company is at fault and should be liable.
Let's not make victims out of perpetrators.
finding out whether something has leaked about you (Score:3, Informative)
Keep in mind, however, that Google queries are not encrypted and are not guaranteed to be private or secure, so, for your search, don't use the full SSN or anything else that shouldn't be disclosed.
Military Records (Score:3, Informative)
This is *not* Hacking? (Score:4, Insightful)
But, if I wander into an unprotected system, like a bank or military site, and I start reading confidential documents... Is this not a crime?
What's the difference if I locate the unprotected documents via a search engine or by using a port scanner with an IP range?
I think what I'm saying is that port scanning and finding a vulnerable system, going into that system and looking around is now a crime.
But didn't I just describe what's going on with google hacking?
I don't advocate nor believe any of this is a crime but where and why is a line drawn between them?
I've often said about hacking that just because I go to the market and forget to lock my front door, that doesn't mean I expect to come home and find someone rummaging through my house.
If it's an administrator who forgets to lock down a port or one who inadvertently places confidential material on the wrong box... Again, where is the line and how is it drawn, and why, between criminal hacking and "it's on an open system, google found it so it's legal"?
I'm just asking. It's early in the AM and my brain isn't working because it's not seeing the difference. I'm only seeing a very fine line between what one might consider a "public" system versus one that is expected to be "private". Is the only difference our "expectation" of privacy that makes one illegal and another a sport?
Re:This is *not* Hacking? (Score:3, Insightful)
In most of the cases referenced in this article, the sites hosting the sensitive data didn't just leave their doors unlocked, they brought the data outside and dumped it on the curb. If you're walking by and see something worth salvaging in what for all purposes appears to be someone's trash, do you consider it illegal to pick it up and take it with you?
Re:Google threatens privacy and national security (Score:5, Insightful)
No, they should not. They are not in a position to know what _is_ sensitive - and to whom. They can reasonably only assume that anything reachable with an ordinary, polite spider is meant to be accessible to the world at large. If you feel certain information should not be made accessible, bring it up with those actually making it accessible, not with those just indexing it once it is.
Shooting the messenger is not just pointless, it is counterproductive.
Re:Google threatens privacy and national security (Score:5, Insightful)
Re:Google threatens privacy and national security (Score:4, Funny)
I'd pin most of it on Saruman.
Re:Nothings private (Score:4, Interesting)
Anyone else notice that the site is msnbc.msn.com? Isn't Microsoft trying to develop a google competitor?
Am I just another cynical bastard?
Re:Nothings private (Score:5, Insightful)
And on a totally unrelated thought. . .
Is Yuki Noguchi on crack? Google does not do anything to privacy. All Google does is make it easier to find publicly available information. Maybe "Online search engines act as a catalyst to find private information" would be a more accurate title.
Re:Names.. (Score:3, Funny)
if they put it there themselves, yes, but... (Score:3, Informative)