Security Privacy The Internet

Online Search Engines Lift Cover Of Privacy 460

Rican writes "MSNBC has an interesting article about how 'Googledorks' are using the powerful search engine to do searches across the web for sensitive and/or private information. Some of this information includes 'Medical records, bank account numbers, students' grades, and the docking locations of 804 U.S. Navy ships, submarines and destroyers.'"
This discussion has been archived. No new comments can be posted.

  • The worst example.. (Score:5, Informative)

    by centralizati0n ( 714381 ) <tommy.yorkNO@SPAMgmail.com> on Monday February 09, 2004 @10:22PM (#8233483) Homepage Journal
    The worst example I saw was the FBI NCIC 2000 manual [state.fl.us] [PDF]. It gives you examples of how to look up criminal records and such... which could be very useful to the criminally vested social engineer.
  • Nothing new (Score:4, Informative)

    by dattaway ( 3088 ) on Monday February 09, 2004 @10:25PM (#8233508) Homepage Journal
    People have used this for years to find things like Bill Gates' social security number and all kinds of things we think should be private. Chances are, if it's in a record somewhere, that information will leak onto the internet sooner than most people think.
  • by baryon351 ( 626717 ) on Monday February 09, 2004 @10:29PM (#8233546)
    They don't seem to be, although many could. There are just too many unique ones out there IMHO.

    Then again I don't have a WP that'll run those scripts.
  • by npistentis ( 694431 ) on Monday February 09, 2004 @10:30PM (#8233553)
    It was an AP story; I read the same thing in this morning's Washington Post.
  • Re:Um. (Score:5, Informative)

    by mhesseltine ( 541806 ) on Monday February 09, 2004 @10:41PM (#8233633) Homepage Journal

    .htaccess anyone?

    That, along with an appropriate robots.txt file should be all you would need to prevent a crawl, right?

  • Re:Um. (Score:5, Informative)

    by Elwood P Dowd ( 16933 ) <judgmentalist@gmail.com> on Monday February 09, 2004 @10:42PM (#8233644) Journal
    Here's how it works. Let's say you put a page on your site called

    http://yoursite.com/temporary/hidden/dontreadthis/private_document.html

    And it is not linked to ever.


    I realize this is redundant, and you were likely trolling, but Google will leave you right the fuck alone, so long as you put another little file at:

    http://yoursite.com/robots.txt

    That contains the text:

    User-agent: *
    Disallow: /

    I realize this is opt-out rather than opt-in, but there's just one place you have to opt, and there isn't another way that Google could possibly do their job. Everybody else seems to understand that the internet is a publicly accessible network.

    So who's to blame? You. You put a sensitive document in a publicly accessible location on the internet, and took no precautions to keep it secure. Not linking to it is not a precaution.
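As a rough illustration of the robots.txt rule quoted above, here is a minimal Python sketch of how a polite crawler might consult it before fetching anything. It uses the standard library's urllib.robotparser; the helper name is_allowed and the "Googlebot" user-agent string are illustrative assumptions, not a description of how Google's crawler is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from the comment: every crawler, whole site off-limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a well-behaved crawler may fetch `url` (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The example URL from the comment.
hidden = "http://yoursite.com/temporary/hidden/dontreadthis/private_document.html"

print(is_allowed(ROBOTS_TXT, "Googlebot", hidden))  # disallowed by "Disallow: /"
print(is_allowed("", "Googlebot", hidden))          # an empty robots.txt permits everything
```

Note that this only models the crawler's side of the bargain: the file is a request, and nothing in it stops a client that never checks it.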
  • Re:Why Google? (Score:5, Informative)

    by Xenographic ( 557057 ) on Monday February 09, 2004 @10:44PM (#8233654) Journal
    1) This is old. I remember searching for things like '"index +of" vti' and other such things (try it and modify that search if you like, but it was interesting to find out just what sort of interesting tidbits one might find in such a folder).

    2) This is an article from MSN. This information was available long before Google, but it is, at the very least, curious to see this sort of article from Microsoft when they have been going to the press lately about how Microsoft intends to develop their own search technology...
  • by tsvk ( 624784 ) on Monday February 09, 2004 @10:45PM (#8233664)
    Go into kazaa and gnutella and search for any .doc files. Or some likely sounding names like "resume" or "job application".

    Other examples are ".dbx", the file name extension for mail folders in Outlook Express. Or ".pwl", the Windows 9x system password file (supposedly easily crackable with the correct tool).

    There are unfortunately clueless users who share their whole hard drive. File sharing programs have however started getting better in discouraging or preventing the users from doing this.

  • What I like (Score:5, Informative)

    by Anonymous Coward on Monday February 09, 2004 @10:46PM (#8233677)
    The thing is that most people will literally inadvertently share their entire hard drive's contents, or at least all "media files".

    What I like to do is go on gnutella or kazaa and search for "DSN" or one of a number of similar prefixes. Why? Because most digital cameras save their files in a specific hardwired format, and the kind of people who leave their entire hard drive shared on kazaa are the kind of people who don't rename their digital camera files.

    You can find the most random, interesting, occasionally personal shit that way.

    I'm trying to remember the other common prefixes besides DSN and failing.

    -- Super ugly ultraman
  • Re:Hard to hide (Score:2, Informative)

    by You're All Wrong ( 573825 ) on Monday February 09, 2004 @10:53PM (#8233728)
    """
    one of the central tenets of computer network security: If it is connected to the Internet, it can be accessed
    """

    That's not one of the central tenets of computer network security.
    If it's not connected to the internet, it cannot be accessed, but that doesn't imply what you've said.

    If it's connected to the internet, and there's a daemon which answers requests with the information requested, then it can be accessed. There's a subtle difference though - namely the daemon which answers the requests. Without that there's no access, and there can never be any access.

    YAW.
  • Re:Um. (Score:2, Informative)

    by lambent ( 234167 ) on Monday February 09, 2004 @10:53PM (#8233732)
    robots.txt doesn't matter worth a damn, if you're not feeling polite.
  • Get a clue (Score:5, Informative)

    by Chuck Chunder ( 21021 ) on Monday February 09, 2004 @11:02PM (#8233788) Journal
    The google mediapartners bot which will look at pages for the purposes of advertising such as in Opera is different and separate from the bot that adds pages to Google's search database. The mediapartners bot does not feed the Google search engine [webmasterworld.com].
  • Noindex (Score:1, Informative)

    by Zenmonkeycat ( 749580 ) on Monday February 09, 2004 @11:10PM (#8233835)
    Please webmasters, learn to use the proper code for preventing bots from scanning your page. The Robot meta tag will do that quite effectively. Alternately, you could just /not/ make a webpage with your usernames and passwords, and that would be a lot easier.
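For anyone unsure what the Robot meta tag looks like in practice, here is a small Python sketch that reads a page the way an indexing bot might and reports which robots directives it declares. The class name RobotsMetaParser, the helper robots_directives, and the sample page are all made up for illustration; like robots.txt, the tag is only honoured by bots that choose to honour it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list such as "noindex, nofollow".
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in `html` (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# A made-up page that asks bots neither to index it nor to follow its links.
PAGE = (
    '<html><head><meta name="robots" content="noindex, nofollow"></head>'
    "<body>usernames and passwords</body></html>"
)

print("noindex" in robots_directives(PAGE))
```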
  • by Chuck Chunder ( 21021 ) on Monday February 09, 2004 @11:14PM (#8233874) Journal
  • Re:Hardc0re hax0r. (Score:2, Informative)

    by nick0909 ( 721613 ) on Monday February 09, 2004 @11:36PM (#8234003)
    Is googledorks a real hacker movement or just some random key word any one with a high ranking web page can abuse?

    It appears to be a buzzword that Johnny Long just kinda made up. I used Google to "hack" away and find his website: http://johnny.ihackstuff.com/ [ihackstuff.com]
    It appears his definition of googledorking (?) is not just finding private info, but just anything wacky/weird/different, private is just one of those things.

    Do we now call it g00g|3?
  • Re:Uh-huh. (Score:5, Informative)

    by Anonymous Coward on Monday February 09, 2004 @11:52PM (#8234089)
    > Want to expand on that or are you just trolling? How did the
    > existance of that page get from Opera to Google such that it
    > could pin-point (not crawl) that page?

    Opera submits URLs browsed to by users, to google, when advert support is turned on.

    http://www.opera.com/adsupport/ [opera.com]

    From that page:
    --------
    What is the connection between the Web page and the relevant ad displayed by Google?
    Opera's interaction with the Google ad system:

    The Opera browser sends Google the URL of the web page you are visiting and your IP address (with the exceptions Opera filters out -- see below)
    --------

    Exceptions are https, forms, passwords, cgi, and non-http URLs.

    As an example from my apache log file last night, when I gave a friend a URL to a photo:
    xxxxxxx.upc-g.chello.nl - - [10/Feb/2004:02:23:53 +1100] "GET /temporary/sooted.jpg HTTP/1.1" 200 74339 "-" "Opera/7.23 (X11; Linux i686; U) [en-GB]"
    crawler8.googlebot.com - - [10/Feb/2004:02:28:39 +1100] "GET /temporary/sooted.jpg HTTP/1.0" 200 74339 "-" "Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"
    It's surprising how many Opera users will deny this happens, despite the evidence. That's a 5 minute delay, google is pretty quick with its crawling. Personally, I don't mind. I put things up in my temporary directory and pull them down fairly soon after. I know nothing is secure if it's just an unprotected URL, so I'm not worried like the grandparent poster. However, Opera does send URLs to google, and google does come back and check them out.
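To put a number on the gap between the two requests in that log excerpt, here is a minimal Python sketch that parses the two lines and measures how long the bot took to follow the browser. It assumes the lines are in Apache's combined log format; the regular expression and the function names parse_line and followup_delay_seconds are illustrative, not part of any Apache tooling.

```python
import re
from datetime import datetime

# The two log lines quoted in the comment, copied as-is.
LOG_LINES = [
    'xxxxxxx.upc-g.chello.nl - - [10/Feb/2004:02:23:53 +1100] '
    '"GET /temporary/sooted.jpg HTTP/1.1" 200 74339 "-" '
    '"Opera/7.23 (X11; Linux i686; U) [en-GB]"',
    'crawler8.googlebot.com - - [10/Feb/2004:02:28:39 +1100] '
    '"GET /temporary/sooted.jpg HTTP/1.0" 200 74339 "-" '
    '"Mediapartners-Google/2.1 (+http://www.googlebot.com/bot.html)"',
]

# Assumed layout: host ident user [time] "request" status size "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def parse_line(line: str) -> dict:
    """Split one log line into named fields and parse its timestamp."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    return fields


def followup_delay_seconds(lines, bot_marker="Mediapartners-Google"):
    """Seconds between the first ordinary request for a path and the bot's visit to it."""
    first_seen = {}
    for line in lines:
        entry = parse_line(line)
        path = entry["request"].split()[1]
        if bot_marker in entry["agent"]:
            if path in first_seen:
                return (entry["time"] - first_seen[path]).total_seconds()
        else:
            first_seen.setdefault(path, entry["time"])
    return None  # the bot never followed up on a path seen earlier


print(followup_delay_seconds(LOG_LINES))
```

For the two lines above this works out to 286 seconds, a little under five minutes.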
  • by taped2thedesk ( 614051 ) on Monday February 09, 2004 @11:56PM (#8234119)
    You're a credit card holder..... Now go google your credit cards... DO IT NOW. Did you find it? I didn't.
    Oh sure, it's all fun and games until your credit card number gets displayed on the Live Query [hofstra.edu] screen at Google HQ... :-p
  • by Syre ( 234917 ) on Tuesday February 10, 2004 @12:19AM (#8234236)
    Hmm... if Opera doesn't send URLs to Google, why does it say on the page you linked [opera.com] (bold and italics mine):

    Opera's interaction with the Google ad system:
    • The Opera browser sends Google the URL of the web page you are visiting and your IP address (with the exceptions Opera filters out -- see below)
    • Google tries to determine your general geographic location based on your IP address, to better target the ads
    • The Google ad server consults Google's web database to find out what kind of content is on that page
    • Ads that are deemed most relevant are then served based on geographic location and the Web page accessed
  • by ajagci ( 737734 ) on Tuesday February 10, 2004 @12:28AM (#8234276)
    You can find out whether personal information about you is available accidentally by searching for your name and a piece of your sensitive information on Google, say, your name and the last four digits of your SSN, the last four digits of a credit card number, parts of your phone number, or your street address. Leaked personal information would have to contain both your name and that other information. Chances are that you will retrieve only a few documents, which you can quickly review.

    Keep in mind, however, that Google queries are not encrypted and are not guaranteed to be private or secure, so, for your search, don't use the full SSN or anything else that shouldn't be disclosed.
  • by Chuck Chunder ( 21021 ) on Tuesday February 10, 2004 @01:16AM (#8234583) Journal
    I said Opera doesn't "send such urls" to Google. Specifically the post I was replying to talked about pages that are the result of form submissions. The page I linked to states Opera does not send:
    • URLs with CGI arguments (E.g: http://www.example.com?formsdata)
    • Forms data in POST requests
    (as well as a few others).
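The exceptions discussed in this thread (https, forms, passwords, CGI arguments, non-http URLs) can be sketched as a simple filter. This is only a guess at the logic for illustration: the function would_be_reported is hypothetical, Opera's real implementation is not shown anywhere in the thread, and treating "passwords" as credentials embedded in the URL is an assumption on my part.

```python
from urllib.parse import urlsplit


def would_be_reported(url: str, method: str = "GET") -> bool:
    """Guess whether a visited URL would be passed to the ad system (hypothetical)."""
    parts = urlsplit(url)
    # Only plain http is reported; this excludes https and non-http schemes.
    if parts.scheme.lower() != "http":
        return False
    # URLs with CGI arguments, e.g. http://www.example.com?formsdata
    if parts.query:
        return False
    # Forms data in POST requests
    if method.upper() == "POST":
        return False
    # Assumption: "passwords" means credentials embedded in the URL itself.
    if parts.username or parts.password:
        return False
    return True


print(would_be_reported("http://example.com/temporary/sooted.jpg"))
print(would_be_reported("http://www.example.com?formsdata"))
```

Under these assumptions an unprotected, unlinked image URL is reported, while the result page of a form submission is not, which is the distinction being argued over above.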
  • Military Records (Score:3, Informative)

    by prestidigital ( 341064 ) * on Tuesday February 10, 2004 @01:39AM (#8234706) Journal
    Just tonight I was Googling for "number personnel U.S. military" and I was surprised to find many links along the lines of "How to find U.S. military personnel." The site with the most links to directories has a Netherlands domain name, which seemed odd. I tried to find some family members and did turn up some information. Some sites were DoD and had recognizable warnings about monitoring. Another was a .com for the military community and required standard registration procedures. I don't know if it's a good idea to have this information online and I wonder what military folks think about it. I reckon there are pros & cons.
  • Some clues for you (Score:4, Informative)

    by Chuck Chunder ( 21021 ) on Tuesday February 10, 2004 @01:40AM (#8234711) Journal
    a) Mediapartners-google does check robots.txt
    b) Opera always has the name "Opera" in its UA string, even when masquerading as IE.
    c) Mediapartners-google doesn't feed the Google search engine. It is only used for Google adverts.
  • by Norman the Wise ( 732143 ) <michael.t.white@ ... TEom minus punct> on Tuesday February 10, 2004 @01:59AM (#8234790)

    Google does retain information on search queries in some form. If you go and check the Google Zeitgeist (Weekly Version [google.com] & the Annual Version [google.com]) they have statistics on most searched terms, time graphs showing, for example, the spike in search queries after the California Quake, and lots of other interesting information.

    For the week ending February 2, the top search terms in the US were:

    1. janet jackson
    2. superbowl halftime
    3. mtv
    4. justin timberlake
    5. tom brady
    6. groundhog day
    7. cbs
    8. oscar nominations
    9. kazuhito tadano
    10. john kerry
  • Re:Plagiarism (Score:1, Informative)

    by dedazo ( 737510 ) on Tuesday February 10, 2004 @02:02AM (#8234807) Journal
    The MSNBC article fully credits the WP. What's your problem?
  • This [sec.gov] might have something to do with it...

    User-agent: *
    Disallow: /Archives
    Disallow: /Archives/bin
    Disallow: /Archives/dev
    Disallow: /Archives/etc
    Disallow: /Archives/ftp
    Disallow: /Archives/gopher
    Disallow: /Archives/tmp
    Disallow: /Archives/usr
    Disallow: /cgi-bin
    Disallow: /bin
    Disallow: /oursite/previews

  • Re:Fuck that shit (Score:2, Informative)

    by devilspgd ( 652955 ) * on Tuesday February 10, 2004 @04:29AM (#8235406) Homepage
    Just wildcard it. Use robots.txt to say that /secretstuff/* should not be indexed, that still won't help the l33t hax0r determine that it's /secretstuff/toodumbtouseapassword/bush-secret-nuke-codes.lnk.exe.pif.scr which is the hidden file to destroy the world.
  • by Tonttoro ( 112872 ) on Tuesday February 10, 2004 @08:11AM (#8236133)
    Maybe you should try a later version of Mozilla. You know the older ones have bugs that are fixed in later ones.
  • by tuxette ( 731067 ) * <tuxette.gmail@com> on Tuesday February 10, 2004 @09:21AM (#8236468) Homepage Journal
    A lot of the personal data that is publicly accessible was not made publicly accessible by the data subject, but by a third person/party.
  • Re:Fuck that shit (Score:3, Informative)

    by saforrest ( 184929 ) on Tuesday February 10, 2004 @12:25PM (#8238396) Journal
    More specifically, it says "Please do not enter my house and steal my jewelry and banknotes which are in the safe in the bottom-right of the bedroom closet."

    Sure, you could do

    Disallow: /house/closet/bottomright/safe/jewelry
    Disallow: /house/closet/bottomright/safe/banknotes

    Or, if you want to be simpler, you could just do

    Disallow: /house/ :)
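The reason the short rule is enough is that Disallow lines are matched as path prefixes, so one broad entry covers everything beneath it without naming any of it. A quick Python sketch using the standard library's urllib.robotparser, with a hypothetical blocked_paths helper and a made-up /garden/shed path for contrast:

```python
from urllib.robotparser import RobotFileParser

# The single broad rule from the comment above.
BROAD_RULE = """\
User-agent: *
Disallow: /house/
"""


def blocked_paths(robots_txt: str, paths, user_agent: str = "*"):
    """Return the subset of `paths` that a polite crawler would skip (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in paths if not parser.can_fetch(user_agent, path)]


PATHS = [
    "/house/closet/bottomright/safe/jewelry",
    "/house/closet/bottomright/safe/banknotes",
    "/garden/shed",  # made-up path outside /house/, for contrast
]

for path in blocked_paths(BROAD_RULE, PATHS):
    print(path)
```

Both safe paths come back as blocked even though the file never mentions the closet or the safe, so the robots.txt itself gives a nosy reader nothing to go on.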
