Data Firm Leaks 48 Million User Profiles it Scraped From Facebook, LinkedIn, Others (zdnet.com) 56
Zack Whittaker, reporting for ZDNet: A little-known data firm was able to build 48 million personal profiles, combining data from sites and social networks like Facebook, LinkedIn, Twitter, and Zillow, among others -- without the users' knowledge or consent. Localblox, a Bellevue, Wash.-based firm, says it "automatically crawls, discovers, extracts, indexes, maps and augments data in a variety of formats from the web and from exchange networks." Since its founding in 2010, the company has focused its collection on publicly accessible data sources, like social networks Facebook, Twitter, and LinkedIn, and real estate site Zillow to name a few, to produce profiles.
But earlier this year, the company left a massive store of profile data on a public but unlisted Amazon S3 storage bucket without a password, allowing anyone to download its contents. The bucket, labeled "lbdumps," contained a file that unpacked to a single file over 1.2 terabytes in size. The file listed 48 million individual records, scraped from public profiles, consolidated, then stitched together.
"Leaked" public data (Score:2)
I'm not sure that word means what you think it means.
Re: (Score:1)
So, I'm not seeing it. Please explain the exploit. Someone knows both my name, address.... and now my twitter handle? Oh... noes?
If your online accounts use your real name... then yes, this could all be exploited to... make a modern day phone book.
Re:"Leaked" public data (Score:4, Insightful)
I'm not sure that word means what you think it means.
the company left a massive store of profile data on a public but unlisted Amazon S3 storage bucket
Cue the Congressional hearing with the 80-year-old Congressman asking why Amazon even allows companies to store anything in these buckets if they have holes, and why they can't just stop the leaks with duct tape.
Re: (Score:2)
Typical conversation at a 2025 back-yard BBQ:
Joe: So I was thinking of maybe taking a trip to Paris ... ... ...no... ...
Bill: Yeah, I know, my google assistant briefed me on you profile on the car ride hear.
Joe: Oh.
Joe: So, have you started to carve any new chainsaw sculptures. I mean after the pengiun, Siri told me about that one already.
Bill: No, still finishing up the penguin.
Bill: Any thoughts on the-
Joe:
Bill: the town referendum?
Joe: Yeah I knew that's what you were asking. Not really.
Bill: Oh.
Joe:
Re: (Score:1)
Typical conversation at a 2025 back-yard BBQ:
Sounds about write. In 2025 we'll be so dumb we misspell words when talking.
Re: (Score:2)
Typical conversation at a 2025 back-yard BBQ:
Sounds about write. In 2025 we'll be so dumb we misspell words when talking.
If you haven't done so already, watch https://www.imdb.com/title/tt0... [imdb.com]
Re: (Score:3)
Same convo between friends who stalk each other and AREN'T douchebags just trying to shut down conversations:
Joe: So I was thinking of maybe taking a trip to Paris
Bill: Yeah, I saw that post. You've got to hit up the Louvre.
(Conversation about Paris ensues)
Joe: So, Siri told me about that penguin, how's it going?
Bill: Still finishing up, want to see it?
Joe: Yes.
(They go to garage)
Bill: Any thoughts on the town referendum?
Joe: No Bill, even in a made up contrived example, nobody wants to talk about town refer
Hmmm (Score:2)
Where's the benefit of locking down this user data? It seems like, if we want to harm scammy companies like this, removing their profit motive by publishing all the (non-copyrighted) data makes sense.
Re: (Score:2)
The data isn't private. This scammy company already scraped it. No doubt many others did too. Allowing them to maintain the secrecy of their data just gives them a profit motive.
Re: (Score:2)
Are you one of those Russian troll things? I always wondered what one of those looked like, and you sure do have all the earmarks.
Did you learn English in high school? Do they make you sit in a cubicle and write stuff in boldface, using as many English swearwords as you can think up?
Gosh. You must be one of the happiest people in the world. Have a nice day. I wonder if Slashdot got your IP address. I doubt they do anything about such rubbish. Oh well.
Re: (Score:2)
Yeah, it kinda does sound like an old time BOFH posting on usenet.
Kids these days.
Re: (Score:1)
What would you do with this data? (Score:2)
I mean, personally, what would you as a typical slashdotter do with this data if you weren't too busy cleaning the I.T. closets?
See who can build the most efficient script to "find Waldo"?
Re: (Score:2)
I mean, personally, what would you as a typical slashdotter do with this data if you weren't too busy cleaning the I.T. closets?
See who can build the most efficient script to "find Waldo"?
Grey hat:
It'd be a great (though ethically questionable) corpus of data for training your AI for whatever sort of prediction data you want.
Black hat:
It'd also be good for political targeting or for finding easily scammable people for spear-phishing or spam cons.
Counter argument (Score:4, Interesting)
The court found that information alone without a minimum of original creativity cannot be protected by copyright. In the case appealed, Feist had copied information from Rural's telephone listings to include in its own, after Rural had refused to license the information. Rural sued for copyright infringement. The Court ruled that information contained in Rural's phone directory was not copyrightable and that therefore no infringement existed.
Re: (Score:2)
Not that I disagree with the ruling about phone listings, but my Facebook profile lists my job as rocket surgeon and most of the other information there is equally fictional. There can be a fair amount of creativity in what would otherwise just be listings of factual data.
Did they sell any to the Republicans? (Score:1, Insightful)
If they sold it to Republicans they need to be dragged before Congress and publicly humiliated, otherwise this is a non-issue.
Re: (Score:2)
Low-effort partisan bashing from an anonymous coward, inexplicably getting upvotes...
Yep, this one tastes like professional shilling. I think someone out there really wants to get this issue cut down along party lines. Good luck with that though, I don't think Democrats OR Republicans are too happy with Facebook over the sort of shit they let happen.
Re: (Score:2)
I just typically down mod it, but I commented in this thread.
Also because that trolling is even lower-grade bullshit. Randomly swearing "FUCK TRUMP" in a thread that has nothing to do with him is just noise. Ever read Anathem by Neal Stephenson? He has a great bit in there about different sorts of propaganda and bullshit and spam. Literal static is the lowest grade, easy to ignore as there's no content there. It's not even bullshit. While top quality bullshit would be an otherwise impeccable scient
Re: (Score:2)
If they sold it to Republicans they need to be dragged before Congress and publicly humiliated, otherwise this is a non-issue.
You jest but data like this should be a liability and treated as such. It would certainly be one if something like the European GDPR were in effect in the US.
Facebook, Equifax and the like should be punished for their so-called "lapses in security".
4 scumbags and a data scientist. (Score:5, Informative)
Here is their publicly available personal info.
http://www.localblox.com/ [localblox.com]
George Fink - CEO/Marketer/Scumbag: https://www.linkedin.com/in/ge... [linkedin.com]
Sabira Arefin - Founder/Entrepreneur(lol)/Scumbag: https://www.linkedin.com/in/sa... [linkedin.com]
Colby Atwood - President/Marketer/Scumbag: https://www.linkedin.com/in/co... [linkedin.com]
Ashfaq Rahman - Chief Data Scientist/Scumbag: https://www.linkedin.com/in/as... [linkedin.com]
it was public data (Score:1)
A little-known data firm was able to build 48 million personal profiles, combining data from sites and social networks like Facebook, LinkedIn, Twitter, and Zillow,
That is data people posted publicly.
Now if they did that with FB's "shadow profiles" of non-users, then maybe I can see a cause for being upset. But if people spew their private data to every advert company on the internet, including the biggest data aggregators out there like FB, G and LinkedIn, they do not have a "reasonable expectation of privacy". That's like publishing your drunken fratboy antics in the New York Times, and then being upset when someone reads about them.
People have to start thinking about w
Zuck Lies (Score:2)
Re: (Score:3)
He doesn't. He sells lists of names that meet criteria. The data itself is too valuable to sell, just once.
Facebook is upset that Cambridge Analytica did what Facebook does. Never throw away data and never miss a chance to collect more.
Is this a witch hunt? (Score:2)
hmmmm, wait a second... *sniffs the smoke* *listens to the chanting mob* *Looks down at the pitchfork in his hands*. Yep. This is a witch-hunt.
Now, don't get me wrong. I honestly despise this particular brand of witch. These guys suck and their actions have a very anti-social bent to them. Their business model is abusive and intrusive. Fuck marketers. I know plenty well enough to protect myself, but "the masses" are just kinda generally dumb and enough are swayable into doing dumb things. Like using emacs
What a world (Score:5, Insightful)
A Canadian kid gets charged with "exploiting a vulnerability", (i.e. incrementing a number in a URL), and faces ten years in prison for archiving the FOI data he collected as a result. He had no idea he was doing anything wrong. (FOI? Hello!). These assclowns scraped data, and created 48 million personal profiles without consent. They knew full well what they were doing. Then they effectively published the data. Careless, much? Arguably they were criminally careless. They probably won't face any penalties at all. Go figure.
Re: (Score:2)
They also donated to the DNC.
Subtle.
But quit trying to shoehorn this into a partisan issue you shitty little shill.
Re: (Score:2)
Well the kid was accessing data that was meant to be secured, if poorly. These researchers are accessing data that is meant to be public.
They can do whatever they like with my LinkedIn data, it was put up there for everyone in the world to see.
Public data was leaked publicly? (Score:2)
Wait, so a company that scraped data from public sources, left the data unsecured, and the public could access it?
personal profiles...from sites and social networks like Facebook, LinkedIn, Twitter, and Zillow, among others -- without the users' knowledge or consent.
Are you telling me that users of social networks do not know that the public part of their profile is available publicly? What? Hey, there are plenty of privacy violations going around, but this isn't one of them. Save your outrage for any one of the many other examples.
Order your own LexisNexis file - you'll be shocked (Score:2, Informative)
Why are so many Amazon buckets public? (Score:2)
Was there a time when Amazon shipped S3 buckets public by default, with permissions wide open to the world? What is it with these S3 buckets.
Last time I set up a public bucket (to share some of my photos with some friends), I had to explicitly set the checkbox, and it came up with a "you can't just walk into Mordor" warning.
So ... (Score:2)
Link? (Score:1)
Barely the tip of the iceberg. (Score:2)
The general public is barely aware of 1, 2 or 3 companies that have collected and used information from public and private sources, and only because of the left-wing faux outrage that Trump was involved with 1 of them.
What are they going to say when they find out it's also LinkedIn and Twitter and every other 'free' service and more collecting/scraping/surveying/using/sharing/selling every shred of collected information to sell more advertising and/or create relationships for their own purposes?
This was so easy to pre