Security Media Movies Privacy Your Rights Online

Anonymity of Netflix Prize Dataset Broken

KentuckyFC writes "The anonymity of the Netflix Prize dataset has been broken by a pair of computer scientists from the University of Texas, according to a report from the physics arXiv blog. It turns out that an individual's set of ratings and the dates on which they were made are pretty unique, particularly if the ratings involve films outside the 100 most popular movies. So it's straightforward to find a match by comparing the anonymized data against publicly available ratings on the Internet Movie Database (IMDb) (abstract on the physics arXiv). The researchers used this method to find how individuals on the IMDb privately rated films on Netflix, in the process possibly working out their political affiliation, sexual preferences and a number of other personal details."
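
A minimal sketch of the matching idea described above, using made-up toy data. Each anonymized record is compared against each public profile; a movie counts as agreeing when the star ratings differ by at most one and the dates by at most 14 days, and each agreeing movie is weighted by 1 / log(1 + support), where support is how many users rated it, so rarely rated films count for much more than top-100 titles. The thresholds, the weighting and all names (match_score, support, imdb_user_a, ...) are illustrative assumptions, similar in spirit to but not a reproduction of the researchers' algorithm.

```python
import math
from datetime import date

# Hypothetical toy data: movie title -> number of users who rated it ("support").
support = {"Popular Blockbuster": 200_000, "Obscure Documentary": 40, "Cult Anime": 300}

# Hypothetical anonymized record: movie title -> (stars, date rated).
anon_record = {
    "Popular Blockbuster": (4, date(2005, 3, 1)),
    "Obscure Documentary": (5, date(2005, 3, 10)),
    "Cult Anime": (2, date(2005, 4, 2)),
}

# Hypothetical public, named profiles in the same shape.
public_profiles = {
    "imdb_user_a": {"Popular Blockbuster": (4, date(2005, 3, 3))},
    "imdb_user_b": {
        "Popular Blockbuster": (5, date(2005, 6, 1)),
        "Obscure Documentary": (5, date(2005, 3, 12)),
        "Cult Anime": (2, date(2005, 4, 1)),
    },
}


def match_score(anon, public, support, max_star_diff=1, max_days=14):
    """Weighted count of movies both records rated with similar stars and dates.

    Each agreeing movie adds 1 / log(1 + support), so a rarely rated film
    contributes far more evidence than a blockbuster everyone has seen.
    """
    score = 0.0
    for movie, (stars, when) in anon.items():
        if movie not in public:
            continue
        pub_stars, pub_when = public[movie]
        close_stars = abs(stars - pub_stars) <= max_star_diff
        close_dates = abs((when - pub_when).days) <= max_days
        if close_stars and close_dates:
            score += 1.0 / math.log(1 + support[movie])
    return score


scores = {name: match_score(anon_record, prof, support) for name, prof in public_profiles.items()}
best = max(scores, key=scores.get)
print(best)  # prints imdb_user_b
```

Here imdb_user_a agrees only on the blockbuster, which nearly everyone rated, so it scores well below imdb_user_b, which agrees on two rarely rated titles (its blockbuster rating is three months off and does not count).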

  • by CastrTroy ( 595695 ) on Tuesday November 27, 2007 @10:36AM (#21491815)
    Seems like it was only broken because the identity of the people was posted somewhere else, along with the ratings. My only question is how they connected the rankings on Netflix to the rankings on IMDB. Does Netflix take the liberty of submitting all the users' rankings to IMDB for them, and also include their names with this data? If you just have anonymous dataset A and anonymous dataset B, you could match up users from both and figure out which person in A is the same person in B, but you still wouldn't know who the person is. However, if dataset B is not anonymous, then it's not too difficult to compare movie ratings and find out who the people are.
  • did it work? (Score:3, Interesting)

    by Speare ( 84249 ) on Tuesday November 27, 2007 @10:39AM (#21491859)

    The researchers used this method to find how individuals on the IMDb privately rated films on Netflix, in the process possibly working out their political affiliation, sexual preferences and a number of other personal details

    {tongueincheek}Yeah, but the question is, will knowing those personal facts generate better movie recommendations?{/tongueincheek}

    When there's a significant prize at stake, researchers can try all sorts of slimy tricks to win. (I'm not saying that's the motive behind this report, but there are many "researchers" going for the prize.) And when there are significant profits at stake, a corporation will damn-fire-certainly use whatever means they can to maximize those profits, regardless of whether it might be "ethical."

  • by Anonymous Coward on Tuesday November 27, 2007 @10:54AM (#21492005)
    There are two things going on here. One, many people are asking how you could identify any personal information about people based on their movie preferences. The answer is data-mining. Very sophisticated techniques exist to do things exactly like this, i.e. take a data set and find out about the people.

    The second problem is that by deanonymizing the Netflix data, you can start to cheat on the Netflix Prize. The requirement to win $1 million is that your recommendation engine is 10% better than the one they are currently using. However, if you can learn the exact preferences of some users in the dataset (i.e. by finding the rest of their ratings on IMDB), then you can hardcode them into your recommendation engine and get the recommendations for these users exactly right. This can boost your score even though your actual system is no better than the existing one. This is known as over-fitting to the data.

    Finally, this paper is over a year old. Can we please have some new news?
  • by styryx ( 952942 ) on Tuesday November 27, 2007 @11:01AM (#21492083)
    That's the plot of Hudson Hawk. Good flick.
  • by xtracto ( 837672 ) on Tuesday November 27, 2007 @11:58AM (#21492875)
    One example, Schindler's List, great movie, do NOT want to see it again. Same with Grave of the Fireflies. Some movies just ain't for multiple viewings. They are my "favorite movies I never want to see again".

    Just out of curiosity, why don't you want to see those films again? Both of them are really good films, and although I would not watch them every weekend (as I might, for example, Sin City), I enjoy watching them from time to time. The plot is interesting, the photography/drawing is nice and the screenwriting is well done.

    I find it difficult to understand your statement "favorite movies I never want to see again": if you do not want to see them again, then you do not enjoy watching them... unless you dislike enjoyment and only watch films that make you cry or have a bad time (I would suggest United 93... the worst film I have seen in a looong long time... or Brokeback Mountain, a 1-hour Marlboro Country ad).

    I do not know about the Netflix scoring algorithm, but I have found criticker.com quite reliable for my tastes.

    Am I insane in thinking that you can see a movie as a great artwork and still not like it, or vice versa?
    It might be akin to the "La Gioconda" painting. Everybody says it is the best piece of art of all time, yet after having seen it *twice* in person at the Louvre, I have yet to find anything special about it (I prefer, for example, paintings by Giovanni Panini, who is relatively unknown).
  • Re:Probabilities (Score:3, Interesting)

    by coolGuyZak ( 844482 ) on Tuesday November 27, 2007 @12:10PM (#21493049)
    Some tech-savvy households may enable profiles on Netflix, letting each person track their likes and dislikes independently. (I did this for my GF, whose tastes are wildly different from mine.) I'm not sure what effect that would have on the data. It'd certainly be neat if the scientists could tell whether a profile is used by one person or shared by several.
  • by phobos13013 ( 813040 ) on Tuesday November 27, 2007 @12:15PM (#21493125)
    Actually TFA seems to suggest that the more obscure and pretentious we are, the easier it is to track us. If we become homogeneous drones voting on the top 100 films, we are safe! Even so, I don't plan to become a homogeneous drone...
  • by SmallFurryCreature ( 593017 ) on Tuesday November 27, 2007 @01:11PM (#21493877)

    The comment "favorite movie I never want to see again" is one I got from a review of Grave of the Fireflies that I happened to agree with completely. Don't read the reviews; just watch it yourself, and if you are not into anime, set that aside for the duration of the movie. Then ask yourself again whether you can understand that comment.

    It is a powerful movie, like Schindler's List, but not a happy tale. I am not talking about a tear-jerker here; I am talking about a "we will all burn in hell for this" movie. Tear-jerkers I can take; Christmas in August is one. Sad tale, nicely told, but ultimately human. It makes you sad, not sick of humanity.

    Perhaps I am just too emotional about this kind of stuff. One reason might be that I grew up with half-understood tales of "that was where your great-uncle was picked up", until you realize just why your grandmother had 9 brothers and sisters yet you never met any of them. I have one aunt; my grandparents had 3 kids. A starvation story like GotF hits a lot closer with a history like that (the Dutch Hunger Winter).

    I enjoy all kinds of movies and would NOT have wanted to miss these two, but that doesn't mean I want to see them again. Some people list Schindler's List as a feel-good movie because it 'ends well'. I suppose you might see it that way; I don't.

    I can recognize your point that the photography is nice and the screenwriting is well done, but the plot is interesting? To you it is a plot; to me it is a sickening part of history that I am far too close to.

    Perhaps it is a bit like how Richard Pryor's monologue about the US Bicentennial was not exactly all that cheerful.

    Terry Pratchett's Nanny Ogg at one point describes the difference between merry and mirth (or something like that): she says she was joyful when her child was being born, but she wasn't exactly chuckling at the time. Enjoying a movie and enjoying it are two different things, at least for me. I can't describe it any clearer.

  • by AmiMoJo ( 196126 ) on Wednesday November 28, 2007 @10:20AM (#21504043)
    None of the mainstream media picked up on it, but I remember thinking this sort of thing might be possible with the data lost by HMRC too. I bet Tesco would love to get their hands on it for planning where to put new stores and what to stock etc. Combined with their Clubcard database, of course.
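
CastrTroy's comment above points out that matching two anonymous datasets only shows that two records belong to the same person; a name comes out only when dataset B is identified. A minimal sketch of that linking step, assuming toy data where each user is a set of (movie, stars) pairs and similarity is plain Jaccard overlap (a swapped-in measure, not the one from the paper). A link is accepted only when the best candidate beats the runner-up by a hypothetical margin, so ambiguous records stay unlinked. All names and data here are made up.

```python
# Hypothetical dataset A: anonymous IDs -> set of (movie, stars) pairs.
dataset_a = {
    "anon_17": {("Brazil", 5), ("Stalker", 4), ("Hudson Hawk", 2)},
    "anon_42": {("Titanic", 3), ("Shrek", 4)},
}

# Hypothetical dataset B: public profiles posted under real names.
dataset_b = {
    "Jane Doe": {("Brazil", 5), ("Stalker", 4), ("Titanic", 5)},
    "John Roe": {("Titanic", 3), ("Shrek", 4), ("Hudson Hawk", 2)},
    "Ann Poe": {("Titanic", 3), ("Shrek", 4), ("Cars", 1)},
}


def jaccard(x, y):
    """|x & y| / |x | y|, defined as 0.0 for two empty sets."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def link(anon_data, named_data, margin=0.2):
    """Attach a name to each anonymous ID only when the best match clearly wins."""
    links = {}
    for anon_id, ratings in anon_data.items():
        ranked = sorted(
            ((jaccard(ratings, profile), name) for name, profile in named_data.items()),
            reverse=True,
        )
        if not ranked:
            continue  # no named profiles to compare against
        best_score, best_name = ranked[0]
        runner_up = ranked[1][0] if len(ranked) > 1 else 0.0
        if best_score - runner_up >= margin:
            links[anon_id] = best_name
    return links


print(link(dataset_a, dataset_b))  # prints {'anon_17': 'Jane Doe'}
```

anon_42 overlaps John Roe and Ann Poe equally (2/3 each), so it is left unlinked, while anon_17 clearly matches Jane Doe (0.5 against a runner-up of 0.2).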

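The Anonymous Coward comment above argues that leaked ratings could inflate a Netflix Prize score without improving the model. Submissions were judged by root-mean-square error (RMSE) on held-out ratings, so a toy illustration with made-up numbers shows the effect: a model that predicts one constant average for everyone, then the same model with one user's true ratings pasted in. The data and the "leak" are hypothetical.

```python
import math


def rmse(predictions, truth):
    """Root-mean-square error over every (user, movie) key in truth."""
    errors = [(predictions[key] - truth[key]) ** 2 for key in truth]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical held-out ratings: (user, movie) -> true stars.
truth = {
    ("u1", "m1"): 5, ("u1", "m2"): 1,
    ("u2", "m1"): 4, ("u2", "m3"): 2,
    ("u3", "m2"): 3, ("u3", "m3"): 5,
}

# A deliberately dull "model": predict the same average for everything.
average = 3.5
honest = {key: average for key in truth}

# Suppose u1's real ratings were found on a public site and hardcoded.
leaked = {key: stars for key, stars in truth.items() if key[0] == "u1"}
cheating = {**honest, **leaked}  # same model, leaked answers pasted over it

print(round(rmse(honest, truth), 3), round(rmse(cheating, truth), 3))  # prints 1.5 0.913
```

The RMSE falls from 1.5 to about 0.913, roughly a 39% "improvement", although the underlying model has not changed at all.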