Robust De-anonymization of Large Sparse Datasets

Arvind Narayanan and Vitaly Shmatikov
The University of Texas at Austin

Abstract

We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary’s background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.

1 Introduction

Datasets containing micro-data, that is, information about specific individuals, are increasingly becoming public in response to “open...