Robust De-anonymization of Large Sparse Datasets

Arvind Narayanan and Vitaly Shmatikov
The University of Texas at Austin

Abstract

We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary’s background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world’s largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber’s record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.

1 Introduction

Datasets containing micro-data, that is, information about specific individuals, are increasingly becoming public in response to “open...