Is de-identification of personal records possible?

Last month Harvard Magazine ran a fantastic article on privacy in the current era, focusing in particular on the work of researcher Latanya Sweeney, who has demonstrated a somewhat alarming ability to take personal data that has been de-identified in accordance with current technical standards and “re-identify” it using publicly available data sources. Then last week the New York Times reported on two computer scientists at UT-Austin who had great success identifying individuals in de-identified movie rental records, which Netflix had released as part of a competition to improve the video rental-by-mail firm’s automated recommendation software. Netflix went so far as to deny that anyone in the data it provided could be positively identified, citing measures the company had taken to alter the data, and compared its de-identification measures to the standards for anonymizing personal health information.

While it may be a bit of a leap to extrapolate the Texas researchers’ results to the health information domain, privacy advocates appear to have reason for concern. De-identified health record information is frequently made available to industry, government, and research organizations, and many governing authorities seem not to understand just how feasible it is to correlate these anonymous records with other available sets of personal information. Together, these two factors seem to be imparting a false sense of security around de-identification in general. As more attention is focused on this area of research, it may well turn out that current standards for de-identification simply cannot provide the sort of privacy protection they are intended to deliver.
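
To make the idea of correlating anonymous records with other data sets concrete, here is a minimal Python sketch of a linkage on shared fields. Everything in it is an assumption made for illustration: the records, the names, and the choice of ZIP code, birth date, and sex as the linking fields are all invented, and this is not a reconstruction of the method used by any of the researchers mentioned above.

```python
from collections import defaultdict

# Hypothetical "de-identified" records: names removed, other fields kept.
deidentified = [
    {"zip": "02138", "birth_date": "1950-07-31", "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1962-03-14", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_date": "1962-03-14", "sex": "M", "diagnosis": "hypertension"},
]

# Hypothetical public data set (for example, a voter roll) that includes names.
public_records = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1950-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1962-03-14", "sex": "M"},
    {"name": "Carol Example", "zip": "02140", "birth_date": "1975-11-02", "sex": "F"},
]

# Fields assumed to appear in both data sets.
LINK_FIELDS = ("zip", "birth_date", "sex")


def link_key(record):
    """Build the tuple of shared field values used to join the two data sets."""
    return tuple(record[field] for field in LINK_FIELDS)


def reidentify(deidentified, public_records):
    """Return (name, diagnosis) pairs where the link is unique on both sides."""
    names_by_key = defaultdict(list)
    for person in public_records:
        names_by_key[link_key(person)].append(person["name"])

    # Count how many de-identified records share each key.
    record_counts = defaultdict(int)
    for record in deidentified:
        record_counts[link_key(record)] += 1

    matches = []
    for record in deidentified:
        key = link_key(record)
        names = names_by_key.get(key, [])
        # Only claim a match when exactly one person and one record share the key.
        if len(names) == 1 and record_counts[key] == 1:
            matches.append((names[0], record["diagnosis"]))
    return matches


matches = reidentify(deidentified, public_records)
for name, diagnosis in matches:
    print(f"{name}: {diagnosis}")
```

In this toy data only one record is linked to a name. The two records that share the same ZIP code, birth date, and sex are left alone because the match is ambiguous. The point of the sketch is simply that removing names does little when the remaining fields are distinctive enough to line up with a second data set.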