After all the hours spent in HIPAA training over the years, physicians and other health care workers might think of HIPAA as a powerful regulation. It’s true that HIPAA requires health care workers to follow a number of rules, with harsh penalties for violations. But from a patient’s perspective, how well does U.S. law protect health information privacy overall? Unfortunately, not very well, and things are getting worse.
HIPAA itself was enacted in 1996, and its privacy rule took effect in the early 2000s. Back then, most individuals’ health care data still took the form of paper-based medical records maintained by hospitals and clinics. Artificial intelligence and large-scale big data analytics had yet to emerge in their modern forms. Surveillance capitalism, the business model by which technology companies compile detailed profiles of their users to support ad targeting, was just getting started. So it is understandable that HIPAA was written to address the privacy risks of that earlier era rather than those of today. Three glaring deficiencies make HIPAA increasingly weak: its narrow definition of covered entities, its de-identification loophole, and its focus on disclosures rather than on the downstream uses of data.
In 2012, the retailer Target made national news for inadvertently outing a teenage girl to her parents as pregnant. Target (more precisely, the software that decided which customers to mail baby-supply coupons) had learned of the pregnancy not by obtaining the girl’s medical records but by analyzing her shopping patterns. Pregnancy is undeniably a health condition, and one for which a patient might expect protection under HIPAA. That expectation would be disappointed. HIPAA regulates the release of personal health information only by health care workers and organizations, health insurers, and insurance claim clearinghouses, which are collectively referred to as covered entities. When non-covered entities such as Target, pharmaceutical companies, social media giants, or even many web-based health information companies obtain personal health data through sources other than traditional medical records, HIPAA does not apply to them.
HIPAA’s second glaring weakness is the de-identification loophole. When certain identifiers are removed from a collection of clinical data (HIPAA’s “Safe Harbor” standard enumerates 18 of them, including names, dates, and small geographic areas), the data are no longer subject to HIPAA and can be legally shared or even sold to other organizations. (Disclosure: I’ve used de-identified health data in some of my own academic research projects.) There is also a large data broker industry that purchases de-identified medical records from hospitals and commercial laboratories and resells them to pharmaceutical companies and other customers.
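To make the mechanics concrete, here is a minimal Python sketch of Safe Harbor-style de-identification. The record fields and values are invented for illustration; a real pipeline would handle many more identifier types (Social Security numbers, phone numbers, device IDs, and so on).

```python
# A minimal sketch of HIPAA Safe Harbor-style de-identification.
# All field names and values here are hypothetical.

RECORD = {
    "name": "Jane Doe",
    "birth_date": "1987-06-14",
    "zip_code": "84103",
    "state": "UT",
    "diagnosis_code": "O09.90",  # clinical content is retained
}

def deidentify(record: dict) -> dict:
    out = dict(record)
    out.pop("name")                                  # drop direct identifiers
    out["birth_year"] = out.pop("birth_date")[:4]    # coarsen dates to year only
    out["zip3"] = out.pop("zip_code")[:3] + "00"     # truncate ZIP to 3 digits
    return out

print(deidentify(RECORD))
# {'state': 'UT', 'diagnosis_code': 'O09.90', 'birth_year': '1987', 'zip3': '84100'}
```

Note what survives: the clinical content, plus coarsened demographics. That residue is exactly what makes the next problem possible.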
The problem with de-identification is that while it creates the appearance of anonymity, it doesn’t actually make the data anonymous. If you take a de-identified data set and cross-reference it against other data sets containing information about the same individuals, it is often possible to re-identify the people in the first data set. Probabilistic matching adds further power, and matches don’t need to be 100 percent reliable to serve business goals such as targeted advertising. Re-identification of previously de-identified medical records isn’t just a theoretical risk. A recent investigation by STAT found that Quintiles, a contract research organization, and Truven Health Analytics, a health care data broker, had successfully linked de-identified medical records from millions of patients (obtained from MedicaLogic, then a subsidiary of General Electric) to an insurance claim database, allowing re-identification with a reported accuracy of 95 percent.
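A toy sketch of how such a linkage attack works, using invented data: a “de-identified” record is matched back to a named individual by joining on quasi-identifiers that both data sets share. Real attacks, like the one STAT described, typically use probabilistic rather than exact matching, but the principle is the same.

```python
# Toy illustration (invented data) of a linkage attack: join a de-identified
# clinical record to an identified data set on shared quasi-identifiers.

deidentified_records = [
    {"birth_year": "1987", "sex": "F", "zip3": "84100", "diagnosis": "O09.90"},
]

identified_records = [  # e.g., a marketing list or insurance claim database
    {"name": "Jane Doe", "birth_year": "1987", "sex": "F", "zip3": "84100"},
    {"name": "John Roe", "birth_year": "1990", "sex": "M", "zip3": "84100"},
]

QUASI_IDENTIFIERS = ("birth_year", "sex", "zip3")

def key(record: dict) -> tuple:
    """Build a join key from the quasi-identifiers."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)

lookup = {key(r): r["name"] for r in identified_records}

for r in deidentified_records:
    match = lookup.get(key(r))
    if match:
        print(f"{match} re-identified; diagnosis {r['diagnosis']}")
# Jane Doe re-identified; diagnosis O09.90
```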
The third problem with HIPAA is that while it penalizes certain types of inappropriate data sharing, it does little to distinguish among subsequent uses of data. More people are comfortable with their data being used for academic research, for example, than for commercial uses such as targeted advertising. Still more nefarious uses, such as employment or insurance plan discrimination, have become increasingly feasible technically and hard to detect. Given this potential for harm, and the relative ease of concealing discriminatory decision-making inside artificial intelligence algorithms, privacy law ought to place particularly strong restrictions on commercial uses of health data. Instead, commercial uses are less heavily regulated in the U.S. than academic research, because the latter is at least subject to a separate set of federal laws governing human subjects research.
Clearly, health privacy law needs to be modernized, and modernization could bring benefits beyond individual privacy. If the public trusts that their health data won’t be misused, they may be more open to health data aggregation for academic and public health purposes. Consider 2020, when the fragmented U.S. health care system struggled to gather reliable statistics on COVID-19 infections and therapeutic outcomes. Much better data were coming out of the United Kingdom during that time, even though the U.K. has only about a fifth as many people as the U.S. The U.K. has national health identity numbers (which the U.S. has banned due to privacy concerns) and central health data aggregation. It also has the Data Protection Act 2018, the U.K. implementation of the European Union’s General Data Protection Regulation. In a democracy, public data aggregation is only sustainable when paired with strong data privacy protections such as these.
Health care data includes the most private details of our lives. Americans want and deserve laws that put control of personal health information in the hands of patients, not corporations.
Brian R. Jackson is a pathologist.