Big Data in health care has the potential to drastically improve patient-level insights when combined with personalized patient data. Understanding a patient’s whole health journey is critical to conducting impactful research, making the right diagnosis, supporting clinical decisions, and securing health equity for individuals and communities; however, actionable insights from Big Data can prove elusive. To harness the full potential of Big Data, stakeholders are turning to next-generation data tokenization to enable the blending of real-world data (RWD) sources.
The core aim of tokenization is to put data to work in a more holistic context through deep patient linking and de-identification across a wide array of datasets, while ensuring security and preserving patient privacy. By using the tokens generated by matching and tokenization processes instead of identified data, the effort, cost, and time to match, integrate, and process datasets can be reduced without compromising the security of that data.
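To make that concrete, here is a minimal sketch (in Python, with illustrative field names and a made-up secret key) of how a deterministic token might be derived from normalized identifiers, so two datasets can be joined on the token rather than on the identifiers themselves:

```python
import hashlib
import hmac

# Hypothetical shared secret held only by the tokenization engine,
# never by the data recipients; the value here is purely illustrative.
TOKEN_KEY = b"example-secret-key"

def normalize(value: str) -> str:
    """Collapse case and extra whitespace so trivial variations hash identically."""
    return " ".join(value.strip().lower().split())

def make_token(first: str, last: str, dob: str) -> str:
    """Derive a deterministic, irreversible token from identifying fields."""
    message = "|".join(normalize(v) for v in (first, last, dob)).encode()
    return hmac.new(TOKEN_KEY, message, hashlib.sha256).hexdigest()

# The same person produces the same token in a claims file and an EMR extract,
# so the two can be joined on the token without exchanging names or birth dates.
assert make_token("Jane", "Doe", "1980-01-01") == make_token(" JANE ", "doe", "1980-01-01")
```

Production tokenization engines are considerably more sophisticated than this, but the principle is the same: the token travels with the data; the identifiers do not.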
Historically, the health care industry has tolerated abysmally low tokenization match rates. Listen folks, it’s 2023! Don’t you remember? We were all supposed to be driving flying cars by now! The least we can expect is to no longer have to battle to match patients with their data. The good news is we no longer need to accept poor match rates.
Referential matching has proven to be the lynchpin in improving match rates across longitudinal datasets. Instead of attempting exact matches on two rigid sets of data to link patients, referential matching creates hashes from all instances of a patient found across historical datasets. You are probably asking yourself, “Self, so what does this mean practically?”
Well, practically, hashes are recipes that can identify patients across the historical variability of their name and demographic record. Hashes can include demographic information (like first and last name, date of birth), geographic information (like address or treatment location), and even clinical data.
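As a rough illustration, assuming purely hypothetical field names, a single record might yield several recipe hashes, each mixing a different combination of those facets, so a record whose surname or address has changed can still collide with its older instances on at least one recipe:

```python
import hashlib

# Illustrative recipe definitions; real models use many more, carefully chosen fields.
RECIPES = {
    "name_dob": ("first_name", "last_name", "dob"),
    "initial_dob_zip": ("first_initial", "dob", "zip"),
    "name_address": ("first_name", "last_name", "street", "zip"),
}

def recipe_hashes(record: dict) -> dict:
    """Build one hash per recipe so a changed surname or address can still
    collide with older instances of the same patient on another recipe."""
    rec = dict(record)
    rec["first_initial"] = rec.get("first_name", "")[:1]
    hashes = {}
    for name, fields in RECIPES.items():
        parts = [str(rec.get(f, "")).strip().lower() for f in fields]
        hashes[name] = hashlib.sha256("|".join(parts).encode()).hexdigest()
    return hashes

before_adoption = {"first_name": "Jane", "last_name": "Doe",
                   "dob": "1980-01-01", "street": "12 Main St", "zip": "30301"}
after_adoption = dict(before_adoption, last_name="Roe")
# The surname change breaks two recipes, but "initial_dob_zip" still matches.
shared = set(recipe_hashes(before_adoption).values()) & set(recipe_hashes(after_adoption).values())
print(len(shared))  # 1
```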
For instance, in my own history, my last name changed when I was adopted, the last name of my adoptive father is regularly misspelled, my middle name is occasionally abbreviated or excluded, and in some records only my first initial appears. I have lived in four states across a dozen addresses and have had health care delivered across those instances of my demographic history through different institutions under different insurers. A legacy master data management (matching) model would struggle to identify these instances of me spread across time and data sources (such as claims, EMR, and social determinants data).
A referential match model can contain many weighted recipes that mix various combinations of these facets of patient records and utilize advanced AI/ML matching techniques (and old-school methods like Soundex, which can match names that sound alike or are commonly misspelled). The matching engine then ranks candidate matches by the weights and results of these recipes, allowing for nuanced scoring and usage. This support for multi-recipe matching, as well as a large referential dataset to match against (an extensive history of multiple patient instances), can sharply increase matching rates, making tokenization much more valuable for data enrichment.
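A highly simplified sketch of that idea, with hand-picked weights and only a handful of recipes, might look like this (real engines carry many more recipes, tuned weights, and ML-derived similarity features):

```python
def soundex(name: str) -> str:
    """Classic Soundex: encode a surname by how it sounds rather than how it is spelled."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return "0000"
    encoded, prev = name[0], codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "HW":            # H and W do not break a run of equal codes
            prev = code
    return (encoded + "000")[:4]

# Illustrative weighted recipes, each a rule over a different mix of record facets.
RECIPES = [
    (0.5, lambda c, r: c["last_name"].lower() == r["last_name"].lower()),
    (0.3, lambda c, r: soundex(c["last_name"]) == soundex(r["last_name"])),
    (0.4, lambda c, r: c["dob"] == r["dob"]),
    (0.2, lambda c, r: c["zip"] == r["zip"]),
]

def score(candidate: dict, reference: dict) -> float:
    """Sum the weights of every recipe the candidate satisfies against a reference instance."""
    return sum(w for w, matches in RECIPES if matches(candidate, reference))

# Rank historical instances in the referential store against an incoming record.
incoming = {"last_name": "Robinson", "dob": "1980-01-01", "zip": "30301"}
referential_store = [
    {"last_name": "Robinsen", "dob": "1980-01-01", "zip": "30301"},  # misspelled surname
    {"last_name": "Smith", "dob": "1975-06-15", "zip": "02139"},
]
ranked = sorted(referential_store, key=lambda r: score(incoming, r), reverse=True)
print(ranked[0]["last_name"], round(score(incoming, ranked[0]), 2))  # Robinsen 0.9
```

Note how the Soundex recipe still fires on the misspelled surname, so the misspelled instance ranks first even though an exact-match rule would have missed it.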
Contextualizing care
“So what?” you might ask. Well, the reality is these advancing methods, combined with cloud scale, are now making real the promise of combining layers of clinical data, social and community determinants, medical claims, mortality information, and more into a single longitudinal patient record at scale and speed.
For example, a provider can link a patient’s demographics, social determinants, and medical claims to help drive increased medication adherence for at-risk patients. This integrative power creates a more holistic picture of the individual, arming providers with actionable insights to provide care that produces the best possible outcomes.
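A toy sketch of that kind of join, assuming three de-identified extracts keyed by the same token and entirely made-up fields and thresholds, might look like this:

```python
# Three de-identified extracts keyed by the same patient token
# (tokens, field names, and the flagging rule here are purely illustrative).
demographics = {"tok_7f3a": {"age_band": "60-69", "sex": "F"}}
sdoh = {"tok_7f3a": {"transport_access": "limited", "food_insecurity": True}}
claims = {"tok_7f3a": {"rx_fills_90d": 1, "chronic_med_count": 3}}

def patient_view(token: str) -> dict:
    """Assemble a longitudinal view by joining the extracts on the token alone."""
    view = {"token": token}
    for source in (demographics, sdoh, claims):
        view.update(source.get(token, {}))
    return view

def adherence_outreach_candidate(view: dict) -> bool:
    """Flag patients whose fill history plus SDoH context suggests adherence risk."""
    under_filled = view.get("rx_fills_90d", 0) < view.get("chronic_med_count", 0)
    sdoh_barrier = view.get("transport_access") == "limited" or view.get("food_insecurity", False)
    return under_filled and sdoh_barrier

print(adherence_outreach_candidate(patient_view("tok_7f3a")))  # True: a candidate for outreach
```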
Beyond enhancing individual-level insights, tokenization also extends its reach to underserved populations that may have been overlooked in the past. Large-scale social determinants datasets are now shining a greater light on geography-based outcomes, where your zip code is more telling of your health outcomes than your genealogy. By integrating RWD from various sources, tokenization breaks down data silos and provides a more complete picture for population health and care management programs, and it is starting to drive critical social determinants of health (SDoH) interventions that can be scaled and automated just as easily as more traditional measures.
These changes to data integration have significant implications for clinical research and precision medicine. Tokenized data can help studies reach their endpoints more quickly, shortening study durations and reducing the collection burden on overloaded providers. This reduced burden can also help drive higher participation from overburdened community health institutions.
Additionally, the ability to aggregate datasets and make more nuanced inferences can help create more inclusive clinical trials. Researchers can leverage tokenization, along with cleaner identification, trial-matching algorithms, and large-scale matching databases, to include more diverse populations in trials without fear of patient data leakage or of losing track of patients who receive care at other facilities. This, in turn, leads to research findings that are more representative and applicable to real-world patient populations.
Balancing privacy with insight
Privacy is a key aspect and one of the main reasons tokenization is used, but the true value lies in its ability to balance privacy protection with expanded real-world insight. Other methods, such as the HIPAA Safe Harbor provision, while effective for data privacy, often exclude the very identifiers that are critically important for SDoH-related context and evaluation. Tokenization can enable the linking of data across different health systems and data types while protecting sensitive and identifiable information. This method of de-identification provides a more robust approach and ensures privacy without sacrificing valuable context and demographic information that can be used for data-driven decision support and downstream processes.
Furthermore, tokenization enables individual-level insights to be easily aggregated, uncovering meaningful population-level data for researchers who, in turn, can gain a deeper understanding of a disease or treatment’s impact on a patient population through AI, ML, and traditional analytics. All of this can be accomplished while still maintaining expertly determined privacy – and your sanity.
The era of tokenization
While data tokenization is not new, most processes do not leverage the full power and potential of tokenized data. The true opportunities to improve patient outcomes and transform the understanding of health and disease using next-generation tokenization will be driven by more effective matching, powerful hash and token technology, and critical referential data stores. By leveraging the power of Big Health Data while preserving privacy, tokenization can offer health care leaders a future-proof tool to execute research, create solutions, personalize treatment plans, and champion policies that better serve everyone.
Adam Mariano is a health care executive.