Monday, March 19, 2012

Where is Health Care’s Big Data?

The world of health care is abuzz with heated discussions about health information exchange, data liberation and the beneficial consequences of these actions for all stakeholders who use, deliver, regulate or profit from the one fifth of the U.S. economy devoted to medical care. While the efforts to liquefy health information from its solid paper state to a fluid stream of zeros and ones are hardly new, the release of the Meaningful Use Stage 2 proposed rules has triggered a renewed Pavlovian response to the prospect of having people’s health information flowing freely over the Internet, creating massive amounts of Big Data. Before the frenzy gets way ahead of itself, perhaps we should take a closer look at how health information is created, where it resides, how it is shared and what Meaningful Use Stage 2 is targeting for change. In short, let’s follow the Data…

Health Care Data Creation
Health information about an individual begins accumulating from the moment of birth, which is duly recorded when a tiny new consumer emerges into the world. Over one’s life, encounters with the medical system are carefully recorded and now, most of those encounters will be computerized. The following are the main sources of health data:
  • Medical Care Providers – Hospitals, physicians, pharmacists, nurses, therapists and eventually long term care facilities are recording medical encounter information both for treatment and for financial purposes. The data created by health care providers is very rich in clinical information and also contains a significant amount of socio-economic details. Currently, medical information is scattered amongst all providers of health care encountered during a lifetime, and is mostly maintained on paper. Portions of it may or may not be exchanged in an ad-hoc fashion when people seek care at various medical facilities.
  • Public and Private Payers – For the overwhelming majority of Americans who utilize insurance instruments to finance medical care, a lower fidelity version of health information is being maintained by the payer, mostly in electronic format, tallying ailments, therapies and everything else that has a dollar value associated with it.
  • Ancillary Providers – From pharmacies to laboratories and imaging facilities, ancillary service providers are also maintaining lists of medical services and products provided to people and, when pertinent, clinical results and invoicing information. This information is also largely maintained in electronic format.
  • Personal Health – Although largely confined to a tiny minority of healthy, educated and tech-savvy people, the results of personal monitoring of health indicators are beginning to accumulate in various private information systems. Currently most of this data is created by individuals outside the traditional medical system and is maintained and controlled by new and rather small technology vendors in electronic format.

Health Care Data Whereabouts
Obviously, entities that create health data are also maintaining complete copies of said data for their records. However, large amounts of data are currently being exchanged between facilities of care, payers, diagnostic companies and government agencies, mostly in electronic format. As data moves around, and it does move, new repositories of data also emerge.
  • Health Data Creators – Medical care providers, ancillary services providers and payers of all stripes store, maintain and supposedly own practically all health data created by their various business units and all copies of data created by others and transmitted to them during the course of business. There are significant overlaps between various data creators. For example, while payers do create financial data, all clinical information they possess is created and also stored by others.
  • Public Health Agencies – Registries (e.g. immunizations, cancer) and other regulatory reporting repositories are also storing pieces of information transmitted to them by health data creators, as required by State laws. These repositories vary greatly in availability and capabilities, but most of their information is maintained electronically.
  • Clearinghouses – The facilitators of information exchange, mostly medical claims and payment data, medications, and to a lesser extent laboratory data, are also accumulating copies of whatever information is flowing through their systems in electronic format.
  • Health Information Organizations – A special case of the clearinghouse model, these entities are mostly concerned with facilitating communications among regional health care providers, and in some cases are also undertaking data analysis services on behalf of their clients. As such, these organizations in many instances accumulate some portions of medical records data for some segments of the population in their geographical catchment areas.
  • Technology Vendors – Those who supply electronic means to health data creators, and particularly the vendors who offer their technology in a remote service model, retain full access to their customers’ data. For smaller customers, such as physicians in private practice, technology vendors are contractually reserving rights to make use of health data in a manner consistent with HIPAA regulations.
  • Consumers – Aside from the emerging personal health monitoring devices and applications, a small minority of savvy consumers is also maintaining personal health records separately from their medical services providers, usually in remotely stored and web accessible repositories.

Meaningful Health Information Exchange
While Meaningful Use Stage 1 did not introduce any tangible breakthroughs in health information exchange beyond what was already occurring, the proposed Stage 2 measures are attempting to spur exchange of health data in several ways.
  • Increase of existing exchange – All thresholds for information exchange already in place, such as prescription data and laboratory results data, have been increased.
  • Public Health Reporting – Actual reporting of data to government agencies is now required.
  • Provider-to-provider – Some ad-hoc, point-to-point, standardized exchange of clinical summaries between various health care providers will be required.
  • Provider-to/from-patient – In addition to requiring that physicians and hospitals provide health information to patients in electronic format, Meaningful Use Stage 2 places requirements on patients to contact their doctors by email and to access their electronic medical records online. The proposed rules stop short of mandating that patients actually copy their medical records themselves, or ship a copy to a third party recipient, but we have Stage 3 and onwards to look forward to. Either way, technology must be enabled to allow patients, or authorized entities (family, friends, other software, etc.) to extract copies of health information from the HIPAA covered entities where the data was created.

So where is the Big Data of health care? It is very likely that Meaningful Use Stage 2, along with accelerated Electronic Health Records adoption, will increase the size and number of current health data repositories, as well as the ad-hoc exchange of information between them. Changes in reimbursement models will also spur the slicing and dicing of health data, particularly for the purpose of risk management, but there is nothing in the proposed Meaningful Use Stage 2 rules to suggest wholesale merging of health data repositories, and on their own, none of them could be considered Big Data. What if we endeavor to index the contents of all repositories, per the PCAST model, and allow government, or other legitimate “stakeholders”, access to query every indexed and networked health data repository in the land? Is this the much-touted and feverishly anticipated Big Data of health care? Meh… Something is still missing.

Big Data is not just lots and lots of data. Big Data is indeed big, but it is also very difficult to accommodate in traditional relational database systems. Big Data is very dynamic and changes fast, furiously and continuously, and Big Data is usually a combination of multiple unrelated sources; it is messy, inaccurate and needs lots of cleaning, processing and culling before it can be used. Well, we all know health care data is a royal mess, but other than that it has no other characteristics of Big Data. Some forward thinking clinical researchers, concerned with the ability of “decision makers” to make decisions for us, are suggesting that we make the data bigger by adding patient reported “psychosocial issues and health behavior”, which are usually outside the realm of medical care. They boldly envision doctors administering questionnaires to their patients to enrich the data sets obtainable by tying together “millions of encounters in real-world settings”. Not sure how you do that over disparate repositories without fully identifying each patient, but those are just details, and surely patients will see the value in allowing decision makers to assess our quality of life for comparative-effectiveness purposes. Anyway, adding some very patient-centered fields, quantifying your happiness or sadness, to the medical record is not going to elevate health care data to the status of Big Data. In order for health care data to become truly Big Data, it will have to be combined with other data sources, such as social media data, Internet search data, financial data, census data, shopping data, cell phone data, and the list goes on. And herein lies the Big excitement.

Of course, once we throw health care data into the boiling cauldron of Big Data, many wondrous things can emerge along with some nightmarish ones too. The only (legal) tool for extracting health care data from the HIPAA covered ecosystem today is the patient, because the patient is the only one who can legally “transmit” data from a covered entity to one that is not hampered by such details, like the various standalone Personal Health Records provided by private companies and all sorts of other health and wellness tools built by eager entrepreneurs. Once those shiny new “Transmit” buttons are added to patient portals, per Meaningful Use Stage 2 rules, there will no doubt be dozens of tools competing for “informed permission” from patients to crawl health care providers’ portals and auto extract everything possible and aggregate it “on behalf” of the patient. Will patients cooperate and donate personal health records to the Big Data enterprise? Perhaps a few, but not enough to make the bells ring. There will need to be a few more rounds of rulemaking before patient data can be truly liberated from those antiquated HIPAA protections, and properly used to build a few fortunes and to improve the quality and accuracy of the coupons we receive in the mail.

