Sunday, October 9, 2011

The Rise of Big Data

Health care is in the process of getting itself computerized. Fashionably late to the party, it is making a big entrance into the information age, because it is well positioned to become a big player in the ongoing Big Data game. In case you haven’t noticed, computerized health care, which used to be the realm of obscure and mostly small companies, is now attracting interest from household names such as IBM, Google, AT&T, Verizon and Microsoft, just to name a few. The amount and quality of Big Data that health care can bring to the table is tremendous, and it complements the business activities of many large technology players. We all know about paper charts currently being transformed via electronic medical records to computerized data, but what exactly is Big Data? Is it lots and lots of data? Yes, but that’s not all it is.

Americans live for approximately 78 years. They see a doctor about 4 times per year and spend on average 0.6 days each year in a hospital. Keeping a lifetime record of blood pressure readings for all Americans, including metadata (date/time of reading, who recorded the measure and where, etc.), takes approximately 6 TB (terabytes) of storage space, or about 12 laptops with standard 600 GB hard drives. Not too big. What if we start using wearable mobile devices to quantify ourselves, as some folks already do, and we record blood pressure, say, every hour? We will require 1460 TB of storage, or almost 3000 laptops, or the equivalent of 6 times the digitized contents of the Library of Congress, and this is for blood pressure monitoring only. Add in the remaining 99.9% of the medical record, including large imaging files, hospital monitoring devices, pharmacy data, insurer data, telehealth sessions and other personal health sensors, keep in mind that all these data are meant to be exchanged freely over the Internet, and we are approaching a data tsunami of biblical proportions. And we are not done just yet. Once health care’s Big Data is released into the mainstream Internet, it will initiate secondary and tertiary waves of new data, created by consumers discussing their newfound health care data on social media venues, specialty forums, blogs and commercial sites offering services for health data. Big Data is the fluid combination of the ever-increasing real-time data streams created by everything from government to businesses to Facebook, Twitter, geo-locators, mobile devices and connected sensors everywhere. Big Data is as much about size as it is about cross-pollination of data from disparate sources.
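For readers who want to try this kind of back-of-envelope estimate themselves, here is a minimal sketch. The population (310 million) and the size of one reading with its metadata (64 bytes) are illustrative assumptions, not figures taken from the calculation above, and the result is very sensitive to both, so treat the output as a way to explore the method rather than as a check on the numbers quoted here.

```python
# Back-of-envelope storage estimate for lifetime blood pressure records.
# POPULATION and BYTES_PER_READING are illustrative assumptions.

POPULATION = 310_000_000      # assumed U.S. population, roughly 2011
LIFESPAN_YEARS = 78           # life expectancy used in the text
BYTES_PER_READING = 64        # assumed size of one reading plus metadata


def lifetime_storage_tb(population, lifespan_years, readings_per_year,
                        bytes_per_reading):
    """Return total storage in terabytes (1 TB = 10**12 bytes)."""
    total_bytes = (population * lifespan_years
                   * readings_per_year * bytes_per_reading)
    return total_bytes / 1e12


# One reading per doctor visit (about 4 visits a year).
per_visit_tb = lifetime_storage_tb(
    POPULATION, LIFESPAN_YEARS, 4, BYTES_PER_READING)

# One reading every hour from a wearable device.
hourly_tb = lifetime_storage_tb(
    POPULATION, LIFESPAN_YEARS, 24 * 365, BYTES_PER_READING)

print(f"Per-visit readings: {per_visit_tb:,.1f} TB")
print(f"Hourly readings:    {hourly_tb:,.1f} TB")
```

Changing the bytes per reading or the sampling frequency shows how quickly the total grows, which is the point of the comparison: the storage requirement scales linearly with every one of these inputs.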

A fascinating June 2011 McKinsey report predicts that Big Data is the “next frontier for innovation, competition, and productivity” and that Big Data will become equal to labor and capital in its importance to production. For U.S. health care, the report predicts $300 billion per year in savings from using Big Data to drive the execution of strategies proposed by health care experts. In the area of clinical operations, the report lists projected savings from Comparative Effectiveness Research (CER) when tied to insurance coverage, Clinical Decision Support (CDS) savings derived from delegating work to lower-paid resources and from reductions in adverse events, transparency for consumers in the form of quality reports for physicians and hospitals, home monitoring devices including pills that report back when they are ingested, and profiling patients for managed care interventions. Administrative savings are projected from automated systems to detect and reduce fraud and from shifting to outcomes-based reimbursement for providers and, interestingly, for drug manufacturers through collective bargaining by insurers. Most savings listed under research and development opportunities from Big Data seem to accrue to pharmaceutical and device manufacturers. There is nothing to suggest that Big Data will somehow reduce unit prices of products or services.

To be honest, I don’t quite understand where the $300 billion in savings comes from, as there are no actual itemized numbers to support this prediction. In addition to the stated reliance on individual studies and expert interviews, there are many structural assumptions regarding massive provider consolidation, proliferation of Accountable Care Organizations, technology adoption rates of 90% across the industry and data sharing amongst all stakeholders, at which point Big Data will come in and do its thing. The costs of generating, storing and analyzing Big Data, which include emerging data storage technologies and analytical expertise, are factored in, with the costs of national deployment of EHRs alone “estimated at around $20 billion a year, after initial deployment (estimated at up to $200 billion)”.

Most people, including doctors, will probably agree that pertinent data, big or small, can be transformed into pertinent information, and pertinent information is vital to good decision making. But is Big Data pertinent? Are all those petabytes of minute details about everything and everybody really useful, or are we just mixing a little wheat with a lot of chaff? There are various opinions on this, but the prevailing wisdom seems to be that the more data you have, the more likely you are to be able to extract something useful out of it. By observing patterns and correlations in this ocean of information you may discover answers to questions you wouldn’t have known to ask in the first place. There is much power in Big Data, but there is also danger. As big as Big Data may be, its size does not guarantee that it is complete or accurate, which may lead to equally incomplete and inaccurate observations. Big Data is not available to all and is not created by all in equal amounts, which may lead to undue power for Big Data holders and misrepresentation of interests for those who do not generate enough Big Data. Collection and analysis of Big Data have obvious implications for privacy and human rights. But the biggest danger of all, in my opinion, is the forthcoming relaxation of the rigor of accepted scientific methods, and no temptation seems bigger than the one to infer causality from correlation.

We’ve been there before. When humanity dwelt in caves and villages, correlation was enough to establish causality. We’ve come a long way since, but the global village we are creating today seems tempted to go back to observation as the main way of gaining understanding. Just like the historic villagers, we are now convinced that we can see everything there is to be seen; therefore the answers to all our questions must be found in the Big Data mirror we placed in front of us. All we have to do is stare at it long enough and the patterns will emerge. The sheer size and variety of Big Data will make it much easier to reject the null hypothesis and see patterns where none exist. On the other hand, if we keep staring our digital selves in the eye for long enough, perhaps we will achieve the most coveted observation of all: a glimpse through the windows to our digitized soul.
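The point about rejecting the null hypothesis too easily can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: it generates one random "outcome" and many unrelated random "variables", all pure noise, and counts how many of them appear significantly correlated with the outcome. The function names are made up for this example, and the cutoff of 0.361 is the approximate critical value of the correlation coefficient for 30 samples at the conventional 5% significance level.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(num_variables, num_samples, threshold, seed=0):
    """Count noise variables whose |r| with a noise outcome meets threshold.

    Every series is independent random noise, so any "significant"
    correlation found here is a false positive by construction.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(num_samples)]
    hits = 0
    for _ in range(num_variables):
        variable = [rng.gauss(0, 1) for _ in range(num_samples)]
        if abs(pearson(outcome, variable)) >= threshold:
            hits += 1
    return hits


# Approximate critical |r| for 30 samples at p < 0.05 (two-tailed).
CRITICAL_R = 0.361

for num_variables in (10, 100, 1000):
    hits = count_spurious(num_variables, 30, CRITICAL_R)
    print(f"{num_variables:5d} noise variables -> {hits} 'significant' correlations")
```

Roughly one in twenty noise variables should clear the bar by chance alone, so the more variables we are able to test, the more "patterns" we will find, none of which mean anything.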

1 comment:

  1. Well, what we can say for sure about Big Data is that it has now become bigger. :) Huge and monumental, as digitization branches into every respect, every area of human concern. But its demand for a physical corpus is going to remain, whether it be in the form of hardware, or software that is going to enable it. And this corpus is going to need a huge brain to contain it, which is what storage mechanisms essentially are.