Digital Utopia, Meet the Null Hypothesis

By Frank Celia

February 10, 2015 | A kind of consumer-oriented populism lies at the heart of the quintessential Silicon Valley business model. Companies like Google and Facebook succeed because they are free and appeal to hundreds of millions of people who use them every day. Once achieved, massive popularity can then be leveraged to realize widely disparate goals, ones perhaps unimaginable when the industries were young.

In stark contrast, healthcare, and the research and regulatory institutions that underpin it, exists in a world far removed from any populist notions. For all the talk over the last twenty years of patient empowerment, medicine remains a top-down affair: Highly educated individuals using empirical methods perfected over hundreds of years think, invent, innovate, experiment, draw conclusions, and then tell the rest of us how to beat an illness or enhance wellbeing. Ask any group of physicians about the impact of "Dr. Google," and the medical establishment's low opinion of internet egalitarianism will quickly become clear.

Of all the many difficulties to be overcome before the benefits of big data analytics and patient-generated data (PGD) can be attained, this imminent clash of cultures may be the most profound. These two worlds are currently eyeing each other with suspicion and anxiety, if not outright animosity. How they resolve their differences will most likely determine the face of healthcare in the years to come.

Battle for Bio-Signal   

It's no secret that Silicon Valley plans to use the burgeoning fitness wearables trend as an entrée into the $2.7 trillion healthcare market. Last year both Apple and Google unveiled similar health tracking platforms, HealthKit and Google Fit, software designed to aggregate bio-data from wearable technologies. And where these leviathans go, digital startups soon follow. In the first half of 2014, more than $2.3 billion of venture capital flowed into this space, compared to a total of $1.9 billion in the whole of 2013.

As always, investors expect consumers to drive the market, which can then be monetized in much the same way web searches are. "I think the first thing that is going to happen is consumers are going to start buying these products [bio sensors, health tracking apps, etc.], which they're doing now," explains Jesse Slade Shantz, chief medical officer at OMsignal, a company that manufactures bio-sensing clothing. "A lot of what's going on now is companies are trying to be the data traffickers, to serve as the pipes and plumbing for this information—it's what we call the emerging 'battle for bio-signal.' The companies in control of this data will be able to study it and derive insights for themselves."

Profit motive aside, from a policy-making standpoint, it is widely believed that rapidly growing corpora of PGD, in combination with advances in data analytics, ought to produce an overall win for traditional medical research. Hence the NIH's kickoff last year of its Big Data to Knowledge (BD2K) initiative, which began by granting $32 million to universities to develop big data tools and infrastructure and could swell to $656 million by 2020. In addition, stage 2 of the federal meaningful use program for electronic health records (EHR) calls for physicians and hospitals to submit syndromic surveillance data, immunization records, and other information to public health agencies in hopes of creating useful databases (although given interoperability challenges, it remains unclear how successful these efforts will be). In the private sector, Covance, one of the world's largest CROs, teamed up with Novartis last year to develop a state-of-the-art, single-site "data farm" to integrate and analyze large swaths of PGD.

As industry and government build platforms, several pilot-like studies are investigating specific research possibilities. Perhaps the lowest-hanging fruit here is the chance to replace subjective practitioner assessments and patient self-reporting with hard statistics. For example, a study sponsored by Intel seeks to improve staging of Parkinson's disease. Current Parkinson's staging consists of a practitioner judging how well the patient speaks, moves, and walks over a 15-minute period. In contrast, an unobtrusive wearable accelerometer can measure slowness of movement, tremor, and sleep quality 24 hours a day, providing far more objective data. By the end of the year, Intel hopes to monitor 10,000 Parkinson's patients this way.
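
To make this concrete, here is a minimal sketch of how tremor might be scored from raw accelerometer samples. The 50 Hz sampling rate, the 4-6 Hz tremor band, and the function itself are illustrative assumptions for this article, not details of the Intel study.

```python
import numpy as np

def tremor_band_power(accel, fs=50.0, band=(4.0, 6.0)):
    """Score tremor intensity as the fraction of signal power falling
    in the classic parkinsonian tremor band (roughly 4-6 Hz).

    accel: 1-D array of acceleration readings from one axis
    fs:    sampling rate in Hz (50 Hz is an assumed wearable rate)
    """
    accel = np.asarray(accel, dtype=float)
    accel = accel - accel.mean()            # remove gravity/DC offset
    power = np.abs(np.fft.rfft(accel)) ** 2
    freqs = np.fft.rfftfreq(accel.size, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = power[1:].sum()                 # exclude the DC bin
    return power[in_band].sum() / total if total > 0 else 0.0

# One minute of simulated 5 Hz tremor buried in sensor noise
fs = 50.0
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(0)
sim = 0.3 * np.sin(2 * np.pi * 5.0 * t) + 0.05 * rng.standard_normal(t.size)
print(f"tremor-band power fraction: {tremor_band_power(sim, fs):.2f}")
```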

In another study, drug maker Biogen Idec gave Fitbit bands to 250 multiple sclerosis patients with the aim of tracking activity levels and sleep patterns. Such data might be used to measure responsiveness to drug therapy or predict the onset of relapsing MS, according to the company.  

The next higher level of complexity involves using large PGD sets to achieve true predictive medicine. An often-cited potential area of study here is heart failure. It is widely believed that in the two weeks prior to a major cardiac event, heart rhythms vary in clinically significant ways. Once identified, these arrhythmia patterns could red-flag imminent heart attacks.
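
As a rough sketch of what such an early-warning system might compute, the snippet below scores daily heart rate variability and flags sharp departures from a personal baseline. RMSSD is a standard HRV metric, but the 30-day window and z-score threshold here are invented for illustration, not drawn from any cardiology study.

```python
import numpy as np

def rmssd(rr_ms):
    """RMSSD: root mean square of successive differences between
    heartbeat (RR) intervals, a standard short-term HRV metric."""
    diffs = np.diff(np.asarray(rr_ms, dtype=float))
    return float(np.sqrt(np.mean(diffs ** 2)))

def flag_hrv_drift(daily_rmssd, baseline_days=30, z_threshold=2.0):
    """Return indices of days whose HRV departs sharply from a rolling
    personal baseline. Window and threshold are illustrative only."""
    daily = np.asarray(daily_rmssd, dtype=float)
    flagged = []
    for i in range(baseline_days, daily.size):
        base = daily[i - baseline_days:i]
        z = (daily[i] - base.mean()) / (base.std() + 1e-9)
        if abs(z) > z_threshold:
            flagged.append(i)
    return flagged
```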

Given long-term PGD for thousands, or at some point maybe even millions, of individuals, there is almost no end to the possible early-warning signals that could emerge just by tracking ordinary biophysical patterns and behaviors. In one example, a 2012 Japanese study (DOI: 10.1111/j.1464-5491.2012.03667.x) found that degradation in sleep quality, combined with small variances in normal hemoglobin A1C levels, could predict diabetes onset as much as 10 years in advance.

Finally, the Holy Grail of possibilities in this realm: a greatly expanded ability to discover new drugs and a drastically reduced time to bring them to market. This is the ambitious goal of Berg, a Massachusetts-based pharmaceutical and diagnostics company, and a handful of firms like it. By combining PGD with systems biology, big data analytics, and artificial intelligence, Berg hopes to cut the time and expense of drug development in half, according to president and CTO Niven Narain. (Bio-IT World profiled Berg in August 2014).

The company's proprietary discovery platform automatically flags disease biomarkers, diagnostic tools and potential therapies, Narain says. "The key thing is we don't make hypotheses. We let the AI guide us through these areas."

Berg is not the first company to reach for such lofty goals, and many skeptics in the biotech world question its chances of success. But even if Berg and similar efforts fail in their main objectives, they may still make secondary advances along the way. In Berg's case, preliminary results from a partnership with the U.S. Department of Defense, which has been compiling PGD on military personnel for two decades, suggest the company has uncovered prostate cancer biomarkers far more predictive of the disease than the current standard of care, the prostate-specific antigen (PSA) test, a diagnostic often no better than flipping a coin. If proven correct, this discovery alone would revolutionize the $16 billion-a-year PSA industry.

Painful Business   

If you believe the digital visionaries, the possibilities described above represent just the tip of the iceberg. In theory, the amount of personal data up for parsing is breathtaking: genomes, transcriptomes, proteomes, microbiomes, metabolomes, epigenetic markers, DNA structural variations, single-nucleotide polymorphisms (SNPs), and much more. Even the most skeptical observers concede that the near future should present enormous opportunities for advancement in a wide array of life-science, biotech, and pharmaceutical industries.

On the other hand, many obstacles need to be overcome as well: Healthcare privacy laws will have to be altered or worked around. Medical data will have to be "de-personalized" the way bank records were 30 years ago at the dawn of the computer age. The pervasive lack of data platform interoperability will be a tough challenge. Research organizations often cannot handle extremely large data sets. For example, wearable heart rate monitors take 250 samples per second, or 9 gigabytes of data per person, per month. Hence, to study hundreds of thousands of subjects, enormous amounts of data will need to be stored, compressed, and aggregated, as the back-of-the-envelope sketch below shows. Finally, healthcare employers who stand to benefit most from these advances—insurance companies, the federal government, universities, CROs, health systems, etc.—gripe that their workforces often lack the skills to implement them. Many are looking to recruit fresh talent from the technology sector.
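
The quoted storage figure is easy to reproduce. Assuming roughly 14 bytes per stored sample (a timestamp plus a reading; the article does not specify an encoding, so this is an assumption), 250 samples per second works out to about 9 GB per person per month, and a study of 100,000 subjects approaches a petabyte of raw data per month:

```python
SAMPLES_PER_SEC = 250                   # quoted monitor sampling rate
BYTES_PER_SAMPLE = 14                   # assumed: timestamp + reading
SECONDS_PER_MONTH = 60 * 60 * 24 * 30

samples_per_month = SAMPLES_PER_SEC * SECONDS_PER_MONTH      # 648,000,000
gb_per_person = samples_per_month * BYTES_PER_SAMPLE / 1e9   # ~9.1 GB
pb_per_100k = gb_per_person * 100_000 / 1e6                  # ~0.9 PB

print(f"{gb_per_person:.1f} GB per person per month")
print(f"{pb_per_100k:.2f} PB per month for 100,000 subjects")
```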

Moreover, it is unclear how well the Silicon Valley mindset will mesh with traditional healthcare's. The FDA's slapdown of 23andMe in November 2013 still rankles many a Palo Alto decision maker. It is considerably more difficult to implement the ethos of "move fast and break things" when at any moment the federal government can shut down your business with the stroke of a pen, or even summon the Department of Justice if it so chooses.  

Responding to these pressures, Google co-founder Sergey Brin reportedly groused last year: "Health is just so heavily regulated. It's just a painful business to be in."

Melanie Swan, a consultant and healthcare start-up entrepreneur, believes many of these challenges can be overcome if old-guard institutions like the FDA, with their outmoded thinking, make way for the kind of innovation that has transformed the rest of the world. "There is something about medicine. There's a kind of stuck belief set. It's treated more like a public service, not a business," she says. "I have a finance background, and when I arrived in healthcare and genomics I was astounded at how unsophisticated the disease-modeling mathematics were, compared to, say, risk modeling in the financial world. Even just applying the simplest financial algorithms to the healthcare market is something that has not been done yet."

The Whole Haystack 

On the patient-facing front lines of healthcare, among the hospitals, pharmacies, clinics, ORs, and exam rooms, the area that collects the most data per patient is the intensive care unit, according to Kevin R. Ward, a professor in the Department of Emergency Medicine at the University of Michigan. "We are taking somewhere between 10,000 and 100,000 points of data per second from each of these patients. They are hooked up to more monitors, they have more medications, more lab work, and more medical imaging than any other type of patient." The ICU also consumes the lion's share of resources, averaging about 40% to 50% of a hospital's budget.

Dr. Ward is also director of the Michigan Center for Integrative Research and Clinical Care, an organization dedicated to partnering with the private sector to harness fields like systems biology and nanoscience in hopes of delivering better care and reducing costs in the ICU.

Although confident that new biotechnologies will result in important advances, he is reluctant to open the floodgates to every popular phone app or consumer gadget that can generate a data stream. "My attitude is we are taking a sort of null hypothesis approach here, which is the default assumption that none of this big data is going to make any difference. Only then do we move forward and explore to see if that's true."
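
In statistical terms, Ward is describing classical significance testing applied to each candidate data stream: assume no effect, then look for evidence strong enough to overturn that assumption. A minimal sketch, with entirely simulated numbers:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Simulated outcome metric (say, ICU length of stay in days) for
# patients managed with and without a new wearable data stream.
control = rng.normal(loc=5.0, scale=1.5, size=200)
with_stream = rng.normal(loc=4.8, scale=1.5, size=200)

# Null hypothesis: the extra data stream makes no difference.
t_stat, p_value = stats.ttest_ind(with_stream, control)
alpha = 0.05  # conventional significance level
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
print("reject the null" if p_value < alpha else "fail to reject the null")
```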

Why such a grim view? For starters, some of the most time-tested wearable sensors have proven to be inaccurate, a worrying discovery. "I'm not living in the consumer world where if you're off by 10% or 15% it's not going to be life threatening," he notes.

Furthermore, focusing on only one stream of data—heart rate, for example—provides an excessively narrow clinical view. A wider range of factors would broaden the picture. Was the patient exercising? What was the glucose level? Specifics on age, race, and sex? "Where we get in trouble is only looking at one stream of data," he says. "If you want to find those valuable needles in the haystack, you need the whole haystack."

A controlled, clinical setting such as a hospital, he says, is the best way to study these sensors, at least initially, adding: "If it works in an ICU, it will work in your living room."

The Costs of Failure  

Two widely reported incidents illustrate how these two worlds often talk past each other. Last year an article in Science magazine called attention to the fact that Google Flu Trends (GFT), a data program aimed at predicting levels of flu outbreak in the U.S. by tracking flu-related internet searches, had for several years been wildly overestimating the disease's prevalence. Because GFT was often acclaimed as an exemplar of consumer-driven big data, the authors took Google to task for what they termed "big data hubris" and a lack of scientific rigor.

The article pointed out that service-oriented sites like Google, Twitter, and Facebook are not always the best sources of objectivity. For one thing, they often suffer from built-in features that function as bugs in the data collection process, a phenomenon the authors dub "blue team" dynamics. In Google's case, the site's recommended searches, which are based on what others are searching for, inflate the volume of certain queries, thus skewing results in favor of already popular topics.
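
A toy simulation of that feedback loop shows how suggestion traffic can inflate the apparent popularity of whatever topic is already leading. Every number below is invented for illustration:

```python
import numpy as np

organic = np.array([100.0, 50.0, 10.0])  # true daily interest in 3 topics
counts = organic.copy()
BOOST = 0.5  # assumed share of extra traffic driven by suggestions

# Each day the engine recommends the current leader, and those
# suggestions generate extra searches on top of organic interest.
for day in range(30):
    boost = np.zeros_like(counts)
    boost[counts.argmax()] = BOOST * counts.sum()
    counts = organic + boost

print(f"organic share of top topic: {organic.max() / organic.sum():.0%}")
print(f"observed share after feedback: {counts.max() / counts.sum():.0%}")
```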

These sites can also suffer from "red team" dynamics, that is, attempts by outsiders to game the system. To prove this, the authors cite instances of Twitter and Facebook being used to spread rumors about stock prices and markets. "The core challenge is that most big data that has received popular attention are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis," the paper concludes.

Conversely, what GFT's imprecision was to science purists, the failed rollout of the Affordable Care Act's website Healthcare.gov was to the digerati—evidence of the other side's fundamental incompetence.  When the site crashed, federal officials were forced to rely on a hastily-assembled team of Silicon Valley top guns to save the day.

In the aftermath, a member of the rescue team was quoted saying: "People are more scared of things here [in Washington, DC]. The costs of failure are perceived as being higher than where we're from."

Cutting to what is perhaps the crux of the matter, journalist Kevin Roose, in New York magazine, responded that if a programmer fails, a website or an app might malfunction. "But if, say, a Department of Defense code base hiccups and leads to a major security breach, it could create a political disaster with far-reaching consequences. Heads would roll; lives could be lost. That's a lot of what makes government IT contractors 'more scared' than start-up daredevils—the costs of failure aren't just 'perceived as being much higher' where they work—they actually are much higher."

Frank Celia is a freelance healthcare writer based in the Philadelphia area.