Understanding Big Data in Healthcare

Big data is no longer speculation about things to come; it is integral to almost every aspect of business in today’s economy. Collecting information is only the beginning; using it effectively will be the key differentiator for successful businesses. In healthcare, the significance of big data cannot be overstated. Big data issues pervade the industry, from basic research through drug discovery, clinical trials, and post-market surveillance to every aspect of patient care. This article discusses general issues and challenges, with particular emphasis on a critical but often ignored problem: finding key information in the huge volume of published literature.

Information Collection

The first challenge with big data in healthcare is knowing what information to collect. In short, the answer is everything possible. Even if the need for a particular type of data is not obvious today, we often find ourselves regretting the lack of historical information that, in hindsight, would have been enormously useful. Collecting certain types of data in this field also raises the additional challenge of protecting patient privacy. Nevertheless, rigorous data collection programs will yield great benefits. One example is the Big Data Analytics program at MD Anderson, which promises to leverage patient data to achieve better outcomes. More efforts of this type are critically needed, ideally with some uniformity and with the ability to share relevant aspects of the data for integrated analysis.

Information Analytics

The analytic use of the data is what holds the real promise of big data. As artificial intelligence methods are put into practice, there are more and more opportunities to learn from collected data and to inform business decisions and patient care. How can actionable knowledge be gained from these vast stores of data? The general approach is to carry out directed analyses that address specific questions you know you need to answer. For instance, historical information on responses to treatment regimens across a huge number of individuals with different genetic and natural-history backgrounds could, in principle, be used to identify the best treatment option for any new patient. This is, of course, a non-trivial problem, and much work remains to truly exploit the value in the relevant big data.
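A directed analysis of this kind can be sketched in a few lines. The following is a toy illustration, not a clinical method. The records, genotype labels, and regimen names are invented, and the functions `tally` and `best_regimen` are hypothetical. The sketch groups historical outcomes by genotype and regimen, then picks the regimen with the highest observed response rate. It skips any regimen seen in too few patients to say anything.

```python
from collections import defaultdict

# Hypothetical historical records: (genotype, regimen, responded).
# All values are invented for illustration only.
HISTORY = [
    ("variant_A", "regimen_1", True),
    ("variant_A", "regimen_1", False),
    ("variant_A", "regimen_2", True),
    ("variant_A", "regimen_2", True),
    ("variant_B", "regimen_1", True),
    ("variant_B", "regimen_1", True),
    ("variant_B", "regimen_2", False),
    ("variant_B", "regimen_2", True),
]


def tally(records):
    """Count responders and totals: {genotype: {regimen: [responders, total]}}."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for genotype, regimen, responded in records:
        entry = counts[genotype][regimen]
        entry[0] += int(responded)
        entry[1] += 1
    return counts


def best_regimen(records, genotype, min_patients=2):
    """Return the regimen with the highest observed response rate for a genotype.

    Regimens seen in fewer than min_patients patients are ignored; returns None
    if no regimen qualifies (including when the genotype was never observed).
    """
    counts = tally(records)
    rates = {
        regimen: responders / total
        for regimen, (responders, total) in counts.get(genotype, {}).items()
        if total >= min_patients
    }
    if not rates:
        return None
    return max(rates, key=rates.get)


print(best_regimen(HISTORY, "variant_A"))  # regimen_2 (2/2 responded vs. 1/2)
```

A real analysis would also need to account for confounding factors, sample sizes, and statistical uncertainty. The `min_patients` threshold here is only a crude stand-in for that.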

Improving the Use of Published Information

Big data is not limited to patient data, and some of the other sources are woefully underutilized. Consider the vast repositories of published health and medical information, including journal articles, patents, treatment protocols, and much more. This published content serves as a critical guide to every aspect of the healthcare industry. 80% of drug targets are identified from published data. Clinical trial designers rely on this knowledge to develop the best protocols, and physicians learn about patient treatment options through these publications. Unfortunately, extracting the necessary knowledge from this content has historically been very difficult. Traditional search engines return lists of results that are hard to assimilate. When those lists contain many irrelevant results, users narrow their queries until something of interest appears on the first page. This narrowing almost always excludes critical information, and the impact is huge. Almost 90% of clinical trials fail, and many of those failures could have been avoided if previously published information had been found and understood. In some cases, a decision might have been made to end the project in favor of other opportunities. In others, the trial might have been designed more effectively, with a better understanding of which patients to include or exclude.
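The cost of narrowing a query shows up even in a toy example. The sketch below assumes plain exact-match keyword search with boolean AND, which is a simplification of real search engines. The corpus, document IDs, and the `keyword_search` helper are hypothetical. Adding one term trims the result list, but it also drops a document that reports the same concern (liver toxicity) in different words.

```python
# Hypothetical mini-corpus; all texts and IDs are invented for illustration.
CORPUS = {
    "doc1": "kinase inhibitor reduces tumor growth in mice",
    "doc2": "kinase inhibitor hepatotoxicity observed in phase II trial",
    "doc3": "liver injury reported with tyrosine kinase inhibitor therapy",
    "doc4": "marketing review of oncology pipelines",
}


def keyword_search(corpus, terms):
    """Return sorted IDs of documents whose words include every query term (AND)."""
    wanted = {t.lower() for t in terms}
    hits = []
    for doc_id, text in corpus.items():
        words = set(text.lower().split())
        if wanted <= words:  # every query term appears as an exact word
            hits.append(doc_id)
    return sorted(hits)


broad = keyword_search(CORPUS, ["kinase", "inhibitor"])
narrow = keyword_search(CORPUS, ["kinase", "inhibitor", "hepatotoxicity"])

# Both doc2 and doc3 concern liver toxicity, but only doc2 uses the exact term.
relevant = {"doc2", "doc3"}
missed = relevant - set(narrow)
print(broad, narrow, missed)
```

Here, adding `hepatotoxicity` cuts the list from three documents to one, and `doc3` silently disappears even though it reports liver injury with the same class of drug.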

The problem of using the published literature is not just that traditional discovery methods are difficult and time-consuming. Retrospective analyses reveal a more fundamental problem. They very often show that published information the user could not have known to ask about would have had a critical impact on decision making. Simply put, how do you answer a question that you didn’t know would arise?

Fortunately, a new dawn in the use of published information is at hand. By combining artificial intelligence methods with predictive analytics, Quertle has created a platform that uncovers hidden information. Quertle’s BioAI™ platform supports investigation at the systems biology level, in addition to finding precise answers to specific questions. Systems biology takes in broad systems all at once, for example, all genes and the ways they interact. Effective literature searching at this deeper level is possible only with AI methods that deliver very high relevance. Visual analytics help the user comprehend the result set and serve as an intuitive framework for exploration. Users can not only answer the key questions at hand but also see connections they may not have expected. These “Aha!” moments enable new insights that can help get better drugs to market faster and ultimately save lives.