Why we must integrate data to generate meaningful analysis

In our last post, we talked about the tricky business of finding the right data to answer your question.

Sometimes, we can use off-the-shelf data to answer simple questions.

The challenge we often face, however, is that no single data set has everything we need. If only someone would create a complete data set with all of the characteristics we would like to leverage…

That doesn’t exist, of course, so we have to create those data sets ourselves.

When we need data from several different databases, we have to integrate.

And why is integrating so important?

Often, when we meet with a new potential client, they ask us what source (note – not sources) of data we use. They are used to working with one large data-collection company that also offers analysis.

If they obtain their analytics from a company that uses just one source of data, then the results are naturally skewed by the strengths and weaknesses of that data set. Why not look at multiple data sets to enhance the analysis? Why not integrate multiple claims databases, and then add in electronic health records?
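To make the idea concrete, here is a minimal sketch of what integrating two claims databases and an electronic health record might look like. Everything in it is an assumption for illustration: the record layouts, the field names (patient_id, diagnosis, paid, hba1c), the toy values, and the integrate function are hypothetical, and it assumes the sources already share a reconciled patient identifier, which in practice is often the hardest part.

```python
from collections import defaultdict

# Hypothetical toy records. Real sources would first need a shared,
# de-identified patient key before they could be combined like this.
claims_a = [
    {"patient_id": "p1", "diagnosis": "E11", "paid": 120.0},
    {"patient_id": "p2", "diagnosis": "I10", "paid": 80.0},
]
claims_b = [
    {"patient_id": "p1", "diagnosis": "I10", "paid": 45.0},
    {"patient_id": "p3", "diagnosis": "J45", "paid": 60.0},
]
ehr = [
    {"patient_id": "p1", "hba1c": 8.1},
    {"patient_id": "p3", "hba1c": 5.4},
]


def integrate(claims_sources, ehr_records):
    """Build one record per patient from several claims sources plus EHR data."""
    patients = defaultdict(
        lambda: {"diagnoses": set(), "total_paid": 0.0, "hba1c": None, "sources": set()}
    )
    # Pool diagnoses and payments across every claims source.
    for source_name, rows in claims_sources.items():
        for row in rows:
            patient = patients[row["patient_id"]]
            patient["diagnoses"].add(row["diagnosis"])
            patient["total_paid"] += row["paid"]
            patient["sources"].add(source_name)
    # Add the clinical measurement that claims data does not carry.
    for row in ehr_records:
        patient = patients[row["patient_id"]]
        patient["hba1c"] = row["hba1c"]
        patient["sources"].add("ehr")
    return dict(patients)


integrated = integrate({"claims_a": claims_a, "claims_b": claims_b}, ehr)
```

In this toy example, patient p1 appears in all three sources, so the integrated record holds both diagnoses and the lab value, while p2 appears only in one claims source and keeps an empty lab field. No single source would have shown the whole picture.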

It is not easy, of course, to get data that has been collected and curated by different organizations to talk to one another. With the right approach, however, virtually any data can be integrated. Vencore, in its work in the defense space, has been enabling satellites, GPS, airplanes, submarines and rockets to communicate with one another for decades. We are applying that same expertise to the healthcare space.

Our skills are data agnostic: whether we are looking at a data stream from a mapping system or 140 million unique lives in a claims database, we can apply our methodology. We have co-authored papers with Vladimir Vapnik, a co-inventor of support vector machines. We have helped to create knowledge transfer techniques using the framework known as Learning Using Privileged Information (LUPI). We have a suite of machine learning models that can be combined into ensembles.

Knowledge transfer is a particularly helpful technique for combining data sources. At Vencore Labs, we have developed a number of algorithms that extract knowledge from one database and incorporate it into another. By incorporating knowledge from one database into models built on a different database, we can obtain better performance.
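As a rough illustration of the general idea, and not of the Vencore Labs algorithms or of Learning Using Privileged Information, the sketch below extracts a simple piece of knowledge from one database (an outcome rate per diagnosis) and attaches it as a feature to records in another database that has no outcome data of its own. The databases, field names, values, and the extract_rates and transfer functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical source database: it records observed outcomes.
source_db = [
    {"diagnosis": "E11", "readmitted": 1},
    {"diagnosis": "E11", "readmitted": 0},
    {"diagnosis": "I10", "readmitted": 0},
    {"diagnosis": "I10", "readmitted": 0},
]

# Hypothetical target database: different patients, no outcome field.
target_db = [
    {"patient_id": "t1", "diagnosis": "E11", "age": 64},
    {"patient_id": "t2", "diagnosis": "J45", "age": 30},
]


def extract_rates(rows):
    """Extract knowledge from the source: readmission rate per diagnosis."""
    counts = defaultdict(lambda: [0, 0])  # diagnosis -> [events, total]
    for row in rows:
        counts[row["diagnosis"]][0] += row["readmitted"]
        counts[row["diagnosis"]][1] += 1
    return {dx: events / total for dx, (events, total) in counts.items()}


def transfer(rows, rates, default):
    """Incorporate the extracted rates into target records as a new feature."""
    return [
        dict(row, prior_readmit_rate=rates.get(row["diagnosis"], default))
        for row in rows
    ]


rates = extract_rates(source_db)
# Fall back to the overall source rate for diagnoses the source never saw.
overall_rate = sum(row["readmitted"] for row in source_db) / len(source_db)
enriched = transfer(target_db, rates, default=overall_rate)
```

A model built on the enriched target records could then use prior_readmit_rate alongside its own fields. Whether that actually improves performance depends on how well the two populations match, which is something to test rather than assume.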

We can extract real value.

I suppose it is no surprise that everyone thinks their own data set is the best. And they might well be right – for one parameter. But it is so rare that one data set truly can do everything we need.

Let’s stop arguing about which bit of data is best. The real answer is that using all of it, in an integrated fashion, is best.

Tara Grabowsky, MD
Chief Medical Officer