The Limits of Google’s Flu Tracker

Big data is everywhere, and it has tons of potential if applied the right way. I wrote a post on this just last month. Apparently, and maybe not surprisingly, it was all the rage at SXSW last week. Big data is big in and out of healthcare, and it may be one of the biggest buzzwords out there.

Below is what Google told me Big Data means:

[Image: Google's definition of "big data"]

Last week, Science published a critical editorial on the results of Google Flu Trends (GFT), the tool I'm calling Google's flu tracker. I no longer have access to full articles in Science, so all I could do was read this interpretation of the article by Time. I sometimes miss having academic access to full-text primary sources.

GFT got a lot of attention when it launched. It's billed as an analytics tool that uses Google search data to predict flu incidence, and it was built and tested against historical data.

GFT is often cited as an example of the power of big data. The very cool thing about GFT is that it infers valuable flu data from things that people are already doing, in this case Google searches for flu-related terms.

What would make the GFT even cooler is if it were accurate. Accuracy seems to be a big problem for the GFT, according to the Science article above. GFT over-predicted flu by more than 50 percent in both the 2011-2012 and 2012-2013 flu seasons. That's a poor performance, and I'm surprised it's so far off from the actual amount of flu in the US. Google created it to be an improvement over the CDC's model for flu prevalence, but it doesn't seem to be ready for prime time.
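To make "over-predicted by over 50 percent" concrete, here is a minimal sketch of how a relative over-prediction can be computed. The function name and the numbers are my own illustration, not figures from the Science article or from Google.

```python
def overprediction_pct(predicted: float, observed: float) -> float:
    """Return how far a prediction overshoots the observed value,
    as a percentage of the observed value."""
    if observed <= 0:
        raise ValueError("observed must be positive")
    return (predicted - observed) / observed * 100.0


# Hypothetical example: the model predicts that 6.0% of doctor visits
# are flu-related, while surveillance data later shows 4.0%.
print(overprediction_pct(6.0, 4.0))  # 50.0
```

In other words, a 50 percent over-prediction means the estimate was one and a half times what was actually observed.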

However, it's not time to write off the GFT or to use it as a proxy for the failure and limitations of big data. The GFT is more of a toy at this point and should be seen that way. Most big data initiatives in health and wellness are still at such an early stage that they aren't ready for prime time. But it's a good starting point, and the GFT is a great example of what can potentially be done in public health with big data and passive browsing data.

GFT will likely get better as Google learns and improves the algorithms. Like all things Google, it may be retired at some point if it doesn't fit Google's overall mission and business. Google Wave and Google Health are the major examples in this category.

It may be that the GFT will never be very good at predicting flu, but it should have enough data to get close. Web browsing and searches are not reflective of actual disease conditions. They are, however, highly reflective of symptoms.

The power of Google and its vast amount of data in public health will be in intelligently mapping those symptom searches, weighing the frequency, tracking the geography, and then coming up with an accurate forecast of incidence. The percentage of people searching on flu does not equal the incidence of flu, but it's logical that it would correlate with incidence once controlled for factors like the hypochondria of the general public.
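Here is a rough sketch of that idea: check how well search share tracks incidence, then fit a simple line to calibrate one against the other. This is not Google's algorithm. The weekly numbers are invented purely for illustration, and the function names are my own.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


# Invented weekly values (NOT real data):
# share of searches that are flu-related, in percent
search_share = [1.0, 2.0, 3.0, 4.0, 5.0]
# share of doctor visits for flu-like illness, in percent
incidence = [0.6, 1.1, 1.4, 2.1, 2.4]

r = pearson(search_share, incidence)
slope, intercept = fit_line(search_share, incidence)

# Estimate incidence for a new week with a 3.5% search share.
estimate = slope * 3.5 + intercept
print(f"correlation={r:.2f}, estimated incidence={estimate:.2f}%")
```

A slope well below 1 is one way a steady amount of "worried well" searching would show up in a fit like this. What a fixed line cannot absorb is a change in how people search from one season to the next.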

The real problem with the accuracy of Google data (and this is bigger than public health) is one of my favorite enhancements to search: guided search, the suggestions Google offers as you type. Guided search is awesome! Google knows what I'm looking for, or at least the most common related search, before I'm done typing it. You can also use Google to find common relations. I use Google all the time to search for related and competitive companies. Try typing the name of a company and then "vs" into Google to see what you get. You can also tell the most common way people search for things.

What's really interesting about guided search is that it goes beyond the Google mission of organizing the world's information: it influences and directs the way people think about that information. It's fascinating to consider the impact this has on data and access. As I said, I love the guidance from Google, but I wonder whether it affects the value of Google's search data. If suggestions steer what people type, the searches may say as much about Google's prompts as about what people were actually looking for. I also love guided search on Amazon, and I rely heavily on it to direct me to the product I'm searching for.

Google has tons of information about individuals, including health information. I'm excited to see experiments like GFT and others that use vast amounts of web browsing data. But today they are just that: experiments. Hopefully they will inform future data initiatives.

Travis Good is an MD/MBA and co-founder of Catalyze.
