Twitter Launches API Access To Its Entire Firehose: What It Means for Healthcare and Health IT


Twitter has unveiled a new API that gives enterprise customers of its existing data analytics services access to its entire archive of public tweets, a collection that grows by roughly 500 million tweets per day. The new service, called the Full-Archive Search API, lets businesses selectively search Twitter’s full historical content through a RESTful API that supports a rules-based query syntax. Queries can reach as far back as Twitter’s first tweet, published on March 21, 2006. Searches can target specific keywords and phrases, and results can be filtered to tweets published within certain geographical boundaries or composed within certain timeframes. The enhancement was deployed with advertising and product management clients in mind, but opening up such a massive social data set will likely have significant implications for public health researchers, as well as for payers and health systems exploring new ways of quantifying and managing population health.
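To make the idea of a rules-based search concrete, here is a minimal Python sketch that assembles a keyword, phrase, geography, and timeframe filter into a single request body. The helper names (build_rule, build_request) are hypothetical, and the field names (query, fromDate, toDate, maxResults) and the bounding_box operator are modeled on Gnip-style search requests as an assumption; the official API documentation should be treated as the authority on the exact syntax. The code only builds the payload and does not contact any service.

```python
from datetime import datetime


def build_rule(keywords, phrases=(), bounding_box=None):
    """Combine keywords, exact phrases, and an optional geographic box into one rule.

    bounding_box is (west_lon, south_lat, east_lon, north_lat); the operator
    name and argument order are assumptions for illustration.
    """
    terms = list(keywords) + ['"{}"'.format(p) for p in phrases]
    if not terms:
        raise ValueError("at least one keyword or phrase is required")
    rule = "(" + " OR ".join(terms) + ")"
    if bounding_box is not None:
        west, south, east, north = bounding_box
        rule += " bounding_box:[{} {} {} {}]".format(west, south, east, north)
    return rule


def build_request(rule, start, end, max_results=100):
    """Wrap a rule and a timeframe in a request body (assumed field names)."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {
        "query": rule,
        "fromDate": start.strftime("%Y%m%d%H%M"),  # assumed minute-level timestamp format
        "toDate": end.strftime("%Y%m%d%H%M"),
        "maxResults": max_results,
    }


# Example: flu-related tweets from roughly the New York City area,
# starting from the date of the first tweet mentioned in the article.
rule = build_rule(
    ["flu", "fever"],
    phrases=["sore throat"],
    bounding_box=(-74.26, 40.49, -73.7, 40.92),
)
request_body = build_request(rule, datetime(2006, 3, 21), datetime(2015, 1, 1))
```

The resulting dictionary could then be serialized to JSON and sent by whatever HTTP client an organization already uses.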

Social data has contributed to a number of compelling health-focused data analytics research projects in the past several years. In 2013, following the path that Google Flu Trends established, researchers at Johns Hopkins turned to Twitter data to develop an algorithm to monitor flu trends in real time. The goal of the project was to see whether search queries could be made sophisticated enough to identify tweets composed by people who likely had the flu, while filtering out spikes in flu-related tweets caused by seasonal media coverage. The team was successful and published results that compared favorably with CDC flu trend data.
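The distinction between a personal flu report and flu-related news chatter can be illustrated with a deliberately simple rule-based filter. This is not the Johns Hopkins team's method, which is not described in detail here; the cue lists and the function name below are assumptions chosen purely for illustration, and a real system would need far more careful language modeling.

```python
import re

# Illustrative cue lists (assumptions, not taken from any published model).
FLU_TERMS = re.compile(r"\b(flu|influenza)\b", re.IGNORECASE)
FIRST_PERSON = re.compile(r"\b(i|my|me)\b", re.IGNORECASE)
NEWS_CUES = re.compile(
    r"https?://|\b(cdc|outbreak|officials|reports?|reported)\b",
    re.IGNORECASE,
)


def looks_like_personal_flu_report(text):
    """Return True when a tweet mentions the flu in a first-person way
    and shows none of the news-style cues (links, agency names, etc.)."""
    if not FLU_TERMS.search(text):
        return False
    if NEWS_CUES.search(text):
        return False
    return bool(FIRST_PERSON.search(text))


sample_tweets = [
    "I think I have the flu, staying in bed all day",
    "CDC reports flu cases rising nationwide http://t.co/example",
    "Great game last night",
]
flagged = [t for t in sample_tweets if looks_like_personal_flu_report(t)]
```

Only the first sample tweet is flagged: the second contains news cues, and the third never mentions the flu.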

Soon after, a separate project launched in New York City that attempted to discern the positive, negative, or neutral effect that more than 70 key data elements had on health. The team developed search queries that tracked how often people went to the gym, visited a particular restaurant, or took public transportation, and then checked whether each of those behaviors correlated with how often those people got sick. This work represents one of the earliest efforts to use Twitter data to support a broad population health analysis program.

By 2014, Twitter data was underpinning significant public health data analytics efforts, including Harvard Medical School’s Project HealthMap, a syndromic surveillance platform that consolidates data from a number of sources, including Twitter, to track outbreaks across the globe. HealthMap has the notable distinction of identifying the 2014 Ebola outbreak a full two weeks before the World Health Organization’s tracking tools noticed a problem. HealthMap was able to access Twitter’s full firehose of real-time and historical data because it was chosen as one of Twitter’s Data Grant recipients, a program launched by Twitter in 2014 to give researchers working on important social projects unrestricted access to its data.

Researchers at the Center for Statistics and the Social Sciences at the University of Washington also found success embedding Twitter data into population health surveillance programs. A team at the university developed an algorithm that it hoped would be able to identify depression in individual users by analyzing a user’s full Twitter history and evaluating variance in the overall volume of tweets, the times at which individuals tweeted, and how frequently they engaged other users directly. The team also searched for keywords that had a strong correlation with depression. Early algorithms have accurately identified depression 70 percent of the time, and a new paper outlining improvements to these results is scheduled to run in Sociological Methods & Research later this year. Lead researcher Tyler McCormick explains, “Our attitude is that Twitter is the largest observational study of human behavior we’ve ever known, and we’re working very hard to take advantage of it.”
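The three behavioral signals described above (variance in tweet volume, time of day, and direct engagement with other users) can each be reduced to a simple number. The sketch below shows one way to compute such features from a list of timestamped tweets. It is not the University of Washington team's algorithm; the function name, the choice of midnight to 6 a.m. as "night," and the use of @-mentions as a proxy for direct engagement are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import pvariance


def activity_features(tweets):
    """Summarize a user's history given (timestamp, text) pairs.

    Returns the variance of daily tweet counts (days with no tweets count
    as zero), the share of tweets posted before 6 a.m., and the share of
    tweets that mention another user with '@'.
    """
    if not tweets:
        raise ValueError("tweets must not be empty")

    counts = Counter(ts.date() for ts, _ in tweets)
    first, last = min(counts), max(counts)
    span = (last - first).days + 1
    daily = [counts.get(first + timedelta(days=i), 0) for i in range(span)]

    night = sum(1 for ts, _ in tweets if ts.hour < 6)
    mentions = sum(1 for _, text in tweets if "@" in text)

    return {
        "daily_volume_variance": pvariance(daily),
        "night_share": night / len(tweets),
        "mention_share": mentions / len(tweets),
    }


history = [
    (datetime(2015, 1, 1, 2, 0), "@friend can't sleep"),
    (datetime(2015, 1, 1, 14, 0), "lunch"),
    (datetime(2015, 1, 1, 23, 0), "tired"),
    (datetime(2015, 1, 2, 3, 0), "@friend still awake"),
]
features = activity_features(history)
```

Features like these would typically feed a statistical model alongside keyword signals; on their own they say nothing about any individual's health.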

Many other public health-focused research projects have turned to Twitter for data over the past two years, with varying levels of success. Researchers at the University of Rochester in New York successfully used Twitter to trace food poisoning outbreaks back to unhygienic restaurants that were making their patrons sick. Earlier this year, researchers from the University of Pennsylvania published findings in the January issue of Psychological Science demonstrating that algorithms trained to mine Twitter data could predict population-level heart disease mortality rates better than existing methods.

It will be interesting to see what the public health sector, payers, and health IT companies like Health Catalyst are able to do with this new API and data source. Blending health and social data into a unified platform could deliver powerful tools for the healthcare system.

