Researchers Use Twitter Data To Predict Heart Disease Risk


Researchers from the University of Pennsylvania have found that analyzing Twitter data can yield highly accurate population health data that outperforms modern risk assessment tools at forecasting localized, disease-specific mortality rates. The study, which was published in the January issue of Psychological Science, is a compelling addition to a growing body of evidence suggesting that social media data analysis could become a powerful tool in the future of population health.

Starting in 2013, a team of psychologists and computer scientists from the University of Pennsylvania began building an algorithm that could systematically sort tweets based on the tone and emotional nature of the message. To do this, the team created a library of keywords that the algorithm used to match tweets with relevant positive and negative emotions. (Note: If you haven’t seen a peer-reviewed research paper that systematically lists out every blush-inducing curse word in the modern English vocabulary, this is your chance.)

Next, the team used the algorithm to analyze 146 million tweets from the Twitter “garden hose” and, using geo-location data from the tweets, plotted the findings on a map. This map gave researchers a glimpse into the average emotional state of Twitter users, down to the county level.
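
The keyword-matching and county-mapping steps described above can be illustrated with a minimal sketch. It is not the study's actual pipeline: the word lists, function names, and sample tweets below are hypothetical stand-ins, and the real lexicon is far larger.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon for illustration only; not the study's word lists.
LEXICON = {
    "positive": {"great", "wonderful", "hope", "friends"},
    "negative": {"hate", "angry", "tired", "bored"},
}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase a tweet and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def score_tweet(text):
    """Count how many tokens in one tweet fall into each emotion category."""
    tokens = tokenize(text)
    return {cat: sum(tok in words for tok in tokens)
            for cat, words in LEXICON.items()}


def county_rates(tweets):
    """Turn (county, text) pairs into per-county category rates per token."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for county, text in tweets:
        totals[county] += len(tokenize(text))
        for cat, n in score_tweet(text).items():
            counts[county][cat] += n
    # Counties with no usable tokens are skipped to avoid dividing by zero.
    return {county: {cat: counts[county][cat] / total for cat in LEXICON}
            for county, total in totals.items() if total}
```

Calling county_rates with a list of (county, tweet text) pairs returns, for each county, the share of words that matched each emotion category, which is the kind of county-level figure that could then be plotted on a map.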


With this data in hand, researchers began to examine it from a public health perspective. The experiment was designed to assess community-level risk of heart disease because hostility and chronic stress are both well-known risk factors for heart disease, and researchers felt that they could accurately measure these sentiments from Twitter data. The results bore out this hunch.

Controlling for age, income, and education, researchers found a strong correlation between their data and heart disease mortality rates in each community. Communities with more negative tweets had higher heart disease mortality rates, while communities with more positive tweets had lower heart disease mortality rates. The team then compared the accuracy of predictions based on its Twitter data with predictions derived from a modern 10-point risk assessment tool that considers demographics, socioeconomics, and relevant health factors, including smoking status, diabetes, hypertension, and obesity. Researchers found that the Twitter analysis predicted heart disease “significantly better” than the standard assessment.
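
For readers curious what "controlling for" covariates can look like in practice, here is a minimal sketch of a partial correlation, one common way to measure the association between two variables after removing the influence of others. This is an assumed illustration of the general technique, not the study's statistical method, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Plain Pearson correlation between two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(x, y, covariates):
    """Correlation of x and y after controlling for a list of covariates.

    Uses the standard recursive formula, removing one covariate at a time.
    Suitable only for a handful of covariates (e.g. age, income, education).
    """
    if not covariates:
        return pearson(x, y)
    *rest, z = covariates
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

In this sketch, x might be a county's negative-language rate, y its heart disease mortality rate, and covariates a list of per-county columns such as median age, income, and education level. A partial correlation near zero would suggest the raw association was explained by the covariates; one that stays large suggests it was not.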

The findings are important because, to date, there have been no low-cost methods of capturing intangible risk factors like hostility and chronic stress at the community level. Analyzing social media gives researchers an entirely new avenue to assess the underlying psychological environment in communities with high levels of disease, and could one day lead to behavioral population health initiatives aimed at curbing chronic disease in these communities.
