Harvard Researchers Take Dr. Google To Task


Harvard Medical School researchers have published findings from a study designed to measure the diagnostic accuracy of online symptom checkers, and the results were far worse than one might expect. Researchers initially identified 143 online symptom checkers that they could potentially evaluate, but limited their analysis to those that were free, publicly available, and designed for general diagnostics rather than restricted to certain subsets of conditions. Next, researchers identified the logic and data sources that powered each platform and excluded any duplicates, so that only unique methods were evaluated. After applying these exclusions, researchers were left with 23 English-language symptom checkers, including commercially popular sites like WebMD and others run by reputable medical organizations like the NHS, Mayo Clinic, and the American Academy of Pediatrics.

Next, researchers used 45 patient profiles, or vignettes, borrowed from medical education programs, each presenting a classic medical diagnosis in the form of reported symptoms. Researchers deliberately chose 15 conditions that would require emergency care, 15 that would require non-urgent care, and 15 that were self-limiting and did not require medical care. Both common and uncommon conditions were selected to evaluate the depth of the symptom checkers' data sources.

To evaluate the results, researchers noted whether the correct diagnosis was returned as the most likely diagnosis, and also whether it was listed at all within the top 20 suggested results. After analyzing the performance of all 23 symptom checkers, researchers found that the correct diagnosis was suggested as the most likely condition only 34 percent of the time. The correct diagnosis was listed first only 24 percent of the time for emergency conditions; that rate improved to 38 percent for non-urgent conditions that would require some level of medical attention, and to 40 percent for self-limiting conditions that would not require medical treatment. A correct diagnosis was returned within the top three results 51 percent of the time, and could be found in the top 20 results only 58 percent of the time.
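For readers who want to see how this kind of scoring works, the following is a minimal Python sketch of a "top-k" accuracy calculation, the general measure described above. It is not the researchers' code. The function name `top_k_accuracy` and the toy diagnosis lists are hypothetical and chosen only for illustration; they are not data from the study.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_lists: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int,
) -> float:
    """Return the fraction of vignettes whose correct diagnosis
    appears among the first k suggestions of a symptom checker."""
    if len(ranked_lists) != len(truths):
        raise ValueError("need one ranked list per vignette")
    if not truths:
        raise ValueError("need at least one vignette")
    hits = sum(
        1
        for ranked, truth in zip(ranked_lists, truths)
        if truth in ranked[:k]
    )
    return hits / len(truths)


# Hypothetical toy data (NOT from the study): each inner list is one
# checker's ranked suggestions for one vignette, most likely first.
suggestions = [
    ["migraine", "tension headache", "sinusitis"],
    ["gastritis", "appendicitis", "constipation"],
    ["common cold", "influenza"],
    ["anxiety", "asthma"],
]
correct = ["migraine", "appendicitis", "influenza", "pulmonary embolism"]

top1 = top_k_accuracy(suggestions, correct, k=1)    # listed first
top3 = top_k_accuracy(suggestions, correct, k=3)    # within top three
top20 = top_k_accuracy(suggestions, correct, k=20)  # listed at all in top 20

print(f"top-1: {top1:.0%}, top-3: {top3:.0%}, top-20: {top20:.0%}")
```

In this toy example the correct diagnosis is listed first for one of four vignettes and appears somewhere in the list for three of four, which mirrors the pattern in the study where accuracy rises as more of the result list is counted.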

Researchers also evaluated the accuracy of triage recommendations: whether the platform recommended emergency care when appropriate, as opposed to a next-day office visit or no medical treatment at all. Appropriate advice was given in 57 percent of tested scenarios. Performance was best in emergency scenarios, where 80 percent of recommendations appropriately called for immediate medical attention. Researchers found that symptom checkers using the Schmitt or Thompson triage protocols outperformed the others.

In general, symptom checkers run by physician associations performed the best, followed by those managed by private companies like WebMD. Symptom checkers managed by health plans and government agencies performed the worst in the study.

