Because many people turn to the World Wide Web as a first-line medical resource, Wikipedia’s page views can predict disease outbreaks nearly a month before official health advice is issued, according to a team of US scientists. The Los Alamos National Laboratory team says people search online before seeking medical help, which allowed the researchers to forecast tuberculosis and influenza outbreaks four weeks in advance.
“Traditional, biologically-focused monitoring techniques are accurate but costly and slow; in response, new techniques based on social Internet data, such as social media and search queries, are emerging,” they write. “These efforts are promising, but important challenges in the areas of scientific peer review, breadth of diseases and countries, and forecasting hamper their operational usefulness.”
So they set about probing Wikipedia’s data and comparing it with incidence reports from the World Health Organization. From that comparison, they built a model that translates Wikipedia page views into disease incidence around the world. They did so for seven diseases (cholera, dengue, Ebola, HIV/AIDS, influenza, plague, and tuberculosis) in nine locations, from Haiti to Norway.
Since its launch in 2001, Wikipedia has become the sixth most visited site in the world. The researchers reported that the site contains around 30 million articles in 287 languages and serves roughly 850 million article requests per day. More importantly, it’s a free, open source of data that is gaining traction as a tool for “effective and timely disease surveillance.”
“A global disease-forecasting system will change the way we respond to epidemics,” Dr. Sara Del Valle, the study’s lead author at the Los Alamos National Laboratory in New Mexico, said in a press release. “In the same way we check the weather each morning, individuals and public health officials can monitor disease incidence and plan for the future based on today’s forecast. The goal of this research is to build an operational disease monitoring and forecasting system with open data and open source code. This paper shows we can achieve that goal.”
Moreover, the paper shows that models can potentially be transferred across regions; that is, one can “train” a computer model using public health data from one location and apply it in another. For example, researchers could create models using data from Japan to track and forecast disease in India. This is particularly important for countries that do not offer reliable disease data.