Prototype software can give early warnings of disease or violence outbreaks by spotting clues in news reports.

Newspaper-collection

Newspaper-collection

Researchers are creating software that analyzes 22 years of New York Times archives, Wikipedia and about 90 other web resources to predict future disease outbreaks, riots and deaths — and hopefully prevent them.

The system could someday help aid organizations and others be more proactive in tackling disease outbreaks or other problems, says Eric Horvitz, distinguished scientist and codirector at Microsoft Research. “I truly view this as a foreshadowing of what’s to come,” he says. “Eventually this kind of work will start to have an influence on how things go for people.” Horvitz did the research in collaboration with Kira Radinsky, a PhD researcher at the Technion-Israel Institute.

Software Predicts Tomorrow’s News by Analyzing Today’s and Yesterday’s

Eric Horvitz of Microsoft Research and Kira Radinsky of the Technion-Israel Institute describe their work in a newly released paper, “Mining the Web to Predict Future Events” (PDF).

predeciting future

The system provides striking results when tested on historical data. For example, reports of droughts in Angola in 2006 triggered a warning about possible cholera outbreaks in the country, because previous events had taught the system that cholera outbreaks were more likely in years following droughts. A second warning about cholera in Angola was triggered by news reports of large storms in Africa in early 2007; less than a week later, reports appeared that cholera had become established. In similar tests involving forecasts of disease, violence, and a significant numbers of deaths, the system’s
warnings were correct between 70 to 90 percent of the time.

Horvitz says the performance is good enough to suggest that a more refined version could be used in real settings, to assist experts at, for example, government aid agencies involved in planning humanitarian response and readiness. “ We’ve done some reaching out and plan to do some follow-up work with such people,” says Horvitz.

The system was built using 22 years of New York Times archives, from 1986 to 2007, but it also draws on data from the Web to learn about what leads up to major news events.

All this information provides valuable context that’s not available in news article, and which is necessary to figure out general rules for what events precede others.

One of the problems that the researchers faced in developing their software model is the fact that tragic events in poor African countries are often not widely reported. So they taught the software to generalize somewhat: “Instead of considering only ‘Rwanda cholera outbreak,’ an event with a small number of historical cases, we consider more general events of the form: “[Country in Africa] cholera outbreak.” We turn to world knowledge available on the Web…[that] maps Rwanda to the following concepts: Republics, African countries, Land- locked countries, Bantu countries, etc.”

Horvitz and Radinsky also taught the software what to ignore: It “was able to recognize that the drought experienced in New York City on March 1989, published in the NYT under the title: ‘Emergency is declared over drought’ would not be associated with a disease outbreak…The system estimates that, for droughts to cause cholera with high probability, the drought needs to happen in dense populations (such as the refugee camps in Angola and Bangladesh) located in underdeveloped countries that are proximal to bodies of water.”

Microsoft doesn’t have plans to commercialize Horvitz and Radinsky’s research as yet, but the project will continue, says Horvitz, who wants to mine more newspaper archives as well as digitized books.

Many things about the world have changed in recent decades, but human nature and many aspects of the environment have stayed the same, Horvitz says, so software may be able to learn patterns from even very old data that can suggest what’s ahead. “I’m personally interested in getting data further back in time,” he says.

Source: MIT Technology Review