Tuesday, September 13, 2011

Culturomics: Computers That Predict The Future? 1

Are there computer models that predict the turn of world events?  Can they predict the actions of large numbers of people? What are geocoding, tone mining, and network analysis?


Culturomics?  What is it?  Wikipedia gives us a very concise explanation:
Culturomics is a form of computational lexicology that studies human behavior and cultural trends through the quantitative analysis of digitized texts. Researchers data mine large digital archives to investigate cultural phenomena reflected in language and word usage. The term is an American neologism first described in a 2010 Science article called Quantitative Analysis of Culture Using Millions of Digitized Books, co-authored by Harvard researchers Jean-Baptiste Michel and Erez Lieberman Aiden. Michel and Aiden helped create the Google Labs project Google Ngram Viewer which uses n-grams to analyze the Google Book digital library for cultural patterns in language use over time.
The information revolution has truly spawned an explosion of data that threatens to drown us unless we have the means to interpret it.  This is the critical difference between data and information: information informs the reader, data merely confronts him.  This revolution has not only affected the sciences but has now moved into the humanities.  In one experiment, researchers at Stanford University data mined hundreds of letters written from 1700 to 1750 by famous European writers and thinkers.  The results have been fascinating.  Their mapping of this "Republic of Letters" shows where this correspondence had the most effect.  These scholars, scientists, philosophers, revolutionaries, writers and a host of others communicated with each other through letters in Europe and America.  The term seems to have been coined by Pierre Bayle (1647-1708), who was the first of the Encyclopedists who revolutionized learning in Europe and America during the Enlightenment.  He is generally considered to have created the world's first book review journal, which he launched in 1684 and called Nouvelles de la republique des lettres.  Although most scholars no longer see a direct connection between the Enlightenment and this correspondence, no doubt these men communicated their ideas to each other, which further fueled the movement of ideas.  Google has made this research power available to anyone here.
one graphic of this data mining algorithm

We include a video from Stanford University which helps explain this endeavor.  If you cannot see the embedded video, here is the link: http://youtu.be/nw0oS-AOIPE.
If you're interested in learning how to use this tool from Google and would rather see it than read about it, we provide this video.  If you cannot see the embedded video, here is the link: http://youtu.be/N6P0TYx5-sw.
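
For readers who would rather see the idea in code, here is a minimal Python sketch of the kind of calculation an n-gram tool performs: counting how often a phrase appears in each year's texts, relative to all phrases of the same length.  The function names and the tiny two-year corpus are made up for illustration, and this is not Google's actual implementation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_frequency(corpus, phrase):
    """Relative frequency of `phrase` among same-length n-grams, per year.

    corpus: dict mapping a year to a list of text strings
            (hypothetical toy data, not a real book archive).
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, texts in sorted(corpus.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result


# Invented sample sentences, one "book" per year.
toy_corpus = {
    1900: ["the telegraph office sent the telegraph message"],
    1950: ["the telephone rang and the telegraph office closed"],
}

if __name__ == "__main__":
    for year, freq in yearly_frequency(toy_corpus, "telegraph").items():
        print(year, round(freq, 3))
```

Plotting those yearly values for a word or phrase gives the rising and falling curves that the Ngram Viewer displays.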

This research began with the publication of a paper titled, Quantitative Analysis of Culture Using Millions of Digitized Books.  The abstract explained things quite clearly.
We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of ‘culturomics,’ focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.
But this idea was logically expanded to model not only the spread of ideas but also the actions of those who espouse them: human behavior.  After all, if books could be scanned, why not newspapers, magazines, websites, etc.?  Just recently, on September 5th, Kalev H. Leetaru published an article titled Culturomics 2.0: Forecasting large-scale human behavior using global news media tone in time and space.  Again, the abstract of the article does a good job of explaining the concept:
News is increasingly being produced and consumed online, supplanting print and broadcast to represent nearly half of the news monitored across the world today by Western intelligence agencies. Recent literature has suggested that computational analysis of large text archives can yield novel insights to the functioning of society, including predicting future economic events. Applying tone and geographic analysis to a 30–year worldwide news archive, global news tone is found to have forecasted the revolutions in Tunisia, Egypt, and Libya, including the removal of Egyptian President Mubarak, predicted the stability of Saudi Arabia (at least through May 2011), estimated Osama Bin Laden’s likely hiding place as a 200–kilometer radius in Northern Pakistan that includes Abbotabad, and offered a new look at the world’s cultural affiliations. Along the way, common assertions about the news, such as “news is becoming more negative” and “American news portrays a U.S.–centric view of the world” are found to have merit.
The article goes on to detail this idea even further.  Culturomics was initially meant to understand "digested history" as found in books.  The article, however, points out that "People take action based on the imperfect information available to them at the time, and the news media captures a snapshot of the real-time public information environment."  News sources indicate a lot more than just "facts."  Research in this area goes as far back as 1977, with the publication of a paper titled The Many Worlds of the World's Press in the Journal of Communication by George Gerbner and George Marvanyi.  The 2011 article, citing this 1977 paper, states, "News contains far more than just factual details; an array of cultural and contextual influences strongly impact how events are framed for an outlet's audience, offering a window into national consciousness."  The aim is to predict social behavior: "A growing body of work has shown that measuring the 'tone' of this realtime consciousness can accurately forecast many broad social behaviors, ranging from box office sales to the stock market itself."
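
To make "measuring tone" a little more concrete, here is a rough Python sketch of one simple way it could be done: count positive and negative words in each article and average the scores by month.  The word lists, function names, and scoring formula are assumptions made up for this illustration; the article's actual dictionaries and methods are not reproduced here.

```python
# Tiny, invented word lists; a real study would rely on far larger
# tone dictionaries.
POSITIVE = {"peace", "growth", "agreement", "stable"}
NEGATIVE = {"protest", "violence", "crisis", "unrest"}


def tone(text):
    """Score a text as (positive words - negative words) / total words."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def average_tone_by_month(articles):
    """Average the tone of (month, text) pairs for each month."""
    totals = {}
    for month, text in articles:
        score_sum, count = totals.get(month, (0.0, 0))
        totals[month] = (score_sum + tone(text), count + 1)
    return {m: s / c for m, (s, c) in sorted(totals.items())}


if __name__ == "__main__":
    # Hypothetical headlines, not taken from any real archive.
    sample = [
        ("2011-01", "Protest and violence spread."),
        ("2011-02", "Peace agreement signed today"),
    ]
    print(average_tone_by_month(sample))
```

A sustained drop in a country's average tone over time is the sort of signal this line of research looks for, though a real system would also need the geographic analysis the abstract describes.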

The central question the paper asks is the same question this series asks.  "Can public tone of the global news data forecast even broader behaviors, such as the stability of nations, the location of terrorist leaders, or even offer new insight on conflict and cooperation among countries, as accurately as it predicts movie sales or stock movements?"  We shall find out.
