Monday, February 28, 2011

Minority Report, Positivism & Chaos Theory

Anomaly Detection at Multiple Scales (ADAMS) is a preview of the film Minority Report, but without the precogs.  Will this effort end up the same way as the system in the film?

All governments want control.  It is inherent to their breed.  They would like the people to calmly trust them and submit to their ideas.  Governments are this way because people are this way, and people, when unchecked by an opposing ideological force, run rampant with a lust for power.  Modern societies rely on science.  Sometimes they use science to give themselves a veneer of respectability.  Is ADAMS one of these veneers?

This mathematical model, which builds profiles for types of criminal personalities, was submitted to DARPA in October 2010.  According to an official release, ADAMS is intended to:
...characterize graphs containing up to billions of nodes by structural feature sets calculated using recent breakthroughs in graph analytic techniques.  ADAMS will use these features as the basis for novel anomaly detection algorithms.  One of the key questions that ADAMS must answer is what are the right features corresponding to any given data set?  The answer to this question will depend on the data represented by the graphs and will, in many cases, be dynamic reflecting the dynamic nature of the data.  It is, therefore, not practical to hand craft the appropriate features.  ADAMS will need to apply machine learning techniques that will modify graph feature definitions and their application to anomaly detection based on user feedback in response to automatically generated anomaly rankings.
There are critical terms that we must understand in order to fully grasp the scope of this project.  The first one is the "recent breakthroughs in graph analytic techniques."  The second key term is machine learning.

Graph Analytics
We located an interesting presentation by Anupam Joshi, an Indian researcher working with IBM, entitled Semantic (Graph) Analytics for Situational Awareness.  In this presentation Joshi states that the purpose of semantic analysis is the "mining of patterns and trends to predict events."
"Simulators are very valuable if you have a very bounded domain that you are simulating. You have to have the rules that they choose to simulate very clear. If they aren't clear, it breaks into mush almost immediately."
George Gilder
These techniques will first be developed by the military and will then filter down to law enforcement in domestic situations.  In this presentation Joshi speaks about social networks such as Twitter, Facebook and YouTube as sources of information.  He drew an elaborate chart of how these networks would be used to make predictions about dangerous individuals, humanitarian aid and disaster relief, although we doubt very much that DARPA is funding this for non-military purposes.  Twitter will be one of the prime services analyzed, since its information is free and public.  The diagram to the left of the page illustrates how it will be done.  The most dangerous aspect of this, besides the flaw, obvious to all who understand Chaos Theory, of trying to predict complex systems such as human behavior, is that "graphical models for unsupervised topic discovery" will be used.  So a machine based on a faulty reductionist statistical model will not only be used to decide whether some activity or individual falls under "terrorist" or criminal activity, but it will be unsupervised.

Joshi also suggests that "influence models be used, integrating NLP, or Neuro-Linguistic Programming."  What are influence models?
"Supposing I am able to tell a mother that her 8-year-old has a one in three chance of committing a homicide by age 18. What the hell do I do with that information? What do the various social services do with that information? I don't know."
Richard Berk, Professor of Criminology

Peer Influence Modeling (homophily)
In a paper entitled Peer Influence Groups: Identifying Dense Clusters in Large Networks, James Moody from the Department of Sociology at Ohio State University demonstrates a concept of the way certain individuals will influence others, whether for good or bad.  Of course he is kind enough, from the outset (the Abstract), to show us how erroneous his methods will be by the protocols he will use.  He states that,
Using software familiar to most sociologists, the method reduces the network to a set of m position variables that can then be used in fast cluster analysis programs. The method is tested against simulated networks with a known small-world structure showing that the underlying clusters can be accurately recovered.
He assumes that if his model works on a small structure, then it will work on a larger network.  The reality, of course, is exactly the opposite.  Very simple networks of any kind may be predictable if there are very few factors to take into consideration, while large networks have completely unpredictable outcomes.  This ruins any model one may make to "predict" ANY outcome.

Another example of how these models fail can be found in the abstract of a paper by Bob Edward Vazquez entitled Methodological Difficulties of Modeling Peer Influence: A Discussion of OLS, Tobit and CLAD.  Vazquez writes,
Using data from wave IV of the National Youth Survey, the effect of drug-related peer delinquency is modeled as a function of the bond to peers.  Due to the complexities of such a test, this paper discusses limitations of both the standard linear model and Tobit regression when applied to delinquency data coupled with statistical interactions.
He then becomes captain of the obvious with his common-sense conclusion that "The results suggest the effect of peers increases with the intensity of bonds to peers."  Did we really need all these strange-sounding acronyms for tests to determine this great truth?  His suggestion?  To use the censored least absolute deviations (CLAD) model.  Forrester Research will, for $499, be glad to sell you their Peer Influence Analysis.  Its thesis consists of more obvious statements, like "people's influence on each other rivals online advertising," or that a "minority of people generate 80% of the impressions."  Is this information really worth $499?  Doesn't everyone know from experience that they would listen to someone they know and trust much more than to an advertiser who is obviously making a profit from a recommendation?

This might be amusing enough until we realize that these predictive models are being applied to issues of life and death (military), or imprisonment and punishment (law enforcement).  ADAMS is just another model being sold at a profit.  In fact, to show just how far these behavioral predictive models fall short of any reasonable success rate, the better ones only reach a 75% success rate in predicting whether criminals will repeat a crime.  Although mathematical models are very good at demonstrating our ignorance of the real world, they are unfortunately not good at predicting events or processes in the real world.
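To put the 75% figure in human terms, here is a rough back-of-the-envelope sketch.  Everything in it beyond the 75% is our own assumption for illustration: a hypothetical group of 1,000 people, a hypothetical 30% of whom would actually reoffend, and the reading of "75% success rate" as the model being right 75% of the time for reoffenders and non-reoffenders alike.  The function name screening_outcomes is ours too.

```python
def screening_outcomes(population, base_rate_pct, accuracy_pct):
    """Count right and wrong calls for a hypothetical risk model.

    Assumption (ours, not from any report): the model is correct
    accuracy_pct percent of the time for both groups.
    """
    reoffenders = population * base_rate_pct // 100
    others = population - reoffenders
    correctly_flagged = reoffenders * accuracy_pct // 100
    missed = reoffenders - correctly_flagged
    correctly_cleared = others * accuracy_pct // 100
    wrongly_flagged = others - correctly_cleared
    return {
        "correctly_flagged": correctly_flagged,
        "missed": missed,
        "correctly_cleared": correctly_cleared,
        "wrongly_flagged": wrongly_flagged,
    }


# Hypothetical inputs: 1,000 people, 30% true reoffenders, 75% accuracy.
result = screening_outcomes(1000, 30, 75)
flagged = result["correctly_flagged"] + result["wrongly_flagged"]
wrong_share = result["wrongly_flagged"] / flagged

print(result)
print(f"Share of flagged people who would not have reoffended: {wrong_share:.1%}")
```

Under these assumed numbers, 175 of the 400 people the model flags would never have reoffended, and 75 who would reoffend are cleared.  Change the assumed base rate and the picture shifts, which is exactly why a single "success rate" says so little.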

Machine Learning
The DARPA proposal for the ADAMS model also proposes that "ADAMS will need to apply machine learning techniques that will modify graph feature definitions and their application to anomaly detection based on user feedback in response to automatically generated anomaly rankings."  This is a sophisticated way of saying that a computer with "artificial intelligence" will need to modify the detection model based on a user's feedback.  Machine learning can be defined as,
a branch of artificial intelligence, which is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases. A learner can take advantage of examples (data) to capture characteristics of interest of their unknown underlying probability distribution. Data can be seen as examples that illustrate relations between observed variables. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples (training data). Hence the learner must generalize from the given examples, so as to be able to produce a useful output in new cases.
There will be no human oversight due to the massive volumes of data.  These computers, without human oversight, will also have the power to modify the graph definitions.  So the machine would have to make human-like decisions without the human gift of intuition, and without ever meeting the person, an encounter in which one can gather a great amount of information from gestures, eye contact, etc.  None of this will even be a factor for the computer.  It will be a cold, lifeless mathematical decision.  Is this the future we want in our society?  Do we want computers with as-yet-untested theories of artificial intelligence making these grave life-or-death decisions?  This is not to say that artificial intelligence may never reach these levels; it is only to say that it is not yet ready to perform THIS kind of function.

The Butterfly Effect Strikes Again
One of the tenets of Chaos Theory, or nonlinear systems, is that they cannot scale up.  In other words, if a model predicts certain data at one time, it will predict different data the next time.  We know why this happens.  It happens because the initial data is not identical from one instance of the process to the other.  This difference in the data is NOT because someone was sloppy in their measurements, but because they are incapable of measuring it precisely each time.  This inability to measure precisely would be true no matter how powerful or sensitive our instruments of measurement were.  So we are NOT surprised to find this statement made in the DARPA report on the ADAMS model under "some of the issues that ADAMS must address are."
1. There are graph algorithms that work on graphs with 1000's of nodes which have recently been found not to scale
2. E.g., the measure of betweenness centrality becomes unstable as graphs grow. It becomes sensitive to minor variations in link structure that are well within noise levels of the data.
3. New massive-scale graph analytic techniques that are resilient to noise, sampling bias, and scale are needed to detect the very weak signal from the background of legitimate behaviors.
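The second item in that list can be seen even on a toy network.  The sketch below is our own illustration, not anything from the DARPA report: it computes betweenness centrality with Brandes' algorithm for unweighted, undirected graphs, on a made-up graph of two triangles joined through a single node "x", and then adds one extra link.  The node names and helper names (build, betweenness) are hypothetical.

```python
from collections import deque


def build(edges):
    """Turn a list of undirected edges into an adjacency dict of sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and remembering each node's predecessors on those paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in score.items()}


# Two triangles (a, b, c) and (d, e, f) joined only through node x.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"),
         ("c", "x"), ("x", "d")]
before = betweenness(build(edges))

# One extra link, the kind of variation that could easily be noise.
after = betweenness(build(edges + [("c", "d")]))

print("x before:", before["x"], "x after:", after["x"])
```

In this toy case "x" is the most central node until the single link between "c" and "d" appears, at which point its score falls to zero while "c" and "d" keep theirs.  One link in a seven-node graph rewrites the ranking; the report is conceding the same fragility at the scale of billions of nodes.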
When the report speaks of "noise levels" or "sampling bias" it is a nice scientific way of saying data that they cannot explain.  When these models are applied to criminals or military operations, it translates into people not being granted parole, dangerous criminals being released, or soldiers dying from faulty prediction models.  The term "not to scale" means that these models lose any claim to accuracy once a lot of data is fed into them.  These are tell-tale signs of an outdated reductionist scientific method being applied to complex, unpredictable, real-life situations.
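The point made at the start of this section, that tiny differences in the initial data produce different predictions, can be sketched with the logistic map, a standard textbook example of a chaotic system.  It is our choice of illustration and has nothing to do with the ADAMS model itself; the starting value 0.2, the 1e-10 "measurement error" and the function name logistic_orbit are all assumptions made for the demonstration.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x) and return every value visited."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


steps = 80
# Two "measurements" of the same starting state, differing by 1e-10.
a = logistic_orbit(0.2, steps)
b = logistic_orbit(0.2 + 1e-10, steps)

# Distance between the two predictions at each step.
gaps = [abs(p - q) for p, q in zip(a, b)]

# First step at which the two runs disagree by more than 0.1
# (None if that never happens within the chosen number of steps).
first_large = next((i for i, g in enumerate(gaps) if g > 0.1), None)

print("initial gap:", gaps[0])
print("largest gap:", max(gaps))
print("first step with gap > 0.1:", first_large)
```

The two runs start out indistinguishable and end up with nothing in common, even though the rule is a one-line formula with no randomness in it.  A model of human behavior has no such luxury of a known rule.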

We will speak more about this report in a second article in this series.  Minority Report techniques do not work.  Models that predict something as vastly complex as human behavior have, with our present methods of modeling, no chance of success.  Indeed they may never have a chance.

1 comment:

Zana said...

Hello. I see that you are talking about modeling, simulation and chaotic systems. Yesterday I found a great open-access (free to download) book, “Chaotic systems”. This book presents a collection of major developments in chaos systems covering aspects of chaotic behavioral modeling and simulation, control and synchronization of chaos systems, and applications like secure communications. It is a good source to acquire recent knowledge and ideas for future research on chaos systems and to develop experiments applied to real life problems. You can find it here: Cheers!