By now, you have almost certainly read about the publication of a massive number (72,000+) of classified documents related to coalition operations in Afghanistan by the whistleblower group Wikileaks. The data are available in several formats at the dedicated Wikileaks site.
Before proceeding, I want to point out that, given the way this information was obtained and subsequently disseminated, I am unclear as to the legal protections provided to those in possession of the data (e.g., retaining copies on their hard drives) or performing analysis on it (e.g., citing data in research). As such, I am not recommending or condoning that anyone download the data until these questions are explicitly addressed.
I, however, have downloaded the data and begun examining it at a high level. I believe such an examination is critical for two reasons. First, this is the first time in history that the public has been given such a granular view of the day-to-day operation of contemporary warfare. With the proper analytical tools, this data may reveal insights into the predicates of conflict in ways that previous aggregate-level data could not. Second, because the data may have gone through some degree of filtering or selection by Wikileaks, a detailed analysis may provide insight into the nature of that selection and the process by which it occurred.
After the jump is an initial overall descriptive visualization of the data as it was provided by Wikileaks, with some brief interpretations. Over the next several days and weeks, I hope to examine the data in more detail and periodically present the results.

The above graph displays the volume of reports over the six-year period covered by the data set, broken down by the reporting region (e.g., RC SOUTH, RC EAST) and by the target of attack noted in the incident report (e.g., ENEMY, FRIENDLY).
My motivation in creating this chart was to do a very quick assessment of the trends in the data. Given the nature of the reports, we would expect a noticeable degree of seasonality (peaks and valleys) reflecting the natural ebb and flow of war. Any drastic deviations from this expectation could indicate a strong degree of selection on the part of Wikileaks. As you can see, however, the data generally do fit this expectation. Note the dramatic upward-trending seasonality present in the heavy reporting areas of RC EAST and RC SOUTH. Perhaps more interesting, though, is the sudden increase in the number of NEUTRAL reports for RC EAST and RC CAPITAL in the period roughly between mid-2006 and mid-2008.
A more detailed reading of the reports from those areas during that period might reveal information about the nature of the fighting, or about the selection process present in the data.
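For anyone who wants to try the same kind of breakdown, here is a minimal sketch of the counting step: tally reports per month, per reporting region, and per target category. It is written in Python and is not the code behind the chart above. The `monthly_counts` helper, the three-field record layout, and the toy rows are all assumptions made up for illustration; the actual field names and formats in the Wikileaks files may differ.

```python
from collections import Counter
from datetime import date


def monthly_counts(reports):
    """Count reports per (year-month, region, target) cell.

    `reports` is an iterable of (date, region, target) tuples. This
    layout is a hypothetical simplification, not the real file schema.
    """
    counts = Counter()
    for day, region, target in reports:
        month = day.strftime("%Y-%m")  # collapse daily reports into months
        counts[(month, region, target)] += 1
    return counts


# Toy rows invented for illustration only; they are not taken from the data.
toy_reports = [
    (date(2006, 7, 3), "RC EAST", "ENEMY"),
    (date(2006, 7, 19), "RC EAST", "ENEMY"),
    (date(2006, 7, 21), "RC EAST", "NEUTRAL"),
    (date(2006, 8, 2), "RC SOUTH", "FRIENDLY"),
]

counts = monthly_counts(toy_reports)
for key in sorted(counts):
    print(key, counts[key])
```

Plotting each (region, target) pair as its own series over the month axis would give a rough version of the panels described above, and makes it easy to eyeball both the seasonality and any sudden jump in a single category.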
I could not resist either: http://i25.tinypic.com/33db4zp.jpg
Those are daily event counts
[Reply]
The last bump is probably the presidential election, right?
Also, I wonder what explains the variability in rates of detection/dismantling:
http://i30.tinypic.com/5zp9as.jpg
(note that geom_smooth is probably not best here.)
[Reply]
Drew Conway Reply:
August 1st, 2010 at 12:06 pm
I am glad you are looking at attack types. I was going to wait until later to look, so we can compare notes.
[Reply]
[...] been taking apart by The New York Times, The Guardian, and Der Spiegel, but scientist (e.g., Drew Conway over at Zero Intelligence Agents) are only beginning to engulf themselves in the [...]
nice, looking forward to your findings
[Reply]
[...] a bunch of data about IED explosions in Afghanistan from the WikiLeaks data that just came out. Drew Conway took a look at it and it looked pretty similar to the plots that I made the other [...]
Hmmm…might try letting the fda package (functional data analysis) loose on it, and test that ebb and flow of war theory.
[Reply]
This is very interesting. I am trying to do similar research but in different focus. It would be very interesting to see your findings and it would be very nice to share some raw data.
[Reply]
[...] is publicly accessible, the research and analysis of the data is distributed. On his blog Zero Intelligence Agents, NYU Politics Department grad student Drew Conway has started undertaking a statistical analysis of [...]
[...] Filed under: Uncategorized by techappsgroup — Comments Off August 9, 2010 Drew Conway took the time to do some pretty good analysis on the Wikileaks data released a few weeks…. Using the free, open source analysis tool called R, he was able to extrapolate some very [...]
[...] publicly, the research and analysis of the data are distributed. On his blog Zero Intelligence Agents, Drew Conway, a political science student at New York University, has begun a [...]
[...] That this is not only more than frightening but also illuminating is something Conway demonstrates using the example of the attacks along an Afghan ring road: from these, one can recognize the Taliban's strategy of undermining the Afghan government by cutting villages off from one another. Even if the Afghanistan logs cannot tell the whole truth, one thing can certainly be read out of them with the help of Conway's software: how bad things really got between 2006 and 2007. [...]
[...] took to the opportunity, downloading and immediately munging it. Several others have done the same (notably Drew Conway), and some of their results are much prettier and possibly more instructive [...]
[...] turn to dissecting the data. This is the case, for example, with Drew Conway, who on his blog Zero Intelligence Agents has begun a statistical analysis of the data and gives us, in collaboration with Mikeal [...]