The NYC Analytics X Prize team has been working for less than a week, and I am already very impressed with what we have been able to pull together. We have an impressive interdisciplinary team, and as I have said before, other teams should fear us. All of our work is completely open source, however, so feel free to browse the repository, ask questions, or make contributions.
One of the methods our team is interested in applying to the problem of analyzing and predicting homicide as a social and geographic (zip code) problem is spatial regression. The primary motivation for this method is to correct for the correlation of errors between predictions for geographically proximate units. It is only useful, however, if it can be shown that spatial relations affect the independence of observations. In terms of the Analytics X Prize, the units are zip codes, and proximity is defined as sharing a border, i.e., zip code adjacency.
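One common way to check whether adjacency affects the independence of observations is Moran's I, a measure of spatial autocorrelation. The sketch below is only an illustration, not the team's code: the function name `morans_i`, the four-unit toy layout, and the values are all made up, and the weights are assumed to be binary (1 if two units share a border, 0 otherwise).

```python
def morans_i(values, neighbors):
    """Moran's I with binary adjacency weights.

    values:    dict mapping unit -> observed value
    neighbors: dict mapping unit -> set of adjacent units (symmetric)
    """
    n = len(values)
    mean = sum(values.values()) / n
    dev = {unit: v - mean for unit, v in values.items()}
    # Sum of cross-products over every ordered pair of adjacent units.
    cross = sum(dev[i] * dev[j] for i in neighbors for j in neighbors[i])
    # Total weight: each border is counted once in each direction.
    w_total = sum(len(adj) for adj in neighbors.values())
    variance_sum = sum(d * d for d in dev.values())
    return (n / w_total) * (cross / variance_sum)


# Hypothetical toy layout: four units in a row, a - b - c - d.
toy_neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

# Similar values sit next to each other: positive autocorrelation.
smooth = morans_i({"a": 1, "b": 2, "c": 3, "d": 4}, toy_neighbors)

# High and low values alternate: negative autocorrelation.
checker = morans_i({"a": 1, "b": 4, "c": 1, "d": 4}, toy_neighbors)
```

Values well above zero suggest that neighboring units resemble each other, which is the situation where a spatial regression is worth the extra machinery. A real analysis would also need a significance test, which this sketch omits.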
To begin to address this question we can use a convenient alternative conceptualization of the data: rather than using a map partitioned by zip codes, the city of Philadelphia can be reconstructed as a network of zip codes connected by physical adjacency. That is, if two zip codes share a border then they are connected. By adding the homicide data to this network it is possible to observe what spillover or transference of this social phenomenon, if any, is occurring over time. After the jump are three network visualizations using this method, where each node is a Philadelphia zip code, and nodes are sized by the proportion of homicides reported in that zip code for 2007, 2008 and 2009.
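For readers who want to try the same idea, here is a minimal sketch of the data structure, assuming the adjacency has been recorded as a list of border pairs. The labels (`zip_a` and so on), the counts, and the helper names `build_network` and `node_sizes` are placeholders for illustration, not the real Philadelphia data or the code used for the figures.

```python
from collections import defaultdict


def build_network(borders):
    """Turn a list of (zip, zip) border pairs into an undirected adjacency map."""
    graph = defaultdict(set)
    for left, right in borders:
        graph[left].add(right)
        graph[right].add(left)
    return dict(graph)


def node_sizes(counts, min_size=10.0, scale=100.0):
    """Size each node by its share of all homicides reported that year."""
    total = sum(counts.values())
    return {z: min_size + scale * c / total for z, c in counts.items()}


# Placeholder border list and counts, made up for this example.
borders = [("zip_a", "zip_b"), ("zip_b", "zip_c"), ("zip_c", "zip_d")]
counts_one_year = {"zip_a": 10, "zip_b": 30, "zip_c": 40, "zip_d": 20}

network = build_network(borders)
sizes = node_sizes(counts_one_year)
```

Running `node_sizes` once per year against the same `network` gives one set of node sizes per figure, so changes between years show up as nodes growing or shrinking while the layout stays fixed.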

2007 Homicide Counts

2008 Homicide Counts

2009 Homicide Counts
By removing the fixed geography of a map, and allowing zip code adjacency to define the layout of the network, several interesting aspects of the social dynamic become clear. First, there appears to be a chain of adjacent zip codes that forms the consistent “core” for homicide. Starting at 19120, and moving down to 19143, these zip codes exhibit a consistently high propensity for homicide. Within this chain, however, an even more interesting process appears to be occurring.
Note the size of the 19132 node in 2007. Then, move to the 2008 figure and notice the reduction in 19132, but the increases in 19121 and 19104. Now, observe the changes in size to all of these nodes in the 2009 figure, as well as the sudden swelling of the 19143 node. There appears to be a downward (with respect to these figures) flow of this social phenomenon through the chain of adjacent zip codes from 19132 to 19143 over the course of three years. Of course, without any contextual information it is impossible to explain this, but what it does make clear is that space matters, and this networked conceptualization provides an intuitive way to analyze these dynamics.
What else can be ascertained from these figures?
These images are great. I now realize that one way to easily use the graph-distance metrics you’ve got is to implement KNN as a predictor.
As for the time trends, are you using raw counts or probabilities?
[Reply]
Drew Conway Reply:
January 16th, 2010 at 9:10 am
Ahh, you beat me to the punch on KNN! I was going to email that to the team when I uploaded the network data today. There are also several flow metrics we could use if we weight the edges by something, e.g., length of border.
I am using the data you scrubbed and put on Git, which I think are probabilities.
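The KNN idea raised in this thread can be sketched roughly as follows, assuming "distance" means the number of borders crossed between two zip codes. Everything here is hypothetical: the function names, the toy graph, and the values are invented for illustration and do not come from the team's repository.

```python
from collections import deque


def graph_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def knn_predict(graph, known, target, k=2):
    """Predict target's value as the mean of its k nearest known neighbors.

    Nearness is graph distance; ties are broken by node label.
    """
    dist = graph_distances(graph, target)
    candidates = sorted(
        (d, z) for z, d in dist.items() if z != target and z in known
    )
    nearest = candidates[:k]
    if not nearest:
        raise ValueError("no reachable nodes with known values")
    return sum(known[z] for _, z in nearest) / len(nearest)


# Hypothetical toy network (a - b - c - d) and made-up known values.
toy_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
toy_known = {"a": 0.1, "c": 0.3, "d": 0.5}

prediction = knn_predict(toy_graph, toy_known, "b", k=2)
```

With weighted edges (say, by length of border), the breadth-first search would need to be swapped for a shortest-path algorithm such as Dijkstra's, but the prediction step would stay the same.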
[Reply]
Is spatial proximity a good enough measure? Two areas could be adjacent but have a river between them. I would expect such areas to have less in common than their proximity would indicate. Should rivers, main roads, etc. be considered as a weight on proximity?
[Reply]
Drew Conway Reply:
January 16th, 2010 at 9:38 am
I think it is a good start, but you are absolutely right about other geographic features. In fact, I think proximity to water can be a good proxy for a lot of things that relate to social behavior, so I am in the process of creating a dummy on interactions where there is water on one of the borders.
[Reply]
[...] Zero Intelligence Agents » Homicide in Philadelphia as a Networked Process ~1min (tags: datamining graf prediction) [...]
How did you generate the graphs? Manually or using a tool? And how did you put the data (zip-code adjacency) together?
[Reply]
Drew Conway Reply:
January 18th, 2010 at 8:55 am
I actually hand-coded the data from a map of Philadelphia.
[Reply]