It is Friday, so I am inclined to take the liberty of posting something outside my usual tag-line areas of interest (see tag-line above). While I was out in San Francisco I met a lot of people interested in applying network analysis to their business areas or data sets. One group was TechCrunch, who had a data set of venture capital firms' investments over the past two years. They had performed some network analysis on it in the past, but were interested in getting my perspective on the data (those circular views can be so uninformative), so they handed it over to me and asked me to have at it. The image in the upper-right and the large circular hierarchical visualization below are the results of this analysis.
These visualizations combine several analytical techniques into a single view. The complexity of these images can be daunting at first; therefore, I highly recommend two things: first, allow me to walk you through how to interpret them before you begin to explore; and second, when you are ready, view the above image in full-screen mode. Let’s begin!
Description of analysis
The type of network data used in this analysis is what is referred to as bipartite, or two-mode. This means that there are two distinct types of nodes in the data, and nodes of the same type cannot connect to one another. In our case, the two types are VC firms and start-ups, where an investment constitutes an edge between a VC firm and a start-up. To understand the structure of co-investments among VC firms, the first step is to convert this two-mode network into a one-mode network, wherein VC firms are directly connected if they have invested in the same start-up. This network will help inform us as to how various firms cluster in terms of their investments, and possibly how that strategy manifests as a networked structure.
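To make the two-mode to one-mode idea concrete, here is a minimal sketch in plain Python. The firm and start-up names are made up for illustration and are not from the CrunchBase data. It counts, for every pair of firms, how many start-ups they have both backed.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical investment records: (VC firm, start-up) pairs.
investments = [
    ("Firm A", "Startup 1"),
    ("Firm B", "Startup 1"),
    ("Firm A", "Startup 2"),
    ("Firm B", "Startup 2"),
    ("Firm C", "Startup 2"),
    ("Firm C", "Startup 3"),
]

# Group investors by start-up (a set ignores duplicate records).
investors_by_startup = defaultdict(set)
for firm, startup in investments:
    investors_by_startup[startup].add(firm)

# Every pair of firms backing the same start-up gains one unit of tie weight.
co_investments = defaultdict(int)
for backers in investors_by_startup.values():
    for a, b in combinations(sorted(backers), 2):
        co_investments[(a, b)] += 1

for (a, b), weight in sorted(co_investments.items()):
    print(a, b, weight)
```

Start-ups with a single investor (like the hypothetical "Startup 3") contribute no ties, which is why isolated firms can drop out of the one-mode network.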
The conversion requires simple matrix algebra (multiply the two-mode matrix of connections by its transpose), which generates a square matrix of co-investment connections. As an additional benefit, each non-zero off-diagonal element of the matrix represents the number of times the two firms co-invested. This co-investment count now represents the relative strength of the tie between VC firms. The next step is to try to make sense of the massively complex network that results from this conversion. Specifically, after converting from a two-mode network to a one-mode network, the VC network contained 2,075 nodes with 9,443 edges: far too large and densely connected to be meaningful. To reduce the complexity of the network I performed a procedure called dichotomization, which removes any edges that have a value below a certain threshold. In this case, I created a histogram of the edge weights and found that the most logical threshold was 3, and thus any edge between two firms that had co-invested fewer than 3 times was removed. This produced the desired result, leaving me with a main component network of 205 VC firms sharing 390 edges.
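The steps in this paragraph can be sketched roughly as follows. This is an illustration on a tiny made-up matrix, not the code used for the post. It assumes a threshold of 3 means that ties with a weight below 3 are dropped, and the helper `largest_component` is a hypothetical name introduced here.

```python
from collections import Counter, deque

# Hypothetical two-mode matrix: rows are VC firms, columns are start-ups;
# a 1 means the firm invested in that start-up.
firms = ["A", "B", "C", "D", "E"]
B = [
    [1, 1, 1, 1, 0, 0],  # A
    [1, 1, 1, 0, 0, 0],  # B
    [0, 1, 1, 1, 0, 0],  # C
    [0, 0, 0, 1, 1, 0],  # D
    [0, 0, 0, 0, 1, 1],  # E
]
n = len(B)

# One-mode projection: entry (i, j) of B times its transpose counts the
# start-ups that firms i and j have in common.
W = [[sum(x * y for x, y in zip(B[i], B[j])) for j in range(n)] for i in range(n)]

# Histogram of off-diagonal tie weights, used to eyeball a threshold.
weights = [W[i][j] for i in range(n) for j in range(i + 1, n) if W[i][j] > 0]
histogram = Counter(weights)

# Dichotomize: keep only ties whose weight is at least the threshold.
THRESHOLD = 3
kept = {
    (firms[i], firms[j])
    for i in range(n)
    for j in range(i + 1, n)
    if W[i][j] >= THRESHOLD
}


def largest_component(edges):
    """Return the node set of the largest connected component."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, best = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node] - component:
                component.add(nxt)
                queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


main_component = largest_component(kept)
print(sorted(histogram.items()), sorted(kept), sorted(main_component))
```

On real data the histogram would be plotted rather than printed, and the threshold chosen by inspecting where the weight distribution thins out.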
Given that the motivation of these firms is presumably to invest in successful start-ups I was curious to see what—if any—natural community structures emerged from the data. Using a hierarchical community detection algorithm, I created a four-level grouping of firms based on their network structure. At each level of the hierarchy firms are more closely related by their co-investments. Next, I wanted to establish a centrality score for each firm based on their structural position as well as the relative strengths of their co-investment ties. To do so I used a centrality metric that took both of these things into account. The next, and final, step is to visually display all this information.
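The post does not say exactly which centrality metric was used, so the following is only a guess at the general idea. It assumes the HUB_SCORE in the legend is a HITS-style hub score computed on weighted ties, and approximates it by power iteration. The function name `hub_scores` and the example ties are hypothetical.

```python
def hub_scores(weighted_edges, iterations=100):
    """Approximate HITS hub scores on an undirected, weighted network."""
    nodes = sorted({node for edge in weighted_edges for node in edge})
    adj = {node: {} for node in nodes}
    for (a, b), w in weighted_edges.items():
        adj[a][b] = w
        adj[b][a] = w

    hub = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: weighted sum of the hub scores of neighbors.
        authority = {
            node: sum(w * hub[m] for m, w in adj[node].items()) for node in nodes
        }
        # Hub update: weighted sum of the authority scores of neighbors.
        hub = {
            node: sum(w * authority[m] for m, w in adj[node].items())
            for node in nodes
        }
        # Normalize so the scores sum to 1 and do not overflow.
        total = sum(hub.values())
        hub = {node: score / total for node, score in hub.items()}
    return hub


# Hypothetical co-investment ties that survived dichotomization.
ties = {("A", "B"): 5, ("A", "C"): 5, ("B", "C"): 3, ("A", "D"): 3}
scores = hub_scores(ties)
for firm, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(firm, round(score, 3))
```

Because co-investment ties are undirected, hub and authority scores coincide here, so the score rewards both how many strong ties a firm has and how central its partners are.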
I have been eager to find a good reason to use a circular hierarchical visualization of network data. There are many examples of this technique around the web; however, I often find them less than informative. In this case, I think the visualization tells a compelling story.
Summary of visualization and interpretation
- Each concentric colored ring represents one of the four hierarchical clusters, with increasing granularity as you move toward the center; therefore, communities of firms become smaller at each progressive ring.
- Firm proximity in the circle corresponds to the similarity of their structure, which in this case is co-investment. That is, firms listed next to one another are the most similar even if they are in different hierarchical clusters.
- All 205 firms are labeled, and their label size corresponds to their centrality score (denoted as HUB_SCORE in the legend at the lower-right).
- Ties between firms are sized by their weight, i.e., the number of times those firms co-invested.
How I interpret this…
There are some really interesting community structures forming here. Most interesting is the community grouped in purple (1 o’clock), which includes DAG Ventures and Benchmark Capital. That community has several strong ties to the three other notable communities: grey (4 o’clock), which includes First Round Capital and Accel Partners; magenta (5 o’clock), which includes Spark Capital and Union Square Ventures; and mustard yellow (9 o’clock), which includes Kleiner Perkins Caufield & Byers. These four communities constitute the core of this network, and within that core the purple group is most central. This is also clearly visible in the network visualization in the top-right of this post.
Given this emergent community structure, the next step is to begin to consider all of the consequences of this structure:
- What market strategies among firms in the same community are causing these distinct network patterns?
- Why are there no connections between firms in the grey and mustard yellow groups, yet it appears everyone has co-invested with someone in the purple group? What makes these firms so influential?
- Looking ahead, can we say something about the firms in these influential communities that do not yet have a high authority score? Which among them could emerge as the next most successful VC firm?
Unfortunately, these are not questions I am qualified to answer, but I am very curious if anyone reading this has a more informed opinion. I welcome your thoughts in the comments.
Photo: Circular hierarchical network visualization generated using Network Workbench
Great work. I like your idea of using a lower-bound to restrict the complexity of the network. I have encountered similar problems in my work, but any cutoff seems arbitrary. Is there some accepted way to deal with the complexity problem?
The Crunchbase data is IT-focused, so the VC firms and entrepreneurial firms included are a distinct subset of all financing events. Next, these two papers are the first comprehensive look at the networks of VC firms:
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=923824
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1294148
Both look at the consequences of a VC firm’s (or fund’s) network status in terms of performance and competition. Reverse causality is a huge problem when studying the relationship between networks and outcomes: do outcomes create networks or the other way around?
I am using the VentureOne database to study the network of VC partners and firms in the context of the creation of new VC firms and social cooperation. The best part about the data is its panel structure: one can compute the network each quarter for 20 years. Unfortunately, I can’t find a lot of research on network evolution. Any pointers?
Drew Conway Reply:
November 22nd, 2009 at 6:43 pm
Thanks for the references, I look forward to reading them (and happy to see other NYU people studying networks).
To your question about what is an appropriate lower-bound for dichotomization, a large part of that is going to depend on the substance of the tie weighting and your own analytical goals. In this case, I simply created a histogram of tie weights and saw that I could effectively uncover the core of the network by removing all co-investment ties with a weight below 3.
In terms of network evolution, there is not a lot of work done, especially from a panel perspective. Carter Butts stands out among network researchers for his work in dynamic and evolving networks, so I would suggest starting with his work and going from there.
http://erzuli.ss.uci.edu/~buttsc/
[...] Venture Capital firms often invest with each other multiple times, developing relationships and cliques. Analyzing those relationships and the network they form is a valuable exercise. I supplied CrunchBase data to Drew Conway, a PhD student in Political Science, at NYU. He blogs at Zero Intelligence Agents. Drew has made a first post about the data, which you can find here. [...]