People use Twitter in a variety of ways. As an academic in training, and part-time blogger, I find Twitter most helpful in introducing me to other bloggers and students that share my interests—that is—how the social sciences, mathematics and computer science combine to affect national security policy. In order to increase these relationships, one must expand their network to following new people. Twitter, however, is quickly growing into a vast universe of Tweets, and it can be difficult to separate the wheat from the chaff when deciding who to follow. Fortunately, through the combination of Twitter’s open API, powerful Python packages, and the techniques of social network analysis, we can find previously hidden friends already very close to us.
The following is a detailed description of how to preform this analysis and illuminate these hidden networks. Before we begin there are two disclaimers; first, this description assumes a slight familiarity with Python, but none with SNA; and two, this method is only feasible for people who at present have a relatively small network on Twitter. Specifically, if you already follow more than 100 people, the method described here will cause you to exceed Twitter’s API rate. My justification here is that if you already follow more than 100 people, you probably do not need any help finding new friends; if you do, request a Twitter whitelisting and proceed.
To compile our list of new people to follow, we will have to accomplish the following task in order:
- Construct our Twitter egonet at depth two (definition to follow)
- Reduce the size of the network from step 1 to include only those individuals with meaningful structure using k-core analysis
- Finally, locate the individuals that are most densely connected with members of your Twitter network, but are not currently in your network (closing triads).
Construct Our Egonet at Depth Two
In network analysis, an egonet is a subgraph that contains the focal node (ego) and all the nodes that ego directly connects to, as well as any common structure among those nodes. Likewise, an egonet at depth two is the composition of the focal node’s egonet, with the egonets of all the nodes from the original egonet. Put simply, this is a network containing you, your friends, the connections among you and your friends, all of your friends’ friends, and the connections among all of them. To gather this data, we will perform two-round snowball search using the python-twitter binding to receive the data, and NetworkX to store and analyze it.
To do so, we will first load the necessary packages, and begin building the network (H/T to Chris Kennedy for introducing me to GitHub, which provides these pretty Python code blocks):
The snowball_build function is going to do all of the work, we just have to tell it who to inspect, and how many rounds to carry out the snowballl search:
Now we build the actual network data. First, we build the seed’s egonet, then take the nodes that search returns, and build their egonets, compose all that data into a single network, and return it:
With the network data stored in memory, we now to save it as a text file. This is useful for several reason, but primarily so we have a back-up copy of the data for future analysis that will not tax out API rate limit. NetworkX makes this very easy:
Complete Python TwitterEgoBuilder available for download here, or online via GitHub here.
Find Meaningful Structure in the Network Using k-core Analysis
If you successfully gathered your network data from step 1, you may notice how unexpectedly large this network is. In my case, my egonet contained 48 nodes (me, plus the 47 people I follow), but the depth two egonet contains 2,465 nodes and 3,127 edges! Let’s inspect how large your network is. To do so, open your Python IDLE and enter in the following lines in order:
The reason the network has ballooned so quickly is that a majority of the structure are pendants and pendant chains. The graph above is an example of a single long pendant chain. For our purposes, however, this structure is useless. We want to locate the people that will help us close triads, and pendants and pendant chain do not have enough associated structure to do this—so let’s remove them. A k-core analysis will identify those individuals with this structure; specifically, all those individuals with a k-core of 2 or higher will be those we will want to keep, everyone else we can eliminate. NetworkX contains a specific module for k-core analysis, so we will load it and create a subgraph of the 2-core:
Note the reduction in the number of nodes and edges, but the increase in the average degree. This tells us that we have eliminated all of the useless structure, and are left with the densely connected core of your Twitter friends, and the people with whom your friends share the most connections. We now have what we need to find your new friends!
Find Your New Friends
We need to get a list of all of your friends’ friends that you currently do not follow. This will tells us the open triads in your network:
The more often a user shows up in our list F the more of our friends that already follow that person. Therefore, we will want to get a frequency count of the names in this list; the higher a user’s count, the more likely we will want to be friends with that person. That is, the more of your friends that follow a person, the more likely it is that this individual shares one of your interests, and the more likely you will want to close that triad. This is a loose interpretation of social balance theory, where we are only analyzing positive ties. This final step is very straightforward:
How to proceeds is up to you. If you found this exercise useful, you will probably want to add the first few names on the list, but you decide where the cutoff is. I hope you have found this exercise enjoyable, and more importantly, I hope it has helped you find new Twitter friends that extend your network in a useful way. As always, I welcome suggestion, comments and questions by e-mail, in the comments, and @drewconway.
Photo: My depth two Twitter egonet at the time of publishing, rendered in 3D by UbiGraph
Automatically Generated Related posts:




Just as a side note you don’t separate wheat from the shaft but the chaff:)
[Reply]
Hello Drew, very interesting article ! I’ve done a python 3D visualization tool for twitter using Ubigraph, the video is here [http://pyevolve.sourceforge.net/wordpress/?p=203]. I’ll release the code soon, maybe you find it useful.
[Reply]
Perone:
Very cool video. I actually used UbiGraph to do some viz while I was working out this code. Let me know when you release your code so I can check it out.
Luke:
Thanks for the heads up, clearly I have never actually separated wheat from anything…
[Reply]
Just wanted to let you know I’m giving this a try and it complained about the “twitter.api()” call;
api=twitter.Api(username=api_user,password=api_pswd)
However, this seems to work;
api=twitter.Twitter(api_user, api_pswd)
[Reply]
FWIW it looks like the twitter python library has changed (or I got a different one when I did “easy_install twitter”).
[Reply]
Ha guy i am a college student taking up programming and i need some help with python do you know any web sites that can help
[Reply]
Ray,
Check out the articles on “Thinking in Python” to get started, and then once you are comfortable check out “Dive Into Python,” for a more thorough reference. Good luck!
[Reply]
Hi Drew,
I was wondering if the algorithm of a two-step-egonetwork drops a lot of connections:
If I say the ego is 0, the friends of the ego are 1 and the friends of the friends are 2. Now for Mr. 0 you are getting all 1s lets call them [1a,1b,1c...] . Then you start with the first 1, aka 1a. For 1a you are getting all the 2s, lets call them 2a,2b,2c… Then you go to 1b and get all of his twos, 2d,2e,2f…
Now wouldnt it be necessary for all of the collected 2s to go through all of their friends lets call them 3s. For all of those 3s we would have to check if one of those is in the network (it could be a 0, 1 or 2) and connect them to those.
I dont see the step in the algorithm, isnt it necessary to not drop a lot of possible connections that are not found?
[Reply]
Very cool, Drew. (And glad to have met you at WIN). Our NetLab has some similar ideas.
[Reply]