Graph analysis is becoming a key component of data science. A lot of things can be modeled as graphs, and social networks are one of the most obvious examples.
In this post, I am going to show how you can visualize your own LinkedIn graph, using the LinkedIn API and Gephi, a very nice piece of software for working with this type of data. If you don't have it yet, just go to http://gephi.org/ and download it now!
My objective is simply to look at my connections (the "nodes" or "vertices" of the graph), see how they relate to each other (the "edges") and find clusters of strongly connected users ("communities"). This more or less emulates what is already available in the InMaps data product, but, hey, it's cool to do it ourselves, no?
The first step in this graph analysis is being able to query LinkedIn via its API; you really don't want to collect the data by hand. The API uses the OAuth authentication protocol, which lets an application make queries on behalf of a user. So go to https://www.linkedin.com/secure/developer and register a new application. Fill in the form as required and, in the OAuth part, use this redirect URL for instance:
Once done, go to your applications page and click on the one you just created. You'll find your application key and secret, which we'll need later:
We can now start writing a Python script to get an access token. Following the good documentation on LinkedIn's website, just wrap the OAuth dance inside a single object:
Note that you'll need to specify your own application key and secret in the script (the "consumer" key and secret).
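To give an idea of what the key and secret are actually used for during that dance, here is a minimal sketch of how a request could be signed. It assumes the OAuth 1.0a flavor with HMAC-SHA1 signatures, which the "consumer", "verifier" and "token" vocabulary suggests but which is my assumption. The names `percent_encode` and `sign_request` are made up for this illustration, and in practice an OAuth library does all of this for you.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    # OAuth 1.0a uses RFC 3986 encoding: only unreserved characters stay as-is.
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret=""):
    """Return a base64 HMAC-SHA1 signature for an OAuth 1.0a request (sketch)."""
    # 1. Encode every parameter, sort them, and join them like a query string.
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in encoded)

    # 2. The signature base string is METHOD & encoded URL & encoded parameters.
    base_string = "&".join(
        [method.upper(), percent_encode(url), percent_encode(normalized)]
    )

    # 3. The signing key is the consumer secret and the token secret joined by "&".
    #    The token secret is empty until a request token has been obtained.
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"

    digest = hmac.new(key.encode("utf-8"), base_string.encode("utf-8"), hashlib.sha1)
    return base64.b64encode(digest.digest()).decode("ascii")


# Placeholder endpoint and values, for illustration only.
signature = sign_request(
    "GET",
    "https://api.example.com/request_token",
    {"oauth_consumer_key": "my-key", "oauth_nonce": "abc", "oauth_timestamp": "1"},
    consumer_secret="my-secret",
)
```

The consumer secret never travels over the wire: only the resulting signature does, which is why the script needs both the key and the secret.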
Now open a Terminal window (I am using Mac OS X) and start a simple HTTP server that will be used for the OAuth redirection:
Open a new Terminal tab and run the Python script:
The script will ask you to copy/paste a URL into your browser:
Just paste it as required, authorize the app and it will return the verifier code:
Note that it used the callback URL that we provided during the registration process. Just copy the OAuth verifier and go back to your terminal. Type "y" and paste the verifier code:
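As an aside, the verifier can also be pulled out of the redirect URL programmatically instead of being copied by hand. This is only a sketch: the function name `extract_verifier` and the example URL (host, port and token values) are placeholders of mine, and it assumes the verifier arrives in a query parameter named `oauth_verifier`, as OAuth 1.0a callbacks normally do.

```python
from urllib.parse import parse_qs, urlparse


def extract_verifier(callback_url):
    """Return the oauth_verifier query parameter from a redirect URL."""
    query = parse_qs(urlparse(callback_url).query)
    values = query.get("oauth_verifier")
    if not values:
        raise ValueError("no oauth_verifier parameter in callback URL")
    return values[0]


# Placeholder URL, for illustration only.
verifier = extract_verifier(
    "http://localhost:8080/?oauth_token=abc&oauth_verifier=12345"
)
print(verifier)
```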
At this stage, we are ready to query LinkedIn using these tokens.
OK, so now for a fun part: getting your first-degree connections and how they relate to each other. After struggling a bit to find a way to do that, I finally understood that the Search API does what we want:
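Whatever the exact API call looks like, the loop around it boils down to something like the sketch below. Note that `fetch_shared` is a hypothetical stand-in for the real request: I assume it takes one of your connections and returns the names of the people that connection is linked to. It is not a LinkedIn function, and the fake data is only there to make the example runnable.

```python
def collect_edges(connections, fetch_shared):
    """Build (person, other) pairs restricted to your own first-degree connections."""
    known = set(connections)
    edges = []
    for person in connections:
        for other in fetch_shared(person):
            # Keep only links between two of your own connections,
            # and skip anyone linked to themselves.
            if other in known and other != person:
                edges.append((person, other))
    return edges


# Fake data standing in for API responses, for illustration only.
fake_network = {
    "Alice": ["Bob", "Zoe"],   # Zoe is not one of my connections
    "Bob": ["Alice", "Carol"],
    "Carol": ["Bob"],
}
edges = collect_edges(list(fake_network), lambda name: fake_network[name])
print(edges)
```

At this point every relationship shows up twice, once from each side, which is why a deduplication step is needed before loading the data into Gephi.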
Note that, once again, you'll need to fill in your own credentials: the application key and secret (the "consumer" part) and the OAuth tokens that we just got.
Last step before we can use Gephi: a bit of data cleansing, mostly simplifying the node labels (your connections' names) and deduplicating the edges:
Quite straightforward... This script outputs a CSV file ready to be used in Gephi. It consists of a list of edges, each one a pair of connected user names separated by a comma. This is the bare minimum for graph analysis, with no attributes on nodes or edges, but it's a good start.
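For reference, the cleansing step can be sketched roughly as follows. This is a minimal version under my own assumptions: edges come in as pairs of names, the graph is treated as undirected (a LinkedIn connection is mutual), and commas are stripped from names so that the CSV stays trivial to read. The helper names `clean_label`, `dedupe_edges` and `edges_to_csv` are made up for the example.

```python
import csv
import io


def clean_label(name):
    # Drop commas and collapse runs of whitespace into single spaces.
    return " ".join(name.replace(",", " ").split())


def dedupe_edges(edges):
    """Return unique undirected edges, in order of first appearance."""
    seen = set()
    unique = []
    for a, b in edges:
        a, b = clean_label(a), clean_label(b)
        if a == b:
            continue  # ignore self-loops
        key = tuple(sorted((a, b)))  # (A, B) and (B, A) are the same edge
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def edges_to_csv(edges):
    """Serialize edges as one 'source,target' pair per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(edges)
    return buffer.getvalue()


raw = [("Alice  Martin", "Bob Lee"), ("Bob Lee", "Alice Martin"), ("Bob Lee", "Bob Lee")]
print(edges_to_csv(dedupe_edges(raw)))
```

Sorting each pair before checking it against `seen` is what makes the two directions of the same relationship collapse into a single edge.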
So now we are ready for the really fun part: visualization! See the next post :)