In the previous post, we learnt how to get data out of LinkedIn via its API. This task is quite technical, but it is an integral component of every data science project: accessing and manipulating data from a wide variety of data sources. In this post, we're going to do the fun part of our graph analysis project: building the visualization of the LinkedIn connections graph.
Just start Gephi, and under File :: Open select the CSV we created during the previous step. This CSV is just a list of edges, and Gephi will open it without any problem:
Note that the graph is set as "Undirected", because LinkedIn relations work in both ways. My own graph is made of 449 connections (nodes) and 2828 relations (edges).
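As a sanity check on the node and edge counts Gephi reports, the same numbers can be recomputed outside the tool. The snippet below is only a minimal sketch: it assumes a headerless, comma-separated file with one "source,target" pair per row (the real export from the previous post may differ), and the sample data and the load_undirected_edges helper are hypothetical names used for illustration.

```python
import csv
import io

# Hypothetical sample standing in for the CSV built in the previous post.
# Assumed format: one edge per row, "source,target", no header line.
SAMPLE_CSV = """Alice,Bob
Bob,Carol
Carol,Alice
Bob,Alice
Dave,Alice
"""

def load_undirected_edges(handle):
    """Return (nodes, edges) from an edge-list CSV, treating edges as undirected."""
    nodes = set()
    edges = set()
    for row in csv.reader(handle):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        a, b = row[0].strip(), row[1].strip()
        if not a or not b or a == b:
            continue  # ignore empty names and self-loops
        nodes.update((a, b))
        # frozenset makes A-B and B-A the same edge, as in an undirected graph
        edges.add(frozenset((a, b)))
    return nodes, edges

nodes, edges = load_undirected_edges(io.StringIO(SAMPLE_CSV))
print(len(nodes), "nodes,", len(edges), "edges")
```

To run it on a real file, replace the io.StringIO(...) argument with an opened file handle, for example open("connections.csv", newline="") where the file name is again just a placeholder.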
Click OK and a very raw graph pops up:
This graph is hard to understand and not very helpful, so the first thing to do is to use a spatialization (layout) method to put it into a friendlier shape. Several methods are available, either built-in or via plugins, and you'll have to select one based on the size of the graph or what you want to achieve. Here, a classical Force Atlas algorithm, slightly expanded, leads to this layout:
Note that this nice formatting is available through the Preview panel, where you can select between different styles.
At this stage, the layout algorithm already provides insight about the potential communities within my graph. It automatically groups more densely connected nodes together, while pushing the groups away from each other. It is already obvious that at least 2 large groups of connections are present.
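To see why a force-directed layout pulls densely connected nodes together, here is a toy sketch of the general idea. It is not Gephi's Force Atlas implementation: it is a simplified Fruchterman-Reingold-style loop written for illustration, and the function name, parameters (iterations, k, seed) and the small test graph are all assumptions.

```python
import itertools
import math
import random

def force_layout(nodes, edges, iterations=200, k=1.0, seed=0):
    """Toy force-directed layout: every pair repels, linked pairs attract."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, stronger at short distance
        for a, b in itertools.combinations(nodes, 2):
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            push = k * k / dist
            disp[a][0] += dx / dist * push
            disp[a][1] += dy / dist * push
            disp[b][0] -= dx / dist * push
            disp[b][1] -= dy / dist * push
        # Attraction along edges, stronger at long distance
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull
        # Cap each move by a "temperature" that cools down over time
        temp = 0.1 * (1.0 - step / iterations)
        for n in nodes:
            length = math.hypot(*disp[n]) or 1e-9
            scale = min(length, temp) / length
            pos[n][0] += disp[n][0] * scale
            pos[n][1] += disp[n][1] * scale
    return pos

# Hypothetical test graph: two triangles joined by a single bridge edge (2-3)
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
pos = force_layout(nodes, edges)

def distance(a, b):
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])

linked = {frozenset(e) for e in edges}
pairs = list(itertools.combinations(nodes, 2))
edge_lengths = [distance(a, b) for a, b in pairs if frozenset((a, b)) in linked]
other_lengths = [distance(a, b) for a, b in pairs if frozenset((a, b)) not in linked]
mean_edge = sum(edge_lengths) / len(edge_lengths)
mean_non_edge = sum(other_lengths) / len(other_lengths)
print("mean distance between linked nodes:  ", round(mean_edge, 3))
print("mean distance between unlinked nodes:", round(mean_non_edge, 3))
```

Linked nodes should end up closer together on average than unlinked ones, which is the effect that makes communities visible in the layout. The pairwise repulsion loop is quadratic in the number of nodes, so this sketch only makes sense for very small graphs.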
Now, if we want to go further, we can use one of the statistics provided in Gephi: modularity. It is a method for detecting communities or clusters of nodes within a graph. It will group together nodes that are more strongly connected than they would have been in a random graph. The modularity is available in the Statistics panel of Gephi. Setting the resolution parameter (a small resolution will detect smaller communities) a little bit lower than the default value, the algorithm detects 8 communities in my network, which actually makes sense. Using the "Partition" panel, we can assign a color to each node depending on its modularity class. The result is this graph:
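To make the "more strongly connected than in a random graph" idea concrete, the sketch below computes the standard (Newman) modularity score Q of a given partition: for each community, the fraction of edges falling inside it minus the fraction expected if edges were placed at random with the same node degrees. This is only an illustration under assumptions: it scores a partition rather than searching for one as Gephi does, it does not model Gephi's resolution parameter, and the function name and toy graph are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # edges with both ends in the same community
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1
    total_degree = defaultdict(int)  # sum of node degrees per community
    for node, d in degree.items():
        total_degree[community_of[node]] += d
    # Observed share of internal edges minus the share expected at random
    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )

# Hypothetical toy graph: two triangles joined by a single bridge edge (2-3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
single = {n: "A" for n in range(6)}

q_split = modularity(edges, split)
q_single = modularity(edges, single)
print(round(q_split, 3))   # 0.357
print(round(q_single, 3))  # 0.0
```

Splitting the graph along the bridge scores higher than lumping everything into one community, which is the kind of partition a modularity-based method favors.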
That's interesting. It's quite easy to put a label on several groups. Using the exploration features in Gephi, I can browse the actual people behind each node and reconstruct my professional network and experiences:
The clustering works very well and provides accurate results. I could have used a lower resolution to split the communities further (mostly the red block), but as is, it is already useful.
Concluding note:
Graphs are becoming prevalent in many sectors. Great tools like Gephi make it very simple to deal with small-scale graphs, like a user's LinkedIn network (once you have the data, which is as usual the hardest part of the job). Exploring communities is a powerful tool for applications with a social / interaction component. It is used to adapt products or services to different types of communities, or for "tribal" marketing, when you don't want to act on one user only but on the group they belong to. Of course, several other applications of graph analysis (like influencer / central node identification, recommendation systems...) can be developed!
Working with graphs at scale (millions of nodes) is a different deal. Other technologies and methods must be used, but the applications can be very valuable. Dataiku has already successfully used graph analysis to support our clients' businesses at large scale. Drop us a note if you want to know more!