Force-Directed Graphs: Playing around with D3.js

Update: Newer example of Force-Directed d3.js Graph here: Measure and Visualize Semantic Similarity Between Subgraphs

I recently replaced python-graph in my code with NetworkX, a slightly more sophisticated graph library for Python. Besides some more advanced algorithms for graph analysis (comparison, union, etc.), which can prove useful when analyzing data (comparing human data with mine, for example), I can also easily export my graphs to all kinds of formats, JSON for example. As I was getting a bit tired of GraphViz’s stubborn methods and its far-from-dynamic approach, I decided to start playing around with the excellent Data-Driven Documents JavaScript library, better known as D3.js, the successor to Protovis. Actually, I had planned this quite a while ago, simply because I was impressed with the Force-Directed Graph example on their website. I figured that, for coolness’ sake, I should implement one instead of using the crummy GraphViz graphs.
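To give an idea of what “exporting to JSON” means for a force-directed layout: the D3 force example reads an object with a `nodes` array and a `links` array, where each link points at its endpoints by their position in `nodes`. NetworkX ships its own JSON exporters; the sketch below deliberately avoids them and uses only the standard library, just to show the shape of the data. The function name `to_node_link`, the `relation` field and the toy edges are my own assumptions for illustration, not the code from the project.

```python
import json


def to_node_link(edges):
    """Convert (source, target, relation) triples to D3-style node-link data."""
    index = {}              # node name -> position in the "nodes" array
    nodes, links = [], []
    for source, target, relation in edges:
        for name in (source, target):
            if name not in index:
                index[name] = len(nodes)
                nodes.append({"name": name})
        links.append({
            "source": index[source],
            "target": index[target],
            "relation": relation,   # hypothetical extra field for hover text
        })
    return {"nodes": nodes, "links": links}


# Hypothetical toy edges, not data from the actual ontology.
edges = [
    ("dementia", "disease", "subClassOf"),
    ("alzheimer", "dementia", "subClassOf"),
]
data = to_node_link(edges)
print(json.dumps(data, indent=2))
```

Writing the result of `json.dumps` to a file gives something a D3 script can load; any extra keys on nodes or links simply travel along as metadata.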

So after a night and day of tinkering with the D3 code (starting from the Graph example included in the release, modifying stuff as I went) I came to this:

Click to play!

The red nodes are the concepts taken from the texts (either literal: filled red circles, or resulting from text classification: red donuts). The orange nodes are LCS-nodes (Lowest Common Subsumers), aka ‘parent’ nodes, and all the grey ones are simply in-between nodes (either for shortest paths between nodes, or parent nodes).
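For readers unfamiliar with the term, a Lowest Common Subsumer is the most specific concept that is an ancestor of (or equal to) both concepts being compared. Here is a minimal sketch of the idea, assuming a simplified taxonomy in which every concept has at most one parent; real ontologies can have several parents per concept, so this is an illustration rather than the way my script does it. The names `ancestors`, `lowest_common_subsumer` and the toy taxonomy are made up for the example.

```python
def ancestors(node, parent):
    """Return node followed by its ancestors, ordered from node up to the root."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def lowest_common_subsumer(a, b, parent):
    """Return the most specific concept subsuming both a and b, or None."""
    seen = set(ancestors(a, parent))
    # Walk upwards from b; the first concept also above a is the LCS.
    for candidate in ancestors(b, parent):
        if candidate in seen:
            return candidate
    return None


# Hypothetical toy taxonomy (child -> parent), assumed single inheritance.
parent = {
    "alzheimer": "dementia",
    "dementia": "brain disease",
    "stroke": "brain disease",
    "brain disease": "disease",
}
lcs = lowest_common_subsumer("alzheimer", "stroke", parent)
print(lcs)
```

In the graph above, a node found this way would be drawn in orange, and the concepts passed on the way up to it in grey.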

I added the labels and implemented zooming and panning (mouse wheel to zoom, click and drag to pan), and included some metadata (hover over a node to see its URI, or over an edge to see the relation). I am really impressed with the flexibility of D3: I can now load up any graph produced by my script and instantly see results.

The bigger plan is to make a fully interactive Graph, by starting with the ‘semantic similarity’ graph (where only the red nodes are displayed), and where clicking on edges expands the graph, by showing the relationship between two connected nodes. Semantic expansion at the click of a mouse ;)!

In other news

I’ve got a date for my graduation! If everything goes right, March 23rd is the day I’ll present my finished project. I’ll let you know once it’s final.

Python graphs and visualizations

To my right is a visualization of the output of my SPARQL-powered shortest path algorithm, finding a link between ‘intracellular and extracellular accumulation’ and ‘developmental and adult structural defect’, two concepts in the Mouse Pathology ontology. Click it! It shows the two ‘source’ concepts in white, and the shortest path (of 3 nodes: 4 hops) in grey. It looks like this in Python:

[[u’′, u’’, u’′], [u’′, u’’, u’′], [u’′, u’’, u’′], [‘′, u’’, u’′]]

Which is clearly less awesome.

My algorithm generates a directed subgraph of all the nodes and edges it encounters during its search for a path between two ontology concepts. I figured generating this subgraph would make it easier to get some of the variables I need for the semantic similarity calculation (such as the number of changes in direction along the path, a node’s depth from the root, etc.). Furthermore, I can use the subgraph to more easily assign weights to the textual data surrounding the nodes when assembling my bag-of-words model of a node’s direct context, as I’ve explained in my previous post.
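To make the idea concrete, here is a rough in-memory sketch. My actual algorithm queries the ontology through SPARQL; the code below swaps that for a plain breadth-first search over a small edge list, so treat it as a stand-in under that assumption. It walks edges in either direction, remembers which way each edge was crossed, collects every directed edge it touches (the ‘subgraph’), and then counts the changes in direction along the path found. All names (`find_path`, `direction_changes`) and the toy edges are hypothetical.

```python
from collections import deque


def find_path(edges, start, goal):
    """Breadth-first search that crosses edges both ways but records direction.

    Returns (steps, explored): steps is a list of (from, direction, to)
    tuples, or None if no path exists; explored is the set of directed
    edges touched during the search.
    """
    neighbours = {}
    for src, dst in edges:
        neighbours.setdefault(src, []).append((dst, "forward", (src, dst)))
        neighbours.setdefault(dst, []).append((src, "backward", (src, dst)))

    came_from = {start: None}   # node -> (previous node, direction taken)
    explored = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt, direction, edge in neighbours.get(node, []):
            explored.add(edge)
            if nxt not in came_from:
                came_from[nxt] = (node, direction)
                queue.append(nxt)
    if goal not in came_from:
        return None, explored

    # Rebuild the path by walking back from the goal to the start.
    steps, node = [], goal
    while came_from[node] is not None:
        prev, direction = came_from[node]
        steps.append((prev, direction, node))
        node = prev
    steps.reverse()
    return steps, explored


def direction_changes(steps):
    """Count how often consecutive steps differ in direction."""
    directions = [direction for _, direction, _ in steps]
    return sum(1 for a, b in zip(directions, directions[1:]) if a != b)


# Hypothetical toy edges (child -> parent), not real ontology data.
edges = [
    ("alzheimer", "dementia"),
    ("dementia", "brain disease"),
    ("stroke", "brain disease"),
    ("brain disease", "disease"),
]
steps, explored = find_path(edges, "alzheimer", "stroke")
print(steps)
print(direction_changes(steps), len(explored))
```

In this toy case the path climbs two edges and then descends one, so there is a single change in direction, and the explored set contains more edges than the path itself.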

What to use

There are heaps of libraries for managing graphs in Python, and loads of programs to visualize and manipulate graphs. Here’s my stuff of choice.

After looking at several Python graph libraries (NetworkX, igraph, graph-tool, etc.) I chose python-graph, which in my opinion is the most straightforward, compact, simple yet powerful graph library for Python. I use python-graph to generate a file describing the subgraph in the DOT language (a “plain text graph description language”). This file can then be imported into a wide array of programs to visualize and manipulate the graph.
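For those who have never seen DOT: it really is just text, with one line per edge. The sketch below writes such a description by hand using only the standard library, purely to show what the format looks like; it is not python-graph’s exporter. The helper names `quote` and `to_dot`, the graph name `G` and the toy edges are assumptions for the example.

```python
def quote(text):
    """Wrap text in double quotes, escaping embedded quotes for DOT."""
    return '"' + text.replace('"', '\\"') + '"'


def to_dot(edges, name="G"):
    """Render (source, target, label) triples as a DOT digraph string."""
    lines = ["digraph %s {" % name]
    for src, dst, label in edges:
        lines.append("    %s -> %s [label=%s];"
                     % (quote(src), quote(dst), quote(label)))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical toy edges, not real ontology data.
edges = [
    ("alzheimer", "dementia", "subClassOf"),
    ("dementia", "disease", "subClassOf"),
]
dot = to_dot(edges)
print(dot)
```

Saving that string to a `.dot` file gives something that Graphviz and Gephi can open.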

Visualizing the subgraph containing the shortest path between two nodes would allow me to get a better picture of what my algorithm fetches, and also to double-check the results (seeing is believing ;)). To visualize graphs, there are plenty of options again. After sifting through Tulip, Graphviz and some other obscure programs I stumbled upon Gephi, a very complete, pretty and simple open-source graph visualization & manipulation program. It has extensive documentation, and several advanced features to manipulate the graph and fetch some values. Ideally though, I will manage all those ‘advanced value-fetching tasks’ in my python script. Gephi still provides a nice and quick way to double-check some of the output and get a more concrete idea about what’s happening, as things can get pretty complex, pretty fast:

1136 nodes of the DOID-ontology, subgraph produced when finding a link for DOID_4 (disease) and DOID_10652 (Alzheimer's Disease) - red nodes in the graph. Linked by the three yellow nodes.