1 Introduction
Studying the networks of online software communities is fascinating. CPAN Explorer is a typical project that analyzes the relationships within the CPAN community [1]. The CRAN package dependency network is another excellent source for this type of research. A state-of-the-art visualization is usually required to understand such a network [2].
A common problem with conventional hairball-style network visualization is that the graph becomes uninterpretable for very large networks [3]. Researchers have developed techniques such as hierarchical edge bundles [4] to tackle this problem. However, these techniques are too idealized for real-world visualization problems: by emphasizing the strong connections in the network, they can obscure the weaker connections and key details. Conventional visualization methods have thus kept us from taking a further step: revealing more of the hidden information about the network's internal structure (vertices, connectivity, etc.).
2 Hive Plots
Martin Krzywinski, author of the circular genome visualization tool Circos, proposed hive plots in 2010 [5]. The most significant difference between hive plots and traditional layouts is that their graphic design is based on the network's meaningful properties (vertex degree, connectivity, centrality, etc.) instead of aesthetics. This design makes the graph interpretable and thus simplifies the presentation of relational data.
3 The Visualization
We selected 27 representative packages and visualized three of them in each hive plot to make a 3x3 hive panel. Each panel represents a specific research field. Each node of the network is placed on the axes according to its degree: the green axis represents out-degree, the orange axis represents in-degree, and the purple axis combines in- and out-degree. On each axis, outer nodes have higher degrees. The white connections in the background show the overall connectivity of the network: nodes with higher out-degrees are heavily depended on by nodes across the whole range of degrees, and the brighter parts of the arcs tend to indicate potential cluster patterns.
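The degree-based placement described above can be sketched in a few lines of Python. This is only an illustration, not the method used by the linnet tool: the function names `axis_positions` and `to_xy`, the axis angles, and the inner and outer radii are all assumptions made for the example, and the direction convention of the arcs is left to the caller.

```python
import math
from collections import Counter

# Assumed layout: three axes, 120 degrees apart (not taken from linnet).
AXIS_ANGLES = {"out": 90.0, "in": 210.0, "all": 330.0}


def axis_positions(arcs):
    """Map each node to a position in [0, 1] on the three degree axes.

    `arcs` is an iterable of (source, target) pairs. A position of 1.0
    means the node has the highest degree on that axis (outermost).
    """
    out_deg, in_deg, nodes = Counter(), Counter(), set()
    for src, dst in arcs:
        out_deg[src] += 1
        in_deg[dst] += 1
        nodes.update((src, dst))
    total = Counter({n: in_deg[n] + out_deg[n] for n in nodes})

    def normalize(deg):
        # Divide by the largest degree; fall back to 1 to avoid dividing by 0.
        top = max((deg[n] for n in nodes), default=0) or 1
        return {n: deg[n] / top for n in nodes}

    return {
        "out": normalize(out_deg),
        "in": normalize(in_deg),
        "all": normalize(total),
    }


def to_xy(axis, position, inner=0.1, outer=1.0):
    """Convert an axis name and a [0, 1] position to Cartesian coordinates."""
    radius = inner + position * (outer - inner)
    theta = math.radians(AXIS_ANGLES[axis])
    return radius * math.cos(theta), radius * math.sin(theta)


# Toy network with three nodes, for illustration only.
arcs = [("a", "b"), ("a", "c"), ("b", "c")]
pos = axis_positions(arcs)
x, y = to_xy("out", pos["out"]["a"])
```

Normalizing by the maximum degree keeps every axis the same length; a rank-based or logarithmic scale would be an equally reasonable choice when a few nodes dominate the degree distribution.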

Meanwhile, in each panel we highlight three packages of interest from one research field, in three different colors, to reveal their specific connection patterns. In the first panel, the green connections represent the lattice package. It is a fundamental package for graphics in R and is heavily depended on by packages of all degrees. The purple connections represent the rgl package. It has few dependencies of its own, but many more packages depend on it, and these are distributed more discretely along the orange axis than those of lattice. The orange lines represent the gplots package, which contains various miscellaneous tools for plotting. Its dependency pattern indicates a role different from the previous two: it is more of a handy toolset for plotting than a core package. The upper right panel shows three data import/export packages: DBI, RODBC and RSQLite. Remarkably, although they play different roles in the community, their dependency patterns are almost the same, apart from small differences in their degrees. The central panel, which highlights the finance-related packages fBasics, fOptions, and fGarch, reveals similar features.
Hive plots are much more informative and comprehensive than conventional hairball-style visualizations, especially for large networks. You can discover many more interesting patterns in the other panels yourself with this visualization.
The selected packages (ordered by panel 11, 12, 13, 21, 22 …) are:
- Graphics: lattice / rgl / gplots (Green / Purple / Orange)
- Programming: tools / rJava / Rcpp
- Data Import/Export: DBI / RODBC / RSQLite
- GUI Dev Tools & Framework: tcltk / gWidgets / Rcmdr
- Finance: fBasics / fOptions / fGarch
- Machine Learning: e1071 / rpart / randomForest
- Regression Analysis: car / leaps / quantreg
- Spatial and Geo Statistics: sp / maps / fields
- Time Series Analysis: forecast / timeDate / tseries
4 Details
Creating this visualization is simple and highly reproducible for anyone with a little knowledge of social network analysis (SNA) [6]:
- The original data was retrieved from
http://cran.r-project.org/bin/windows/contrib/2.13/PACKAGES
on September 14, 2011. We extracted only the 'Depends' field of each package. After parsing and a bit of cleaning, we constructed a network consisting of 2,500 vertices and 5,900 arcs.
- To shrink the network, perform k-core analysis and extract the 4-6 cores partition to form a new, denser network with less noise. The network is now reduced to about 600 vertices and 2,500 arcs.
- Draw the reduced network, permuted by degree information, with Martin Krzywinski's linnet tool. Each single plot conveys a package's degree and dependency distribution properties. Combine the 9 separate hive plots to form a complete hive panel.
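The parsing and k-core steps above can be approximated with the Python standard library alone. The sketch below is not the original workflow, which used Pajek-style SNA tooling [6]; the function names `parse_depends`, `core_numbers` and `extract_cores` are hypothetical, the sample text is invented, and the core numbers are computed on total degree over the undirected simple graph, which is an assumption about how the partition was obtained.

```python
import re
from collections import defaultdict


def parse_depends(text):
    """Return (package, dependency) pairs from the text of a PACKAGES file."""
    arcs = []
    for record in re.split(r"\n\s*\n", text.strip()):
        # Continuation lines start with whitespace; fold them into one line.
        unfolded = re.sub(r"\n[ \t]+", " ", record)
        fields = dict(
            line.split(":", 1) for line in unfolded.splitlines() if ":" in line
        )
        pkg = fields.get("Package", "").strip()
        for item in fields.get("Depends", "").split(","):
            dep = re.sub(r"\(.*?\)", "", item).strip()  # drop version constraints
            if pkg and dep and dep != "R":  # skip the dependency on R itself
                arcs.append((pkg, dep))
    return arcs


def core_numbers(arcs):
    """Core number of every node (assumption: undirected, total degree)."""
    neighbors = defaultdict(set)
    for a, b in arcs:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {n: len(adj) for n, adj in neighbors.items()}
    core, remaining, k = {}, set(degree), 0
    while remaining:
        # Repeatedly peel off the node with the smallest remaining degree.
        node = min(remaining, key=degree.get)
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nb in neighbors[node]:
            if nb in remaining:
                degree[nb] -= 1
    return core


def extract_cores(arcs, low, high):
    """Keep only arcs whose endpoints both have core number in [low, high]."""
    core = core_numbers(arcs)
    keep = {n for n, k in core.items() if low <= k <= high}
    return [(a, b) for a, b in arcs if a in keep and b in keep]


# Invented sample in the PACKAGES format, for illustration only.
sample = """Package: A
Version: 1.0
Depends: R (>= 2.10), B,
        C (>= 0.1)

Package: B
Depends: C

Package: D
Depends: A
"""
sample_arcs = parse_depends(sample)
sample_cores = core_numbers(sample_arcs)
dense_arcs = extract_cores(sample_arcs, 2, 2)
```

For the network described here, the call would be `extract_cores(arcs, 4, 6)`. The peeling loop is quadratic in the number of vertices, which is acceptable for a few thousand packages; a bucket-based implementation would be preferable for much larger graphs.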
References
[1] Julian Bilcke. CPAN Explorer - An Interactive Exploration of the Perl Ecosystem. http://cpan-explorer.org/, 2009.
[2] Xiao Nan. R2S - PKU Vis Summer School. http://www.road2stat.com/cn/statistics/pku_vis_summer_school.html, 2010.
[3] Koon-Kiu Yan, Gang Fang, Nitin Bhardwaj, Roger P. Alexander, Mark Gerstein. Comparing Genomes to Computer Operating Systems in Terms of the Topology and Evolution of their Regulatory Control Networks. Proceedings of the National Academy of Sciences, 107 (20): 9186 - 9191, 2010.
[4] Danny Holten. Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data. IEEE Transactions on Visualization and Computer Graphics (TVCG; Proceedings of Vis/InfoVis 2006), Vol. 12, No. 5, 741 - 748, 2006.
[5] Martin Krzywinski. Hive Plots - Linear Layout for Network Visualization - Visually Interpreting Network Structure and Content Made Possible. http://www.hiveplot.com/, 2010.
[6] Wouter de Nooy, Andrej Mrvar, Vladimir Batagelj. Exploratory Social Network Analysis with Pajek. Cambridge University Press, 2005.
[7] J.R. Heard. World Economic Forum Hive Plot. http://www.visualizing.org/visualizations/world-economic-forum-hive-plot/, 2010.