Uncovering Dependence Clusters and Linchpin Functions

David Binkley, Árpád Beszédes, Syed Islam, Judit Jász and Béla Vancsics 
Dependence clusters are (maximal) collections of mutually dependent source code entities according to some dependence relation. Their presence in software complicates many maintenance activities including testing, refactoring, and feature extraction. Despite several studies finding them common in production code, their formation, identification, and overall structure are not well understood, partly because of challenges in approximating true dependences between program entities. Previous research has considered two approximate dependence relations: a fine-grained statement-level relation using control and data dependences from a program’s System Dependence Graph and a coarser relation based on function-level control-flow reachability. In principal, the first is more expensive and more precise than the second.

Using a collection of twenty programs, we present an empirical investigation of the clusters identified by these two approaches. In support of the analysis, we consider a hybrid cluster type that works at the coarser function-level but is based on the higher-precision statement-level dependences. The three types of clusters are compared based on their slice sets using two clustering metrics. We also perform extensive analysis of the programs to identify linchpin functions – functions primarily responsible for holding a cluster together. Results include evidence that the less expensive, coarser approaches can often be used as effective proxies for the more expensive, finer-grained approaches. Finally, the linchpin analysis shows that linchpin functions can be effectively and automatically identified.