Modeling Class Cohesion as Mixtures of Latent Topics

Abstract

The paper proposes a new measure for the cohesion of classes in object-oriented software systems. It is based on the analysis of latent topics embedded in comments and identifiers in source code. The measure, named Maximal Weighted Entropy, uses the Latent Dirichlet Allocation technique and information entropy measures to quantitatively evaluate the cohesion of classes in software. This paper presents the principles and the technology behind the proposed measure. Two case studies on a large open source software system are presented. They compare the new measure with an extensive set of existing metrics and use these metrics to construct models that predict software faults. The case studies indicate that the novel measure captures aspects of class cohesion that the existing cohesion measures do not, and that it improves fault prediction for most existing metrics when they are combined with Maximal Weighted Entropy.
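To make the idea of combining topic models with entropy more concrete, here is a minimal, illustrative sketch. It is not the formula from the paper. It assumes each method of a class already has a topic distribution (for example, inferred by an LDA model from the method's comments and identifiers; here the values are hard-coded and hypothetical). For every topic it computes an "occupancy" (the topic's average weight across methods) and a "distribution" (the Shannon entropy of how that weight is spread over the methods, normalized by log n). It then returns the maximum of their product. The function name `topic_cohesion_sketch` and these exact definitions are assumptions made for illustration only.

```python
import math


def topic_cohesion_sketch(theta):
    """Illustrative entropy-weighted cohesion score for one class (hypothetical).

    theta[j][i] is the assumed probability of topic i in method j, e.g. as
    inferred by an LDA model; each row should sum to 1. This is a sketch of
    the general idea, not the exact Maximal Weighted Entropy definition.
    """
    n = len(theta)       # number of methods in the class
    k = len(theta[0])    # number of topics
    best = 0.0
    for i in range(k):
        column = [theta[j][i] for j in range(n)]
        total = sum(column)
        if total == 0:
            continue  # topic i does not appear in this class at all
        # Occupancy: average share of topic i across the class's methods.
        occupancy = total / n
        # Distribution: entropy of how topic i's weight is spread over the
        # methods, normalized to [0, 1] by log(n) (1.0 for a one-method class).
        entropy = -sum((p / total) * math.log(p / total) for p in column if p > 0)
        distribution = entropy / math.log(n) if n > 1 else 1.0
        best = max(best, occupancy * distribution)
    return best


# A class whose methods all share one dominant topic vs. one whose
# methods each talk about a different topic.
cohesive = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
scattered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(topic_cohesion_sketch(cohesive))
print(topic_cohesion_sketch(scattered))
```

In this toy example the class whose methods all share a dominant topic scores about 0.9. The class whose methods each cover a disjoint topic scores 0.0, because no topic is spread across its methods. Multiplying occupancy by normalized entropy rewards topics that are both prominent and shared by many methods, which is the intuition a topic-based cohesion measure relies on.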

Publication
Proceedings of the 25th IEEE International Conference on Software Maintenance (ICSM 2009), Edmonton, Canada, Pages 233–242

BibTeX

@InProceedings{LPF09,
    author    = {Liu, Yixun and Poshyvanyk, Denys and Ferenc, Rudolf and Gyim{\'o}thy, Tibor and Chrisochoides, Nikos},
    title     = {Modeling Class Cohesion as Mixtures of Latent Topics},
    booktitle = {Proceedings of the 25th IEEE International Conference on Software Maintenance (ICSM 2009)},
    year      = {2009},
    pages     = {233--242},
    address   = {Edmonton, Canada},
    month     = sep,
    publisher = {IEEE Computer Society},
    doi       = {10.1109/ICSM.2009.5306318},
    keywords  = {software fault prediction, class cohesion modeling, object-oriented software system, source code, maximal weighted entropy, latent Dirichlet allocation technique, large open source software system},
    url       = {http://ieeexplore.ieee.org/document/5306318/},
}