Institute of Informatics
Acta Cybernetica
Volume 15, Number 2, 2001
# Smallsteps: an adaptive distance-based clustering algorithm

**Gy. Koch and József Dombi**

### Abstract (in LaTeX format)

In this article we propose a new distance-based clustering algorithm. Distance-based clustering methods operate on data sets in similarity space, where the similarities or dissimilarities between the objects are given by a matrix. These algorithms have at least $O(n^2)$ time complexity, where $n$ is the number of objects. One of the latest distance-based methods is Chameleon, which, according to experience, works well only on larger data sets and fails on relatively small ones. This is at odds with the fact that the $O(n^2)$ time complexity makes distance-based algorithms unsuitable for huge data sets. We therefore developed a new distance-based method (SmallSteps) that can also handle relatively small numbers of objects. Our solution looks for connected graphs whose edges have a maximum weight computed on the environments of the objects. The method can detect clusters of different shapes, sizes, or densities, automatically determines the number of clusters, and has a special ability to divide clusters into sub-clusters.
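The abstract does not specify how the edge weights or the object environments are computed, so the following is only a minimal sketch of the general idea, not the SmallSteps algorithm as published. It assumes that the "environment" of an object can be approximated by the distance to its k-th nearest neighbour, and that two objects are linked when their dissimilarity does not exceed a multiple of the smaller of their two local scales. The function name `local_threshold_clusters` and the parameters `k` and `scale` are hypothetical. Clusters are the connected components of the resulting graph, so the number of clusters falls out of the data rather than being supplied.

```python
from typing import List


def local_threshold_clusters(
    dist: List[List[float]], k: int = 2, scale: float = 1.5
) -> List[int]:
    """Label objects by connected components of a locally thresholded graph.

    `dist` is a symmetric n x n dissimilarity matrix. `k` and `scale` are
    illustrative assumptions, not parameters taken from the article.
    """
    n = len(dist)

    # Assumed local scale: distance from each object to its k-th nearest neighbour.
    radius = []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        radius.append(others[min(k, len(others)) - 1] if others else 0.0)

    # Union-find structure used to merge linked objects into components.
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # One pass over all pairs, which matches the O(n^2) lower bound
    # the abstract mentions for distance-based methods.
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= scale * min(radius[i], radius[j]):
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels: dict = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Six points on a line forming two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
matrix = [[abs(a - b) for b in points] for a in points]
print(local_threshold_clusters(matrix))  # → [0, 0, 0, 1, 1, 1]
```

Because the threshold is derived from each object's own neighbourhood, a dense group and a sparse group can each stay connected without a single global cut-off, which is one plausible reading of "adaptive" here. Lowering `scale` would tend to split clusters into sub-clusters, again under the assumptions above.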

### Full text

Available electronic editions: PDF.

### DOI

DOI is not available for this article.

### BibTeX entry
```bibtex
@article{Koch:2001:ActaCybernetica,
  author = {Gy. Koch and J{\'o}zsef Dombi},
  title = {Smallsteps: an adaptive distance-based clustering algorithm},
  journal = {Acta Cybernetica},
  year = {2001},
  volume = {15},
  pages = {241--256},
  number = {2},
  abstract = {In this article we propose a new distance-based clustering algorithm. Distance-based clustering methods operate on data sets in similarity space, where the similarities or dissimilarities between the objects are given by a matrix. These algorithms have at least $O(n^2)$ time complexity, where $n$ is the number of objects. One of the latest distance-based methods is Chameleon, which, according to experience, works well only on larger data sets and fails on relatively small ones. This is at odds with the fact that the $O(n^2)$ time complexity makes distance-based algorithms unsuitable for huge data sets. We therefore developed a new distance-based method (SmallSteps) that can also handle relatively small numbers of objects. Our solution looks for connected graphs whose edges have a maximum weight computed on the environments of the objects. The method can detect clusters of different shapes, sizes, or densities, automatically determines the number of clusters, and has a special ability to divide clusters into sub-clusters.}
}
```