Nearest-neighbor chain algorithm

In the theory of cluster analysis, the nearest-neighbor chain algorithm is an algorithm that can speed up several methods for agglomerative hierarchical clustering. These are methods that take a collection of points as input, and create a hierarchy of clusters of points by repeatedly merging pairs of smaller clusters to form larger clusters. The clustering methods that the nearest-neighbor chain algorithm can be used for include Ward's method, complete-linkage clustering, and single-linkage clustering; these all work by repeatedly merging the closest two clusters but use different definitions of the distance between clusters. The cluster distances for which the nearest-neighbor chain algorithm works are called reducible and are characterized by a simple inequality among certain cluster dis

Comment: enIn the theory of cluster analysis, the nearest-neighbor chain algorithm is an algorithm that can speed up several methods for agglomerative hierarchical clustering. These are methods that take a collection of points as input, and create a hierarchy of clusters of points by repeatedly merging pairs of smaller clusters to form larger clusters. The clustering methods that the nearest-neighbor chain algorithm can be used for include Ward's method, complete-linkage clustering, and single-linkage clustering; these all work by repeatedly merging the closest two clusters but use different definitions of the distance between clusters. The cluster distances for which the nearest-neighbor chain algorithm works are called reducible and are characterized by a simple inequality among certain cluster dis
Depiction
Has abstract: enIn the theory of cluster analysis, the nearest-neighbor chain algorithm is an algorithm that can speed up several methods for agglomerative hierarchical clustering. These are methods that take a collection of points as input, and create a hierarchy of clusters of points by repeatedly merging pairs of smaller clusters to form larger clusters. The clustering methods that the nearest-neighbor chain algorithm can be used for include Ward's method, complete-linkage clustering, and single-linkage clustering; these all work by repeatedly merging the closest two clusters but use different definitions of the distance between clusters. The cluster distances for which the nearest-neighbor chain algorithm works are called reducible and are characterized by a simple inequality among certain cluster distances. The main idea of the algorithm is to find pairs of clusters to merge by following paths in the nearest neighbor graph of the clusters. Every such path will eventually terminate at a pair of clusters that are nearest neighbors of each other, and the algorithm chooses that pair of clusters as the pair to merge. In order to save work by re-using as much as possible of each path, the algorithm uses a stack data structure to keep track of each path that it follows. By following paths in this way, the nearest-neighbor chain algorithm merges its clusters in a different order than methods that always find and merge the closest pair of clusters. However, despite that difference, it always generates the same hierarchy of clusters. The nearest-neighbor chain algorithm constructs a clustering in time proportional to the square of the number of points to be clustered. This is also proportional to the size of its input, when the input is provided in the form of an explicit distance matrix. The algorithm uses an amount of memory proportional to the number of points, when it is used for clustering methods such as Ward's method that allow constant-time calculation of the distance between clusters. However, for some other clustering methods it uses a larger amount of memory in an auxiliary data structure with which it keeps track of the distances between pairs of clusters.
Hypernym: Method
Is primary topic of: Nearest-neighbor chain algorithm
Label: enNearest-neighbor chain algorithm
Link from a Wikipage to another Wikipage: Agglomerative hierarchical clustering; Algorithm; Binary tree; Cardinality; Category:Cluster analysis algorithms; Centroid; Closest pair; Cluster analysis; Complete-linkage clustering; Data analysis; Data structure; Disjoint set; Distance matrix; En:k-means clustering; Euclidean space; File:Hierarchical clustering diagram.png; File:Nearest-neighbor chain algorithm animated.gif; Greedy algorithm; Hierarchical clustering; Jean-Paul Benzécri; Maximal element; Metric space; Minimum spanning tree; Nearest neighbor graph; Outlier; Path (graph theory); Phylogenetic tree; Prim's algorithm; Priority queue; Quadtree; Sequential search; Single-linkage clustering; Stack (abstract data type); Stack (data structure); Taxonomic rank; Taxonomy (biology); Triangle inequality; Ward's method
SameAs: gao3; m.0h55x4; Nearest-neighbor chain algorithm; Q17162702
Subject: Category:Cluster analysis algorithms
Thumbnail
WasDerivedFrom: Nearest-neighbor chain algorithm?oldid=1088104637&ns=0
WikiPageLength: 27316
Wikipage page ID: 33068704
Wikipage revision ID: 1088104637
WikiPageUsesTemplate: Template:Good article; Template:Harvtxt; Template:Math; Template:Mvar; Template:Reflist; Template:Short description

Nearest-neighbor chain algorithm

Backlinks

About

Resources

Support

Follow us