Nearest-neighbor chain algorithm

Nearest-neighbor chain algorithm

In the theory of cluster analysis, the nearest-neighbor chain algorithm is an algorithm that can speed up several methods for agglomerative hierarchical clustering. These are methods that take a collection of points as input, and create a hierarchy of clusters of points by repeatedly merging pairs of smaller clusters to form larger clusters. The clustering methods that the nearest-neighbor chain algorithm can be used for include Ward's method, complete-linkage clustering, and single-linkage clustering; these all work by repeatedly merging the closest two clusters but use different definitions of the distance between clusters. The cluster distances for which the nearest-neighbor chain algorithm works are called reducible and are characterized by a simple inequality among certain cluster dis

Comment
enIn the theory of cluster analysis, the nearest-neighbor chain algorithm is an algorithm that can speed up several methods for agglomerative hierarchical clustering. These are methods that take a collection of points as input, and create a hierarchy of clusters of points by repeatedly merging pairs of smaller clusters to form larger clusters. The clustering methods that the nearest-neighbor chain algorithm can be used for include Ward's method, complete-linkage clustering, and single-linkage clustering; these all work by repeatedly merging the closest two clusters but use different definitions of the distance between clusters. The cluster distances for which the nearest-neighbor chain algorithm works are called reducible and are characterized by a simple inequality among certain cluster dis
Depiction
Hierarchical clustering diagram.png
Nearest-neighbor chain algorithm animated.gif
Has abstract
enIn the theory of cluster analysis, the nearest-neighbor chain algorithm is an algorithm that can speed up several methods for agglomerative hierarchical clustering. These are methods that take a collection of points as input, and create a hierarchy of clusters of points by repeatedly merging pairs of smaller clusters to form larger clusters. The clustering methods that the nearest-neighbor chain algorithm can be used for include Ward's method, complete-linkage clustering, and single-linkage clustering; these all work by repeatedly merging the closest two clusters but use different definitions of the distance between clusters. The cluster distances for which the nearest-neighbor chain algorithm works are called reducible and are characterized by a simple inequality among certain cluster distances. The main idea of the algorithm is to find pairs of clusters to merge by following paths in the nearest neighbor graph of the clusters. Every such path will eventually terminate at a pair of clusters that are nearest neighbors of each other, and the algorithm chooses that pair of clusters as the pair to merge. In order to save work by re-using as much as possible of each path, the algorithm uses a stack data structure to keep track of each path that it follows. By following paths in this way, the nearest-neighbor chain algorithm merges its clusters in a different order than methods that always find and merge the closest pair of clusters. However, despite that difference, it always generates the same hierarchy of clusters. The nearest-neighbor chain algorithm constructs a clustering in time proportional to the square of the number of points to be clustered. This is also proportional to the size of its input, when the input is provided in the form of an explicit distance matrix. The algorithm uses an amount of memory proportional to the number of points, when it is used for clustering methods such as Ward's method that allow constant-time calculation of the distance between clusters. However, for some other clustering methods it uses a larger amount of memory in an auxiliary data structure with which it keeps track of the distances between pairs of clusters.
Hypernym
Method
Is primary topic of
Nearest-neighbor chain algorithm
Label
enNearest-neighbor chain algorithm
Link from a Wikipage to another Wikipage
Agglomerative hierarchical clustering
Algorithm
Binary tree
Cardinality
Category:Cluster analysis algorithms
Centroid
Closest pair
Cluster analysis
Complete-linkage clustering
Data analysis
Data structure
Disjoint set
Distance matrix
En:k-means clustering
Euclidean space
File:Hierarchical clustering diagram.png
File:Nearest-neighbor chain algorithm animated.gif
Greedy algorithm
Hierarchical clustering
Jean-Paul Benzécri
Maximal element
Metric space
Minimum spanning tree
Nearest neighbor graph
Outlier
Path (graph theory)
Phylogenetic tree
Prim's algorithm
Priority queue
Quadtree
Sequential search
Single-linkage clustering
Stack (abstract data type)
Stack (data structure)
Taxonomic rank
Taxonomy (biology)
Triangle inequality
Ward's method
SameAs
gao3
m.0h55x4
Nearest-neighbor chain algorithm
Q17162702
Subject
Category:Cluster analysis algorithms
Thumbnail
Hierarchical clustering diagram.png?width=300
WasDerivedFrom
Nearest-neighbor chain algorithm?oldid=1088104637&ns=0
WikiPageLength
27316
Wikipage page ID
33068704
Wikipage revision ID
1088104637
WikiPageUsesTemplate
Template:Good article
Template:Harvtxt
Template:Math
Template:Mvar
Template:Reflist
Template:Short description