Hierarchical clustering

In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two categories:
* Agglomerative: a "bottom-up" approach in which each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
* Divisive: a "top-down" approach in which all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
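
The agglomerative strategy is what most libraries implement by default. Below is a minimal sketch in Python using SciPy (one of the implementations linked from this page); the toy data and parameter choices are illustrative assumptions, not part of the source.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Invented toy data: two loose groups of 2-D points.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.5, size=(5, 2)),
                    rng.normal(5.0, 0.5, size=(5, 2))])

# Agglomerative ("bottom-up") clustering: each point starts in its own
# cluster, and the closest pair of clusters is merged greedily at each step.
merges = linkage(points, method="ward")

# Cut the resulting hierarchy to recover a flat two-cluster assignment.
labels = fcluster(merges, t=2, criterion="maxclust")
print(labels)
```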

Depiction
Clusters.svg
Hierarchical clustering simple diagram.svg
Iris dendrogram.png
Orange-data-mining-hierarchical-clustering.png
Has abstract
In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two categories:
* Agglomerative: a "bottom-up" approach in which each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
* Divisive: a "top-down" approach in which all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram. The standard algorithm for hierarchical agglomerative clustering (HAC) has a time complexity of O(n³) and requires Ω(n²) memory, which makes it too slow for even medium data sets. However, for some special cases, optimal efficient agglomerative methods (of complexity O(n²)) are known: SLINK for single-linkage and CLINK for complete-linkage clustering. With a heap, the runtime of the general case can be reduced to O(n² log n), an improvement on the aforementioned bound of O(n³), at the cost of further increasing the memory requirements. In many cases, the memory overheads of this approach are too large to make it practically usable. Except for the special case of single-linkage, none of the algorithms (except exhaustive search in O(2ⁿ)) can be guaranteed to find the optimum solution. Divisive clustering with an exhaustive search is O(2ⁿ), but it is common to use faster heuristics to choose splits, such as k-means. Hierarchical clustering has the distinct advantage that any valid measure of distance can be used. In fact, the observations themselves are not required: all that is used is a matrix of distances.
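
The closing remark of the abstract (only a matrix of distances is needed) can be made concrete with a short sketch, again assuming SciPy; the distance values below are invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# A symmetric matrix of pairwise distances for four items; any valid
# distance measure could have produced it, and the original observations
# are never needed.
D = np.array([[0.0, 1.0, 4.0, 5.0],
              [1.0, 0.0, 3.0, 4.5],
              [4.0, 3.0, 0.0, 1.5],
              [5.0, 4.5, 1.5, 0.0]])

# squareform converts the square matrix to the condensed form linkage
# expects; method="single" gives single-linkage clustering, the case the
# O(n²) SLINK algorithm handles optimally.
merges = linkage(squareform(D), method="single")
print(merges)  # each row: cluster i, cluster j, merge distance, new size
```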
Is primary topic of
Hierarchical clustering
Label
Hierarchical clustering
Link from a Wikipage to an external page
web.archive.org/web/20091110212529/http://www-stat.stanford.edu/~tibs/ElemStatLearn/
www-stat.stanford.edu/~tibs/ElemStatLearn/
cran.r-project.org/web/packages/dendextend/vignettes/Cluster_Analysis.html
archive.org/details/findinggroupsind00kauf
Link from a Wikipage to another Wikipage
ALGLIB
Binary space partitioning
Bounding volume hierarchy
Brown clustering
Category:Cluster analysis algorithms
Category:Network analysis
Cladistics
Cluster analysis
Complete-linkage clustering
Computational phylogenetics
CrimeStat
CURE data clustering algorithm
Dasgupta's objective
Data mining
Dendrogram
Determining the number of clusters in a data set
Distance
Distance matrix
ELKI
Energy distance
Euclidean distance
File:Clusters.svg
File:Hierarchical clustering simple diagram.svg
File:Iris dendrogram.png
File:Orange-data-mining-hierarchical-clustering.png
GNU
GNU Octave
Greedy algorithm
Heap (data structure)
Hierarchical clustering of networks
Hierarchy
Julia (programming language)
K-means clustering
Locality-sensitive hashing
Mathematica
MathWorks
MATLAB
Metric (mathematics)
NCSS (statistical software)
Nearest-neighbor chain algorithm
Nearest neighbor search
Numerical taxonomy
OPTICS algorithm
Orange (software)
Persistent homology
Qlucore
R (programming language)
SAS System
Scikit-learn
SciPy
Single-linkage clustering
SPSS
Stata
Statistical distance
Statistics
Time complexity
Top-down and bottom-up design
UPGMA
Ward's method
Weka (machine learning)
WPGMA
SameAs
Agrupamiento jerárquico
Clustering gerarchico
Grupowanie hierarchiczne
Hierarchical clustering
Hierarchické shlukování
Hierarchische Clusteranalyse
JRtj
m.05_5szj
Multzokatze hierarkiko
Q1277447
Regroupement hiérarchique
Ієрархічна кластеризація
Иерархическая кластеризация
تجميع هرمي
خوشه‌بندی سلسله‌مراتبی
Subject
Category:Cluster analysis algorithms
Category:Network analysis
Thumbnail
Clusters.svg?width=300
WasDerivedFrom
Hierarchical clustering?oldid=1124785561&ns=0
WikiPageLength
23006
Wikipage page ID
477573
Wikipage revision ID
1124785561
WikiPageUsesTemplate
Template:Authority control
Template:Cite book
Template:Div col
Template:Div col end
Template:Machine learning
Template:Redirect
Template:Reflist
Template:Short description