Cluster analysis is a commonly used technique (or set of techniques) for identifying structure in data when such structure is unknown a priori.
More specifically, cluster analysis is the classification of sets of multivariate data into groups or clusters of similar samples. Most standard clustering methods fall into one of two categories, namely (i) partitional methods, and (ii) hierarchical methods.
In partitional clustering, every data sample is initially assigned to a cluster in some (possibly random) way. Samples are then iteratively transferred from cluster to cluster until some criterion function reaches a minimum, which in practice is usually a local rather than a global one. Once the process is complete, the samples will have been partitioned into separate compact clusters. The best-known example of a partitional clustering method is k-means, for which Lloyd's method is the standard iterative algorithm.
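The partitional procedure described above can be sketched as follows. This is a minimal illustration only, assuming squared Euclidean distance as the measure of dissimilarity and the within-cluster sum of squares as the criterion function. The names `sq_dist`, `centroid`, `criterion` and `partition` are hypothetical and are not taken from the text. The reassignment step is in the style of k-means, so the result is a local minimum that may depend on the initial assignment.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two samples (an assumed dissimilarity)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of samples."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def criterion(samples, labels, centres):
    """Within-cluster sum of squares: the criterion function assumed here."""
    return sum(sq_dist(s, centres[l]) for s, l in zip(samples, labels))


def partition(samples, k, seed=0, max_iter=100):
    """Partition samples into k clusters; requires len(samples) >= k."""
    rng = random.Random(seed)

    # Initial assignment: shuffle the sample indices and deal them out
    # round-robin, so that every cluster starts non-empty.
    order = list(range(len(samples)))
    rng.shuffle(order)
    labels = [0] * len(samples)
    for position, index in enumerate(order):
        labels[index] = position % k

    centres = [None] * k
    for _ in range(max_iter):
        # Recompute each cluster centre from its current members.
        for c in range(k):
            members = [s for s, l in zip(samples, labels) if l == c]
            if members:  # an emptied cluster keeps its previous centre
                centres[c] = centroid(members)

        # Transfer each sample to the cluster with the nearest centre.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(s, centres[c])) for s in samples
        ]
        if new_labels == labels:  # no sample moved: a local minimum is reached
            break
        labels = new_labels

    return labels, centres


samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = partition(samples, k=2)
print(labels, criterion(samples, labels, centres))
```

On this toy data the first three samples and the last three samples end up in different clusters, although which cluster receives which numeric label depends on the random initial assignment.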
In hierarchical clustering, each sample is initially considered a member of its own cluster, after which clusters are recursively combined in pairs according to some predetermined condition until eventually every point belongs to a single cluster. The resulting hierarchical structure may be represented by a binary tree or "dendrogram", from which the desired clusters may be extracted. Examples of hierarchical clustering methods are the single-link, Ward's, centroid, complete-link, group average, median, and parametric Lance-Williams methods.
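The agglomerative procedure described above can be sketched in the same way. This is a naive illustration, assuming Euclidean distance and the single-link condition (the distance between two clusters is the smallest distance between any pair of their members). The names `single_link`, `agglomerate` and `cut` are hypothetical, and the cubic-time search over cluster pairs is chosen for clarity rather than efficiency. The list of merges returned by `agglomerate` plays the role of the binary tree, and `cut` extracts a given number of clusters from it.

```python
import math


def single_link(a, b, samples):
    """Single-link distance: smallest pairwise distance between two clusters."""
    return min(math.dist(samples[i], samples[j]) for i in a for j in b)


def agglomerate(samples, linkage=single_link):
    """Merge clusters pairwise until one remains; return the merge history.

    Each entry is (left_members, right_members, distance), with members
    given as sorted lists of sample indices.
    """
    clusters = [[i] for i in range(len(samples))]  # every sample starts alone
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q], samples)
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        left, right = sorted(clusters[p]), sorted(clusters[q])
        merges.append((left, right, d))
        # Replace the two merged clusters with their union.
        clusters = [c for i, c in enumerate(clusters) if i not in (p, q)]
        clusters.append(left + right)
    return merges


def cut(n, merges, k):
    """Replay the first n - k merges to recover k clusters of n samples."""
    clusters = [[i] for i in range(n)]
    for left, right, _ in merges[: n - k]:
        clusters = [c for c in clusters if sorted(c) not in (left, right)]
        clusters.append(left + right)
    return sorted(sorted(c) for c in clusters)


samples = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = agglomerate(samples)
print(cut(len(samples), merges, 3))
print(cut(len(samples), merges, 2))
```

Other conditions from the list above, such as complete-link, could be tried by passing a different `linkage` function with the same three arguments.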