

   HHiieerraarrcchhiiccaall CClluusstteerriinngg

        hierclust(a, method=1, bign = F, diagnostics=F, sim=" ",
                     movie=F, option="thru", alpha=1.0, repres="chull",
                     show="change")

        hierclust(dst, method="compact", sim=)

   AArrgguummeennttss::

          a: data matrix to be analyzed, the rows representing
             observations and the columns variables; or, alter-
             natively, lower-diagonal dissimilarity matrix in
             vector or list format.  In latter case, the number
             of values is `n(n-1)/2' where `n' is the number of
             observations.

             (See below for second variant of usage of this
             function.)

             First variant:

     method: First possibility: integer taking the value
             between 1 and 7.  These methods correspond to:
             Ward's minimum variance or error sum of squares
             method (1); single linkage or nearest neighbor
             method (2); complete linkage or diameter (3);
             average linkage, group average, or UPGMA method
             (4); McQuitty's or WPGMA method (5); median,
             Gower's or WPGMC method (6); and centroid or UPGMC
             method (7).  If the `movie' option is used, only
             `method's 1 to 4 are supported.  Second possibil-
             ity: string `compact' (default; complete linkage),
             `average' (average linkage), or `connected' (sin-
             gle linkage).

        sim: structure giving similarities rather than dissimi-
             larities.  This can either be a symmetric matrix
             or a vector with a `Size' attribute.  Only one of
             dissimilarities or similarities may be specified
             (or, of course, as an alternative, a data matrix.

       bign: Is `n' big?  If storage is problemsome, a differ-
             ent implementation of the Ward criterion may be
             tried.  This determines dissimilarities on the
             fly, and hence requires `O(n)' storage.

   diagnostics: show, or suppress, some information directed to
             the command window.  Suppression is useful if
             functional programming capabilities are availed
             of, leading to lazy evaluation; this often results
             in multiple evaluations, with consequent repeti-
             tion of diagnostic information.

      movie: whether interactive display of construction of
             hierarchy is wanted.  This necessitates an open
             plot window.  It does not produce an output
             object.  All remaining arguments apply only in the
             case of the `movie=T'.

     option: (Only for `movie'.) "thru" (default) or "prompt":
             Go right through all n-1 agglomerations, or ask
             the user whether to continue at each agglomera-
             tion?  Here is what we recommend: On the first
             occasion, go right through.  Check out the cluster
             criterion values reported on in the command win-
             dow.  Then using the 'prompt' option, go as far as
             the desired partition, and print it out.

      alpha: (Only for `movie'.) 1.0 (default) or 0.0 < alpha
             <= 1.0.  Only used by the minimum variance agglom-
             erative criterion. Alpha downweights the variance
             component on the principal axis of the cluster,
             and thereby redefines the cluster criterion such
             that linear clusters are preferred.  A value of
             alpha = 1.0 gives exactly the traditional Ward's
             minimum variance method.  See Murtagh and Raftery
             (1984) for more on this altered criterion.

     repres: (Only for `movie'.) repres = "chull" (default),
             "lines", "gauss": Representation of clusters: For
             input data which has dimensionality greater than
             2, only "chull" is allowed (since the representa-
             tions were found to be too problematic, given that
             the clustering takes place in m-dimensions, and
             the the representation is necessarily a 2-dimen-
             sional projection of this).  Representation
             "chull" = convex hull, or straight line in case of
             collinear points.  (We test for collinearity using
             principal components analysis.)  Representation
             "lines" = lines of best fit (least squares fit is
             used, and the extremities are defined by the
             intersection of the min. and max. x values on this
             LS fit line). "gauss" = 1-sigma circles.  Side-
             effect of latter: warning messages indicate when-
             ever circles overlap plot boundaries.

       show: (Only for `movie'.) "change" or "all": In this
             method's present implementation, the clusters at
             each stage are `highlighted' by means of a differ-
             ent symbol.  Option show = "change" has this high-
             lighting changed at each agglomeration, so that
             only the most recent agglomerands are shown.
             Option show = "all" does not turn off the high-
             lighting in this way.

             Second variant (which re-wraps S function
             `hclust': cf. help file on `hclust'):

             Exactly one of dst or sim must be specified.

        dst: a distance structure or distance matrix.  Normally
             this will be the result of the function `dist',
             but it can be any data of the form returned by
             `dist', or  a  full,  symmetric matrix.  Missing
             values are not allowed.

     method: character string giving  the  clustering  method.
             The three  methods  currently implemented are
             "average", "con- nected" (single linkage) and
             "compact" (complete linkage).  (The first three
             characters of the method are sufficient.)

       sim=: structure giving similarities  rather  than  dis-
             tances.  This  can  either be a symmetric matrix
             or a vector with a "Size" attribute.  Missing val-
             ues are not allowed.

   DDeessccrriippttiioonn::

        Agglomerates the closest pair of observations, and
        replaces them with a cluster.  Agglomerates the next
        closest pair of observations or clusters.  Continues
        through `n-1' agglomerations, assuming there are `n'
        observations.

   VVaalluuee::

        When `movie' is not specified in the first variant,
        this is a list describing the hierarchical clustering
        of the observations.  The `movie' option is consider-
        ably slower than alternative options, due to the pro-
        cessing involved.  Hence its use is recommended only on
        small data sets (less than 100 observations).

      merge: an `n-1' by 2 matrix. Row `i' of `merge' describes
             the merging of clusters at step `i' of the clus-
             tering.  If an element `j' in the row is negative,
             then observation `-j' was merged at this stage.
             If `j' is positive then the merge was with the
             cluster formed at the (earlier) stage `j' of the
             algorithm. Thus negative entries in `merge' indi-
             cate agglomerations of singletons, and positive
             entries indicate agglomerations of non-singletons.

     height: a set of `n-1' non-decreasing real values.  The
             clustering "height": that is, the value of the
             criterion associated with the clustering `method'
             for the particular agglomeration.

      order: a vector giving the permutation of the original
             observations suitable for plotting, in the sense
             that a cluster plot using this ordering and matrix
             `merge' will not have crossings of the branches.

             In hierarchical cluster displays, a decision is
             needed at each merge to specify which subtree
             should go on the left and which on the right.
             Since, for `n' observations there are `n-1'
             merges, there are `2^(n-1)' possible orderings for
             the leaves in a cluster tree, or dendrogram.  The
             default algorithm in `hc' is to order the subtree
             so that the tighter cluster is on the left (the
             last, i.e. most recent, merge of the left subtree
             is at a lower value than the last merge of the
             right subtree).  Observations are the tightest
             clusters possible, and merges involving two obser-
             vations place them in order by their observation
             sequence number.

   NNOOTTEE::

        The squared Euclidean distance is used in the `movie'
        case, and in the case when a data matrix is input. Note
        that the `movie' option returns invisibly.

   MMEETTHHOODD::

        In the case of all methods when `bign = F' (default),
        the Lance-Williams dissimilarity update formula is used
        to allow the 7 methods to be implemented.

        In the direct data matrix case, a list of nearest
        neighbors is maintained, which allows the closest pair
        of clusters or observations to be determined by scan-
        ning this list to find the pair of agglomerands on each
        iteration.  Hence `O(n)' operations are required for
        each of `n-1' iterations, i.e.  `O(n^2)' computation
        time in total.  The initial setting up of the `n' near-
        est neighbors also requires `O(n^2)' time.  The pro-
        cessing to be carried out following each agglomeration
        is `O(n)'.  In total, all methods implemented in `hc'
        are of time complexity `O(n^2)'.  Unless `bign = T'
        (see below), storage complexity is `O(n^2)'.  Typical
        times on a Sun SPARCstation 1 for 400x4 and 800x4
        arrays: 12 and 48 secs.

        Ward's minimum variance method aims at finding compact,
        spherical clusters.  The complete link method finds
        similar clusters. The single link method, which is
        closely related to the minimal spanning tree, adopts a
        `friends of friends' clustering strategy.  The other
        methods can be regarded as aiming for clusters with
        characteristics somewhere between the single and com-
        plete link methods.

        In the case of the Ward method, and `bign = T', the
        nearest-neighbor chain algorithm is used, which gives
        an `O(n^2)' computation time algorithm. The input
        matrix is stored, but not the dissimilarities produced.
        Computation time is marginally inferior to the imple-
        mentation of Ward's method when `bign = F' (default).
        Sample times for a 400x4 and an 800x4 array on a Sun
        SPARCstation 1: 12.5 and 50 secs.

   RReeffeerreenncceess::

        A small sample:

        B. Everitt, Cluster Analysis Heinemann Educ. Books,
        London 1974;

        P.H.A. Sneath and R.R. Sokal, Numerical Taxonomy Free-
        man, San Francisco, 1973;

        M.R. Anderberg, Cluster Analysis for Applications Aca-
        demic Press, New York, 1973;

        A.D. Gordon, Classification Chapman and Hall, London,
        1981; and

        F. Murtagh, Multidimensional Clustering Algorithms
        COMPSTAT Lectures 4, Physica-Verlag, Wuerzburg, 1985
        (for algorithmic details of algorithms used).

   SSeeee AAllssoo::

        `plclust' for plotting a dendrogram, and `cutree' for
        cutting.  For finding assignments of objects to clus-
        ters, use `members'.  `cutree', and `dist' are already
        available in S-Plus.  Function `members' is available
        in this function suite.

   EExxaammpplleess::

        # Default Ward criterion
        h <- hierclust(a)
        plclust(h)
        # Real-time display of clustering
        hierclust(a, movie=T)
        # Single link criterion
        h <- hierclust(dist(a), method=connected)
        # Get cluster assignments corresponding to 5-cluster partition:
        k <- members(h)
        k$class[,4]
        # Large `n'
        h <- hierclust(a, bign=T)
        # An example of the incorporated `hclust' function, which allows processing
        # of dissimilarity data:
        hierclust(dist(a))

