

   PPaarrttiittiioonniinngg AArroouunndd MMeeddooiiddss

        pam(x, k, diss = FALSE, metric = "euclidean", stand = FALSE)

   AArrgguummeennttss::

          x: data matrix or dataframe, or dissimilarity matrix,
             depending on the value of the `diss' argument.

             In case of a matrix or dataframe, each row corre-
             sponds to an observation, and each column corre-
             sponds to a variable. All variables must be
             numeric.  Missing values (NAs) are allowed.

             In case of a dissimilarity matrix, `x' is typi-
             cally the output of `daisy' or `dist'. A vector of
             length n*(n-1)/2 is also allowed (where n is the
             number of observations), and will be interpreted
             in the same way as the output of the above-
             mentioned functions. Missing values (NAs) are not
             allowed.

          k: integer, the number of clusters.  It is required
             that 0 < k < n where n is the number of observa-
             tions.

       diss: logical flag: if TRUE, then `x' will be considered
             as a dissimilarity matrix. If FALSE, then `x' will
             be considered as a matrix of observations by vari-
             ables.

     metric: character string specifying the metric to be used
             for calculating dissimilarities between observa-
             tions.  The currently available options are
             "euclidean" and "manhattan".  Euclidean distances
             are root sum-of-squares of differences, and man-
             hattan distances are the sum of absolute differ-
             ences.  If `x' is already a dissimilarity matrix,
             then this argument will be ignored.

      stand: logical flag: if TRUE, then the measurements in
             `x' are standardized before calculating the dis-
             similarities. Measurements are standardized for
             each variable (column), by subtracting the vari-
             able's mean value and dividing by the variable's
             mean absolute deviation.  If `x' is already a dis-
             similarity matrix, then this argument will be
             ignored.
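
        The standardization and the two metrics described above
        can be sketched in base R. This is only an illustration
        of the formulas, not the code used inside `pam':
        `standardize_mad' is a hypothetical helper name, and
        the sketch assumes that `x' contains no missing values.

```r
# Sketch in base R only; `standardize_mad` is a hypothetical helper name.
set.seed(1)
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))

# Subtract each column's mean and divide by its mean absolute deviation.
# Assumes a numeric matrix without NAs.
standardize_mad <- function(x) {
  x <- as.matrix(x)
  centered <- sweep(x, 2, colMeans(x))    # column-wise centering
  mean_abs_dev <- colMeans(abs(centered)) # mean absolute deviation per column
  sweep(centered, 2, mean_abs_dev, "/")
}

z <- standardize_mad(x)

# Dissimilarities between rows for the two available metrics.
d_euc <- dist(z, method = "euclidean")  # root sum-of-squares of differences
d_man <- dist(z, method = "manhattan")  # sum of absolute differences

# Each result holds one value per pair of observations: n*(n-1)/2 values.
n <- nrow(z)
length(d_euc) == n * (n - 1) / 2
```

        After this standardization every column has mean 0 and
        mean absolute deviation 1, so no single variable domi-
        nates the dissimilarities because of its scale.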

   DDeessccrriippttiioonn::

        Returns a list representing a clustering of the data
        into `k' clusters.

   DDeettaaiillss::

        `pam' is fully described in chapter 2 of Kaufman and
        Rousseeuw (1990).  Compared to the k-means approach in
        `kmeans', the function `pam' has the following fea-
        tures: (a) it also accepts a dissimilarity matrix; (b)
        it is more robust because it minimizes a sum of dissim-
        ilarities instead of a sum of squared euclidean dis-
        tances; (c) it provides a novel graphical display, the
        silhouette plot (see `plot.partition'), which can also
        be used to select the number of clusters.
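
        As a rough illustration of point (c), the average sil-
        houette width can be compared across several values of
        `k'. The sketch assumes that the `cluster' package is
        attached and that the fitted object exposes
        `silinfo$avg.width' as documented in `pam.object'; the
        range 2:6 is an arbitrary choice.

```r
library(cluster)   # assumed to provide pam() and the ruspini data
data(ruspini)

# Fit pam for several candidate values of k and record the
# average silhouette width of each clustering.
ks <- 2:6
avg_width <- sapply(ks, function(k) pam(ruspini, k)$silinfo$avg.width)
names(avg_width) <- ks
avg_width

# A larger average width suggests a better separated clustering,
# so one candidate choice of k is the value with the largest width.
ks[which.max(avg_width)]
```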

        The `pam' algorithm is based on the search for `k' rep-
        resentative objects or medoids among the observations
        of the dataset. These observations should represent the
        structure of the data. After finding a set of `k'
        medoids, `k' clusters are constructed by assigning each
        observation to the nearest medoid. The goal is to find
        `k' representative objects which minimize the sum of
        the dissimilarities of the observations to their clos-
        est representative object.  The algorithm first looks
        for a good initial set of medoids (this is called the
        BUILD phase). Then it finds a local minimum for the
        objective function, that is, a solution such that there
        is no single switch of an observation with a medoid
        that will decrease the objective (this is called the
        SWAP phase).
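
        The two phases can be sketched in base R as follows.
        This is a simplified illustration under stated assump-
        tions, not the implementation used by `pam': the name
        `pam_sketch' is hypothetical, BUILD is written as a
        plain greedy search, and SWAP recomputes the objective
        from scratch for every candidate exchange, which is
        slow but easy to follow.

```r
# `pam_sketch` is a hypothetical name; `d` is a dissimilarity object
# (for example the output of dist()) and `k` the number of clusters.
pam_sketch <- function(d, k) {
  d <- as.matrix(d)
  n <- nrow(d)
  stopifnot(k > 0, k < n)

  # Objective: sum of dissimilarities of each observation to its
  # nearest medoid.
  cost <- function(med) sum(apply(d[, med, drop = FALSE], 1, min))

  # BUILD: start with the observation whose total dissimilarity to all
  # others is smallest, then greedily add the observation that lowers
  # the objective most.
  med <- unname(which.min(rowSums(d)))
  while (length(med) < k) {
    cand <- setdiff(seq_len(n), med)
    costs <- vapply(cand, function(j) cost(c(med, j)), numeric(1))
    med <- c(med, cand[which.min(costs)])
  }

  # SWAP: apply the best single exchange of a medoid with a non-medoid
  # until no exchange decreases the objective (a local minimum).
  repeat {
    best <- cost(med)
    best_med <- med
    for (i in seq_along(med)) {
      for (h in setdiff(seq_len(n), med)) {
        trial <- med
        trial[i] <- h
        trial_cost <- cost(trial)
        if (trial_cost < best) {
          best <- trial_cost
          best_med <- trial
        }
      }
    }
    if (identical(best_med, med)) break
    med <- best_med
  }

  list(medoids = med,
       clustering = unname(apply(d[, med, drop = FALSE], 1, which.min)),
       objective = cost(med))
}

# Usage on the simulated data from the Examples section.
set.seed(1)
x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
           cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)))
fit <- pam_sketch(dist(x), 2)
fit$medoids            # row indices of the two medoids
table(fit$clustering)  # cluster sizes
```

        Because each accepted exchange strictly decreases the
        objective and there are finitely many sets of medoids,
        the loop in the sketch always terminates.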

   VVaalluuee::

        An object of class `"pam"' representing the clustering.
        See `pam.object' for details.

   BBAACCKKGGRROOUUNNDD::

        Cluster analysis divides a dataset into groups (clus-
        ters) of observations that are similar to each other.
        Partitioning methods like `pam', `clara', and `fanny'
        require that the number of clusters be given by the
        user.  Hierarchical methods like `agnes', `diana', and
        `mona' construct a hierarchy of clusterings, with the
        number of clusters ranging from one to the number of
        observations.

   NNOOTTEE::

        For datasets larger than (say) 200 observations, `pam'
        can take a lot of computation time. In that case the
        function `clara' is preferable.
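
        A minimal sketch of switching to `clara' for a larger
        dataset is given below. It assumes that the `cluster'
        package is attached and relies on the default sampling
        settings of `clara'; the simulated data and the seed
        are arbitrary.

```r
library(cluster)   # assumed to provide clara()

# Simulate 500 observations in 2 groups, more than the (say) 200
# observations beyond which pam becomes slow.
set.seed(1)
x <- rbind(cbind(rnorm(200, 0, 8), rnorm(200, 0, 8)),
           cbind(rnorm(300, 50, 8), rnorm(300, 50, 8)))

# clara works on subsamples of the data instead of all observations.
clarax <- clara(x, 2)
clarax$medoids            # coordinates of the two medoids
table(clarax$clustering)  # cluster sizes
```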

   RReeffeerreenncceess::

        Kaufman, L. and Rousseeuw, P.J. (1990).  Finding Groups
        in Data: An Introduction to Cluster Analysis.  Wiley,
        New York.

        Struyf, A., Hubert, M. and Rousseeuw, P.J. (1996).
        Clustering in an Object-Oriented Environment.  Journal
        of Statistical Software, 1.  <URL:
        http://www.stat.ucla.edu/journals/jss/>

        Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997).
        Integrating Robust Clustering Techniques in S-PLUS.
        Computational Statistics and Data Analysis, 26, 17-37.

   SSeeee AAllssoo::

        `pam.object', `clara', `daisy', `partition.object',
        `plot.partition', `dist'.

   EExxaammpplleess::

        # generate 25 objects, divided into 2 clusters.
        x <- rbind(cbind(rnorm(10,0,0.5), rnorm(10,0,0.5)),
                   cbind(rnorm(15,5,0.5), rnorm(15,5,0.5)))
        pamx <- pam(x, 2)
        pamx
        summary(pamx)
        plot(pamx)

        # the same clustering task, starting from a dissimilarity matrix
        pam(daisy(x, metric = "manhattan"), 2, diss = TRUE)

        data(ruspini)
        ## Plot similar to Figure 4 in Struyf et al. (1996)
        plot(pam(ruspini, 4), ask = TRUE)


