

   CClluusstteerr IInnddeexxeess

         clustindex ( clres, x, index = "all" )

   AArrgguummeennttss::

      clres: An object containing a clustering result, e.g., of
             class "cclust".

          x: The data matrix that was clustered.

      index: The indexes to be calculated: "calinski", "cindex",
             "db", "hartigan", "ratkowsky", "scott", "marriot",
             "ball", "trcovw", "tracew", "friedman", "rubin",
             "ssi", "likelihood", or "all" for all the
             indexes.

   DDeessccrriippttiioonn::

        `clres' is the result of a clustering algorithm, e.g.,
        an object of class "cclust". This function calculates
        the values of several clustering indexes. The value of
        each index can be used independently to determine the
        number of clusters present in a data set.
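
        As a minimal sketch of how an index value can guide the
        choice of the number of clusters, the R code below uses
        stats::kmeans() in place of cclust() and computes only
        the Calinski index, taking SSB and SSW from the
        betweenss and tot.withinss components of the kmeans
        result. The data mirror the Examples section; the range
        2:6 of candidate values and nstart = 10 are arbitrary
        choices of this sketch. Larger Calinski values indicate
        compact, well-separated clusters, so the k with the
        largest value is selected.

```r
# Choose k by maximizing the Calinski index (sketch: uses stats::kmeans,
# not this package's clustindex()).
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
n <- nrow(x)

ks <- 2:6  # candidate numbers of clusters (arbitrary range)
calinski <- sapply(ks, function(k) {
  cl <- kmeans(x, centers = k, nstart = 10)
  # betweenss plays the role of SSB, tot.withinss the role of SSW
  (cl$betweenss / (k - 1)) / (cl$tot.withinss / (n - k))
})

best_k <- ks[which.max(calinski)]
print(data.frame(k = ks, calinski = calinski))
cat("number of clusters with the largest Calinski value:", best_k, "\n")
```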

   DDeettaaiillss::

        The indexes are described below in three groups,
        according to the statistics mainly used to compute
        them.

        The first group is based on the within-cluster (SSW)
        and between-cluster (SSB) sums of squares. These
        statistics measure the dispersion of the data points
        within a cluster and between the clusters,
        respectively. These indexes are:

           * calinski: (SSB/(k-1))/(SSW/(n-k)), where n is the
             number of data points and k is the number of
             clusters.

           * hartigan: log(SSB/SSW).

           * ratkowsky: mean(sqrt(varSSB/varSST)), where varSSB
             is the SSB of each variable, varSST is the total
             sum of squares of each variable, and the mean is
             taken over the variables.

           * ball: SSW/k, where k is the number of clusters.
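
        The four indexes of the first group can be sketched in R
        for a hard partition as follows. The sketch follows the
        formulas exactly as written above (so ratkowsky is the
        plain mean over variables). The helpers ss_by_variable()
        and group1_indexes() are hypothetical names, not
        functions of this package, and stats::kmeans() stands in
        for cclust().

```r
# Sketch of the first-group indexes (hypothetical helpers, not the
# package source), following the formulas as written above.
ss_by_variable <- function(x, cluster) {
  x <- as.matrix(x)
  var_sst <- colSums(sweep(x, 2, colMeans(x))^2)   # total SS per variable
  var_ssw <- Reduce(`+`, lapply(split(seq_len(nrow(x)), cluster), function(rows) {
    xg <- x[rows, , drop = FALSE]
    colSums(sweep(xg, 2, colMeans(xg))^2)          # within SS of one cluster
  }))
  # total SS = within SS + between SS, per variable
  list(sst = var_sst, ssw = var_ssw, ssb = var_sst - var_ssw)
}

group1_indexes <- function(x, cluster) {
  s <- ss_by_variable(x, cluster)
  n <- nrow(as.matrix(x))
  k <- length(unique(cluster))
  ssw <- sum(s$ssw)
  ssb <- sum(s$ssb)
  c(calinski  = (ssb / (k - 1)) / (ssw / (n - k)),
    hartigan  = log(ssb / ssw),
    ratkowsky = mean(sqrt(s$ssb / s$sst)),
    ball      = ssw / k)
}

set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
cl <- kmeans(x, centers = 2, nstart = 10)
print(group1_indexes(x, cl$cluster))
```

        Computing SSB as total minus within sum of squares uses
        the exact decomposition that holds when the cluster
        centers are the cluster means.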

        The second group is based on the matrices T, the
        scatter matrix of all data points, and W, the sum of
        the scatter matrices of the individual clusters. These
        indexes are:

           * scott: n log(|T|/|W|), where n is the number of
             data points and |.| denotes the determinant of a
             matrix.

           * marriot: k^2 |W|, where k is the number of
             clusters.

           * trcovw: Trace Cov W.

           * tracew: Trace W.

           * friedman: Trace W^(-1) B, where B is the scatter
             matrix of the cluster centers.

           * rubin: |T|/|W|.
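
        The scatter-matrix indexes can be sketched in R in the
        same way. The helper names are hypothetical, and B is
        assumed here to be the size-weighted between-cluster
        scatter matrix, so that T = W + B. trcovw is omitted
        because its exact definition is not spelled out above.
        Since W has rank at most n - k, the determinant-based
        indexes break down when there are too few points per
        variable.

```r
# Sketch of the second-group indexes (hypothetical helpers, not the
# package source). B = T - W is an assumption of this sketch.
scatter_matrices <- function(x, cluster) {
  x <- as.matrix(x)
  T_mat <- crossprod(sweep(x, 2, colMeans(x)))     # total scatter matrix T
  W <- Reduce(`+`, lapply(split(seq_len(nrow(x)), cluster), function(rows) {
    xg <- x[rows, , drop = FALSE]
    crossprod(sweep(xg, 2, colMeans(xg)))          # scatter of one cluster
  }))
  list(T = T_mat, W = W, B = T_mat - W)
}

group2_indexes <- function(x, cluster) {
  s <- scatter_matrices(x, cluster)
  n <- nrow(as.matrix(x))
  k <- length(unique(cluster))
  det_T <- det(s$T)
  det_W <- det(s$W)
  c(scott    = n * log(det_T / det_W),
    marriot  = k^2 * det_W,
    tracew   = sum(diag(s$W)),
    friedman = sum(diag(solve(s$W, s$B))),          # trace(W^(-1) B)
    rubin    = det_T / det_W)
}

set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
cl <- kmeans(x, centers = 2, nstart = 10)
print(group2_indexes(x, cl$cluster))
```

        solve(W, B) computes W^(-1) B without forming the
        inverse explicitly, which is numerically preferable.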

        The third group consists of four indexes that belong
        to neither of the previous groups and have nothing in
        common with one another.

           * cindex: the C-Index is a cluster similarity
             measure expressed as
             [d_(w)-min(d_(w))]/[max(d_(w))-min(d_(w))], where
             d_(w) is the sum of all n_(d) within-cluster
             distances (n_(d) being the number of pairs of
             points that lie in the same cluster), min(d_(w))
             is the sum of the n_(d) smallest pairwise
             distances in the data set, and max(d_(w)) is the
             sum of the n_(d) largest pairwise distances. To
             compute the C-Index, all pairwise distances in the
             data set have to be computed and stored. For
             binary data, storing the distances poses no
             problem since there are only a few possible
             distances. However, computing all distances can
             make this index prohibitive for large data sets.

           * db: R=(1/k)*sum(R_(i)), where the sum runs over
             the k clusters, R_(i) is the maximum value of
             R_(ij) over all j != i, and
             R_(ij)=(SSW_(i)+SSW_(j))/DC_(ij), where DC_(ij) is
             the distance between the centers of clusters i
             and j.

           * likelihood: under the assumption that the
             variables are independent within a cluster, a
             cluster solution can be regarded as a mixture
             model for the data, where the cluster centers give
             the probability of each variable being 1. The
             negative log-likelihood can therefore be computed
             and used as a quality measure for a cluster
             solution. Note that the assumptions for applying
             penalty terms, as in AIC or BIC, are not fulfilled
             in this model; moreover, such terms showed no
             effect on the data sets studied in the reference
             below.

           * ssi: the ``Simple Structure Index'' combines three
             elements that influence the interpretability of a
             solution: the maximum difference of each variable
             between the clusters, the sizes of the most
             contrasting clusters, and the deviation of a
             variable in the cluster centers from its overall
             mean. These three elements are combined
             multiplicatively and normalized to give a value
             between 0 and 1.
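
        Three of the third-group indexes (cindex, db and
        likelihood) are sketched below in R on a small simulated
        binary data set, comparing the true partition with a
        random one. All function names and the simulated data
        are hypothetical. Euclidean distances, the use of
        SSW_(i) exactly as written in the db formula, averaging
        R_(i) over the k clusters, and the clipping constant eps
        are assumptions of this sketch; ssi is omitted because
        its normalization is not fully specified above. For all
        three indexes smaller values indicate a better
        partition. Note that c_index() builds the full n x n
        distance matrix, which is the memory cost mentioned for
        large data sets.

```r
# Sketches of three third-group indexes (hypothetical helpers, not the
# package source). Euclidean distances are an assumption.
c_index <- function(x, cluster) {
  d <- dist(x)                              # all n(n-1)/2 pairwise distances
  dm <- as.matrix(d)
  same <- outer(cluster, cluster, "==")     # pairs in the same cluster
  within <- dm[upper.tri(dm) & same]        # each within pair counted once
  n_d <- length(within)
  all_sorted <- sort(as.vector(d))
  d_min <- sum(head(all_sorted, n_d))       # n_d smallest distances
  d_max <- sum(tail(all_sorted, n_d))       # n_d largest distances
  (sum(within) - d_min) / (d_max - d_min)
}

db_index <- function(x, cluster) {
  x <- as.matrix(x)
  groups <- split(seq_len(nrow(x)), cluster)
  centers <- do.call(rbind, lapply(groups, function(rows)
    colMeans(x[rows, , drop = FALSE])))
  ssw <- sapply(groups, function(rows) {
    xg <- x[rows, , drop = FALSE]
    sum(sweep(xg, 2, colMeans(xg))^2)       # SSW_(i)
  })
  dc <- as.matrix(dist(centers))            # DC_(ij)
  k <- length(groups)
  r <- sapply(seq_len(k), function(i) {
    j <- setdiff(seq_len(k), i)
    max((ssw[i] + ssw[j]) / dc[i, j])       # R_(i): max over j != i
  })
  mean(r)                                   # average over the k clusters
}

neg_loglik <- function(x, cluster, eps = 1e-10) {
  x <- as.matrix(x)
  labs <- sort(unique(cluster))
  # cluster centers = estimated probability of each variable being 1
  p <- do.call(rbind, lapply(labs, function(g)
    colMeans(x[cluster == g, , drop = FALSE])))
  P <- p[match(cluster, labs), , drop = FALSE]  # center of each point's cluster
  P <- pmin(pmax(P, eps), 1 - eps)              # avoid log(0)
  -sum(x * log(P) + (1 - x) * log(1 - P))
}

# Hypothetical binary data: two groups with different 1-probabilities
set.seed(1)
proto <- rbind(c(0.9, 0.9, 0.1, 0.1, 0.8),
               c(0.1, 0.2, 0.9, 0.8, 0.2))
truth <- rep(1:2, each = 50)
xb <- matrix(rbinom(100 * 5, 1, proto[truth, ]), ncol = 5)
random <- sample(1:2, 100, replace = TRUE)

scores <- sapply(list(truth = truth, random = random), function(cl)
  c(cindex = c_index(xb, cl), db = db_index(xb, cl),
    likelihood = neg_loglik(xb, cl)))
print(scores)
```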

   VVaalluuee::

        Returns a vector with the values of the requested
        indexes.

   AAuutthhoorr((ss))::

        Evgenia Dimitriadou and Andreas Weingessel

   RReeffeerreenncceess::

        Andreas Weingessel, Evgenia Dimitriadou and Sara
        Dolnicar, An Examination Of Indexes For Determining The
        Number Of Clusters In Binary Data Sets,
        <URL: http://www.wu-wien.ac.at/am/workpap.html#29>
        and the references therein.

   SSeeee AAllssoo::

        `cclust', `kmeans'

   EExxaammpplleess::

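        # a 2-dimensional example: two Gaussian clouds centered
        # at (0,0) and (1,1), 50 points each
        x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                   matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
        # k-means with 2 centers and at most 20 iterations
        cl <- cclust(x, 2, 20, verbose = TRUE, method = "kmeans")
        resultindexes <- clustindex(cl, x, index = "all")
        resultindexes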

