

   DDiissssiimmiillaarriittyy MMaattrriixx CCaallccuullaattiioonn

        daisy(x, metric = "euclidean", stand = FALSE, type = list())

   AArrgguummeennttss::

          x: data matrix or data frame. Dissimilarities will be
             computed between the rows of `x'.  Columns of
             class `numeric' will be recognized as interval
             scaled variables, columns of class `factor' will
             be recognized as nominal variables, and columns of
             class `ordered' will be recognized as ordinal
             variables.  Other variable types should be
             specified with the `type' argument.  Missing
             values (NAs) are allowed.

     metric: character string specifying the metric to be used.
             The currently available options are "euclidean"
             and "manhattan".  Euclidean distances are the
             square root of the sum of squared differences,
             and manhattan distances are the sum of absolute
             differences.  If not all columns of `x' are
             numeric, then this argument will be ignored.

      stand: logical flag: if TRUE, then the measurements in
             `x' are standardized before calculating the
             dissimilarities. Measurements are standardized
             for each variable (column), by subtracting the
             variable's mean value and dividing by the
             variable's mean absolute deviation.  If not all
             columns of `x' are numeric, then this argument
             will be ignored.

       type: list containing some (or all) of the types of the
             variables (columns) in `x'. The list may contain
             the following components: `ordratio' (ratio scaled
             variables to be treated as ordinal variables),
             `logratio' (ratio scaled variables that must be
             logarithmically transformed), `asymm' (asymmetric
             binary variables). Each component's value is a
             vector, containing the names or the numbers of the
             corresponding columns of `x'.  Variables not
             mentioned in the `type' list are interpreted as
             usual (see argument `x').
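
        The standardization described for `stand' can be
        sketched in base R as follows.  This is a minimal
        illustration, not the source of `daisy'; the helper
        name `standardize_mad' and the toy data are
        hypothetical, and skipping NAs when computing the mean
        and the mean absolute deviation is an assumption.

```r
# Hypothetical helper (not part of any package): standardize one
# numeric column by its mean and its mean absolute deviation.
standardize_mad <- function(v) {
  m <- mean(v, na.rm = TRUE)           # variable's mean value
  s <- mean(abs(v - m), na.rm = TRUE)  # mean absolute deviation
  (v - m) / s
}

# Toy data, made up for illustration only.
x <- data.frame(a = c(1, 2, 3, 6), b = c(10, 20, 30, 40))

# Standardize every column before computing dissimilarities.
z <- as.data.frame(lapply(x, standardize_mad))
z
```

        Dividing by the mean absolute deviation rather than
        the standard deviation makes the scaling less
        sensitive to outlying measurements.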

   DDeessccrriippttiioonn::

        Computes all the pairwise dissimilarities (distances)
        between observations in the dataset.  The original
        variables may be of mixed types.

   DDeettaaiillss::

        `daisy' is fully described in chapter 1 of Kaufman and
        Rousseeuw (1990).  Compared to `dist' whose input must
        be numeric variables, the main feature of `daisy' is
        its ability to handle other variable types as well
        (e.g. nominal, ordinal, asymmetric binary) even when
        different types occur in the same dataset.

        In the `daisy' algorithm, missing values in a row of x
        are not included in the dissimilarities involving that
        row. If all variables are interval scaled, the metric
        is "euclidean", and ng is the number of columns in
        which neither row i nor row j has an NA, then the
        dissimilarity d(i,j) returned is sqrt(ncol(x)/ng) times
        the Euclidean distance between the two vectors of
        length ng shortened to exclude NAs. The rule is similar
        for the "manhattan" metric, except that the coefficient
        is ncol(x)/ng.  If ng is zero, the dissimilarity is NA.
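
        The NA rule above can be sketched for a single pair of
        rows as follows.  This is an illustration written from
        the description, not the source of `daisy'; the helper
        name `na_distance' and the two example rows are
        hypothetical.

```r
# Hypothetical helper: dissimilarity between two numeric rows,
# rescaled for the columns dropped because of NAs.
na_distance <- function(xi, xj, metric = c("euclidean", "manhattan")) {
  metric <- match.arg(metric)
  ok <- !is.na(xi) & !is.na(xj)  # columns observed in both rows
  ng <- sum(ok)
  if (ng == 0) {
    return(NA_real_)             # no shared columns: dissimilarity is NA
  }
  p <- length(xi)                # plays the role of ncol(x)
  d <- xi[ok] - xj[ok]
  if (metric == "euclidean") {
    sqrt(p / ng) * sqrt(sum(d^2))
  } else {
    (p / ng) * sum(abs(d))
  }
}

# Made-up rows: only columns 1 and 4 are observed in both.
row_i <- c(1, NA, 3, 4)
row_j <- c(2, 5, NA, 8)

na_distance(row_i, row_j, "euclidean")  # sqrt(4/2) * sqrt(1 + 16)
na_distance(row_i, row_j, "manhattan")  # (4/2) * (1 + 4)
```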

        When some variables have a type other than interval
        scaled, the dissimilarity between two rows is the
        weighted sum of the contributions of the variables.
        The weight of a variable becomes zero when that
        variable is missing in either or both rows, or when the
        variable is asymmetric binary and both values are zero.
        In all other situations, the weight of the variable is
        1.  The contribution of a nominal or binary variable to
        the total dissimilarity is zero if both values are
        equal, else it is equal to 1. The contribution of other
        variables is the absolute difference of both values,
        divided by the total range of that variable.  Ordinal
        variables are first converted to ranks.  If nok is the
        number of nonzero weights, the dissimilarity is
        multiplied by the factor 1/nok and thus ranges between
        0 and 1.  If nok is zero, the dissimilarity is NA.
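
        The mixed-type rule can be sketched for one pair of
        rows as follows.  This is an illustration under stated
        assumptions, not the source of `daisy': the helper
        `mixed_dissimilarity', its arguments `kind' and `rng',
        and the example values are all hypothetical.  Nominal
        and binary values are assumed to be coded as numbers,
        ordinal values are assumed to be already converted to
        ranks, and a nominal or binary variable contributes 0
        when both values are equal and 1 otherwise.

```r
# Hypothetical helper: weighted mixed-type dissimilarity of two rows.
#   kind: "nominal", "asymm" (asymmetric binary) or "interval"
#   rng : total range of each variable (only used for "interval")
mixed_dissimilarity <- function(xi, xj, kind, rng) {
  observed <- !is.na(xi) & !is.na(xj)
  w <- as.numeric(observed)                  # weight 0 if either value is NA
  both_zero <- kind == "asymm" & observed & xi == 0 & xj == 0
  w[both_zero] <- 0                          # asymmetric binary, both zero
  contrib <- ifelse(kind == "interval",
                    abs(xi - xj) / rng,      # range-scaled difference
                    as.numeric(xi != xj))    # 0 if equal, 1 if different
  nok <- sum(w)                              # number of nonzero weights
  if (nok == 0) {
    return(NA_real_)
  }
  sum(contrib[w > 0]) / nok
}

# Made-up pair of rows with five variables.
xi   <- c(1, 0, 0, 2, NA)
xj   <- c(2, 0, 1, 5, 3)
kind <- c("nominal", "asymm", "asymm", "interval", "interval")
rng  <- c(NA, NA, NA, 10, 4)

# Variables 2 (both zero) and 5 (NA) get weight 0, so nok = 3 and
# the result is (1 + 1 + 0.3) / 3.
mixed_dissimilarity(xi, xj, kind, rng)
```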

   VVaalluuee::

        an object of class `"dissimilarity"' containing the
        dissimilarities among the rows of `x'. This is
        typically the input for the functions `pam', `fanny',
        `agnes' or `diana'. See `dissimilarity.object' for
        details.
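
        A short sketch of how the returned object is typically
        used.  It assumes a current version of the `cluster'
        package, in which the object can be converted with
        `as.matrix' and passed directly to `pam'; the choice of
        two clusters is arbitrary and only for illustration.

```r
library(cluster)  # assumed installed; provides daisy(), pam(), agriculture

data(agriculture)
d <- daisy(agriculture, metric = "euclidean", stand = FALSE)

# View the dissimilarities as a full symmetric matrix.
m <- as.matrix(d)
dim(m)

# Cluster directly on the dissimilarities; k = 2 is an arbitrary choice.
fit <- pam(d, k = 2)
fit$clustering
```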

   BBAACCKKGGRROOUUNNDD::

        Dissimilarities are used as inputs to cluster analysis
        and multidimensional scaling. The choice of metric may
        have a large impact.

   RReeffeerreenncceess::

        Kaufman, L. and Rousseeuw, P.J. (1990).  Finding Groups
        in Data: An Introduction to Cluster Analysis.  Wiley,
        New York.

        Struyf, A., Hubert, M. and Rousseeuw, P.J. (1997).
        Integrating Robust Clustering Techniques in S-PLUS,
        Computational Statistics and Data Analysis, 26, 17-37.

   SSeeee AAllssoo::

        `dissimilarity.object', `dist', `pam', `fanny',
        `clara', `agnes', `diana'.

   EExxaammpplleess::

        library(cluster)  # provides daisy() and the example datasets
        data(agriculture)
        ## Example 1 in ref
        ## Compute the dissimilarities using Euclidean metric and without
        ## standardization
        daisy(agriculture, metric = "euclidean", stand = FALSE)

        data(flower)
        ## Example 2 in ref
        daisy(flower, type = list(asymm = 3))
        daisy(flower, type = list(asymm = c(1, 3), ordratio = 7))

