

   CCoorrrreessppoonnddeennccee AAnnaallyyssiiss

        ca(a, nf=7)

   AArrgguummeennttss::

          a: data matrix to be decomposed, with rows representing
             observations and columns representing variables.

         nf: number of factors or axes to be sought; default 7.

   Value:

        A list describing the correspondence analysis, with the
        following components:

      rproj: projections of row points on the factors.

      cproj: projections of column points on the factors.

      evals: eigenvalues associated with the new factors. These
             provide figures of merit for the "inertia
             explained" by the factors.  They are usually
             quoted in terms of percentage of the total, or in
             terms of cumulative percentage of the total.

      evecs: definition of the factors in terms of the original
             variables.  The first column is the linear
             combination of columns of `a' defining the first
             factor, etc.

      rcntr: contributions of observations to the factors.  The
             contribution of an observation to a factor is its
             mass times the square of its projection on that
             factor.  Since contributions take account of the
             mass, they are a better guide than the projections
             alone to which observations are influential in
             interpreting a factor.

      ccntr: contributions of variables to the factors.  The
             remark above on row contributions applies equally
             here.
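        As a sketch of how eigenvalues and contributions are
        typically computed and reported, the R code below
        analyses a small, hypothetical table `N' using a common
        formulation (an SVD of the matrix of standardized
        residuals, as in Greenacre) rather than calling `ca',
        whose internal scaling may differ.  Projections are
        assumed to be principal coordinates; with this scaling
        the row contributions to each factor sum to its
        eigenvalue.  All variable names are illustrative.

        ```r
        # Hypothetical 4 x 3 table of counts.
        N <- matrix(c(20, 10,  5,
                      10, 25, 10,
                       5, 10, 30,
                       8,  6, 12), nrow = 4, byrow = TRUE)
        P  <- N / sum(N)                               # correspondence matrix
        r  <- rowSums(P)                               # row masses
        cm <- colSums(P)                               # column masses
        S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))  # standardized residuals
        dec <- svd(S)
        k  <- min(dim(N)) - 1                          # number of non-trivial factors
        evals  <- dec$d[1:k]^2                         # eigenvalues (inertias)
        pct    <- 100 * evals / sum(evals)             # percentage of total inertia
        cumpct <- cumsum(pct)                          # cumulative percentage
        # Row projections (principal coordinates, an assumed scaling).
        rproj <- sweep(dec$u[, 1:k, drop = FALSE] / sqrt(r), 2, dec$d[1:k], "*")
        rcntr <- r * rproj^2                           # mass x squared projection
        relcntr <- sweep(rcntr, 2, colSums(rcntr), "/")  # share of each factor
        round(cbind(evals, pct, cumpct), 3)
        round(relcntr, 3)
        ```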

   DDeessccrriippttiioonn::

        Finds a new coordinate system for multivariate data
        such that the first coordinate has maximal inertia, the
        second coordinate has maximal inertia subject to being
        orthogonal to the first, and so on.  In contrast to
        principal components analysis (PCA), each row and
        column point has an associated mass (proportional to
        the row or column total), and the chi-squared distance
        takes the place of the Euclidean distance.  How the
        input data are coded is therefore important: coding
        plays the role that transformation of the input data
        (for instance, standardization) plays in PCA.
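        To make the notion of mass concrete, the R sketch below
        (on a hypothetical table `N') computes row and column
        masses as totals divided by the grand total, and checks
        that the mass-weighted mean of the row profiles equals
        the vector of column masses.  This weighted mean is the
        centroid at which the factor axes are centred.

        ```r
        N <- matrix(c(20, 10,  5,
                      10, 25, 10,
                       5, 10, 30,
                       8,  6, 12), nrow = 4, byrow = TRUE)  # hypothetical counts
        n     <- sum(N)
        rmass <- rowSums(N) / n          # row masses
        cmass <- colSums(N) / n          # column masses
        rprof <- N / rowSums(N)          # row profiles: each row sums to 1
        # Mass-weighted mean of the row profiles (the centroid, i.e. the
        # origin of the factor space); it equals the column masses.
        centroid <- colSums(rmass * rprof)
        all.equal(centroid, cmass)       # TRUE
        ```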

   NNOOTTEE::

        Very small negative eigenvalues, if they arise, are an
        artifact of the SVD algorithm used, and are to be
        treated as zero.
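        One way to apply this rule in R is sketched below.  The
        helper `zap_evals' and its default tolerance (relative
        to the largest eigenvalue) are hypothetical choices, not
        part of `ca'; base R's `zapsmall' is a coarser
        alternative that rounds all values.

        ```r
        # Hypothetical helper: set eigenvalues whose magnitude is below a small
        # tolerance (relative to the largest eigenvalue) to exactly zero, which
        # removes tiny negative values produced by rounding.
        zap_evals <- function(evals, tol = 1e-10 * max(abs(evals))) {
          evals[abs(evals) < tol] <- 0
          evals
        }
        zap_evals(c(0.35, 0.12, -3e-17))   # returns c(0.35, 0.12, 0)
        ```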

   MMEETTHHOODD::

        A singular value decomposition is carried out.
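        Which matrix `ca' decomposes is not stated here.  A
        common formulation (see Greenacre) takes the SVD of the
        matrix of standardized residuals; the R function
        `ca_sketch' below is a hypothetical, minimal
        implementation along those lines, returning components
        named as in `ca' (except `evecs', which it omits).  It
        assumes a non-negative matrix with no all-zero rows or
        columns, and its scaling of the projections (principal
        coordinates) may differ from that of `ca'.

        ```r
        # Hypothetical sketch of SVD-based correspondence analysis.
        # Not the code of `ca'; scaling conventions may differ.
        ca_sketch <- function(a, nf = 7) {
          a  <- as.matrix(a)
          P  <- a / sum(a)                               # correspondence matrix
          r  <- rowSums(P)                               # row masses
          cm <- colSums(P)                               # column masses
          S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))  # standardized residuals
          dec <- svd(S)
          # S has rank at most min(dim(a)) - 1, so keep at most that many factors.
          k  <- min(nf, min(dim(a)) - 1)
          d  <- dec$d[seq_len(k)]
          # Principal coordinates: D_r^{-1/2} U Sigma and D_c^{-1/2} V Sigma.
          rproj <- sweep(dec$u[, seq_len(k), drop = FALSE] / sqrt(r), 2, d, "*")
          cproj <- sweep(dec$v[, seq_len(k), drop = FALSE] / sqrt(cm), 2, d, "*")
          list(evals = d^2,
               rproj = rproj, cproj = cproj,
               rcntr = r * rproj^2,                      # mass x squared projection
               ccntr = cm * cproj^2)
        }

        # Usage on a hypothetical 4 x 3 table of counts.
        N <- matrix(c(20, 10,  5,
                      10, 25, 10,
                       5, 10, 30,
                       8,  6, 12), nrow = 4, byrow = TRUE)
        res <- ca_sketch(N, nf = 2)
        res$evals
        ```

        Subtracting the product of the masses before the SVD
        removes the trivial factor, so every returned
        eigenvalue corresponds to a non-trivial axis.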

   BBAACCKKGGRROOUUNNDD::

        Correspondence analysis defines the axis which provides
        the best fit to both the row points and the column
        points.  A second axis is determined which best fits
        the data subject to being orthogonal to the first.
        Third and subsequent axes are similarly found.  Best
        fit is in the least squares sense, relative to the chi-
        squared distance.  This distance can be viewed as a
        weighted Euclidean distance between `profiles', i.e.
        rows (or columns) of the data matrix divided by their
        totals.
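        For two row profiles, the chi-squared distance weights
        each squared difference by the inverse of the
        corresponding column mass.  The R sketch below
        (hypothetical table `N') computes it for rows 1 and 2,
        and checks that it equals the ordinary Euclidean
        distance between the two rows' projections on all
        non-trivial factors, with projections taken as
        principal coordinates from an SVD of standardized
        residuals (an assumed scaling).

        ```r
        N <- matrix(c(20, 10,  5,
                      10, 25, 10,
                       5, 10, 30,
                       8,  6, 12), nrow = 4, byrow = TRUE)  # hypothetical counts
        P  <- N / sum(N)
        r  <- rowSums(P)                  # row masses
        cm <- colSums(P)                  # column masses
        prof <- P / r                     # row profiles (rows sum to 1)
        # Chi-squared distance between the profiles of rows 1 and 2:
        # a Euclidean distance with column j weighted by 1 / cm[j].
        d12 <- sqrt(sum((prof[1, ] - prof[2, ])^2 / cm))
        # Row projections on all non-trivial factors (principal coordinates).
        S   <- (P - outer(r, cm)) / sqrt(outer(r, cm))
        dec <- svd(S)
        k   <- min(dim(N)) - 1
        rproj <- sweep(dec$u[, 1:k, drop = FALSE] / sqrt(r), 2, dec$d[1:k], "*")
        e12 <- sqrt(sum((rproj[1, ] - rproj[2, ])^2))   # ordinary Euclidean distance
        all.equal(d12, e12)               # TRUE
        ```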

        How the input data are `coded' is an important
        question.  For instance, given a matrix of scores, one
        might adjoin extra columns to the input matrix so that
        each observation includes both its original score and
        the maximum score minus that score.  Every row then has
        the same total (the sum of the maximum scores), so all
        row masses are equal and only the variables are
        differentially weighted.  This is known as `doubling'
        the observations.  For binary data, such coding is
        known as `complete disjunctive form'.
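        The R sketch below illustrates doubling on hypothetical
        scores, assuming a maximum attainable score of 5, and
        the same coding applied to a hypothetical binary
        matrix; in both cases all row totals, and hence all row
        masses, are equal.

        ```r
        # Hypothetical scores on a 0-5 scale: 3 observations, 2 variables.
        x <- matrix(c(1, 4,
                      5, 0,
                      3, 3), nrow = 3, byrow = TRUE)
        xmax <- 5                        # assumed maximum attainable score
        doubled <- cbind(x, xmax - x)    # `doubling' the observations
        rowSums(doubled)                 # 10 10 10: equal totals, equal masses
        # Hypothetical binary data: the same coding gives complete disjunctive form.
        b <- matrix(c(1, 0, 1,
                      0, 0, 1), nrow = 2, byrow = TRUE)
        disj <- cbind(b, 1 - b)
        rowSums(disj)                    # 3 3
        ```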

        Other forms of input data for which correspondence
        analysis can be used include frequencies and
        contingency tables.  In this case, the mass-weighted
        sum of squared chi-squared distances of all row (or all
        column) points from the origin, i.e. the total inertia,
        equals the familiar chi-squared statistic divided by
        the grand total of the table.  Hence the graphical
        output of correspondence analysis allows assessment of
        departure from the null hypothesis of independence of
        rows and columns.
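        The link between total inertia and the chi-squared
        statistic can be checked numerically.  In the R sketch
        below (hypothetical table `N'), the mass-weighted sum
        of squared chi-squared distances of the row profiles
        from the origin, multiplied by the grand total, matches
        Pearson's statistic from `chisq.test', and it also
        matches the sum of the squared singular values of the
        standardized residuals (an assumed formulation, not
        necessarily that of `ca').

        ```r
        N <- matrix(c(20, 10,  5,
                      10, 25, 10,
                       5, 10, 30,
                       8,  6, 12), nrow = 4, byrow = TRUE)  # hypothetical counts
        n  <- sum(N)
        P  <- N / n
        r  <- rowSums(P)                   # row masses
        cm <- colSums(P)                   # column masses (the origin)
        prof <- P / r                      # row profiles
        # Squared chi-squared distance of each row profile from the origin.
        d2 <- rowSums(sweep(sweep(prof, 2, cm)^2, 2, cm, "/"))
        inertia <- sum(r * d2)             # total inertia
        X2 <- unname(chisq.test(N)$statistic)   # Pearson chi-squared statistic
        all.equal(n * inertia, X2)         # TRUE
        # The eigenvalues of the analysis add up to the same total inertia.
        S <- (P - outer(r, cm)) / sqrt(outer(r, cm))
        all.equal(sum(svd(S)$d^2), inertia)     # TRUE
        ```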

        Supplementary rows or columns are projected into the
        factor space after the correspondence analysis has
        been carried out.  That is, their profiles are assigned
        zero mass, so they do not influence the factors, and
        their projections are computed under that assumption.
        The functions `supplr' and `supplc' may be used for
        this purpose.  Supplementary rows or columns are
        typically either of a different nature from the data
        analyzed (e.g. sex in the context of a questionnaire),
        or rows or columns which, one suspects, would unduly
        influence the definition of the factors.
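        A zero-mass projection can be sketched in R as follows,
        for a hypothetical table `N' and a hypothetical
        supplementary row `sup'.  The supplementary profile is
        multiplied by the column standard coordinates obtained
        from an SVD of standardized residuals (the usual
        transition formula); this is not the code of `supplr',
        whose conventions may differ.  As a check, the same
        formula applied to an active row recovers that row's
        projection.

        ```r
        N <- matrix(c(20, 10,  5,
                      10, 25, 10,
                       5, 10, 30,
                       8,  6, 12), nrow = 4, byrow = TRUE)  # hypothetical counts
        P  <- N / sum(N)
        r  <- rowSums(P)
        cm <- colSums(P)
        S  <- (P - outer(r, cm)) / sqrt(outer(r, cm))
        dec <- svd(S)
        k  <- min(dim(N)) - 1
        rproj <- sweep(dec$u[, 1:k, drop = FALSE] / sqrt(r), 2, dec$d[1:k], "*")
        # Column standard coordinates: they define the factors for projecting rows.
        colstd <- dec$v[, 1:k, drop = FALSE] / sqrt(cm)
        # Hypothetical supplementary row; it has zero mass, so it plays no
        # part in P, r, cm or the SVD above.
        sup <- c(12, 3, 9)
        sup.proj <- (sup / sum(sup)) %*% colstd    # profile x standard coordinates
        # Check: projecting active row 1 the same way recovers rproj[1, ].
        all.equal(drop((N[1, ] / sum(N[1, ])) %*% colstd), rproj[1, ])   # TRUE
        ```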

   RReeffeerreenncceess::

        Extensive works of J.-P. Benzecri, including
        Correspondence Analysis Handbook, Marcel Dekker, Basel,
        1992.

        M.J. Greenacre, Theory and Applications of
        Correspondence Analysis, Academic Press, New York, 1984.

        L. Lebart, A. Morineau and K.M. Warwick, Multivariate
        Descriptive Statistical Analysis, Wiley, New York, 1984.

        S. Nishisato, Analysis of Categorical Data: Dual
        Scaling and Its Applications, University of Toronto
        Press, Toronto, 1980.

        (An extensive annotated bibliography is to be found in
        Greenacre.)

   SSeeee AAllssoo::

        Supplementary rows and columns: `supplr', `supplc'.
        Initial data coding: `flou', `logique'.  Other related
        functions: `pca', `prcomp', `cancor', `sammon',
        `cmdscale'.  Plotting tool: `plaxes'.

   EExxaammpplleess::

        ###
        ### WARNING: Examples cannot be executed!!!
        ###
        # correspondence analysis of the breakfast cereal data,
        # in complete disjunctive form:
        bfpos <- t(cereal.attitude)
        bfneg <- max(bfpos) - bfpos
        bfposneg <- cbind(bfpos, bfneg)
        corr <- ca(bfposneg)
        # plot of first and second factors
        plot(corr$rproj[,1], corr$rproj[,2],type="n")
        # label each point with its row name
        text(corr$rproj[,1], corr$rproj[,2], labels=dimnames(bfposneg)[[1]])
        # Place additional axes through x=0 and y=0:
        plaxes(corr$rproj[,1], corr$rproj[,2])
        # check of row contributions
        corr$rcntr
        #
        # Fuzzy coding of input variables, `a', `b', `c':
        a.fuzz <- flou(a)
        b.fuzz <- flou(b)
        c.fuzz <- flou(c)
        newdata <- cbind(a.fuzz, b.fuzz, c.fuzz)
        ca.newdata <- ca(newdata)

