mda                   package:mda                   R Documentation

_M_i_x_t_u_r_e _D_i_s_c_r_i_m_i_n_a_n_t _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Mixture discriminant analysis.

_U_s_a_g_e:

     mda(formula, data, subclasses, sub.df, tot.df, dimension, eps,
         iter, weights, method, keep.fitted, trace, ...)

_A_r_g_u_m_e_n_t_s:

 formula: of the form `y~x'; it describes the response and the
          predictors.  The formula can be more complicated, such as
          `y~log(x)+z' (see `formula' for more details).  The
          response should be a factor, or any vector that can be
          coerced to one (such as a logical variable).

    data: data frame containing the variables in the formula
          (optional).

subclasses: Number of subclasses per class, default is 3.  Can be a
          vector with a number for each class.

  sub.df: The effective degrees of freedom of the centroids per class,
          used if subclass centroid shrinking is performed.  Can be a
          scalar, in which case the same number is used for each
          class, or a vector with one value per class.

  tot.df: The total df for all the centroids can be specified rather
          than separately per class.

dimension: The dimension of the reduced model.  If we know our final
          model will be confined to a discriminant subspace (of the
          subclass centroids), we can specify this in advance and have
          the EM algorithm operate in this subspace.

     eps: A numerical threshold for automatically truncating the
          dimension.

    iter: A limit on the total number of iterations; default is 5.

 weights: NOT observation weights!  This is a special weight structure,
          which for each class assigns a weight (prior probability) to
          each of the observations in that class of belonging to one of
          the subclasses.  The default is provided by a call to
          `mda.start(x, g, subclasses, trace, ...)' (by this time `x'
          and `g' are known).  See the help for `mda.start'.  Arguments
          for `mda.start' can be provided via the `...' argument to
          `mda', so the `weights' argument need never be accessed.  A
          previously fit `mda' object can also be supplied, in which
          case its final subclass `responsibility' weights are used
          for `weights'.  This allows the iterations from a previous
          fit to be continued.

  method: regression method used in optimal scaling.  Default is linear
          regression via the function `polyreg', resulting in the usual
          mixture model.  Other possibilities are `mars' and `bruto'.
          For penalized mixture discriminant models `gen.ridge' is
          appropriate.

keep.fitted: a logical variable, which determines whether the
          (sometimes large) component `"fitted.values"' of the `fit'
          component of the returned `mda' object should be kept.  The
          default is `TRUE' if `n * dimension < 1000'.

   trace: if `TRUE', iteration information is printed.  Note that the
          deviance reported is for the posterior class likelihood, and
          not the full likelihood, which is used to drive the EM
          algorithm under `mda'.  In general the latter is not
          available.

     ...: additional arguments to `mda.start' and to `method'.

_V_a_l_u_e:

     An object of class `c("mda", "fda")'.  The most useful extractor
     is `predict', which can make many types of predictions from this
     object.  It can also be plotted, and any functions useful for fda
     objects will work here too, such as `confusion' and `coef'.

     The object has the following components: 

percent.explained: the percent between-group variance explained by each
          dimension (relative to the total explained).

  values: optimal scaling regression sum-of-squares for each dimension
          (see reference).

   means: subclass means in the discriminant space.  These are also
          scaled versions of the final theta's or class scores, and can
          be used in a subsequent call to `mda' (this only makes sense
          if some columns of theta are omitted; see the references).

theta.mod: (internal) a class scoring matrix which allows `predict' to
          work properly.

dimension: dimension of discriminant space.

sub.prior: subclass membership priors, computed in the fit.  No effort
          is currently spent in trying to keep these above a threshold.

   prior: class proportions for the training data.

     fit: fit object returned by `method'.

    call: the call that created this object (allowing it to be
          `update'-able).

confusion: confusion matrix when classifying the training data.

 weights: These are the subclass membership probabilities for each
          member of the training set; see the `weights' argument.

assign.theta: a pointer list which identifies which elements of certain
          lists belong to individual classes.

deviance: The multinomial log-likelihood of the fit.  Even though the
          full log-likelihood drives the iterations, we cannot in
          general compute it because of the flexibility of the `method'
          used. The deviance can increase with the iterations, but
          generally does not.


     The `method' functions are required to take arguments `x' and `y',
     both of which can be matrices, and should produce a matrix of
     `fitted.values' the same size as `y'.  They can take an additional
     argument `weights', and should all have a `...' argument for
     safety's sake.  Any arguments to `method' can be passed on via the
     `...' argument of `mda'.  The default method `polyreg' has a
     `degree' argument which allows polynomial regression of the
     required total degree.  See the documentation for `predict.fda'
     for further requirements of `method'.

     The function `mda.start' creates the starting weights; it takes
     additional arguments which can be passed in via the `...' argument
     to `mda'.  See the documentation for `mda.start'.

_N_o_t_e:

     This software is not well tested; we would like to hear of any
     bugs.

_A_u_t_h_o_r(_s):

     Trevor Hastie and Robert Tibshirani

_R_e_f_e_r_e_n_c_e_s:

     ``Flexible Discriminant Analysis by Optimal Scoring'' by Hastie,
     Tibshirani and Buja, 1994, JASA, 1255-1270.

     ``Penalized Discriminant Analysis'' by Hastie, Buja and
     Tibshirani, Annals of Statistics, 1995 (in press).

     ``Discriminant Analysis by Gaussian Mixtures'' by Hastie and
     Tibshirani, 1994, JRSS-B (in press).

_S_e_e _A_l_s_o:

     `predict.mda', `mars', `bruto', `polyreg', `gen.ridge', `softmax',
     `confusion'

_E_x_a_m_p_l_e_s:

     data(iris)
     irisfit <- mda(Species ~ ., data = iris)
     irisfit
     ## Call:
     ## mda(formula = Species ~ ., data = iris)
     ##
     ## Dimension: 4
     ##
     ## Percent Between-Group Variance Explained:
     ##     v1     v2     v3     v4
     ##  96.02  98.55  99.90 100.00
     ##
     ## Degrees of Freedom (per dimension): 5
     ##
     ## Training Misclassification Error: 0.02 ( N = 150 )
     ##
     ## Deviance: 15.102
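
     ## A minimal sketch (not part of the original examples) of other
     ## prediction types and a per-class `subclasses' vector.  The names
     ## `irisfit2', `iris.pred' and `iris.post' are introduced here for
     ## illustration only; results depend on the random starting weights.
     library(mda)
     set.seed(1)
     irisfit2 <- mda(Species ~ ., data = iris, subclasses = c(2, 3, 3))
     iris.pred <- predict(irisfit2, iris)                      # predicted classes
     iris.post <- predict(irisfit2, iris, type = "posterior")  # class posteriors
     table(predicted = iris.pred, true = iris$Species)
     range(rowSums(iris.post))   # each row of posteriors should sum to 1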

     data(glass)
     # random sample of size 100
     samp <- c(1, 3, 4, 11, 12, 13, 14, 16, 17, 18, 19, 20, 27, 28, 31,
               38, 42, 46, 47, 48, 49, 52, 53, 54, 55, 57, 62, 63, 64, 65,
               67, 68, 69, 70, 72, 73, 78, 79, 83, 84, 85, 87, 91, 92, 94,
               99, 100, 106, 107, 108, 111, 112, 113, 115, 118, 121, 123,
               124, 125, 126, 129, 131, 133, 136, 139, 142, 143, 145, 147,
               152, 153, 156, 159, 160, 161, 164, 165, 166, 168, 169, 171,
               172, 173, 174, 175, 177, 178, 181, 182, 185, 188, 189, 192,
               195, 197, 203, 205, 211, 212, 214) 
     glass.train <- glass[samp,]
     glass.test <- glass[-samp,]
     glass.mda <- mda(Type ~ ., data = glass.train)
     predict(glass.mda, glass.test, type = "post") # abbreviations are allowed
     confusion(glass.mda, glass.test)
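
     ## A minimal sketch (not part of the original examples) of
     ## continuing the iterations from a previous fit by supplying that
     ## fit as `weights', as described under Arguments.  The names `fit1'
     ## and `fit2' are introduced here for illustration only.
     library(mda)
     set.seed(1)
     fit1 <- mda(Species ~ ., data = iris, iter = 2)
     fit2 <- mda(Species ~ ., data = iris, weights = fit1, iter = 5)
     c(first = fit1$deviance, continued = fit2$deviance)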

