fda                   package:mda                   R Documentation

_F_l_e_x_i_b_l_e _D_i_s_c_r_i_m_i_n_a_n_t _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Flexible discriminant analysis.

_U_s_a_g_e:

     fda(formula, data, weights, theta, dimension, eps, method,
         keep.fitted, ...)

_A_r_g_u_m_e_n_t_s:

 formula: a formula of the form `y~x' describing the response and the
          predictors.  The formula can be more complicated, such as
          `y~log(x)+z' (see `formula' for more details).  The
          response should be a factor, or any vector that can be
          coerced to one (such as a logical variable).

    data: data frame containing the variables in the formula
          (optional).

 weights: an optional vector of observation weights.

   theta: an optional matrix of class scores, typically with fewer than
          `J-1' columns.

dimension: the dimension of the solution, no greater than `J-1', where
          `J' is the number of classes.  Default is `J-1'.

     eps: a threshold for small singular values for excluding
          discriminant variables; default is `.Machine$double.eps'.

  method: regression method used in optimal scaling.  Default is linear
          regression via the function `polyreg', resulting in linear
          discriminant analysis.  Other possibilities are `mars' and
          `bruto'.  For penalized discriminant analysis, `gen.ridge' is
          appropriate.

keep.fitted: a logical variable, which determines whether the
          (sometimes large) component `"fitted.values"' of the `fit'
          component of the returned fda object should be kept.  The
          default is `TRUE' if `n * dimension < 1000'.

     ...: additional arguments to `method'.

_V_a_l_u_e:

     An object of class `"fda"'.  Use `predict' to extract discriminant
     variables, posterior probabilities or predicted class memberships.
     Other extractor functions are `coef', `confusion' and `plot'.

     The object has the following components: 

percent.explained: the percent between-group variance explained by each
          dimension (relative to the total explained).

  values: optimal scaling regression sum-of-squares for each dimension
          (see reference).  The usual discriminant analysis eigenvalues
          are given by `values / (1-values)', which are used to define
          `percent.explained'.

   means: class means in the discriminant space.  These are also scaled
          versions of the final thetas (class scores), and can be
          used in a subsequent call to `fda' (this only makes sense if
          some columns of theta are omitted; see the references).

theta.mod: (internal) a class scoring matrix which allows `predict' to
          work properly.

dimension: dimension of discriminant space.

   prior: class proportions for the training data.

     fit: fit object returned by `method'.

    call: the call that created this object (allowing it to be
          `update'-able)

confusion: confusion matrix when classifying the training data.


     The `method' functions are required to take arguments `x' and `y',
     both of which can be matrices, and should produce a matrix of
     `fitted.values' the same size as `y'.  They can take an additional
     argument `weights', and should all have a `...' argument for
     safety's sake.  Any arguments to `method' can be passed on via the
     `...' argument of `fda'.  The default method `polyreg' has a
     `degree' argument which allows polynomial regression of the
     required total degree.  See the documentation for `predict.fda'
     for further requirements of `method'.

_N_o_t_e:

     This software is not well tested; we would like to hear of any
     bugs.

_A_u_t_h_o_r(_s):

     Trevor Hastie and Robert Tibshirani

_R_e_f_e_r_e_n_c_e_s:

     ``Flexible Discriminant Analysis by Optimal Scoring''  by Hastie,
     Tibshirani and Buja, 1994, JASA, 1255-1270.

     ``Penalized Discriminant Analysis'' by Hastie, Buja and
     Tibshirani, Annals of Statistics, 1995 (in press).

_S_e_e _A_l_s_o:

     `predict.fda', `mars', `bruto', `polyreg', `softmax', `confusion'

_E_x_a_m_p_l_e_s:

     data(iris)
     irisfit <- fda(Species ~ ., data = iris)
     irisfit
     ## fda(formula = Species ~ ., data = iris)
     ##
     ## Dimension: 2 
     ##
     ## Percent Between-Group Variance Explained:
     ##     v1     v2 
     ##  99.12 100.00 
     ##
     ## Degrees of Freedom (per dimension): 5 
     ##
     ## Training Misclassification Error: 0.02 ( N = 150 )

     confusion(irisfit, iris)
     ##            Setosa Versicolor Virginica 
     ##     Setosa     50          0         0
     ## Versicolor      0         48         1
     ##  Virginica      0          2        49
     ## attr(, "error"):
     ## [1] 0.02
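
     ## A sketch of the `predict' extractors mentioned under Value,
     ## reusing `irisfit' from above.  The `type' values are those
     ## documented for `predict.fda'; output is omitted here.
     cls  <- predict(irisfit, iris, type = "class")      # class memberships
     post <- predict(irisfit, iris, type = "posterior")  # posterior probabilities
     vars <- predict(irisfit, iris, type = "variates")   # discriminant variables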

     plot(irisfit)

     coef(irisfit)
     ##           [,1]        [,2]
     ## [1,] -2.126479 -6.72910343
     ## [2,] -0.837798  0.02434685
     ## [3,] -1.550052  2.18649663
     ## [4,]  2.223560 -0.94138258
     ## [5,]  2.838994  2.86801283
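
     ## A sketch of the relation described under Value: the usual
     ## discriminant eigenvalues are `values / (1-values)'.  The
     ## cumulative percentages computed here are assumed to match
     ## `percent.explained' when all `J-1' dimensions are kept, as in
     ## `irisfit'; compare the two rather than relying on this.
     lambda <- irisfit$values
     eig <- lambda / (1 - lambda)
     pct <- 100 * cumsum(eig) / sum(eig)
     pct
     irisfit$percent.explained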

     marsfit <- fda(Species ~ ., data = iris, method = mars)
     marsfit2 <- update(marsfit, degree = 2)
     marsfit3 <- update(marsfit, theta = marsfit$means[, 1:2]) 
     ## this refits the model, using the fitted means (scaled theta's)
     ## from marsfit to start the iterations
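
     ## A sketch of passing arguments to `method' through `...': here
     ## `degree = 2' goes to the default method `polyreg'.  The object
     ## names `polyfit2' and `irisfit1' are illustrative only.
     polyfit2 <- fda(Species ~ ., data = iris, degree = 2)
     confusion(polyfit2, iris)

     ## restrict the solution to a single discriminant dimension
     irisfit1 <- fda(Species ~ ., data = iris, dimension = 1)
     irisfit1$dimension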

