fda                   package:mda                   R Documentation

_F_l_e_x_i_b_l_e _D_i_s_c_r_i_m_i_n_a_n_t _A_n_a_l_y_s_i_s

_U_s_a_g_e:

     fda(formula, data, weights, theta, dimension, eps, method,
         keep.fitted, ...)

_A_r_g_u_m_e_n_t_s:

 formula: of the form `y~x'; it describes the response and the
          predictors. The formula can be more complicated, such as
          `y~log(x)+z' (type `?formula' for more details). The
          response should be a factor or category, or any vector that
          can be coerced to one (such as a logical variable).

    data: data frame containing the variables in the formula
          (optional).

 weights: an optional vector of observation weights.

   theta: an optional matrix of class scores, typically with fewer than
          `J-1' columns.

dimension: the dimension of the solution, no greater than `J-1', where
          `J' is the number of classes. Default is `J-1'.

     eps: a threshold for small singular values for excluding
          discriminant variables; default is `.Machine$double.eps'.

  method: regression method used in optimal scaling. Default is linear
          regression via the function `polyreg', resulting in linear
          discriminant analysis.  Other possibilities are `mars' and
          `bruto'.  For penalized discriminant analysis, `gen.ridge' is
          appropriate.

keep.fitted: a logical value that determines whether the (sometimes
          large) component `"fitted.values"' of the `"fit"' component
          of the returned `fda' object should be kept. The default is
          `TRUE' if `n * dimension < 1000'.

     ...: additional arguments to `method()'.

_V_a_l_u_e:

     an object of class `"fda"'. Use `predict' to extract discriminant
     variables, posterior probabilities or predicted class memberships.
     Other extractor functions are `coef', `confusion' and `plot'. 

     The object has the following components: 

percent.explained: the percent between-group variance explained by each
          dimension (relative to the total explained).

  values: optimal scaling regression sum-of-squares for each dimension
          (see reference).  The usual discriminant analysis eigenvalues
          are given by `values/(1-values)', which are used to define
          `percent.explained'.

   means: class means in the discriminant space. These are also scaled
          versions of the final thetas or class scores, and can be
          used in a subsequent call to `fda()' (this only makes sense
          if some columns of theta are omitted; see the references).

theta.mod: (internal) a class scoring matrix which allows predict to
          work properly.

dimension: dimension of discriminant space

   prior: class proportions for the training data

     fit: fit object returned by `method'

    call: the call that created this object (allowing it to be
          `update()'-able)

confusion: confusion matrix when classifying the training data
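
     The relation between `values' and `percent.explained' can be
     sketched in base R. The numbers below are hypothetical and are not
     taken from a fitted model. The printed example on this page
     reports 99.12 and 100.00 for a two-dimensional fit, which suggests
     cumulative percentages; this sketch assumes that convention.

```r
# Hypothetical optimal scaling sums-of-squares for a two-dimensional fit
# (made-up numbers for illustration only).
values <- c(v1 = 0.96, v2 = 0.20)

# Usual discriminant analysis eigenvalues: values / (1 - values)
eigenvalues <- values / (1 - values)

# Share of the total explained, in percent.
# Assumption: percentages are accumulated across dimensions.
percent_explained <- 100 * cumsum(eigenvalues) / sum(eigenvalues)

eigenvalues
percent_explained
```

     Under this assumption the last entry is always 100, because the
     total is taken over the dimensions that were kept.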


     The `method' functions are required to take arguments `x' and `y',
     both of which can be matrices, and should produce a matrix of
     `fitted.values' the same size as `y'. They can take an additional
     argument `weights', and should all have a `...' argument for
     safety's sake. Any arguments to `method()' can be passed on via
     the `...' argument of `fda()'. The default method `polyreg()' has
     a `degree' argument which allows polynomial regression of the
     required total degree.  See the documentation for `predict.fda()'
     for further requirements of `method'.
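
     As a rough illustration of this contract, the base R sketch below
     defines a weighted least squares function that accepts `x', `y',
     `weights' and `...', and returns a `fitted.values' matrix with the
     same dimensions as `y'. The name `wls_method' and the class it
     assigns are hypothetical and not part of the package. The sketch
     covers only the requirements stated here; the further requirements
     described in `predict.fda()' are not addressed, so it is not shown
     being passed to `fda()'.

```r
# Hypothetical method function; the name `wls_method` is illustrative and
# is not part of the package.
wls_method <- function(x, y, weights = rep(1, nrow(x)), ...) {
  x <- cbind(1, as.matrix(x))            # add an intercept column
  y <- as.matrix(y)
  # Weighted least squares via the normal equations: (X'WX) b = X'WY.
  # `weights * x` scales row i of x by weights[i].
  xtwx <- crossprod(x, weights * x)
  xtwy <- crossprod(x, weights * y)
  coefficients <- solve(xtwx, xtwy)
  # fitted.values has the same dimensions as y, as the contract requires
  structure(
    list(fitted.values = x %*% coefficients, coefficients = coefficients),
    class = "wls_method"
  )
}

# Simulated inputs: 20 observations, 3 predictors, 2 score columns
set.seed(1)
x <- matrix(rnorm(60), nrow = 20)
y <- matrix(rnorm(40), nrow = 20)
fit <- wls_method(x, y)
dim(fit$fitted.values)                   # same as dim(y)
```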

_N_o_t_e:

     This software is not well tested; we would like to hear of any
     bugs.

_A_u_t_h_o_r(_s):

     Trevor Hastie and Robert Tibshirani

_R_e_f_e_r_e_n_c_e_s:

     ``Flexible Discriminant Analysis by Optimal Scoring''  by Hastie,
     Tibshirani and Buja, 1994, JASA, 1255-1270.

     ``Penalized Discriminant Analysis'' by Hastie, Buja and
     Tibshirani, Annals of Statistics, 1995 (in press).

_S_e_e _A_l_s_o:

     `predict.fda', `mars', `bruto', `polyreg', `softmax', `confusion'

_E_x_a_m_p_l_e_s:

     data(iris)
     irisfit <- fda(Species ~ ., data = iris)
     irisfit
     ## fda(formula = Species ~ ., data = iris)
     ##
     ## Dimension: 2 
     ##
     ## Percent Between-Group Variance Explained:
     ##     v1     v2 
     ##  99.12 100.00 
     ##
     ## Degrees of Freedom (per dimension): 5 
     ##
     ## Training Misclassification Error: 0.02 ( N = 150 )

     confusion(irisfit, iris)
     ##            Setosa Versicolor Virginica 
     ##     Setosa     50          0         0
     ## Versicolor      0         48         1
     ##  Virginica      0          2        49
     ## attr(, "error"):
     ## [1] 0.02

     plot(irisfit)

     coef(irisfit)
     ##           [,1]        [,2]
     ## [1,] -2.126479 -6.72910343
     ## [2,] -0.837798  0.02434685
     ## [3,] -1.550052  2.18649663
     ## [4,]  2.223560 -0.94138258
     ## [5,]  2.838994  2.86801283

     marsfit <- fda(Species ~ ., data = iris, method = mars)
     marsfit2 <- update(marsfit, degree = 2)
     marsfit3 <- update(marsfit, theta = marsfit$means[, 1:2]) 
     ## this refits the model, using the fitted means (scaled theta's)
     ## from marsfit to start the iterations
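
     The Value section mentions that `predict' extracts discriminant
     variables, posterior probabilities or predicted class memberships.
     A minimal sketch follows, assuming the `type' values `"class"',
     `"variates"' and `"posterior"' as documented for `predict.fda'.
     The `pred_*' names are illustrative, and the code is guarded so
     that it runs only when the package is installed.

```r
# Runs only if the mda package is installed.
if (requireNamespace("mda", quietly = TRUE)) {
  library(mda)
  irisfit <- fda(Species ~ ., data = iris)

  # Predicted class memberships (a factor, one entry per row of newdata)
  pred_class <- predict(irisfit, iris, type = "class")

  # Discriminant variables (one column per dimension of the solution)
  pred_vars <- predict(irisfit, iris, type = "variates")

  # Posterior class probabilities (one column per class)
  pred_post <- predict(irisfit, iris, type = "posterior")

  # Training confusion table rebuilt from the predicted classes
  table(predicted = pred_class, true = iris$Species)
}
```

     Consult `predict.fda' for the full list of supported types and
     for the `prior' and `dimension' arguments.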

