SAS exercises on discriminant analysis

1. Print dataset

There is a dataset "IRIS" on the disk. Print it using:

proc print data=stat2.iris;
run;

The variables correspond to Examples 6.5 and 6.9 in the book.

SEPALLEN = length of sepal
SEPALWID = width of sepal
PETALLEN = length of petal
PETALWID = width of petal
SPECIES = type of iris

2. Try the following SAS job:

proc discrim data=stat2.iris wcov wcorr pcov pcorr list
pool=yes;
class species;
run;

3. Experiment with priors

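Alter some of the prior probabilities with a PRIORS statement (without one, PROC DISCRIM uses equal priors).
Try fewer variables (e.g. VAR SEPALLEN PETALLEN;). Also try POOL=NO, which uses a separate covariance matrix for each group, and POOL=TEST, which tests whether the within-group covariance matrices are equal and pools them only if the test is not significant.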
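
As a minimal sketch of these variations, the job below combines priors proportional to the group sizes, two variables only, and POOL=TEST. If the groups are equally large, proportional priors simply reproduce the default. The explicit priors in the comment are only an illustration: the level names are assumed and must match the actual values of SPECIES in the dataset.

proc discrim data=stat2.iris pool=test list;
class species;
var sepallen petallen;
/* priors proportional to the group sample sizes */
priors proportional;
/* explicit priors, one per class level (level names are assumed):
   priors 'setosa'=0.2 'versicolor'=0.3 'virginica'=0.5; */
run;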

4. Training and test set

Classifying the same observations that were used for "training" tends to give an over-optimistic picture of the error rate. Two other datasets on the disk let you "train" on one set and "classify" the other: "IRISCALI" and "IRISTEST". Print them using:

proc print data=stat2.iriscali;
run;
proc print data=stat2.iristest;
run;

The two datasets together form the complete "IRIS" dataset.

5. Try the following SAS job:

proc discrim data=stat2.iriscali wcov wcorr pcov pcorr list
pool=yes testdata=stat2.iristest testlist;
class species;
testclass species;
run;

6. Try the following SAS job:

proc candisc data=stat2.iris out=toplot distance anova;
class species;
run;

proc plot data=toplot;
plot can2*can1=species;
run;

7. Classification results

Compare the plug-in estimates of the misclassification probabilities with the cross-validated values.
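
One possible way to obtain cross-validated values, sketched here under the assumption that the full IRIS dataset is used: the CROSSVALIDATE option of PROC DISCRIM prints leave-one-out classification results and error-rate estimates in addition to the resubstitution results.

proc discrim data=stat2.iris pool=yes crossvalidate;
class species;
run;

Adding CROSSLISTERR would also list the observations that are misclassified under cross-validation.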