There is a dataset "IRIS" on the disk. Print it using:
proc print data=stat2.iris;
run;
The variables correspond to examples 6.5 and 6.9 in the book.
SEPALLEN = length of sepal
SEPALWID = width of sepal
PETALLEN = length of petal
PETALWID = width of petal
SPECIES = type of iris
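If you want to check the variable names, types, and labels for yourself, PROC CONTENTS will list them (a minimal sketch; it only assumes the stat2 library is assigned as above):
proc contents data=stat2.iris;
run;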
proc discrim data=stat2.iris wcov wcorr pcov pcorr list
             pool=yes;
   class species;
run;
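The pool=yes option forces a pooled within-class covariance matrix, i.e. linear discriminant analysis. If you are unsure whether pooling is justified, a variant worth trying is pool=test, which tests the homogeneity of the within-class covariance matrices and pools only when the test does not reject (a sketch on the same data; slpool= sets the significance level of the test):
proc discrim data=stat2.iris pool=test slpool=0.05 list;
   class species;
run;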
Classifying the same observations that were used for "training" is probably not a sound idea. Two other datasets on the disk let you "train" on one set and "classify" the other: "IRISCALI" and "IRISTEST". Print them using:
proc print data=stat2.iriscali;
run;
proc print data=stat2.iristest;
run;
The two datasets together form the complete "IRIS" dataset.
proc discrim data=stat2.iriscali wcov wcorr pcov pcorr list
             pool=yes testdata=stat2.iristest testlist;
   class species;
   testclass species;
run;
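If you would rather have the classification results in a dataset than in the printed listing, the testout= option writes the posterior probabilities and the class into which each test observation is classified (a minimal sketch; the dataset name scored is arbitrary):
proc discrim data=stat2.iriscali pool=yes
             testdata=stat2.iristest testout=scored;
   class species;
   testclass species;
run;
proc print data=scored;
run;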
proc candisc data=stat2.iris out=toplot distance anova;
   class species;
run;
proc plot data=toplot;
   plot can2*can1=species;
run;
By now you should have a good display of how well the data are separated in just 2 dimensions.
Check the conclusions from last time against this intuitive plot.
You can do the same with a 3D-plot and obtain an even better view of the data. Try!
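Note that with three species CANDISC produces at most two canonical variables, so the third axis of a 3D-plot has to be one of the original measurements. A non-interactive sketch using PROC G3D (this assumes SAS/GRAPH is available; the choice of axes is arbitrary):
proc g3d data=stat2.iris;
   scatter petallen*petalwid=sepallen;
run;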
You can even have a matrix of scatterplots (or 3D-plots!) by selecting all 4 variables SEPALLEN SEPALWID PETALLEN PETALWID as X and all 4 as Y (keep the mouse button down to select all 4).
Finally, you can "brush" observations. Point the mouse at some interesting observations, press the mouse button, and "drag" a square from there. When you release the button, the observations in the square are highlighted in ALL the plots! In this way you can gain insight (!) into multi-dimensional data. Can you find the observation(s) that were most easily misclassified?
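As an aside: on a recent SAS release you can reproduce the scatterplot matrix non-interactively with PROC SGSCATTER, although you lose the brushing (a minimal sketch; assumes SAS 9.2 or later):
proc sgscatter data=stat2.iris;
   matrix sepallen sepalwid petallen petalwid / group=species;
run;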
Compare the plug-in estimates of the probabilities of misclassification with the cross-validated values.
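The plug-in (resubstitution) error-count estimates are printed by default; the cross-validated (leave-one-out) estimates come from the crossvalidate option of PROC DISCRIM, so one run gives you both tables to compare (a minimal sketch):
proc discrim data=stat2.iris pool=yes crossvalidate;
   class species;
run;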