Principal Component Analysis on the ‘Sundhed’ dataset

A. Analysis on all variables
1. Should we analyze the correlation matrix or the variance-covariance matrix? Why?
2. Perform statistical tests on the eigenvalues to decide how many components to retain.
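As a starting point for question 1, the two analyses can be run side by side. This is only a sketch: the dataset name sundhed is an assumption (the sheet does not give one), and `var _numeric_` simply selects every numeric variable. Comparing the standard deviations from PROC MEANS shows whether the variables are on comparable scales, which is the usual basis for choosing between the two matrices.

```sas
/* Assumption: the data sit in a dataset called sundhed (hypothetical name). */

/* Scale check: very different standard deviations favour the correlation matrix */
proc means data=sundhed n mean std min max;
   var _numeric_;
run;

/* PCA on the correlation matrix (the PROC PRINCOMP default) */
proc princomp data=sundhed;
   var _numeric_;
run;

/* PCA on the variance-covariance matrix (COV option) */
proc princomp data=sundhed cov;
   var _numeric_;
run;
```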
B. Exclude ‘alder’ (age) and ‘vegt’ (weight)
1. Repeat the analyses above.
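One way to rerun the analysis without the two variables is to drop them when the dataset is read. The dataset name sundhed is again an assumption; alder and vegt are the variable names used in this sheet.

```sas
/* Assumption: dataset name sundhed is hypothetical. */
/* DROP= removes alder and vegt before the analysis sees the data. */
proc princomp data=sundhed(drop=alder vegt);
   var _numeric_;
run;
```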
C. Condition on alder and vegt
1. Compare the partial correlations, and the eigenstructure based on them, with the results in B.
2. Are there big differences?
3. Explain any differences using appropriate statistical analyses.
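Conditioning on alder and vegt can be sketched with the PARTIAL statement, which both PROC CORR and PROC PRINCOMP accept. The dataset name sundhed is an assumption. No VAR statement is given, so each procedure analyzes all numeric variables not named in the PARTIAL statement.

```sas
/* Assumption: dataset name sundhed is hypothetical. */

/* Partial correlations given alder and vegt, with p-values */
proc corr data=sundhed;
   partial alder vegt;
run;

/* PCA on the partial correlation matrix given alder and vegt */
proc princomp data=sundhed;
   partial alder vegt;
run;
```

The eigenvalues and eigenvectors from the second step can then be set against those from the analysis that simply excludes the two variables.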
D. Partial correlations based on vegt (by hand!)
1. Reproduce some of the partial correlations by hand, compute the test statistic for the hypothesis that the true partial correlation is zero, and find the associated p-values.
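For the hand calculation, the standard first-order formulas are reproduced here as a reminder. The notation is an assumption, since the sheet defines none: r_{ij} is the ordinary correlation between variables i and j, k denotes the conditioning variable (here vegt), n is the number of observations, and q is the number of conditioning variables (q = 1 in this part).

```latex
% Partial correlation of variables i and j given k
\[
r_{ij\cdot k} = \frac{r_{ij} - r_{ik}\,r_{jk}}{\sqrt{(1 - r_{ik}^{2})(1 - r_{jk}^{2})}}
\]

% Test of H_0: \rho_{ij\cdot k} = 0
\[
t = \frac{r_{ij\cdot k}\,\sqrt{n - 2 - q}}{\sqrt{1 - r_{ij\cdot k}^{2}}}
\;\sim\; t(n - 2 - q) \quad \text{under } H_0
\]
```

The two-sided p-value is the probability that a t-distributed variable with n - 2 - q degrees of freedom exceeds |t| in absolute value.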

Principal Component Analysis on Beef characterization

Breed group	Sample size (n)
Hypertrophied Piemontese	23
Normal Piemontese	12
Hypertrophied Piemontese x Friesian Crossbreed	10
Friesian	11
Belgian Blue and White	23

Principal Component Analysis on Olympic data

/* name is read from columns 1-20; the seven event results follow as list input */
data hep;
input name $1-20 hurdles highjump shot run200m longjump javelin run800m;
cards;
Joyner Kersee (USA)  12.69 1.86 15.8 22.56 7.27 45.66 128.51
John (GDR)           12.85 1.8 16.23 23.65 6.71 42.56 126.12
Behmer (GDR)         13.2 1.83 14.2 23.1 6.68 44.54 124.2
Sablovskaite (URS)   13.61 1.8 15.23 23.92 6.25 42.78 132.24
Choubenkova (URS)    13.51 1.74 14.76 23.93 6.32 47.46 127.9
Schulz (GDR)         13.75 1.83 13.5 24.65 6.33 42.82 125.79
Fleming (AUS)        13.38 1.8 12.88 23.59 6.37 40.28 132.54
Greiner (USA)        13.55 1.8 14.13 24.48 6.47 38 133.65
Lajbnerova (CZE)     13.63 1.83 14.28 24.86 6.11 42.2 136.05
Bouraga (URS)        13.25 1.77 12.62 23.59 6.28 39.06 134.74
Wijnsma (HOL)        13.75 1.86 13.01 25.03 6.34 37.86 131.49
Dimitrova (BUL)      13.24 1.8 12.88 23.59 6.37 40.28 132.54
Scheider (SWI)       13.85 1.86 11.58 24.87 6.05 47.5 134.93
Braun (FRG)          13.71 1.83 13.16 24.78 6.12 44.58 142.82
Ruotsalainen (FIN)   13.79 1.8 12.32 24.61 6.08 45.44 137.06
Yuping (CHN)         13.93 1.86 14.21 25 6.4 38.6 146.67
Hagger (GB)          13.47 1.8 12.75 25.47 6.34 35.76 138.48
Brown (USA)          14.07 1.83 12.69 24.83 6.13 44.34 146.43
Mulliner (GB)        14.39 1.71 12.68 24.92 6.1 37.76 138.02
Hautenauve (BEL)     14.04 1.77 11.82 25.61 5.99 35.68 133.9
Kytola (FIN)         14.31 1.77 11.66 25.69 5.75 39.48 133.35
Geremias (BRA)       14.23 1.71 12.95 25.5 5.5 39.64 144.02
Hui-Ing (TAI)        14.85 1.68 10 5.23 5.47 39.14 137.3
Jeong-Mi (KOR)       14.53 1.71 10.83 26.61 5.5 39.26 139.17
Launa (PNG)          16.42 1.5 11.78 26.16 4.88 46.38 163.43
;

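hurdles - result in the 100 m hurdles (seconds)
highjump - result in the high jump (metres)
shot - result in the shot put (metres)
run200m - result in the 200 m race (seconds)
longjump - result in the long jump (metres)
javelin - result in the javelin throw (metres)
run800m - result in the 800 m race (seconds)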

Step 1

What relationship do you see between the plots and the correlation coefficients?
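The plots and correlation coefficients for this step can be produced in one call, for example as sketched below. It assumes the hep dataset from the data step above and that ODS Graphics is available. `var _numeric_` picks up all seven event variables, and `nvar=all` is needed because the scatter plot matrix is otherwise limited to the first five variables.

```sas
ods graphics on;

/* Correlation matrix plus scatter plot matrix with histograms on the diagonal */
proc corr data=hep plots=matrix(histogram nvar=all);
   var _numeric_;
run;
```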

Step 2

How many principal components should you use?
Which variables are explained by each principal component? How does this relate to the correlation matrix?
Do you see any outliers?
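A possible way to approach these questions is sketched below, assuming the hep dataset from the data step above. The output dataset name hepscores is hypothetical. The scree plot helps with the number of components, the pattern plot shows how the variables correlate with each component, and the score plot labelled by name makes unusual athletes easy to spot.

```sas
ods graphics on;

/* PCA on the correlation matrix; component scores are saved to hepscores */
proc princomp data=hep out=hepscores plots=(scree pattern score);
   var _numeric_;
   id name;   /* label points in the score plot by athlete */
run;

/* List athletes ordered by their score on the first component */
proc sort data=hepscores;
   by Prin1;
run;

proc print data=hepscores;
   var name Prin1 Prin2;
run;
```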