Demo 3: Interpretation of PCA
If you have problems displaying the page, try zooming out (Ctrl+- or Cmd+-) and refreshing the page (F5 or Cmd+R). Remember that you can ask questions or leave a comment on the last text slide.
While the previous demo aimed to provide some intuition for how PCA works, this demo shows how to analyze the results and the various outputs we usually get from a PCA.
If you're eager to start and just want to play with the plots with less reading (not recommended), skip to the slide "Selection of features to use", select all features, and then check out the two slides of plots on the right-hand side (a matrix plot, and the PCA outputs, which appear when you press the arrow to the right of the figures).
Visualizing higher-dimensional data
First off, we know that the Iris data is 4-dimensional, but we only investigated the 2 petal dimensions in the first PCA demo. A simple way to get an overview of higher-dimensional data (as long as it isn't too high-dimensional) is a matrix plot, or grid plot.
We have made such a plot on the right. The plot has a row and a column for each attribute, and each "element" (subplot) shows a scatter plot of the attributes in the corresponding row and column. For instance, the top row is petal width and the first column is sepal length, so the top-left plot is a scatter plot of petal width versus sepal length. The colouring is the same as in the previous demos, with setosa in red, versicolor in green, and virginica in blue. When the row and column correspond to the same feature, the scatter plot is replaced by a histogram showing that feature's distribution.
Complexity of a matrix plot
Notice, however, that this type of visualization only lets us look at interactions between two features at a time. Furthermore, there are many plots, even with only 4 dimensions. We often investigate 10 or (many) more features, and the matrix plot then becomes very complex and hard to extract useful information from. A way to visualize such high-dimensional data is, of course, a PCA, where we only look at two or three components at a time.
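A quick count shows how fast the matrix plot grows. With $M$ features the grid has $M^2$ panels, $M$ of which are histograms; every pair of distinct features appears twice (once transposed), so there are $M(M-1)/2$ different scatter plots:

$$\text{panels} = M^2, \qquad \text{distinct pairs} = \binom{M}{2} = \frac{M(M-1)}{2}$$

$$M = 4:\ 16 \text{ panels},\ 6 \text{ pairs}; \qquad M = 10:\ 100 \text{ panels},\ 45 \text{ pairs}; \qquad M = 100:\ 10\,000 \text{ panels},\ 4\,950 \text{ pairs}.$$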
Selection of features to use
When we do the PCA, it can be illustrative to see how the final projection is affected by which features are included in the analysis (i.e. "given to the PCA"). Usually when doing a PCA we simply give it the full data set initially.
We do not include the label information in the PCA. We are often trying to predict the label (the species) from the features (the petal and sepal sizes), and the PCA is an initial dimensionality-reduction step on the features only. If we included the labels in the PCA, we would pass the label information directly to any model that later takes the projected data as input.
To choose which features to include in the PCA in this demo, use the check buttons below. The selected features are highlighted in green in the matrix plot, and the unused features in red. To begin, include all the features; we will look at the outputs of the PCA before you go back and try removing some of them.
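A minimal sketch of the selection step, assuming the features are stored as columns of a NumPy array in the hypothetical order given by `FEATURES`, and that the species labels live in a separate array that is never passed to the PCA. `FEATURES` and `select_features` are illustrative names, not the demo's own code.

```python
import numpy as np

# Hypothetical column order; the demo's internal data layout is not shown.
FEATURES = ["sepal length", "sepal width", "petal length", "petal width"]

def select_features(X: np.ndarray, selected: list[str]) -> np.ndarray:
    """Keep only the named feature columns of X (one observation per row).

    The species labels are stored elsewhere and are never part of X,
    so they cannot leak into the PCA.
    """
    idx = [FEATURES.index(name) for name in selected]
    return X[:, idx]
```

For example, `select_features(X, ["petal length", "petal width"])` would reproduce the petal-only setup from the first PCA demo.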
Variance explained by each component
The selected features have now been used in a PCA. Together they form an $M \times N$ matrix $\textbf{X}$, where $M$ is the number of selected features and $N$ is the number of observations. The top-left plot shows the projection of $\textbf{X}$ onto the first and second principal directions ($PC1$ and $PC2$).
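As a rough sketch of what could sit behind the top-left plot, the function below centres the data and projects it onto the first $k$ principal directions obtained from a singular value decomposition (SVD). It assumes one observation per row, i.e. the transpose of the $M \times N$ layout above; `pca_project` is an illustrative name, and the demo itself may compute the PCA differently.

```python
import numpy as np

def pca_project(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Project the centred data onto the first k principal directions.

    X has one observation per row and one selected feature per column
    (the transpose of the M-by-N layout in the text). Returns an (N, k)
    array whose columns are the PC1, PC2, ... coordinates.
    """
    Y = X - X.mean(axis=0)                            # centre each feature
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)  # rows of Vt = principal directions
    return Y @ Vt[:k].T                               # coordinates along PC1..PCk
```

With `Z = pca_project(X)`, the columns `Z[:, 0]` and `Z[:, 1]` would be the horizontal ($PC1$) and vertical ($PC2$) coordinates of each flower.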
Besides the projection, which we discussed in more detail in the first PCA demo, a very important output of the PCA is the variance explained by each component. The (fractional) variance explained by a component indicates how much of the total variance in the data set that component "explains". If a component explained all of it, the fraction would be 1, and if it explained no variance (the projection onto it was just a constant), the fraction would be 0. Since we often use PCA to reduce the number of dimensions, it is useful to decide how many components to keep (say we have a 100-dimensional data set: should we use 2 components? 50?).
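The fractions can be computed from the same kind of decomposition: the variance along each principal direction is proportional to the square of the corresponding singular value of the centred data. A minimal sketch, again assuming one observation per row (`variance_explained` is a hypothetical helper):

```python
import numpy as np

def variance_explained(X: np.ndarray) -> np.ndarray:
    """Fraction of the total variance explained by each principal component.

    Assumes one observation per row and one feature per column.
    """
    Y = X - X.mean(axis=0)                   # centre each feature
    s = np.linalg.svd(Y, compute_uv=False)   # singular values, in decreasing order
    var = s**2                               # proportional to each component's variance
    return var / var.sum()
```

The fractions come out in decreasing order and sum to 1.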
Cumulative variance explained
One way to answer that question is to require a certain fraction of the variance to be explained, e.g. 80 % or 95 %. If we compute the cumulative variance explained as a function of the number of components included, we get a plot like the one in the top-right. From this plot, we see that when we include all four features, a single component explains more than 90 % of the variance, but we need 2 components to get above 95 %. You can read more about the theoretical aspects of the variance explained at the end of the PCA chapter in the book.
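Given the per-component fractions from any PCA routine, the cumulative curve is a running sum, and the number of components needed for a threshold is the first position where that sum reaches it. A sketch under those assumptions (`n_components_for` is a hypothetical name; it uses "at least the threshold" as the criterion):

```python
import numpy as np

def n_components_for(rho, threshold: float) -> int:
    """Smallest number of components whose cumulative variance explained
    is at least `threshold` (e.g. 0.95)."""
    cumulative = np.cumsum(rho)                      # the curve in the top-right plot
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(rho))                          # guard against round-off just below 1.0
```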
PCA component loadings
Another output of a PCA is the component loadings. The loading for a given component and feature can be thought of as the correlation between the principal component and the feature. The bottom-left plot shows the component loadings as bar charts. When we include all four features, we see that the first component, $PC1$, is dominated by petal length, but is also related to sepal length and petal width. We therefore expect, for example, an increase in the petal length of a flower to result in a larger projection onto the first principal component (a higher $PC1$ value). Similarly, the second principal component has negative loadings for sepal width and sepal length. This means that we expect a smaller projection onto $PC2$ when the sepals get larger (a negative correlation).
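A sketch of how the loadings and the prediction above fit together, assuming the bar charts show the entries of the principal directions (the eigenvectors of the covariance matrix; some texts instead scale these by each component's standard deviation). Because a score is the dot product of the centred observation with a direction, raising one feature by $\delta$ while holding the others fixed changes the score on component $k$ by $\delta$ times that feature's loading on component $k$. The function names are illustrative.

```python
import numpy as np

def pca_loadings(X: np.ndarray) -> np.ndarray:
    """Principal directions of X (one observation per row), as columns.

    Entry [m, k] is the weight of feature m in component k.
    """
    Y = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return Vt.T

def score_change(V: np.ndarray, feature: int, delta: float) -> np.ndarray:
    """Change in every PC score when only `feature` increases by `delta`."""
    return delta * V[feature, :]
```

The sign of each principal direction is arbitrary (an SVD may return $-v$ instead of $v$), so another implementation could show the bars of a component flipped; what carries meaning is the relative sign of the loadings within a component.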
Visualization of a reconstructed point
To visualize these concepts, we have once again shown the abstract iris from the demo on the Iris data in the bottom-right corner. If you move your mouse into the projection plot and around in the projection space (the $PC1$/$PC2$-space), you should see the iris change. The change is computed by mapping the point from principal component space back into the original feature space (multiplying the projected coordinates by the eigenvectors and adding the mean of the data). Can you see the expected changes when you move the mouse along a straight horizontal line (i.e. changing only $PC1$) or along a vertical line (changing only $PC2$)? Try predicting the changes you will see before moving the cursor (do this by looking at the loadings!).
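The reconstruction described above can be sketched as follows, assuming one observation per row and principal directions stored as columns of `V`; `fit_pca` and `reconstruct` are hypothetical helpers. A point picked in the $PC1$/$PC2$-plane uses only the first two directions, so the result approximates a flower rather than reproducing an exact observation.

```python
import numpy as np

def fit_pca(X: np.ndarray):
    """Return the feature means and the principal directions (as columns)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt.T

def reconstruct(z, mean: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Map PC coordinates z = (PC1, PC2, ...) back to the feature space:
    mean + z[0] * V[:, 0] + z[1] * V[:, 1] + ..."""
    z = np.asarray(z, dtype=float)
    return mean + V[:, :len(z)] @ z
```

Moving the cursor horizontally changes only the first coordinate, so the reconstructed flower moves along `V[:, 0]`, i.e. by the $PC1$ loadings; this is why the loadings let you predict what you will see.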
Within the groups there seems to be a negative correlation between $PC1$ and $PC2$, especially evident for the green and blue dots (across the whole data set the two sets of scores are uncorrelated, as principal component scores always are). Try moving your mouse along the slant of the versicolor and virginica point clouds and see how the flower changes. It looks like it is merely changing in scale, which is interesting since we are only moving along the two PCs and not in all four dimensions.