Principal Component Analysis

Sample Covariance Matrix

Convert the matrix a to mean deviation form and calculate its sample covariance matrix.
(See Lay 7.5, pp489-491)

In[1]:=

Clear[a] ; 

a = ({{120, 125, 125, 135, 145}, {61, 60, 64, 68, 72}}) ; 

ListPlot[Transpose[a], ImageSize400, AxesLabel {"x1 (weight)", "x2 (height)"}, PlotStyle {PointSize[0.02], Red}] ;

[Graphics:HTMLFiles/7_principal_components_4.gif]

Write a procedure to calculate the sample mean of a matrix and use it to calculate the sample mean of a.

In[4]:=

sampleMean[a_ ? MatrixQ] := Module[{n = Dimensions[a][[2]]},               1/nApply[Plus, Transpose[a]]]

In[5]:=

m = sampleMean[a] ;

%//MatrixForm

Out[6]//MatrixForm=

( {{130}, {65}} )

Write a procedure to calculate the mean deviation form of a matrix and use it to calculate the mean deviation form of a.

In[7]:=

<<LinearAlgebra`MatrixManipulation`

In[8]:=

meanDeviationForm[a_ ? MatrixQ] := Module[{m = sampleMean[a], n = Dimensions[a][[2]] ... sp;              {k, 1, n}]]] ;

In[9]:=

b = meanDeviationForm[a] ;

%//MatrixForm

ListPlot[Transpose[b], ImageSize400, AxesLabel {"x1Hat (weight)", "x2Hat (height)"}, PlotStyle {PointSize[0.02], Red}] ;

Out[10]//MatrixForm=

( {{-10, -5, -5, 5, 15}, {-4, -5, -1, 3, 7}} )

[Graphics:HTMLFiles/7_principal_components_15.gif]

Check: b should have zero sample mean.

In[12]:=

sampleMean[b] ;

%//MatrixForm

Out[13]//MatrixForm=

( {{0}, {0}} )

Write a procedure to calculate the sample covariance matrix of a given matrix and use it to calculate the sample covariance matrix of a.

In[14]:=

sampleCovarianceMatrix[a_ ? MatrixQ] := Module[{n = Dimensions[a][[2]]}, &nb ... sp;           1/(n - 1) a . Transpose[a]]

In[15]:=

s = sampleCovarianceMatrix[b] ;

%//MatrixForm

Out[16]//MatrixForm=

( {{100, 95/2}, {95/2, 25}} )

Principal Components

Bring forward the sample covariance matrix from the previous example.

In[17]:=

s//MatrixForm

Out[17]//MatrixForm=

( {{100, 95/2}, {95/2, 25}} )

We seek to diagonalize the matrix s.
Calculate the eigendata for s.

In[18]:=

{evals, evecs} = Eigensystem[s]

Out[18]=

{{1/2 (125 + 5 586^(1/2)), 1/2 (125 - 5 586^(1/2))}, {{-10/19 + 1/95 (125 + 5 586^(1/2)), 1}, {-10/19 + 1/95 (125 - 5 586^(1/2)), 1}}}

Construct d, the diagonal matrix of eigenvalues in decreasing order.

In[19]:=

d = DiagonalMatrix[evals]//N ;

%//MatrixForm

Out[20]//MatrixForm=

( {{123.019, 0.}, {0., 1.98141}} )

Construct the orthonormal matrix p, consisting of normalized eigenvectors of s.

In[21]:=

p = Transpose[Table[evecs[[k]]/Norm[evecs[[k]]//N],  {k, 2}]] ;

%//MatrixForm

Out[22]//MatrixForm=

( {{0.899901, -0.436094}, {0.436094, 0.899901}} )

Check the diagonalization.

In[23]:=

sp . d . Transpose[p]

Out[23]=

True

The columns of p are the principal components of the data.

In[24]:=

<<Graphics`Graphics`

<<Graphics`Arrow`

In[26]:=

showColorfulVectors[vecs_, color_, opts___] := Show[Graphics[Flatten[{Thickness[.004], color ... ]] + vecs[[i, 2]]], {i, Length[vecs]}]}]], opts, AspectRatioAutomatic, AxesTrue]

In[27]:=

{u1, u2} = Transpose[p]

o = {0, 0} ; 

DisplayTogether[ListPlot[Transpose[b], ImageSize400, AxesLab ... tSize[0.02], Red}], Show[showColorfulVectors[{{o, 10 u1}, {o, 10u2}}, ManganeseBlue]]] ;

Out[27]=

{{0.899901, 0.436094}, {-0.436094, 0.899901}}

[Graphics:HTMLFiles/7_principal_components_42.gif]

Use the principal components to define and relate the variables X and Y.
X represents the original data. Y represents the transformed data.
The change of basis matrix p is orthonormal.

In[30]:=

x = {x1, x2} ;

y = {y1, y2} ; 

y = Transpose[p] . x ;

%//MatrixForm

Out[33]//MatrixForm=

( {{0.899901 x1 + 0.436094 x2}, {-0.436094 x1 + 0.899901 x2}} )

The matrix d is the covariance matrix for the transformed data.
y1 and y2 are independent.

In[34]:=

d//MatrixForm

Out[34]//MatrixForm=

( {{123.019, 0.}, {0., 1.98141}} )

In[35]:=

v1 = {1, 0} ;

v2 = {0, 1} ;

o = {0, 0} ; 

DisplayTogether[ListPlot[Transpose[Transpose[p] . b], ImageSize400,  ... tSize[0.02], Red}], Show[showColorfulVectors[{{o, 10 v1}, {o, 10v2}}, ManganeseBlue]]] ;

[Graphics:HTMLFiles/7_principal_components_54.gif]

Calculate the percentage of the total variance contained in the first principal component.
Tr[d] is the trace of the matrix d.

In[39]:=

firstPC = u1

percentVarianceFirstPC = d[[1, 1]]/Tr[d]

Out[39]=

{0.899901, 0.436094}

Out[40]=

0.984149

The first principal component accounts for 98.4% of the variance in this data.


Created by Mathematica  (April 6, 2005) Valid XHTML 1.1!