Clustering

  1. Hierarchical Clustering Methods
  2. Nonhierarchical Clustering Methods
  3. Correspondence Analysis
    Matrix \(\mathbf X\), with elements \(x_{ij},\ i=1,2,\cdots,I;\ j=1,2,\cdots,J\), is an \(I\times J\) two-way contingency table of unscaled frequencies or counts, with total count \(n=\sum_{i=1}^{I}\sum_{j=1}^{J}x_{ij}\). The matrix of proportions \(\mathbf P=\{p_{ij}\}\), with elements \(p_{ij}=\frac{1}{n}x_{ij}\), is called the correspondence matrix. The row sums form the vector \[\mathbf r=\Bigl\{r_{i}=\sum_{j=1}^{J}p_{ij}=\sum_{j=1}^{J}\frac{1}{n}x_{ij}\Bigr\}\] or, in matrix form, \[\underset{(I\times 1)}{\mathbf r}=\underset{(I\times J)}{\mathbf P}\underset{(J\times1)}{\mathbf 1_J}\] The column sums form the vector \[\mathbf c=\Bigl\{c_{j}=\sum_{i=1}^{I}p_{ij}=\sum_{i=1}^{I}\frac{1}{n}x_{ij}\Bigr\}\] or \[\underset{(J\times 1)}{\mathbf c}=\underset{(J\times I)}{\mathbf P^T}\underset{(I\times1)}{\mathbf 1_I}\] Define the diagonal matrices \[\mathbf D_r=\text{diag}(r_1,r_2,\cdots,r_I)\] \[\mathbf D_c=\text{diag}(c_1,c_2,\cdots,c_J)\]

    Correspondence analysis can be formulated as a weighted least squares problem: choose a matrix \(\hat{\mathbf P}=\{\hat{p}_{ij}\}\) of specified reduced rank that minimizes the sum of squares \[\sum_{i=1}^{I}\sum_{j=1}^{J}\frac{(p_{ij}-\hat{p}_{ij})^2}{r_ic_j}=\text{tr}\Bigl[\bigl(\mathbf D_r^{-1/2}(\mathbf P-\hat{\mathbf P})\mathbf D_c^{-1/2}\bigr)\bigl(\mathbf D_r^{-1/2}(\mathbf P-\hat{\mathbf P})\mathbf D_c^{-1/2}\bigr)^T\Bigr]\] since \((p_{ij}-\hat{p}_{ij})/\sqrt{r_ic_j}\) is the \((i,j)\) element of \(\mathbf D_r^{-1/2}(\mathbf P-\hat{\mathbf P})\mathbf D_c^{-1/2}\).

    The scaled version of the correspondence matrix \(\mathbf P=\{p_{ij}\}\) is \[\mathbf B=\mathbf D_r^{-1/2}\mathbf P\mathbf D_c^{-1/2}\] The best rank-\(s\) approximation \(\hat{\mathbf B}\) to \(\mathbf B\) is given by the first \(s\) terms of the singular-value decomposition \[\mathbf D_r^{-1/2}\mathbf P\mathbf D_c^{-1/2}=\sum_{k=1}^{J}\widetilde{\lambda}_k\widetilde{\mathbf u}_k\widetilde{\mathbf v}_k^T\] where \[\mathbf D_r^{-1/2}\mathbf P\mathbf D_c^{-1/2}\widetilde{\mathbf v}_k=\widetilde{\lambda}_k\widetilde{\mathbf u}_k\] and \[\widetilde{\mathbf u}_k^T\mathbf D_r^{-1/2}\mathbf P\mathbf D_c^{-1/2}=\widetilde{\lambda}_k\widetilde{\mathbf v}_k^T\] The approximation to \(\mathbf P\) is then \[\hat{\mathbf P}=\mathbf D_r^{1/2}\hat{\mathbf B}\mathbf D_c^{1/2}\approx\sum_{k=1}^{s}\widetilde{\lambda}_k(\mathbf D_r^{1/2}\widetilde{\mathbf u}_k)(\mathbf D_c^{1/2}\widetilde{\mathbf v}_k)^T\] and the error of approximation is \[\sum_{k=s+1}^{J}\widetilde{\lambda}_k^2\]

    The term \(\mathbf r\mathbf c^T\) always provides the best rank-one approximation to the correspondence matrix \(\mathbf P\); it corresponds to the assumption of independence between the rows and columns. To see this, let \(\widetilde{\mathbf u}_1=\mathbf D_r^{1/2}\mathbf 1_I\) and \(\widetilde{\mathbf v}_1=\mathbf D_c^{1/2}\mathbf 1_J\). Then \[\widetilde{\mathbf u}_1^T(\mathbf D_r^{-1/2}\mathbf P\mathbf D_c^{-1/2})=(\mathbf D_r^{1/2}\mathbf 1_I)^T(\mathbf D_r^{-1/2}\mathbf P\mathbf D_c^{-1/2})=\mathbf 1_I^T\mathbf P\mathbf D_c^{-1/2}\\ =\mathbf c^T\mathbf D_c^{-1/2}=[\sqrt{c_1},\sqrt{c_2},\cdots,\sqrt{c_J}]=(\mathbf D_c^{1/2}\mathbf 1_J)^T=\widetilde{\mathbf v}_1^T\] and \[(\mathbf D_r^{-1/2}\mathbf P\mathbf D_c^{-1/2})\widetilde{\mathbf v}_1=(\mathbf D_r^{-1/2}\mathbf P\mathbf D_c^{-1/2})(\mathbf D_c^{1/2}\mathbf 1_J)=\mathbf D_r^{-1/2}\mathbf P\mathbf 1_J\\ =\mathbf D_r^{-1/2}\mathbf r=\begin{bmatrix} \sqrt{r_1}\\ \sqrt{r_2}\\ \vdots\\ \sqrt{r_I} \end{bmatrix}=\mathbf D_r^{1/2}\mathbf 1_I=\widetilde{\mathbf u}_1\] That is, \((\widetilde{\mathbf u}_1,\widetilde{\mathbf v}_1)\) is a pair of singular vectors associated with the singular value \(\widetilde{\lambda}_1=1\). For any correspondence matrix \(\mathbf P\), the rank-one approximation to \(\mathbf P\) is therefore \[\hat{\mathbf P}=\mathbf D_r^{1/2}\widetilde{\mathbf u}_1\widetilde{\mathbf v}_1^T\mathbf D_c^{1/2}=\mathbf D_r\mathbf 1_I\mathbf 1_J^T\mathbf D_c=\mathbf r\mathbf c^T\]

    It follows that \(\mathbf P\) can always be expressed as \[\mathbf P=\mathbf r\mathbf c^T+\sum_{k=2}^{J}\widetilde{\lambda}_k(\mathbf D_r^{1/2}\widetilde{\mathbf u}_k)(\mathbf D_c^{1/2}\widetilde{\mathbf v}_k)^T\] or, equivalently, \[\mathbf D_r^{-1/2}(\mathbf P-\mathbf r\mathbf c^T)\mathbf D_c^{-1/2}=\sum_{k=2}^{J}\widetilde{\lambda}_k\widetilde{\mathbf u}_k\widetilde{\mathbf v}_k^T\] which gives the singular-value decomposition of \(\mathbf D_r^{-1/2}(\mathbf P-\mathbf r\mathbf c^T)\mathbf D_c^{-1/2}\) in terms of the singular values and vectors obtained from \(\mathbf D_r^{-1/2}\mathbf P\mathbf D_c^{-1/2}\). In terms of its own singular values and vectors, the singular-value decomposition of \(\mathbf D_r^{-1/2}(\mathbf P-\mathbf r\mathbf c^T)\mathbf D_c^{-1/2}\) is \[\mathbf D_r^{-1/2}(\mathbf P-\mathbf r\mathbf c^T)\mathbf D_c^{-1/2}=\sum_{k=1}^{J-1}\lambda_k\mathbf u_k\mathbf v_k^T\] where \(\lambda_k=\widetilde{\lambda}_{k+1}\), \(\mathbf u_k=\widetilde{\mathbf u}_{k+1}\), and \(\mathbf v_k=\widetilde{\mathbf v}_{k+1}\), and the best rank-\(K\) approximation to \(\mathbf P-\mathbf r\mathbf c^T\) is \[\mathbf P-\mathbf r\mathbf c^T\approx\sum_{k=1}^{K}\lambda_k(\mathbf D_r^{1/2}\mathbf u_k)(\mathbf D_c^{1/2}\mathbf v_k)^T\]
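The decomposition above can be checked numerically. Below is a minimal NumPy sketch, using a small made-up \(3\times 4\) table of counts (the data are illustrative, not from the text): it verifies that the scaled matrix \(\mathbf B\) has leading singular value \(1\), and that \(\mathbf P\) is recovered exactly as \(\mathbf r\mathbf c^T\) plus the SVD terms of the scaled residual \(\mathbf D_r^{-1/2}(\mathbf P-\mathbf r\mathbf c^T)\mathbf D_c^{-1/2}\).

```python
import numpy as np

# Hypothetical 3x4 contingency table of counts (illustrative only).
X = np.array([[24, 12,  8,  6],
              [10, 22, 14,  4],
              [ 6,  8, 18, 18]], dtype=float)

n = X.sum()
P = X / n                       # correspondence matrix, p_ij = x_ij / n
r = P.sum(axis=1)               # row masses:    r = P 1_J
c = P.sum(axis=0)               # column masses: c = P^T 1_I
Dr_inv_sqrt = np.diag(1.0 / np.sqrt(r))
Dc_inv_sqrt = np.diag(1.0 / np.sqrt(c))

# Scaled matrix B = D_r^{-1/2} P D_c^{-1/2}: its largest singular value
# is always 1, with singular vectors D_r^{1/2} 1_I and D_c^{1/2} 1_J.
B = Dr_inv_sqrt @ P @ Dc_inv_sqrt
s_B = np.linalg.svd(B, compute_uv=False)

# Removing the rank-one independence term r c^T drops that trivial
# direction; the remaining singular values carry the row-column
# association.
S = Dr_inv_sqrt @ (P - np.outer(r, c)) @ Dc_inv_sqrt
U, lam, Vt = np.linalg.svd(S, full_matrices=False)

# Reconstruction: P = r c^T + sum_k lam_k (D_r^{1/2} u_k)(D_c^{1/2} v_k)^T,
# where D_r^{1/2} u_k is the elementwise product sqrt(r) * u_k.
full = np.outer(r, c) + sum(
    lam[k] * np.outer(np.sqrt(r) * U[:, k], np.sqrt(c) * Vt[k, :])
    for k in range(len(lam)))

print(s_B[0])                   # leading singular value of B
print(np.allclose(full, P))    # exact reconstruction of P
```

Truncating the sum at \(K<J-1\) terms gives the best rank-\(K\) approximation to \(\mathbf P-\mathbf r\mathbf c^T\) in the weighted least squares sense discussed above; note also that the smallest singular value of the residual matrix is zero, since removing \(\mathbf r\mathbf c^T\) eliminates one rank from \(\mathbf B\).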