Lecture Notes on Eigenvalues, Eigenvectors, and Matrix Decompositions

Author

Your Name

Published

February 3, 2025

Introduction

This lecture explores the fundamental concepts of eigenvalues and eigenvectors, their properties, and their significance in linear algebra. We will examine the linear independence of eigenvectors, the special characteristics of symmetric matrices, and the geometric interpretation of eigenvalues and eigenvectors. We will also discuss important matrix decompositions, including Cholesky decomposition, diagonalization, and Singular Value Decomposition (SVD), highlighting their computational aspects and applications in fields such as data analysis and machine learning. This lecture aims to provide a comprehensive understanding of these essential linear algebra tools and their practical implications.

Eigenvalues and Eigenvectors

Linear Independence of Eigenvectors

The eigenvectors \(\mathbf{x}_1, \dots, \mathbf{x}_n\) of a matrix \(A \in \mathbb{R}^{n \times n}\) with \(n\) distinct eigenvalues \(\lambda_1, \dots, \lambda_n\) are linearly independent.

This theorem states that if a matrix has distinct eigenvalues, the corresponding eigenvectors will always be linearly independent. This is a crucial property because linearly independent vectors can form a basis for a vector space. For an \(n \times n\) matrix, finding \(n\) distinct eigenvalues guarantees \(n\) linearly independent eigenvectors, which can form a basis for \(\mathbb{R}^n\).

If a matrix does not have \(n\) linearly independent eigenvectors (which can happen when eigenvalues are repeated), it is called a defective matrix. For defective matrices, it is not possible to form a basis of eigenvectors for the entire vector space.

Repeated Eigenvalues and Defective Matrices

When an eigenvalue is repeated, the corresponding eigenvectors may or may not be linearly independent: the number of linearly independent eigenvectors associated with a repeated eigenvalue (its geometric multiplicity) can be smaller than the number of times it is repeated (its algebraic multiplicity). When this happens, the matrix is defective, meaning it does not possess a complete set of \(n\) linearly independent eigenvectors.
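As an illustrative aside (not part of the original notes), the following NumPy sketch contrasts a matrix with distinct eigenvalues against a defective one; the matrices are chosen purely for demonstration.

```python
import numpy as np

# Distinct eigenvalues (1 and 3): the eigenvectors are linearly independent.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals_A, vecs_A = np.linalg.eig(A)
print(np.linalg.matrix_rank(vecs_A))   # 2: a full eigenbasis exists

# Defective matrix: eigenvalue 2 is repeated, but only one linearly
# independent eigenvector exists, so the eigenvector matrix is rank deficient.
B = np.array([[2.0, 1.0],
              [0.0, 2.0]])
vals_B, vecs_B = np.linalg.eig(B)
print(np.linalg.matrix_rank(vecs_B))   # 1: no eigenbasis for R^2
```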

Symmetric Matrices and the Spectral Theorem

If \(A \in \mathbb{R}^{n \times n}\) is a symmetric matrix (i.e., \(A = \transpose{A}\)), then:

  1. All eigenvalues of \(A\) are real.

  2. There exists an orthonormal basis of \(\mathbb{R}^n\) consisting of eigenvectors of \(A\).

The Spectral Theorem is a cornerstone result for symmetric matrices. It guarantees two significant properties: first, the eigenvalues are real numbers, and second, we can find an orthonormal basis composed of eigenvectors. An orthonormal basis simplifies many calculations, especially in projections and coordinate transformations.

Properties of Symmetric Matrices

Real Eigenvalues: For any symmetric matrix, all eigenvalues are real numbers, which simplifies analysis as we do not need to deal with complex eigenvalues.

Orthonormal Eigenbasis: The existence of an orthonormal eigenbasis means that the eigenvectors are mutually orthogonal (perpendicular) and normalized (unit length). This is highly beneficial for simplifying calculations and understanding the geometric action of symmetric matrices.

Eigendecomposition of Symmetric Matrices

Decomposition: For a symmetric matrix \(A\), the Spectral Theorem ensures it can be diagonalized as: \[A = P D \transpose{P}\] where:

  • \(D\) is a diagonal matrix with the eigenvalues of \(A\) on the diagonal.

  • \(P\) is an orthogonal matrix whose columns are the orthonormal eigenvectors of \(A\). Since \(P\) is orthogonal, \(\inverse{P} = \transpose{P}\).

This decomposition is known as eigendecomposition or spectral decomposition.

Eigendecomposition reveals how a symmetric matrix transformation acts geometrically. It stretches or compresses space along orthogonal directions defined by the eigenvectors. The eigenvalues represent the scaling factors in these directions.
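The following NumPy sketch illustrates the eigendecomposition for an arbitrary symmetric matrix (chosen only for demonstration); `np.linalg.eigh`, which is designed for symmetric matrices, returns real eigenvalues and orthonormal eigenvectors.

```python
import numpy as np

# An arbitrary symmetric matrix (values chosen only for illustration).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh is tailored to symmetric matrices: real eigenvalues, orthonormal eigenvectors.
eigvals, P = np.linalg.eigh(A)
D = np.diag(eigvals)

# Verify A = P D P^T and that P is orthogonal (P^T P = I).
print(np.allclose(A, P @ D @ P.T))       # True
print(np.allclose(P.T @ P, np.eye(3)))   # True
```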

Geometric Interpretation of Eigenvalues and Eigenvectors

Eigenvectors represent the directions that remain unchanged (up to scaling) when a linear transformation is applied, and eigenvalues are the scaling factors in these directions. For a symmetric matrix, these directions are orthogonal. Geometrically, a symmetric matrix transformation stretches or compresses space along these orthogonal eigenvector directions.

Relationships Between Eigenvalues and Matrix Properties

The determinant of a matrix \(A \in \mathbb{R}^{n \times n}\) is the product of its eigenvalues, i.e., \[\det(A) = \prod_{i=1}^{n} \lambda_i\] where \(\lambda_i \in \C\) are the (possibly repeated) eigenvalues of \(A\).

This theorem provides a straightforward way to compute the determinant using eigenvalues. Instead of direct computation, which can be complex for large matrices, we can find the eigenvalues and multiply them.

The trace of a matrix \(A \in \mathbb{R}^{n \times n}\) is the sum of its eigenvalues, i.e., \[\text{tr}(A) = \sum_{i=1}^{n} \lambda_i\] where \(\lambda_i \in \C\) are the (possibly repeated) eigenvalues of \(A\).

This theorem relates the trace of a matrix (the sum of its diagonal elements) to the sum of its eigenvalues. This provides another elegant connection between eigenvalues and fundamental matrix properties.
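As a quick numerical sanity check of both results (an illustrative sketch with a randomly generated matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))     # arbitrary real matrix; eigenvalues may be complex
eigvals = np.linalg.eigvals(A)

# det(A) equals the product of the eigenvalues, tr(A) their sum
# (imaginary parts cancel because complex eigenvalues come in conjugate pairs).
print(np.isclose(np.linalg.det(A), np.prod(eigvals).real))   # True
print(np.isclose(np.trace(A), np.sum(eigvals).real))         # True
```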

Eigenvalues and Linear Transformations

Scaling of Area/Volume by the Determinant: Eigenvalues describe how linear transformations affect area and volume. For a \(2 \times 2\) matrix with eigenvalues \(\lambda_1\) and \(\lambda_2\), the determinant \(\det(A) = \lambda_1 \lambda_2\) represents the factor by which areas are scaled under the transformation. In \(n\) dimensions, the determinant (product of all eigenvalues) scales the \(n\)-dimensional volume.

Methods for analyzing and learning from network data are essential in machine learning, and connectivity matrices capture the connections between network nodes. For the worm C. elegans, a connectivity matrix \(A \in \mathbb{R}^{277 \times 277}\) represents the connections between its neurons: \(A_{ij} = 1\) if neuron \(i\) connects to neuron \(j\), and \(0\) otherwise. Since \(A\) is not symmetric, its eigenvalues may be complex. We therefore consider the symmetrized matrix \(A_{\text{sym}} = A + \transpose{A}\), whose entry \((A_{\text{sym}})_{ij}\) is non-zero whenever neurons \(i\) and \(j\) are connected in either direction.

Figure 1: Example of a simplified neural network connectivity diagram.
Figure 2: Example of an eigenspectrum plot for a symmetrized connectivity matrix.

Figures 1 and 2 show a simplified neural network connectivity diagram and its corresponding eigenspectrum. The S-like shape of the eigenspectrum is typical for biological neural networks and is an area of active neuroscience research.
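The sketch below mimics this analysis on a small random directed adjacency matrix (purely illustrative; the actual C. elegans data are not reproduced here): the matrix is symmetrized as \(A + \transpose{A}\) and its real eigenspectrum is computed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50                                           # small stand-in for the 277 neurons
A = (rng.random((n, n)) < 0.1).astype(float)     # random directed connectivity
np.fill_diagonal(A, 0.0)                         # no self-connections

A_sym = A + A.T                                  # non-zero if connected in either direction
eigvals = np.linalg.eigvalsh(A_sym)              # real eigenvalues, sorted ascending

# Plotting these sorted eigenvalues would give an eigenspectrum analogous
# to the S-shaped curve discussed above.
print(eigvals[:5], eigvals[-5:])
```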

Matrix Decompositions

Cholesky Decomposition

A symmetric, positive definite matrix \(A\) can be factorized into a product \(A = L \transpose{L}\), where \(L\) is a lower triangular matrix with positive diagonal elements. This decomposition is unique.

A symmetric matrix \(A\) is positive definite if for any non-zero vector \(\mathbf{x}\), \(\transpose{\mathbf{x}} A \mathbf{x} > 0\). Positive definite matrices are important in optimization and statistics.

Decomposition: The Cholesky decomposition expresses a symmetric positive definite matrix \(A\) as a product of a lower triangular matrix \(L\) and its transpose \(\transpose{L}\): \[A = L \transpose{L}\]

Recursive Computation of the Lower Triangular Matrix \(L\)

The matrix \(L\) can be computed recursively. For a \(3 \times 3\) matrix \(A = L \transpose{L}\), we can write: Let \[A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \quad L = \begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix}\] Then \(A = L\transpose{L}\) gives: \[\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = \begin{pmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{pmatrix} \begin{pmatrix} l_{11} & l_{21} & l_{31} \\ 0 & l_{22} & l_{32} \\ 0 & 0 & l_{33} \end{pmatrix} = \begin{pmatrix} l_{11}^2 & l_{11}l_{21} & l_{11}l_{31} \\ l_{21}l_{11} & l_{21}^2 + l_{22}^2 & l_{21}l_{31} + l_{22}l_{32} \\ l_{31}l_{11} & l_{31}l_{21} + l_{32}l_{22} & l_{31}^2 + l_{32}^2 + l_{33}^2 \end{pmatrix}\] From these equations, entries of \(L\) can be solved recursively, starting with \(l_{11} = \sqrt{a_{11}}\), then \(l_{21} = a_{21} / l_{11}\), \(l_{22}=\sqrt{a_{22}-l_{21}^2}\), and so on.
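A direct translation of this recursion into code might look as follows (an illustrative sketch; in practice one would simply call `np.linalg.cholesky`):

```python
import numpy as np

def cholesky_recursive(A):
    """Compute the lower triangular L with A = L L^T for a symmetric
    positive definite matrix A, column by column as in the recursion above."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Diagonal entry: l_jj = sqrt(a_jj - sum_k l_jk^2)
        L[j, j] = np.sqrt(A[j, j] - np.dot(L[j, :j], L[j, :j]))
        for i in range(j + 1, n):
            # Below-diagonal entries: l_ij = (a_ij - sum_k l_ik l_jk) / l_jj
            L[i, j] = (A[i, j] - np.dot(L[i, :j], L[j, :j])) / L[j, j]
    return L

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])     # symmetric positive definite (illustrative)
L = cholesky_recursive(A)
print(np.allclose(A, L @ L.T))                 # True
print(np.allclose(L, np.linalg.cholesky(A)))   # matches NumPy's factor
```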

Complexity Analysis of Cholesky Decomposition

The computational complexity of Cholesky decomposition is approximately \(\frac{1}{3}n^3\) flops (floating-point operations), making it more efficient than LU decomposition for symmetric positive definite matrices.

Applications of Cholesky Decomposition

Cholesky decomposition is used in:

  • Solving linear least squares problems.

  • Monte Carlo simulations.

  • Optimization algorithms.

It is efficient for symmetric positive definite matrices because, once \(L\) is known, linear systems can be solved by forward and back substitution with the triangular factors, and the determinant follows directly from the diagonal of \(L\), since \(\det(A) = \prod_{i} l_{ii}^2\); a small example is sketched below.
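A minimal sketch of the first application, using an illustrative positive definite matrix: the system \(A\mathbf{x} = \mathbf{b}\) is solved in two triangular stages once \(L\) is available.

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])      # symmetric positive definite (illustrative)
b = np.array([1.0, 2.0, 3.0])

L = np.linalg.cholesky(A)            # A = L L^T

# Solve A x = b in two triangular stages: L y = b, then L^T x = y.
# (np.linalg.solve is used for brevity; a dedicated triangular solver
#  such as scipy.linalg.solve_triangular would exploit the structure.)
y = np.linalg.solve(L, b)
x = np.linalg.solve(L.T, y)
print(np.allclose(A @ x, b))         # True
```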

Diagonalization

A matrix \(A \in \mathbb{R}^{n \times n}\) is diagonalizable if it is similar to a diagonal matrix, i.e., if there exists an invertible matrix \(P \in \mathbb{R}^{n \times n}\) such that \(D = \inverse{P} A P\), where \(D\) is a diagonal matrix.

The transformation \(A \mapsto \inverse{P} A P\) is called a similarity transformation. Similar matrices represent the same linear transformation under different bases.

Diagonalization: Diagonalization transforms a matrix \(A\) into a diagonal matrix \(D\) using a similarity transformation: \[D = \inverse{P} A P\] The diagonal matrix \(D\) contains the eigenvalues of \(A\), and the columns of \(P\) are the corresponding eigenvectors.

Diagonalization as a Change of Basis to the Eigenbasis

Diagonalization can be viewed as changing the basis to the eigenbasis. In the eigenbasis, the linear transformation represented by \(A\) acts simply as scaling along each eigenvector direction, which is represented by the diagonal matrix \(D\).

The Equation \(AP = PD\) and its Relation to Eigenvectors

The equation \(AP = PD\) is fundamental to diagonalization. If we write \(P = [\mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_n]\) where \(\mathbf{p}_i\) are the columns of \(P\) (eigenvectors), and \(D = \text{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)\) where \(\lambda_i\) are eigenvalues, then \(AP = PD\) is equivalent to the set of eigenvector equations: \[A\mathbf{p}_i = \lambda_i \mathbf{p}_i, \quad i = 1, 2, \dots, n\] Diagonalization is thus intrinsically linked to finding eigenvectors and eigenvalues.
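A brief numerical check of these relations on an arbitrary diagonalizable matrix (illustrative sketch):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])           # distinct eigenvalues 3 and 2 (illustrative)

eigvals, P = np.linalg.eig(A)        # columns of P are the eigenvectors p_i
D = np.diag(eigvals)

print(np.allclose(A @ P, P @ D))                   # AP = PD
print(np.allclose(np.linalg.inv(P) @ A @ P, D))    # D = P^{-1} A P
```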

Singular Value Decomposition (SVD)

Let \(A \in \mathbb{R}^{m \times n}\) be a rectangular matrix of rank \(r \in [0, \min(m, n)]\). The Singular Value Decomposition (SVD) of \(A\) is a factorization of the form \[A = U \Sigma \transpose{V}\] where:

  • \(U \in \mathbb{R}^{m \times m}\) is an orthogonal matrix.

  • \(\Sigma \in \mathbb{R}^{m \times n}\) is a rectangular diagonal matrix with non-negative diagonal entries \(\sigma_i\) (singular values).

  • \(V \in \mathbb{R}^{n \times n}\) is an orthogonal matrix.

Orthogonal Matrices: Matrices \(U\) and \(V\) are orthogonal, meaning their columns are orthonormal vectors. For an orthogonal matrix \(Q\), \(\transpose{Q} = \inverse{Q}\).

Rectangular Diagonal Matrix \(\Sigma\): \(\Sigma\) is an \(m \times n\) matrix with diagonal entries \(\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0\) and zeros elsewhere. The \(\sigma_i\)’s are called singular values.

Decomposition: SVD decomposes any matrix \(A\) (even rectangular) into a product of three matrices: \[A = U \Sigma \transpose{V}\] where \(U\) and \(V\) are orthogonal and \(\Sigma\) is rectangular diagonal.

Components of the SVD

  • \(U\): Columns are left singular vectors, which are eigenvectors of \(A \transpose{A}\).

  • \(V\): Columns are right singular vectors, which are eigenvectors of \(\transpose{A} A\).

  • \(\Sigma\): Diagonal entries are singular values \(\sigma_i\), where \(\sigma_i^2\) are eigenvalues of both \(\transpose{A} A\) and \(A \transpose{A}\).

Geometric Interpretation of SVD

Geometrically, SVD decomposes a linear transformation into three steps:

  1. Rotation/Reflection (by \(\transpose{V}\)).

  2. Scaling along orthogonal axes (by \(\Sigma\)).

  3. Rotation/Reflection (by \(U\)).

\(\transpose{V}\) rotates the input space, \(\Sigma\) scales along coordinate axes, and \(U\) rotates the output space.

Singular Values as Generalized Eigenvalues

Singular values generalize eigenvalues to rectangular matrices: they indicate the "strength" of the transformation along the corresponding orthogonal directions.

Computation of SVD

To compute SVD:

  1. Compute \(\transpose{A} A\) and \(A \transpose{A}\).

  2. Find eigenvectors and eigenvalues of \(\transpose{A} A\) and \(A \transpose{A}\).

  3. Construct \(V\) from the eigenvectors of \(\transpose{A} A\), \(\Sigma\) from the square roots of the non-zero eigenvalues, and \(U\) from the eigenvectors of \(A \transpose{A}\) (in practice, \(U\) is obtained via \(\mathbf{u}_i = \frac{1}{\sigma_i} A \mathbf{v}_i\), which fixes its signs consistently with \(V\)).

More efficient algorithms exist for direct SVD computation.
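The sketch below contrasts the eigenvalue-based construction with NumPy's direct routine `np.linalg.svd` (the matrix is illustrative; individual singular vectors may differ in sign between the two routes, so only the singular values and the reconstruction are compared):

```python
import numpy as np

A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])                       # illustrative 2x3 matrix

# Direct computation.
U, s, Vt = np.linalg.svd(A)                            # s: singular values, descending

# Eigenvalue route: sigma_i^2 are the eigenvalues of A^T A (and of A A^T).
eigvals = np.linalg.eigvalsh(A.T @ A)                  # ascending order
sigmas = np.sqrt(np.clip(eigvals, 0.0, None))[::-1]    # descending; clip tiny negatives

print(np.allclose(s, sigmas[:len(s)]))                 # True: both give sigma = 5 and 3

# Reconstruct A from its SVD.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))                  # True
```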

Applications of SVD

SVD is a versatile tool used in:

  • Dimensionality reduction.

  • Image compression.

  • Recommendation systems.

It is often called the "Fundamental Theorem of Linear Algebra" due to its broad applicability.

Example: Movie Rating Analysis

In movie rating analysis, SVD can be applied to a matrix of movie ratings by viewers to find patterns. Left-singular vectors in \(U\) represent "stereotypical movies," right-singular vectors in \(V\) represent "stereotypical viewers," and singular values in \(\Sigma\) indicate their importance.
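A toy illustration of this idea (the ratings below are invented for demonstration and do not come from real data): rows are movies, columns are viewers, and the leading singular vectors capture the dominant rating pattern.

```python
import numpy as np

# Rows: movies, columns: viewers (invented ratings on a 1-5 scale).
ratings = np.array([[5.0, 4.0, 1.0],
                    [5.0, 5.0, 2.0],
                    [1.0, 1.0, 5.0],
                    [1.0, 2.0, 4.0]])

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

print(s)           # importance of each "stereotypical" movie/viewer pair
print(U[:, 0])     # first left-singular vector: a stereotypical movie profile
print(Vt[0, :])    # first right-singular vector: a stereotypical viewer profile
```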

Low-Rank Approximation and the Eckart-Young Theorem

Consider a matrix \(A \in \mathbb{R}^{m \times n}\) of rank \(r\). For any \(k \leq r\), let \(A^{(k)} = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \transpose{\mathbf{v}}_i\) be the rank-\(k\) approximation of \(A\) obtained by truncating the SVD. Then \(A^{(k)}\) is the best rank-\(k\) approximation of \(A\) in terms of the spectral norm, and the error is given by: \[A^{(k)} = \underset{\substack{B \in \mathbb{R}^{m \times n} \\ \text{rank}(B) = k}}{\text{argmin}} \| A - B \|_2, \quad \| A - A^{(k)} \|_2 = \sigma_{k+1}\]

The Eckart-Young Theorem states that the truncated SVD provides the best low-rank approximation of a matrix in the spectral-norm sense.
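A short numerical verification of the theorem on a randomly generated matrix (illustrative only): the spectral-norm error of the rank-\(k\) truncation equals \(\sigma_{k+1}\).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))                 # arbitrary matrix of rank 4

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]     # rank-k truncated SVD

# Spectral norm of the error equals the (k+1)-th singular value.
err = np.linalg.norm(A - A_k, ord=2)
print(np.isclose(err, s[k]))                    # True
```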

Example: Image Compression using SVD

SVD is used for image compression. An image matrix can be approximated by a lower-rank matrix using truncated SVD, retaining only the top singular values and vectors. This reduces data while preserving essential image features.

Figure 3: Example of image compression using SVD at increasing rank approximations.

Figure 3 demonstrates how image quality improves with increasing rank approximations using SVD.

Conclusion

This lecture covered essential concepts in linear algebra, including eigenvalues, eigenvectors, and various matrix decompositions. We explored the properties of eigenvalues and eigenvectors, particularly for symmetric matrices as described by the Spectral Theorem. We discussed Cholesky decomposition for symmetric positive definite matrices, diagonalization for simplifying linear transformations, and Singular Value Decomposition (SVD) as a powerful tool for general matrices, including rectangular ones. Key takeaways include the relationships between eigenvalues and matrix determinants and traces, the geometric interpretations of these concepts, and the wide-ranging applications of matrix decompositions in data analysis, machine learning, and engineering. Further study could explore the computational algorithms for these decompositions and delve deeper into their applications in specific domains.