Lecture 19

Diagonalization of Matrices

We have already seen that when a square matrix describes the change in a population distribution over a fixed period of time, the positive integer powers of that matrix give the population distributions after multiple time intervals. Similar statements can be made about a variety of input-output situations in which the unit-time transition is described by the action of a square matrix. Unfortunately, for a general matrix of any reasonably large size, the number of computations involved in forming its successively higher powers is enormous.

On the other hand, we do know matrices whose powers are easy to compute: the diagonal ones. As is easily verified, if D is a diagonal matrix with diagonal entries $d_1, \dots, d_n$, then $D^k$ is the diagonal matrix with diagonal entries the corresponding powers $d_1^k, \dots, d_n^k$.
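
As a quick computational check (not part of the original lecture notes), the following NumPy sketch verifies this for one arbitrarily chosen diagonal matrix and exponent:

```python
import numpy as np

# Sketch: powers of a diagonal matrix are computed entrywise.
# The diagonal entries and the exponent k below are arbitrary illustrative choices.
d = np.array([2.0, -1.0, 0.5])   # diagonal entries d_1, ..., d_n
D = np.diag(d)
k = 6

Dk_by_multiplication = np.linalg.matrix_power(D, k)   # repeated matrix multiplication
Dk_entrywise = np.diag(d ** k)                        # entrywise powers d_i^k on the diagonal

print(np.allclose(Dk_by_multiplication, Dk_entrywise))  # True
```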

Are there more general matrices whose powers can be computed this easily? Suppose A is a square matrix which is similar (in the sense defined before) to a diagonal matrix D; this means that $A = PDP^{-1}$ for some invertible matrix P. Conveniently,

$$A^k = (PDP^{-1})(PDP^{-1}) \cdots (PDP^{-1}) = PD^kP^{-1}.$$

Thus a high power of a matrix that is similar to a diagonal matrix is almost as easy to compute as the same power of the diagonal matrix itself.
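
As a sketch of how this is used (the matrices P and D below are my own illustrative choices, not an example from the lecture), one can compare the two ways of computing a power:

```python
import numpy as np

# A = P D P^{-1} for an arbitrarily chosen invertible P and diagonal D.
P = np.array([[1.0, 1.0],
              [1.0, 2.0]])             # invertible: its determinant is 1
D = np.diag([3.0, -2.0])
P_inv = np.linalg.inv(P)
A = P @ D @ P_inv
k = 8

Ak_by_multiplication = np.linalg.matrix_power(A, k)            # k - 1 matrix multiplications
Ak_via_diagonalization = P @ np.diag(np.diag(D) ** k) @ P_inv  # only the diagonal entries are powered

print(np.allclose(Ak_by_multiplication, Ak_via_diagonalization))  # True
```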

Def. An n x n matrix A is called diagonalizable if $A = PDP^{-1}$ for some diagonal n x n matrix D and some invertible n x n matrix P.

Example

Example: a 2 x 2 matrix which is not diagonalizable.
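
The example worked in the lecture is not reproduced in these notes. One standard illustration (an assumption on my part, not necessarily the matrix used in class) is the shear matrix below: its only eigenvalue is 1, with multiplicity 2, but its eigenspace is one-dimensional, so there is no basis of $\mathbb{R}^2$ consisting of its eigenvectors.

```python
from sympy import Matrix

# Illustrative (not necessarily the lecture's example): a 2 x 2 matrix that is not diagonalizable.
A = Matrix([[1, 1],
            [0, 1]])

# eigenvects() returns triples (eigenvalue, algebraic multiplicity, eigenspace basis).
print(A.eigenvects())         # one eigenvalue 1 of multiplicity 2, but only one basis eigenvector
print(A.is_diagonalizable())  # False
```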

How can we recognize whether a matrix is diagonalizable or not? To do so, we must consider how to find the diagonal matrix D and the invertible matrix P, if A is diagonalizable. We are looking for P invertible and D diagonal satisfying $A = PDP^{-1}$, or equivalently AP = PD. Write P in terms of its columns:

$$P = [p_1 \ \cdots \ p_n].$$

Notice that the column vectors form a basis of $\mathbb{R}^n$, because P is invertible. Now, AP can be written in terms of its columns as

$$AP = [Ap_1 \ \cdots \ Ap_n].$$

On the other hand, if the diagonal entries of D are $d_1, \dots, d_n$, then clearly

$$PD = [d_1p_1 \ \cdots \ d_np_n].$$

Since AP = PD, the matrices AP and PD have the same columns, so each column satisfies $Ap_k = d_kp_k$; that is, each $p_k$ is an eigenvector of A with eigenvalue $d_k$. In other words, if A is diagonalizable, then automatically the diagonal matrix D to which it is similar has eigenvalues of A as its diagonal entries, and the invertible matrix P which diagonalizes A has the corresponding eigenvectors as its columns. In fact, these statements provide a complete characterization of the diagonalizable matrices.

Diagonalization Theorem. An n x n matrix A is diagonalizable if and only if there is a basis of $\mathbb{R}^n$ consisting of eigenvectors of A. Moreover, $A = PDP^{-1}$ with P invertible and D diagonal if and only if the diagonal entries of D are eigenvalues of A and the columns of P are corresponding eigenvectors.

We have already seen that if A is diagonalizable, then D and P have the given properties, and the columns of P provide a basis of $\mathbb{R}^n$ consisting of eigenvectors of A. On the other hand, suppose $\mathbb{R}^n$ has such a basis, and we form the corresponding matrix P with the basis vectors as its columns. Then AP = PD as above, where D has the corresponding eigenvalues as its diagonal entries. Since the columns of P form a basis, P is invertible, so $A = PDP^{-1}$ and A is diagonalizable.

Examples
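
The examples from the lecture are not reproduced here. As one illustrative sketch (the matrix is my own choice), the theorem can be checked numerically: NumPy returns the eigenvalues and a matrix whose columns are eigenvectors, and these play the roles of D and P.

```python
import numpy as np

# Illustrative matrix (my own choice) with distinct eigenvalues 5 and 2.
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, P = np.linalg.eig(A)   # columns of P are eigenvectors of A
D = np.diag(eigenvalues)            # eigenvalues on the diagonal, in the matching order

# Since the two eigenvectors form a basis of R^2, the theorem gives A = P D P^{-1}.
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```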

How can we determine, given an n x n matrix A, whether we can or cannot find a basis of $\mathbb{R}^n$ consisting of eigenvectors? We already know an algorithm for finding the eigenvalues and eigenvectors, and we know that the number of linearly independent eigenvectors associated with an eigenvalue is at least one and at most the multiplicity of that eigenvalue. Moreover, since the multiplicities of all of the (real) eigenvalues add up to at most n, if there are fewer than n real eigenvalues (counted with multiplicity), or if one of the associated eigenspaces has dimension less than the multiplicity of the corresponding eigenvalue, then the total number of linearly independent eigenvectors is smaller than n, and they do not form a basis of $\mathbb{R}^n$. On the other hand, if there are n real eigenvalues (counted with multiplicity), and the dimension of each eigenspace equals the multiplicity of the corresponding eigenvalue, then we can find a total of n eigenvectors, in linearly independent subsets corresponding to the distinct eigenvalues. We will see below that the n vectors in the union of these linearly independent subsets form a linearly independent set, and hence form a basis of $\mathbb{R}^n$. This means that we know exactly when a matrix is diagonalizable.

Test for Diagonalizability
An n x n matrix A is diagonalizable if and only if both of the following conditions hold:

(a) the characteristic polynomial of A has n real roots (not necessarily distinct);
(b) the dimension of the eigenspace corresponding to each eigenvalue is equal to the multiplicity of the eigenvalue.
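
A minimal sketch of this test using exact (SymPy) arithmetic; the helper function and the two test matrices are my own illustrative choices, not part of the lecture:

```python
from sympy import Matrix

def is_diagonalizable_over_R(A: Matrix) -> bool:
    """Apply conditions (a) and (b) of the test to a square SymPy matrix."""
    n = A.rows
    # eigenvects() returns triples (eigenvalue, algebraic multiplicity, eigenspace basis).
    real_triples = [t for t in A.eigenvects() if t[0].is_real]
    roots_are_real = sum(mult for _, mult, _ in real_triples) == n                 # condition (a)
    eigenspaces_full = all(len(basis) == mult for _, mult, basis in real_triples)  # condition (b)
    return roots_are_real and eigenspaces_full

print(is_diagonalizable_over_R(Matrix([[1, 1], [0, 1]])))  # False: eigenspace dimension 1 < multiplicity 2
print(is_diagonalizable_over_R(Matrix([[4, 1], [2, 3]])))  # True: two distinct real eigenvalues
```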

Corollary
If the characteristic polynomial of an n x n matrix A has n distinct real roots, then A is diagonalizable.

(Indeed, if the n real roots are distinct, then each has multiplicity one, and we know that each eigenspace then has dimension exactly one, so both conditions of the test hold.)

Examples
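
Again the lecture's examples are not reproduced here; one illustrative instance (my own choice) is a triangular matrix, whose eigenvalues are its diagonal entries. Since those entries are distinct, the corollary guarantees diagonalizability without any further computation:

```python
import numpy as np

# Illustrative: an upper triangular matrix with distinct diagonal entries 1, 2, 3.
# Its eigenvalues are exactly those entries, so the corollary applies.
A = np.array([[1.0, 4.0, 7.0],
              [0.0, 2.0, 5.0],
              [0.0, 0.0, 3.0]])

eigenvalues, P = np.linalg.eig(A)
print(np.sort(eigenvalues))  # approximately [1. 2. 3.], i.e. three distinct real eigenvalues
print(np.allclose(A, P @ np.diag(eigenvalues) @ np.linalg.inv(P)))  # True: A is diagonalizable
```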

The only point in the proof of the diagonalizability test which we have not yet established is the statement that, if $S_1, \dots, S_k$ are linearly independent subsets of eigenspaces corresponding to distinct eigenvalues $\lambda_1, \dots, \lambda_k$ of the n x n matrix A, then their union is linearly independent. To see this, suppose that the vectors in the union were linearly dependent. Then we can find a "shortest" dependence relation (one with the fewest nonzero coefficients), and it must involve vectors from at least two of the different eigenspaces (a relation involving vectors from a single $S_i$ alone would contradict the linear independence of $S_i$), say the first and the last. Then, with the obvious notation, we have

$$c_1v_1 + \cdots + c_kv_k = 0.$$

Hence $A(c_1v_1 + \cdots + c_kv_k) = 0$, and using the fact that these vectors are eigenvectors we have

$$\lambda_1c_1v_1 + \cdots + \lambda_kc_kv_k = 0.$$

We multiply the earlier equation by $\lambda_k$ and then subtract the last equation from the result, leaving

$$(\lambda_k - \lambda_1)c_1v_1 + \cdots + (\lambda_k - \lambda_k)c_kv_k = 0.$$

The last coefficient is zero, but the first coefficient $(\lambda_k - \lambda_1)c_1$ is not (the eigenvalues are distinct and $c_1 \ne 0$), so we are left with a nontrivial dependence relation shorter than the one we had! Since we started with the shortest possible relation, the original supposition of linear dependence must be false, and the result is proved.
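
As a concrete instance of this cancellation (the numbers are chosen purely for illustration), take k = 2 with $\lambda_1 = 2$ and $\lambda_2 = 5$: if $c_1v_1 + c_2v_2 = 0$, then applying A gives $2c_1v_1 + 5c_2v_2 = 0$, while multiplying the original relation by 5 gives $5c_1v_1 + 5c_2v_2 = 0$; subtracting, $(5 - 2)c_1v_1 = 0$, so $c_1 = 0$, and then $c_2v_2 = 0$ forces $c_2 = 0$ as well.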
