Introduction to Matrix Exponentials

Important Note: If you see funny characters in your display, and if you are using Xwindows (for instance, under linux), please see the following file for information on configuring your browser: Xfonts (unfortunately, printing will still give funny fonts - please use the pdf or postscript versions of this document for printing!)

1 Definition of Exponentials

A system of autonomous linear differential equations can be written as

= A Y

where A is an n by n matrix and Y = Y(t) is a vector listing the n dependent variables. (In most of what we'll do, we take n = 2, since we study mainly systems of 2 equations, but the theory is the same for all n.)

If we were dealing with just one linear equation

y� = ay

then the general solution of the equation would be e^at. It turns out that also for vector equations the solution looks like this, provided that we interpret what we mean by ``e^At'' when A is a matrix instead of just a scalar. How to define e^At? The most obvious procedure is to take the power series which defines the exponential, which as you surely remember from Calculus is

e^x = 1 + x +

x² +

x³ + �+

x^k + �

and just formally plug-in x = At. (The answer should be a matrix, so we have to think of the term ``1'' as the identity matrix.) In summary, we define :

e^At = I + At +

(At)² +

(At)³ + �+

(At)^k + �

where we understand the series as defining a series for each coefficient. One may prove that:

e^A(t+s) = e^At e^As for all s,t .

(1)

and therefore, since (obviously) e^A0 = I, using s = -t gives

e^-At = (e^At)^-1

(2)

(which is the matrix version of e^-x = ¹/_x). We now prove that this matrix exponential has the following property:

d e^At

= A e^At = e^At A

(3)

for every t. Proof: Let us differentiate the series term by term:

d e^At

�
�
�

I + At +

(At)² +

(At)³ + �+

(At)^k + �

�
�
�

0 + A + A²t +

A³t² + �+

(k-1)!

A^k t^k-1 +�

�
�
�

I + At +

(At)² +

(At)³ + �+

(At)^k + �

�
�
�

A e^At

and a similar proof, factoring A on the right instead of to the left, gives the equality between the derivative and e^At A. (Small print: the differentiation term-by-term can be justified using facts about term by term differentiation of power series inside their domain of convergence.) The property (3) is the fundamental property of exponentials of matrices. It provides us immediately with this corollary:

The initial value problem [(dY)/( dt)] = A Y, Y(0) = Y₀ has the unique solution Y(t) = e^At Y₀.

We can, indeed, verify that the formula Y(t) = e^At Y₀ defines a solution of the IVP:

d Y(t)

d e^AtY₀

d e^At

Y₀ = (A e^At)Y₀ = A (e^AtY₀) = A Y(t) .

(That it is the unique, i.e., the only, solution is proved as follows: if there were another solution Z(t) of the same IVP, then we could let W(t) = Y(t)-Z(t) and notice that W� = Y�-Z� = A (Y-Z) = AW, and W(0) = Y(0)-Z(0) = 0. Letting V(t) = e^-AtW(t), and applying the product rule, we have that

V� = -Ae^-AtW + e^-AtW� = -e^-AtAW + e^-AtAW = 0

so that V must be constant. Since V(0) = W(0) = 0, we have that V must be identically zero. Therefore W(t) = e^AtV(t) is also identically zero, which because W = Y-Z, means that the functions Y and Z are one and the same, which is what we claimed.)

So we have, in theory, solved the general linear differential equation. A potential problem is, however, that it is not always easy to calculate e^At.

1.1 Some Examples

We start with this example:

A =

�
�
�

(4)

We calculate the series by just multiplying A by t:

At =

�
�
�

and now calculating the powers of At. Notice that, because At is a diagonal matrix, its powers are very easy to compute: we just take the powers of the diagonal entries (why? if you don't understand, stop and think it over right now). So, we get

e^At =

�
�
�

t²

(2t)²

�
�
�

t³

(2t)³

�
�
�

+ � +

�
�
�

t^k

(2t)^k

�
�
�

+ �

and, just adding coordinate-wise, we obtain:

e^At =

�
�
�
�
�
�
�

1 + t +

t² +

t³ + �+

t^k + �

1 + 2t +

(2t)² +

(2t)³ + �+

(2t)^k + �

�
�
�
�
�
�
�

which gives us, finally, the conclusion that

e^At =

�
�
�

e^t

e^2t

�
�
�

So, in this very special case we obtained the exponential by just taking the exponentials of the diagonal elements and leaving the off-diagonal elements zero (observe that we did not end up with exponentials of the non-diagonal entries, since e⁰ = 1, not 0).

In general, computing an exponential is a little more difficult than this, and it is not enough to just take exponentials of coefficients. Sometimes things that seem surprising (the first time that you see them) may happen. Let us take this example now:

A =

�
�
�

-1

�
�
�

(5)

To start the calculation of the series, we multiply A by t:

At =

�
�
�

-t

�
�
�

and again calculate the powers of At. This is a little harder than in the first example, but not too hard:

(At)² =

�
�
�

-t²

�
�
�

(At)³ =

�
�
�

-t³

t³

�
�
�

(At)⁴ =

�
�
�

t⁴

�
�
�

(At)⁵ =

�
�
�

t⁵

-t⁵

�
�
�

(At)⁶ =

�
�
�

-t⁶

�
�
�

and so on. We won't compute more, because by now you surely have recognized the pattern (right? ). We add these up (not forgetting the factorials, of course):

e^At =

�
�
�

-t

�
�
�

-t²

�
�
�

-t³

t³

�
�
�

t⁴

�
�
�

+ �

and, just adding each coordinate, we obtain:

e^At =

�
�
�
�
�
�
�

1 -

t²

t⁴

- �

t -

t³

t⁵

- �

-t +

t³

t⁵

+ �

1 -

t²

t⁴

- �

�
�
�
�
�
�
�

which gives us, finally, the conclusion that

e⁽

-1

) t = e^At =

�
�
�

cost

sint

-sint

cost

�
�
�

It is remarkable that trigonometric functions have appeared. Perhaps we made a mistake? How could we make sure? Well, let us check that property (3) holds (we'll check only the first equality, you can check the second one). We need to test that

�
�
�

cost

sint

-sint

cost

�
�
�

= A

�
�
�

cost

sint

-sint

cost

�
�
�

(6)

Since (sint)� = cost, and (cost)� = -sint, we know that

�
�
�

cost

sint

-sint

cost

�
�
�

-sint

cost

-cost

-sint

�
�
�

and, on the other hand, multiplying matrices:

�
�
�

-1

�
�
�

cost

sint

-sint

cost

�
�
�

-sint

cost

-cost

-sint

�
�
�

so we have verified the equality (6).

As a last example, let us take this matrix:

A =

�
�
�

(7)

Again we start by writing

At =

�
�
�

and calculating the powers of At. It is easy to see that the powers are:

(At)^k =

�
�
�

t^k

kt^k

t^k

�
�
�

since this is obviously true for k = 1 and, recursively, we have

(At)^k+1 = (At)^k A =

�
�
�

t^k

kt^k

t^k

�
�
�

t^kt

t^kt+kt^kt

t^kt

�
�
�

t^k+1

(k+1)t^k+1

t^k+1

�
�
�

Therefore,

e^At

�
�
k = 0

�
�
�

t^k/k!

kt^k/k!

t^k/k!

�
�
�

�
�
�
�
�
�

�
�
k = 0

t^k/k!

�
�
k = 0

kt^k/k!

�
�
k = 0

t^k/k!

�
�
�
�
�
�

�
�
�

e^t

te^t

e^t

�
�
�

To summarize, we have worked out three examples:

The first example (4) is a diagonal matrix, and we found that its exponential is obtained by taking exponentials of the diagonal entries.
The second example (5) gave us an exponential matrix that was expressed in terms of trigonometric functions. Notice that this matrix has imaginary eigenvalues equal to i and -i, where i = [�(-1)].
The last example (7) gave us an exponential matrix which had a nonzero function in the (1,2)-position. Notice that this nonzero function was not just the exponential of the (1,2)-position in the original matrix. That exponential would give us an e^t term. Instead, we got a more complicated te^t term.

In a sense, these are all the possibilities. Exponentials of all two by two matrices can be obtained using functions of the form e^at, te^at, and trigonometric functions (possibly multiplied by e^at). (Not only that, but exponentials of any size matrices, not just 2 by 2, can be expressed using just polynomial combinations of t, scalar exponentials, and trigs. We will not quite prove this fact here; see any linear algebra book for the details.)

Calculating exponentials using power series is OK for very simple examples, and important to do a few times, so that you understand what this all means. But in practice, one uses very different methods for computing matrix exponentials. (remember how you first saw the definition of derivative using limits of incremental quotients, and computed some derivatives in this way, but soon learned how to use ``the Calculus'' to calculate derivatives of complicated expressions using the multiplication rule, chain rule, and so on.)

2 Computing Matrix Exponentials

We wish to calculate e^At. The key concept for simplifying the computation of matrix exponentials is that of matrix similarity . Suppose that we have found two matrices, L and S, where S is invertible,such that this formula holds:

A = S L S^-1

(8)

(one says that A and L are similar matrices). Then, we claim, it is true that also:

e^At = S e^{L t} S^-1

(9)

for all t. Therefore, if the matrix L is one for which e^{L t} is easy to find (for example, if it is a diagonal matrix), we can then multiply by S and S^-1 to get e^At. To see why (9) is a consequence of (8), we just notice that At = S (L t) S^-1 and we have the following ``telescopic'' property for powers:

(At)^k = [S(L t)S^-1][S(L t)S^-1]�[S(L t)S^-1] = S(L t)^kS^-1

since all the in-between pairs S^-1S cancel out. Therefore,

e^At

I + At +

(At)² +

(At)³ + �+

(At)^k + �

I + S(L t)S^-1 +

S(L t)²S^-1 +

S(L t)³S^-1 + �+

S(L t)^kS^-1 + �

�
�
�

I + L t +

(L t)² +

(L t)³ + �+

(L t)^k + �

�
�
�

S^-1

S e^{L t} S^-1

as we claimed.

The basic theorem is this one:

Theorem. For every n by n matrix A, one can find an invertible matrix S and an upper triangular matrix L such that (8) holds.

Remember that an upper triangular matrix is one that has the following form:

�
�
�
�
�
�
�
�
�
�
�
�

l₁

�

l₂

�

l₂

�

l_n-1

�

l_n

�
�
�
�
�
�
�
�
�
�
�
�

where the stars are any numbers. The numbers l₁, �, l_n turn out to be the eigenvalues of A.

There are two reasons that this theorem is interesting. First, it provides a way to compute exponentials, because it is not difficult to find exponentials of upper triangular matrices (the example (7) is actually quite typical) and second because it has important theoretical consequences.

(Two stronger theorems are possible. One is the ``Jordan canonical form'' theorem, which provides a matrix L that is not only upper triangular but which has an even more special structure. Jordan canonical forms are not very useful from a computational point of view, because they are what is known in numerical analysis as ``numerically unstable'', meaning that small perturbations of A can give one totally different Jordan forms. A second strengthening is the ``Schur unitary triangularization theorem'' which says that one can pick the matrix S to be unitary. This last theorem is extremely useful in practice, and is implemented in many numerical algorithms.)

We do not prove the theorem here in general, but only show it for n = 2; the general case can be proved in much the same way, by means of a recursive process.

We start the proof by remembering that every matrix has at least one eigenvalue, let us call it l, and an associate eigenvector, v. That is to say, v is a vector different from zero, and

Av = lv .

(10)

To find l, we find a root of the characteristic equation

det

( lI - A ) = 0

which, for two-dimensional systems is the same as the equation

l² - trace(A) l+

det

(A) = 0

and, recall,

trace

�
�
�

a+d

det

�
�
�

ad-bc .

(There are, for 2 by 2 matrices, either two real eigenvalues, one real eigenvalue with multiplicity two, or two complex einegvalues. In the last case, the two complex eingenvalues must be conjugates of each other.) An eigenvector associated to an eigenvalue l is then found by solving the linear equation

(A - lI) v = 0

(there are an infinite number of solutions; we pick any nonzero one).

With an eigenvalue l and eigenvector v found, we next pick any vector w with the property that the two vectors v and w are linearly independent. For example, if

v =

�
�
�

and a is not zero, we can take

w =

�
�
�

(what would you pick for w is a were zero?). Now, since the set {v,w} forms a basis of two-dimensional space, we can find coefficients c and d so that

Aw = cv + dw .

(11)

We can summarize both (10) and (11) in one matrix equation:

A (v w) = (v w)

�
�
�

We let S = (v w) and

L =

�
�
�

Then,

AS = SL

which is the same as what we wanted to prove, namely A = SL S^-1. Actually, we can even say more. It is a fundamental fact in linear algebra that, if two matrices are similar, then their eigenvalues must be the same. Now, the eigenvalues of L are l and d, because the eigenvalues of any triangular matrix are its diagonal elements. Therefore, since A and L are similar, d must be also an eigenvalue of A.

3 The Three Cases for n = 2

The following special cases are worth discussing in detail:

A has two different real eigenvalues.
A has two complex conjugate eigenvalues.
A has a repeated real eigenvalue.

In cases 1 and 2, one can always find a diagonal matrix L . To see why this is true, let us go back to the proof, but now, instead of taking just any linearly independent vector w, let us pick a special one, namely an eigenvector corresponding to the other eigenvalue of A:

Aw = mw .

This vector is always linearly independent of v, so the proof can be completed as before. Notice that L is now diagonal, because d = m and c = 0.

(Proof that v and w are linearly independent: if av+bw = 0, then alv+bmw = A(av+bw) = 0. On the other hand, multiplying av+bw = 0 by l we would have alv+blw = 0. Subtracting we would obtain b(l-m)w = 0, and as l-m � 0 we would arrive at the conclusion that bw = 0. But w, being an eigenvector, is required to be nonzero, so we would have to have b = 0. Plugging this back into our linear dependence would give av = 0, which would require a = 0 as well. This shows us that there are no nonzero coefficients a and b for which av+bw = 0, which means that the eigenvectors v and w are linearly independent.)

Notice that in cases 1 and 3, the matrices L and S are both real. In case 1, we will interpret the solutions with initial conditions on the lines that contain v and w as ``straight line solutions'' and this is the subject of Section 3.2 in the book.

In case 2, the matrices L and S are, in general, not real. Note that, in case 2, if Av = lv, taking complex conjugates gives

_
v

and we note that

� l

because l is not real. So, we can always pick w = the conjugate of w. It will turn out that solutions can be re-expressed in terms of trigonometric functions - remember example (5) - as we'll see in the next section and in Section 3.4 of the book.

Now let's consider Case 3 (the repeated real eigenvalue). We have that

L =

�
�
�

so we can also write L = lI+cN, where N is the following matrix:

N =

�
�
�

Observe that:

(lI+cN)² = (lI)²+c²N²+2lcN = l²I + 2lcN

(because N² = 0) and, for the general power k, recursively:

(lI+cN)^k

(l^k-1I + (k-1)l^k-2cN)(lI+cN)

l^kI + (k-1)l^k-1cN + l^k-1cN + (k-1)l^k-2a²N²

l^kI + kl^k-1cN

(lI+cN)^kt^k = (l^kI + kl^k-1cN)t^k =

�
�
�

l^kt^k

kl^k-1ct^k

l^kt^k

�
�
�

and therefore

e^{L t} =

�
�
�

e^lt

cte^lt

e^lt

�
�
�

(12)

because 0 + ct + (2lc)t²/2 + (3l²c)t³/6! + � = cte^lt. (This generalizes the special case in example (7).)

4 A Shortcut

If we just want to find the form of the general solution of Y� = AY, we do not need to actually calculate the exponential of A and the inverse of the matrix S.

Let us first take the cases of different eigenvalues (real or complex, that is, cases 1 or 2, it doesn't matter which one). As we saw, L can be taken to be the diagonal matrix consisting of these eigenvalues (which we call here l and m instead of l₁ and l₂), and S = (v w) just lists the two eigenvectors as its columns. We then know that the solution of every initial value problem Y� = AY, Y(0) = Y₀ will be of the following form:

Y(t) = e^At Y₀ = S e^{L t} S^-1 Y₀ = (v w)

�
�
�

e^lt

e^mt

�
�
�

= a e^ltv + b e^mtw

where we just wrote S^-1Y₀ as a column vector of general coefficients a and b. In conclusion:

The general solution of Y� = AY, when A has two eigenvalues l and m with respective eigenvectors v and w, is of the form

a e^ltv + b e^mtw
(13)
for some constants a and b.

So, one approach to solving IVP's is to first find eigenvalues and eigenvectors, write the solution in the above general form, and then plug-in the initial condition in order to figure out what are the right constants. Section 3.2 in the book gives us lots of practice with this procedure.

In the case of non-real eigenvalues, recall that we showed that the two eigenvalues must be conjugates of each other, and the two eigenvectors may be picked to be conjugates of each other. Let us show now that we can write (13) in a form which does not involve any complex numbers. In order to do so, we start by decomposing the first vector function which appears in (13) into its real and imaginary parts:

e^ltv = Y₁(t) + i Y₂(t)

(14)

(let us not worry for now about what the two functions Y₁ and Y₂ look like). Since m is the conjugate of l and w is the conjugate of v, the second term is:

e^mtw = Y₁(t) - i Y₂(t) .

(15)

So we can write the general solution shown in (13) also like this:

a(Y₁+iY₂) + b(Y₁-iY₂) = (a+b) Y₁ + i(a-b)Y₂ .

(16)

Now, it is easy to see that a and b must be conjugates of each other. (Do this as an optional homework problem. Use the fact that these two coefficents are the components of S^-1 Y₀, and the fact that Y₀ is real and that the two columns of S are conjugates of each other.) This means that both coefficients a+b and i(a-b) are real numbers. Calling these coefficients ``k₁'' and ``k₂'', we can summarize the complex case like this:

The general solution of Y� = AY, when A has a non-real eigenvalue l with respective eigenvector v, is of the form

k₁ Y₁(t) + k₂ Y₂(t)
(17)
for some real constants k₁ and k₂. The functions Y₁ and Y₂ are found by the following procedure: calculate the product e^ltv and separate it into real and imaginary parts as in Equation (14).

What do Y₁ and Y₂ really look like? This is easy to answer using Euler's formula, which gives

e^lt = e^at+ibt = e^at(cosbt + isinbt) = e^atcosbt +i e^atsinbt

where a and b are the real and imaginary parts of l respectively. This is what we do in Section 3.4 of the book.

Finally, in case 3 (repeated eigenvalues) we can write, instead:

Y(t) = e^At Y₀ = S e^{L t} S^-1 Y₀

(v w)

�
�
�

e^lt

cte^lt

e^mt

�
�
�

a e^ltv + b e^lt(c tv + w) .

When c = 0 we have from A = SL S^-1 that A must have been the diagonal matrix

�
�
�

to start with (because S and L commute). When c � 0, we can write k₂ = bc and redefine w as ¹/_cw. Note that then (11) becomes Aw = v + lw, that is, (A-lI)w = v. Any vector w with this property is linearly independent from v (why?).

So we conclude, for the case of repeated eigenvalues:

The general solution of Y� = AY, when A has a repeated (real) eigenvalue l is either of the form e^ltY₀ (if A is a diagonal matrix) or, otherwise, is of the form

k₁ e^ltv + k₂ e^lt(tv + w)
(18)
for some real constants k₁ and k₂, where v is an eigenvector corresponding to l and w is any vector which satisfies (A-lI)w = v.

Observe that (A-lI)² w = (A-lI)v = 0. general, one calls any nonzero vector such that (A-lI)^k w = 0 a generalized eigenvector (of order k) of the matrix A (since, when k = 1, we have eigenvectors).

5 Forcing Terms

The use of matrix exponentials also helps explain much of what is done in chapter 4 (forced systems), and renders Laplace transforms unnecessary. Let us consider non-homogeneous linear differential equations of this type:

(t) = A Y(t) + u(t) .

(19)

We wrote the arguments ``t'' just this one time, to emphasize that everything is a function of t, but from now on we will drop the t's when they are clear from the context.

Let us write, just as we did when discussing scalar linear equations , Y�-AY = u. We consider the ``integrating factor'' M(t) = e^-At. Multiplying both sides of the equation by M, we have, since (e^-AtY)� = e^-AtY�- e^-AtAY (right?):

de^-AtY

= e^-Atu .

Taking antiderivatives:

e^-AtY =

�
�

t

0

e^-Asu(s) ds + Y₀

for some constant vector Y₀. Finally, multiplying by e^-At and remembering that e^-Ate^At = I, we conclude:

Y(t) = e^At Y₀ + e^At

�
�

t

0

e^-Asu(s) ds .

(20)

This is sometimes called the ``variation of parameters'' form of the general solution of the forced equation (19). Of course, Y₀ = Y(0) (just plug-in t = 0 on both sides).

Notice that, if the vector function u(t) is a polynomial in t, then the integral in (20) will be a combination of exponentials and powers of t (integrate by parts). Similarly, if u(t) is a combination of trigonometric functions, the integral will also combine trigs and polynomials. This observation justifies the ``guesses'' made for forced systems in chapter 4 (they are, of course, not guesses, but consequences of integration by parts, but the book would lead you to believe otherwise).

6 Exercises

In each of the following, factor the matrix A into a product SL S^-1, with L diagonal:

a.        A = �
�
�

1

1
0

0
�
�
�

b.        A = �
�
�

5

6
-1

-2
�
�
�

c.        A = �
�
�

2

-8
1

-4
�
�
�

d.        A = �
�
�
�
�
�

2

2

1
0

1

2
0

0

-1
�
�
�
�
�
�
For each of the matrices in Exercise 1, use the SL S^-1 factorization to calculate A⁶ (do not just mutiply A by itself).
For each of the matrices in Exercise 1, use the SL S^-1 factorization to calculate e^At.
Calculate e^At for this matrix:

�
�
�
�
�
�

0

1

2
0

0

1
0

0

0
�
�
�
�
�
�

using the power series definition.
Consider these matrices:

A = �
�
�

1

1
0

0
�
�
� B = �
�
�

0

-1
0

0
�
�
�

and calculate e^At, e^Bt, and e^(A+B)t.
Answer, true or false: is e^Ate^Bt = e^(A+B)t?
(Challenge problem) Show that, for any two matrices A and B, it is true that

e^Ate^Bt = e^(A+B)t for all t

if and only if AB-BA = 0. (The expression ``AB-BA'' is called the ``Lie bracket'' of the two matrices A and B, and it plays a central role in the advanced theory of differential equations.)

File translated from T_EX by T_TH, version 2.00.
On 16 Jan 2000, 17:44.

This page was last updated on August 01, 2017 at 11:16 am and is maintained by webmaster@math.rutgers.edu.

For questions regarding courses and/or special permission, please contact ugoffice@math.rutgers.edu.

For questions or comments about this site, please contact help@math.rutgers.edu.