Math 421 diary, fall 2005
Linear Algebra
Exam!
The second in-class exam will be on Thursday, November 17. Further
information will be shown soon. I also plan to have a review session
the evening before the exam, on Wednesday, November 16, with an exact
time and place to be announced.
Inner product review
If x=(x1,x2,...,xn) and
y=(y1,y2,...,yn) are vectors in
R^n, then
<x,y> = SUM_{j=1}^n xjyj = x1y1+x2y2+...+xnyn.
There are some very useful fundamental properties of inner product.
- <cx,y>=c<x,y> for any scalar c.
- <x+w,y>=<x,y>+<w,y>.
(This and the previous property are called linearity.)
- <x,y>=<y,x>
(Commutativity).
- <x,x> is non-negative, and is 0 only when x=0.
Vectors x and y in R^n are orthogonal or perpendicular if
<x,y>=0. I tried to persuade you last time that eigenvectors
associated to distinct eigenvalues of a symmetric matrix had to
be orthogonal. This is only part of a beautiful and very useful result
about symmetric matrices. I tried to introduce this result by
computing with an example.
Significant example
I wanted to diagonalize
the matrix A given by
( 0 -1 1)
(-1 1 2)
( 1 2 1)
I will need (if possible, but it will be possible!) to find an
invertible matrix C and a diagonal matrix D so that
C^(-1)AC=D. The diagonal entries in D will be the eigenvalues
of A, in order, and the columns of C will be corresponding
eigenvectors of A.
the characteristic polynomial
of A is
    ( -λ   -1    1  )
det ( -1   1-λ   2  )
    (  1    2   1-λ )
which I computed somehow and it was what I thought it
would be. Maple has the following command:
>charpoly(A,x);
3 2
x - 2 x - 5 x + 6
I found one
eigenvector, and students, as their QotD, found the others.
If λ=1, take (2,-1,1).
If λ=-2, take (-1,-1,1).
If λ=3, take (0,1,1).
I asked students to look at these eigenvectors. They did, and we
didn't seem to learn much.
The C matrix is
( 2 -1 0)
(-1 -1 1)
( 1 1 1)
Now look at C^t: ( 2 -1 1)
(-1 -1 1)
( 0 1 1)
We computed the matrix product C^tC. The result was
(6 0 0)
(0 3 0)
(0 0 2)
The 0's were caused by the orthogonality of the different vectors in
the basis of eigenvectors. If the result had diagonal entries 1, 1,
and 1, then the transpose would be the inverse. But I can make
this happen by choosing slightly different eigenvectors: multiplying
them by a scalar ("normalizing" them) to have length 1. So choose C to
be, instead,
( 2/sqrt(6)  -1/sqrt(3)     0     )
(-1/sqrt(6)  -1/sqrt(3)  1/sqrt(2))
( 1/sqrt(6)   1/sqrt(3)  1/sqrt(2))
and then the transpose of C
will be C^(-1). This really is remarkable.
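Here is a small Maple check of that claim (a sketch I did not do in class; it uses only the linalg commands that appear elsewhere in this diary, plus map and simplify to clean up the square roots):
> with(linalg):
> A:=matrix(3,3,[0,-1,1, -1,1,2, 1,2,1]):
> C:=matrix(3,3,[2/sqrt(6),-1/sqrt(3),0, -1/sqrt(6),-1/sqrt(3),1/sqrt(2), 1/sqrt(6),1/sqrt(3),1/sqrt(2)]):
> map(simplify,evalm(transpose(C)&*C));    # should be the 3-by-3 identity: C^t really is C^(-1)
> map(simplify,evalm(transpose(C)&*A&*C)); # should be diagonal with entries 1, -2, 3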
The major result
We can always diagonalize any symmetric matrix by using a matrix of
normalized eigenvectors. The eigenvectors are orthogonal, and the
resulting matrix used for changing bases has the wonderful property
that its inverse is its transpose. A matrix C such that
C^(-1)=C^t is called orthogonal.
Here's another version:
Every symmetric matrix has a basis of
orthogonal eigenvectors, which, when normalized, form an orthogonal
matrix. The matrix of eigenvectors and its {inverse|transpose}
diagonalize the symmetric matrix.
This result is not easy to prove, and is usually the triumph
(?) of our Math 250 course.
Why is an orthonormal basis (length of each basis element 1 and
each perpendicular to the other) nice to compute with? Well, look:
Using a random basis
The vectors v1=(3,1,2) and v2=(-1,3,2)
and v3=(2,1,2) form a basis of R3. (You can
check this by computing the determinant of the matrix obtained by
sticking the vectors together to make a 3-by-3 matrix: the determinant
is 4 and since this is not 0, the matrix has rank 3). This means that
any vector in R3 can be written as a linear
combination of v1, v2, and
v3. Suppose the vector is (44,39,16). What
is the linear combination? Uhhh ... with some effort,
Maple and I came up with
(44,39,16)=(149/2)v1+(31/2)v2-82v3.
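If you want Maple to do that work, the linalg command linsolve (assuming I am remembering its name correctly) produces the coefficients from the matrix whose columns are v1, v2, and v3:
> with(linalg):
> M:=transpose(matrix(3,3,[3,1,2, -1,3,2, 2,1,2])): # columns of M are v1, v2, v3
> linsolve(M,vector([44,39,16])); # should return the coefficients 149/2, 31/2, -82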
Using an orthonormal basis
The vectors e1=(2/sqrt(6),-1/sqrt(6),1/sqrt(6)) and
e2=(-1/sqrt(3),-1/sqrt(3),1/sqrt(3)) and
e3=(0,1/sqrt(2),1/sqrt(2)) form an orthonormal basis. Now
if I want to write
(44,39,16)=ae1+be2+ce3 just take the
inner
product with respect to e1 and use properties 1 and 2 of
inner product above:
<(44,39,16),e1>=a<e1,e1>+b<e2,e1>+c<e3,e1>=a.
This is because <ei,ej>
equals 0 if i is different from j and equals 1 if they are the
same.
Therefore a=<(44,39,16),(2/sqrt(6),-1/sqrt(6),1/sqrt(6))>=(44·2-39+16)/sqrt(6)=65/sqrt(6); similarly,
b=<(44,39,16),e2> and
c=<(44,39,16),e3>.
It is very easy to find the coefficients in the linear combination
when we have linear combinations of orthonormal vectors.
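A sketch of the same computation in Maple, using dotprod from linalg (the vectors here are real, so the complex conjugation dotprod does by default is harmless):
> with(linalg):
> w:=vector([44,39,16]):
> e1:=vector([2/sqrt(6),-1/sqrt(6),1/sqrt(6)]): e2:=vector([-1/sqrt(3),-1/sqrt(3),1/sqrt(3)]): e3:=vector([0,1/sqrt(2),1/sqrt(2)]):
> a:=simplify(dotprod(w,e1)); b:=simplify(dotprod(w,e2)); c:=simplify(dotprod(w,e3));
> # a, b, c should be 65/sqrt(6), -67/sqrt(3), 55/sqrt(2)
> map(simplify,evalm(a*e1+b*e2+c*e3)); # should give back [44, 39, 16]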
Transition...
It turns out that this way of looking at symmetric matrices, using
orthogonal change of basis to get a diagonal matrix, was so successful
that it has been applied to many things, in particular to the study of
ordinary and partial differential equations. The setting might seem
initially somewhat strange to you, but as your understanding
increases, as I hope it will, you should see the analogy more
clearly.
HOMEWORK
Please read sections 8.8, 8.10, and 8.12 of the text. The following
problems almost surely can be done easily by Maple or
Matlab or any grown-up calculator. I strongly urge
students to do these problems by hand and get the correct
answers.
8.8: 13, 19.
8.10: 15
8.12: 17, 39.
The instructor wrote lots of definitions on the board (eigenvector,
eigenvalue, characteristic polynomial ...). Where can all this information be found?!
Thanks to Ms. Rose for putting a
solution to the previous QotD on the board. That was the
following:
Find an eigenvector associated to the eigenvalue 2 for the matrix
(5 2)
(3 4)
So for this we need to solve
(5-2 2 )(x1) = (0)
( 3 4-2)(x2) (0)
This system is just
3x1+2x2=0
3x1+2x2=0
and we need to find a non-trivial solution. Well, these are all
multiples of (1,-3/2). I chose (2,-3) as my eigenvector in what
follows.
Now what?
Well, we know that if A is the 2-by-2 matrix
(5 2)
(3 4)
then A has eigenvalue 2 with one associated eigenvector (2,-3) and
eigenvalue 7 with one associated eigenvector (1,1). This is neat.
Here is what you should consider.
Start with R2, and (thinking about
(x1,x2) as a column vector) multiply by A,
getting another vector in R2.
I have a basis of eigenvectors. Maybe I can rethink multiplication by
A into three stages. Here I will start with a vector described in
terms of the standard basis (1,0) and (0,1).
- Change information to be in terms of the basis of eigenvectors.
- Multiply by A.
- Change back to the standard basis information.
Now I will try to implement this scheme.
I want to change (1,0) as a column vector to (2,-3) and change (0,1) as
a column vector to (1,1). So I need to find a matrix C (for change
of basis) which does the
following:
(a b)(1) is ( 2) and (a b)(0) is (1).
(c d)(0) (-3) (c d)(1) (1)
This isn't too difficult: a and c are specified by the first
requirement and b and d are specified by the second
requirement. Therefore C=( 2 1)
(-3 1)
How can I undo switching information from the standard basis to the
basis of eigenvectors? "Clearly" I should find C^(-1):
( 2 1 | 1 0)~(1 1/2 | 1/2 0)~(1 0 | 1/5 -1/5 )
(-3 1 | 0 1) (0 5/2 | 3/2 1) (0 1 | 3/5 2/5 )
We checked that the result was indeed C^(-1).
A multiplication
I computed C^(-1)AC. I noted that this could be computed in
two ways, as (C^(-1)A)C or as
C^(-1)(AC). Matrix multiplication is
associative so you can regroup as you wish. It is not
always commutative (recall an early linear
algebra QotD!) so you can't change the order of the matrices you
are multiplying.
First C^(-1) times A, and then the result times C:
(1/5 -1/5)(5 2) is ( 2/5 -2/5) and then ( 2/5 -2/5)( 2 1) is (2 0).
(3/5 2/5)(3 4) (21/5 14/5) (21/5 14/5)(-3 1) (0 7)
I'll call the diagonal matrix just D. You should see, I hope, that
this computation verifies the philosophy above. Finding an invertible
matrix C and a diagonal matrix D so that C^(-1)AC=D is called
diagonalization. We could also write A=CDC^(-1).
Why do this?
Computations with diagonal matrices are very easy. For example,
D^2 is just
(2^2   0 )
( 0   7^2)
In fact, D^10 is
(2^10   0  )
(  0   7^10)
Now A^10=A·A····A (ten
times).
But since A=CDC^(-1), I know that
A^10=CDC^(-1)·CDC^(-1)····CDC^(-1).
Now we can associate in any way: put parentheses around the factors of
the product. The C^(-1)C pairs on the "inside" cancel each other. So
A^10=CD(C^(-1)C)D(C^(-1)C)···DC^(-1)=CD^10C^(-1).
So we can find powers of A by finding powers of the diagonal
matrix.
Computational "load"
There's an initial computation needed to find the eigenvalues and
eigenvectors (that's C and D) and then C^(-1). But once that
is done, then finding powers of A is much faster. And even finding
exponentials of A is faster.
e^A = SUM_{n=0}^{infinity} (1/n!)A^n = SUM_{n=0}^{infinity} (1/n!)CD^nC^(-1).
Now D^n is
(2^n   0 )
( 0   7^n)
and
SUM_{n=0}^{infinity} (1/n!)CD^nC^(-1) = C(SUM_{n=0}^{infinity} (1/n!)D^n)C^(-1). What is
SUM_{n=0}^{infinity} (1/n!)D^n? It
is
(SUM_{n=0}^{infinity} 2^n/n!              0             )
(            0             SUM_{n=0}^{infinity} 7^n/n!  )
and this is e^D=
(e^2   0 )
( 0   e^7)
So a potentially complicated sum,
SUM_{n=0}^{infinity} (1/n!)A^n, is
actually Ce^DC^(-1): a very simple computation.
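Here is a sketch of this in Maple for our 2-by-2 example. I believe linalg also has a built-in exponential command for matrices; if I am remembering that correctly, the two results should agree:
> with(linalg):
> A:=matrix(2,2,[5,2,3,4]): C:=matrix(2,2,[2,1,-3,1]):
> expD:=matrix(2,2,[exp(2),0,0,exp(7)]):
> map(simplify,evalm(C&*expD&*inverse(C))); # C e^D C^(-1); the (1,1) entry should be (2*exp(2)+3*exp(7))/5
> exponential(A); # linalg's matrix exponential, for comparison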
Such a computation can help to write solutions of such systems as
dx/dt=5x(t)+2y(t)
dy/dt=3x(t)+4y(t)
If we make X(t) the column vector corresponding to (x(t),y(t)), then
this system can be abbreviated in matrix form as X'(t)=AX(t). By
analogy with the scalar case, I would hope that a solution could be
written as X(t)=e^(At)(Const), and here the (Const) would be a
2 by 1 column vector of initial conditions: x(0) and y(0).
I was going to compute another example. Let me show you what this
would have been:
> with(linalg):
Warning, the protected names norm and trace have been redefined and unprotected
> A:=matrix(3,3,[1,2,2,-1,1,1,2,-1,-1]); #Not a very random matrix.
#2nd col=3rd col, so rank is 2.
[ 1 2 2]
[ ]
A := [-1 1 1]
[ ]
[ 2 -1 -1]
> charpoly(A,x); #0 is an eigenvalue of A, and then with a common
#factor of x deleted, the result is a quadratic which can be factored.
3 2
x - x - 2 x
> eigenvects(A);#A list of eigenvalues and eigenvectors.
[0, 1, {[0, -1, 1]}], [-1, 1, {[1, 2, -3]}], [2, 1, {[-4, 1, -3]}]
> C:=matrix(3,3,[0,1,-4,-1,2,1,1,-3,-3]); #Changing basis to the
#basis of eigenvectors.
[ 0 1 -4]
[ ]
C := [-1 2 1]
[ ]
[ 1 -3 -3]
> inverse(C);#Back from the eigenvectors to the "standard" basis.
[1/2 -5/2 -3/2]
[ ]
[1/3 -2/3 -2/3]
[ ]
[-1/6 -1/6 -1/6]
> evalm(inverse(C)&*A&*C);#Verifying that we've got a diagonalization.
[0 0 0]
[ ]
[0 -1 0]
[ ]
[0 0 2]
When can we diagonalize a matrix?
Certainly if we knew that R^n had a basis of eigenvectors of
A, then we can diagonalize A. We'd just assemble C by filling in the
eigenvectors as columns of C. And D would be a diagonal matrix with
diagonal entries the eigenvalues of A in the same order as the matrix
C was assembled. And C^(-1)AC=D.
What problems could you expect?
Consider the matrix A=
(1 5)
(0 1)
This matrix represents a shear. Its determinant is 1. The area
magnification factor is 1. The picture shows what happens to the unit
square under multiplication by A. det(A-λI2) is just
(1-λ)^2. The root(s)
of this quadratic polynomial are ... is ... just λ=1. And what about the corresponding
eigenvectors: the solutions of
(1-1 5 )(x1)=(0)
( 0 1-1)(x2) (0)
so 5x2=0. Therefore x2 must be 0. The
eigenvectors are all (non-zero!) multiples of (1,0).
There are not enough eigenvectors to be a
basis of R^2.
Now look at A=
( 1/sqrt(2) -1/sqrt(2) )
( 1/sqrt(2) 1/sqrt(2) )
This takes (1,0) to the vector (1/sqrt(2),1/sqrt(2)) and the
vector (0,1) to (-1/sqrt(2),1/sqrt(2)). It is rotation by Pi/4
(45 degrees) around the origin. The determinant is again 1, so
there is no area distortion. det(A-λI2) is
(1/sqrt(2)-λ)^2+1/2.
This is a quadratic polynomial, and no real number
will make it equal to 0,
because it is a sum of squares.
There are no real eigenvalues.
So you would need to compute using complex numbers to diagonalize this
matrix (in C^2, not R^2!).
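A Maple sketch (not shown in class) which lets you see the complex eigenvalues directly, using linalg's eigenvals command:
> with(linalg):
> R:=matrix(2,2,[1/sqrt(2),-1/sqrt(2),1/sqrt(2),1/sqrt(2)]):
> eigenvals(R); # the two roots (1+I)/sqrt(2) and (1-I)/sqrt(2): genuinely complex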
Some realistic conditions
The matrices occurring in engineering applications are rarely
"random". They frequently have some internal structure. The simplest
structures to explain are symmetric and skew-symmetric.
A matrix is symmetric if it is equal to its transpose.
Such a matrix must be square, otherwise its dimensions could not
agree. An example is, say, in a molecule, if the (i,j)th
entry is the distance between atom i and atom j. Then since distance
is symmetric, so is the matrix. On a road map, frequently you will
find that the printer uses storage efficiently by only printing half
of the table containing distances between cities. Symmetry permits
storage to be halved!
Examples
( 5 3)    ( 7  1  4)
( 3 2)    ( 1  2 -6)
          ( 4 -6 -3)
A matrix is skew-symmetric if it is equal to minus its
transpose.
Again, such a matrix must be square, otherwise its dimensions could not
agree. An example is, say, in a molecule, if the (i,j)th
entry is the force from atom i to atom j. The force is a directed
quantity, so the (i,j)th entry is minus the
(j,i)th entry.
Also skew-symmetry permits
storage to be halved!
Examples
( 0 3) ( 0 1 4)
(-3 0) (-1 0 -6)
(-4 6 0)
Please notice that the diagonal entries of a skew-symmetric matrix are
always 0. The determinant of the first example is 9, and the
determinant of the second example is 0. Indeed (!) the determinant of
an n-by-n skew-symmetric matrix is non-negative if n is even and is 0 if n
is odd. The odd case follows because det(A)=det(A^t), as we already
observed, and det(-A)=(-1)^n det(A): for a skew-symmetric matrix these
combine to give det(A)=det(-A)=(-1)^n det(A), which forces det(A)=0 when n is odd.
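A quick Maple check of these claims on the 3-by-3 skew-symmetric example above (a sketch, not done in class, using only commands shown elsewhere in this diary):
> with(linalg):
> S:=matrix(3,3,[0,1,4, -1,0,-6, -4,6,0]):
> evalm(transpose(S)+S); # the zero matrix, so S really is skew-symmetric
> det(S); # 0, as it must be since n=3 is odd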
I then recalled the {dot|scalar|inner} product for vectors in
R^n. If x=(x1,x2,...,xn) and
y=(y1,y2,...,yn) then
<x,y> = SUM_{j=1}^n xjyj.
In matrix form this is x^t y, converting the "row vector" x
to a "column vector" and then multiplying as matrices to get a number.
If A is an n-by-n matrix, we've studied the geometric effect of
multiplying vectors x by A. Actually, that's really the matrix
multiplication Ax^t, and then another transpose to get the
darn column vector back to a row vector:
(Ax^t)^t.
I then tried to verify that
<Ax,y>=<x,A^t y>.
This has an immediate consequence that eigenvectors corresponding to
distinct eigenvalues of a symmetric matrix are orthogonal. More to come!
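For 2-by-2 matrices the identity <Ax,y>=<x,A^t y> can be checked symbolically in Maple. The sketch below just expands both sides; the names p and q are my own:
> with(linalg):
> A:=matrix(2,2,[a,b,c,d]): X:=matrix(2,1,[x1,x2]): Y:=matrix(2,1,[y1,y2]):
> p:=evalm(transpose(evalm(A&*X))&*Y):          # <Ax,y> as a 1-by-1 matrix
> q:=evalm(transpose(X)&*evalm(transpose(A)&*Y)): # <x,A^t y>
> map(expand,evalm(p-q)); # should be the 1-by-1 zero matrix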
Maple does some examples
What did I do? I wrote a "random" 3-by-3 matrix (yeah, yeah, with
small integer entries!), and then proceeded to check the textbook's
formula for A^(-1). The formula I refer to is on p. 383
(Theorem 8.18). I believe engineering students should be aware that
such an explicit formula exists, so that if such
a formula is necessary, then it can be used.
I had students compute the entries of the adjoint of A, and we checked
(at least a little bit!) that the result was A^(-1). I decided
for the purposes of this diary to have Maple find minors,
then get determinants, then adjust the signs, and then take the
transpose: the result should be (after division of the entries by
det(A)) the inverse. And it was. I mention that it took me
longer to write the darn Maple code than it did to have
people do the computations in class. The # indicates a
comment in Maple and is ignored by the program.
> with(linalg):
Warning, the protected names norm and trace have been redefined and unprotected
> A:=matrix(3,3,[1,-2,3,2,1,0,2,1,3]); #defines a matrix, A.
[1 -2 3]
[ ]
A := [2 1 0]
[ ]
[2 1 3]
> det(A); #Finds the determinant
15
> minor(A,2,1); #This gets the (2,1) minor, apparently.
[-2 3]
[ ]
[ 1 3]
>B:=matrix(3,3,[seq(seq((-1)^(i+j)*det(minor(A,i,j)),j=1..3),i=1..3)]);
#This mess, when parsed correctly, creates almost the adjoint of A.
[ 3 -6 0]
[ ]
B := [ 9 -3 -5]
[ ]
[-3 6 5]
> evalm(A&*transpose(B)); #We need to divide by 15, which is det(A).
# evalm and &* do matrix multiplication.
[15 0 0]
[ ]
[ 0 15 0]
[ ]
[ 0 0 15]
> inverse(A);
[1/5 3/5 -1/5]
[ ]
[-2/5 -1/5 2/5 ]
[ ]
[ 0 -1/3 1/3 ]
> evalm((1/15)*transpose(B)); # I think Bt is the adjoint of A.
[1/5 3/5 -1/5]
[ ]
[-2/5 -1/5 2/5 ]
[ ]
[ 0 -1/3 1/3 ]
> Y:=matrix(3,1,[y1,y2,y3]);
[y1]
[ ]
Y := [y2]
[ ]
[y3]
> evalm(inverse(A)&*Y); # How to solve the matrix equation AX=Y
# Left multiply by A^(-1).
[ y1 3 y2 y3 ]
[ ---- + ---- - ---- ]
[ 5 5 5 ]
[ ]
[ 2 y1 y2 2 y3]
[- ---- - ---- + ----]
[ 5 5 5 ]
[ ]
[ y2 y3 ]
[ - ---- + ---- ]
[ 3 3 ]
> A2:=matrix(3,3,[1,y1,3,2,y2,0,2,y3,3]); # Preparing for Cramer's rule
[1 y1 3]
[ ]
A2 := [2 y2 0]
[ ]
[2 y3 3]
> det(A2)/det(A); # The same as the second entry of A^(-1)Y computed above (Cramer's rule).
2 y1 y2 2 y3
- ---- - ---- + ----
5 5 5
I "verified" Cramer's rule in a specific case. A similar computation
is shown above done by Maple. This is on p. 392, Theorem
8.23, of the text.
I have never used Cramer's rule in any significant
situation. But, again, if you need such an
explicit formula, you should be aware that it exists. But I also
mention that, using my Maple on a fairly new computer, computing
the inverse of a 5-by-5 symbolic matrix took over half a second. So
you should recognize the time/space
requirements for such computations. Sigh.
This is all too many formulas, especially since they are not likely to
be used. Mr. Pappas asked how they
might be used in the real world, and my reply was that the matrices
which students were likely to work with probably would have some
structure: they might be sparse, with, say, entries non-zero only on
or near the diagonal (spline algorithms), or they might have some
perceptible block structure. For these matrices, running time
frequently can be trimmed quite a bit. But still there will be
problems in implementation (numerical stability) and writing and using
programs needs some care.
Something a bit different
I've neglected to discuss the geometrical effect of a matrix. My model
matrix here will be 2-by-2, just A=
( 5 2 )
( 3 4 )
Geometry: det as area distortion factor
det(A), which is in this case 20-6=14, represents the area distortion
factor as A maps R2 to R2. That is, take
X=(x1,x2), and feed it into A. By this I mean do
the following: take the transpose, multiply by A, and then look at the
point you get in R2 (I guess another transpose is
needed). So the output is, I hope,
(5x1+2x2,3x1+4x2).
You can compute that:
- (0,0) gets changed to (0,0).
- (1,0) gets changed to (5,3).
- (0,1) gets changed to (2,4).
- (1,1) gets changed to (7,7).
The lines between these points get changed to lines, because this is
a linear transformation, and the equations are all first degree.
The
image of the unit square is a parallelogram, and the area is 14. All
areas are determined by multiples and movements of the unit square
(think about it: you fill things up with tiny little squares, etc.) so
all affected areas are multiplied by 14.
One nice consequence is the following:
If A and B are n-by-n matrices, then det(AB)=det(A)det(B).
Comment Since A "maps" R^n to R^n by
matrix multiplication, and det(A), it turns out, is the factor by which
n-dimensional volumes are stretched, multiplying by A and then by B
concatenates the effects.
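A one-line Maple illustration of this multiplicativity, with the A used above and a second small matrix B of my choosing:
> with(linalg):
> A:=matrix(2,2,[5,2,3,4]): B:=matrix(2,2,[1,2,3,4]):
> det(A), det(B), det(evalm(A&*B)); # 14, -2, -28: det(AB)=det(A)det(B)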
Geometry: the circle transformed by a matrix
Now here is what I showed on the screen as I attempted to catch up
with the last decade or so of the twentieth century in terms of
technical pedagogy.
>A:=matrix(2,2,[5,2,3,4]);
[5 2]
A := [ ]
[3 4]
>plot([5*cos(t)+2*sin(t),3*cos(t)+4*sin(t),t=0..2*Pi],
thickness=3,color=black);
[Plot: the image of the unit circle under A, an ellipse.]
>B:=evalm(A^2);
[31 18]
B := [ ]
[27 22]
>plot([31*cos(t)+18*sin(t),27*cos(t)+22*sin(t),t=0..2*Pi],
thickness=3,color=black);
[Plot: the image of the unit circle under B=A^2, a narrower ellipse.]
What is going on here? One last Maple command and answer
before we start to think:
>eigenvects(A);
[2, 1, {[1, -3/2]}], [7, 1, {[1, 1]}]
So we are looking at the effect of the matrix on the unit
circle, those points (x,y) in the plane which can be written as
(cos(t),sin(t)) for t between 0 and 2Pi. This is a circle. Then a
matrix, A, creates what is called a linear transformation,
which is a mapping taking R2 to R2 and taking
linear combinations in the domain to corresponding linear combinations
in the range.
If A=(a b)
(c d)
and if a vector X in R2 is
(x1,x2), then AX^t (the transpose
changes the row vector to a column vector) is (back in row vector
form [another transpose])
the point
(ax1+bx2,cx1+dx2). What
does this look like geometrically? The circle gets distorted
geometrically. It turns out that for many matrices, the figure
obtained will be an ellipse, with (0,0) as center of symmetry. The
biggest stretch of this ellipse is in a certain direction. If we then
multiply again by A (that's B=A^2 in this case; the evalm command in
Maple asks it to do arithmetic on matrices), the picture is
another ellipse. Because the pictures are a bit difficult to compare, I
asked Maple to rescale (important!) and graph both
pictures together. The B picture is the narrower one, inside the A
picture's ellipse. One way to find this principal preferred direction,
where A seems to want to send stuff, is (as I tried to suggest in
class) to take a point on the circle at random. Then some high power
of A will likely send it more or less in about the correct direction
(!!). This direction is called an eigenvector.
[Three plots appeared here:
The unit circle and its transform by A. The graph goes
from about -5.5 to 5.5 both horizontally and vertically.
The transform by A and by A^2. The graph goes
from about -35 to 35 both horizontally and vertically.
The transform by A^2 and by A^3. The graph goes
from about -245 to 245 both horizontally and vertically.]
Eigenvectors, eigenvalues, etc.
VOCABULARY
TIME!
If A is an n-by-n matrix, an n-dimensional vector X is called an
eigenvector of A if X is not zero and if there is a scalar λ
so that AX=λX. The scalar λ is called an eigenvalue and
X is an eigenvector associated to that eigenvalue.
Understanding eigenvalues and eigenvectors is important if you are
going to any sort of extensive computation with matrices (by hand, by
computer, by thinking). As I observed in class, the requirement that
an eigenvector be not equal to 0 is very important. Without it,
any value of λ is an eigenvalue! Eigen{value|vector} is also called
characteristic {value|vector} and proper {value|vector}, depending on
what you read and what you work with.
Example
I worked out the details, the indispensable details, of course,
for the example which generated the pictures above. So A=
(5 2)
(3 4)
and if X is (x1,x2) then we would like
to find solutions of the system corresponding to the matrix equation
AX^t=λX^t.
This system is
5x1+2x2=λx1
3x1+4x2=λx2
and this system is the same as
(5-λ)x1+2x2=0
3x1+(4-λ)x2=0
I want non-trivial
solutions to this homogeneous system. This is the same as asking that
the RREF of the system's coefficient matrix is not equal to
I2, the 2-by-2 identity matrix. And this is the same as
asking that the system's coefficient matrix be singular. We can test
for singularity by asking for which λ the matrix
(5-λ    2  )
(  3   4-λ )
has determinant 0. It is easy to find the determinant
of this matrix: it is (5-λ)(4-λ)-6=λ^2-9λ+14.
VOCABULARY
TIME!
If A is an n-by-n matrix, and we let In be the n-by-n
identity matrix (1's on the diagonal with 0's elsewhere) then the
characteristic polynomial of A is det(A-λIn). This
is a polynomial of degree n. The roots of the characteristic
polynomial are the eigenvalues of A.
The roots of λ^2-9λ+14=0 are 7 and 2. Nice people could
factor this polynomial, which will work since it is an example in a
classroom. People who wear suspenders and a belt on their
pajamas use the quadratic formula. By the way, there is no general
formula for the roots of a polynomial, and frequently numerical
approximation schemes are needed. Knowing these values of λ doesn't end our task. I would like
to find eigenvectors for each of these eigenvalues.
λ=7
Go back to the homogeneous system
(5-λ)x1+2x2=0
3x1+(4-λ)x2=0
and plug in λ=7 to get
-2x1+2x2=0
3x1-3x2=0
and the equations are a multiple
of one another, so solutions are what solves
x1-x2=0. One solution is (1,1). Another is
(-46,-46). Etc.
QotD
Find an eigenvector associated to the eigenvalue 2.
HOMEWORK
We'll be
ending the official linear algebra part of the course this coming
week. You should try to read the remainder of the part of chapter 8
which we
will cover.
Please hand in these problems on Monday: 8.5: 33, 37, 39, 8.8: 6, 21.
Let's see: some operational matters were discussed. What the heck do I
mean? First, I mentioned that Mr. O'Sullivan had carefully reviewed his
exam as I recommended, and indeed had found a place where my grading
was incorrect. I changed (and increased!) his grade, and again
would like to recommend that students look at their exams and check that the
grading was done correctly.
Mr. Naumann kindly put the answer to
the previous QotD on the board. It is necessary to know a bit
about determinants to get the correct sign. Your "feelings" based on
2-by-2 and 3-by-3 determinant experience won't
always be correct.
The last homework problem assigned (8.6: 58) was probably the most
realistic we'll have this whole semester. The idea is to "approximate"
the solution of a partial differential equation (to be discussed later
in the course). The matrix involved
(-4 1 0 1 )
( 1 -4 1 0 )
( 0 1 -4 1 )
( 1 0 1 -4 )
should really be seen as an example of a much larger matrix, with lots
of 0 entries and sort of "stripes" of 1's and -4's running
diagonally. This is an example of what is called a sparse
matrix (that term has almost two million Google
entries!). An "average" (?) matrix might have n2 non-zero
entries. A sparse matrix will have about (Constant)n non-zero
entries. Computations with sparse matrices can frequently be done
faster and more efficiently than with general matrices.
I then remarked that my accumulation of boxed definitions is being written partly
because on the next exam I will ask for the definitions of
linear algebra objects. Little partial credit will be given or
can be earned with such questions: there's really only one way to
answer questions about definitions. I therefore respectfully request
that students be familiar with these definitions.
So the stuff last time was really background. I don't think you need
to know everything I discussed, but I do honestly believe that
engineers should have some feeling for the real definition,
even if it is very painful (and it is, indeed it is). Determinants
come up constantly in complicated calculations. If you ever see big formulas involving sums of products with lots of alternating
signs in applications, my guess is that some determinants
will be hiding behind the curtains, and maybe if you recognize this
you can understand the situation and the computations better.
If we look at the definition of determinant, you
can see that for every rook that's in the upper right-hand quadrant of
another rook, that other rook is in the lower left-hand quadrant of
the first rook. From this I hope you are willing to believe that the
determinant of a matrix is the same as the determinant of its
transpose. Ooops: I need to define transpose:
If B is a p-by-q matrix, then Bt, the transpose of
B, is a q-by-p matrix, and the (i,j)th entry of
Bt is the (j,i)th entry of B.
( 5 z 27 4 ) ( 5 0 r )
If B = ( 0 k -1 2 ) then Bt is ( z k 2 )
( r 2 .3 1 ) ( 27 -1 .3 )
( 4 2 1 )
Fact: If A is a square matrix, then det(A)=det(At)
Another (and very useful!) consequence is that to the determinant, rows
and columns are "the same". Any determinant fact regarding rows has a
corresponding fact regarding columns.
Row operations and determinants
We discussed how row operations change determinants. I tried to indicate
why the first and second of the following facts were true using
the definition of determinant which I gave, but I could not, considering
the time allocation and objectives of the course, tell why the last
fact is true.
Row operations and their effects on determinants
  The row operation                 What it does to det
  Multiply a row by a constant      Multiplies det by that constant
  Interchange adjacent rows         Multiplies det by -1
  Add a row to another row          Doesn't change det
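The linalg package has commands mulrow, swaprow, and addrow which perform single row operations (at least I believe those are the names), so you can watch the table happen on a small example:
> with(linalg):
> A:=matrix(2,2,[3,7,-2,5]): det(A); # 29
> det(mulrow(A,1,10)); # 290: multiplying row 1 by 10 multiplies det by 10
> det(swaprow(A,1,2)); # -29: interchanging two rows flips the sign
> det(addrow(A,1,2,1)); # 29: adding row 1 to row 2 does not change det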
Examples
Suppose A is this matrix:
( -3 4 0 18 )
( 2 -9 5 6 )
( 22 14 -3 -4 )
( 4 7 22 5 )
Then the following matrix has determinant twice the value of det(A):
( -6 8 0 36 )
( 2 -9 5 6 )
( 22 14 -3 -4 )
( 4 7 22 5 )
because the first row is doubled.
Also, the following matrix
has determinant -det(A)
( -3 4 0 18 )
( 22 14 -3 -4 )
( 2 -9 5 6 )
( 4 7 22 5 )
because the second and third rows are interchanged.
Notice that you've got to keep track of signs: interchanging, say, the
second and fourth rows can be accomplished by three adjacent interchanges,
so the determinant gets multiplied by (-1)^3=-1, not left unchanged.
The following matrix has the same value of determinant as det(A)
( -3 4 0 18 )
( 2 -9 5 6 )
( 24 5 2 2 )
( 4 7 22 5 )
because I got it by adding the second row to the
third row and placing the result in the third row.
Math 250 again
I also mentioned a beautiful (?) example of another exam problem from Math
250: suppose C is a 5-by-5 matrix, and det(C)=-7. What is det(2C)?
This takes a bit of thought, because we are actually multiplying each
of the 5 rows of C by 2. Therefore we multiply the determinant
by 2·2·2·2·2=2^5=32. So
det(2C)=(-7)·32=-224.
Silly examples (?)
Look:
   ( 1 2 3 )    ( 1 2 3 )
det( 4 5 6 )=det( 3 3 3 ) (row2-row1)
   ( 7 8 9 )    ( 3 3 3 ) (row3-row2)
Now if two rows are identical, the det is 0, since interchanging them
both changes the sign and leaves the matrix unchanged. So since
det(A)=-det(A), det(A) must be 0.
Look even more at this:
( 1 4 9 16 ) ( 1 4 9 16 ) ( 1 4 9 16 )
det( 25 36 49 64 )=det( 24 32 40 48 ) (row2-row1)=det( 24 32 40 48 )
( 81 100 121 144 ) ( 56 64 72 80 ) (row3-row2) ( 32 32 32 32 ) (row3-row2)
( 169 196 225 256 ) ( 88 96 104 112 ) (row4-row3) ( 32 32 32 32 ) (row4-row3)
so since the result has two identical rows, the determinant of the
original matrix must be 0.
There are all sorts of tricky things one can do with determinant
evaluations, if you want. Please notice that the linear systems gotten
from, say, the finite element method applied to important PDE's
definitely give coefficient matrices which are not random: they have
lots of structure. So the tricky things above aren't that ridiculous.
Use row operations to ...
One standard way of evaluating determinants is to use row operations
to change a matrix to either upper or lower triangular form (or even
diagonal form, if you are lucky). Then the determinant will be the
product of the diagonal terms. Here I used row operations (actually I
had Maple use row operations!) to change this random matrix (well, the
entries were produced sort of randomly by Maple) to an
upper-triangular matrix.
[1 -1 3 -1] And now I use multiples of the first row to create 0's
[4 4 3 4] below the (1,1) entry. The determinant won't change:
[3 2 0 1] I'm not multiplying any row in place, just adding
[3 1 3 3] multiples of row1 to other rows.
[1 -1 3 -1] And now multiples of the second row to create 0's
[0 8 -9 8] below the (2,2) entry.
[0 5 -9 4]
[0 4 -6 6]
[1 -1 3 -1] Of course, multiples of the third row to create
[0 8 -9 8] 0's below the (3,3) entry.
[0 0 -27/8 -1]
[0 0 -3/2 2]
[1 -1 3 -1 ] Wow, an upper triangular matrix!
[0 8 -9 8 ]
[0 0 -27/8 -1 ]
[0 0 0 22/9]
The determinant of the original matrix must be
1·8·(-27/8)·(22/9). Sigh. This should be -66,
which is what Maple told me was the value of the determinant
of the original matrix. And it is!
More generally if you need to interchange rows on the way to creating
an upper- or lower-triangular matrix, you may need to keep track of
signs in order to make sure that your final answer is correct.
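If you trust Maple to do the row operations, I believe linalg's gausselim command produces such an upper-triangular form in one step; a sketch:
> with(linalg):
> A:=matrix(4,4,[1,-1,3,-1, 4,4,3,4, 3,2,0,1, 3,1,3,3]):
> gausselim(A); # an upper-triangular matrix; its diagonal entries should multiply to det(A)
> det(A);       # -66, agreeing with 1*8*(-27/8)*(22/9)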
Minors
I asked for a definition of the word minor and the students
definitely did well. I was told:
- (Music) A minor key, scale, or interval.
- (Law) One who has not reached full legal age.
- Laborer who works in a mine [syn: mineworker] (o.k., spelled
miner but sort of eligible).
There are also Minor, Alabama
and Miner,
Missouri.
Let me try to stay "on-task":
Minors
If A is an n-by-n matrix, then the (i,j)th minor of A is
the (n-1)-by-(n-1) matrix obtained by throwing away the
ith row and jth column of A.
For example, if A is
[1 -1 3 -1]
[4 4 3 4]
[3 2 0 1]
[3 1 3 3]
Then the (2,3) minor is gotten by deleting the second row and the
third column:
>minor(A,2,3);
[1 -1 -1]
[3 2 1]
[3 1 3]
Of course I had Maple do this, with the
appropriate command.
Evaluating determinants by cofactor expansions
This field has a bunch of antique words. Here is another. It turns out
that the determinant of a matrix can be evaluated by what are called
cofactor expansions. This is rather weird. When I've gone
through the proof that cofactor expansions work, I have not really
felt enlightened. So I will not discuss proofs. Here is the
idea. Suppose A is an n-by-n matrix. Each
(i,j) position in this n-by-n matrix has an associated minor which
I'll call Mij. Then:
- For any i,
det(A) = SUM_{j=1}^n (-1)^(i+j) a_ij det(M_ij).
This is called expanding along the
ith row.
- For any j,
det(A) = SUM_{i=1}^n (-1)^(i+j) a_ij det(M_ij).
This is called expanding along the
jth column.
The (-1)^(i+j) pattern is an alternating pattern of +1's and -1's
starting with +1 at the (1,1) place (think again about {checker|chess}
boards).
Here: let's try an example. Suppose A is
[1 -1 3 -1]
[4 4 3 4]
[3 2 0 1]
[3 1 3 3]
as before. I asked Maple to compute the determinants of the
minors across the first row.
Here are the results:
> det(minor(A,1,1));
-3
> det(minor(A,1,2));
6
> det(minor(A,1,3));
-16
> det(minor(A,1,4));
-21
Remember that the first row is [1 -1 3 -1]
Now the sum, with the correct +/- signs, is
+1·(-3)-(-1)·6+3·(-16)-(-1)·(-21)
=-3+6-48-21=-66. But I already know that det(A)=-66.
Recursion and strategy
You should try some examples, of course. Doing examples is about the only
way I know to learn this stuff. If I had to give a short definition of
determinant, and if I were allowed to use recursion, I think that I
might write the following:
Input A, an n-by-n matrix.
If n=1, then det(A)=a_11.
If n>1, then
det(A) = SUM_{j=1}^n (-1)^(j+1) a_1j det(M_1j)
where M_1j is the (n-1)-by-(n-1) matrix obtained by
deleting the first row and the jth column.
This is computing det(A) by repeatedly expanding along the first
row. I've tried to write such a program, and if you have the time and
want some amusement, you should try this also. The recursive nature
rather quickly fills up the stack (n! is big big
big) so this isn't too practical. But there are
certainly times when the special form of a matrix allows quick and
efficient computation by cofactor expansions.
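For the amusement mentioned above, here is one way such a recursive program might look in Maple (a sketch of mine, built only from rowdim and the minor command used above):
> with(linalg):
> recdet:=proc(A) local n,j;
>   n:=rowdim(A);
>   if n=1 then A[1,1]
>   else add((-1)^(j+1)*A[1,j]*recdet(minor(A,1,j)),j=1..n)
>   end if
> end proc:
> recdet(matrix(4,4,[1,-1,3,-1, 4,4,3,4, 3,2,0,1, 3,1,3,3])); # should return -66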
More formulas
You may remember that we had a
A decision problem Given an n-by-n matrix, A, how can
we decide if A is invertible?
Here is how to decide:
A is invertible exactly when
det(A) is not 0.
Whether this is practical depends on the
situation.
There was also a
Computational problem If we know A is invertible, what is the
best way of solving AX=B? How can we create A^(-1)
efficiently?
Next time
I will finish up with formulas. I will tell you how to write
A^(-1) as a formula involving A (yes, division by det(A) is
part of the formula). See p. 383 of the text. Also, I will write a
formula for the solution of AX=B. This is called Cramer's
Rule. See p. 392 of the text. I just want to add the following
which I got from Wikipedia:
Cramer's rule is a theorem in linear algebra, which gives the solution of a system of linear equations in terms of determinants. It is named after Gabriel Cramer (1704 - 1752).
Computationally, it is generally inefficient and thus not used in practical applications which may involve many equations. However, it is of theoretical importance in that it gives an explicit expression for the solution of the system.
HOMEWORK
Please read the book and do some of the suggested textbook problems!
I briefly discussed the previous QotD. A solution was presented by
Mr. Albert. Some students in Math 250
painfully found the inverse of the inverse of A, and then took the
inverse of that to compute the solution to the problem. Sigh.
A fact about inverses: (A^(-1))^(-1)=A
Why? Look: A(A^(-1))=In declares that the matrix
which multiplies A^(-1) to get In is A. That
exactly means the inverse of the inverse is the matrix itself.
And another: left and right inverses of a matrix are the same.
B is a left inverse of A if BA=In. C is a right
inverse of A if AC=In. Since matrix multiplication is
not generally commutative, we might be suspicious that such B's and
C's might not be equal. They are, because matrix multiplication is
associative. Look at BAC and compute it two different ways:
BAC=(BA)C=InC=C and BAC=B(AC)=BIn=B. Therefore B
and C must be equal.
Solving linear equations: contrasting two approaches
Suppose A is a square, n-by-n coefficient matrix, and we want to solve
AX=B. Here X and B are n-by-1 matrices, "column vectors". There's an algorithmic approach. We create A^(-1)
and then X=A^(-1)B is the answer. What if we wanted to analyze
more subtle aspects of the AX=B equation? For example, we could
imagine that the system models a physical situation where some of the
entries in A could be changed. We might want to know how to analyze
the results of such changes. In that case, it might be more efficient
to have a formula for the entries of X in terms of the entries
of B and A. Here is the n=2 case. The solutions of the system
ax1+bx2=y1
cx1+dx2=y2
are
y1d-y2b
x1 = -------
ad-bc
y2a-y1c
x2 = -------
ad-bc
These are explicit formulas. Of course, I don't know
what to do if ad-bc=0 but ... We shouldn't expect the general case
(n-by-n) to be easy or obvious. The formulas will be complicated. If
you think about it, A^(-1)Y will be n different formulas (all
of the entries in X) involving n^2+n variables (all of the
entries in A and in Y). So if you want explicit formulas, they do exist,
but the computational load is significant.
Oh yeah
The objects appearing in the top and bottom of the n=2 formulas are
called determinants. The general formula is called Cramer's
Rule and also involves a quotient of determinants. What Mr. Wolf was jumping up and down in his
eagerness to have me say last time was the following:
A square matrix has rank=n exactly when its determinant is not 0.
Therefore we can "check" for rank n merely (?) by computing the
determinant. Maybe we had better learn about determinants.
The official definition of determinant
This material is not in the textbook, and you will soon see why. It is
lengthy and complicated. You should take a look at this approach, and
learn what you can from it. Or maybe learn what you don't want to
know! I want to find the determinant of an n-by-n
matrix, A. So here we go:
First imagine that we have an n-by-n chessboard. Recall that a rook
(the thing that looks like a castle) on a chessboard can move freely
on rows and columns (ranks and files: thank you, Mr. Clark). Anyway, I asked students for the
largest number of rooks which could be put on a chessboard so that
they don't attack one another. We thought for a while, and then decided
that we could put one rook in each column and each row: thus a rook
arrangement could have n rooks. In the diagrams below, I will use *
for a rook placed on the board and 0 to indicate an empty position on
the board.
How many different rook arrangements are there? Well, there are n
places to put a rook in the first column. Once a rook is placed there,
a whole row is eliminated for further rook placement. So there are n-1
places (non-attacking places!) to put a rook in the second column. And
there are n-2 in the third column, etc. Some thought should convince you
that there are n! (n factorial) different rook arrangements. n! grows
very fast with n. Everyone should know, even just vaguely, the
Stirling approximation to n!. This says:
n! is approximately (n/e)^n·sqrt(2·Pi·n).
In fact, Maple tells me
that 20! is exactly
2432 90200 81766 40000 while the Stirling
approximation is
2422 78684 67611 33393.1 (about)
But I am not
particularly interested in specific values of factorials. The
important fact is that the factorial function is superexponential, and
grows much faster, for example, than any polynomial. Computational
tasks which have some flavor of factorial amount of work should be
done very carefully and reluctantly.
Each rook arrangement has a sign or signature attached to it, either a
+ or -. This sign is gotten by computing (-1)^# where what
matters is the parity (even or oddness) of the number, #. What's the
number? For each rook, look in the rook's "first quadrant", up and to
the right. Count the rooks there. Then take the total of all of the
"first quadrant" rooks for each of the rooks. That's the number, #.
Here are two examples.
(* 0 0 0 0) First quadrant has 0 rooks. (0 0 0 0 *) First quadrant has 0 rooks.
(0 0 * 0 0) First quadrant has 0 rooks. (0 0 0 * 0) First quadrant has 1 rook.
(0 * 0 0 0) First quadrant has 1 rook. (* 0 0 0 0) First quadrant has 2 rooks.
(0 0 0 0 *) First quadrant has 0 rooks. (0 * 0 0 0) First quadrant has 2 rooks.
(0 0 0 * 0) First quadrant has 1 rook.  (0 0 * 0 0) First quadrant has 2 rooks.
1+1=2=#.                                1+2+2+2=7=#.
The first # is even, so the sign of the first rook arrangement is +1; the
second # is odd, so its sign is -1. In general, it turns out that half of the rook
arrangements have - signs and half have + signs.
The official definition of determinant is as
follows:
Now for each rook arrangement, take the product of the terms in the
matrix A which are in the rook places: the product of n entries of
A. Then prefix this by the sign, (-1)^#, as mentioned
above. And, then, finally, take the sum of this strange signed product
over all n! rook arrangements. This sum is det(A).
And that was the official definition of
determinant.
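Since the definition is really just a program, here is a Maple sketch of it (my own names; it assumes the combinat package's permute command, which lists all the rook arrangements as permutations):
> with(linalg): with(combinat,permute):
> rookdet:=proc(A) local n,p,i,j,k,inv,total;
>   n:=rowdim(A); total:=0;
>   for p in permute(n) do   # p[i] is the column of the rook in row i
>     inv:=0;                # count the "first quadrant" pairs
>     for i to n do for j from i+1 to n do
>       if p[i]>p[j] then inv:=inv+1 end if
>     end do end do;
>     total:=total+(-1)^inv*mul(A[k,p[k]],k=1..n);
>   end do;
>   total
> end proc:
> B:=matrix(3,3,[0,-1,1, -1,1,2, 1,2,1]):
> rookdet(B), det(B); # both should be -6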
The official definition is ridiculous and complicated. In fact, it is
almost so complicated that one might suspect it is a joke. What is
even more amusing is that determinants occur in almost every area of
mathematics and its applications. The weirdest coincidences occur
constantly.
Determinants when n=2
The matrix A is
(a b)
(c d)
Formula for 2-by-2 determinants:
  Rook arrangement     (* 0)          (0 *)
                       (0 *)          (* 0)
  #                      0              1
  Sign                 (-1)^0 = +     (-1)^1 = -
  Product with sign     +ad            -bc
Total, which is det(A): ad-bc
Determinants when n=3
The matrix A is
(a b c)
(d e f)
(g h i)
Formula for 3-by-3 determinants:
  Rook arrangement     #    Sign         Product with sign
  (* 0 0)
  (0 * 0)              0    (-1)^0 = +        +aei
  (0 0 *)

  (* 0 0)
  (0 0 *)              1    (-1)^1 = -        -afh
  (0 * 0)

  (0 * 0)
  (* 0 0)              1    (-1)^1 = -        -dbi
  (0 0 *)

  (0 0 *)
  (* 0 0)              2    (-1)^2 = +        +dch
  (0 * 0)

  (0 * 0)
  (0 0 *)              2    (-1)^2 = +        +bfg
  (* 0 0)

  (0 0 *)
  (0 * 0)              3    (-1)^3 = -        -gec
  (* 0 0)

Total, which is det(A): aei-afh-dbi+dch+bfg-gec
Determinants when n=4 (enough already!)
Here is what Maple says:
> A:=matrix(4,4,[a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p]);
[a b c d]
[ ]
[e f g h]
A := [ ]
[i j k l]
[ ]
[m n o p]
> det(A);
a f k p - a f l o - a j g p + a j h o + a n g l - a n h k - e b k p +
e b l o + e j c p - e j d o - e n c l + e n d k + i b g p - i b h o -
i f c p + i f d o + i n c h - i n d g - m b g l + m b h k + m f c l -
m f d k - m j c h + m j d g
And this doesn't help me much. You can check, if you want, that there
are 24 (that is, 4!) terms, and that half (12 of them) have + signs,
while the other half has - signs. Although I (and everyone!) know ways of
remembering det for n=2 and n=3, I (at least) don't know a way to
remember n=4. And why should I?
I found the determinant of an 8-by-8 matrix. I think the matrix looked
like this:
( 1 17 -Pi sqrt(3) .007 19 5.023 -2 )
( 0 1 .3 278 -3 3 .007 19 )
( 0 0 1 e ln(2) 0 sin(5) 38 )
( 0 0 0 1 Pi/2 -4 8 2.01 )
( 0 0 0 0 1 1/3 16 99 )
( 0 0 0 0 0 1 1023 e^Pi )
( 0 0 0 0 0 0 1 -2.3 )
( 0 0 0 0 0 0 0 1 )
Well, there are 8!=40,320 rook arrangements. But due to the "geometry"
of this matrix, one, exactly one rook arrangement will
not have a 0 in a place a rook is sitting. Where can we put a
rook in the first column so it won't sit on a 0? Just in the top
row. Now, in the second column, we cannot put a rook in the top row
because that's already taken, so (for it to be "sitting" on top of a
non-zero entry, the rook must be placed in the (2,2)
position. Etc. The only rook arrangement is the (main) diagonal
one. The associated sign is + because each of the rooks has
no companion in its first quadrant, and
(-1)^0=+1. The product of the diagonal entries is 1, so the
determinant is 1.
Maybe a strategy to compute determinants
The determinant of an upper-triangular matrix, such as the one above,
is easy: the determinant is the product of the main diagonal elements.
A matrix A=(a_ij) is upper-triangular if a_ij=0
when i>j (I hope this is correct!). What will our strategy be? We
will take a "random" matrix A, use row operations to transform A into
an upper-triangular matrix (we will need to pay some penalties while
we're doing this), and then find the determinant of the resulting
upper-triangular matrix. So next time we'll discover what determinant
"penalties" we need to pay for using row operations on a matrix.
QotD
What is the determinant of
( 0 0 0 0 E )
( 0 0 0 D 0 )
( 0 0 C 0 0 )
( 0 B 0 0 0 )
( A 0 0 0 0 )
This is ABCDE multiplied by a sign, and the # here
is 4+3+2+1=10, so (-1)^10=1, and the determinant is ABCDE.
HOMEWORK
Much of this
material is not in the text. Please: it is very
necessary that you read the textbook's relevant sections:
8.4 and 8.5 and 8.7. These sections contain very useful material and
not everything can be discussed in class.
Please hand in these problems on Monday: 8.6: 25, 28, 49, 58.
'Tilda
Compliments were accepted about the presentation of the beautiful and
peaceful monster, 'Tilda, which eats matrices and produces
their RREF's.
The monster diagram
I also indicated that I wanted the diagram on p. 366
"internalized". Again, I certainly do not have it memorized, nor do I believe that students should remember it exactly. But:
the information in the diagram is always "around" when I am dealing
with linear algebra problems.
I will today solve some systems of
linear equations (by hand!) with this diagram in mind. The coefficient
matrix for the systems I'll write is
( 3 7 )
(-2 5 )
I remarked that this is sort of a random(ish) 2-by-2 matrix. I
selected it for small entries so I could do arithmetic conveniently in
public. Also I wanted the entries to "look" random, so I made them
silly. I did remark the following: we know that the rank of the
coefficient matrix plays a very important part in the behavior of the
solutions (look again at the diagram if you don't
believe me). Most of the time, random matrices seem to want to have
the highest possible rank (most degrees of freedom). This is maybe not
so apparent in small matrices with small entries, but people have
studied large matrices with entries that have good reason to seem
random, and almost surely, a random matrix wants to have high
rank. For example, if in an application you are considering a 50-by-50
matrix (if it has real entries, that's two thousand five hundred
numbers!), it really almost should have rank 50. If in your
application or model, your matrices seem to all have rank 35, there's
probably some underlying reason or symmetry or connection between
pieces of the matrix, reflecting possibly some internal structure you
may not have recognized.
Solving a few systems
I wrote and solved some linear systems. Here they are, and here is
(more or less) how I solved them.
The first example
3x1+7x2=1
-2x1+5x2=0
Our technique is to write a coefficient matrix, augment it with the
constants, and then row reduce.
( 3 7 | 1 ) ~ ( 1 7/3 | 1/3 ) ~ ( 1 0 | 5/29 )
(-2 5 | 0 ) ( 0 29/3 | 2/3 ) ( 0 1 | 2/29 )
Every time I try arithmetic in public ... well, this really isn't too
bad.
The first ~ Divide the first row by
3, multiply the resulting first row by 2 and add it to the second row.
The second ~ Divide the second row
by 29/3, multiply the resulting second row by -7/3 and add
it to the first row.
The resulting system of equations is exceptionally simple, of
course:
x1=5/29 and x2=2/29.
I think I checked the suggested answer in at least one equation, maybe
the first:
3x1+7x2=1 becomes 3(5/29)+7(2/29) which
certainly does equal 1. I was happy.
The second example
3x1+7x2=0
-2x1+5x2=1
Some muttering occurred but students let me
talk on ...
Our technique is to write a coefficient matrix, augment it with the
constants, and then row reduce.
( 3 7 | 0 ) ~ ( 1 7/3 | 0 ) ~ ( 1 0 | -7/29 )
(-2 5 | 1 ) ( 0 29/3 | 1 ) ( 0 1 | 3/29 )
Every time I try arithmetic in public ... well, this really isn't too
bad.
The first ~ Divide the first row by
3, multiply the resulting first row by 2 and add it to the second row.
The second ~ Divide the second row
by 29/3, multiply the resulting second row by -7/3 and add
it to the first row.
The resulting system of equations is exceptionally simple, of
course:
x1=-7/29 and x2=3/29.
I think I checked the suggested answer in at least one equation, maybe
the second:
-2x1+5x2=1 becomes -2(-7/29)+5(3/29) which
indeed, indeed is 1. I was even happier.
The third example
3x1+7x2=8
-2x1+5x2=10
Now the muttering was louder as students
discussed whether they could still switch into another section of the
course. I continued to talk.
Our technique is to write a coefficient matrix, augment it with the
constants, and then row reduce.
( 3 7 | 8 ) ~ ( 1 7/3 | 8/3 ) ~ ( 1 0 | -30/29 )
(-2 5 | 10 ) ( 0 29/3 | 46/3 ) ( 0 1 | 46/29 )
Every time I try arithmetic in public ... well, this really isn't too
bad.
The first ~ Divide the first row by
3, multiply the resulting first row by 2 and add it to the second row.
The second ~ Divide the second row
by 29/3, multiply the resulting second row by -7/3 and add
it to the first row.
The resulting system of equations is exceptionally simple, of
course:
x1=-30/29 and x2=46/29.
I think I checked the suggested answer in at least one equation, maybe
the first:
3x1+7x2=8
becomes 3(-30)/29+7(46)/29 which is indeed, indeed, indeed 8. I was
happiest, until ...
Thinking about what we did
Maybe we should think about what we did. Certainly a number of
questions arise. I can think of two questions, immediately.
- What the heck is the purpose of the darn 29's which
keep showing up in all of these computations?
O.k., wait until Thursday, please, when we might
investigate them.
- There's a great deal of similar
work going on. Maybe we should be a little bit clever about what
we're doing so that we can understand and be more efficient.
And now some more
Well, I could think a bit about what's going on. We are really
studying a linear system. Here it is:
3x1+7x2=y1
-2x1+5x2=y2
Maybe look at the idea:
|-------------|
y1-->| SIMPLE LIN- |-->x1
y2-->| EAR SYSTEM |-->x2
|-------------|
The system of linear equations reflects some sort of "device", if you
wish. The inputs, the pair of y's, push some information out to
the x's. Everything is linear, a very simple model which is hardly
ever fulfilled in real life (if you believe in Hooke's Law, pull a
rubber band for twenty feet!). But if we multiply (in this
model) by, say, 365, then the x's will get multiplied by 365. If,
also, we have two different pairs of y's and add them, and compare the
pairs of x's, the result should be the sum of the outputs, the x's,
from the two (pairs of) inputs.
TAKE ADVANTAGE OF LINEARITY!!!
Well the x's which correspond to the input of
(1) are the pair (5/29)
(0) (2/29)
and the x's which correspond to the input of
(0) are the pair (-7/29)
(1) ( 3/29)
Now in the third example, we want the x's which correspond to
( 8) = 8(1)+10(0)
(10) (0) (1)
but if you believe in what I wrote above, this must be
8(5/29)+10(-7/29)=({40-70}/29)=(-30/29)
(2/29) ( 3/29) ({16+30}/29) ( 46/29)
Hey, hey: this is the answer we got. It certainly should be.
The inverse of a matrix
Of course this should all be systematized. Well, the answer can
be recognized as a matrix product:
( 5/29 -7/29 ) ( 8 )
( 2/29 3/29 ) ( 10 )
and the 2-by-2 matrix which appears there is called A inverse,
usually written A-1. If you multiply A by this matrix,
the result is the matrix
(1 0)
(0 1)
which is called the 2-by-2 identity matrix.
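Maple will happily produce this inverse and redo the third example; here is a sketch using the linalg inverse command that appears elsewhere in this diary:
> with(linalg):
> A:=matrix(2,2,[3,7,-2,5]):
> inverse(A); # should have rows [5/29, -7/29] and [2/29, 3/29]
> evalm(inverse(A)&*matrix(2,1,[8,10])); # the solution of the third example: -30/29 and 46/29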
Definition
Suppose A is an n-by-n matrix. Then the inverse of A,
usually written A^(-1), is an n-by-n matrix whose product
with A is the n-by-n identity matrix, In (a square matrix
with diagonal entries equal to 1 and off-diagonal entries equal to 0).
Now if
A=( 3 4 ) and, say, C= ( -3 2 ) you can check that AC=(1 0)=I2
( 5 6 ) ( 5/2 -3/2) (0 1)
so that C is A^(-1). Solving the linear system AX=B where
X=(x) and B=(u)
(y) (v)
can be done by multiplying AX=B by C on the left, and we get
C(AX)=CB so (CA)X=CB so I2X=CB so X=CB where
C=A^(-1). The identity matrix is a multiplicative
identity.
An algorithm for finding inverses
If A is a square matrix then augment A by In, an identity
matrix of the same size: (A|In). Use row reduction to get
(In|C). Then C will be A^(-1). If row reduction is
not successful (the In doesn't appear), A does not have an
inverse (this can happen: rank(A) could be less than n).
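Here is a sketch of that algorithm in Maple for the 2-by-2 example above. I am assuming linalg's augment and rref commands behave the way I remember (augment glues matrices side by side, rref row reduces):
> with(linalg):
> A:=matrix(2,2,[3,4,5,6]):
> rref(augment(A,matrix(2,2,[1,0,0,1])));
> # the left half should become I2 and the right half should be the inverse,
> # with rows [-3, 2] and [5/2, -3/2]: the matrix C above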
How much work is computing A^(-1) this way? Here I used
the word "work" in a rather elementary sense. I tried to convince
people that a really rough bound on the number of arithmetic
operations (add, multiply, divide, etc.) to find A^(-1) using
this method is, maybe, 6n^3.
Where did this come from? Each time we clear a column on the right to
make another column of the identity matrix, we might need to divide or
multiply a whole row by a number and add or subtract it from another
row. This is about 2n operations per row (more like 6n if you count a bit
more carefully), and each column requires doing this to about n rows. But
we have n columns to clear. So there should be about
6n^3 work. In fact, if you are careful and clever, the
coefficient can be reduced fairly easily (but not a lot). More
significant in the "real world" is the exponent, which really pushes
the computational growth. If you are very clever, the
exponent can be reduced, but not by a lot. I also mentioned that of
course the storage for this computation is about 2n^2 (the
size of the augmented matrix). All of this actually isn't too
bad. Finding A^(-1) this way is a problem which can be
computed in polynomial time. Such problems are actually
supposed to be rather nice computationally.
Vocabulary
A matrix which has no inverse is called singular. A matrix
which has an inverse is called invertible or
non-singular or regular.
How can it fail
The algorithm can fail if there is a column which has only 0's where
there should be something non-zero. This occurs if the rank of the
matrix is not n.
AX=B, where A is an n-by-n matrix
  Rank considerations      rank(A)=n                           rank(A)<n
  Name(s)                  Full rank; regular; non-singular;   Singular
                           invertible
  Homogeneous equation     AX=0 has only the trivial            AX=0 has infinitely many solutions
                           solution.                            in addition to the trivial solution.
  Inhomogeneous equations  AX=Y has a unique solution for       There are Y's for which AX=Y has no
                           all Y. If you know A^(-1), then      solution; for all other Y's, AX=Y has
                           X=A^(-1)Y.                           infinitely many solutions.
There are really two problems here.
- A decision problem Given an n-by-n matrix, A, how can
we decide if A is invertible?
- Computational problem If we know A is invertible, what is the
best way of solving AX=B? How can we create A^(-1)
efficiently?
A Math 250 exam problem, which was the QotD
I gave a problem very much like this on a Math 250 exam. So:
Suppose A is a 3-by-3 matrix and you know that A^(-1) is
(3 1 3)
(2 0 -1)
(2 2 2)
and that Y=
( 5)
(-2)
( 3).
Find X solving AX=Y.
What RREF does in general
Take your original augmented matrix, and put
the coefficient matrix into RREF. Then you get something like
(BLOCK OF 1's & 0's WITH| JJJJJJ UU UU NN N K K | Linear stuff )
(THE 1's MOVING DOWN AND| JJ UU UU N N N KK | from original)
(TO THE RIGHT. | JJJJ UUUUUU N NN K K | right sides )
(--------------------------------------------------------------------)
( MAYBE HERE SEVERAL | More such )
( ROWS OF 0'S | linear stuff )
Please observe that the lower right-hand corner now plays the
part of the compatibility conditions which must be satisfied. All of
those linear "fragments" must be equal to 0 if the original system has
solutions. Now if these are 0, we can "read off" solutions in much the
same manner as the example. The block labeled JUNK in fact
tells us with its width how many free parameters there are in the
solutions. Notice that the JUNK block could have size 0 (for
example, consider the silly system x1=567,672, already in
RREF!) and in that case the system would have only one solution.
Problem
Are the vectors (4,3,2) and (3,2,3) and (-4,4,3) and (5,2,1) in
R3 linearly independent? Now I wrote the vector
equation:
A(4,3,2)+B(3,2,3)+C(-4,4,3)+D(5,2,1)=0 (this is (0,0,0) for this
instantiation of "linear independence") which gives me the system:
4A+3B-4C+5D=0 (from the first components of the vectors)
3A+2B+4C+2D=0 (from the second components of the vectors)
2A+3B+3C+1D=0 (from the third components of the vectors)
and therefore would need to row reduce
( 4 3 -4 5 )
( 3 2 4 2 )
( 2 3 3 1 )
I started to do this, but then ... thought a bit. My goal was to get
the RREF of this matrix, and use that to argue about whether the
original system had solutions other than the trivial solution.
What can these RREF's look like? Let me write down all of the RREF's
possible for a 3-by-4 coefficient matrix.
Those RREF's which have three initial 1's:
( 1 0 0 * ) ( 1 0 * 0 ) ( 1 * 0 0 ) ( 0 1 0 0 )
( 0 1 0 * ) ( 0 1 * 0 ) ( 0 0 1 0 ) ( 0 0 1 0 )
( 0 0 1 * ) ( 0 0 0 1 ) ( 0 0 0 1 ) ( 0 0 0 1 )
Those RREF's which have two initial 1's:
( 1 0 * * ) ( 1 * 0 * ) ( 1 * * 0 ) ( 0 1 0 * ) ( 0 1 * 0 ) ( 0 0 1 0 )
( 0 1 * * ) ( 0 0 1 * ) ( 0 0 0 1 ) ( 0 0 1 * ) ( 0 0 0 1 ) ( 0 0 0 1 )
( 0 0 0 0 ) ( 0 0 0 0 ) ( 0 0 0 0 ) ( 0 0 0 0 ) ( 0 0 0 0 ) ( 0 0 0 0 )
Those RREF's which have one initial 1:
( 1 * * * ) ( 0 1 * * ) ( 0 0 1 * ) ( 0 0 0 1 )
( 0 0 0 0 ) ( 0 0 0 0 ) ( 0 0 0 0 ) ( 0 0 0 0 )
( 0 0 0 0 ) ( 0 0 0 0 ) ( 0 0 0 0 ) ( 0 0 0 0 )
Finally, the only RREF with no initial 1:
( 0 0 0 0 )
( 0 0 0 0 )
( 0 0 0 0 )
In all of these 3 by 4 matrices, the entry * stands for something
which could be any number, zero or non-zero.
I hope I have them all. The point of this silly exercise was to base
my conclusions upon some structural element of the RREF rather
than to do particular arithmetic with the matrix. I am interested in
whether the homogeneous system represented by these matrices has
non-trivial solutions.
Let's look at the first answer:
( 1 0 0 * )
( 0 1 0 * )
( 0 0 1 * )
Here the system is A+*D=0 and B+*D=0 and C+*D=0. There is no
constraint on what D must be. Therefore there are actually non-trivial
solutions. So any RREF with any *'s in it represents a homogeneous
system with non-trivial solutions.
How about a matrix with no *'s at all? Here is an example:
( 0 0 1 0 )
( 0 0 0 1 )
( 0 0 0 0 )
Now the system is C=0 and D=0. The variables A and B are assigned
columns which are all 0's, and therefore they have nothing to do with
the system. Their values are unconstrained. Thus there are again
solutions to the original homogeneous system.
Look at all of the RREF's displayed. They have *'s or columns which
are all 0's or, sometimes, both. In all cases, the homogeneous
system has non-trivial solutions. The maximum number of initial 1's is
3, but there are 4 columns, and that inequality (4>3) produces
either *'s or columns of 0's. This holds in general.
A homogeneous system with more variables than equations always has
an infinite number of non-trivial solutions.
|
Therefore, for example, 377 vectors in R245 must be
linearly dependent. This isn't entirely obvious to me.
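For the specific vectors in the problem above, here is what Maple reports (a check I'm adding here, not something done in class; the fractions are my own arithmetic, so please verify):
>with(linalg):
>M:=matrix(3,4,[4,3,-4,5,3,2,4,2,2,3,3,1]):
>rref(M);
                      [1    0    0     52/47]
                      [                     ]
                      [0    1    0     -7/47]
                      [                     ]
                      [0    0    1    -12/47]
This is the very first RREF shape displayed above, with *'s in the fourth column: D is free, so there are non-trivial solutions, and the four vectors are indeed linearly dependent.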
Many problems in engineering and science turn out to be successfully
modeled by systems of linear equations. My examples so far have mostly
been "inspired" by the things I believe you will see in studying ODE's
and PDE's, and have also been influenced by some of the primary
objects in numerical analysis.
My class metaphor here was the reduced row echelon monster, whose name
is 'Tilda. 'Tilda roams the linear algebra woods, and eats matrices,
and excretes them in RREF. 'Tilda is a large and peaceful lizard, with
a much simpler internal construction than, say, the derivative
monster. The internal organs of 'Tilda only include add, subtract,
multiply, and divide, for use in the row operations. The derivative
monster is much more ill-tempered, because it must deal with many
different digestive processes.
The textbook has a wonderful diagram, a sort of flow chart, for
solving linear systems on p. 366. I think our discussions in class
have been sufficient for you to verify the diagram, and the diagram
contains just about everything you need to know about the
theory behind solution of systems of linear equations. The only
remaining definition you need is that of the rank of a
matrix. The rank is the number of non-zero rows in the RREF of the
matrix. The 3-by-4 RREF's displayed above are shown with rank=3, then
rank=2, then rank=1, and finally, rank=0.
Here is my version of the diagram in HTML:
For m linear equations in n unknowns AX=B
Two cases: B=0 and B not 0. Let rank(A)=r.
AX=0
|
|
|
\ /
v
-----------------------------------
| |
| |
| |
\ / \ /
v v
Unique sol'n: X=0 Infinite number of sol'ns.
rank(A)=n A rank(A)<n, n-r arbitrary
parameters in the sol'n B
AX=B, B not 0
|
|
|
\ /
v
-----------------------------------
| |
| |
| |
\ / \ /
v v
Consistent Inconsistent
rank(A)=rank(A|B) rank(A)<rank(A|B) E
|
|
|
\ /
v
----------------------------------
| |
| |
| |
\ / \ /
v v
Unique solution Infinite number of sol'ns
rank(A)=n C rank(A)<n
n-r arbitrary parameters
in the sol'n D
The red letters refer to examples
which I will give to illustrate each outcome. Also
(vocabulary!), consistent means the system has
solutions, and inconsistent means there are none. Here is a
tabular listing of the alternatives, if you find this more
palatable. The B=0 case, the homogeneous system, always has the
trivial solution (used for, say, deciding linear independence). So the
B=0 case is always consistent. Two alternatives can occur:
AX=0: m equations in n unknowns; B=0
 I.  Unique sol'n: X=0; rank(A)=n.   (Example A)
 II. Infinite number of solutions; rank(A)<n, with n-r arbitrary
     parameters in the solutions.   (Example B)
When B is not zero then we've got:
AX=B: m equations in n unknowns; B not 0
 Consistent: rank(A)=rank(A|B)
  III. Unique solution; rank(A)=n.   (Example C)
  IV.  Infinite number of solutions; rank(A)<n, with n-r arbitrary
       parameters in the solutions.   (Example D)
 Inconsistent: rank(A)<rank(A|B)
  V.   No solutions.   (Example E)
By the way, I do not recommend that you memorize this information. No
one I know has done this, not even the most compulsive. But everyone I
know who uses linear algebra has this installed in their brains.
As I mentioned in class, I thought that a nice homework assignment
would be for students to find examples of each of these (in fact,
there have already been examples of each of these in the lectures!).
The problem with any examples done "by hand" is that they may not
reflect reality. To me, reality might begin with 20 equations in 30
unknowns, or maybe 2,000 equations ....
The examples follow, and are (I hope!) simple. They are
different from the examples I gave in class, since I certainly left my
brain home before class. Remember: m=number of equations; n=number of
variables; r=rank(A), where A is the coefficient matrix.
Example A
2x+3y=0
-5x+7y=0
m=2; n=2; r=2. r=2 since
( 2 3)~( 1 3/2)~( 1 0 )
(-5 7) ( 0 29/2) ( 0 1 )
I think you could already have seen that the rows of this 2-by-2
matrix were linearly independent just by looking at it (don't try
"just looking at"
a random 500-by-300 matrix!), so r=2. There is a
unique solution, x=0 and y=0, the trivial solution.
Here's another example of this case:
2x+3y=0
-5x+7y=0
4x+5y=0
m=3; n=2; r=2. r=2 since r is at least 2 using the previous row
reduction, and r can be at most 2 since the number of variables is 2.
There is a
unique solution, x=0 and y=0, the trivial solution.
Example B
2x+3y=0
m=1; n=2; r=1. The solutions are x=[3/2]t and
y=-t, or all scalar multiples of ([3/2],-1), a basis of
the 1 dimensional solution space.
Example C
2x+3y=1
-5x+7y=2
m=2; n=2; r=2. Here rank(A)=rank(A|B), and further row reduction
shows that
( 2 3 | 1)~( 1 3/2 | 1/2)~( 1 0 | 1/2 - (3/2)·(9/29))
(-5 7 | 2) ( 0 29/2 | 9/2) ( 0 1 | 9/29 )
There's exactly one solution, which row reduction has
produced: x=1/2-(3/2)·(9/29)=1/29 and y=9/29.
Example D
2x+3y=1
m=1; n=2; r=1. The solutions are
(-1,1)+t([3/2],-1), an infinite 1-parameter family of solutions.
Where did this come from? Here we first searched for a particular
solution of 2x+3y=1, and we guessed xp=-1 and
yp=1. (Don't try this with a big system: use row reduction
or, better, use a computer!). Then I looked at 2x+3y=0 and used the list
of all
solutions of the associated homogeneous system given in example
B, above: xh=[3/2]t and yh=-t. Now what? We
use linearity:
2xp+3yp=1
2xh+3yh=0 (a list of all solutions)
2(xp+xh)+3(yp+yh)=1.
Example E
2x+3y=1
-5x+7y=2
4x+5y=3
Now m=3; n=2; r=2. But look:
( 2 3 | 1) ( 1 3/2 | 1/2) ( 1 0 | 1/2 - (3/2)·(9/29))
(-5 7 | 2)~( 0 29/2 | 9/2)~( 0 1 | 9/29 )
( 4 5 | 3) ( 0 -1 | 1 ) ( 0 0 | 1+(9/29) )
I am lazy and I know that 1+(9/29) is not 0, so the row reduction
showed that rank(A)=2<rank(A|B)=3: case V, with no solutions.
Well, I understand almost everything in these solutions except where the
darn 29 comes from in the answers to examples C and E. Part of what we will do
next will explain where the 29 comes from.
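In the meantime, Maple can at least confirm the rank bookkeeping in the example E system (my check, not done in class):
>with(linalg):
>coef:=matrix(3,2,[2,3,-5,7,4,5]):         # coefficient matrix of example E
>aug:=matrix(3,3,[2,3,1,-5,7,2,4,5,3]):    # the augmented matrix
>rank(coef), rank(aug);                    # returns 2, 3
rank(A)=2 is less than rank(A|B)=3: inconsistent, just as the row reduction showed.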
HOMEWORK
Please read section 8.3 and hand in these problems on Monday: 8.3: 9, 13, 16, 19.
Capybaras?
"The Capybara (hydrochoerus hydrochaeris) is the world's largest
rodent. Full grown Capybaras weigh 35-65 kg and can reach up to 135
cm in length."
I began by writing the following on the side board.
A linear combination of vectors is a sum of scalar multiples of the
vectors.
A collection of vectors is spanning if every vector can be written as
a linear combination of vectors in the collection.
A collection of vectors is linearly independent if, whenever a
linear combination of the vectors is the zero vector, then every
scalar coefficient of that linear combination must be zero.
|
The language and ideas of linear algebra are used everywhere in
applied science and engineering. Basic calculus deals fairly nicely
with functions defined by formulas involving standard functions. I
asked how we could understand more realistic
data points. I will simplify, because I am lazy. I will assume
that we measure some quantity at one unit intervals. Maybe we get the
following:
Tabular presentation of the data (the web page also had a graphical
presentation, a plot of these six points):
 "Independent" variable (x) | "Dependent" variable (y)
             1              |            5
             2              |            6
             3              |            2
             4              |            4
             5              |           -2
             6              |            4
Piecewise linear interpolation
We could look at the tabular data and try to understand it, or we
could plot the data because the human brain has lots of processing
ability for visual information. But a bunch of dots is not good enough
-- we want to connect the dots. O.k., in practice this means that we
would like to fit our data to some mathematical model. For example, we
could try (not here!) the best fitting straight line or exponential
or ... lots of things. But we could try to do something simpler. We
could try to understand this data as just points on a piecewise linear
graph. So I will interpolate the data with line segments, and even
"complete" the graph by pushing down the ends to 0. The result is
something like what's drawn to the right.
I will call the function whose graph is drawn F(x).
Well, this interpolation function is not differentiable at various
points, but it is continuous and it certainly is rather simple. But in
fact it can be written in an even simpler form. Let me show you.
Meet the tent centered at the integer j
First,
here is complete information about the function Tj(x) which
you are supposed to get from the graph of Tj(x): (j
should be an integer)
This is a peculiar function. It is 0 for x<j-1 and for x>j+1. It
has height 1 at j, and interpolates linearly through the points
(j-1,0), (j,1), and (j+1,0). I don't much want an explicit formula for
Tj(x): we could clearly get one, although it would be a
piecewise thing. Actually, Tj(x)
could be written nicely in Laplace transform language!
We can write F(x) in terms of these Tj(x)'s. Moving (as we
were accustomed in Laplace transforms!) from left to right, we first
"need" T1(x). In fact, consider 5T1(x) and
compare it to F(x). I claim that these functions are exactly
the same for x<=1. Well, they are both 0 for x<=0. Both of these
functions linearly interpolate between the points (0,0) and (1,5), so
in the interval from 0 to 1 the graphs must be the same (two points
still do determine a unique line!). Now consider
5T1(x)+6T2(x) and F(x) compared for x's less
than 2. Since T2(x) is 0 for x<1, there is no interference in
the interval (-infinity,1]. But between 1 and 2, both of the "pieces"
T1(x) and T2(x) are non-zero. Moreover, the sum
5T1(x)+6T2(x) and F(x) match up at (1,5) and
(2,6) because we chose the coefficients so that they would. And both
of the "tents" are degree 1 polynomials so that their sum is also, and
certainly the graph of a degree 1 polynomial is a straight line, so
(again: lines determined by two points!) the sum
5T1(x)+6T2(x) and F(x) totally agree in the
interval from 1 to 2. Etc. What do I mean? Well, I mean that
F(x)=5T1(x)+6T2(x)+2T3(x)+4T4(x)-2T5(x)+4T6(x).
These functions agree for all x.
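If you'd like to experiment, the tents and F(x) are easy to build in Maple with piecewise. This is my own sketch (not from class), using the data values 5, 6, 2, 4, -2, 4 from the table above:
>T:=(j,x)->piecewise(x<=j-1, 0, x<=j, x-(j-1), x<=j+1, (j+1)-x, 0):
>F:=x->5*T(1,x)+6*T(2,x)+2*T(3,x)+4*T(4,x)-2*T(5,x)+4*T(6,x):
>F(1), F(2), F(3), F(4), F(5), F(6);     # should return 5, 6, 2, 4, -2, 4
>plot(F(x), x=0..7);                     # the piecewise linear interpolation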
Linear combinations of the tents span these piecewise linear
functions
If we piecewise linearly interpolate data given at
the integers, then the resulting function can be written as a linear
combination of the Tj's. Such linear combinations can be
useful in many ways (for example, the definite integral of F(x) is the
sum of constants multiplied by the integrals of the Tj(x),
each of which has total area equal to 1!). The Tj(x)'s are
enough to span all of these piecewise linear functions.
But maybe we don't need all of the Tj's. What if someone
came up to you and said, "Hey, you don't need T33(x) because:"
T33(x)=53T12(x)-4T5(x)+9T14(x)
Is this possibly correct? If it were correct, then the function
T33(x) would be redundant (extra, superfluous) in our
descriptions of the piecewise linear interpolations, and we wouldn't
need it in our linear combinations. But if
T33(x)=53T12(x)-4T5(x)+9T14(x)
were correct, it should be true for all x's. This means we can
pick any x we like to evaluate the functions, and the resulting
equation of numbers should be true. Hey: let's try x=33. This is not
an especially inspired choice, but it does make T33's value
equal to 1, and the value of the other "tents" in the equation equal
to 0. The equation then becomes 1=0 which is currently
false.
Therefore we can't throw out
T33(x). In fact, we need every Tj(x) (for each
integer j) to be able to write piecewise linear interpolations.
We have no extra Tj(x)'s: they are all
needed.
Let me rephrase stuff using some linear algebra language. Our
"vectors" will be piecewise linear interpolations of data given at
integer points, like F(x). If we consider the family of "vectors"
given by the Tj(x)'s, for each integer j, then:
- This family spans everything. Every piecewise linear function
is a sum of scalar multiples of the Tj(x)'s.
- This family is linearly independent. We need each member, each
specific Tj(x), in order to be able to write any F(x) as a
linear combination of the Tj(x)'s.
I emphasized that one reason I wanted to consider this example first
is because we use linear algebra ideas constantly, and presenting them
in a more routine setting may discourage noticing this. My second
major example does, however, present things in a more routine setting,
at least at first.
My "vectors" will be all polynomials of degree less than or equal to
2. So one example is 5x2-6x+(1/2). Another example is
-(Pi)x2+0x+223.67, etc. What can we say about this stuff?
I claim that every polynomial can be written as a sum of 1 and x and
x2 and the Clark polynomial, which was something like
C(x)=3x2-9x+2. It was generous of Mr. Clark to volunteer his polynomial. I
verified that, say, the polynomial 17x2+44x-98 could indeed
be written as a sum of 1 and x and x2 and the (fabulous)
Clark polynomial,
C(x)=3x2-9x+2 (well, it was something like
this!). Thus I need to find numbers filling the empty spaces in the
equation below, and the numbers should make the equation correct.
17x2+44x-98=[ ]1+[ ]x+[ ]x2+[ ]C(x)
Indeed, through great computational difficulties I wrote
17x2+44x-98=(-102)·1+62x+11x2+2C(x)
(I think this is correct, but again I am relying upon carbon-based
computation, not silicon-based computation!)
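Silicon-based computation is available, though. Here is a one-line Maple check of the identity (my addition; I've written the polynomial as Clark(x) just to avoid clobbering any Maple names, and I'm assuming C(x) really is 3x2-9x+2):
>Clark:=x->3*x^2-9*x+2:                  # the (fabulous) Clark polynomial
>expand((-102)*1+62*x+11*x^2+2*Clark(x));
                                 2
                             17 x  + 44 x - 98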
We discussed this and concluded that the span of 1 and x and
x2 and C(x) is all the polynomials of degree less
than or equal to 2. But, really, do we need all of these? That is, do
we need the four numbers (count them!) we wrote in the equation above,
or can we have fewer? Well, 1 and x and x2 and C(x)
are not linearly independent. They are, in fact, linearly
dependent. There is a linear combination of these four which is
zero. Look:
1·C(x)+(-3)x2+(9)x+(-2)·1=0.
so that
C(x)=3x2-9x+2·1
and we can "plug" this into the equation
17x2+44x-98=(-102)·1+62x+11x2+2C(x)
to get
17x2+44x-98=(-102)·1+62x+11x2+2(3x2-9x+2·1)
But a linear combination of linear combinations is also a linear
combination (they concatenate well).
So {1,x,x2,C(x)} certainly span the quadratic
polynomials, but so does {1,x,x2}. We also notice that
{1,x,C(x)} spans everything. But are there any more
redundancies? Does {1,C(x)} span all quadratic polynomials?
Mr. Mostiero suggested that I try to
write 1+x2 as
(scalar)1+(another scalar)C(x). In order to get the
x2 term, I need "another scalar" to be not zero. But
then the sum (scalar)1+(another scalar)C(x) must
have a term involving x, and therefore cannot be equal to
1+x2.
We could sort of diagram what's going on. There seem to be various
levels. The top level, where the collections of "vectors" are largest,
are spanning sets. Certainly any set bigger than a spanning set
is also a spanning set! Now shrink the set to try to eliminate
redundancies, extra "vectors" that you really don't need for
descriptions. There may be various ways to do this
shrinking. Clearly (?) you can continue shrinking until you
eliminate linear dependencies, and the result will be a linearly
independent set. Once the set is linearly independent, then even
smaller sets will be linearly independent.
But if you continue shrinking, you may lose the spanning property. You
may not be able to describe everything in terms of linear combinations
of the designated "vectors".
Spanning (with redundancy) {1,x,x2,C(x)} SPANNING!
/ \ SPANNING!
/ \ SPANNING!
Spanning, no redundancy {1,x,x2} {1,x,C(x)} SPANNING! LINEARLY INDEPENDENT!
(linearly independent) | LINEARLY INDEPENDENT!
| LINEARLY INDEPENDENT!
| LINEARLY INDEPENDENT!
Not spanning (not big {1,C(x)} LINEARLY INDEPENDENT!
enough)
There's sort of one level where spanning and linearly
independent overlap. That's the collections of "vectors"
where everything can be described by linear combinations and there are
no redundancies.
Some weird polynomials to change our point of view
Look at these polynomials of degree 2:
P(x)=(x-1)(x-2)
Q(x)=(x-1)(x-3)
R(x)=(x-2)(x-3)
Why these polynomials? Who would care
about such silly polynomials?
Are these polynomials linearly independent?
This was the QotD. I remarked that I was asking students to
show that if
A P(x)+B Q(x)+C R(x)=0
for all x, then the students would need to deduce that A=0 and B=0 and
C=0.
But can I describe all of the deg<=2 polys this way?
I assert that every polynomial of degree less than or equal to
2 can be described as a linear combination of P(x) and Q(x) and
R(x). How would I verify this claim? Please note that
I am more interested in the logic than the computational details
here!
I should be able to write x2 as sum of P(x) and Q(x) and
R(x). This means I should be able to solve the equation
A P(x)+B Q(x)+C R(x)=x2.
Just as above, this leads to an augmented matrix which looks like:
( 1 1 1 | 1 ) ( 1 0 0 | FRED )
(-3 -4 -5 | 0 )~~~( 0 1 0 | MARISSA)
( 2 3 6 | 0 ) ( 0 0 1 | IRVING )
I know this is true, since I already did the row operations
above. Right now I am not totally interested in the values of
FRED and MARISSA and IRVING but I know that the
row operations just involve adding and multiplying and interchanging,
so there must be such numbers. And therefore there are numbers which
satisfy the equation:
FRED P(x)+MARISSA Q(x)+IRVING R(x)=x2.
Therefore x2 is in the span of P(x) and Q(x) and R(x). What
about x? The system of equations which results has augmented matrix
( 1 1 1 | 0 )
(-3 -4 -5 | 1 )
( 2 3 6 | 0 )
and just as before there will be solutions, and so x is in the span of
P(x) and Q(x) and R(x). And so is 1. Since the linear combinations of
x2 and x and 1 are all of the polynomials of degree 2 or
less, and each of x2 and x and 1 is a linear combination of
P(x) and Q(x) and R(x), I know that the span of
P(x) and Q(x) and R(x) is all polynomials of degree 2 or less.
So each of P(x) and Q(x) and R(x) is needed and there are "enough" of
them. Notice that all I needed to do was verify
that the RREF of the matrix above was in the 1 0 0
etc. form. Then everything automatically followed!
VOCABULARY
TIME!
A collection of vectors is a basis if the vectors are linearly
independent and if every vector is a linear combination of these
vectors (that is, the span of these vectors is everything).
|
So P(x) and Q(x) and R(x) are a basis of the polynomials of degree
less than or equal to 2.
Why would we look at P(x) and Q(x) and
R(x)?
Suppose again I have data points, let's say (1,13) and (2,18) and
(3,-9). I could do linear interpolation as we did above. If I want to be
a bit more sophisticated, and get maybe something smoother, I could
try to get a polynomial Ax2+Bx+C which fits these data
points. Here is the polynomial:
[-9/2]P(x)+[18/-2]Q(x)+[13/-2]R(x). How was this remarkable
computation done? Well, I know certain function values
The function | value at x=1 | value at x=2 | value at x=3
    P(x)     |      0       |      0       |      2
    Q(x)     |      0       |     -1       |      0
    R(x)     |      2       |      0       |      0
so when I "plug in" x=1 and x=2 and x=3 in the linear combination
A P(x)+B Q(x)+C R(x) and I want to get 13 and 18 and
-9, respectively, the structure of the table makes it very easy to
find A and B and C. If we want to interpolate quadratically, I would
get a function defined by a simple formula using this basis. In fact,
these functions are very useful in quadratic interpolation, and in the
use of splines, a valuable technique for numerical
approximation of solutions of ordinary and partial differential
equations.
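Here is a tiny Maple confirmation (mine, not from class) that the combination above really passes through the data points (1,13), (2,18), and (3,-9):
>P:=x->(x-1)*(x-2):  Q:=x->(x-1)*(x-3):  R:=x->(x-2)*(x-3):
>f:=x->(-9/2)*P(x)+(18/(-1))*Q(x)+(13/2)*R(x):
>f(1), f(2), f(3);                       # returns 13, 18, -9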
HOMEWORK
An exam is coming. Please look at the review
material. I will be in Hill 525 at 4 PM on Sunday.
Consider the following system of 3 linear equations in 4 unknowns:
3x1+2x2+x3-x4=A
4x1-1x2+5x3+x4=B
2x1+5x2-3x3-3x4=C
These questions will be important:
- For which values of A and B and C does this system have any
solutions?
- If A and B and C have values for which there are solutions,
describe the solutions in some effective way.
These questions turn out to be useful to consider. I tried to show in
class how to answer them "by hand". First I remarked that there is a
great deal of writing which can be discarded: the explicit mention of
the variables x1 through x4, for example. We
can, in fact, abbreviate the system by just considering first, the
matrix of coefficients and then augmenting this matrix with
the inhomogeneous parts of the equations. This is frequently written
by hand to look like
(3 2 1 -1 | A)
(4 -1 5 1 | B)
(2 5 -3 -3 | C)
The vertical bar is used to distinguish between the two sides of the
equations. It is useful to change the collection of linear equations
so that the system becomes easier to understand. These changes should
all be reversible, so that these equivalent systems will have
the same set of solutions.
There are all sorts of things one can do with elementary row
operations, but one very useful goal is to change the coefficient
matrix into reduced row echelon form, RREF. This means:
- The initial (leftmost) non-zero entry in each non-zero row is 1.
- The other entries in the column of such an initial 1 are all
0.
- Initial 1's to the right always occur in lower rows. That is, if
aij is an initial 1 and amn is another initial
1, then if i<m, j must be <n.
Here are some matrices in RREF:
3 by 4 2 by 7
(1 0 0 0) (1 0 0 0 0 4 4)
(0 0 1 5) (0 0 0 0 1 2 3)
(0 0 0 0)
And here is essentially a complete list of all possible 3 by 3 RREF
matrices:
(1 0 0) (1 0 *) (1 * *) (1 * 0) (0 1 0) (0 1 *) (0 0 1) (0 0 0)
(0 1 0) (0 1 *) (0 0 0) (0 0 1) (0 0 1) (0 0 0) (0 0 0) (0 0 0)
(0 0 1) (0 0 0) (0 0 0) (0 0 0) (0 0 0) (0 0 0) (0 0 0) (0 0 0)
The entries * may be any number, 0 or non-zero.
There aren't very many and they are easy to understand. Notice that
each of these is essentially different from the others: they represent
coefficient matrices with very different solution sets.
My example
I then tried to use row operations on the augmented matrix of my
system of linear equations so that the coefficient matrix was in
RREF. I did this by hand. What follows is taken from a Maple
session (I did remove my errors!). Maple does have a
rref command, but it also allows row operations one at a
time. I will comment in this manner
about each command.
>with(linalg):
This loads the linear algebra package.
>M:=matrix(3,5,[3,2,1,-1,A,4,-1,5,1,B,2,5,-3,-3,C]);
[3 2 1 -1 A]
[ ]
M := [4 -1 5 1 B]
[ ]
[2 5 -3 -3 C]
This command just creates a matrix of specified size
with the listed entries.
>mulrow(M,1,1/3);
[1 2/3 1/3 -1/3 A/3]
[ ]
[4 -1 5 1 B ]
[ ]
[2 5 -3 -3 C ]
Multiply row 1 by a third. Creates an initial 1.
>addrow(%,1,2,-4);
[1 2/3 1/3 -1/3 A/3 ]
[ ]
[ 4 A ]
[0 -11/3 11/3 7/3 - --- + B]
[ 3 ]
[ ]
[2 5 -3 -3 C ]
Add -4 times row 1 to row 2. This "pivots" and makes the
(2,1) entry of the coefficient matrix 0.
>addrow(%,1,3,-2);
[1 2/3 1/3 -1/3 A/3 ]
[ ]
[ 4 A ]
[0 -11/3 11/3 7/3 - --- + B]
[ 3 ]
[ ]
[ 2 A ]
[0 11/3 -11/3 -7/3 - --- + C]
[ 3 ]
Add -2 times row 1 to row 3. So the (3,1) entry becomes 0.
>mulrow(%,2,-3/11);
[1 2/3 1/3 -1/3 A/3 ]
[ ]
[ -7 4 A 3 B]
[0 1 -1 -- --- - ---]
[ 11 11 11 ]
[ ]
[ 2 A ]
[0 11/3 -11/3 -7/3 - --- + C]
3
Makes another leading 1.
>addrow(%,2,1,-2/3);
[ A 2 B]
[1 0 1 1/11 ---- + ---]
[ 11 11 ]
[ ]
[ -7 4 A 3 B ]
[0 1 -1 -- --- - --- ]
[ 11 11 11 ]
[ ]
[ 2 A ]
[0 11/3 -11/3 -7/3 - --- + C ]
[ 3 ]
Makes the (1,2) entry equal to 0.
>addrow(%,2,3,-11/3);
[ A 2 B ]
[1 0 1 1/11 ---- + --- ]
[ 11 11 ]
[ ]
[ -7 4 A 3 B ]
[0 1 -1 -- --- - --- ]
[ 11 11 11 ]
[ ]
[0 0 0 0 -2 A + B + C]
And now the (3,2) entry is 0. The coefficient matrix
is now in RREF.
Well, this went a heck of a lot better than when I did it in class. I
will try, as I said, to avoid doing very much row reduction in
class. I am totally inept at it.
Now back to the questions:
- For which values of A and B and C does this system have any
solutions?
- If A and B and C have values for which there are solutions,
describe the solutions in some effective way.
The last equation in the system now reads: -2A+B+C=0. If this
compatibility condition is not satisfied, then there can be no
solution to the system. For example, if A=2 and B=7 and C=-4, then
-2A+B+C is -2(2)+7-4=-1 and the original system will have no
solution. This is not really totally clear to me if I just look at the
original system with those values of A and B and C. A system which has
no solutions is called inconsistent.
What if I know -2A+B+C=0? It turns out that there are solutions: the
system is consistent. Let me check this claim by choosing some
values of A and B and C which will make -2A+B+C=0 true. How about A=4
and B=3 and C=5? Then -2A+B+C=-2(4)+3+5=0 (I hope). The RREF
system then becomes (inserting these values of A and B and C [I did
this in my head so there may be ... errors]):
[              10 ]
[1  0  1  1/11 -- ]
[              11 ]
[                 ]
[         -7   7  ]
[0  1 -1  --   -- ]
[         11   11 ]
[                 ]
[0  0  0   0   0  ]
Then the first equation (unabbreviated) is
x1+x3+(1/[11])x4=([10]/[11]) so that
x1=-x3-(1/[11])x4+([10]/[11]).
Similarly the second equation becomes
x2=x3+(7/[11])x4+([7]/[11]).
Be sure to look carefully at the signs, to check on what I've
written. The equations have been written this way so that you can see
that x3 and x4 are free. That is, I can
give any values for these variables. Then the other variables
(x1 and x2) will have their values specified by
what is already given. So: we select A and B and C satisfying the
compatibility condition. Then there always will be a
two-parameter family of solutions to the original system of linear
equations. Notice that we get solutions exactly when the compatibility
condition is satisfied: there are solutions if and only if (as math
folks might say) the compatibility condition is correct.
The logic here is actually "easy". Since all the computational steps
we performed are reversible, I know that the assertions I just made
are correct. What is more wonderful is that the general situation will
always be much like this.
What RREF does in general
Take your original augmented matrix, and put
the coefficient matrix into RREF. Then you get something like
(BLOCK OF 1's & 0's WITH| JJJJJJ UU UU NN N K K | Linear stuff )
(THE 1's MOVING DOWN AND| JJ UU UU N N N KK | from original)
(TO THE RIGHT. | JJJJ UUUUUU N NN K K | right sides )
(--------------------------------------------------------------------)
( MAYBE HERE SEVERAL | More such )
( ROWS OF 0'S | linear stuff )
Please observe that the lower right-hand corner now plays the
part of the compatibility conditions which must be satisfied. All of
those linear "fragments" must be equal to 0 if the original system has
solutions. Now if these are 0, we can "read off" solutions in much the
same manner as the example. The block labeled JUNK in fact
tells us with its width how many free parameters there are in the
solutions. Notice that the JUNK block could have size 0 (for
example, consider the silly system x1=567,672, already in
RREF!) and in that case the system would have only one solution.
I then started to examine other systems of linear equations. I had
several purposes, and one was to introduce some additional linear
algebra vocabulary.
Linear combination
Is (1,2,3,4) a linear combination of (1,1,-1,0) and (1,0,1,1) and (1,0,0,1)?
Suppose v is a vector and w1,
w2, w3, ..., wk
are other vectors. Then v is a linear combination of
w1, w2, ...,wk if
there are scalars c1, c2, ..., ck so
that v=SUMj=1kcjwj
Here v corresponds to (1,2,3,4), and k=3 (there are three
wj's). Are there scalars (here, just real numbers) so that
(1,2,3,4)=c1(1,1,-1,0)+c2(1,0,1,1)+c3(1,0,0,1)?
Please recognize that this is the same as asking for solutions of
1c1+1c2+1c3=1
1c1+0c2+0c3=2
-1c1+1c2+0c3=3
0c1+1c2+1c3=4
This isn't quite what I did in class (there I did something easier), but here I will row reduce the augmented matrix toward RREF.
A B C
( 1 1 1 | 1 ) ( 1 0 0 | 2 ) ( 1 0 0 | 2 ) ( 1 0 0 | 2 )
( 1 0 0 | 2 ) ~ ( 1 1 1 | 1 ) ~ ( 0 1 1 |-1 ) ~ ( 0 1 1 |-1 )
(-1 1 0 | 3 ) (-1 1 0 | 3 ) ( 0 1 0 | 5 ) ( 0 0 -1 | 6 )
( 0 1 1 | 4 ) ( 0 1 1 | 4 ) ( 0 1 1 | 4 ) ( 0 0 0 | 5 )
The tilde, ~, is frequently used to indicate that
systems of equations are equivalent.
A The first row operation I used switched the first and second rows.
B I used the first row to zero out (?) the other entries in the first column. Then
C I used multiples of the second row to zero out the other entries
in the second column.
I decided to stop here even though I didn't complete the row
reduction. Look at the last "equation", or rather, let's look at the
equation represented by the last row. It is
0c1+0c2+0c3=5
This system of linear equations has no solution. The system is
inconsistent. The compatibility condition is not satisfied.
Therefore the answer to the original question:
Is (1,2,3,4) a linear combination of (1,1,-1,0) and (1,0,1,1) and (1,0,0,1)?
is No.
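A Maple double-check (not done in class): compare the rank of the coefficient matrix with the rank of the augmented matrix.
>with(linalg):
>W:=matrix(4,3,[1,1,1, 1,0,0, -1,1,0, 0,1,1]):           # coefficient matrix
>Wb:=matrix(4,4,[1,1,1,1, 1,0,0,2, -1,1,0,3, 0,1,1,4]):  # augmented matrix
>rank(W), rank(Wb);                                       # returns 3, 4
The ranks differ, so the system is inconsistent, in agreement with the 0=5 row above.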
Another linear combination
Can t2 be written as a linear combination of t(t+1) and
t(t-1) and (t-1)(t+1)?
Here there are again three "vectors", w1 and
w2 and w3, but the vectors are
polynomials. Are there numbers c1 and c2 and
c3 so that
t2=c1t(t+1)+c2t(t-1)+c3(t-1)(t+1)?
I can translate this into a system of linear equations:
The t2 coefficients: 1 = 1c1+1c2+1c3
The t1 coefficients: 0 = 1c1-1c2+0c3
The t0 coefficients: 0 = 0c1+0c2-1c3
I could do row reduction, but the last equation tells me that c3=0, then the second equation tells me that c1=c2, and the first equation says their sum is 1, so c1=1/2 and
c2=1/2 complete a solution to this system of equations.
Therefore the answer to the original question:
Can t2 be written as a linear combination of t(t+1) and
t(t-1) and (t-1)(t+1)?
is Yes.
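A one-line Maple check (my addition):
>expand((1/2)*t*(t+1)+(1/2)*t*(t-1)+0*(t-1)*(t+1));
                                  2
                                 t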
Linear independence
Are t(t+1) and t(t-1) and (t-1)(t+1) linearly independent?
Suppose w1, w2,
w3, ..., wk are vectors. Then
w1, w2, ...,wk
are linearly independent if whenever
the linear combination v=SUMj=1kcjwj=0,
then all of the scalars cj MUST
be 0. In other words, the vector equation v=SUMj=1kcjwj=0
only has the trivial solution.
This may be the most important definition in linear algebra. It
contains an "If ... then ..." statement which should be understood. In
geometric terms, you could think that linearly independent vectors
point off into different directions. Maybe this makes you happy, but
my basic "geometric" intution has definitely deserted me above 47 dimensions!
So I need to consider the equation
c1t(t+1)+c2t(t-1)+c3(t-1)(t+1)=0
and then try to see what this tells me about the cj's. The most
important ingredient here is LOGIC. First let me convert this
polynomial equation into a system of linear equations.
The t2 coefficients: 0 = 1c1+1c2+1c3
The t1 coefficients: 0 = 1c1-1c2+0c3
The t0 coefficients: 0 = 0c1+0c2-1c3
The third equation tells me that c3 must be 0. The second
equation tells me that c2=c1, and then the first
equation says that 2c1=0 so c1 must be 0 also. And so
c2 must be 0. Wow.
Or we could go to the RREF. Here is what Maple says (this is
a homogeneous system so I only need the coefficient matrix):
>with(linalg):
> A:=matrix(3,3,[1,1,1,1,-1,0,0,0,-1]);
[1 1 1]
[ ]
A := [1 -1 0]
[ ]
[0 0 -1]
> rref(A);
[1 0 0]
[ ]
[0 1 0]
[ ]
[0 0 1]
Therefore all of the cj's must be zero.
Therefore the answer to the original question:
Are t(t+1) and t(t-1) and (t-1)(t+1) linearly independent?
is Yes.
Just one more ...
Are the functions cos(t), sin(t), and cos(t-78) linearly independent?
So I'm asking if there are scalars so that
c1cos(t)+c2sin(t)+c3cos(t-78)=0, and the
implication of the question is that this should be true for all t.
There were many complaints. Engineering
students were complaining about the absurdity of 78. There are
many sillier numbers. Also there was little understanding of the
question. Huh ... well, an inspiration arose ... cos(t-78): we can
regress to a more primitive form of life, and recall a trig formula
(which is on the formula sheet
for the first exam!):
cos(A+B)=cos(A)cos(B)-sin(A)sin(B)
If A=t and B=-78, then cos(t-78)=cos(t)cos(-78)-sin(t)sin(-78). So
look at
c1cos(t)+c2sin(t)+c3cos(t-78)=0
and realize (with cos(-78)=cos(78) and sin(-78)=-sin(78)) that
-cos(78)cos(t)-sin(78)sin(t)+cos(t-78)=0
Therefore the answer to the original question:
Are the functions
cos(t), sin(t), and cos(t-78) linearly independent?
is
NO. You could take c1=-cos(78) and
c2=-sin(78) and c3=1. I hope I got the signs right.
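Maple knows the same trig identity, so you can check this without trusting my signs (again, this check is my addition, not from class):
>expand(cos(t-78));                                    # gives cos(t)*cos(78)+sin(t)*sin(78)
>expand(cos(t-78)) - cos(78)*cos(t) - sin(78)*sin(t);  # the terms cancel: the result is 0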
Word of the day regress
1. To go back; move backward.
2. To return to a previous, usually worse or less developed state.
QotD
The QotD represents an opportunity for people to earn 5 points on next
week's exam. Also, it allows students to convince me that they can
convert a matrix to RREF. I gave everyone a randomly chosen 3 by 5
matrix, with entries integers between -3 and 3. I asked people to tell
me the RREF of the matrix. Anyone who has not earned 5 points for the
exam can try again out of class (in my office, Hill 542, for example).
Here, for the curious, is
the Maple code which created these problems:
>with(linalg):
> z:=rand(-3..3);
z := proc() (proc() option builtin; 391 end proc)(6, 7, 3) - 3 end proc
Creates a "random number" procedure, with output -3,-2,-1,0,1,2,3.
> ww:=(p, q) -> matrix(p, q, [seq(z(), j = 1 .. p*q)]);
ww := (p, q) -> matrix(p, q, [seq(z(), j = 1 .. p q)])
Creates a p by q matrix with entries random outputs of the procedure z
previously defined.
> zz:=proc(p, q, A, B) C := ww(p, q); B := rref(C); A:=C; return end;
Warning, `C` is implicitly declared local to procedure `zz`
zz := proc(p, q, A, B)
local C;
C := ww(p, q); B := linalg:-rref(C); A := C; return
end proc
Creates a matrix and gets its RREF.
> zz(3,5,frog,toad);eval(frog);eval(toad);
[1 -2 -1 -1 2]
[ ]
[2 0 -1 3 -3]
[ ]
[0 -2 -1 -2 -3]
[1 0 0 1 5]
[ ]
[0 1 0 3/2 -5]
[ ]
[0 0 1 -1 13]
One example of the final procedure.
HOMEWORK
Prepare for the exam. Review suggestions are
available.
Linear algebra studies the structure of solutions of systems of linear
equations and the methods for obtaining these solutions. So we'd
better describe some of these ideas. All this will be interesting
because my notes for the lecture seem to have disappeared, perhaps
eaten by a large imaginary dog.
A linear equation is a sum of scalar multiples of unknown quantities
equal to a scalar. So
3x1-77x2=408sqrt(17)
is a linear equation.
I will frequently call the unknown
quantities x1, x2, ... etc. So a linear equation
is SUMajxj=c where all
the aj's and c are scalars.
A system of linear equations is a collection of linear equations. So
41x1-.003x2+Pi x3=98
-x1+0x2-441,332x3=0
is a system of linear equations.
Frequently we will abbreviate a system of linear equations. The actual
variable names (x3 and x44, etc.) may not be
important. The coefficients of the variables (as an ordered set!) and
the constants are important. The system
41x1-.003x2+Pi x3=98
-x1+0x2-441,332x3=0
has coefficient matrix equal to
( 41 -.003 Pi )
( -1 0 -441,332 )
and its augmented matrix
is
( 41 -.003 Pi | 98 )
( -1 0 -441,332 | 0 )
The use of the vertical bars (|) to separate the coefficients and the constants
is conventional in your textbook and many other places, but not everywhere.
Matrix algebra
A matrix is a rectangular array. If the number of rows is n and the number of columns is m,
the matrix is said to be nxm (read as "n by m"). If A=(aij) is a matrix with
entries aij, then i refers to the row number and j refers to the column number.
Matrices of the same size can be added.
( 2 3 9 1 ) ( 0 9 -4 Pi ) ( 2 12 5 Pi+1 )
( 3 -4 0 7 ) + (10 1 -7 0 ) = (13 -3 -7 7 )
( 5 0 4 12) ( 3 3 3 3 ) ( 8 3 7 15 )
so if A and B are both n-by-m matrices, the ijth entry of
A+B is aij+bij.
Matrices can be multiplied by scalars, so that
( 9 4) ( 63 28 )
7( Q P) = ( 7Q 7P )
( 3 1) ( 21 7 )
The ijth entry in cA is caij.
Matrix multiplication is more mysterious, and why it occurs the way it
does may not be completely obvious. Suppose somehow that quantities
x1 and x2 are controlled by y1
and y2 by the equations
3x1-2x2=y1
5x1+7x2=y2
Maybe this is some kind of reaction or linkage or something. Also now
suppose that y1 and y2 are in turn controlled
by quantities z1 and z2, and this control is
also expressed by a system of equations:
4y1+15y2=z1
8y1-7y2=z2
How can we express the relationship between the pair z1 and z2
and the pair x1 and x2 more directly? Well, we can plug
the expressions for the y's (in terms of the x's) into the equations for the z's:
4(3x1-2x2)+15(5x1+7x2)=z1
8(3x1-2x2)-7(5x1+7x2)=z2
This sort of concatenation of control does occur quite often in mathematical models. If we multiply through, we see that the equations have become:
(4·3+15·5)x1+(4·-2+15·7)x2=z1
(8·3+-7·5)x1+(8·-2+-7·7)x2=z2
The coefficients combine in sort of a strange way.
A · B = this which is equal to this
( 4 15 ) ( 3 -2 ) ( 4·3+15·5 4·-2+15·7 ) ( 87 97 )
( 8 -7 ) ( 5 7 ) ( 8·3+-7·5 8·-2+-7·7 ) (-11 -65 )
Matrix multiplication is defined only when the "inner dimensions"
coincide. So A·B is defined when A is an n-by-m matrix and B is
an m-by-p matrix. The result, C, is an n-by-p matrix, and
cik=SUMt=1maitbtk:
it is the dot product of the ith row of A with the
kth column of B. This "operation" is important and occurs frequently
enough so that many chip designers have it "hard-wired".
Matrix multiplication is associative but it is not necessarily
commutative. The QotD was to give an example of two 2-by-2
matrices A and B so that AB and BA are not equal.
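If you want to check an answer afterwards (spoiler for the QotD!), here is one of many pairs that work; the matrices are my choice, verified in Maple:
>with(linalg):
>A:=matrix(2,2,[1,1,0,1]):
>B:=matrix(2,2,[1,0,1,1]):
>evalm(A &* B);                          # rows (2,1) and (1,1)
>evalm(B &* A);                          # rows (1,1) and (1,2), so AB and BA differ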
A solution to a system of linear equations is an n-tuple of scalars
which, when substituted into each equation of the system, makes the
equation true. The phrase n-tuple is used since I want to have
the scalars ordered as they are substituted. So consider the
system
5x1-x2+2x3=23
7x1+3x2-x3=10
The 3-tuple (triple) of numbers (2,1,7) is a solution to this system. The triple
(0,0,2) is not a solution, and neither is the triple (1,7,2).
Here are some linear algebra questions:
- Suppose (1,2,3,4) and (-9,3,2,2) are solutions to seven linear
equations in four unknowns. Are there other solutions?
Answer It turns out that the answer is "Yes, there are always
other solutions." This does not depend on the number of equations.
- Someone comes up to you on the street and says that there is a
system of 218 linear equations in 234 variables which has exactly
568 solutions. Could this possibly be correct?
Answer In fact, this situation could not possibly be
correct! We will discover why.
- Now we have a homogeneous system of linear equations. Here
the right-hand side, not involving the variables, is 0. A homogeneous
system always has what is called the trivial solution:
all the variables=0. For example,
3x1+5x2+2x3=0. Using the values
x1=0 and
x2=0 and
x3=0
we certainly get a solution (check:0=0). Are there other solutions? I
think that x1=1 and x2=1 and x3=-4
make the equation correct. This is a non-trivial solution,
because there is at least one variable which is not 0. Homogeneous systems are
interesting because they frequently describe what are called "steady-state"
solutions of ODE's and PDE's. Learning that there are non-trivial
solutions frequently has physical significance.
That's the
setting for the following almost ludicrous question:
Suppose we have a linear system with 235 equations and 517
unknowns. Does this linear system have a non-trivial solution?
Answer In fact, there is actually, always (!) a solution
to this homogeneous system which is not all 0's. I don't think this is
obvious.
Systems of linear equations are frequently analyzed by changing them
to equivalent systems. Two systems are equivalent if they have
the same set of solutions. So, for example, the system
5x1-x2+9x3=-17
2x1+33x2-x3=3
and the system
10x1-2x2+18x3=-34
6x1+99x2-3x3=9
are equivalent (the second system is just the first one with its equations multiplied by 2 and by 3).
I want to describe an algorithm which allows us to "read off" whether
solutions of systems exist, and, when they exist, to describe them
efficiently. The algorithm uses
Elementary row operations
- Add one equation to another.
- Multiply an equation by a non-zero number.
- Interchange equations.
These are all reversible. About the only subtle observation to make is
about the adjective "non-zero": if we allowed multiplication by 0 we
couldn't reverse the operation.
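In Maple's linalg package these three operations are addrow, mulrow, and swaprow (the same commands appearing in the row-reduction session earlier on this page). Each one is undone by another operation of the same type; for example (my illustration):
>with(linalg):
>M:=matrix(2,2,[2,3,-5,7]):
>M2:=mulrow(M,1,1/2):                    # multiply row 1 by 1/2 ...
>mulrow(M2,1,2);                         # ... multiplying by 2 brings back M
>M3:=addrow(M,1,2,4):                    # add 4 times row 1 to row 2 ...
>addrow(M3,1,2,-4);                      # ... adding -4 times row 1 undoes it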
I will show how to use these operations to change a matrix to reduced
row echelon form (RREF). The algorithm is called Gaussian elimination
or row reduction or ...
HOMEWORK
Please hand in these problems on Monday. I hope they will be graded
and returned to you on Thursday. Maybe this will help with your exam
preparation. So: 4.5: 11, 4.6: 11, 8.1: 37, and 8.2: 6. I strongly
suggest that you don't look in the back of the book for the
answers. Do the problems first!
Further information about the exam will be available soon. There will
be a review session on Sunday, October 9. Further information on that
will also be available soon.
Maintained by
greenfie@math.rutgers.edu and last modified 9/2/2005.