10/10/2002
| Ah, well, the last real class before our first exam wasn't a
total disaster but I didn't quite do what I wanted. Let me first tell
what I thought I did, and then comment on what I might additionally
have wished to do.
I summarized looking for extreme values for functions of 1
variable. Here goes:
- A point p in the real numbers is a local
{maximum|minimum} for a
function f if the domain of f includes an interval with p in the
interior of the interval, and if for all x in that interval,
f(x){<|>}=f(p). Now comments about this definition: the
definition really is local. The interval doesn't have to be "big",
just some interval. So if
f(x)=x^2-10^(-100)x^4 I bet that locally
(near 0) f(x)'s values are positive (the effect of the x^4
term is tiny) except that f(0)=0. So 0 is a local min of f. But
certainly when |x| is large, the x^4 term dominates, and so
f does not have an absolute min. The word "strict" is sometimes
applied as a modifier to "local" if the = sign is not needed. So the
function f(x)=0 has all local maxes (and mins, actually!) but no
strict local maxes or mins.
- If f is differentiable at a point p, and if f'(p) is not 0, then f
does not have a local max or min at p. That's because from the
definition of differentiability which we reviewed before,
f(p+h)=f(p)+f'(p)h+higher order error. So when h is small, the error
will be much less in absolute value than the f'(p)h term, and that
will make values to the {right|left} bigger than f(p) if f'(p) is
{posi|nega}tive, and values to the {left|right} less than f(p).
- p is a critical point of f if either f'(p) doesn't exist or
f'(p) equals 0. There are lots of "rough" functions where f' doesn't
exist, so that is a possibility which should not be discarded in
practice, although in elementary courses it is frequently
neglected. And finding p's where f'(p)=0 can be difficult with
functions defined by even moderately complicated formulas.
- Local {max|min}'s must occur at critical points. Examples: |x|,
+/-x^2, +/-x^3.
- How can one guarantee that a critical point is a local max or min?
A simple "test" uses the second derivative, goes like this: suppose p
has f'(p)=0 (so p is a critical point). Then:
If | then |
f''(p)>0 | p is a local min. |
f''(p)<0 | p is a local max. |
f''(p)=0 | no conclusion can be made. |
As for the last line of the table, the examples x^3 and
+/-x^4 show that the hypotheses can be fulfilled while the
function has no local max or min, or that such a function can
have a local max or min (see the short Maple check after this list).
- This all really should be thought about in the context of Taylor's
Theorem. This result is very important. If a function f has sufficiently
many derivatives, then
f(p+h)=f(p)+f'(p)h+(f''(p)/2)h^2+...+(f^(n)(p)/n!)h^n+Error(f,p,h,n)
where (the important thing!) the error term -->0 faster than h^n.
This last means precisely that the limit of
Error(f,p,h,n)/h^n is 0 as h-->0. Notice that this really looks
like the definition of derivative for the case n=1. Taylor's Theorem
yields "statements" like the 18th derivative test. (?)
This could say something like this: if p is a critical point, and if
f''(p)=f'''(p)=...=f^(17)(p)=0 and if f^(18)(p) is
not 0, then f has a local max or min at p depending on the sign
of f^(18)(p). The reason this is true is that Taylor's Theorem
for n=18 is just (because of all the
ludicrous hypotheses!)
f(p+h)=f(p)+(f^(18)(p)/18!)h^18+Error, and the Error term
is negligible compared to the term immediately before it for |h| small.
I don't know if anyone ever really states such a "test" because it
would hardly ever be used.
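Here is a small Maple illustration of the degenerate cases above (my own check, not something done in class):
diff(x^3, x, x);           # 6*x, which is 0 at x=0: the test gives no conclusion, and x^3 has no local max or min there
diff(x^4, x, x);           # 12*x^2, also 0 at x=0, and yet x^4 does have a strict local min at 0
taylor(x^18, x = 0, 20);   # the whole expansion collapses to the single term x^18, the situation the "18th derivative test" describes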
How much of all this can be carried over to more than 1 variable? Much
can but some surprises develop, most particularly in the local
geometry of a critical point. Students in 1 variable calc usually
don't like x^3 which has an inflection point at 0: the
tangent line crosses the graph. An analogous occurrence (the tangent
plane crossing the graph) will occur very often if
n>1. So let's begin.
- A point p in Rn is a local max for a function f if
there is some positive number R so that the domain of f includes all
points at distance <R from p and if x is a point in Rn
with ||x-p||<R then f(x)<=f(p). When n=2, "||x-p||<R" means
points inside a circle of radius R centered at p. When n=3,
"||x-p||<R" means points inside a sphere of radius R centered at p.
For a local min just change <= to >= in what was written.
We had as examples in R4 functions like
f(x)=x1^300+5x2^600+88x3^900+22x4^6.
Here f(0,0,0,0)=0. Because of the parity (even!) of the exponents and
positivity of the coefficients, f of anything not (0,0,0,0) is
positive. So (0,0,0,0) must be a local min of this f. No other "work"
is necessary! We can of course get a local max by reversing all
the signs. But notice that worse can happen. There are
16=2^4 choices of signs in this expression. Any of the other
14 choices of sign distribution result in the following behavior:
values of f near (0,0,0,0) which are bigger than 0 and values which
are less than 0. Since grad f at 0 is 0, the tangent plane is
"horizontal" (parallel to the domain plane) and the graph of the
function cuts through it, sort of an inflection behavior. This
behavior is called a saddle point.
- If f is differentiable at p, and if the gradient of f at p is
not 0, then: well, some partial derivative of f at p isn't 0,
so in some "slice" with all varialbes but 1 fixed, we get a function
with non-zero derivative at p, and by the 1 variable analysis above,
in that slice, the functin can't have a local max or min at p. So if
grad f is not 0 at p, p can;'t be a local max or min.
- p is a critical point of f if either f is not
differentiable at p or grad f(p)=0.
- Local {max|min}'s must occur at critical points. We considered
some examples in just two
variables. f(x,y)=sqrt(x^2+y^2) can be
differentiated away from (0,0), and grad f is not 0 away from
(0,0). At (0,0) the gradient does not exist (the square root is in the
denominator!). (0,0) is the only critical point, and certainly f(0,0)=0
and f(everywhere else)>0 (just look at the function, don't attempt
to do anything very sophisticated!). Therefore for this function,
(0,0) is a local min (actually an absolute min, in fact). The graph of
this function is a right circular cone, with axis of symmetry the
z-axis and vertex at (0,0,0), a "corner" (it is the graph of |x|
revolved about the z-axis). Then
f(x,y)=+/-x^200+/-y^300 provides 4 more
examples. This function is differentiable at every point, and the only
critical point is (0,0). For +/+ the c.p. is a local min, for -/- it
is a local max, and for +/- or -/+ it is a saddle point.
Now we'll try to extend a second derivative test to several
variables. This will be complicated. The idea is somehow to use
"second order" information at a critical point to see if the
critical point is a max, a min, or a saddle. Realize, though, that the
test might well "fail" -- that is, it might be inapplicable just as
the 1 variable second derivative test (last line in the table above,
"no conclusion can be made") can also fail.
The simplest situation is 2 variables. Things already get complicated
enough. We try to "bootstrap" from what we already know: that is we
will try to use the second derivative test from one variable to allow
us to get information for two variables.
Here are the starting assumptions: we have a function z=f(x,y) with a
critical point at (a,b). So we know that grad f=0 at (a,b).
If we now "slice"
the graph in R3 of z=f(x,y) by planes perpendicular to the
(x,y)-plane which go through the point (a,b), in each case we get a
curve which has a critical point above (a,b). Let me try to describe
these slices. Choose a two-dimensional direction: that is, a
two-dimensional unit vector. We're lucky that such vectors have a
simple description: u=cos(theta)i+sin(theta)j. A two-dimensional
straight line through (a,b) in the direction of u is just
(a+cos(theta)t,b+sin(theta)t). And f's values on that straight line
are exactly f(a+cos(theta)t,b+sin(theta)t). What's the derivative of
this function with respect to t? We will compute this, but first some
discussion of it: this is the directional derivative of f in the u
direction. It is also the derivative of the curve obtained by slicing
the graph z=f(x,y) by the plane through (a,b,0), perpendicular to the
(x,y)-plane, in the direction of u. So now:
(d/dt)f(a+cos(theta)t,b+sin(theta)t)=
D1f(a+cos(theta)t,b+sin(theta)t)cos(theta)
+D2f(a+cos(theta)t,b+sin(theta)t)sin(theta).
This computation uses the Chain Rule. It is, of course, exactly
Duf, which was defined last time. Since (a,b) is a critical
point, both D1f and D2f are 0 at (a,b). So the
first directional derivative is 0. What we now need to do is take the
t derivative again. We must be careful. Each of
D1f and D2f is a function of 2 variables, and
each of the two entries in each of these two functions has some
dependence on t: the Chain Rule needs to be used with care. So if we
d/dt what's above (in this
computation theta and a and b are constant), we will get:
d/dt(D1f(a+cos(theta)t,b+sin(theta)t)cos(theta)
+D2f(a+cos(theta)t,b+sin(theta)t)sin(theta))=
D1D1f(a+cos(theta)t,b+sin(theta)t)(cos(theta))^2
+D2D1f(a+cos(theta)t,b+sin(theta)t)(cos(theta)sin(theta))
+D1D2f(a+cos(theta)t,b+sin(theta)t)(sin(theta)cos(theta))
+D2D2f(a+cos(theta)t,b+sin(theta)t)(sin(theta))^2
and now "plug in" t=0. Remember that cross-partials are equal
(fxy=fyx) and call A=fxx(a,b) and
B=fxy(a,b) and C=fyy(a,b). Then what we have is
A(cos(theta))^2+2Bcos(theta)sin(theta)+C(sin(theta))^2
We could apply the second derivative test in one variable to this if
we knew appropriate information about every slice. Thus we
could conclude that the critical point (a,b) was a minimum if for
every theta, the expression above was positive. If theta=0 this means
that A=fxx(a,b) should be positive. But even more needs to
be true. Suppose we divide the expression above by
(cos(theta))^2 and we call w=tan(theta) (this is
weird!). Then what is written above is just
A+2Bw+Cw^2 and we would like to know what
conditions on A and B and C would imply that this is always
positive. We did something like this in the lecture 9/9. This
positivity will occur exactly when the quadratic has no real roots. So
(2B)^2-4AC should be negative. (The parabola
A+2Bw+Cw^2 will then never intersect the w-axis and since
A>0 it must always be positive!)
This somewhat subtle reasoning leads to what is called the Second
Derivative Test for
functions of two variables. First, define the Hessian, H, to be
        | fxx  fxy |
H = det |          | = fxx·fyy - (fxy)^2,
        | fxy  fyy |
and suppose that (a,b) is a critical point of f.
If | and if | then |
H(a,b)>0 | fxx(a,b)>0 | (a,b) is a local min. |
H(a,b)>0 | fxx(a,b)<0 | (a,b) is a local max. |
H(a,b)<0 | | (a,b) is a saddle point. |
H(a,b)=0 | | no conclusion can be made. |
This statement needs examples. But it does
display the local "structure" of a critical point a bit. (I
very briefly looked at f(x,y)=xy.)
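For instance, here is a Maple sketch (mine, not from the lecture) applying the test to f(x,y)=xy:
f := x*y;
diff(f, x);  diff(f, y);                          # y and x: both are 0 at (0,0), so (0,0) is a critical point
H := diff(f,x,x)*diff(f,y,y) - diff(f,x,y)^2;     # fxx*fyy - fxy^2 = 0*0 - 1^2 = -1 < 0, so (0,0) is a saddle point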
Please do some problems in section 14.7.
This should give you some idea of the complexity of critical point
behavior in more than 1 variable. Although much is known already, this
is actually an object of current research. |
10/9/2002
| I finally got around to something I should have discussed
several lectures ago. First, a mean thing done to Maple: I
typed z:=x^2*arctan(y*exp(y)); and then I asked
Maple to compute diff(diff(z,y$30),x$3); which is
the 30th derivative with respect to y of z followed by 3
derivatives with respect to x. The result was 0, and the computation
time needed was 5.539 seconds on the PC in my office. Of course what
Maple did was compute 30 derivatives of a complicated
function, and these derivatives get very big in terms of data
(this is called "expression swell"), and take time and storage space
to manipulate. Then there were 3 x derivatives, and the result was
0. On the other hand, when I instructed Maple to do the x
derivatives first and then the y derivatives, the program reported no
elapsed time used (that means, less than a thousandth of a
second).
What's going on? The result needed here is called in your text
Clairaut's Theorem, and loosely states that the order of taking "mixed"
partial derivatives (derivatives with respect to different variables)
doesn't matter. More precisely, the result is: if f is a function of
two variables, x and y, and if both fxy and fyx
exist and are continuous, then they must be equal. I tried to
motivate this with a brief allusion to spreadsheets (!) and then
remarked that an actual proof is in appendix F of the text, and
uses the Mean Value Theorem 4 times, in much the same manner as we've
already seen (the lecture on 10/3). I didn't have the time or desire
to give the proof.
Here, though, is an example to show that some hypotheses are needed to
guarantee the equality of the mixed partials:
A function f with fxy NOT equal to fyx (linked here as a PDF and as a picture).
The main topic of today's class was flying a rocket ship. Suppose we
have a rocket ship flying through a nebula. The nebula is a cloud of
matter, with qualities such as pressure and temperature. So I decided
to consider the temperature. What contributes to the temperature of
the nebula as perceived by the crew of the rocket ship? An analysis of
this question mathematically using the tools we already have is
possible if we describe everything algebraically. So suppose the
flight path of the rocket ship is given as a parametric curve. That
is, x=x(t), y=y(t), and z=z(t). This is the same as giving a position
vector, R(t)=x(t)i+y(t)j+z(t)k. We know about this, and about the
first and second derivatives of position (which are velocity and
acceleration, respectively). The temperature depends on the location
in the nebula, so that is a function T of (x,y,z), the coordinates of
a point in the nebula: usually we write T(x,y,z).
The temperature of the point in the nebula outside the rocket ship at
time t is T(x(t),y(t), z(t)). How does the temperature change? This
is, of course,
(d/dt)T=Tx(dx/dt)+Ty(dy/dt)+Tz(dz/dt).
Some contemplation of this formula allows a certain "decoupling" to
take place, that is, a separation of the influence of the rocket ship
and of the temperature. We recognize that the velocity vector is
V(t)=(dx/dt)i+(dy/dt)j+(dz/dt)k. And then we see that (d/dt)T is
really a dot product of V(t) with another vector, namely
Txi+Tyj+Tzk. This vector is called
the gradient of T, written sometimes as (upside down triangle)T and
sometimes called grad T. I'll call it grad T here because of
typographical limitations.
There were some questions about grad T, so I invented a rather simple
example, something like the following: if
T(x,y,z)=3x^2+4xy^3+z^4, then
T(2,-1,1)=5. And grad T=(6x+4y^3)i+12xy^2j+4z^3k. At
p=(2,-1,1), grad T is 8i+24j+4k. grad T is a vector function
of position, while T itself is a scalar function.
Then (d/dt)T turns out to be the dot product of grad T with V(t).
Observation 1 We have separated the effects of temperature and
the rocket. In fact, all that matters in the computation of (d/dt)T
from the rocket is the velocity. If two rocket ships go through the
same point with the same velocities, then (d/dt)T will be the
same. All that matters is the velocity: the tangent direction to the
flight path (a curve) and the speed (the length of V).
Observation 2 How can we make (d/dt)T big? If it isn't 0, we
could always make the rocket ship move faster. If we increase the
speed by a factor of M>0, then (d/dt)T will multiply by M. This is
because (d/dt)T is ||V|| ||grad T|| cos(angle between V and
grad T). Multiplication of V by M results in a multiplication of ||V||
by M, and nothing else changes. So (d/dt)T gets bigger.
Observation 3 But really the answer to the previous question
was a bit silly. We should try to look at different directions. What
direction will cause the derivative to be larger? So we defined the
directional derivative of T with respect to a unit vector u:
DuT. This was the rate of change of T with respect to a
"rocket ship" moving with unit speed in the direction of u, and it is
just u· grad T. Since this is ||u|| ||grad T|| cos(angle
between them) and we've restricted u to be a unit vector, we see
the only ingredient we've got to vary is the angle. Indeed,
immediately we have: The directional derivative is always
between -||grad T|| and ||grad T||. This is because cosine's
values are in [-1,1]. More is true, when grad T is not the 0 vector.
The unique unit vector maximizing the directional derivative
is (grad T)/||grad T||. The unique unit vector
minimizing the directional derivative is -(grad T)/||grad
T||. This turns out to have many computational
implications. I went back to the example and found the directional
derivative of that T in various directions, including the maximizing
direction and the minimizing one.
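Here is a minimal Maple sketch of these computations (my own check; the direction u below is just a unit vector I picked, not one used in class):
T := 3*x^2 + 4*x*y^3 + z^4;
gradT := [diff(T,x), diff(T,y), diff(T,z)];        # [6*x+4*y^3, 12*x*y^2, 4*z^3]
gp := subs({x=2, y=-1, z=1}, gradT);               # the gradient at (2,-1,1): [8, 24, 4]
u := [3/13, 4/13, 12/13]:                          # an arbitrary unit vector
DuT := add(gp[i]*u[i], i = 1..3);                  # the directional derivative of T at (2,-1,1) in the direction u
sqrt(add(gp[i]^2, i = 1..3));                      # ||grad T||, the largest possible directional derivative there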
Observation 4 Isothermals are collections of points where the
temperature is constant. If T(2,-1,1) is 5, then (2,-1,1) is in the
isothermal (the level set) associated to the temperature 5. If the
rocket ship flies in an isothermal, then the rate of change of the
temperature perceived by the rocket ship is 0. So (d/dt)T=0 and by the
decoupling we've already seen, that means V(t)·grad T=0. So the
velocity vector of such a flight is always perpendicular (normal) to
the gradient vector. From this evidence and the knowledge that grad T
at (2,-1,1) is 8i+24j+4k we were able to deduce that the plane
tangent to the surface T(x,y,z)=5 is
8(x-2)+24(y+1)+4(z-1)=0. Directions normal to the gradient have zero
first order change.
Here is a command to graph a chunk of the surface T(x,y,z)=5 and store
the graph in the variable A:
A:=implicitplot3d(3*x^2+4*x*y^3+z^4=5,x=1..3,y=-2..0,z=0..2, grid=[40,40,40],axes=normal,color=green):
Here is a command to graph and store as a variable B a piece
of the candidate for the tangent plane which we computed:
B:=implicitplot3d(8*(x-2)+24*(y+1)+4*(z-1)=0,x=1..3,y=-2..0,z=0..2, grid=[40,40,40],axes=normal,color=red):
And finally the command display3d(A,B) displays both graphs
together. The result is shown below. Note that the tangent plane
actually cuts through the surface (a saddle-more about this next
time!).
Then I discussed how people might want to computationally maximize
("hill climbing") or minimize ("method of steepest descent") functions
of many variables. To increase (respectively decrease) a function W of
many variables, compute grad W, and move in the direction of grad W
(respectively - grad W) "for a while". Then repeat (because the
direction of grad W will likely change). There are many technicalities
in all this: how big are the steps to move, and when should one
terminate the procedure. Answers to these questions may depend on
the specific nature of the functions and the situations. I looked at a
specific function of 4 variables (I think it was
p^2exp(q-rs)) and computed its (4-dimensional) gradient at
(2,1,-2,1).
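As a sketch of one such "hill climbing" step in Maple (my own; the starting point, step size, and names below are arbitrary choices, and the function is only the one I think was used):
W := p^2*exp(q - r*s);
gradW := [diff(W,p), diff(W,q), diff(W,r), diff(W,s)];
g0 := evalf(subs({p=2, q=1, r=-2, s=1}, gradW));      # the 4-dimensional gradient at (2,1,-2,1)
old := [2, 1, -2, 1]:  step := 0.01:                  # current point and a small, arbitrary step size
newpt := [seq(old[i] + step*g0[i], i = 1..4)];        # move a little in the direction of grad W; to decrease W, subtract instead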
I sketched some level curves for the function
10-2x^2-y^2 and related the gradient of the
function to the level curves: again the gradient is perpendicular to
the level curves, and points in the direction of increasing function
value.
I gave out and discussed the review problems.
|
10/3/2002
| A function of one variable, f(x), is differentiable if
f(x+h) can be written in the following way:
f(x+h)=f(x)+Ah+Error(f,x,h)h
where: 1) A is a number not depending on h but possibly depending on x
and f (A is called f'(x) usually); 2) Error(f,x,h)-->0 as h-->0.
Parenthetical remark: I mentioned that if we knew a function of 1
variable was equal to its Taylor series, then the error term could be
written as a sum of an infinite tail of the series, so that's what
"Error" could be. Of course, many functions don't have a Taylor series
and/or don't have a readily computable Taylor series, so the value of
that observation is unclear!
I want to define something similar for functions of two
variables. Preliminarily, f(x,y) will be differentiable if
f(x+h,y+k) can be written in the following way:
f(x+h,y+k)=f(x,y)+Ah+Bk+Error1(f,x,y,h,k)h+Error2(f,x,y,h,k)k
where Error1 and Error2 both -->0 as
(h,k)-->(0,0).
This is a complicated statement and needs some investigation. First,
the equation is true for any selections of h and k. So we can look at
any special values we care to. For example, if k=0 and h is not 0, we
can rewrite the equation to become
((f(x+h,y)-f(x,y))/h)=A+Error1. But as h-->0, the
right-hand side-->A (since the Error terms go to 0). That means the
left-hand side has a limit. So if f is differentiable, then
fx exists and equals A. Similarly (set h=0 and let k-->0)
if f is differentiable, then fy exists and is B. Therefore,
If f is differentiable, the partial derivatives of f
exist.
The converse of that statement is generally not true.
The
converse of a simple implication reverses the hypothesis and
the conclusion. For example, the converse of "If Fred is a frog, then
Fred hops" is "If Fred hops, then Fred is a frog." Even this simple a
statement shows that a converse need not be true.
The function we looked at last time, defined by
f(x,y)=(xy)/sqrt(x^2+y^2) if (x,y) is not (0,0)
and by 0 if x and y are both 0 has some interesting properties. First,
f has partial derivatives at every point. Why? Away from (0,0), the partial
derivatives can be computed by the standard algorithms of 1-variable
calculus. At (0,0), we notice that f is 0 on both the x and y axes, so
the partials both exist and are 0. But consider f(w,w) where w is a
small positive number. Direct computation in the formula shows that
f(w,w) is (1/sqrt(2))w. But if f were differentiable at (0,0) the
wonderfully complicated definition above would apply. The right-hand
side of the formula is
f(0,0)+fx(0,0)h+fy(0,0)k+Error1(f,0,0,h,k)h+Error2(f,0,0,h,k)k
but f(0,0) and fx(0,0) and fy(0,0) are all 0. So
f(w,w)=Error1(f,0,0,w,w)w+Error2(f,0,0,w,w)w.
If we remember that f(w,w) is (1/sqrt(2))w and divide the equation by
w, we get (1/sqrt(2))=Error1+Error2 and both of
the errors-->0 as w-->0. So the right-hand side-->0 but the left-hand side
does not. This rather intricate contradiction shows that this f is
not differentiable at (0,0).
Several things still may not (probably are not!) clear to a
student right now. What are the reasons for defining "differentiable"
in this intricate manner? How could one check that
specific functions defined by formulas are differentiable?
Here is one answer to the second question. The first will be answered
several times during the rest of this course. Suppose we want to
compare f(x+h, y+k) and f(x,y). Then consider:
f(x+h,y+k)-f(x,y)=f(x+h,y+k)+0-f(x,y)=f(x+h,y+k)-f(x,y+k)+f(x,y+k)-f(x,y).
We have again written a 2-variable change as a succession of
1-variable changes.
The 1-variable Mean Value Theorem shows that
f(x,y+k)-f(x,y)=fy(x,y+betak)k where
|betak|<|k|. Also
f(x+h,y+k)-f(x,y+k)=fx(x+alphah,y+k)h where
|alphah|<|h|. Here's the big hypothesis coming:
if the partial derivatives are continuous, then the difference between
fy(x,y+betak) and fy(x,y) approaches
0 as k-->0. And the difference between
fx(x+alphah,y+k) and fx(x,y) also
goes to 0 as (h,k)-->(0,0). These differences both get put into the
error terms, and so we see
If the partial derivatives are continuous, then the function
is differentiable.
The contrast between differentiability and partial derivatives doesn't
really have a good analogy in 1 variable: it seemed new to me when I
first saw it and I needed time and effort to understand it. There
are some further examples to illustrate the complicated logical
relationships from a calculus course at MIT:
A function f so that fx and fy exist everywhere, but f is NOT differentiable (linked as a PDF and as a picture).
A function f which is differentiable although fx and fy are NOT continuous (linked as a PDF and as a picture).
In this course, almost all the functions will be defined by formulas
and the formulas can be differentiated, and, inside usually easily
defined domains, the partial derivatives will be continuous, so the
functions will be differentiable.
Then I went through problem #19 of section 14.4, a rather
straightforward application of the linear approximation (that's the
constant term plus the first degree terms) to an incremented value of
the function. The numbers worked out nicely, and, in this example, the
errors were fairly small.
I concluded by finding a tangent plane to a graph, I think the graph
of z=5x^2+2y^4, a sort of a cup, at the point
(2,1,22). I did this by examining the section of the graph where
y=1. This gives a curve with equation z=5x^2+2 whose
derivative at (2,22) was 10x, or 20. Therefore 1i+0j+20k was tangent
to the surface at (2,1,22). A similar analysis when x=2 shows that
0i+1j+8k was also tangent to the surface at that point. Therefore the
cross product of these vectors would be perpendicular to the tangent
plane. We computed the cross-product and it was -20i-8j+k (generally
it will be -fxi-fyj+k) and the plane went
through the point (2,1,22), so the desired tangent plane is
-20(x-2)-8(y-1)+(z-22)=0.
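A quick Maple check of this computation (my own sketch, not part of the lecture):
f := 5*x^2 + 2*y^4;
fx0 := subs({x=2, y=1}, diff(f, x));                 # 20
fy0 := subs({x=2, y=1}, diff(f, y));                 # 8
tanplane := -fx0*(x-2) - fy0*(y-1) + (z-22) = 0;     # -20*(x-2)-8*(y-1)+(z-22) = 0, the plane found above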
I remarked that on an exam I would rather not have such an
answer "simplified" which brought up the question of
Exam conditions and schedule
- I'd like to give the first exam after we finish 14.5, 14.6,
and 14.7, as planned on the schedule. I would also give out review
material. So I tentatively would like to give the exam on Wednesday,
October 16.
- I would like to eliminate time pressure on the exam, so I will try
to get a room we can stay in for more than the standard period (more
than 4:30--5:50). I will not use this as a reason to make an
exam long or excessively difficult.
- I will try to schedule a review session on Tuesday evening,
October 15.
- My feeling right now is that calculator use should be minimal on
an exam in this course. I would like to restrict the use of
calculators to the last 20 minutes of an exam.
- Some of the formulas may seem intricate to you (to me, too!). I would be happy
to write a formula sheet which would be attached to the exam. Please
let me know what
formulas you believe you will need.
|
10/2/2002
| Matthew Gurkovich presented a solution to problem 3 in
workshop 3. I thank him for this. I think the most difficult part of
this problem is recognizing that the letter "t" serves two different
purposes in the problem.
I analyzed the idea of differentiability for functions of 1
variable. What is the derivative? Some ideas are i) rate of change,
ii) slope of the tangent line to the graph of the function, iii) a
certain limit, and iv) velocity or acceleration. And certainly there
are others.
As a limit, f'(x)=limh-->0(f(x+h)-f(x))/h. Of course this
limit is in paradoxical contrast to the basic "limit reason" for looking
at continuous functions: those limits can be evaluated just by
"plugging in", and here such an approach results in the forbidden
0/0.
Removal of the "limh-->" prefix to the defining equation
above yields a statement which is generally false: there is an error
involved. Also the division and minus signs are a complicating
feature. So we transform the equation above into the following:
f(x+h)=f(x)+f'(x)h+Error(f,x,h)h
where Error(f,x,h) is something (possibly [probably!]) very
complicated depending on f and x and h with the property that (with f
and x held fixed) that Error(f,x,h)-->0 as h-->0. With perspective
that further study of calculus gives, we know that if f is equal to
its Taylor series, then the Error(f,x,h)h is just a sum of terms
involving powers of h higher than 1, so at least one power of h can be
factored out. The result is something that goes to 0 "faster" than h:
order higher than first order. So a perturbation in the argument, x,
to a function f(x), that is, x changes to x+h (h may be either
positive or negative) yields f(x)+f'(x)h+Error(f,x,h)h. The first term
is the old value, the second term is directly proportional to the
change, h, with constant of proportionality f'(x), while the third
term -->0 with more than first order in h.
In particular, this shows easily that if h-->0 then f(x+h)-->f(x), so
differentiable functions (which are those having such a
"decomposition") must be continuous. But more than continuity is
involved. The simple example
f(x) = 2x if x>0, 0 if x=0, and (-1/3)x if x<0
shows that the same multiplier is needed on
both sides of 0: 2 can't be deformed into -1/3 by concealing
things in the error term. So this function is not
differentiable.
The aim is to define a similar "good" decomposition for functions of
more than 1 variable: (old value)+(linear or first order
change)+(error of higher order)
Now I looked at 2 variable functions. I defined fx, the
partial derivative with respect to x, as the limit as h-->0 of
(f(x+h,y)-f(x,y))/h, and fy, the
partial derivative with respect to y, as the limit as k-->0 of
(f(x,y+k)-f(x,y))/k, if either or both of these limits
exist. The choice of letters (k and h) is conventional,
and of course we could use anything.
We computed some partial derivatives for functions which looked like
5x^2y^5 and xe^(xy^2) and
arctan(y/x). The routine differentiation algorithms (rules for
derivatives) work here.
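For example (a Maple check of my own for two of these):
diff(5*x^2*y^5, x);     # 10*x*y^5
diff(5*x^2*y^5, y);     # 25*x^2*y^4
diff(arctan(y/x), x);   # the x partial, which simplifies to -y/(x^2+y^2)
diff(arctan(y/x), y);   # the y partial, which simplifies to x/(x^2+y^2)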
More suspicious examples were considered. The first is f(x,y)=0 if
xy=0 and 1 otherwise. This function had the following properties: away
from the coordinate axes (where xy=0) both fx and
fy are 0. On the y-axis, fy is 0 and on the
x-axis, fx is 0. At (0,0), both fx and
fy exist and are 0. The limits for the other partial
derivatives don't exist. But notice what happens when f(0+h,0+k) is compared to
f(0,0)+fx(0,0)h+fy(0,0)k+ERROR, where somehow ERROR should go to 0 faster than first
order. That would mean f(h,k)=(three zero terms)+ERROR. Since if both h and k are small
positive numbers, we get 1=ERROR,
apparently the decomposition is impossible! So (when the
correct definition is given!) this f has partial derivatives at (0,0)
but is not differentiable at (0,0)!
Then I looked at the function f(x,y)=r cos theta sin theta
with x and y in polar coordinates. A more mysterious formula for
f(x,y) results if we use x=r cos theta and y=r sin theta:
f(x,y)=(xy)/sqrt(x^2+y^2). The graph of this
function includes the x and y axes: f's values are zero there. But
then the partial derivatives of f at (0,0) both must exist and are
0. But f(w,w)=(1/sqrt(2))w, so the change along the line y=x is certainly first
order, but cannot be accounted for in the formula
f(x+h,y+k)=f(x,y)+Ah+Bk+ERROR. If we take k=0, then a limit
manipulation shows that A must be fx(x,y) and
B must be fy(x,y). For this function, both partial
derivatives exist at all points, and both are 0 at (0,0). Therefore
f(0+h,0+k)=0+0+0+ERROR. But f(w,w) is (1/sqrt(2))w=ERROR, and the right-hand side is higher than
first order as w-->0 and the left-hand side is not. More will follow
about this tomorrow.
Here's the result of the Maple command
plot3d((x*y)/sqrt(x^2+y^2),x=-3..3,y=-3..3,grid=[30,30],axes=normal,color=pink);
and
maybe the picture will help you understand the properties of the
function. The colors seem fairly unreliable!
|
9/30/2002
| I began by considering how continuity is defined. In one
variable, a function f is continuous at x0 if
limx-->x0 f(x)
exists and equals f(x0). Combining this with the official
definition of limit given last time, we see:
Definition of continuity |
A function f is continuous at x0 if, given any eps>0, there
is a delta>0 so that
if |x-x0|<delta,
then |f(x)-f(x0)|<eps.
|
The definition says that limits can be evaluated in the simplest
possible fashion, just by "plugging in". It is an important
definition which I wanted to work with.
First we looked at the function n(x,y), defined last time. I asked
where n was continuous: that is, for which
(x0,y0) does the limit of
n(x,y) as (x,y)-->(x0,y0) exist and is it equal to
n(x0,y0)?
First we approached the question "emotionally": where is n
continuous? After some discussion, it was decided that n would be
continuous "off" the parabola, that is, for
(x0,y0) where y0 is
not equal to (x0)^2. Away from the parabola, the
graph of the function is quite flat (always 0). So if (x,y) is close
enough, then n(x,y) is really 0 around
(x0,y0). So it should be continuous. If
(x0,y0) is on the parabola, though, the limit
won't exist.
I then said that I wanted to work with the definition, and verify
that: n(x,y) is continuous at (0,1) and n(x,y) is not
continuous at (0,0).
How to verify that n(x,y) is continuous at (0,1): if we take delta to
be, say, 1/2, then ||(a,b)-(0,1)||<1/2 means that (a,b) is
not on the parabola (the 1/2 is actually chosen for this
reason!), so that n(a,b)=0 and n(0,1)=0 also, and therefore
|n(a,b)-n(0,1)|<eps for any positive eps.
The choice of delta here is rather easy and almost straightforward. In
general, the choice of delta likely will depend on
(x0,y0) and on eps.
Then we went on to try to show that n is not continuous at
(0,0). Here we need to verify the negation of the continuity
statement. Negations of complicated logical statements can be quite
annoying to state. In this case, we need to do the following:
Negation of continuity |
There is at least one eps>0 so that for all delta>0
there is an (a,b) in R2 with
||(a,b)-(x0,y0)||<delta and
|n(a,b)-n(x0,y0)|>eps.
|
Here (x0,y0) is (0,0) and n(0,0)=1. The values
of n(a,b) are either 0 or 1 (n is actually a rather simple
function!). So we guess that a useful eps to try will be 1/2. Then
to get |n(a,b)-n(x0,y0)| at least 1/2 we'd
better have n(a,b) equal to 0. That means (a,b) should be off the
parabola. So we need (a,b) off the parabola and also within distance
delta of (0,0). The suggestion was made that we take (a,b) to be
(0,delta/2), and this does work. Note that the (a,b) varies with the
delta.
What are the simplest functions usually considered to be continuous?
Thank goodness the suggestion was made that polynomials are continuous
because I had prepared
an analysis of the continuity of the polynomial
f(x,y,z)=x^2y^2-4yz. We expect that this
polynomial (and, indeed, all polynomials) are actually
continuous at every point. So the limits should be evaluated just by
"plugging in". In fact, the limit of f(x,y,z) as (x,y,z)-->(3,1,2)
should just be f(3,1,2) which is 1.
I verified the definition for this limit statement. The verification
used entirely elementary methods, but was quite intricate is spite of
that. We looked at:
|f(x,y,z)-1|=|x^2y^2-4yz-1|=|(x^2y^2-4yz)-(3^2·1^2-4·1·2)|.
Then
the triangle inequality was used, so that the last term is less than
or equal to
|x^2y^2-3^2·1^2|+|-4yz+4·1·2|.
The second of these expressions seems a bit easier to handle, so:
|-4yz+4·1·2|=4·|yz-1·2|. Here I tried
to suggest that too much was changing. This is handled by making
the difference equal to several one variable differences. So:
4·|yz-1·2|=4|yz+0-1·2|=4|yz-y·2+y·2-1·2|=4|(yz-y·2)+(y·2-1·2)|. And again split
up by the triangle inequality: this is at most 4|yz-y·2|+4|y·2-1·2|. The second
term looks easiest to handle.
Get a "bound" on 4|y2-1·2|=8|y-1| by making |y-1| sufficiently
small. Since we had split up the original difference into two parts
and then split the part we were considering into two parts, I guessed
that it would be good enough to make this less than eps/4. So we would
need |y-1|<eps/32.
Now we considered 4|yz-y·2|=4|y|·|z-2|. To get this less than
eps/4 we would make |z-2| small. But to control the product we needed
to control the size of |y|. Well, if (pulled out of the air!)
|y-1|<1 then I knew that
0<y<2, so |y|<2. Therefore
4|y|·|z-2|<4·2·|z-2|. This will be less than
eps/4 if |z-2|<eps/32. As I
mentioned in class, the coincidence of the 32's made me uneasy.
Now we are half done. We still need to estimate the difference:
|x^2y^2-3^2·1^2|. Again we
write it as a succession of differences of one variable:
|x^2y^2-3^2·1^2|=|x^2y^2-x^2·1^2+x^2·1^2-3^2·1^2|.
And again the triangle inequality leaves us with estimation of two
pieces: |x^2y^2-x^2·1^2| and
|x^2·1^2-3^2·1^2|.
We do the second part first.
|x^2·1^2-3^2·1^2|=|x^2-3^2|=|x+3|·|x-3|.
If, say, |x-3|<1 then x is between 2
and 4 so |x+3| is between 5 and 7, and therefore
|x^2-3^2|=|x+3|·|x-3|<7|x-3|. This will
be less than eps/4 if we require |x-3|<eps/28. (Somehow the numbers came out
differently in class!)
The final piece to handle is
|x^2y^2-x^2·1^2|=|x^2|·|y^2-1^2|=|x^2|·|y+1|·|y-1|.
Since we have already controlled |x| (it is less than 4) we know
that |x^2| is less than 16, and since |y-1|<1 we know that |y+1| is less than 3. Therefore
|x^2|·|y+1|·|y-1|<48|y-1|. This will be less than eps/4
if we require |y-1|<eps/192.
If we collect all the restrictions on the variables, we see that the
implication
"if ||(x,y,z)-(3,1,2)||<delta then |f(x,y,z)-f(3,1,2)|<eps"
will be true when delta is chosen to be the minimum of all the
blue restrictions. Therefore choose
delta to be the minimum of 1, eps/32, eps/28, and eps/192.
The technique I outlined is a bit painful, but it does work and it is
"elementary".
A picture of what f(x,y,z) looks like doesn't seem to
help very much. Here, for example, are three views of the output of
the Maple command
implicitplot3d(x^2*y^2-4*y*z=1,x=1..8,y=-1..3,z=-2..4,grid=[20,20,20], color=green,axes=normal);.
The procedure implicitplot3d is loaded by
with(plots); and plots implicitly defined surfaces, just as
implicitplot itself plots implicitly defined curves. The
option grid=[20,20,20] alters the sampling rate
Maple uses. The default is [10,10,10], which makes quite a
rough picture. On the other hand, one can ask for [50,50,50] which
will take about 125 (5^3) times as much time as the default. I
just experimented at home. Sketching a sphere with the default grid took
.070 seconds, and the [50,50,50] grid took 7.719 seconds. Indeed: in
practical applications, the tradeoff between time and picture detail
can be interesting.
Pictures of the level surface f(x,y,z)=1=f(3,1,2): three views, with the x-axis, the y-axis, and the z-axis respectively pointing "out" of the screen.
I then discussed workload in the course.
- I expected that students would spend 10 to 12 hours per week
outside of class on the course, doing workshop problems and the
textbook homework problems.
- The workshop problems should be done neatly, with the pages
fastened (stapled or with paperclips), with complete English
sentences, with details of computations only indicated and not given.
- Students could hand in do-overs of one workshop problem per
workshop. These writeups would need to be done individually. Other
students could be consulted, but the writeups themselves would need to
be done individually. These redone workshop problems would be due on
Wednesday, October 2.
- I invited students to give oral presentations (5 points per
problem bonus) of problems 3 and 4 and 5 of workshop #3, at most 5
minutes per problem, at the beginning of class on Wednesday. Please
send me e-mail if you want to do this. I hope
that by the end of the semester every student would have presented at
least one problem to the class.
|
9/26/2002
| The instructor rudely began by filling some boards with
remarks about limits and continuity in 1 dimension. Consider
f(x)=x2. A sketch was drawn. What happens to f(x) as x-->4?
Clearly limx-->4x2=16. What does this mean?
This is a limit statement, and the official definition of limit is as
follows:limx-->af(x)=b means given any eps>0 there is a delta>0 so that
if 0<|x-a|<delta, then |f(x)-b|<eps.
(This is in bold because it is important in the history and theory of the subject!)
Here due to the limitations of html, "eps" will be written in place of
the usually used Greek letter "epsilon" and "delta", in place of
the usually used Greek letter "delta".
Here this means: given eps>0, find delta>0 so that if
|x-4|<delta, then |x^2-16|<eps. In order to verify that
this implication is correct, some connection between |x-4| and
|x^2-16| is needed. But in fact |x^2-16|=|x-4| |x+4|.
In order to really be convinced that the implication is true, we need to show
that when |x-4| is small, |x+4| is controlled. That is, it does no
good to verify that a product of two factors is small by showing that
one factor of the product is small, if the size of the other factor
can't be controlled.
Here is a small "computation": if |x-4|<1, then -1<x-4<1, so
3<x<5 so that 7<x+4<9 and consequently |x+4|<9. Now we
can do the proof.
The official proof |
Suppose eps>0. Take delta to be the smaller of 1 and eps/9. Then if
|x-4|<delta, |x^2-16|=|x-4| |x+4|<(eps/9)·9. The
inequality |x-4|<eps/9 follows from the definition of delta. The
inequality |x+4|<9 follows from the other part of the definition of
delta followed by the "computation" above. Therefore
|x^2-16|<eps, and the limit statement is proved.
|
The relevance of the official definition of limit to real people in
real life is maybe not too clear. First, revealing the official
definition is an effort to encourage people not to interpret the limit
statement as just "plugging in" a for x in a formula for f(x) (that's
what we like to do, and in fact we do it for well-behaved functions --
exactly the continuous functions). The other observation is that
the eps-delta connection is relevant in more detailed analyses of
functions, where one tries to relate the "output tolerance" for an
error (how close to f(b) are we?) to the input tolerance for error
(how close to a need we be to produce at most an appropriate error in
the output?).
Then I began analyzing functions in R2. We began by looking
at f(x,y)=x2+y2. I drew a graph of this: the
graph was a collection of points in R3. I also commented on
the contour lines. I strongly recommended the Maple
procedures plot3d and contourplot and
contourplot3d. You need to type the command
with(plots); before using these procedures.
Maple command followed by a picture of
its output |
plot3d(x^2+y^2,x=-2..2,y=-2..2,axes=normal);
|
contourplot(x^2+y^2,x=-2..2,y=-2..2,axes=normal,color=black,thickness=2);
|
contourplot3d(x^2+y^2,x=-2..2,y=-2..2,axes=normal,color=black,thickness=2);
|
Functions defined by such simple formulas will be continuous, so
lim(x,y)-->(x0,y0)x^2+y^2=(x0)^2+(y0)^2
"naturally". In fact a detailed verification is much like what I just
did in one variable. I would like to concentrate on aspects of limit
and continuity which are somewhat new because of more than 1
variable.
We considered the function g(x,y) defined piecewise by g(x,y)=1 if
(x,y) is NOT (0,0) and which is 0 if (x,y)=(0,0). The graph is a plane
parallel to the (x,y)-plane, 1 unit "up" in the z direction, except
for the origin, which is back at (0,0,0). I asked when the limit as
(x,y)-->(x0,y0) existed and what the value
was. Some discussion followed, and the somewhat disconcerting truth
was told: the limit always exists and it always is 1. This example can
be done in one variable, though.
The piecewise function h(x,y), defined by h(x,y)=1 if x>0,
h(x,y)=37 if x=0,
and h(x,y)=2 if x<0, has
lim(x,y)-->(x0,y0)h(x,y)
existing if x0 is NOT 0, and for x0>0 the limit is 1
while for x0<0 the limit is 2.
A much more subtle example is provided by
m(x,y)=(x^2-y^2)/(x^2+y^2).
This function is "fine" (continuous) away from (0,0). The limit along
rays through the origin varies with the ray. Along the positive and
negative x-axis the limit is 1, but along the positive and negative
y-axis the limit is -1. Along y=Mx, the limit is
(1-M^2)/(1+M^2). I tried to show this surface with
a demonstration in class. It is interesting to view the surface using
Maple. The procedure contourplot3d gave the "best"
picture for me.
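A one-line Maple check of the behavior along the lines y=Mx (my own, not from class):
m := (x^2 - y^2)/(x^2 + y^2);
simplify(subs(y = M*x, m));     # gives (1-M^2)/(1+M^2): along y=Mx the function is constant, and the constant depends on the slope M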
I finally looked at n(x,y). This is peculiar piecewise-defined
function. Its value is 1 if y=x2 and 0 otherwise. It has
the property that limits along any straight line through (0,0)
exist, and all these limits are 0 BUT the limit as (x,y)-->(0,0) does
NOT exist. I tried to explain this.
The text gives an example of a rational function:
(xy^2)/(x^2+y^4) (see p.890 in section
14.2) with similar properties, which maybe is harder to
understand.
As a pop quiz, I asked students to create a function so that the limit
as (x,y)-->(x0,y0) did not exist if
x^2+y^2=1 but did exist for all other (x,y). I
urged students to begin reading chapter 14.
|
9/25/2002
| Hardly any "progress" was made. We did more and more problems
from chapter 13. Attempts were made by valiant students to really get
me to explain what CURVATURE and TORSION are:
Google reports only about 16,700 links with
information about both of these.
Curvature, I tried to insist, referred to how much a curve bends. I
gave another interpretation using the idea of the osculating
circle. If a circle agrees "up to second order" (passes through a
point, and first and second derivatives agree) with the graph of a
function, then 1/(the radius of the circle) turns out to be the
curvature. The circle is called the osculating circle. My online
dictionary states:1. [Math.] (of a curve or surface) have contact of at least the second
order with; have two branches with a common tangent, with each branch
extending in both directions of the tangent.
2. v.intr. & tr. kiss. The osculating circle is a second
order analog of a tangent line. The tangent line agrees with a curve
up to first
order (value and first derivative of the curve and tangent line should
agree). The osculating circle does the same up to second order.
So where the closest circle is
small, the curve bends a lot.
Torsion is weirder. In a picture on this link an
attempt is being made to show "high torsion when there is
rapid departure from a plane."
In my Google search I found web pages dealing with
the relationship of curvature and torsion to
coronary arteries and blood flow, concrete, plasma flow, how birds and
gnats and
flies fly,
"carbon nanotubes" (thin filaments),
models of molecules, motion of robots and octopuses and
cilia and flagella ... and so on. I found Maple routines for
computation of curvature and torsion: lots of stuff, most of it
quite technical in both its applications and its mathematics. Lots of
stuff! I tried to argue that Problem #5 on Workshop #3 showed that
curvature could be concealed easily. According to Einstein, "the Lord
is subtle but not mean" (approximately) and that knowing that the
structure of a curve means dealing with its curvature and
torsion sometimes may make life easier. Problem #5 has exquisitely
disguised simple curves: in a), a circle, and in b), a straight
line.
The next few weeks would see an effort to analyze functions whose
domain is in R2 or R3 or Rn and whose
range was R. We will look at the concepts of limit, continuity, and
derivative, and try to understand the real conceptual subtleties which
occur with such functions.
|
9/23/2002
| A valiant and not completely successful attempt to review
all questions students had about textbook homework problems for
most of the first two chapters. I'll need to spend time doing a few
more problems on Wednesday.
I mentioned that one reason to consider an abstract version of vectors
and inner products and lengths is that strong results involving other
important examples can be learned. I suggested the following setup:
a vector would correspond to a function on [0,1]. Vector addition and
scalar multiplication would correspond to addition of functions and
multiplication of functions by a constant. The dot product of two
functions f and g would be defined by the integral from 0 to 1 of f(x)
times g(x), so that the "length" of f would be the square root of the
integral of f(x)^2 from 0 to 1. First, all the results we have proved
about lengths and dot product remain correct. For example, the
integral from 0 to 1 of exp(x)sin(x) will be bounded by the square
root of the integral of exp(x)^2 multiplied by the square root of the
integral of sin(x)^2. (Maple tells me that the first one is
approximately .909 while the second is approximately .933.) So we have
been "efficient" in learning how to organize our thoughts. Second, it
turns out that this method of measuring the "size" of functions is
essentially the same as the method of least squares, a widely used
technique for estimating errors.
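The Maple computation referred to was presumably something like this (my reconstruction):
evalf(int(exp(x)*sin(x), x = 0..1));                                    # approximately .909
evalf(sqrt(int(exp(x)^2, x = 0..1)) * sqrt(int(sin(x)^2, x = 0..1)));   # approximately .933, which is indeed larger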
I just learned today that there is a web page with detailed solutions
for many of the odd-numbered problems in the textbook. Sigh. You may
want to look at www.hotmath.org
|
9/19/2002
| We began with a problem for students: compute the curvature
of the plane curve defined by
x(t)=integral from 0 to t cos(w^2/2)dw
y(t)=integral from 0 to t sin(w^2/2)dw
Most students were able to successfully see that this curve had
curvature k=t, curvature which increased directly
proportionately with travel along the curve. The integrals involved
are called Fresnel integrals, and the curve resulting is called the
Cornu spiral. The curve (and the integrals) arise in diffraction, and
one
link with a Java applet illustrating this is given. The spiral
winds more and more tightly as the parameter increases.
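A Maple check of the curvature computation (my own sketch, using the d(theta)/ds formula derived in the 9/18 entry further down this page):
xp := cos(t^2/2):  yp := sin(t^2/2):     # x'(t) and y'(t), straight from the Fundamental Theorem of Calculus
simplify((diff(yp,t)*xp - diff(xp,t)*yp) / (xp^2 + yp^2)^(3/2));     # the curvature; it should reduce to t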
Today is devoted to an investigation of space curves. The
geometry of these curves, as seen from the point of view of calculus
(called "differential geometry of space curves") is a subject which
originated in the 1800's. The material presented here was stated in
about 1850-1870. It has within the last few decades become very useful
in a number of applications: robotics, material science (structure of
fibers), and biochemistry (the geometry of big molecules such as
DNA).
I'll carry along is a right circular helix as a basic example.
x(t)=a cos(t)
y(t)=a sin(t)
z(t)=b t
The quantities a and b are supposed to be positive real numbers.
This helix has the z-axis as axis of symmetry. It lies "above" the
circle with radius a and center (0,0) in the (x,y)-plane. The distance
between two loops of the helix is 2Pi b.
If r(t)=x(t)i+y(t)j+z(t)k (the position vector), then
r'(t)=x'(t)i+y'(t)j+z'(t)k=(ds/dt)T(t) is called the velocity vector.
Here T(t) is called the unit
tangent vector and is a unit vector in the direction of r'(t). ds/dt
is the speed, and is
sqrt(x'(t)^2+y'(t)^2+z'(t)^2), the
length of r'(t). We use ds/dt also to convert derivatives with respect
to t to derivatives with respect to s, as last time (the Chain
Rule).
Since T(t)·T(t)=1 differentiation together with commutativity
of dot product gives 2T'(t)·T(t)=0, so T'(t) and T(t) are
perpendicular. In fact, we are interested in dT/ds, which is the same
as (1/(ds/dt))T'(t) (it is usually easier to compute T'(t) directly,
however, and "compensate" by multiplying by the factor
1/(ds/dt)). Any non-zero(!) vector is equal to the product of its
magnitude times a unit vector in its direction. For dT/ds, the
magnitude is defined to be the curvature, and the unit vector
is defined to be the unit normal N(t). This essentially
coincides with what was done last time, when curvature was defined to
be d(theta)/ds but, as a student remarked and I tried uncomfortably to
acknowledge, there could be problems if dT/ds is zero or if
d(theta)/ds was negative (example: look at how T and N change for
y=x^3 as x goes
from less than 0 to greater than 0).
For the helix, we computed ds/dt (sqrt(a^2+b^2))
and T(t) (1/sqrt(a^2+b^2))(-a sin(t)i +a cos(t)j
+bk) and also N(t) (-cos(t)i-sin(t)j, always pointing directly towards
the axis of symmetry) and k, which was
a/(a^2+b^2). I strongly suggested "checking" this
computation by looking at what the formula "says" when a and b are
large and small, and comparing this to the curves.
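Here is a Maple sketch (mine, not from the lecture) that re-derives the curvature of the helix from dT/ds:
r := [a*cos(t), a*sin(t), b*t]:
rp := diff(r, t):                                        # the velocity vector [-a*sin(t), a*cos(t), b]
speed := simplify(sqrt(rp[1]^2 + rp[2]^2 + rp[3]^2));    # ds/dt, which reduces to sqrt(a^2+b^2)
T := [seq(rp[i]/speed, i = 1..3)]:                       # the unit tangent vector
dTds := [seq(diff(T[i], t)/speed, i = 1..3)]:            # dT/ds (the Chain Rule: divide by ds/dt)
simplify(sqrt(dTds[1]^2 + dTds[2]^2 + dTds[3]^2), symbolic);   # its length, the curvature; should reduce to a/(a^2+b^2) for a>0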
We "complete" T and N to what is called a 3-dimensional frame by
defining the binormal B(t) to be the cross-product of T(t) and
N(t). Since T(t) and N(t) are orthogonal unit vectors, B(t) is a unit
vector orthogonal to both of them. (This needs some thinking about,
using properties of cross-product!). How does B(t) change? Since
B(t)·B(t)=1, differentiation results in 2 B'(t)·B(t)=0,
so B'(t) is orthogonal to B(t). But differentiation of B(t)=T(t)xN(t)
results in B'(t)=T'(t)xN(t)+T(t)xN'(t). Since T'(t) is parallel to
N(t), the first product is 0 (another property of cross-product!) so
that B'(t) is a cross-product of T(t) with something. Therefore B'(t)
is also perpendicular to T(t). Well: B'(t) is perpendicular to both
T(t) and B(t), and therefore, since only one direction is left, B'(t)
must be a scalar multiple of N(t). The final important definition here
for space curves is: dB/ds is a product of a scalar and N(t). The
scalar is - t. That is supposed to be the Greek
letter tau, and the minus sign is put there so that examples (the most
important is coming up!) will work out better. This quantity is called
torsion, and is a measure of "twisting", how much a curve
twists out of a plane. If a space curve does lie in a plane, and if
everything is nice and continuous, then B will always point in one
direction (there are only two choices for B, "up" and "down" relative
to the plane, and by continuity only one will be used) so that the
torsion is 0 since B doesn't change. The converse implication (not
verified here!) is also true: if torsion is always 0, then the curve
must lie in a plane!
For our example, we computed B(t) by directly computing the
cross-product of T(t) and N(t). We got (I think!)
(1/sqrt(a^2+b^2))(b sin(t)i-b cos(t)j+a k) for
B(t). This can be "checked" in several ways. First, that the candidate
for B(t) has unit length, and then, that B(t) is orthogonal to both
T(t) and N(t). This candidate passes those tests. Then we took d/dt of
this B(t) and multiplied it by
1/(ds/dt)=(1/sqrt(a^2+b^2)). The result was
(b/(a^2+b^2))(cos(t)i+sin(t)j). Checking all the
minus signs (one in the definition of torsion and one in the
result of N(t)) shows that here torsion is
(b/(a^2+b^2)). Looking at the extreme values of a
and b in this expression (a, b separately big and small) is not as
revealing and/or as useful as with curvature, since a "feeling" for
torsion isn't as immediate.
Then I looked at dN/ds, using the expression N=BxT. The
result, after using the product rule carefully (remember that this
product is not commutative!) is
(dB/ds)xT+Bx(dT/ds) which, by the earlier
equations, is
-tNxT+kBxN
which is tB-kT. So we have the following
equations, called the Frenet-Serret equations (also called
Darboux equations in mechanics):
dT/ds= 0 + kN + 0
dN/ds=-kT+ 0 + tB
dB/ds= 0 - tN + 0
This is a collection of 3 3-dimensional vector equations, or a
collection of 9 scalar differential equations. The remarkable fact is
that if an initial point is specified for the curve, and an initial
"frame" for the Frenet frame of T, N, and B, and if the curvature and
torsion are specified, then the solutions to the differential
equations above give exactly one curve. All the information about the
curve is contained in the equations. So, for example, the motion of an
airplane or a robot arm or the (geometric) structure of a long
molecule are, in some sense, completely specified by k
and t. Of course, this doesn't tell you really how to
effectively control something so it moves or twists the way it is
"supposed" to. The idea that the Frenet frame "evolves" in time,
governed by the differential equations above, is useful.
Here are some pictures of various helices produced by Maple
(the plural of "helix" is "helices").
The pictures below were produced using the command
spacecurve([a*cos(t),a*sin(t),5*t],t=0..6*Pi,axes=normal,color=black,thickness=2, scaling=constrained);
where a is 1 and 10 and 100 respectively. The procedure
spacecurve is loaded as part of plots using the
command with(plots);. I used the option
scaling=constrained in order to "force" Maple to
display the three curves with similar spacing on the axes. Otherwise
the x and y variables would be much altered in each image. I hope that
these pictures give some idea of what the curvature and torsion
represent.
Some helices: x=a cos(t) & y=a sin(t) & z=bt (three pictures, with the parameters and the resulting curvature k and torsion t listed for each one)
a=1 & b=5: k=.038 & t=.192
a=10 & b=5: k=.08 & t=.04
a=100 & b=5: k=.01 & t=.0005
|
9/18/2002
| I wrote some simple vector differentiation "rules", dealing
with how to differentiate A(t)+B(t) and f(t)A(t) and A(t)·B(t)
and A(t)xB(t) if A(t) and B(t) are differentiable vector
functions of t and f(t) is a differentiable scalar function of t. I
miswrote one of these simple (!) rules, so was condemned to write out
a proof until I found the error. I am sorry.
Then I tried to analyze the idea of how a curve bends. Curvature will
be a measure of this bending.(Today a plane
curve, in R2, and tomorrow a space curve, in
R3.) I began an analysis that was apparently first done by
Euler in about 1750. A curve is a parametric curve, where a point's
position at "time" t is given by a pair of functions,
(x(t),y(t)). Equivalently, we study a position vector,
r(t)=x(t)i+y(t)j. Here x(t) and y(t) will be functions which I will
feel free to differentiate as much as I want.
There are special test cases which I will want to keep in mind. A
straight line does NOT bend, so it should have curvature 0. A circle
should have constant curvature, since each little piece of a circle of radius
R>0 is congruent to each other little piece, and, in fact, the
curvature should get large when R gets small (R is positive), and
should get small when R gets large (and the circle looks more like a line
locally). I also suggested that even y=x^2 might be a good
test to keep in mind, since there the curvature should be an even
(symmetric with respect to the y-axis) function of x, and should be
bell-shaped, with max at 0 and limits 0 as x goes to +/- infinity.
The problem is to somehow extract the geometric information from the
parameterized curve. That is, if a particle moves faster, say, along a
curve, it could seem like the same curve bends more. So what
can we do?
We looked at theta, the angle that the velocity vector r'(t) makes with
respect to the x-axis. How does theta change? After some discussion it
was suggested that we look at the rate of change with respect to
arclength along the curve: that is the same as asking for the rate of
change with respect to travel along the curve at unit speed, and
therefore somehow the kinetic information will not intrude on
the geometry.
Arc length on a curve is computable with a definite integral: the arc
length s is the integral of sqrt(x'^2+y'^2) dt from t_0 to t. This is
rarely exactly computable by antidifferentiation using the usual family
of functions. But ds/dt is just sqrt(x'^2+y'^2) by the Fundamental
Theorem of Calculus. And the Chain Rule says that (d*/ds)(ds/dt)=d*/dt
if * is some quantity of interest, such as theta.
By drawing a triangle we see that theta is arctan of
y'/x'. Differentiation with respect to t shows that d(theta)/dt must
be (y''x'-x''y')/(x'^2+y'^2) (this uses
the formula for the derivative of arctan, the Chain Rule, and the
quotient rule). Then the previous results say that
d(theta)/ds=(y''x'-x''y')/(x'^2+y'^2)^(3/2),
a complicated formula.
Then we saw that this formula for a straight line gave 0, and this
formula for a circle was 1/R, where R is the radius of the circle. We
used y=mx+b for the line (so x=t and y=mt+b) and x=Rcos(t) and
y=Rsin(t) for the circle. This fit well with the examples suggested
earlier. And, in fact, on the curve y=x^2, with the
parameterization x=t and y=t^2, d(theta)/ds came out to be
2/(1+4x^2)^(3/2), also consistent with the earlier
considerations.
d(theta)/ds is curvature, usually called k (Greek letter
kappa).
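Here is a small symbolic check of this formula against the three test
cases (Python/sympy rather than Maple; the symbols m, b, R stand for the
line's slope and intercept and the circle's radius, declared positive only
to keep the simplifications clean):

```python
import sympy as sp

t, m, b, R = sp.symbols('t m b R', positive=True)

def curvature(x, y):
    # kappa = d(theta)/ds = (y''x' - x''y') / (x'^2 + y'^2)^(3/2)
    x1, y1 = sp.diff(x, t), sp.diff(y, t)
    x2, y2 = sp.diff(x1, t), sp.diff(y1, t)
    return sp.simplify((y2*x1 - x2*y1) / (x1**2 + y1**2)**sp.Rational(3, 2))

print(curvature(t, m*t + b))                 # the line y=mx+b: 0
print(curvature(R*sp.cos(t), R*sp.sin(t)))   # circle of radius R: 1/R
print(curvature(t, t**2))                    # parabola y=x^2: 2/(1+4t^2)^(3/2)
```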
I defined the unit tangent vector, T, to be a unit vector in the
direction of r'(t). Therefore r'(t)=(ds/dt)T, where ds/dt is the
length of r'(t), and this is the speed. I differentiated the formula
for r'(t) using one of the product rules we had stated
earlier. Therefore I got r''(t)=(d^2s/dt^2)T +
(ds/dt)d/dt(T). But T is cos(theta)i+sin(theta)j, and differentiation
with respect to t is the same as differentiation with respect to s
multiplied by ds/dt. But differentiation with respect to s gives
(-sin(theta)i+cos(theta)j) multiplied by the derivative of theta with
respect to s, and this is k. All this put together is:
r''(t)= (d^2s/dt^2)T + k(ds/dt)^2 N
where N is (-sin(theta)i+cos(theta)j), a unit vector normal to T
(check this by dot product!), which is called the unit normal.
We have decomposed acceleration into the normal and tangential
directions.
I used this to show that motion in a straight line (k=0)
has no normal component of acceleration, and therefore a particle moving
in a straight line needs no force transverse to its motion. On the other
hand, in our circular situation, the curvature was a positive number, and
as long as the particle was moving (ds/dt not equal to 0) a force was
needed to keep it on the circle. This is because the curvature
k was non-zero, and so were the other factors. This is not
at all "intuitively clear" to me.
|
9/16/2002
| I'll go to a lecture which will finish at about 7:30 PM,
tomorrow, Tuesday. I will go to Hill 304 and I will be
available for questions from my arrival until 9:00 PM. I
reserve the right to go home, however, if no one wants to talk to me.
We continued with the problem from last time: p=(3,2,-1) and q=(2,0,1)
and r=(1,1,2) are three points in space. Can I describe a simple way
to tell if the point (x,y,z) is on the plane determined by these three
points?
Here is a method. Suppose v is the vector from p to q (so v is
-i-2j+2k) and w is the vector from p to r (so w is -2i-j+3k). Then
vxw is -4i-j-3k, a vector normal (perpendicular, orthogonal) to
the plane. If a=(x,y,z), then a is on the plane determined by p and q
and r if the vector from p to a is orthogonal to
vxw=-4i-j-3k. This means that
(x-3)(-4)+(y-2)(-1)+(z-(-1))(-3)=0, which simplifies to
-4x-y-3z+11=0. All the steps of this process are reversible, so
(x,y,z) is on the plane exactly when that equation is satisfied.
More generally, the points whose coordinates (x,y,z) satisfy
Ax+By+Cz+D=0 (with N=Ai+Bj+Ck NOT zero) form a plane, with normal
vector N.
We easily checked by direct substitution that (1,2,3) is not on the
plane.
Another parametric description of this plane is obtained by adding the
vector from 0 to p to scalar multiples of the vectors v and w: the
result must be on the plane. So if s and t are any numbers, then the
vector 3i+2j-k (from 0 to p) +tv+sw is on the plane. This means
(looking at components) if x and y and z satisfy:
x = 3 - t - 2s
y = 2 - 2t - s
z = -1 + 2t + 3s
for some real numbers s and t, they must be on the plane. I
substituted this into the equation -4x-y-3z+11=0 and checked that
everything canceled.
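The whole computation (the normal vector, the equation of the plane, the
check that (1,2,3) is off the plane, and the cancellation for the
parametric description) can be reproduced in a few lines. A Python/sympy
sketch rather than Maple:

```python
import sympy as sp

p = sp.Matrix([3, 2, -1])
q = sp.Matrix([2, 0, 1])
r = sp.Matrix([1, 1, 2])
v, w = q - p, r - p                       # v = -i-2j+2k, w = -2i-j+3k
n = v.cross(w)                            # normal vector; expect -4i - j - 3k
print(n.T)

x, y, z, s, t = sp.symbols('x y z s t')
plane = sp.expand(n.dot(sp.Matrix([x, y, z]) - p))
print(plane)                              # -4x - y - 3z + 11

print(plane.subs({x: 1, y: 2, z: 3}))     # nonzero, so (1,2,3) is not on the plane

pt = p + t*v + s*w                        # the parametric description
print(sp.expand(plane.subs({x: pt[0], y: pt[1], z: pt[2]})))   # everything cancels: 0
```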
What is the distance of Fred=(1,2,3) to the plane described above? We found
two ways to do this. First, take a point on the plane: we took
p=(3,2,-1). Then I drew a picture to convince people that the distance
would be the "projection" of the line segment from Fred to p on a
vector normal to the plane. That is, we would need to multiply the
distance from Fred to p by the cosine of the angle between the Fred-to-p
vector and a normal. We have such a normal (it is inherent in the
equation of the plane), and we had such a vector. Then distance can
then be computed with a dot product multiplied by the distance from
Fred to p. We computed this.
Here's an alternative way to get the distance: find the point (which
we called Walter) on the plane which is closest to Fred. How could we
find Walter? The vector from Fred to Walter is parallel to a normal to
the plane, so the vector from Fred to Walter is a scalar multiple of
any normal vector. We therefore got the equations (assuming now that
Walter has coordinates (x,y,z)):
x-1=-4t
y-2=-1t
z-3=-3t
where t is the scalar. Then substituting x=-4t+1 and y=-t+2 and
z=-3t+3 into the equation of the plane (-4x-y-3z+11=0) got one value
of t, and this value of t gave the coordinates of Walter. And the
distance from Fred to Walter is the distance from the plane to the
point.
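Both distance computations are easy to sketch (Python/sympy, with "Fred"
and "Walter" as above):

```python
import sympy as sp

Fred = sp.Matrix([1, 2, 3])
p = sp.Matrix([3, 2, -1])                  # a point on the plane
n = sp.Matrix([-4, -1, -3])                # normal read off from -4x - y - 3z + 11 = 0

# Method 1: project the vector from Fred to p onto a unit normal.
d1 = abs((p - Fred).dot(n)) / n.norm()

# Method 2: Walter = Fred + t*n, with t chosen so that Walter lies on the plane.
t = sp.symbols('t')
W = Fred + t*n
tval = sp.solve(sp.Eq(-4*W[0] - W[1] - 3*W[2] + 11, 0), t)[0]
Walter = W.subs(t, tval)
d2 = (Walter - Fred).norm()

print(sp.simplify(d1), sp.simplify(d2))    # the two answers agree: 4/sqrt(26) = 2*sqrt(26)/13
```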
More generally now I started talking about vector functions of a scalar
variable. Here R(t)=x(t)i+y(t)j+z(t)k. This describes the geometry
(the path) and kinematics (movement) of a particle. I illustrated this
by playing around with the equations
x(t)=-4t+1 and y(t)=-1t+2 and z(t)=-3t+3.
What is the geometric object described by:
x(t)=-4t+1 and y(t)=-1t+2 and z(t)=-3t+3. a line
x(t)=-8t+1 and y(t)=-2t+2 and z(t)=-6t+3. the same line
x(t)=4t+1 and y(t)=1t+2 and z(t)=3t+3. again the same
line!
x(t)=-4t^2+1 and y(t)=-1t^2+2 and
z(t)=-3t^2+3. a closed half-line (a ray)
x(t)=-4(sin(t))+1 and y(t)=-1(sin(t))+2 and
z(t)=-3(sin(t))+3. a closed line segment
The motion on the second line is in the direction of the first but
twice as fast. The third line's motion is opposite to that of the first. The
t^2 in the fourth means the point comes in from far away, reaches (1,2,3) at
t=0, and then retraces the same ray back out. The last example just
oscillates back and forth on an
interval. I recommended that students use the Maple procedure
spacecurve to "see" what curves can look like.
So motion can be complicated.
I very briefly discussed what it means for a vector function to be
differentiable: this works out to be the same as differentiability "in
parallel" for each of the components. The same for integration. Then I
began to discuss what R'(t), usually called the velocity vector,
really means in terms of particle motion: the magnitude is the speed,
and the direction is tangent to the curve. I needed to give some
mechanical illustration of what a tangent vector to a curve might be.
We are currently skipping 12.6 and are jumping right into 13.1 and
13.2.
|
9/12/2002
| The Maple field trip. Students worked through
several pages of problems designed to give
them some familiarity with
Maple.
|
9/11/2002
| We returned to considering the 3-dimensional vectors v=3i+j-k
and w=4i+2j+3k. Writing a vector as a sum of perpendicular and
parallel parts (compared to another vector) was vaguely (!) motivated
by a picture of a block sliding on an inclined plane. We were able to
write v as a sum: vperp + vparallel, where vperp was perpendicular
to w and vparallel was parallel to w. We did this by finding
vparallel first: its direction was the direction of w, so a unit
vector in w's direction was created by writing (1/|w|)w. The magnitude
of vparallel was obtained by looking at a triangle in the plane of v
and w: the magnitude was |v|cos theta, where theta was the angle
between v and w. Luckily we know cos theta from previous work with the
dot product. So the magnitude is v · w /|w|. We computed all
this and got vparallel. vperp was obtained by writing vperp = v -
vparallel. A simple check was suggested: w· vperp was to be
0, since the vectors are supposed to be perpendicular. Indeed (thank
goodness!) this dot product was 0. My personal success rate with hand
computation of this kind is not high. A new definition: vectors are
orthogonal if their dot product is 0.
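Here is the same decomposition as a small Python/NumPy sketch (v=3i+j-k
and w=4i+2j+3k as above); the last line is the w·vperp=0 check:

```python
import numpy as np

v = np.array([3.0, 1.0, -1.0])
w = np.array([4.0, 2.0, 3.0])

# v_parallel has magnitude (v.w)/|w| and direction w/|w|, i.e. ((v.w)/(w.w)) w
v_par = (np.dot(v, w) / np.dot(w, w)) * w
v_perp = v - v_par

print(v_par, v_perp)
print(np.dot(w, v_perp))       # should be 0 (up to roundoff)
```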
I introduced a new product, called the cross product or the vector
product. There are dot products in every dimension. Dot products,
however, give scalars as the result. The cross product is more-or-less
unique to 3 dimensions and involves making a choice of
"handedness". Some people feel this is rather important to physical
reality. Related to this is the concept of chirality, important
in chemistry
and physics as well as mathematics.
But enough diversions! What's vxw? Here is what the text calls
the physics definition:
vxw is a vector.
The magnitude of vxw: in the plane determined by v and w, draw
the parallelogram determined by v and w. The magnitude is the area of
that parallelogram (an easy picture shows that the magnitude will be
|v| |w| sin theta, where theta is the angle between v and w).
The direction of vxw: curl the fingers of your right
hand from v to w. The thumb will "naturally" point
perpendicular to the plane determined by v and w. That direction is
the direction of vxw.
|
I "computed" a simple multiplication table:
  x |  i    j    k
 ---+--------------
  i |  0    k   -j
  j | -k    0    i
  k |  j   -i    0
|
This table already has some distressing or surprising
information. Cross product has these properties:
- Squares are 0: vxv=0 always (the area of a
one-dimensional parallelogram is 0).
- x is anticommutative: vxw=-wxv (the thumb
points the other way!)
- x is not even necessarily associative:
(ixj)xj=-i but ix(jxj)=0.
Therefore computationally one must be careful in both ordering and
grouping factors! This can lead to errors.
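These surprises are easy to confirm numerically; a tiny Python/NumPy check:

```python
import numpy as np

i, j, k = np.eye(3)                      # the standard unit vectors

print(np.cross(i, j), np.cross(j, i))    # k and -k: anticommutative
print(np.cross(np.cross(i, j), j))       # (i x j) x j = -i
print(np.cross(i, np.cross(j, j)))       # i x (j x j) = 0, so x is not associative
```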
I stated further properties of x:
- (v1+v2)xw=
(v1xw)+(v2xw) for any vectors
v1, v2, and w.
- (cv)xw=c(vxw) for any scalar c and any vectors v and w.
The second one is almost believable from the geometric definition
(stretch one side of a parallelogram by a factor of c and then the
area gets stretched by c). The first is not so clear, and I didn't
"prove" it. Similar results ("linearity") are also true in the second
factor of x.
I applied those results to compute vxw, where v and w were the
vectors we used earlier in the lecture. We distributed addition across
x and also let the scalars "float" to the front. The
multiplication table written above was used, and we finally got a
result.
More generally, a convenient algebraic method of computing vxw
was stated using determinants. The determinant of a 2-by-2 array (a
matrix)
| a b |
| c d |
is ad-bc, while the determinant of a 3-by-3 array
| a b c |
| d e f |
| g h i |
is a(det I) - b(det II) + c(det III), where
I:  | e f |    II:  | d f |    III:  | d e |
    | h i |         | g i |          | g h |
(these are called the minors of the larger matrix). There are
many minus signs involved and ample opportunity for error. A course in
linear algebra (Math 250 here) will explain why these formulas are
interesting, but right now all I want are the definitions.
If v=ai+bj+ck and w=di+ej+fk, then vxw= the determinant of
| i j k |
| a b c |
| d e f |
and this follows from the linearity in each factor and the entries
of the multiplication table for x. We checked that this works
for the specific v and w we started with.
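A quick numerical cross-check of the determinant recipe against NumPy's
built-in cross product, for the v and w above (the expected answer,
5i-13j+2k, is my own arithmetic, since the result isn't recorded here):

```python
import numpy as np

v = np.array([3, 1, -1])      # v = 3i + j - k
w = np.array([4, 2, 3])       # w = 4i + 2j + 3k

a, b, c = v
d, e, f = w
by_determinant = np.array([b*f - c*e, -(a*f - c*d), a*e - b*d])   # expand along the top row

print(by_determinant)         # expect [ 5 -13   2]
print(np.cross(v, w))         # should agree
```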
I began a geometric application of · and x which I'll
finish next time. Students should read 12.3, 12.4, and begin 12.5.
Class tomorrow is a Maple field trip, to ARC 118.
|
9/9/2002 |
- I tried to prevent my dog from getting to a lamppost by keeping the
leash short enough. This led to the question of over- and
under-estimating the quantity |v+w| where v and w are vectors. An
overestimate is gotten from the triangle inequality:
|v+w|=<|v|+|w|. An underestimate is obtained by a slightly more
circuitous route:
|v|=|v+0|=|v+(w-w)|=|v+w+(-w)|=<|v+w|+|-w|=|v+w|+|w| so that if we
subtract |w| we get |v|-|w|=<|v+w| . This can give useful
information if good choices of v and w are made (that is, with |v|>|w|).
-
I then recited the Law of Cosines for triangles in the plane. Most
people seemed to know this, more or less. It specializes to the
Pythagorean Theorem when the angle is a right angle. I used the Law of
Cosines with vectors for the sides of the specified angle to deduce
that the cosine of the angle between two vectors v and w was equal to
a ratio: the bottom of the ratio was the product of the lengths of v
and w, and the top of the ratio was ac+bd if v=ai+bj and
w=ci+dj. This quantity is called the dot product or scalar product or
inner product.
-
We generalized to Rn. Here vectors are the sum of scalar
multiples of n unit vectors pointed along the coordinate axes:
v = sum_{j=1}^n a_j e_j. If w is
the same sort of sum (with b_j's as coordinates) then the
dot product of v and w is
v·w = sum_{j=1}^n a_j b_j.
This product
has some noteworthy properties.
- If v and w are vectors, then v·w is a scalar (here, a
real number).
- v·w=w·v (commutativity).
- If c is a real number, then c(v·w)=(cv)·w.
- (v1+v2)·w=(v1·w)+(v2·w).
Note: Because of commutativity, the properties affecting the
first factor work just as well with the second factor.
- |v|=sqrt(v·v).
We "proved" a few of these, which really just involve not being afraid
of the use of summation signs. Then the definition of the angle
between two vectors is motivated by the 2-dimensional computation, so
that: the angle between the vectors v and w
=arccos( (v·w)/(|v| |w|) ).
- Why should the argument to the function arccos be in the domain of
arccos? Examination of this question shows that |v·w| is
"supposed" to be less than the product of |v| and |w|, which is
certainly not immediately clear to me. That is, if one actually wrote
out several 23-dimensional vectors and computed the quantities
v·w and |v| and |w| the desired inequality becomes less
obvious. It is indeed true, and called the Cauchy-Schwarz inequality,
and we went through a mysterious rapid proof of it. If
Q(t)=|v+tw|^2, then Q(t) is a real-valued non-negative
function of the real number t. Since
|v+tw|^2=(v+tw)·(v+tw), we can use the properties
above to "expand" this expression, and get
Q(t)=(v·v)+(2v·w)t+(w·w)t^2=C+Bt+At^2.
This is just a quadratic function, whose graph is a parabola. Since A
is non-negative, the parabola opens "up". Since Q(t) is itself always
non-negative valued, the quadratic cannot have two real roots, so the
discriminant B^2-4AC must be nonpositive. But substitute the
values of A and B and C to see that
(2v·w)^2-4|v|^2|w|^2 is
nonpositive. Add 4|v|^2|w|^2 to both sides, divide by 4, and take square
roots: this is the Cauchy-Schwarz inequality, |v·w|=<|v| |w|. (A quick
numerical check in Python appears at the end of this entry.)
- I then applied this to find the angle between two specific vectors in
R3 (not known to me at this time), an example I'll continue
with next time.
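As promised, here is a quick random-vector check of the Cauchy-Schwarz
inequality (Python/NumPy; the choice of 23 dimensions is only a nod to the
remark above):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    v = rng.normal(size=23)
    w = rng.normal(size=23)
    lhs = abs(np.dot(v, w))
    rhs = np.linalg.norm(v) * np.linalg.norm(w)
    print(lhs <= rhs, round(lhs, 3), round(rhs, 3))   # always True
```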
|
9/5/2002
| The instructor asked students to work on problem 40 of
section 12.1, in groups or individually. The work was collected.
We introduced vectors: directed line segments, "arrows" going
from a tail to a head, quantities with magnitude and direction. We
gave some simple physical interpretations (force, velocity, etc.). Two
vector "representations" will be the same if the arrows are parallel
and have the same length.
Vector addition is motivated by simple physical experiments. It is
defined by looking at the heads and tails of the summands in the
correct order. The vector sum of two vectors is another vector. Then
vector addition is commutative and associative, and these "facts" can
be verified (at least in R3) geometrically. The zero
vector, 0, is special, with head=tail. It is an additive identity: any
vector+0=the vector. And vectors have additive inverses, defined by
reversing the assignment of head and tail. The sum of a vector and its
additive inverse is the zero vector.
"Scalar" multiplication was discussed. The word scalar is a somewhat
antique usage, and here will mean a real number. If v is a vector and
c is a scalar, then cv is another vector. If c=0, cv is the 0
vector. If c>0, then cv is a vector in the direction of v, whose
length is v's multiplied by c. And if c<0, then cv is a vector
whose direction is opposite to v's direction, and whose length is v's
multiplied by -c. Scalar multiplication satisfies various properties
(all in section 12.2 of the text).
We discussed the norm or length or magnitude of a vector, v: |v|. It
is the distance from the head to the tail. It is non-negative and
satisfies various properties. The most subtle is |v+w|=<|v|+|w|, the
vector version of the triangle inequality. And |cv|=|c| |v|.
Then we began to use this to give over- and under-estimates of various
vector sums. This will be continued next time.
Some time was devoted to discussing ideas about the first problem of
the workshop set, and just a very small amount of time devoted to the
second problem.
|
9/4/2002
| The instructor discussed the course and collected student
information. Brief explanation of why the study of calculus in
Rn rather than just R2 or R3 might be
interesting was given.
Presentation of rectangular coordinate systems (how to locate points)
in R1 (the real line), R2 (the plane), and
R3 (space), followed by generalization to Rn.
Distance introduced: this is a nonnegative real number. On the line,
the distance between p and q, if these have coordinates a and b, say,
is |a-b| or sqrt((a-b)^2). "Simple" properties of this
distance were given:
- The distance from p to p is 0.
- The distance from p to q equals the distance from q to p.
- The distance from p to r is less than or equal to the distance
from p to q + the distance from q to r.
The first two properties were "clear" and the last needed some
discussion. An example was given to show that equality need not
hold.
A distance formula in R2 was suggested using the
Pythagorean Theorem: the square root of the sum of the squares of the
differences in coordinates of the points. Verification of the first two properties of
distance for the plane suggested by those of the line was immediate. A
rather lengthy algebraic verification of the third property was
given. A sequence of reversible algebraic steps was applied to the
inequality suggested until a statement about squares being nonnegative was
obtained.
Generalization of the formula to Rn was given. A name was
given to the third property of distance: the triangle
inequality, and an appropriate picture was drawn: geometrically,
the length of
one side of a triangle is "clearly" less than or equal to the sum of
the lengths of the other two sides.
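A tiny numerical illustration of the R^n distance formula and the triangle
inequality (Python/NumPy; the dimension and the random points are
arbitrary choices):

```python
import numpy as np

def dist(p, q):
    # square root of the sum of the squares of the coordinate differences
    return np.sqrt(np.sum((p - q)**2))

rng = np.random.default_rng(1)
p, q, r = rng.normal(size=(3, 4))                     # three random points in R^4
print(dist(p, p))                                     # 0
print(np.isclose(dist(p, q), dist(q, p)))             # symmetry
print(dist(p, r) <= dist(p, q) + dist(q, r))          # triangle inequality: True
```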
|