Date | What happened |
4/14/2003
| The review session on Tuesday evening, April 15, will be in
Hill 425 at 6:10 PM.
I began by doing problem #2 in section 5.4. Suppose
f(x)=1/x^2. Show that f is uniformly continuous on
A=[1,infinity) and is not uniformly continuous on B=(0,infinity).
We first try to "link" f(x)-f(y) and x-y. Since f is given by a fairly
simple algebraic formula, this works well:
f(x)-f(y)=(1/x^2)-(1/y^2)=(y^2-x^2)/(x^2y^2)=(y+x)(y-x)/(x^2y^2).
If we knew that |f(x)-f(y)|<2,304|x-y|, then we could verify the
definition of uniform continuity by taking delta=epsilon/(2,304). So
the "art" here is one of getting a useful overestimate of the other
factors:
(y+x)/(x^2y^2). Here let's be in A, so that
y>=1 and x>=1. Then
(y+x)/(x^2y^2)<=y/(x^2y^2)+x/(x^2y^2)=1/(x^2y)+1/(xy^2)<=1+1=2
because 1/(stuff>=1) is <=1. So if x and y are in A, then
|f(x)-f(y)|<=2|x-y|. Therefore, given epsilon>0, we can take
delta=epsilon/2. The 2's will cancel, and if |x-y|<delta, we know
that |f(x)-f(y)|<epsilon.
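The delta=epsilon/2 choice can be sanity-checked numerically (a random sampling sketch, not a proof; the sampling range [1,101] is my own choice):

```python
import random

def f(x):
    return 1 / x**2

# With delta = epsilon/2, any pair of delta-close points in A = [1, infinity)
# should be epsilon-close in value, since |f(x)-f(y)| <= 2|x-y| on A.
def check_uniform_on_A(epsilon, trials=10_000):
    delta = epsilon / 2
    for _ in range(trials):
        x = 1 + 100 * random.random()          # sample x in [1, 101]
        y = x + random.uniform(-delta, delta)  # a nearby candidate point
        y = max(y, 1.0)                        # keep y inside A
        if abs(x - y) < delta and abs(f(x) - f(y)) >= epsilon:
            return False                       # would contradict the estimate
    return True

print(check_uniform_on_A(0.01))  # True: no counterexample pair is found
```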
On B we must show that f is not uniformly continuous. That is,
we must show that there is an epsilon>0 so that for any delta>0,
there are x and y in B satisfying |x-y|<delta and
|f(x)-f(y)|>=epsilon. This f is a simple function and simple
choices will work. We can take epsilon=1. We will find sequences
(x_n) and (y_n) in B so that
|x_n-y_n|<1/n but
|f(x_n)-f(y_n)|>=1. The suggestion was made
that we take x_n=1/(n+1) and y_n=1/n. Then
x_n-y_n=(n-(n+1))/(n(n+1))=-1/(n(n+1)), so |x_n-y_n|=1/(n(n+1)), and
this is certainly less than 1/n. Also
f(x_n)-f(y_n)=(n+1)^2-n^2=n^2+2n+1-n^2=2n+1,
which is certainly greater than 1. So we have verified that f is
not uniformly continuous. Notice that as n-->infinity, the points
of the sequences go towards the "edge" where f becomes more
"tilted".
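The witnessing sequences can be verified exactly with rational arithmetic (a sketch using Python's fractions module):

```python
from fractions import Fraction

def f(x):
    return 1 / x**2

# x_n = 1/(n+1) and y_n = 1/n satisfy |x_n - y_n| = 1/(n(n+1)) < 1/n,
# while f(x_n) - f(y_n) = (n+1)^2 - n^2 = 2n+1 >= 1.
for n in range(1, 1000):
    x_n = Fraction(1, n + 1)
    y_n = Fraction(1, n)
    assert abs(x_n - y_n) == Fraction(1, n * (n + 1)) < Fraction(1, n)
    assert abs(f(x_n) - f(y_n)) == 2 * n + 1 >= 1
print("sequences verified for n = 1 .. 999")
```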
We have now the following examples:
1. f(x)=x^2, domain [0,1], is uniformly continuous.
2. f(x)=x^2, domain [1,infinity), is not uniformly
continuous.
3. f(x)=1/x^2, domain [1,infinity), is uniformly
continuous.
4. f(x)=1/x^2, domain (0,infinity), is not uniformly
continuous.
These examples might seem to support some sort of conjecture that if
the range is "big" (unbounded) then a continuous function is not
uniformly continuous, while if the range is "small" (bounded) then a
continuous function is uniformly continuous. Neither of those
statements is correct.
If f(x)=x and the domain is R, then f is uniformly
continuous. (We can take delta=epsilon.) The range of this f is all of
R, an unbounded set.
If f(x)=sin(1/x) and the domain is (0,1), then f is not uniformly
continuous. This is because the function sin(1/x) oscillates back and
forth between +1 and -1 "faster" and "faster": that is, take
epsilon=2. sin(1/x) is +1 when x=1/(2nPi+Pi/2) and sin(1/x) is -1 when
x=1/(2nPi+3Pi/2), where n is any non-negative integer. Notice that
|1/(2nPi+Pi/2)-1/(2nPi+3Pi/2)|=Pi/[(2nPi+Pi/2)(2nPi+3Pi/2)]<1/n.
Therefore given any delta>0, we can find n with 1/n<delta (the
Archimedean property), so these two values of x are less than delta
apart, while their values of sin(1/x) still remain 2
apart. I certainly should have discussed this in
more detail in class! The range of this function is [-1,1], certainly
bounded.
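Here is a numeric illustration of those oscillations:

```python
import math

# a_n = 1/(2n*pi + pi/2) and b_n = 1/(2n*pi + 3*pi/2) get arbitrarily close
# as n grows, yet sin(1/a_n) = +1 and sin(1/b_n) = -1 stay 2 apart.
for n in [1, 10, 100, 1000]:
    a = 1 / (2 * n * math.pi + math.pi / 2)
    b = 1 / (2 * n * math.pi + 3 * math.pi / 2)
    print(f"n={n}: |a-b|={abs(a - b):.2e}, "
          f"|sin(1/a)-sin(1/b)|={abs(math.sin(1/a) - math.sin(1/b)):.6f}")
```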
Our conjecture was untrue. Here's the most important result.
Theorem (Continuous functions on closed, bounded intervals are
uniformly continuous) Suppose f:[a,b]-->R is
continuous. Then f is uniformly continuous.
Proof: We proceed by contradiction. Suppose that f is not
uniformly continuous on [a,b]. Then there must be an epsilon>0 and
sequences (x_n) and (y_n) in [a,b] so that
|f(x_n)-f(y_n)|>=epsilon with
|x_n-y_n|<1/n. Wow! Quite a lot of
information.
Since (x_n) is a sequence in [a,b] we may apply the
Bolzano-Weierstrass Theorem: there must be a subsequence
(x_{n_k}) which converges to some number q. And
since all of the x_{n_k}'s satisfy the inequality
a<=x_{n_k}<=b, the limit, q, must also satisfy
the inequality, so q is in [a,b]. Now we know that
|x_{n_k}-y_{n_k}|<1/n_k,
and since we are dealing with a subsequence, n_k>=k, so
that 1/n_k<=1/k. This means that
|x_{n_k}-y_{n_k}|<1/k: so the
two subsequences are squeezed to the same limit, q. Now we are almost
done.
(x_{n_k}) converges to q in [a,b] and
(y_{n_k}) converges to q in [a,b]. The function f is
continuous at q, so therefore the sequence
(f(x_{n_k})) converges to f(q) and the sequence
(f(y_{n_k})) converges to f(q) also.
That means the distance between corresponding terms must -->0. But
this contradicts |f(x_n)-f(y_n)|>=epsilon
for all n. So we are done because we found a contradiction.
The most common way that an effective connection between epsilon and
delta is found uses what is called a Lipschitz condition.
f:A-->R satisfies a Lipschitz condition with positive constant K if
for all x and y in A we know that |f(x)-f(y)|<=K|x-y|. What does
this condition "mean"? Well, first, in terms of epsilon and delta, if
we are given epsilon>0, we could take delta=epsilon/K. The
implication "If |x-y|<delta then |f(x)-f(y)|<epsilon" would be
true. Geometrically, the inequality |f(x)-f(y)|<=K|x-y| is the
same as "the slope from (x,f(x)) to (y,f(y))" is between -K and
K. This would imply that if (x,f(x)) were on the graph, then the graph
would be caught between lines of slope K and -K going through
(x,f(x)). And then we could certainly put a rectangular box centered
at (x,f(x)) on the graph so that the graph escaped out the "vertical"
sides of the box. The box here is supposed to match the one drawn in
the last lecture illustrating the delta-epsilon box for uniform
continuity.
If we knew more about calculus, then we could assert the following: if
f is differentiable on the interval [a,b], and if |f'(w)|<=K for
all w in [a,b], then the Mean Value Theorem would imply
|f(x)-f(y)|<=K|x-y|, because (f(x)-f(y))/(x-y)=f'(w) for some w in
the interval between x and y. This is actually the way people
frequently get estimates relating to uniform continuity. It is much
more constructive than using the Bolzano-Weierstrass Theorem as part
of a proof by contradiction!
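As a numeric sketch of this derivative-based approach (grid sampling is my own stand-in for a rigorous bound on |f'|):

```python
# Estimate a Lipschitz constant K for f by sampling |f'| on a grid,
# then take delta = epsilon/K as in the Lipschitz argument.
def lipschitz_bound(fprime, a, b, samples=10_000):
    return max(abs(fprime(a + (b - a) * i / samples)) for i in range(samples + 1))

# Example: f(x) = 1/x^2 on [1, 100] has f'(x) = -2/x^3, so |f'| <= 2,
# matching the constant found by hand earlier in this entry.
K = lipschitz_bound(lambda x: -2 / x**3, 1, 100)
epsilon = 0.01
delta = epsilon / K
print(K, delta)  # 2.0 (attained at x = 1) and 0.005
```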
Boxes don't imply butterflies
|
The function sqrt(x) is continuous on [0,1], and by the theorem proved
just above, it must be uniformly continuous on [0,1]. However, there
is no K so that |sqrt(x)-sqrt(y)|<=K|x-y|. To see this, just take
y=0. Then the inequality becomes sqrt(x)<=Kx or (dividing by
sqrt(x)) just 1<=K·sqrt(x). As x-->0+, the right-hand side
goes to 0 while the left stays at 1. So there is no "butterfly" of any
angle which can sit anywhere on y=sqrt(x) and always have the graph
within its wings. There is, however, a box which works: that is the
meaning of uniform continuity.
The picture displayed essentially attempts to show that a box which
works at the origin for sqrt(x) will be satisfactory elsewhere. A
"butterfly" won't "fly" too well at the origin, basically because the
tangent line becomes perpendicular there.
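The failure of any Lipschitz constant for sqrt at the origin is easy to see numerically: the slope from (0,0) to (x,sqrt(x)) is 1/sqrt(x), which is unbounded as x-->0+.

```python
import math

# The difference quotient (sqrt(x) - sqrt(0)) / (x - 0) = 1/sqrt(x)
# exceeds any proposed constant K once x is small enough.
for x in [1e-2, 1e-4, 1e-6, 1e-8]:
    slope = math.sqrt(x) / x
    print(f"x={x:g}: slope to the origin = {slope:g}")
```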
Uniform continuity will let us integrate continuous functions. Here is
what we will do (this is essentially theorem 5.4.10 of the text). A
step function will be defined by the following requirements: a
"partition" of the interval [a,b] of this
type: a=x_0<x_1<...<x_{n-1}<x_n=b,
and a selection of n numbers y_1,...,y_n. Then we
will define S(x) to be y_j if x is in the interval
[x_{j-1},x_j). (If you are paying very close
attention, you should see that the last interval needs special
treatment, when x=b.) The idea is to approximate a continuous function
f very closely by a step function. The steps of the step function and
the boxes used in uniform continuity should seem to be related. The
integral of a step function is easy: it will just be a sum of areas of
rectangles. So this will approximate the integral of the continuous
function. And if we can show that this approximation works well with
area, then we will have defined the integral of a continuous
function. Theorem 5.4.10 essentially describes one approach to this
approximation. I will return to this later, and discuss it in more
detail when we do integration.
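A minimal sketch of this plan (uniform partitions and the left-endpoint value for each step are my own simplifying choices):

```python
# Approximate a continuous f by the step function S with value f(x_{j-1})
# on [x_{j-1}, x_j); the integral of S is a finite sum of rectangle areas.
def step_integral(f, a, b, n):
    width = (b - a) / n
    return sum(f(a + j * width) * width for j in range(n))

# Example: f(x) = x^2 on [0,1]; the sums approach the true value 1/3.
for n in [10, 100, 1000]:
    print(n, step_integral(lambda x: x**2, 0, 1, n))
```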
|
4/10/2003
| I did some problems in the textbook and made some other
comments.
Section 5.3, problem #1 If I=[a,b] and f:I-->R is
continuous, and if f(x)>0 for all x in [a,b], then there is
alpha>0 so that f(x)>=alpha for all x in [a,b].
Proof: Use the Extreme Value Theorem (Theorem B from the last
lecture). Then we know there is v in [a,b] so that f(v)<=f(x) for
all x in [a,b]. Take alpha=f(v), and since all values of f are
positive, this alpha will serve.
Comment Consider [a,b]=[0,1] and f(x)=x if x>0 and
f(0)=1. This f fails to be continuous at only one point (x=0) and its
values are positive for all x. But the inf of f's values is 0
so there is no positive lower bound.
Section 5.3, problem #3 If I=[a,b] and f:I-->R is
continuous, and for every x in [a,b] there is a y in [a,b] so that
|f(y)|<=(1/2)|f(x)|, then f has a root in [a,b]: there must be r
in [a,b] with f(r)=0.
Proof: Suppose f has no root in [a,b]. Then (using the Intermediate
Value Theorem, that is, Theorem C from the last lecture) either all of
f's values are positive or all of f's values are negative. (If f has
both positive and negative values, then it must have a root.) We will
consider here the case that f(x)>0 for all x in [a,b]. The other
case can be proved by considering, say, -f(x).
Now we have f(x)>0 for all x in [a,b]. By the preceding problem, f
has a positive minimum value, f(v). But take "x=v" in the hypotheses
to this problem. We then see there must be y in [a,b] such that
f(y)<=(1/2)f(v)<f(v) (no absolute value signs are needed since
we know in this case all function values are positive). But then f(v)
isn't the minimum value of f, which is a contradiction.
Section 5.3, problem #4 Show that every polynomial of odd
degree with real coefficients has at least one real root.
Discussion: Here I began by considering an example, something like
P(x)=x^5+44x^4-403x^3+22x-9. Let me call
the terms less than the "top" term J(x). So here
J(x)=44x^4-403x^3+22x-9. Essentially the odd
degree term, x^5, "dominates" J(x) when x is large positive
or negative (this is a version of the statements that the limit of
P(x) as x goes to +/-infinity is +/-infinity). What do I mean? Well,
|J(x)|=|44x^4-403x^3+22x-9|<=44|x|^4+403|x|^3+22|x|+9<=(44+403+22+9)|x|^4=478|x|^4
when |x|>1. In fact, if, say, x=+/-10^9 (this was the
value suggested by students) then |J(x)|<|x|^5. Since the
inequality is < this means that the sign of P(x) is the sign
of x^5 for these two values of x. Since 5 is odd, this means
that P(+/-10^9) has +/- signs. So on the ends of the interval
[-10^9,+10^9], P(x) changes sign. And P(x) must
have a root inside this interval by the Intermediate Value
Theorem.
The general case proceeds analogously. The highest degree term is the
dominating term for x's which are large enough positive and negative,
and since the degree of the dominating term is odd, the signs differ
for positive and negative x's. So we know that functions like
x^17+808x^4·cos(x^4+98) must have at
least one real root.
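The Intermediate Value Theorem argument here is constructive enough to code: find endpoints where P has opposite signs, then bisect (a much smaller interval than the one discussed above also brackets a sign change).

```python
# Bisection: maintain f(lo) < 0 < f(hi) and halve the interval, as in
# the Intermediate Value Theorem discussion.
def P(x):
    return x**5 + 44 * x**4 - 403 * x**3 + 22 * x - 9

def bisect_root(f, lo, hi, tol=1e-10):
    assert f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

r = bisect_root(P, -100, 100)  # P(-100) < 0 < P(100)
print(r, P(r))                 # P(r) is essentially 0
```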
Section 5.3, problem #6 Suppose f:[0,1]-->R is continuous and f(0)=f(1). Then
there is c in the interval [0,1/2] so that f(c)=f(c+1/2).
Proof: Here the (non-obvious!) way to get c is to consider another
function: g(x)=f(x)-f(x+1/2). g is continuous on the interval [0,1/2]
by results on algebra and composition of continuous
functions. g(0)=f(0)-f(1/2) and g(1/2)=f(1/2)-f(1)=-g(0). If g(0)=0,
we are done (take c=0). If g(0)>0, then g(1/2)<0, so the
Intermediate Value Theorem guarantees a c in [0,1/2] with g(c)=0 and
again we are done. Finally, if g(0)<0, then g(1/2)>0, so we can
get g(c)=0 again, etc.
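The proof translates directly into a bisection search on g (the test function f(x)=cos(2*pi*x) is my own example with f(0)=f(1)):

```python
import math

# Find c in [0, 1/2] with f(c) = f(c + 1/2) by bisecting on
# g(x) = f(x) - f(x + 1/2), which satisfies g(1/2) = -g(0).
def find_c(f, tol=1e-12):
    def g(x):
        return f(x) - f(x + 0.5)
    if g(0.0) == 0:
        return 0.0
    lo, hi = 0.0, 0.5
    sign = 1.0 if g(lo) < 0 else -1.0   # arrange sign*g(lo) < 0 < sign*g(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sign * g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = find_c(lambda x: math.cos(2 * math.pi * x))
print(c)  # approximately 0.25, where cos(pi/2) = cos(3*pi/2) = 0
```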
Now I restated the definition of continuity:
f:A-->R is continuous at c in A when: given
epsilon>0, there is delta>0 so that for all x in A satisfying
|x-c|<delta, we know |f(x)-f(c)|<epsilon.
In geometric terms, this means we have a rectangular box with center
at (c,f(c)) and with horizontal sides +/- epsilon from y=f(c) and with
vertical sides +/-delta from x=c. The graph of y=f(x) is only
"permitted" to "escape" from the box through the vertical sides. So
when epsilon is specified, we can construct some box satisfying these
geometric constraints.
Here's another definition:
f:A-->R is uniformly continuous in A when: given
epsilon>0, there is delta>0 so that for all x and c in A satisfying
|x-c|<delta, we know |f(x)-f(c)|<epsilon.
Therefore in this case, the same sized box will work for all points on
the graph. We then considered the function f(x)=x^2 and
tried to slide boxes around on the graph. We came up with some
conjectures which we then verified.
Example 1 (uniform continuity satisfied) f(x)=x^2 is
uniformly continuous in A=[0,1]. Geometrically, this is because we can
slide the "box that works" from its center at (1,1) down the graph to
(0,0) and it will always "work". Let's see:
|f(x)-f(c)|=|x^2-c^2|=|x+c|·|x-c|<=(|x|+|c|)·|x-c|.
If both x and c are in [0,1], then we know
|f(x)-f(c)|<=2|x-c|. Therefore, if epsilon>0 is given, we could
take delta=(1/2)epsilon, and this will satisfy the definition of
uniform continuity for this f and A=[0,1].
Example 2 (uniform continuity not satisfied) f(x)=x^2
is not uniformly continuous in all of R. Geometrically, this is
because the function "wiggles" or "tilts" too much as |x| gets
large. If we take any box and move it so that its center is on the
graph, if the box is moved far enough to the right or to the left,
eventually the graph will start poking out the top or the bottom of
the box.
We need to verify that the definition is not fulfilled in
the style of Math 311. So we must show that there is some epsilon>0
so that for all delta>0 there are x and c with |x-c|<delta with
|f(x)-f(c)|>=epsilon. In fact, this is nicely done with a
sequential approach. We can do this for any delta exactly when we can
do it for 1/n (for any n in N) -- this uses the Archimedean
property. So we will find epsilon and x_n and c_n
so that the following is true:
epsilon>0
|x_n-c_n|<(1/n)
|x_n^2-c_n^2|>=epsilon.
We debated how to find these. The instructor suggested
x_n=n, and then we were happy: c_n=n+(1/(2n))
makes everything work with epsilon=1. This is verified by direct
computation:
|x_n-c_n|=1/(2n)<1/n and
|x_n^2-c_n^2|=|n^2-(n^2+1+1/(4n^2))|=1+1/(4n^2)>=1.
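Exact arithmetic confirms these pairs (a sketch with Python's fractions module):

```python
from fractions import Fraction

# x_n = n and c_n = n + 1/(2n): the points are 1/(2n)-close, but
# |x_n^2 - c_n^2| = 1 + 1/(4n^2) never drops below epsilon = 1.
for n in range(1, 1000):
    x_n = Fraction(n)
    c_n = n + Fraction(1, 2 * n)
    assert abs(x_n - c_n) == Fraction(1, 2 * n) < Fraction(1, n)
    assert abs(x_n**2 - c_n**2) == 1 + Fraction(1, 4 * n**2) >= 1
print("pairs verified for n = 1 .. 999")
```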
Next time we will use a proof by contradiction to verify:
Theorem A function which is continuous on a bounded, closed
interval must be uniformly continuous.
This will be a vital ingredient when we integrate continuous
functions.
|
4/9/2003
| Propositions I and II and III are already known, and are
stated here in order to help people understand the lecture.
Proposition I If (x_n) is a convergent sequence and
if, for all n in N, x_n is in [a,b], then
lim(x_n) is in [a,b].
Comments: This is an easy consequence of such sequential limit facts as:
if L=lim(xn) with xn>=0 for all n, then
L>=0. Just compare the x_n's with a and with b
(look at x_n-a and b-x_n).
Proposition II Suppose (x_n) is a sequence with
x_n in [a,b] for all n in N. Then there's a
subsequence (x_{n_k}) which converges.
Comments: This is the Bolzano-Weierstrass Theorem. It is generally
impossible (or at least difficult!) to discover the subsequence.
Proposition III If (x_n) converges and f is
continuous, then the sequence (f(x_n)) converges, and its
limit is f(lim(x_n)).
Comments: This is the sequential statement which is equivalent to
continuity.
Theorem A (continuous functions on closed bounded intervals are
bounded) If f is continuous on [a,b], then there is M>0 so that
|f(x)|<=M for all x in [a,b].
Comment: The phrase "f is continuous on [a,b]" means f is continuous
at each x in [a,b].
Example A1 A function not continuous at one point of a closed
bounded interval and the function is not bounded: take [a,b]=[0,1] and
f(x)=1/x for x>0 and f(0)=0. f is not bounded, since f(1/n)=n for n
in N and we have the Archimedean property.
Example A2 A function continuous on an open interval which is
not bounded: take (a,b)=(0,1), and f(x)=1/x.
Proof of Theorem A: Suppose f is not bounded. Then for each n in N,
there exists x_n in [a,b] with |f(x_n)|>=n. By
Proposition II, the sequence (x_n) has a subsequence
(x_{n_k}) which converges. Also,
|f(x_{n_k})|>=n_k>=k. By Proposition
I, the limit of the subsequence is in [a,b]. We have assumed that f is
continuous at all points of [a,b], so it must be continuous at the
limit of the subsequence, and therefore by Proposition III,
(f(x_{n_k})) converges. But a convergent sequence of
real numbers is bounded, and this sequence is not bounded since
its kth element is at least k. This is a contradiction.
Theorem B (Extreme Value Theorem) If f is continuous on [a,b],
then there are elements v and w of [a,b] so that
f(v)<=f(x)<=f(w) for all x in [a,b].
Comment: So the function "achieves" its maximum and minimum
values. And the range of the function must be a subset of the interval
[f(v),f(w)].
Example B1 A function which is continuous at all but one point
of a closed bounded interval and the function is bounded and the
function does not "achieve" its sup and its inf: we decided that if
[a,b]=[-1,1] and f(x)=x+1 for x in [-1,0), f(0)=0, and f(x)=x-1 for x
in (0,1], then the range of f is the open interval (-1,1). f is
discontinuous only at 0. The sup of f's values is 1 and the inf of f's
values is -1, and f never attains either 1 or -1: a fairly weird
example.
Proof of Theorem B: Since f is continuous on [a,b], Theorem A
applies and f must be bounded. Therefore the set S={f(x): x in [a,b]}
(the set of values of f) is non-empty (f(a) is in S) and is
bounded. By the Completeness Axiom, S has both a sup and an inf. We
will work with the sup, which will produce f(w). Parallel work with
the inf will get f(v). Let q=sup S. Then given n in N, there is
an element y_n of S so that q-(1/n)<y_n<=q
(this is the alternative characterization of sup). But since
y_n is in S, there is x_n in [a,b] with
f(x_n)=y_n.
By
Proposition II, the sequence (x_n) has a subsequence
(x_{n_k}) which converges. Also,
q>=f(x_{n_k})>q-(1/n_k)>=q-(1/k).
By Proposition
I, the limit of the subsequence is in [a,b] and is a point, w. We have assumed that f is
continuous at all points of [a,b], so it must be continuous at the
limit of the subsequence, and therefore by Proposition III,
(f(x_{n_k})) converges to f(w). But f's values on
this subsequence are squeezed to q, and therefore f(w)=q, the sup of
the values of f([a,b]).
Corollary Values of continuous f on [a,b] always lie inside
some closed bounded interval: f([a,b]) is a subset of [min value of f
on [a,b],max value of f on [a,b]].
Theorem C (Intermediate Value Theorem) If f is continuous on
[a,b], and if f(a)<0 and if f(b)>0, then there is an element r of
[a,b] with f(r)=0.
Example C1 A function not continuous at only one point, with
only two values, which doesn't have any roots: here [a,b]=[0,1] and
f(x)=-1 for x=0 and f(x)=1 for x>0. This is a silly example, but it
works.
Example C2 How many roots can such a function have? We saw that
such a function can actually have any finite number of roots, and it
even can have an infinite number of roots.
Proof of Theorem C: The text uses the bisection method which leads
naturally to an algorithm used for root-finding. Please look at the
text and see if you like that proof. I'll try another
technique here.
Let T={x in [a,b]: f(x)<0}. Now T is not empty since a is in T. T
is bounded, since T is a subset of [a,b] and b is an upper bound of
T. Therefore the completeness axiom applies, and T must have a least
upper bound, which we will call r. So r=sup(T). Either f(r)<0 or
f(r)>0 or f(r)=0 by Trichotomy. We will "eliminate" (proof by
contradiction) the first two alternatives, so that the last must be
true.
What if f(r)<0? Then r<b (since f(b)>0). By continuity of f
at r, there is delta>0 so that f(x)<0 for every x in the interval
[r,r+delta). Any such x is in T, so T contains points greater than
r, and therefore r cannot be an upper bound of T. Contradiction.
What if f(r)>0? Then r>a (since f(a)<0). By continuity of f
at r, there is delta>0 and an interval (r-delta,r] where if x is in
that interval, f(x)>0. But if r is the least upper bound of T,
there must be an element x of T in that interval, and for that
element, f(x)<0. Contradiction.
Of course the function takes on all "intermediate" values, not just
0. And we have the important:
Interval Mapping Theorem If f is continuous on [a,b], and if
the maximum value of f on [a,b] is M (M is what I called f(w) before)
and if the minimum value of f on [a,b] is m (m is what I called f(v)
before) then the collection of all of f's values is [m,M].
All the results of this class session are almost part of our
subconscious: probably Piaget showed that young babies "learned" them
at a very early age. Is the intellectual structure of the course worth
the difficult trip we have gone through? That is to be judged by each
individual, naturally. I like it.
There will be an exam on Thursday, April 17. Here is information about the exam,
and a collection of review problems.
|
4/7/2003
|
Many fine students showed up for class today in the midst of an
unseasonal snow storm. I proved that composition of continuous
functions was continuous.
Theorem (Composition and continuity) Suppose f:I-->R and
g:J-->R, and we know that f(I) is a subset of J. Also suppose
that f is continuous at c and that g is continuous at f(c).
Then the composition of g with f is continuous at c.
Proof: (1) Since f is continuous at c we know:
given alpha>0 there is beta>0 so that if |w-c|<beta and w is
in I, then |f(w)-f(c)|<alpha.
(2) Since g is continuous at f(c) we know: given
gamma>0 there is delta>0 so that if |v-f(c)|<delta and v is
in J, then |g(v)-g(f(c))|<gamma.
(3) We must prove the following implication: if
epsilon>0, then there is rho>0 so that if |b-c|<rho and b is
in I, then |g(f(b))-g(f(c))|<epsilon.
The "unconventional" letters really help in making readers concentrate
about what's going on (I think so, anyway!). So suppose epsilon>0
is given. Then use (2) with gamma=epsilon. We
then get some delta>0. Then use (1) with
that delta taken as alpha. The beta guaranteed will work as the rho
needed in (3). Why is this? If |b-c|<rho and
b is in I, then (1) guarantees that
|f(b)-f(c)|<delta. Of course, f(b) is in J by the hypotheses of the
theorem. Then (2) guarantees that
|g(f(b))-g(f(c))|<epsilon as desired.
Comments We can't generally drop any of the hypotheses of the
theorem and expect the conclusion to remain valid. For example, if
f(x)=x, any discontinuity of g will be "transmitted" to the
composition. Similarly, if g(x)=x, the composition will have f's
discontinuities.
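The epsilon-rho chase in the proof can be mimicked numerically with concrete (entirely my own) choices: f(x)=x^2, which near c satisfies |f(w)-f(c)|<=(2|c|+1)|w-c| when |w-c|<=1, and g(y)=3y+1, which is Lipschitz with constant 3.

```python
# Chain the two continuity estimates: pick delta for g first (step (2)),
# then pick beta for f from that delta (step (1)); rho = beta works for g o f.
def composed_delta(c, epsilon):
    delta = epsilon / 3              # g(y) = 3y + 1 moves < epsilon if |v - f(c)| < delta
    beta = delta / (2 * abs(c) + 1)  # f(x) = x^2 moves < delta if |w - c| < beta <= 1
    return min(beta, 1.0)

c, epsilon = 2.0, 0.03
rho = composed_delta(c, epsilon)
b = c + rho / 2                      # any point with |b - c| < rho
gap = abs((3 * b**2 + 1) - (3 * c**2 + 1))
print(rho, gap)  # gap stays below epsilon = 0.03
```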
I then announced that an EXAM would be given on
Thursday, April 17. My current intention is to cover the major results
of section 5.3 on Wednesday, and then the definition and simple
examples about uniform continuity on Thursday (section 5.4). The exam
would include coverage of that material. I hope to give out a review
sheet on Wednesday, and may try to schedule a review session next
week. We mostly did textbook homework problems for the remainder of
today's class.
Section 5.2, #3 Give an example of functions f and g both
discontinuous at c in R such that a) the sum f+g is continuous at c b)
the product f·g is continuous at c.
Solution: We debated whether the example requested by the text needed
to satisfy both a) and b) or whether two examples were being
requested. Mr. Benson cut short the debate by giving an example which
indeed did satisfy both a) and b): f(x)=1 if x=0 and f(x)=0 otherwise;
g(x)=0 if x=0 and g(x)=1 otherwise. One of the first examples we
analyzed in connection with continuity essentially showed that both f
and g were not continuous at 0. But f+g is the function which
is always equal to 1, and f·g is always equal to 0, so both of
these functions are certainly continuous at 0. This is a very neat
example.
Section 5.2, #7 Give an example of a function f from [0,1] to
R which is not continuous at every point of [0,1] but
|f| is continuous at every point of [0,1].
Solution: I have forgotten who gave this nice solution, but take
f(x)=1 if x is rational and f(x)=-1 if x is irrational. Then |f| is
always 1, so |f| is continuous everywhere. I did ask why f was not
continuous -- we need to negate a definition which begins, "for all
epsilon>0 there is delta>0 so that ..." and I asked what epsilon
would have no satisfactory delta, and I was told,
epsilon=1/2.
Section 5.2, #8 Suppose f and g are continuous from R to
R. If f(r)=g(r) for all r in Q (that is, for all
rational numbers r) then show that f(x)=g(x) for all x in R.
Solution: (following suggestions of Ms. Greenbaum and
Mr. Hedberg). Suppose we are given x in R. We know that
Q is dense in R. Therefore, for each n in N,
there is a rational number r_n in the interval
(x-(1/n),x+(1/n)). Consider the sequence (r_n). Since
x-(1/n)<r_n<x+(1/n) and the sequence (1/n) converges
with limit=0, the Squeeze Theorem implies that (r_n)
converges and its limit is x. Now the sequential characterization of
continuity implies that (f(r_n)) must converge, with limit
equal to f(x) because f is continuous. But also (g(r_n))
must converge, with limit equal to g(x) since g is continuous. Since g
and f are equal on rational numbers, the sequences are identical, and
therefore must have equal limits, so f(x)=g(x).
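The density step can be made concrete (r_n = round(n*x)/n is my own explicit choice of a rational within 1/n of x):

```python
from fractions import Fraction

# For any real x, round(n*x)/n is a rational number within 1/(2n) of x,
# so the sequence (r_n) converges to x as in the proof above.
def rational_approx(x, n):
    return Fraction(round(n * x), n)

x = 2 ** 0.5
for n in [1, 10, 100, 1000]:
    r_n = rational_approx(x, n)
    assert abs(float(r_n) - x) <= 1 / (2 * n) < 1 / n
    print(n, r_n)
```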
This is the solution we discussed in class, but could we possibly find
another one using the epsilon-delta characterization of continuity?
Here is one way, but using the sometimes cumbersome method of
contradiction. Suppose there is x in R with f(x) not equal to
g(x). Then take epsilon=(1/2)|f(x)-g(x)|, surely positive. Since f is
continuous at x, there is delta_1>0 so that if
|x-anything|<delta_1 then
|f(x)-f(anything)|<epsilon. Also there is delta_2>0 so
that if |x-anything|<delta_2 then
|g(x)-g(anything)|<epsilon. Now take
delta=min(delta_1,delta_2). Density of Q
implies that there is a rational number r in the interval
(x-delta,x+delta). For this r we know |f(r)-f(x)|<epsilon and
|g(r)-g(x)|<epsilon, so that (since f(r)=g(r) by hypothesis)
2epsilon=|g(x)-f(x)|=|(g(x)-g(r))+(f(r)-f(x))|<=|g(x)-g(r)|+|f(r)-f(x)|<epsilon+epsilon=2epsilon. Therefore
2epsilon<2epsilon, which is false. Whew! I think I like the sequence
proof more.
#8 and a half (Not from the text!) Can we find an example of
two functions f and g defined and continuous on all of R so
that the set S of numbers s where f(s)=g(s) is infinite but there are
x's in R where f(x) is not equal to g(x)?
Solution: Again, I have forgotten who suggested this. Take f(x)=0 for
x>=0 and f(x)=x for x<0, while g(x)=0 for x>=0 and g(x)=-x
for x<0. This works. Here S is the set [0,infinity).
#8 and three-quarters Can we find an example like the previous
one where the set of agreement of the two continuous functions is just
(0,infinity)?
Solution: No, we can't. Consider the sequence (1/n) which converges to
0 and which has all positive elements. Then f(1/n)=g(1/n) by
hypothesis, so the sequences (f(1/n)) and (g(1/n)) are identical. But
by the sequential characterization of continuity (used in the same way
as the proof of #8 above) these sequences converge to, respectively,
f(0) and g(0). Since the sequences are identical, the limits must
agree: f(0)=g(0).
#8 and seven-eights (?) Suppose that f and g are continuous on
all of R and that for x>0 we know
|f(x)-g(x)|<33sqrt(x). What can we say about f(0) and g(0)?
Answer, not solution: f(0) and g(0) must be equal (they don't both
have to be 0, though!). How can one "compare" the sequences (f(1/n))
and (g(1/n)) in this case? Can they be "squeezed"?
Section 5.2, #12 A function f from R to R is said
to be additive if (A) f(x+y)=f(x)+f(y) for all x and y in
R. Prove that if f is continuous at one x_0 in
R then f is continuous at every point of R.
Solution: I earnestly hoped that everyone would recognize the equation
above, since it is the central object of study in both Math 250 and
Math 350. I first deduced some properties of such f's:
1. If f is additive, then f(0)=0. Proof: Take y=0 in equation
A. Then the left-hand side is f(x+0)=f(x), and the right-hand
side is f(x)+f(0). Since f(x)=f(x)+f(0), we know from early in the
course that f(0) must be the "additive identity", so (by uniqueness of
such, for early in the course) f(0)=0.
2. If f is additive, then f(-x)=-f(x). Proof: Take y=-x in equation
A. Then the left-hand side is f(x+(-x))=f(0)=0 by the previous
observation. And the right-hand side is f(x)+f(-x). Therefore
0=f(x)+f(-x), so that f(-x) is the "additive inverse" of f(x), so (by
uniqueness of such, for early in the course) f(-x)=-f(x).
Before trying the solution of the problem I tried to consider an
example. One example suggested was f(x)=46x. This f is continuous, and
to get |f(x)-f(x_0)|<epsilon, there is delta>0 making the
restriction |x-x_0|<delta sufficient. This delta can be
epsilon/46, and, in fact, the same delta works everywhere. So for
this problem we know that f is continuous at x_0: given
epsilon>0, there is delta>0 so that if
|x-x_0|<delta, then |f(x)-f(x_0)|<epsilon. I
bet that the same delta will be sufficient for y_0. I thus
assert that if |y-y_0|<delta, then
|f(y)-f(y_0)|<epsilon. Why is this true? I will relate y
and y_0 to the x's. So the condition
|y-y_0|<delta is the same as
|y-x_0+x_0-y_0|<delta. Take
x=y+x_0-y_0. This inequality condition is then
exactly |x-x_0|<delta. So we know by continuity of f at
x_0 that |f(x)-f(x_0)|<epsilon. Since
x=y+x_0-y_0,
f(x)=f(y+x_0-y_0)=f(y)+f(x_0)-f(y_0),
and
f(x)-f(x_0)=f(y)+f(x_0)-f(y_0)-f(x_0)=f(y)-f(y_0). Therefore |f(y)-f(y_0)|<epsilon, and we have verified the continuity of f at y_0.
In fact, an additive function f, satisfying f(x+y)=f(x)+f(y),
actually has some further properties. For example,
f(2x)=f(x+x)=f(x)+f(x)=2f(x). It isn't hard to verify by math
induction that f(nx)=nf(x) for all x in R and n in
N. Also, if in the equation f(2x)=2f(x) we take x=(1/2)y we get
f(2·(1/2)y)=2f((1/2)y), so that f(y)=2f((1/2)y) and
(1/2)f(y)=f((1/2)y). Again we can verify that if m is in N,
(1/m)f(y)=f((1/m)y). Combining what we have observed, you may see that
we have proved the following result: if f is additive and if r is
rational, then f(rx)=rf(x) for all x in R. That is, f is
linear with respect to the rational numbers, Q. If
c=f(1), then f(r)=f(r·1)=rf(1)=cr. So, when restricted to the
rational numbers, an additive function must be multiplication by a
constant. Combine the results of problems 8 and 12: an additive
function which is continuous at one point must be multiplication by a
constant. This is the content of problem #13.
Weird example It is amazing and almost unbelievable that there
are additive functions which are not of this kind. To see this,
I need to quote some results from linear algebra. Q is a
field. Multiplication of real numbers by elements of Q
establishes that R is a vector space over Q. This is
very very weird to any sane human being, but it is correct. Of course,
students need to know what a vector space is to be sure it is correct,
but it is correct. Actually, the dimension of the vector space
is infinite (!). We can define linear transformations from the vector
space R to itself just by looking at what happens to a
basis. Here is part of a basis: 1 and sqrt(2). They cannot be linearly
dependent over Q because that would imply that sqrt(2) is
rational. So here I will define a "linear transformation": its value
on the basis element sqrt(2) is sqrt(2) (that is, multiplication by 1)
and its value on the basis element 1 is 0 (that is, multiplication by
0). Also its value on all other basis elements is 0. This "linear
transformation" is certainly additive. Can it be continuous? If it is
continuous at any point, then it is continuous at all points. And if
it is continuous at all points, then it multiplies all numbers by the
same number (problem #13) which this additive function does not. So
this function is not continuous at any point (!). Since f(0)=0
and f is not continuous at 0, there is some epsilon>0 so that no
delta>0 makes |x-0|<delta imply |f(x)-f(0)|=|f(x)|<epsilon:
for every delta>0 there must be x's in the interval (-delta,delta)
with |f(x)|>=epsilon. If one considers closely what this means for
the graph of such an f, it turns out that there are "dots" (points on
the graph) in a dense subset of the whole plane: dots inside every box
of the plane! This is very, very weird to me. Linear algebra is hard.
Today's quote was, of course, from W. Shakespeare (1564-1616). Here is
sonnet #18:
Shall I compare thee to a summer's day?
Thou art more lovely and more temperate:
Rough winds do shake the darling buds of May,
And summer's lease hath all too short a date:
Sometime too hot the eye of heaven shines,
And often is his gold complexion dimm'd,
And every fair from fair sometime declines,
By chance, or nature's changing course untrimm'd:
But thy eternal summer shall not fade,
Nor lose possession of that fair thou ow'st,
Nor shall death brag thou wander'st in his shade,
When in eternal lines to time thou grow'st,
So long as men can breathe, or eyes can see,
So long lives this, and this gives life to thee. ...
|
Just part of the Rutgers effort to educate the whole person.
|
4/3/2003
|
We discussed a few more "facts" about limits of functions.
I tried to find a limited converse to Fact 3. That is, if a function
is positive and it has a limit, must the limit also be positive?
Mr. Benson suggested the following example:
f(x)=x2 and c=0. Then for non-zero x, f(x) is certainly
positive, but the limit as x-->0 of f(x) is 0, which is not
positive.
This is similar to examples with sequences (the sequence
xn=1/n, for example).
Positivity is not inherited, but non-negativity is!
Fact 4: non-negativity is inherited by the limit
Suppose c is a cluster point of a set A, and f is defined on
A. Suppose also that the limit of f as x-->c for x in A exists and
is equal to L. If f(x)>=0 for all x in A with x not equal to c, then L>=0.
Proof: If L<0, consider Fact 3 with the signs reversed. The values
of f must be negative for x close enough to c. But that contradicts
the assumption about f(x), so the initial hypothesis must be wrong,
and L must be non-negative.
Fact 5: squeeze theorem Suppose c is a cluster point of a set
A, and f and g and h are functions defined on A and, for x in A,
f(x)<=g(x)<=h(x). If the limit of f(x) as x-->c for x in A
exists, and the limit of h(x) as x-->c for x in A exists, and if these
two limits are equal, then the limit of g(x) as x-->c for x in A exists,
and equals the common value of the other two limits.
Proof: The suggestion was made by Mr. Tropeano, I think, that we
try to use the Squeeze Theorem for sequences. I agreed, and therefore
temporarily "adjourned" this proof while I prepared a sequential
equivalence for limits of functions.
Theorem (Cluster points and sequences) c is a cluster point of
a set A if and only if there is a sequence (xn) with the
following properties:
- Each xn is in A.
- None of the xn's are equal to c.
- (xn) converges and its limit is c.
Proof: First let's assume there is such a sequence and try to verify
that c is a cluster point of A. The definition tells us that, given
delta>0, there must be some element w of A satisfying
0<|w-c|<delta. Well, property 3 above assures us that there is
N(delta) so that |xn-c|<delta for n>=N(delta). So if w
is one of those xn's, then certainly w is in A and w is not
equal to c by properties 1 and 2. Thus 0<|w-c|<delta, so we have
verified the definition of cluster point.
On the other hand, suppose c is a cluster point of A. Then take
delta=1/n. We know there must be an element of A (which I will call
xn!) satisfying 0<|xn-c|<delta. The first
inequality implies that xn is not c (so property 2 above is
true). And certainly each xn is in A. And we also know
(unrolling the second inequality) that
c-(1/n)<xn<c+(1/n). But the sequence version of the
squeeze theorem then implies that (xn) converges to c. And
we are done.
Theorem (Limits and sequential limits) Suppose that c is a
cluster point of a set A, and f is defined on A. Then the limit of
f(x) as x-->c is L if and only if for every sequence (xn)
with each xn in A and no xn equal to c and
which satisfies lim(xn)=c, then (f(xn))
converges and its limit is L.
Proof: The proof of this was essentially done during our discussion of
the logical equivalence of PD#1 and PD#2.
A return to the proof of fact 5: we verify that the limit of g as x-->c
exists by checking on sequences, which is what the previous theorem
allows us to do. So if (xn) is a sequence converging to c,
with each xn in A and no xn equal to c, we know
by the hypotheses of fact 5 that
f(xn)<=g(xn)<=h(xn) for all
n. We also know (using one implication of the theorem above) that
both sequences (f(xn)) and (h(xn)) converge with
a common limit. But then the sequence version of the squeeze theorem
applies, and we know that (g(xn)) converges with the same
limit. But then the other implication of the theorem above
implies that the limit of the function g exists and is as desired.
Notice the logical "dance" back and forth which the theorem above
allows us to do. So there are lots of results about limits of
sequences which imply results about limits of functions relatively
easily.
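A standard numeric illustration of Fact 5 (my own, not one from the lecture): near c = 0, the function x^2·sin(1/x) is squeezed between -x^2 and x^2, and all three limits as x-->0 are 0.

```python
import math

# A squeeze-theorem illustration (my own sketch, not from the lecture):
# near c = 0, g(x) = x^2*sin(1/x) is trapped between f(x) = -x^2 and
# h(x) = x^2, and all three limits as x --> 0 are 0.
def f(x): return -x * x
def g(x): return x * x * math.sin(1.0 / x)
def h(x): return x * x

for n in [10, 100, 1000, 10000]:
    x = 1.0 / n
    assert f(x) <= g(x) <= h(x)   # the squeeze hypothesis holds at x = 1/n
    print(n, f(x), g(x), h(x))    # all three values shrink toward 0
```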
Informally, we should remember from calculus that a function will be
continuous at a point if it is defined at the point, and if the limit
of the function at that point exists and equals the value of the
function there. This will be quite complicated if I continue using the
setting "c is a cluster point of A", so I won't. From now on (at least
for a while) f will be defined on an interval I. My examples of I
will be R (all of the reals) or [0,1] (a closed interval) or
(0,1) (an open interval). Here is a formal definition of continuity
for our course.
Definition (Continuity) Suppose f:I-->R. If c is an
element of I, then f is continuous at c if, given any
epsilon>0, there is delta>0 so that if |x-c|<delta with x in
I, then |f(x)-f(c)|<epsilon.
One can try to understand this in terms of input and output
tolerances, as I previously explained. One difference between this
definition and the definition of limit is that we just have
"|x-c|<delta" and not "0<|x-c|<delta". This is because we are
assuming that the limit of f at c is actually f(c): it is the "L" in
the previous definition of function limit.
I hope that most of the functions that we think are continuous indeed
will be continuous. Such functions include polynomials, rational
functions (away from where the denominators are 0, of course!) and
roots. Let's compute a few examples straight from the definition.
Example 1, a rational function (I did not do 1/x which I was
assured had been verified in the text!). Let us consider
f(x)=1/(x2+1). I claim that f is continuous for every
number c. Since this function is defined by an
algebraic formula, my first step is to try to get some algebraic
connection between |f(x)-f(c)| and |x-c|. This is easy, but a bit
tedious:
|f(x)-f(c)|=|1/(x2+1)-1/(c2+1)|=|x2-c2|/[(x2+1)(c2+1)]=|x-c|·|x+c|/[(x2+1)(c2+1)].
Wow! What a mess, or maybe there is too much information here. All we
really want is some inequality connecting |f(x)-f(c)| and |x-c|. We
can simplify and lose some information and still retain enough
"control" so that we can prove our desired result. Here is what I will
do: I will "forget" terms on the bottom, and the result will be an
overestimate. Therefore:
|f(x)-f(c)|<=|x-c|·|x+c|.
This looks familiar. We analyzed an example with a similar inequality
last time. So given epsilon>0, we can take delta equal to the
minimum of 1 and epsilon/(2|c|+1).
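The chosen delta can be sanity-checked numerically. The script below (my own sketch; random sampling can only look for counterexamples, never prove the claim) samples points with |x-c| < delta = min(1, eps/(2|c|+1)) and confirms |f(x)-f(c)| < eps for f(x) = 1/(x^2+1).

```python
import random

# Sanity check of the delta from Example 1 (my own, not from the lecture):
# for f(x) = 1/(x^2+1), sample x with |x-c| < min(1, eps/(2|c|+1)) and
# confirm |f(x)-f(c)| < eps.
def f(x): return 1.0 / (x * x + 1.0)

random.seed(0)
for _ in range(1000):
    c = random.uniform(-10.0, 10.0)
    eps = random.uniform(0.001, 1.0)
    delta = min(1.0, eps / (2.0 * abs(c) + 1.0))
    x = c + random.uniform(-delta, delta)
    assert abs(f(x) - f(c)) < eps
```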
Example 2, sqrt(x) Here f(x)=sqrt(x) and the domain, I, is
[0,infinity). I claim that f is continuous for every number c in its
domain. Here again I want to find a connection between f(x)-f(c) and
x-c. Now |f(x)-f(c)|=|sqrt(x)-sqrt(c)|=|x-c|/|sqrt(x)+sqrt(c)|. Now
some observations about this equation: first,
|sqrt(x)+sqrt(c)| is the same as sqrt(x)+sqrt(c). Second, the equation
is undefined if both x and c are 0. So I will first analyze what
happens when c>0.
Now
|sqrt(x)-sqrt(c)|=|x-c|/(sqrt(x)+sqrt(c))<=(1/sqrt(c))|x-c|. The
purpose of the inequality is to simplify the connection between
|f(x)-f(c)| and |x-c| so that we can more easily manipulate them. Now
if we wish |sqrt(x)-sqrt(c)|<epsilon, I claim we can take
delta=sqrt(c)epsilon. Then if |x-c|<delta, since
|sqrt(x)-sqrt(c)|<=(1/sqrt(c))|x-c|, we know that
|sqrt(x)-sqrt(c)|<(1/sqrt(c))(sqrt(c)epsilon)=epsilon as desired,
so f is continuous at c when c is not 0.
If c=0, we can be more "direct". |f(x)-f(c)|=sqrt(x). To get this less
than epsilon, just take |x-0|=x (x is always non-negative in the
domain of this f) less than delta=epsilon2. So we are done.
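Both deltas can be checked numerically. This sketch is my own (random sampling looks for counterexamples; it does not prove the claim): delta = sqrt(c)·eps when c > 0, and delta = eps^2 when c = 0.

```python
import math
import random

# Numeric check of the two deltas from Example 2 (my own sketch):
# delta = sqrt(c)*eps for c > 0, and delta = eps^2 at c = 0.
random.seed(1)
for _ in range(1000):
    eps = random.uniform(0.001, 1.0)
    c = random.uniform(0.01, 100.0)
    delta = math.sqrt(c) * eps
    x = max(0.0, c + random.uniform(-delta, delta))   # stay in the domain
    assert abs(math.sqrt(x) - math.sqrt(c)) < eps
    # the c = 0 case: any 0 <= x < eps^2 gives sqrt(x) < eps
    assert math.sqrt(random.uniform(0.0, 0.9 * eps * eps)) < eps
```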
Then all of our work with sequences easily verifies the following
result:
Theorem (some algebra and continuity) If f and g are continuous
at c, so are f+g and f·g and (constant)f.
What about quotients? We need a little additional result.
Proposition (non-zero behavior of continuity)
If f is continuous at c and f(c) is not 0, then there is
delta>0 so that for all x in I (the domain of f) satisfying
|x-c|<delta, f(x) is not zero.
Proof: Take epsilon=|f(c)|, which is a positive number. Then the
definition of continuity assures us of a delta>0 so that if
|x-c|<delta, then |f(x)-f(c)|<epsilon. If f(x) were equal to 0,
the last inequality would become: |f(c)|<|f(c)| which is surely
false. So these f(x)'s cannot be 0.
Theorem (more algebra and continuity) If f and g are continuous
at c, and if f(c) is not 0, then 1/f and g/f are both defined in some
intervals containing c, and are continuous at c.
Proof: The vital part is making sure that f can't be 0 in the
interval, and that's what the preceding result ensures.
The most important result about continuity is composition. The nicest
thing about the definition of continuity that we settled on is that it
makes the following result very easy to prove.
Theorem (Composition and continuity) Suppose f:I-->R and
g:J-->R, and we know that f(I) is a subset of J. Also suppose
that f is continuous at c and that g is continuous at f(c).
Then the composition of g with f is continuous at c.
Proof: To be given next time, but as I explained in class,
the model of input/output tolerances
guides how the proof will be given. (The output tolerance for f turns
out to be the input tolerance for g, I think.)
|
4/2/2003
| I wrote again the provisional definitions #1 and #2 from the
last lecture. I copied examples 1 through 4 from the last lecture.
I verified that example 1 (f(x)=x2) is continuous at c
exactly as the textbook does. That is, I wrote
x2-c2=(x-c)(x+c) and stated that I wanted to
"control" |x2-c2| with some restriction on
|x-c|. I first said that maybe we should ask
|x-c|<1. Ms. Guthrie verified that this implied
|x|<|c|+1 (you can see this by "unrolling" the inequality as we
have already done many times). Then |x+c|<=|x|+|c|<2|c|+1.
Therefore if we knew that
|x-c|<epsilon/(2|c|+1), we see
|x2-c2|=|x-c|·|x+c|<epsilon/(2|c|+1)·(2|c|+1)=epsilon.
So take delta to be the minimum of 1 and
epsilon/(2|c|+1). We need the 1 in order to get some bound on the x in
|x-c|.
Feeling silly, I did a similar thing for x5, which is more
than any sane person might do:
x5-c5=(x-c)(x4+x3c+x2c2+xc3+c4).
So if |x|<|c|+1, the multiplier of x-c can be bounded by an
expression not involving x (replace each x by |c|+1). This can
in turn be used to produce a delta given an epsilon. And so it goes
....
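The x^5 computation can be checked numerically too. The delta formula below is my own (not stated in the lecture): with |x| < |c|+1, each of the five terms in the second factor is at most (|c|+1)^4, which suggests delta = min(1, eps/(5(|c|+1)^4)).

```python
import random

# A numeric check of the x^5 argument (my own sketch): when |x| < |c|+1,
# each of the five terms in x^4 + x^3*c + x^2*c^2 + x*c^3 + c^4 is at
# most (|c|+1)^4, so delta = min(1, eps/(5*(|c|+1)**4)) should work.
random.seed(2)
for _ in range(1000):
    c = random.uniform(-5.0, 5.0)
    eps = random.uniform(0.001, 1.0)
    delta = min(1.0, eps / (5.0 * (abs(c) + 1.0) ** 4))
    x = c + random.uniform(-delta, delta)
    assert abs(x ** 5 - c ** 5) < eps
```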
I then turned my attention to example 4, and tried to verify that the
f given there was not continuous at any irrational c. After drawing
a picture, Mr. Benson suggested that I try |c|/(327) as my epsilon.
I needed to show that given delta>0, the implication: if
|x-c|<delta then
|f(x)-f(c)|<|c|/(327) was false for some x. If c>0,
I took x to be
a rational number in the open interval (c,c+delta) (possible since the
rationals are dense). Then f(x)=x, and f(c)=0,
so that |f(x)-f(c)|=x>c>c/(327) certainly. If c<0, we could
take x rational in the interval (c-delta,c).
A similar analysis would show that f is not continuous at rational non-zero
c's.
Example 5 f(x)=1 if x=0 or if x=1, and f(x)=0 otherwise. This f
is not continuous at 0 and 1 and is continuous at all other x's.
Example 6 The Question of the day!
Find an example of f which is continuous at 0 and at 1, and is not
continuous at other numbers. A number of correct answers were
given. One such answer is f(x)=x2-x for x rational and f(x)=0
for x irrational. There are many possible answers.
Then I tried to analyze the reasons the text is so "particular" about
domains in its discussion in chapter 4.
We probably want to have the following results true:
Algebraic if f and g are continuous, then f+g and f·g should
be continuous (this seems to be easy with the sequential definition [PD#1]).
As soon as we consider 1/f and f/g, however, we get into questions of
restricting domains.
Composition I think the following result is more subtle, but
should also be true: if f and g are both continuous, then f composed
with g should be continuous. One not so simple example of this has
g(x)=x2(x-1) and f(x)=sqrt(x). Here the "natural domain" of
f composed with g is {0} union [1,infinity). This is probably best
seen by looking at the graph of y=g(x) and seeing where it is
non-negative. But the graph of the composition is then one "isolated"
dot and a piece of a smooth curve. It really isn't clear if the
function "at" the isolated dot can or should be called continuous. We
think that continuity involves the behavior of a function "at" a point
compared to its behavior at nearby points. If there are no nearby
points, then restrictions on comparative behavior are pointless
(pun!).
An element c of a subset A of R is called an isolated
point of A if there is delta>0 so that the only point of A in
(c-delta,c+delta) is c.
c will be a cluster point (almost!) if it is not isolated. I made
mistakes in class on this, and I hope I won't make another here.
A real number c is a cluster point of A if, given any delta>0,
there is an element a of A, not equal to c, in the interval
(c-delta,c+delta).
So cluster points can be in or not in the set.
Examples:
Set | Cluster points of the set
---|---
(0,1) (open interval) | [0,1] (closed interval)
[0,1] | [0,1]
{0} | empty set (no cluster points)
N (natural numbers) | empty set
{1/n : n in N} | {0}
Q (rational numbers) | R (all of the reals)
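The {1/n : n in N} row can be illustrated with a small search (my own sketch; a finite search cannot prove cluster-ness, only exhibit the witnesses the definition asks for): every interval around 0 contains a point 1/n of A other than 0, while 1/2 is isolated, its nearest neighbors in A being 1/3 and 1.

```python
# Illustrating the {1/n : n in N} row of the table (my own sketch):
# 0 is a cluster point of A, while 1/2 is an isolated point of A.
def point_of_A_near(c, delta):
    """Some 1/n in (c-delta, c+delta) with 1/n != c, or None if the search fails."""
    for n in range(1, 10 ** 6):
        if 1.0 / n != c and abs(1.0 / n - c) < delta:
            return 1.0 / n
    return None

for delta in [1.0, 0.1, 0.001, 1e-5]:
    assert point_of_A_near(0.0, delta) is not None   # 0 is a cluster point
assert point_of_A_near(0.5, 0.1) is None             # 1/2 is isolated in A
```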
Another motivating example is the limit of ((f(x)-f(c))/(x-c)) as x-->c.
The "natural domain" of this quotient, well-known for being part of
the definition of the derivative, will always exclude c. So we must
be a bit more careful about limits.
Definition Suppose c is a cluster point of A, and f is defined
on A. Then we say that the limit as x-->c with x in A of f is L
if the following is correct:
given epsilon>0, there is delta>0 so that when
0<|x-c|<delta, then |f(x)-L|<epsilon.
Notice that there are several differences between what we wrote here
and Provisional Definition #2. First we have the additional layer of
"A" and "cluster point of A". In this course, A will almost always be
an interval (closed or open) or a finite union of intervals. Second,
and more interesting, is the additional restriction on x:
0<|x-c|. This is to take care of such situations as the definition
of derivative.
We can now merrily (?) prove a collection of facts.
Fact 1: uniqueness of limits Suppose c is a cluster point of a
set A, and f is defined on A. If L1 and L2 are
both limits of f at c, then L1=L2.
Proof: If L1 is not equal to L2, then take
epsilon=|L1-L2|/2, certainly a positive
number. Then there must be delta>0 (the smaller of the two deltas,
one for each limit) so that when 0<|x-c|<delta
and x is in A, both |f(x)-L1|<epsilon and |f(x)-L2|<epsilon. Since c
is a cluster point of A, there is an x in A satisfying the restrictions
0<|x-c|<delta. So for this x we can write:
|L1-L2|=|(L1-f(x))+(f(x)-L2)|<=(triangle ineq.)|L1-f(x)|+|f(x)-L2|
and each of these pieces is less than epsilon, so we have
2epsilon=|L1-L2|<2epsilon, which is not possible: contradiction.
Fact 2: limit existence implies local boundedness Suppose c is a cluster point of a set A, and f is defined on A.
Suppose also that the limit of f as x-->c for x in A exists.
Then f is bounded on A near c: there is delta>0 and M>0 so that
if x is in A intersect (c-delta,c+delta), then |f(x)|<M.
Proof: Take epsilon=1. There is some delta>0 so that if
0<|x-c|<delta and x is in A, then |f(x)-L|<1. But then
("unrolling" yet again!) for those x's, |f(x)|<|L|+1. So now we
could take M to be |L|+1 if c is not in A and max(|f(c)|+1,|L|+1)
if c is in A. This works.
Fact 3: functions locally inherit the signs of their limits
Suppose c is a cluster point of a set A, and f is defined on
A. Suppose also that the limit of f as x-->c for x in A exists. If L
is the limit and L is positive, then f is positive for x in A near c:
there is delta>0 so that if x is in A intersect (c-delta,c+delta)
with x not equal to c, then f(x)>0.
Proof: Take epsilon=L. Since L is positive, this is a "legal"
epsilon, and the definition of limit supplies a delta>0. Now if x is in A and 0<|x-c|<delta, we know
|f(x)-L|<L. Unrolling yet again, L-L<f(x)<L+L for those
x's. Since L-L=0, for those x's we see that f(x)>0.
Comment: Notice that if f(x)=37 for x not equal to 0 and f(0)=-5, then
(taking c=0 and L=37) we see that f(c) need not be positive, even if
the limit is positive. This is familiar example (I hope!) from
calculus.
Essentially I am "converting" sequence/limit facts to function/limit
facts. We could continue like this for a while. I will do a few more
tomorrow.
|
3/31/2003
|
The lecture primarily dealt with two provisional definitions of
continuity. They were "provisional" or temporary principally because I
wanted to delay thinking about the intricacies of domain questions
involving functions.
Provisional definition #1 A function f:R-->R is
said to be continuous at c if for all sequences (xn)
satisfying lim(xn)=c, then (f(xn)) converges,
and lim(f(xn))=f(c).
Provisional definition #2 A function f:R-->R is
said to be continuous at c if, given any epsilon>0, there is
a delta>0 so that if |x-c|<delta, then |f(x)-f(c)|<epsilon.
#2 is frequently briefly presented in calculus 1. There are various
interpretations of it.
The function as a machine If we think of f as a "black box" turning
inputs, x, into outputs, f(x), then what #2 means is the following: an
output error specification, epsilon, is given. We "want" the output to
be within epsilon of f(c). Can we control the inputs by a small
specification so that this occurs? The control over the inputs is
measured by |x-c|<delta.
The function via its graph Consider the graph of y=f(x). If we draw
horizontal lines y=f(c)+epsilon and y=f(c)-epsilon, then locally near
(c,f(c)) the graph is trapped between the two horizontal lines. When
we look for delta, we are asking if there is an interval centered
around c, an interval of the form (c-delta,c+delta) so that the
portion of the graph over the interval will lie within the horizontal
strip specified.
Perhaps one of these "explanations" helps to comprehend PD#2.
We then proved an essential chunk of the course: PD#1 and PD#2 are
logically equivalent. We had to prove two implications:
I If PD#1 is true, then PD#2 is true.
II If PD#2 is true, then PD#1 is true.
Both of these will be proved using the proof by contradiction
approach.
If PD#1 is true, then PD#2 is true.
Proof: We suppose that PD#1 is true, and that PD#2 is false. PD#2
false is the following statement: there is epsilon>0 so that for
all delta>0, there exists x with |x-c|<delta and
|f(x)-f(c)|>=epsilon.
We used this to create a sequence. If n is in N, then
xn will be an x satisfying |x-c|<1/n and
|f(x)-f(c)|>=epsilon. Such xn's are guaranteed to exist
using the "PD#2 false" statement. Now consider the sequence
(xn). Since |xn-c|<1/n, we know
c-(1/n)<xn<c+(1/n). By the Squeeze Theorem, since
both c+(1/n) and c-(1/n) --> c, the sequence (xn) converges
and its limit is c. But what about (f(xn))? If that sequence
converges to f(c), then we would need to know that
|f(xn)-f(c)|<epsilon for n large. But exactly
that statement is always false by the "PD#2 false" statement above. This is
a contradiction to PD#1, which is supposed to be true. Therefore, PD#1 implies PD#2.
If PD#2 is true, then PD#1 is true.
Proof: Now we need to write what "PD#1 is false" means. Well, there
exists a sequence (xn) which converges to c such that
(f(xn)) does not converge to f(c). We discussed this
last statement a bit. What does it, in turn, mean? It means: there
exists an epsilon>0 so that for all K in N there exists
n>=K with |f(xn)-f(c)|>=epsilon. That is, there is no
way (!) f(xn) eventually gets and stays "epsilon-close" to
f(c). Now to contradict PD#2: we will take the epsilon in PD#2 to be
the epsilon guaranteed by the failure (!) of sequential
convergence of (f(xn)). That is, we will use this epsilon as
an output tolerance. Can there possibly be any input tolerance for
this epsilon? Suppose a delta>0 is given. We know that there is N
in N so that for n>=N, |xn-c|<delta. Therefore
the "infinite tail" after xN satisfies this candidate for
an input tolerance. But the failure of convergence says that there is
some n in this infinite tail which has
|f(xn)-f(c)|>=epsilon. That means we have not
controlled the output tolerance as we were supposed to! So we have
contradicted PD#2, and we are done.
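The witness sequence from the first proof can be made concrete (my illustration) with the f of Example 2 below: f(0) = 1 and f(x) = 0 otherwise, c = 0. For eps = 1/2, every delta admits x with |x-0| < delta and |f(x)-f(0)| >= eps; choosing such an x_n for delta = 1/n gives the sequence that breaks PD#1.

```python
# The proof's witness sequence, made concrete (my own illustration):
# f(0) = 1, f(x) = 0 otherwise, c = 0, eps = 1/2.
def f(x):
    return 1.0 if x == 0 else 0.0

eps = 0.5
xs = [1.0 / (2 * n) for n in range(1, 101)]   # x_n = 1/(2n), so |x_n - 0| < 1/n
for n, x in enumerate(xs, start=1):
    assert abs(x - 0.0) < 1.0 / n      # (x_n) converges to c = 0
    assert abs(f(x) - f(0.0)) >= eps   # but f(x_n) never gets eps-close to f(c)
```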
I find this pair of proofs logically adequate but somehow lacking in
definiteness. I can only write or discuss this feeling approximately
and awkwardly, but somehow, we contradict these complexly quantified
statements in order to construct a "house of cards" which then, by
design, collapses at the end. To me this seems, emotionally, somewhat
unsatisfactory.
Example 1 f(x)=x2. We saw using PD#1 that this f was
continuous at every c in R. We should also check this
with PD#2.
Example 2 f(x)=1 if x=0 and f(x)=0 otherwise. We saw using PD#1
that f was not continuous at 0. If c is not 0, then f is continuous at
c. Here one can take delta=|c| (the delta will not depend on
epsilon). Then f(x)=0 for |x-c|<delta, since x can't be 0 (for if
x=0, then |x-c|=|c|<|c|, which is false). So
|f(x)-f(c)|=|0-0|=0<epsilon for any positive epsilon.
Example 3 f(x)=1 if x is rational and f(x)=0 if x is
irrational. This function is continuous nowhere. That is, if c is in
R, we can find an "output tolerance" epsilon which can be
satisfied by no input tolerance. Ms. Guthrie suggested that we
use epsilon=1. The proof naturally divides into two cases.
- c is rational. If c is rational, then f(c)=1. Suppose
we try a proposed delta>0. Then the interval |x-c|<delta is
naturally the open interval (c-delta,c+delta). Since the irrationals
are dense, there must be an irrational number w in this interval. Then
f(w)=0. Therefore |w-c|<delta but |f(w)-f(c)|=|0-1|=1 and this
cannot be less than 1.
- c is irrational. If c is irrational, then f(c)=0. Suppose
we try a proposed delta>0. Then the interval |x-c|<delta is
naturally the open interval (c-delta,c+delta). Since the rationals
are dense, there must be a rational number w in this interval. Then
f(w)=1. Therefore |w-c|<delta but |f(w)-f(c)|=|1-0|=1 and this
cannot be less than 1.
We noted that we could have used any epsilon between 0 and 1 to make
this reasoning work.
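Floats cannot tell rational from irrational, so this sketch (my own, not the lecture's) carries an explicit rationality tag beside each value; the `witness` helper and its denominator bound are my assumptions, using density exactly as the two cases above do.

```python
import math
from fractions import Fraction

# Example 3 with a symbolic rationality tag (my own sketch): with eps = 1,
# density supplies a witness w of the *other* kind inside every
# (c-delta, c+delta), so no delta can work at any c.
def f(value, is_rational):
    return 1 if is_rational else 0

def witness(c, c_rational, delta):
    """A point within delta of c carrying the opposite rationality tag."""
    if c_rational:
        return (c + delta / math.sqrt(2), False)   # irrational offset (tagged)
    # a rational within delta of the irrational c, by density:
    q = int(2 / delta) + 1
    return (float(Fraction(c).limit_denominator(q)), True)

eps = 1
for c, tag in [(0.5, True), (math.sqrt(2), False)]:
    for delta in [1.0, 0.01, 1e-6]:
        w, w_tag = witness(c, tag, delta)
        assert abs(w - c) < delta
        assert abs(f(w, w_tag) - f(c, tag)) >= eps
```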
Example 4 f(x)=x if x is rational and f(x)=0 if x is
irrational. This function is continuous only at 0. Indeed, we verify
PD#2 at 0: given epsilon>0, take delta=epsilon. Then f(0)=0, and we
must show: if |x-0|<epsilon, then |f(x)-f(0)|<epsilon. There are
two cases: if x is rational, then f(x)=x, so |x-0|<epsilon
certainly implies |f(x)-0|=|x-0|<epsilon; if x is irrational, then
f(x)=0, so |x-0|<epsilon certainly implies
|f(x)-0|=|0-0|<epsilon. Now we wanted to show that f is not
continuous at c which was not 0. If we want to show that PD#2 is not
true, we need to find one epsilon for which no delta will
serve. The suggestion to take epsilon=|c| was made. I think this is
inspired partially by example 3 and even example 2. I first looked at
two specific c's to check the logic. c=1 Here f(c)=1, and
epsilon=1. If there were delta>0 so that |x-1|<delta implied
|f(x)-1|<1 necessarily, then we could take x irrational in the
interval |x-1|<delta (possible since irrationals are dense). Then
for this x, f(x)=0. And the statement |f(x)-1|<1 becomes |0-1|<1
which is false.
c=sqrt(2) This c is irrational, so f(c)=0. But we will try
epsilon=sqrt(2). Now given delta>0 we want to find x satisfying
|x-sqrt(2)|<delta and guaranteed not to have
|f(x)-f(sqrt(2))|<sqrt(2). Here we should certainly take x to be
rational. Then |f(x)-f(sqrt(2))|=|x|. But if |x-sqrt(2)|<delta, why
must |x|<sqrt(2) be false? In fact, it doesn't "need" to be
false. We could take delta=1, and actually x=1. Then
|1-sqrt(2)|<delta and |1|<sqrt(2). We've got to be a bit
more clever in this case. Take x to be a rational number "to the
right" of sqrt(2): that is, x should be a rational in the interval
(sqrt(2),sqrt(2)+delta). Then |x|>sqrt(2) and
|x-sqrt(2)|<delta. So we do have here what we need. I'll try
to go over this next time.
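The "rational to the right of sqrt(2)" trick can be carried out concretely (my own sketch, not from the lecture): for each delta, build a rational x in (sqrt(2), sqrt(2)+delta); then |x-sqrt(2)| < delta, yet f(x) = x > sqrt(2) = eps.

```python
import math
from fractions import Fraction

# Building a rational just to the right of sqrt(2) (my own sketch): take
# the next multiple of 1/q above sqrt(2), with the grid 1/q finer than delta.
s = math.sqrt(2)
for delta in [1.0, 0.01, 1e-6]:
    q = int(2 / delta) + 1                 # so that 1/q < delta
    x = Fraction(math.ceil(s * q), q)      # rational, just above sqrt(2)
    assert s < float(x) < s + delta        # within delta, but f(x) = x > sqrt(2)
```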
Textbook problems due Thursday: 3.7: 8, 9 and 4.1: 2, 12.
Où sont les neiges
d'antan?
|
Usually translated as: "Where are the snows of yesteryear?" This mournful
refrain of the 15th century French poet, François
Villon (1431-1463?) was correctly identified by Mr. LeDuc.
|
|
3/27/2003
|
Absolutely convergent series can be rearranged without changing their
"sums".
Theorem Suppose that
sum_{j=1}^{infinity} aj converges
absolutely, with sum L. Then any rearrangement of the series,
sum_{j=1}^{infinity} af(j) (where f is a one-to-one correspondence
from N onto N), will converge and its sum will be L.
Proof: (A proof of this result appears in section 9.1.) Since
sum_{j=1}^{infinity} aj
converges with sum L, if we define
xn=sum_{j=1}^{n} aj we know:
Given epsilon>0, there is K(epsilon) in N so that for
n>=K(epsilon), |xn-L|<epsilon. Now let J(epsilon) be
equal to the maximum of the numbers f(1),f(2),...,f(K(epsilon)). If we
define the partial sums for the "rearranged" series to be
yn=sum_{j=1}^{n} af(j) then when
n is at least J(epsilon), every yn has all the "pieces" of
xK(epsilon). So we could "separate" the yn sum
into
sum_{f(j) in [1,...,K(epsilon)]} af(j) + sum_{the other terms} af(j).
Now consider |yn-L|. We can use the triangle inequality to
estimate this:
|yn-L|<=|sum_{f(j) in [1,...,K(epsilon)]} af(j) - L| + |sum_{the other terms} af(j)|.
The first term is less than epsilon because of the specification of
K(epsilon). As for the second term, a different strategy is
needed.
|sum_{the other terms} af(j)| <= (Triangle!) sum_{the other terms} |af(j)|.
Note that each of these |af(j)| is "far out": its index f(j) is more
than K(epsilon).
Please note that we have not used part of the hypothesis yet, that the
series converges absolutely. So let's use it now. We can label the
partial sums of this series
zn=sum_{j=1}^{n} |aj|. Since
(zn) converges, it must be a Cauchy sequence (here is
that fact again!). Therefore, given epsilon>0, there is
W(epsilon) in N so that for n>m>W(epsilon),
|zn-zm|<epsilon. But each zstuff
is a sum of a bigger chunk of a series with non-negative terms, so
that we now know
sum_{j=m+1}^{n} |aj|<epsilon. In fact,
every finite "chunk" of this series summed up (if you have indices at
least W(epsilon)) will be less than epsilon. In fact, any
finite sum of such terms can be "embedded" into a (maybe larger) sum
of successive terms, so that any such finite sum is less than epsilon,
provided that the indices are all at least W(epsilon). This gives us a
way to control the "other terms" in the sums above.
In fact, let's take the maximum of both W(epsilon) and K(epsilon) for
the number needed. Then the sum of the "other terms" above will be less than
epsilon, and we have estimated |yn-L| by 2epsilon, which is
good enough to me (you can rewrite the proof starting with epsilon/2
if you wish to end up with epsilon).
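The absolute-convergence hypothesis is doing real work. A quick numeric sketch (mine, not part of the lecture's proof): the alternating harmonic series converges only conditionally, and the classical rearrangement taking two positive terms for each negative one converges to (3/2)·ln 2 instead of ln 2.

```python
import math

# Rearranging a conditionally convergent series changes its sum (my own
# illustration): the alternating harmonic series sums to ln 2, but taking
# two positive terms, then one negative, converges to (3/2)*ln 2.
def original(n_terms):
    return sum((-1) ** (j + 1) / j for j in range(1, n_terms + 1))

def rearranged(n_groups):
    # group k contributes 1/(4k-3) + 1/(4k-1) - 1/(2k)
    return sum(1.0 / (4 * k - 3) + 1.0 / (4 * k - 1) - 1.0 / (2 * k)
               for k in range(1, n_groups + 1))

print(original(10 ** 5))      # close to ln 2 (about 0.6931)
print(rearranged(10 ** 5))    # close to 1.5*ln 2 (about 1.0397)
```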
This theorem keeps those new to series from making mistakes. That's
because many if not most of the series these people encounter are
power series,
sum_{j=1}^{infinity} aj x^j.
Power series have intervals of convergence. Outside these intervals,
the series diverges. Inside these intervals, the series converges
absolutely, so any sort of rearrangement leaves convergence and
the specific sum unchanged. However, Fourier series and series
derived from wavelets typically do not have such behavior, and
so more care is needed to deal with them. Both Fourier series and
especially series derived from wavelets are used in "real life". In
particular, I believe that a few years ago the FBI fingerprint files
were converted to storage based on wavelets.
I further challenged the class to something very specific. If one
believes in the Cauchy criterion, then it should be possible, perhaps
even easily possible, to create a list of infinitely many positive
numbers so that the sum of any finite subset of them is less than
.0001 (1/(10,000)). After some effort, we did create such a "list":
the nth number (for n in N) would be, perhaps,
1/[(10,001)·2^n]. Since we know that the sum of
1/2^n as n goes from 1 to infinity is 1, our example is
complete.
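The list can be checked directly, straight from the lecture's formula:

```python
# Checking the challenge answer: the numbers 1/(10001*2^n) are positive,
# and even the sum of ALL of them (a geometric series) is 1/10001 < 0.0001,
# so any finite subset certainly sums to less than 0.0001.
terms = [1.0 / (10001 * 2 ** n) for n in range(1, 200)]
assert all(t > 0 for t in terms)
assert sum(terms) < 0.0001
assert sum(terms[10:50]) < 0.0001   # an arbitrary finite "chunk" also works
```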
Onward to CALCULUS!
What do we know about continuity, and what can it be used for?
Most nice functions (functions defined by simple formulas) are
continuous, or, at least, are continuous most places. sine, cosine,
polynomials, rational functions, logs, exponentials. You can add,
subtract, etc. and compose such functions to get further continuous
functions.
So graphs of such functions are unbroken: important in curve sketching
and root finding.
And areas "underneath" such curves are defined and maybe computable
or approximable (the definite integral "exists").
If a function models some sort of physical "process", then we can
think of x as input and f(x) as output. f's continuity is roughly
reflected in the fact that small changes in input (x) should imply
small changes in output (f(x)).
Here is a candidate for a Math 311 definition of continuity. I warn
you that it is different from what's at the beginning of chapter 4. So
please let us agree that it is only a preliminary definition of
continuity. Also, in order to avoid annoying technicalities at the
beginning, I will assume that the function f is defined
everywhere and not worry for now about domain questions. We
will need to modify this definition a bit later.
Preliminary definition A function f:R-->R is
said to be continuous at c if for all sequences (xn)
satisfying lim(xn)=c, then (f(xn)) converges,
and lim(f(xn))=f(c).
Example 1 Suppose f(x)=x2. Then f is continuous at
every c in R. We can prove this: if (xn) is a
sequence which converges to c, then (theorem about limits and
arithmetic) ((xn)2) must also converge, and its
limit will be c2. So we have shown that (f(xn))
converges, and lim(f(xn))=f(c).
Example sqrt(2) "Jump," said Toad. "No, no, I
won't," said Frog.
Example 2 We define this f piecewise. f(x) will be 1 if x=0 and
will be 0 if x is not equal to 0. Note that the quantifier specifying
the sequences in the
preliminary definition above is universal: "all". That means to
show that f is not continuous at 0, it is sufficient to exhibit
one sequence (xn) which converges to 0 for which the
sequence (f(xn)) does not converge to f(0)=1. So the
suggestion was take xn=1/n. Certainly this (xn)
converges to 0, but f(xn)=0, and the sequence (0) does not
converge to 1.
My next "job" will be to contrast this preliminary definition of
continuity with the one given in the text. I will also need to deal
with the complexities which occur when domain problems are included.
Please begin reading chapter 4. If you know a bit about
probability, this paper about random harmonic series may be
interesting to you.
|
3/26/2003
| We discussed infinite series. This material is somewhat
contained in sections 3.7 and 9.1 of the text. This is basic material
about infinite series, but we will also cover material which is not in
the text.
Definition The infinite series
sum_{j=1}^{infinity} aj
converges if the sequence of partial sums
xn=sum_{j=1}^{n} aj converges. If
L is the limit of
(xn), L is called the sum of the infinite series. You will
see that it is very important to think only about this
definition, and not to get "distracted" by the idea of "adding up
infinitely many numbers". This is rather different.
Theorem If sum_{j=1}^{infinity} aj
converges, then
the sequence (an) itself must converge to 0.
Comment: in some calculus books this is called the nth term
test.
Proof: We know that (xn) converges to L. Then also
(xn-1) converges (only tails matter!) to the same limit,
L. But the difference of two convergent sequences converges to the difference
of the limits: (xn-xn-1) converges to
L-L=0. Since xn=sumj=1naj
and xn-1=sumj=1n-1aj, the
difference xn-xn-1=an, and we have
the conclusion we wanted.
The converse is false. That is, we can get series
sumj=1infinityaj which do not
converge, but where lim(aj)=0. The best-known example is
probably the harmonic series:
sumj=1infinity1/j, which diverges, but whose
individual terms -->0.
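A quick Python sketch (my illustration, not from the lecture) showing the harmonic partial sums creeping past any bound even though the terms go to 0:

```python
import math

# Partial sums of the harmonic series sum 1/j: the terms tend to 0,
# yet the partial sums grow without bound (roughly like ln(n)).
def harmonic_partial(n):
    return sum(1.0 / j for j in range(1, n + 1))

for n in (10, 100, 10_000):
    print(n, harmonic_partial(n))
# harmonic_partial(n) stays within 1 of ln(n), so it eventually
# exceeds any fixed bound, even though the terms 1/n --> 0.
```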
But signs (+/-) can matter. The alternating harmonic series,
sumj=1infinity(-1)j+1/j,
converges. This is a simple consequence of the alternating series
test. So there's another definition.
Definition The series
sumj=1infinityaj converges
absolutely if sumj=1infinity|aj|
converges.
Comment: Absolute convergence is a little bit easier to study, because
when all of the terms are non-negative, the sequence of partial sums
is monotone increasing, and the "theory" of such sequences is fairly
simple (indeed, "dichotomous" -- either one thing [partial sums
bounded and convergent] or another [partial sums unbounded and
divergent]).
Theorem If sumj=1infinityaj
converges absolutely, then it must converge.
Proof: Let (xn) be the sequence of partial sums of
sumj=1infinityaj, and let
(yn) be the sequence of partial sums of
sumj=1infinity|aj|.
The logic of this proof is particularly interesting. We
will use strongly the equivalence of convergence of a sequence and the
Cauchy criterion: we proved an "if and only if" statement and we will
use both implications!
Since (yn) converges, (yn) is Cauchy. Therefore,
given epsilon>0, there is Z(epsilon) in N so that if
n>m>=Z(epsilon), |yn-ym|<epsilon. Now
there is a trick. Here we know what the ystuff is:
each is a partial sum. So
|yn-ym|=sumj=m+1n|aj|.
The triangle inequality states that
|sumj=m+1naj|<=
sumj=m+1n|aj|. (Note how the
absolute value signs have changed!). But
|xn-xm|=|sumj=m+1naj|,
so that
|xn-xm|<=|yn-ym|<epsilon.
This means that the sequence (xn) satisfies the Cauchy
criterion (!) and therefore it must also converge. So we are done.
Many examples in calc 2 use this result. For instance, since we know
that sumj=1infinity1/2j converges, we
therefore know that
sumj=1infinity(sin(j^3+5j))/2j
converges, since |sin(j^3+5j)|<=1.
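The comparison can be watched numerically; here is a hedged Python sketch (the cutoffs 30 and 60 are my choices):

```python
import math

# Partial sums of sum sin(j^3+5j)/2^j, which converges absolutely by
# comparison with sum 1/2^j: the tail beyond n is at most 1/2^n.
def partial(n):
    return sum(math.sin(j**3 + 5 * j) / 2**j for j in range(1, n + 1))

p30, p60 = partial(30), partial(60)
print(p30, p60)
# |partial(60) - partial(30)| <= sum_{j=31}^{60} 1/2^j < 1/2^30
```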
Of course, convergence does not imply absolute convergence. The
alternating harmonic series, mentioned above, shows this. And indeed
we have the following additional definition.
Definition The infinite series
sumj=1infinityaj is conditionally
convergent if it converges but
sumj=1infinity|aj| does
not.
Example: The alternating harmonic series converges conditionally.
Let us "discuss" such conditionally convergent series. For the
remainder of this lecture,
sumj=1infinityaj will be a
conditionally convergent series. So
- sumj=1infinityaj converges.
- sumj=1infinity|aj| diverges.
We can divide up the positive integers, N, into two disjoint
subsets, S and T. Here S will denote those j's for which aj
is non-negative, and T is those j's with aj<0. The sets
S and T are disjoint, of course. But how big are they? We analyzed
this slowly. I first asked if, say, T could be empty. Well, if T were
empty, then all the elements of the series would be non-negative, and
|aj|=aj. Then it would be impossible for the
assumptions numbered above to be true. If S were empty, then the
absolute values would all be (-1) times the aj. The series
would be constant multiples of each other. So it would be impossible
for one of the series to converge while the other diverged. What if,
say, T were finite? Then only finitely many terms would be
negative. And since "only tails matter", eventually the series
|aj| and aj would coincide. So again, both the
series would have to {con|di}verge together. And similarly, T would
have to be infinite also.
Now we know that N is "divided" into two infinite sets, and
the conditionally convergent series has infinitely many positive and
infinitely many negative terms. In fact, let's look at say the
positive terms. If the sum of the positive terms converges, then we
could subtract (cancel out!) the positive terms from the convergent
series sumj=1infinityaj. Since the
difference of two convergent series must also converge (theorems on
arithmetic and limits) we then see that the series of negative terms
must converge. But then since the series of |aj|'s is the
difference of the positive terms and the negative terms (not
profound: it is always true that |x|=x for x>=0 and |x|=-x for
x<0), it would follow that the sum of the absolute values would
converge! But this is a contradiction. So if you can follow this chain
of hypothetical reasoning we have seen that the sum of the positive
terms must be divergent (and that means that the partial sums must be
unbounded above). Similar reasoning establishes that the sum of the
negative terms must be unbounded below.
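The alternating harmonic series shows this concretely; a small Python sketch (my illustration; the cutoff 100,000 is arbitrary):

```python
import math

# Split the alternating harmonic series by sign: positive terms
# 1, 1/3, 1/5, ... and negative terms -1/2, -1/4, ... . The full
# partial sums settle near ln(2), but the positive part alone is a
# "half" of the harmonic series and grows without bound.
def term(j):
    return (-1) ** (j + 1) / j

full = sum(term(j) for j in range(1, 100_001))
positives = sum(term(j) for j in range(1, 100_001) if term(j) > 0)
print(full)       # close to ln(2) = 0.693...
print(positives)  # already past 6, and unbounded as the cutoff grows
```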
Now here is a wonderful
result, which is totally unintuitive to me. Ooops: just as I did in
class, I forgot to state a definition first.
Definition Suppose
sumj=1infinityaj is a series. Then a
rearrangement of this series is gotten by specifying a
bijection f:N-->N to obtain an infinite series
sumj=1infinityaf(j).
Comment: the bijection establishes that every term of aj
appears exactly once in the rearranged series.
Riemann's Rearrangement Theorem Suppose
sumj=1infinityaj converges
conditionally. If c is a real number, then there exists a
rearrangement so that
sumj=1infinityaf(j) converges, and
the limit of the rearranged series is c.
Comment: this really means that commutativity and associativity are
FALSE for general "infinite sums". So they are not really sums, but
rather some very very special kinds of limits.
Proof: Since sumj=1infinityaj
converges
conditionally, we know several facts:
- lim(aj)=0. Therefore, given epsilon>0, there
are only a finite number of aj's with
|aj|>=epsilon. In fact, there are at most N(epsilon)
such, if N(epsilon) is the integer guaranteed by the definition of
convergence.
- The sum of the positive numbers in the series is unbounded above:
for any number, we can find chunks of the positive elements of the
series which add up to something greater than the number.
- The sum of the negative numbers in the series is unbounded below:
for any number, we can find chunks of the negative elements of the
series which add up to something less than the number.
These are the ingredients which are needed for the proof.
First just start with a1, and assume for simplicity that
a1<c. Now add on terms from the "positive" subseries,
the one associated with S. Since the partial sums of this subseries
are unbounded above, we can add on "enough" to make a partial sum
larger than c. We need to reverse things, so add on terms from the
subseries associated with T until the partial sum is less than
c. Reverse again, using terms from S. Etc. Each time we reverse, we
use up at least one term from the subseries. Eventually (certainly by
the time we have done N(epsilon) "up and downs") we will have used all
the terms of the series which are at least epsilon in absolute
value. So when we go "past" c in either direction, we will stop after
at most one more term, and all of the partial sums after those N(epsilon)
up and downs will lie in the interval [c-epsilon,c+epsilon]. But that
is exactly verifying the definition of convergence for the sequence of
partial sums of this rearranged series!
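The "up and down" procedure in the proof can be imitated numerically. Here is a hedged Python sketch using the alternating harmonic series (the target c and the step counts are my choices, not from the lecture):

```python
# Greedy rearrangement of the alternating harmonic series aimed at a
# target c: add positive terms (1, 1/3, 1/5, ...) while the partial sum
# is at or below c, and negative terms (-1/2, -1/4, ...) while it is
# above. Each crossing of c overshoots by at most the last term used,
# and the terms tend to 0, so the partial sums close in on c.
def rearranged_sum(c, steps):
    pos, neg = 1, 2          # next odd / even denominator to use
    total = 0.0
    for _ in range(steps):
        if total <= c:
            total += 1.0 / pos
            pos += 2
        else:
            total -= 1.0 / neg
            neg += 2
    return total

target = 0.3                 # any real number works, by the theorem
approx = rearranged_sum(target, 100_000)
print(approx)                # within the size of the last used term of 0.3
```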
I could not "believe" this result when I first saw it. I needed to
think about it quite a bit. Notice that if the rearrangement
bijection, f, only changes finitely many of the integers (that is, if
there is N so that f(n)=n for n>=N), then the {con|di}vergence of
the rearranged series and its sum do not change, because the partial
sums of the original series and the rearranged series are identical
after the first N such. Again, "only tails matter". The preceding
theorem shows that we shouldn't think of infinite series as sums. They
are just a different sort of creature. As I mentioned in class, if we
called sumj=1infinityaj something
totally new and ludicrous, like GLUMP(aj), we then would
have less "intuition" or preconceptions (prejudices!) to get rid
of. If someone told us that GLUMP(aj) and
GLUMP(af(j)) were not necessarily the same, well, then, I
guess they would not necessarily be the same: not a tragedy.
I'll do just a bit more on series tomorrow before starting chapter 4.
|
3/24/2003
|
Sequences which satisfy the Cauchy criterion and convergent sequences
are the same sequences! I tried to give "real" examples illustrating
the usefulness of this "coincidence".
I began by restating the definition of convergence, and the definition
of the Cauchy criterion. Of course, the Cauchy criterion does
not include knowledge of the limit. But we can state the
following: since we know that given epsilon>0, there is M(epsilon) in N
so that when n and m are at least M(epsilon), then
|xn-xm|<epsilon, we do know that all
xn's with n>=M(epsilon) satisfy
|xn-xM(epsilon)|<epsilon. But that means that those
xn's lie inside the interval
(xM(epsilon)-epsilon,xM(epsilon)+epsilon). The infinite tail of
the sequence is in that open interval. Then the work we have done on
limits and inequalities will tell us that the limit must be in the
closed interval
[xM(epsilon)-epsilon,xM(epsilon)+epsilon].
With that behind us, I introduced my example: an infinite series. So I
want to analyze the sum as n goes from 1 to infinity of
+/-(1/nn) where the sign is given in the following weird
way: one + sign, then two - signs, then three + signs, then four -
signs, etc. So the series begins:
1-1/4-1/27+1/256+1/3125+1/46656 etc.
I ask the following questions:
- Does this series converge?
- If it does, what can one say about its sum?
I think it does converge. And the real reason I think it
converges is that the terms -->0 so fast.
Much of what we're going to do here could also be done in a second
semester calc course, but I want to do it from the viewpoint of Math
311. The basic idea is to compare this series with a more familiar
series. And the most familiar series are probably geometric series.
Consider the series 1+1/2+1/4+1/8+...+1/2n-1+... where
an=1/2n-1. Does this series converge? Well, we
need to reach back to second semester calculus. We say that the series
converges if the sequence of partial sums converges. Here the sequence
of partial sums is defined by
xn=sumj=1n(1/2j-1). The
nicest thing about geometric series is that there are simple formulas
for their partial sums. Here we can multiply xn by 1/2 and
subtract from the defining sum for xn. Lots of things
cancel out, and we have
xn-(1/2)xn=1-1/2n, so that
xn=2-1/2n-1.
Now let's go back to 311. The sequence (xn) is obtained by
adding positive numbers. So it is a monotone increasing sequence. For
such sequences, there is a dichotomy (the online dictionary
says that this means "a division into two, esp. a sharply defined
one.") Either a monotone increasing sequence is bounded and converges,
or it is unbounded and diverges. But we have a nice formula for
xn, so we know that the terms are all bounded above by
2. And, indeed, since we know that 1/2n-->0 we even know
that 2 is the limit of this sequence.
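The closed form for the partial sums can be checked directly (a Python sketch, my addition):

```python
# Check the closed form x_n = 2 - 1/2^(n-1) for the partial sums of the
# geometric series 1 + 1/2 + 1/4 + ... + 1/2^(n-1).
def x(n):
    return sum(1.0 / 2 ** (j - 1) for j in range(1, n + 1))

for n in (1, 5, 20):
    print(n, x(n), 2 - 1.0 / 2 ** (n - 1))
# The two columns agree, and both approach the limit 2 as n grows.
```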
The Cauchy criterion can be applied to this sequence. That is, given
epsilon>0, there is M(epsilon) in N so that for n and m at least M(epsilon),
|xn-xm|<epsilon. Suppose m>n. We will
actually be able to "compute" M(epsilon) in this case. Here we know that
xn=sumj=1n(1/2j-1) and
xm=sumj=1m(1/2j-1) so
that
xm-xn=sumj=n+1m(1/2j-1).
We can multiply by 1/2 and subtract, just as we did above. The result
gets us a nice formula:
xm-xn=(1/2n)(1-(1/2m-n))
(remember that m>n). Therefore we can simplify (get rid of one
variable on the right-hand side, and give up strict equality):
|xm-xn|<=(1/2n). Since
(xn) converges, it must be a Cauchy sequence. What is
M(.001)? Apparently we need (1/2n)<(1/1,000), which will
occur when n=10. So M(.001) is 10 (or, actually, any integer larger
than 10).
Now let's go back to the original weird sequence. Let me call
bn the nth term, which is 1/nn with a
weird sign. And let me call yn, the nth partial
sum of the bn's. What can we say about the sequence
(yn)? It is certainly not monotone, because of the +
and - signs distributed among the bn's. But can we compare
the two sequences? Since |bn|=1/nn, and we know
that 1/nn<=1/2n-1 (we verified this for n=1
and n=2, and I bet it is true for all n>=2), we know that
|ym-yn|=|sumj=n+1m{weird sign}1/jj|,
and the triangle inequality breaks this up to
<=sumj=n+1m1/jj<=
sumj=n+1m1/2j-1=|xm-xn|.
But now if n and m are at least M(epsilon) for the (xn)
sequence, we know |xm-xn|<epsilon. This implies
that |ym-yn|<epsilon also for those n and
m's. Therefore, (yn) is also a Cauchy sequence. And further
therefore (!), we now know that (yn) must converge. Even
more, we know that y11, which we could compute, would be
within .001 of the sum of the whole series.
So this strategy allows us to conclude that a certain infinite series
converges, and to get a good approximation of the sum of the
series.
Now let me write down some of the theory we have actually in effect
proved:
Theorem Suppose (xn) and (yn) are
sequences. If we know that there is some positive constant C so
|ym-yn|<=C|xm-xn| and
if we know that (xn) converges, then (yn)
converges.
"Proof": We essentially verified this for C=1 above. Since
(xn) converges, it satisfies the Cauchy criterion, and then
if we use M(epsilon/C) for (xn) we'll get the Cauchy criterion
for (yn).
Theorem If an infinite series converges absolutely, then
it converges.
"Proof": This statement should be familiar from second semester
calculus. Here the yn's will be the sum of the absolute
values of the series, and C=1 in the previous result.
I used some Maple instructions to compute the sum of the weird series
I started out with. Here are the Maple instructions, and the result.
t:=(a,b)->evalf(sum(1/j^j,j=a..b)); Defines a function
to add up a chunk of 1/jj from a to b.
t(1,1)-t(2,3)+t(4,6)-t(7,10)+t(11,15)-t(16,21)+t(22,28);
0.7172093698
And since Maple told me that 2-28 is less than
10-8, this result is accurate to at least 8 decimal
places.
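The same computation can be reproduced outside Maple; here is a Python version of the same chunked sum (my translation of the Maple instructions above):

```python
# Python analogue of the Maple computation: t(a, b) adds up 1/j^j for j
# from a to b, and the chunks get signs +, -, +, -, ... with chunk
# lengths 1, 2, 3, 4, ... (the "weird" sign pattern of the series).
def t(a, b):
    return sum(1.0 / j**j for j in range(a, b + 1))

total = (t(1, 1) - t(2, 3) + t(4, 6) - t(7, 10)
         + t(11, 15) - t(16, 21) + t(22, 28))
print(total)   # 0.7172093698..., matching the Maple output above
```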
I wanted to do another example, but I sort of ran out of time. I
wanted to show an "iterative" way to get a root of a polynomial. This
is discussed in section 3.5, pages 84 and 85.
|
3/13/2003
| We began the lecture by contemplating a page of a final exam
I gave in 1996 to a second semester calculus class (o.k.: now I'll
admit it. It was a small class, an honors second semester calc course,
and the students were quite good). The worksheet had four true/false
statements. We analyzed those statements.
For a): the sequence (ak) defined by ak=(-1)k has the
property that (|ak|) converges, but (ak) does not. So the assertion in
a) is false.
For b): this is a more subtle question. If L is not zero, and
if (ak) and (bk) both converge to L, then ("eventually", which means,
there exists K in N so that ak is not 0 for k>=K) the sequence
(ak/bk) converges to 1. If L=0, then (using the examples 1/k and
1/k2) we can get sequences (ak/bk) which maybe diverge or
which converge to 0. We can even get a sequence (use 1/k and 37/k)
which converges to 37. So the statement as given is false.
For c): the statement is true. Here I tried to give a proof. Since
(ak) converges (say to L) we know that given epsilon>0, there is K(epsilon)
in N so that for k>=K(epsilon), |ak-L|<epsilon. Now we need to create
M(epsilon) so that for k>=M(epsilon), |(ak+1-ak)-0|<epsilon. Here
we can take M(epsilon)=K(epsilon/2) (in class we used K(epsilon/100), which also
works, but this answer is more traditional). Then if k>=M(epsilon), we
know that |ak+1-L|<epsilon/2 and |ak-L|<epsilon/2. Now use
the triangle inequality:
|(ak+1-ak)-0|=|(ak+1-L)+(L-ak)|<=|ak+1-L|+|L-ak|<epsilon/2+epsilon/2=epsilon.
So we are done: we have used the K(epsilon) "machine" to "build" an M(epsilon)
"machine".
For d): this is probably the most subtle question. We have in fact
addressed it before. If we consider the sequence
ak=sumj=1k1/j then the logic behind the Integral
Test of calculus allows us to underestimate ak by ln(k) or ln(k+1)
(see the diary entry for 3/5/2003). So in fact this sequence is
unbounded. Mr. Hedberg suggested a sequence from a homework problem:
ak=sqrt(k). The problem showed that
(ak+1-ak) converged to 0 when ak=sqrt(k). This example is
more in the spirit of Math 311: the integral test is way more
advanced than what we can use now.
I altered the example. I asked: suppose that you know that both
(ak+1-ak) and (ak+2-ak) converge to
0. Can you conclude that (ak) converges? And, indeed, the same example
(ak=sqrt(k)) shows that the answer is "No." Even more, we can add any
finite number of such requirements and still the answer will be
"No". So getting any criterion for convergence which doesn't seem to
depend either on already knowing a limit or on some special structure
(such as monotonicity) seems difficult, which makes the Cauchy
criterion yet more amazing.
I will call this CC for Cauchy criterion:
A sequence (xn) has the following property: given epsilon>0, there is
W(epsilon) in N so that if both n and m are >=W(epsilon), then
|xn-xm|<epsilon.
We will further contrast it with the following, which I will
temporarily label V (it is the definition of
convergence):
A sequence (xn) has the following property: there is a number L so
that for any epsilon>0, there is K(epsilon) in N so that if
n>=K(epsilon), then |xn-L|<epsilon.
The purpose of the remainder of this class is to verify that CC
and V are equivalent. CC is used a great deal in
practice and in theory, because it does not need a specific L to be
designated, but guarantees convergence. I'll try to show in classes
after vacation how CC is used. First, as always in Math 311,
the proofs:
Theorem V implies CC.
Proof: If we have K(epsilon), I need to show you how to "construct"
W(epsilon). And this implication is fairly easy: W(epsilon)=K(epsilon/2). Why does
everything work? Well, if both n and m are at least the specified
W(epsilon), then: |xn-xm|=|xn+0-xm|=|xn-L+L-xm|<=|xn-L|+|L-xm| using
the triangle inequality. But since n and m are both at least K(epsilon/2)
each of the terms |xn-L| and |L-xm| are less than epsilon/2, and the sum
is certainly less than epsilon.
That wasn't hard. Now for the other way. We will sneak up (?) on the
result.
Proposition If (xn) has CC, then the sequence is
bounded.
Proof: Well, take some explicit epsilon, say, epsilon=40. Then W(40) is some
fixed positive integer, and every xn with n>=W(40) satisfies
|xn-xW(40)|<40. But we can "unroll" this inequality so
that xW(40)-40<xn<xW(40)+40. Therefore
xW(40)+40 is an upper bound for the infinite tail of the
sequence which begins with the W(40)th term. And I can "take
care" of the earlier ones, only a finite number (!!) with a simple (?)
"max". So, here it is: I claim that the max of this finite set:
{xW(40)+40,x1,x2,x3,...,xW(40)-1}
is an upper bound of all of the elements of the sequence. And a good
lower bound is the minimum of the set
{xW(40)-40,x1,x2,x3,...,xW(40)-1}.
I think we are done.
Proposition If (xn) has CC, then (xn) has some
convergent subsequence.
Proof: Since we know the sequence must be bounded by the previous
result, then the Bolzano-Weierstrass Theorem applies. There must be a
convergent subsequence.
Proposition If (xn) has CC and if (xn) has some
convergent subsequence, then (xn) converges.
Comment And once we prove this, we will be done with "If CC
then V."
Proof: What do we know? We know two things:
- (xn) has CC: so given epsilon>0, there is W(epsilon) in N
so that if n and m are at least W(epsilon), |xn-xm|<epsilon.
- (xn) has a convergent subsequence: let's suppose its limit is L. So for each k in N, there is
an nk in N with nk+1>nk for all
k (so always nk>=k) so that if epsilon>0 is given, then
there is M(epsilon) in N so that for k>=M(epsilon),
|xnk-L|<epsilon.
From all this we need to verify:
If epsilon>0 is given, there is K(epsilon) in N so that if n>K(epsilon),
then |xn-L|<epsilon.
We discussed this at length in class. I tried to show during this
discussion that these proofs (at least when I do them!) don't come out
perfectly and immediately. I don't think I can reproduce what I
consider the instructive dynamics of this discussion. Instead, all I
can do right now is freeze us into a possible answer for creating
K(epsilon). A shorthand helps: let P=max(W(epsilon/2),M(epsilon/2)), and take
K(epsilon)=nP, the Pth index of the subsequence. Now we need to verify this
claim:
Take n>=K(epsilon) as defined above. Look at
|xn-L|=|xn-xnP+xnP-L|<=|xn-xnP|+|xnP-L|.
Now look at each piece.
- |xn-xnP| Since (xn)
is a Cauchy sequence, and n>=nP>=P>=W(epsilon/2) (we just used the
increasing nature of the subsequence numbering, which gives
nk>=k!), both n and nP are at least W(epsilon/2). But two elements of
the sequence whose indices are both "W(epsilon/2)" far along must differ by less than
epsilon/2: that is exactly how the W-machine implements the Cauchy
criterion.
- |xnP-L| The sequence
element xnP is a member of the
subsequence, exactly because it is
xnsomething! And look, it is at least as "far
along" the subsequence as M(epsilon/2), since P>=M(epsilon/2). But M measures how close the
subsequential elements are to the limit, and for subsequential
elements at least as far along as M(epsilon/2), the difference with L is
less than epsilon/2.
So each piece is less than epsilon/2, and the sum is less than epsilon. And we
are done.
I tried diligently and perhaps (almost certainly!) unsuccessfully to
"motivate" this elaborate strategy in class. It is certainly
complicated. We have now proved a major theorem.
Theorem A sequence converges if and only if the sequence
satisfies the Cauchy condition.
We will use this a lot.
|
3/12/2003
|
I began by advising students that I would "cover" sections 3.1-3.5 of
chapter 3 and requesting that students read these sections. I hope
that I will finish this material and that soon after vacation I will
begin chapter 4. From now on, many of the results (theorems, examples,
etc.) will have increasingly familiar statements. Most students should
recall versions of the results from calculus. But in Math 311 I will
be principally interested, almost solely interested, in the
proofs of these statements. For example, the "Ratio Test" in
the text:
Suppose (xn) is a sequence of positive numbers, and that
lim(xn+1/xn) exists and is a number L. If
L<1, then lim(xn) exists and is 0.
I won't prove this but it is a part of what students should know
(Theorem 3.2.11 of the text). One special case is interesting,
however, since the text's verification of the result uses several of
the concepts we've been looking at recently.
Example If 0<b<1, then the sequence defined by
xn=bn converges, and its limit is 0. This
result should be familiar from calculus. A short proof using
mathematical induction verifies that the sequence (xn) is
monotone decreasing. Since all of the elements of the sequence are
positive, the sequence is bounded below by 0. But then by our previous
work on monotone sequences, this sequence converges. Let's call the
limit of the sequence, x. What is x? Since all of the xn's
are non-negative, x must be also. But here is a "trick" to verify what
we suspect x to be. Let yn=b2n. Then
(yn) is a subsequence of (xn):
yn=x2n. Therefore by what we learned about
subsequences last time, (yn) also converges, and its limit
is x, the same limit as (xn). But
yn=(xn)2, so using our results on the
arithmetic of convergent sequences, we know that the limit of
(yn) is the square of the limit of (xn). Since
they both have x as a limit, we know that x2=x. But all of
the xn's are less than 1, and the sequence is
decreasing. Therefore of the two roots of x2=x (0 and 1) we
know the limit can't be 1. It must be 0.
Here is one of the most important results of the course.
Bolzano-Weierstrass Theorem Suppose (xn) is a
sequence all of whose terms are in [a,b]. Then (xn) has a
convergent subsequence. Proof: The proof I tried to give of this
result is a bit different from what is in the book. But I wanted to
give a proof based on bisection. The structure of the proof involves
creating a sequence of "things" using an inductive procedure.
Base case My initial interval is [a,b]. My initial set,
S0, is the natural numbers, N. Now I will try to
create a new, smaller interval, and a new (sub)set of N. Here
is how: Define S0L to be the collection of
integers n in S0 for which xn is in the left
half of [a,b] (that is, xn is in [a,(a+b)/2]). And define
S0R to be the collection of integers n in
S0 for which xn is in the right half of [a,b]
(that is, xn is in [(a+b)/2,b]). Notice that S0
is the union of S0L and
S0R. (There may be n's in both sets, if
xn is actually equal to (a+b)/2: this will not affect the
discussion.) Since the infinite set S0 is a union of two
sets, at least one of those sets must be infinite. (If they are both
finite, then their union would be a finite set, which would be a
contradiction.) It is also true that both of the sets might be
infinite. In any case, I choose S1 to be one of these two
sets, and my choice is made so that S1 is infinite. Also I
choose I1 to be the subinterval corresponding to the choice
of S1 (if S0L is S1, then
I1 will be the left half interval, and if
S0R is S1, then I1 will be
the right half interval). Now we have S1 and I1
with the length of I1 equal to (b-a)/21.
Inductive step Here we are given the following elaborate
ingredients: a subinterval In of [a,b] with the length of
In=(b-a)/2n, and an infinite subset
Sn of N so that if k is in Sn, then
xk is in In. (There will be lots of
indices to keep track of in this proof!) Now I divide In
into two equal halves, the left and right halves. I also create
SnL and SnR: these are two
subsets of Sn defined by: k is in SnL
if xk is in the left half of In and k is
in SnR if xk is in the right
half of In. Since Sn is the union of these two
subsets, at least one of them must be infinite. So Sn+1
will be one of the subsets (either the "left" or the "right" one)
which is infinite. And In+1 will be the associated half
interval. So I have done the inductive "step".
What have we created? We have several "things": - A sequence of
intervals, the nth of which is
In=[an,bn], so that the length of
In is (b-a)/2n, and so that In+1 is a
subset of In. This is exactly the hypotheses of the Nested
Interval Theorem. We are "guaranteed" to have exactly one x which is
in all of the In's.
- We have a sequence of Sn's. Each of these is an
infinite subset of N. Each one is "smaller" than the one before:
well, exactly what I mean is that for each n in N, Sn+1
is a subset of Sn.
Now I will "construct" a convergent subsequence of (xn). Since each
Sn is a nonempty subset of N, each has a least element
("well-ordering"). I'd like the nth element of the
subsequence to be one which is in Sn (so the least element
would work) and one which is greater than the (n-1)st
element. I can do both of these (especially the latter one) because I
know that Sn is infinite, and infinite subsets of N obey
a version of the Archimedean property: there's always an element of
such a subset which is larger than any given real number. Notice that
this nth element of the subsequence must be in
In since its index is in Sn. Thus, since x is
also in In, we know that the distance between x and this
element is less than or equal to the length of In, which is
(b-a)/2n. Whew! I think I have done enough: created a
subsequence, shown that its distance to x-->0, and so the subsequence
converges. I think I proved the theorem.
Discussion I need to convince you of the claim that this
is an important theorem. But I also mention that lots of people
don't "like" it. They don't like it because it is non-constructive, it
is "ineffective": by that I mean that no mechanism is shown to create
an explicit subsequence which converges. Somehow one "knows" that
there are infinitely many points in one half of each interval,
etc. etc. I don't think human beings are too comfortable contemplating
infinity so directly. Humans tend to like things they can verify a
step at a time, and jumping to "Hey, half the interval has infinitely
many points" is quite a big jump. It is amusing, though, to look at
what follows: we will soon be able to "construct" sequences that
surely converge, and this will be, at least theoretically, a
consequence of the Bolzano-Weierstrass Theorem. The theorem applies to
any sequence which is bounded: no other condition is needed. Here is
the theorem again, stated simply: A bounded sequence has a
convergent subsequence.
Please note that K. Weierstrass is a professional ancestor of the
instructor of this course. The instructor's professional ancestors can be inspected.
In the fashion of Math 311, yet another definition:
Definition A sequence (xn) is a Cauchy sequence if
for all epsilon>0, there is K(epsilon) in N so that for n
and m >=K(epsilon), |xm-xn|<epsilon.
There are lots of quantifiers in this definition, which is somewhat
reminiscent of the definition of convergence. What does the definition
"say"? Well, let's try to particularize it. Suppose epsilon=46. And just
suppose that K(46)=400,000. Then, certainly, if n>=400,000,
|x400,000-xn|<46. We can "unroll" this inequality. It
means all the elements of the
sequence after the first 399,999 must be in the interval
(x400,000-46,x400,000+46). So in fact the
sequence must be bounded, because the "infinite tail" of the sequence
is caught in this interval and there are only finitely many (hey:
399,999 is just a finite number!) outside the interval. So we have
almost proved that
1 A Cauchy sequence is bounded.
Problem 11 of the first exam more or less proves the following:
2 If a subsequence of a Cauchy sequence converges, then the
whole sequence converges.
That's because problem 11 says that the convergent sequence sort of
"drags" along the other sequence, since they have to stay close. And
the elements of a Cauchy sequence sort of have to stay close to each
other. Therefore we almost have:
3 Any Cauchy sequence converges.
This is because any Cauchy sequence is bounded, so Bolzano-Weierstrass
applies, so there's a convergent subsequence, so the sequence
converges. So ... and it isn't too hard to verify that convergent
sequences are Cauchy. Sigh. What we have is a necessary and sufficient
description of a convergent sequence without mentioning the
limit of the sequence. And since most of the sequences I know sort of
naturally "occur" without specifying limits, this description is very
useful. It is an "internal" rather than "external" description of
convergence. All this, maybe, will be made clear tomorrow.
|
3/10/2003
|
Since I left my notes at home, I had to "wing it" a bit, and
spontaneous talk about the material. I went over 3.3, monotone
sequences, more abstractly. In particular, I proved the following
result:
Theorem Suppose (xn) is a monotone increasing sequence. Then:
(xn) converges if and only if the set {xn : n in N} is bounded
above. If (xn) converges, then its limit is the sup of the set {xn : n
in N}.
Comment: A similar result is true for decreasing sequences, with the
word "above" replaced by "below" and "sup" replaced by "inf".
Proof: If (xn) converges, then we saw already that it must be
bounded. Let me try the converse. Suppose (xn) is bounded. We must
know here that (xn) is monotone for the following to work, though!
If {xn : n in N} is bounded
above, then we can apply the Completeness Axiom (the set is non-empty
and is bounded above). Let T be the sup of the set. Now given
epsilon>0, there must be an element of the set between T and
T-epsilon. But elements of the set are sequence elements. That is,
there is xt so that T-epsilon<xt<=T. Now
consider n>=t. We know that (xn) is increasing, so if n>=t, then
xt<=xn. But T is the sup of all the xn's, so
xn<=T. Thus T-epsilon<xn<=T. Therefore for
n>=t,
|xn-T|<epsilon. We have just verified the definition of convergence
(!) with K(epsilon)=t. Very neat, somewhat tricky.
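The mechanics of this proof can be watched numerically. A sketch (my illustration, not the lecture's): for xn = 1 - 1/n, which increases to its sup T = 1, each epsilon yields a witness index t playing the role of K(epsilon), after which every term is trapped in (T-epsilon, T].

```python
# Sketch: the sup of an increasing bounded sequence is its limit.
# For x_n = 1 - 1/n the sup is T = 1; once one term enters
# (T - epsilon, T], monotonicity traps all later terms there.

def x(n):
    return 1.0 - 1.0 / n

T = 1.0
for eps in (0.5, 0.1, 0.01):
    # witness index t with T - eps < x_t: this is K(epsilon) in the proof
    t = next(n for n in range(1, 10 ** 6) if x(n) > T - eps)
    # every later term stays within epsilon of T
    assert all(T - eps < x(n) <= T for n in range(t, t + 1000))
```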
As I remarked last time, this class of sequences is interesting
because of the large number of applications involving them. It is
quite easy to create examples. Here, let me do it "spontaneously"
(something like this arose in conversations with Mr. Oleynick after
class): I know from "elementary" considerations that the following
functions are increasing (at least for positive numbers): multiplying
by a positive constant, cubing, adding a constant, and taking fourth
roots. So if I compose them I will still get an increasing function. Consider
the sequence (xn) defined by x1=1 and
xn+1=(5xn3+7)1/4. Then x2 is
approximately 1.86. By an inductive argument, (xn) is increasing. Also
(xn) seems to be bounded: if xn < 100, then
(5(100)3+7)1/4 is less than 100 (easy
estimates, since (100)4 is 100,000,000). So I can now
conclude that (xn) converges! Quite simple. Of course, this doesn't
tell me what the limit is, but at least I can try to look for it
now.
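This "spontaneous" example is easy to check on a machine. A sketch (the iteration count and tolerances are my choices, not part of the argument):

```python
# Sketch: x_1 = 1, x_{n+1} = (5 x_n^3 + 7)^(1/4).  We check numerically
# that the terms increase, stay below 100, and settle near a limit L
# satisfying the fixed-point equation L^4 = 5 L^3 + 7.

xs = [1.0]
for _ in range(200):
    xs.append((5 * xs[-1] ** 3 + 7) ** 0.25)

assert abs(xs[1] - 1.86) < 0.01                       # x_2 is about 1.86
assert all(a < b for a, b in zip(xs[:40], xs[1:41]))  # increasing
assert max(xs) < 100                                  # bounded above by 100
L = xs[-1]
assert abs(L ** 4 - (5 * L ** 3 + 7)) < 1e-9          # limit is a fixed point
```

Monotone and bounded, so the theorem above guarantees convergence; the code only locates the limit it promises.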
I wrote in the background to the course (and said during various lectures)
that the single most important definition was that of limit of a
sequence. I can sort of try to understand sequences: they are a
countable object, sort of a list, and I can try to use simple-minded
(?) techniques like mathematical induction on sequences. Yet I am
trying to investigate the real numbers. One of the horrors and the
pleasures of the real numbers is that they are uncountable. How can I
use the tool of sequences, basically what seems to be a countable
tool, to investigate a huge, uncountable set? The method is to use
subsequences.
I will take for today a very formal view of sequences and
subsequences. Partly this is an effort to keep students alert and away
from the familiar standard notation (a subsequence is denoted by a
subscript on a subscript). But also it is a way to keep us honest and
rely only upon proving things. So here I go. First, let's
repeat what a sequence is:
A sequence is a function f from N to R.
A function h:N --> N is strictly increasing if for all
a and b in N, a<b implies h(a)<h(b).
A function g:N --> R is a subsequence of a function
f:N --> R if there is some strictly increasing function
h:N --> N so that g=foh (the notation o stands for the "little
circle" indicating composition).
I have written this in a very formal manner, to be darn near
incomprehensible: incomprehensible but honest. I gave a rather silly
example: f(n)=1/n, h(n)=n2, so that the subsequence g(n)
was 1/n2. In this example, we have a subsequence of a
convergent sequence which also converges, and the limits agree. This
is no accident:
Theorem If a sequence converges, then every subsequence
converges and has the same limit as the original sequence.
Proof: We first prove a result about strictly increasing h's:
Proposition If h:N-->N is strictly increasing, then
h(n)>=n for all n in N.
Proof: We prove this by Mathematical induction.
The statement is true for n=1, since h(n) is in N, and the lowest
element in N is 1, so h(1)>=1.
Now assume h(n)>=n. By the "strict increasing" definition,
h(n+1)>h(n), and h(n)+1 is the lowest integer greater than h(n).
Therefore h(n+1)>=h(n)+1. But if h(n)>=n, then h(n)+1>=n+1, so
that h(n+1)>=n+1. We have completed the inductive step.
Back to the proof of the theorem: suppose the sequence f converges to
x. This means: given epsilon>0, there is K(epsilon) in N so that if
n>=K(epsilon), then |f(n)-x|<epsilon. But h(K(epsilon))>=K(epsilon), so if m
is in N and m>=K(epsilon), then h(m)>=h(K(epsilon))>=K(epsilon), so that
|f(h(m))-x|<epsilon, which is exactly the definition of "g=foh
converges to x". And we're done.
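The formal definitions translate directly into code. A sketch, using the "silly" example from above (f(n)=1/n, h(n)=n2, so g(n)=1/n2):

```python
# Sketch of the formal definition: a subsequence of f is f composed
# with a strictly increasing h: N -> N.

def f(n): return 1.0 / n      # the sequence, as a function N -> R
def h(n): return n * n        # strictly increasing function N -> N
def g(n): return f(h(n))      # the subsequence g = f o h

assert all(h(n) < h(n + 1) for n in range(1, 100))  # h strictly increasing
assert all(h(n) >= n for n in range(1, 100))        # the Proposition: h(n) >= n
assert g(10) == 1.0 / 100                           # g(n) = 1/n^2
# since h(n) >= n, the subsequence is at least as close to the limit 0:
assert abs(g(1000)) <= abs(f(1000))
```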
Notation If g is a subsequence of f, then the traditional
notation goes something like this: f(n) corresponds to xn, and g(k)
corresponds to xnk. The text uses the more
traditional notation, so that's what I will generally do in the work
that follows.
Example We saw in the last lecture that the sequence
f(n)=(-1)n does not converge. The way we showed it in the
last lecture seemed rather elaborate and difficult to motivate.
Here's another method. If f(n)=(-1)n converges, then every
subsequence must converge, and the limit of every subsequence must be
the same. So we looked at the following strictly increasing functions
from N to N: h1(n)=2n (just "hits" the even integers)
and h2(n)=2n-1 (gets the odd ones). Then
g1=foh1 and
g2=foh2 are two subsequences of f. Not very
elaborate computation shows that g1 is the constant
sequence 1 (-1 raised to an even integer power is 1) and g2
is the constant sequence -1 (-1 raised to an odd integer power is
-1). So these two subsequences converge, and since they do not
converge to the same number (-1 is not equal to 1) the original
sequence f cannot converge.
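This argument is short enough to compute through. A sketch:

```python
# Sketch of the divergence argument for f(n) = (-1)^n: the even-index
# and odd-index subsequences are constant at 1 and -1; two subsequences
# with different limits mean f itself cannot converge.

def f(n): return (-1) ** n
def h1(n): return 2 * n        # "hits" the even integers
def h2(n): return 2 * n - 1    # gets the odd ones

g1 = [f(h1(n)) for n in range(1, 50)]
g2 = [f(h2(n)) for n in range(1, 50)]
assert set(g1) == {1}      # constant sequence 1
assert set(g2) == {-1}     # constant sequence -1
```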
Then I asked if it is possible to have a sequence with subsequences
converging to three numbers. We decided that could happen (the
sequence would alternate among 1 and 0 and -1). We could even have a
sequence with subsequences converging to 17 different numbers. A more
complicated question is the following:
Question Is there a sequence whose subsequences converge to
infinitely many distinct numbers?
There are many solutions to this problem. The answer, however, is
"Yes". One solution that I suggested in class is the following: let
xn be 0 for every odd n. Now we know that at least
one subsequence must converge to 0. There are still infinitely many
integers left. Take every other even number (2,6,10,14, etc.) and let
the sequence elements which correspond to these integers be 1. Now we
know there's at least one subsequence which converges to
1. There are still infinitely many integers left. Sigh. Take every
other one of these (I guess this would be n's corresponding to
4,12,20, etc.) and let the xn's here be 2. Etc. So we get a rather
weird sequence (we could probably write a formula but who needs it?)
with a subsequence which converges to every non-negative integer. This
is strange. But how about an even stranger question?
Question Is there a sequence which, for each x in [0,1], has a
subsequence which converges to x?
The answer here is "Yes", also. Most verifications of this answer seem a
bit strange. Let me offer one of them, with a secret motive in
mind.
I will outline a procedure for creating the sequence. Then I will
outline a procedure for creating a subsequence which converges to any
x specified in [0,1].
Creating the sequence
- The first element of the sequence, x1, should be
any number in [0,1].
- The next elements: divide [0,1] into two equal halves. Let
x2 be any element in the left half, and let x3
be any element in the right half.
- The next elements: divide [0,1] into four equal parts. Let
x4 through x7 be points in each of these
quarters in order (left to right).
- The next elements: divide [0,1] into eight equal parts. Let
x8 through x15 be points in each of these
eighths, in order (left to right).
- And so on.
Again, I think it is possible to write formulas for such a sequence,
but I want to emphasize the qualitative aspect: we generate the
sequence "elements" in powers of 2, so that there is one in each
1/2K length of [0,1].
Creating the subsequence
Let's choose a random element, RANDOM, of [0,1]. I will get a subsequence
of the sequence "defined" above which will be guaranteed to converge
to RANDOM.
- The first element of the subsequence will always be
x1 as defined above. Note that since RANDOM and x1
are both in [0,1], |RANDOM-x1|<=1.
- Since the union of [0,1/2] and [1/2,1] is all of [0,1], RANDOM must
be in at least one of the two subintervals. Choose one. We choose the
next element of the subsequence to be the element of the sequence
which is in the same half-length interval as RANDOM sits in. Then that
element has distance at most 1/2 from RANDOM.
- Since the union of [0,1/4], [1/4,1/2], [1/2,3/4] and [3/4,1] is
all of [0,1], RANDOM must be in at least one of these
subintervals. Choose for the next subsequential element the
xn which is in that subinterval. Then the distance from RANDOM
to that element is at most 1/4, since they are both in an interval of
length 1/4.
- Etc.
I hope I am not evading a good-enough specification of the
subsequence. I certainly don't want to write a complete specification,
since I would probably have to write lots and lots of details and I am
not sure that the details would help the understanding. The
subsequence (let me use the function-g notation now) is recursively
chosen so that |RANDOM-g(n)|<=1/2n-1 (that's the length of
the subintervals). Since the sequence
1/2n-1 --> 0, we know (Squeeze Theorem?) that the sequence
defined by g must --> RANDOM. And g is a subsequence of the original
sequence.
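Here is one way to render the construction in code (a sketch; choosing the midpoint of each dyadic subinterval is my choice, since any point of each subinterval works). The indexing matches the text: x1, then x2 and x3 for the halves, x4 through x7 for the quarters, and so on.

```python
# Sketch of the bisection construction: for 2^k <= j < 2^(k+1), element
# x_j is a point (here, the midpoint) of the j-th dyadic subinterval of
# [0,1] at level k.  For any target we then pick one index per level
# whose subinterval contains the target, giving a convergent subsequence.

def element(j):
    """x_j: midpoint of the dyadic subinterval that index j labels."""
    k = j.bit_length() - 1          # level k: 2^k <= j < 2^(k+1)
    i = j - 2 ** k                  # which of the 2^k subintervals
    return (i + 0.5) / 2 ** k

def subsequence_indices(target, levels):
    """One index per level whose subinterval contains the target."""
    idx = []
    for k in range(levels):
        i = min(int(target * 2 ** k), 2 ** k - 1)  # subinterval holding target
        idx.append(2 ** k + i)
    return idx

target = 0.637                      # standing in for RANDOM
idx = subsequence_indices(target, 30)
assert all(a < b for a, b in zip(idx, idx[1:]))      # strictly increasing h
for k, j in enumerate(idx):
    # both points lie in an interval of length 1/2^k
    assert abs(element(j) - target) <= 1.0 / 2 ** k
```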
Of course there are other things one can say about this
procedure. First, it actually depends on the binary expansion of RANDOM: if there is a 0 or 1 in the nth
place, then we choose the left- or right-hand half of the next
subinterval and look for the element of the original subsequence which
is in that subinterval. Also, creation of this sequence uses the idea
of bisection, which is a famous technique both in theoretical
mathematics (I'll use it next time to prove the Bolzano-Weierstrass
Theorem) and in numerical analysis (the bisection method finds roots
of lots of equations.)
Question Is there a sequence which, for each x in R, has a
subsequence which converges to x?
I ended with this question, whose answer is also, "Yes". I remarked
that this could be verified with a construction like the previous
one. I should also say that Mr. Hedberg had a suggestion which applies
here: since we verified that the rational numbers are countable, we
can create a sequence which has each rational as an element of the
sequence. Then by using the density of the rationals we can get
subsequences which converge to any desired number. The details are not
completely easy, but this is an argument which can be made.
I returned the first exam, with an answer sheet and with comments about
the grading.
|
3/5/2003
|
We began the material of section 3.3 today. I decided to look at a
specific example and try to understand it well. I decided on the
following "process" in a fairly random way. Here is what I did:
I started with 1, then I multiplied by 8, added 13 to the result, and
took the square root of that. The result was 4.582575695.
I took this, multiplied it by 8, added 13, and took
the square root. The result was 7.0470281.
I took this, multiplied it by 8, added 13, and took
the square root. The result was 8.3292392.
I took this, multiplied it by 8, added 13, and took
the square root. The result was 8.9237836.
I took this, multiplied it by 8, added 13, and took
the square root. The result was 9.1864176.
I took this, multiplied it by 8, added 13, and took
the square root. The result was 9.3000721.
I took this, multiplied it by 8, added 13, and took
the square root. The result was 9.3488276.
Etc. What is going on? People with experience in numerical analysis
may recognize this. Let me try to explain (and, hopefully, interest!)
other people.
First, we are looking at a sequence defined by the following recursive
rules:
An initial condition: x1=1
Recursive definition: xn+1=sqrt(8xn+13)
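The computation above is easy to reproduce (a sketch; the recursion is the one just stated, xn+1 = sqrt(8xn+13)):

```python
# Sketch reproducing the class computation: x_1 = 1, x_{n+1} = sqrt(8 x_n + 13).

import math

x = 1.0
values = []
for _ in range(7):
    x = math.sqrt(8 * x + 13)
    values.append(x)

# the decimals computed step by step in class
expected = [4.582575695, 7.0470281, 8.3292392, 8.9237836,
            9.1864176, 9.3000721, 9.3488276]
for v, e in zip(values, expected):
    assert abs(v - e) < 1e-6
```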
What can happen to this sequence? Here is a simple sequence of
"lemmas" analyzing what goes on:
Lemma 1: If x<=y, then 8x<=8y.
Lemma 2: If x<=y, then x+13<=y+13.
Lemma 3: If 0<=x<=y, then sqrt(x)<=sqrt(y).
Our sequence is obtained by a recursive rule which "concatenates"
these three "operations". Note that I needed to be a tiny bit careful
in Lemma 3, since our square root is only defined for non-negative
numbers. I also note, since it is useful for understanding what's
going on, that if the <= ("less than or equal to") between x and y
are changed to just < ("less than") then this strict
inequality is "inherited" by the outputs of the three
operations. Now back to our sequence.
Theorem: The sequence (xn) defined by
An initial condition: x1=1
Recursive definition: xn+1=sqrt(8xn+13)
is increasing: xn<xn+1 for n in N.
Proof: We will prove this by using mathematical induction. We will let
P(n) be the proposition xn<xn+1.
Base case: x1=1 and x2=sqrt(8+13)=sqrt(21), and since
21>1, sqrt(21)>1, so we have verified P(1).
Inductive step: We assume P(n): xn<xn+1. Now use Lemmas 1, 2, and
3 in order to get sqrt(8xn+13)<sqrt(8xn+1+13). But this is
xn+1<xn+2, which is P(n+1).
We are done with the inductive proof.
So we look now for more information about the sequence (xn). In fact,
we wonder if (xn) converges. Certainly an increasing sequence alone
need not converge. A simple example is xn=n, which, by the Archimedean
Property, cannot converge. Mr. Goode observed that the sequence whose
initial terms we computed seems to have smaller and smaller "jumps"
between successive terms. Maybe this means it converges. I remarked
that there is a sequence from calc 2 with interesting properties: xn=
the sum as j goes from 1 to n of 1/j. This is called the sequence of
harmonic numbers. It is an increasing sequence since it is adding up
more and more positive numbers as n increases. In this case, xn+1-xn is
1/(n+1), so certainly the sequence is taking ever smaller "steps" up
as n increases. However, we can use logic familiar from the Integral
Test, and compare xn with the left-hand Riemann sums of width 1 for
the integral from 1 to n+1 of 1/x. xn is larger than this integral,
and the integral is ln(n+1), which certainly is unbounded as n
grows. (I note that it is possible to get the result that the sequence
of harmonic numbers is unbounded without using integrals [I think the
logic is in the book] but this way is quicker.)
What happens to our sequence (xn)? Mr. Hedberg suggested that the set
S={xn : n in N} is bounded. He even suggested the bound of 45. We
checked this:
Proposition: xn<45.
Proof: We will use Mathematical Induction. Call Q(n) the proposition
that xn<45. Then we observe:
The base case: x1=1, and 1 is less than 45.
The inductive step: Suppose xn<45. Multiply by 8, add 13, take the
square root, to obtain: xn+1<sqrt(8·45+13)=sqrt(373), and
since 373<400, sqrt(373)<20<45, so Q(n+1) is also
true.
So (xn) is increasing and bounded. Does it converge, and, if it
converges, what is its limit? Well, it can't converge to 46, say,
since 45 is an upper bound of the xn's, and 46 is one more, so that
the inequality |xn-46|<1 will never be satisfied. Recall that a
bounded sequence need not converge (we considered ((-1)n)
last time, an example of a bounded sequence which did not
converge). But here we have additional "structure": the sequence
increases. In fact, the sequence converges to the sup of the set
S={xn : n in N}. Why should this set even have a sup? We know that S
is not empty (1 is in S!). And we also know that 45 is an upper bound
of S. Therefore by the Completeness Axiom, S must have a least upper
bound. Let me call that least upper bound, L.
Theorem: (xn) converges to L.
Proof: I know that xn<=L for all n (definition of upper bound). I
also know that given epsilon>0, there is a member of the sequence, xfred,
so that L-epsilon<xfred<=L (the choice of xfred depends on epsilon, of
course). But I claim: if n>=fred, then |xn-L|<epsilon. Why? The
inequality |xn-L|<epsilon can be "unrolled" to
L-epsilon<xn<L+epsilon. But if n>=fred, xn>=xfred. fred was selected,
though, so that L-epsilon<xfred, so L-epsilon<xn. Also xn<=L since L is
an upper bound of the xn's. Therefore if n>=fred,
L-epsilon<xn<=L, which implies |xn-L|<epsilon. So we have verified
the definition of convergence.
Readers should note that we needed the sequence to be increasing
(o.k.: we didn't exactly need xn<xn+1: our results would have been
true with the "weaker" statement xn<=xn+1) and we needed L to be the
least upper bound, the sup, of the set of all the xn's.
But what is L? A whole bunch of more-or-less unsatisfactory
answers can be given:
L is the limit of the sequence (xn).
L is the sup of the set S.
L is less than 45.
These are unsatisfactory because they don't really "tell us" what L
is, nor do they relate L to the recursive rule which defined the
sequence. We can in fact be much clearer.
Since xn+1=sqrt(8xn+13), (xn+1)2=8xn+13. We know that (xn)
converges and that the limit of (xn) is L. We also can deduce that the
sequence (xn+1) converges, and its limit is also L (remember, "only
tails matter."). The limit of the left-hand side is L2. The
limit of the right-hand side is 8L+13. So we know
L2-8L-13=0. The quadratic formula is valid (I think it was
an exercise earlier in the book) so that L must be
(8+/-sqrt(82+4(13)))/2=(8+/-sqrt(116))/2=[approx](8+/-10.77)/2.
So there seem to be two choices for L. One choice is negative. But,
golly, each of the xn's is positive, so from previous results on
order, we know that the limit can't be negative. Therefore the limit
must be [approx]9.385 which is certainly at least consistent with the
previously computed numbers.
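A quick check (my sketch) that the iteration really lands on the positive root of the quadratic:

```python
# Sketch: the limit L solves L^2 - 8L - 13 = 0, so L = (8 + sqrt(116))/2,
# and the iteration x_{n+1} = sqrt(8 x_n + 13) from x_1 = 1 approaches it.

import math

L = (8 + math.sqrt(116)) / 2           # approximately 9.385
assert abs(L - 9.385) < 0.001
assert abs(L * L - 8 * L - 13) < 1e-9  # L really solves the quadratic

x = 1.0
for _ in range(60):
    x = math.sqrt(8 * x + 13)
assert abs(x - L) < 1e-9               # the iterates converge to L
```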
What is going on here? Look at the picture. The line shown is just
y=x. The parabola is y=sqrt(8x+13), or even y2=8x+13. The
point A is just an initial x1. Go "up" to the parabola. The
point B is (x1,x2), because the "rule of
formation" of the sequence is the same as y=sqrt(8x+13). Now "bounce"
to the diagonal line, where C is (x2,x2). Now
up to D, which must be (x2,x3), etc. The
sequence is gotten by bouncing back and forth between the curves. The
(xn) of Math 311 is just the first coordinates of these
points, marching to the right on the real line. The intersection of
the curve and the straight line is the point (L,L), and the sequence
"clearly" (well, maybe clearly geometrically) converges to L. It is
interesting to note that this process is quite "robust" --
perturbations of the initial "guess" x1 don't affect its
convergence to L. If the initial guess is chosen "too large", so
x1 is greater than L, the geometry shows that the sequence
decreases, wiggling back and forth, down towards L. This is a very
stable, very neat method of approximating a root. We have an
"attractive fixed point".
The main object of section 3.3 is to discuss:
- Monotone sequences (xn).
- A sequence is increasing if xn<=xn+1 for all n in N.
- A sequence is decreasing if xn>=xn+1 for all n in N.
- Monotone sequences converge if and only if they are bounded.
- An increasing sequence is always bounded below, so such a
sequence is bounded when it is bounded above.
- A decreasing sequence is always bounded above, so such a
sequence is bounded when it is bounded below.
- A monotone sequence which is bounded converges to
- its sup if it is increasing.
- its inf if it is decreasing.
The proofs of these facts depend upon the characterization of inf and
sup with epsilon's, and on the "structure" ({in|de}creasing) of the
sequences involved. We may do more on this on Monday. There's an exam
tomorrow.
|