A Primer on The Central Limit Theorem



The Ohio State University, 1998


This primer, like all the others, will consider its subject, here two central limit theorems, at an intuitive level and will introduce the relevant terminology. It will start out with the “standard” central limit theorem (CLT).

Central Limit Theorem. Let $X_1, X_2, \dots$ be a sequence of independent, identically distributed random variables with finite mean $m$ and finite non-zero variance $s^2$, and let

$$S_n = X_1 + X_2 + \dots + X_n.$$

Then

$$\frac{S_n - nm}{s\,n^{1/2}} \to N(0,1) \quad \text{as } n \to \infty.$$

This theorem is often considered to be the most profound result in the theory of probability. Its implications reach much further than it seems on the surface. Let us consider its contents in detail.

First, the theorem deals with a sum of random variables. Obviously, a sum of random variables ($S_n$) is itself a random variable. As we know from elementary probability theory, adding constants to random variables, subtracting constants from them, and multiplying or dividing them by (non-zero) constants gives random variables again. Thus, it should not come as a surprise that the conclusion of the theorem is a statement about a random variable.

Second, there is a myriad of conditions that can be imposed on the variables $X_i$ and, therefore, a myriad of versions of the theorem. Here we consider the simplest version, the Lindeberg-Levy CLT. It is the simplest one because it imposes the most stringent requirements on the $X_i$'s. It requires that the variables be independent. This is not necessary: other versions of the theorem (for example, those for martingale difference sequences) allow for certain forms of dependence among the variables. The versions named after Lindeberg and Lyapunov, oftentimes called conditions (instead of versions), keep independence and relax identical distribution instead. Here we will not consider these cases.

Third, there is a requirement that the variables be identically distributed. This requirement is also far from necessary, but it makes proving the theorem a straightforward task. Roughly speaking, a more relaxed requirement is that the variables be distributed somewhat similarly, without too many conspicuous differences in their distributions.

Fourth, part of the requirement of identically distributed random variables is that all the variables have the same finite mean $m$. As mentioned above, it is not necessary that all the variables have the same mean $m$, but it is absolutely important that their means be finite.

Fifth, another crucial requirement is that the variances of the variables be finite. This requirement is maintained in any version of the theorem. Our theorem also requires that the variances be the same, though we may do away with this.
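
The role of the finite-mean and finite-variance requirements can be seen in a small simulation. The sketch below is only an illustration and is not part of the original argument: it assumes the standard Cauchy distribution (which has neither a finite mean nor a finite variance) as the offending case and a uniform distribution on (-1, 1) as the well-behaved one. The function names `cauchy`, `uniform`, and `iqr_of_sample_means` are hypothetical. It measures the spread of the sample mean by its interquartile range, which exists even when the variance does not.

```python
import math
import random
import statistics


def cauchy(rng):
    """One standard Cauchy draw (no finite mean or variance), by inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))


def uniform(rng):
    """One uniform(-1, 1) draw (finite mean 0 and finite variance 1/3)."""
    return rng.uniform(-1.0, 1.0)


def iqr_of_sample_means(draw, n, replications=2000, seed=0):
    """Interquartile range of the mean of n draws, across many replications."""
    rng = random.Random(seed)
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(replications)]
    q1, _median, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


if __name__ == "__main__":
    # Columns: n, spread of the uniform mean, spread of the Cauchy mean.
    for n in (1, 10, 100):
        print(n,
              round(iqr_of_sample_means(uniform, n), 3),
              round(iqr_of_sample_means(cauchy, n), 3))
```

For the uniform draws the spread of the mean should shrink roughly like the inverse square root of n, while for the Cauchy draws it should stay near 2 for every n, because the mean of n independent standard Cauchy variables is again standard Cauchy. No centering and scaling of the kind used in the theorem can produce a normal limit in that case.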

Sixth, we require that the variances be non-zero. If the variances are zero, the random variables degenerate to constants; obviously, a sum of constants is itself a constant and cannot be normalized to a non-degenerate random variable. However, other versions of the theorem have a somewhat less stringent requirement, namely that no more than a finite number of variables be degenerate, so that there is an infinite number of non-degenerate random variables in the summation.

Seventh, the limiting distribution is the standard normal distribution. What the numerator in the conclusion of the theorem, $S_n - nm$, does is to center the sum around zero. Any version of the theorem will have some form of centering around zero. The purpose of the denominator is to adjust the variance to unity. One should note that the denominator contains $n^{1/2}$, which is usually referred to as the rate of convergence. Another way of saying this is that $S_n - nm$ is approximately as big as $n^{1/2}$.
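
The effect of the centering and the scaling can be checked in a few lines. The computation below is a sketch that uses only the assumptions of the theorem (independence, common mean $m$, common variance $s^2$) and the standard rules for the mean and variance of a sum of independent variables.

```latex
\begin{aligned}
E[S_n] &= E[X_1] + \dots + E[X_n] = nm, \\
\operatorname{Var}(S_n) &= \operatorname{Var}(X_1) + \dots + \operatorname{Var}(X_n) = n s^2
  \quad \text{(by independence)}, \\
E\!\left[\frac{S_n - nm}{s\,n^{1/2}}\right] &= \frac{nm - nm}{s\,n^{1/2}} = 0, \\
\operatorname{Var}\!\left(\frac{S_n - nm}{s\,n^{1/2}}\right) &= \frac{n s^2}{s^2\, n} = 1.
\end{aligned}
```

So the normalized sum has mean zero and variance one for every n; the content of the theorem is that its whole distribution, not only these two moments, approaches that of the standard normal.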

Eighth, the arrow in the statement of the theorem means “converges to”. This is a term hedged about with technical difficulties, and it is probably the hardest topic in the theory of probability and stochastic processes. While for a sequence of points there is basically one standard definition and intuition for the concept of convergence, for random variables there are at least 8 standard ways to define, interpret, and understand convergence. The six most popular and widely used modes of convergence are (1) pointwise, (2) almost sure, (3) in the mean, (4) in probability (also known as “stochastic convergence”), (5) in distribution, and (6) weak convergence. The classical encyclopaedic treatment of convergence can be found in the monograph “Convergence of Probability Measures” written by Billingsley; an extensive treatment can be found in Shiryaev’s “Probability”; the basics are expounded in Grimmett & Stirzaker. For applied work, having even the weakest form of convergence is usually satisfactory. In its current formulation, the theorem states that the convergence is in distribution.
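
Convergence in distribution is a statement about cumulative distribution functions, and this can be made concrete numerically. The following sketch is an illustration under stated assumptions, not a definitive procedure: it takes exponential $X_i$ with mean and standard deviation both equal to one, simulates the normalized sum, and reports the largest gap between its empirical distribution function and the standard normal one. The names `kolmogorov_distance` and `standardized_exponential_sums` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def kolmogorov_distance(sample):
    """Largest gap between the empirical CDF of `sample` and the N(0,1) CDF."""
    phi = NormalDist().cdf
    xs = sorted(sample)
    k = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/k to (i+1)/k at x; check both sides.
        gap = max(gap, abs((i + 1) / k - phi(x)), abs(i / k - phi(x)))
    return gap


def standardized_exponential_sums(n, replications, rng):
    """Draws of (S_n - n*m)/(sqrt(n)*s) for exponential X_i with m = s = 1."""
    return [
        (sum(rng.expovariate(1.0) for _ in range(n)) - n) / math.sqrt(n)
        for _ in range(replications)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (1, 4, 16, 64):
        z = standardized_exponential_sums(n, 4000, rng)
        print(n, round(kolmogorov_distance(z), 3))
```

The printed gap should fall as n grows. With a finite number of replications it cannot fall all the way to zero, since the empirical distribution function carries simulation noise of its own.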

Finally, we come to the most celebrated part of the theorem. It states that no matter what the original distributions are (as long as the conditions above hold), the normalized sum is, in the limit, normally distributed. Thus, the original variables may be discrete, they may be highly skewed, they may be highly platykurtic or leptokurtic, and still the normalized sum (of the centered variables) will be approximately normally distributed for large n.
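
For a discrete and highly skewed case the claim can be checked exactly rather than by simulation. The sketch below assumes, purely for illustration, that each $X_i$ equals 1 with probability 0.1 and 0 otherwise, so that $S_n$ is binomial. It computes the exact distribution function of the normalized sum and its largest gap from the standard normal distribution function. The names `binomial_pmf` and `largest_cdf_gap` are hypothetical.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(S_n = k) for a sum of n independent Bernoulli(p) variables."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )
    return math.exp(log_pmf)


def largest_cdf_gap(n, p):
    """Largest gap between the exact CDF of (S_n - n*m)/(sqrt(n)*s) and N(0,1)."""
    m = p                        # mean of each X_i
    s = math.sqrt(p * (1 - p))   # standard deviation of each X_i
    phi = NormalDist().cdf
    gap, cdf = 0.0, 0.0
    for k in range(n + 1):
        z = (k - n * m) / (math.sqrt(n) * s)   # standardized value of S_n = k
        before = cdf                           # CDF just below the jump at z
        cdf += binomial_pmf(k, n, p)           # CDF at z
        gap = max(gap, abs(before - phi(z)), abs(cdf - phi(z)))
    return gap


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, round(largest_cdf_gap(n, p=0.1), 4))
```

The gap shrinks as n increases even though every single $X_i$ takes only two values and is far from symmetric.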

Now, let us turn to the Functional Central Limit Theorem. Our approach will be a heuristic one: we will first explain why it is called functional by introducing some terminology, then we will give a simple and intuitive application of the theorem, and finally, we will state the theorem and provide further clarifications. Our discussion will closely follow, in order of paragraphs, pp. 1-2 of Gikhman & Skorokhod’s “Introduction to the Theory of Random Processes”, pp. 487-490 of Grimmett & Stirzaker, and pp. 479-481 of Hamilton’s textbook.

[Following Gikhman & Skorokhod]. The course of a random process is described by a function $x(\theta)$, which may assume real, complex, or vector values, where $\theta$ assumes values in a reference set $\Theta$. The functions $x(\cdot)$ are called sample functions of the random process, realizations of the process, or trajectories of the process. For each $\theta$, the quantity $x(\theta)$ is random. In the theory of stochastic processes, $x(\theta)$ is not simply assumed to be random but to be a random variable, which enables the analyst to bring all the tools of probability theory to bear on stochastic processes. Then, as we have defined in our previous primers, a random process is a family of random variables $x(\theta)$ depending on a parameter $\theta$ that assumes values in some set $\Theta$.

If the set $\Theta$ is arbitrary (or purely abstract) then, instead of the term “random process”, it is more convenient to use the term “random function” and to reserve the name “random process” for those cases in which the parameter $\theta$ is interpreted as time. When the argument of a random function is a spatial variable, the function is usually called a random field. When the argument is both time and space, one may see the term random panel, derived from the well-known term panel data. Finally, when the time argument is discrete, we use the term random sequence rather than random process. Throughout the literature, authors use the term stochastic interchangeably with the term random; thus you may just as often see the terms stochastic sequence, stochastic process, stochastic function, stochastic field, stochastic panel, etc.

In the case of the Functional Central Limit Theorem (FCLT), the term Functional indicates that the theorem refers to, and is applied to, a random function. Given that we deal with the time dimension only, we may further say that the theorem deals with a random process. The CLT stated that a normalized sum of random variables converges to a (normally distributed) random variable, while the FCLT will state that the normalized process of partial sums converges to a random function/random process, namely, a standard Brownian Motion (Wiener Process).

[Following Grimmett & Stirzaker] Now, let us move to our heuristic example. Suppose we have a random walk process: every second the process jumps up or down by one unit. Economists are all familiar with this. Now, suppose that we split the time step into halves and shrink the jump accordingly: every half a second the process jumps up or down by $1/\sqrt{2}$ of a unit, so that the variance accumulated per second stays the same. (If the jump were halved together with the time step, the variance per second would shrink at every split and the limit would be a constant.) Then we split the time step and rescale the jump again, and again. At this stage in the analysis we let the inter-point distance (jump) and the inter-time distance approach zero, keeping the squared jump proportional to the time step; in doing so we hope that the discrete random walk approaches some limit, and indeed it does. The limiting process is, not surprisingly, Brownian Motion. Thus, by transforming a random walk into a continuous process we obtain a Brownian motion, and conversely, by discretizing a Brownian motion we obtain a random walk process.
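
How the jump has to shrink relative to the time step can be tried out numerically. The sketch below is an illustration under stated assumptions: the walk makes n jumps between time zero and time one, each up or down with probability one half, and two scalings are compared, a jump equal to the square root of the time step and a jump equal to the time step itself. The square-root scaling is the one under which the limit is non-degenerate. The function names are hypothetical.

```python
import math
import random
import statistics


def walk_at_time_one(n, jump, rng):
    """Position at time 1 of a walk making n jumps of +/- `jump`, one every 1/n."""
    return sum(jump if rng.random() < 0.5 else -jump for _ in range(n))


def variance_at_time_one(n, jump, replications=2000, seed=0):
    """Sample variance of the position at time 1 across many simulated walks."""
    rng = random.Random(seed)
    return statistics.pvariance(
        [walk_at_time_one(n, jump, rng) for _ in range(replications)]
    )


if __name__ == "__main__":
    # Columns: n, variance with jump = sqrt(1/n), variance with jump = 1/n.
    for n in (4, 32, 256):
        v_sqrt = variance_at_time_one(n, 1 / math.sqrt(n))
        v_prop = variance_at_time_one(n, 1 / n)
        print(n, round(v_sqrt, 3), round(v_prop, 4))
```

With the square-root scaling the variance at time one should stay close to one however fine the grid, which is what a limit with the variance of standard Brownian motion requires. With the jump proportional to the time step the variance equals 1/n and vanishes as the grid is refined.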

[Following Hamilton] Now let us formulate the theorem first, and explain its notation and meaning afterwards. The FCLT says that

$$\frac{T^{1/2}\,X_T(\cdot)}{s} \to W(\cdot).$$

The notation is as follows. $W(\cdot)$ denotes the process of standard Brownian motion. The arrow denotes convergence in distribution. $s$ is the standard deviation of the $u_i$'s defined below, i.e., the square root of their variance (dividing by it normalizes the variance to unity, so that we obtain standard Brownian motion). $T$ is the size of the sample, i.e., the number of observations. $X_T(\cdot)$ is a random function defined as follows: for $r \in [0,1]$,

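$$
X_T(r) \equiv
\begin{cases}
0 & \text{for } 0 \le r < 1/T \\
u_1/T & \text{for } 1/T \le r < 2/T \\
(u_1+u_2)/T & \text{for } 2/T \le r < 3/T \\
\quad\vdots & \\
(u_1+u_2+\dots+u_T)/T & \text{for } r = 1.
\end{cases}
$$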

Now let us see what $X_T(r)$ represents. First of all, the meaning of $r$ must be elucidated: $r$ represents time. In other words, time has been normalized to unity, so the whole process takes place from time zero to time one. It has also been implicitly assumed that the observations are equidistant in time. For example, $r = 0.15$ represents the first 15% of the observations, rounded down to the nearest integer. If we have 201 observations, this represents the first 30 observations. Next, the theorem assumes that the $u_i$'s are independent. This means that a CLT may be applied to them, and hence that $X_T(r)$, suitably scaled, is approximately a normally distributed random variable for any $r$ in the zero-one range when $T$ is large. As Hamilton strongly emphasizes, we must note that while $X_T(r)$ for any given $r$ is a random variable, $X_T(\cdot)$ is a random function/process. The reader should realize that while the process is defined in continuous time ($r$ takes values continuously in the 0-1 interval), its realizations are step functions (i.e., the trajectories have jumps). The next point is that the process is constructed as a partial sum process, that is, cumulatively. This is exactly what we did in the heuristic example outlined above. Finally, we leave the discussion of the convergence of one stochastic process to another to Hamilton’s p. 481 and to Billingsley’s Chapter 3, since it is well beyond the purposes of this primer.
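
The construction of $X_T(\cdot)$ from a sample can be written out as a short function. This is a minimal sketch, assuming the observations are given as a list and that $X_T(r)$ is the sum of the first floor(T r) observations divided by T, which is how the rounding in the example with 201 observations works. The name `partial_sum_function` is hypothetical.

```python
import math


def partial_sum_function(u):
    """Return X_T as a function of r in [0, 1], built from observations u_1..u_T."""
    T = len(u)
    # cumulative[k] = u_1 + ... + u_k, with cumulative[0] = 0
    cumulative = [0.0]
    for value in u:
        cumulative.append(cumulative[-1] + value)

    def X_T(r):
        if not 0.0 <= r <= 1.0:
            raise ValueError("r must lie in [0, 1]")
        k = math.floor(T * r)  # number of observations in the first fraction r
        return cumulative[k] / T

    return X_T


if __name__ == "__main__":
    T = 201
    X = partial_sum_function([1.0] * T)   # every u_i equal to 1, for illustration
    print("observations used at r = 0.15:", math.floor(T * 0.15))
    print("X_T(0.15):", X(0.15))
    print("X_T(1.00):", X(1.0))
```

Calling the returned function at a single r gives one number (for a given sample), whereas the returned function as a whole is one realization of the random function. One caveat of this sketch: a floating-point r that is meant to equal k/T exactly may round just below it, so passing `fractions.Fraction` values for r is safer when the behaviour exactly at the jump points matters.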

In summary, what the FCLT roughly says is that the process of cumulative sums of independently and identically distributed random variables, suitably scaled, converges in distribution to Brownian motion. Our heuristic example was a simple illustration of this.
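
One way to see the limit at work is to compare second moments. Standard Brownian motion has variance r at time r, and the covariance between its values at two times is the smaller of the two times; these are standard properties of the process that the primer does not state. The sketch below assumes, for illustration only, that the $u_i$ are uniform on (-1, 1), and checks whether the scaled partial-sum process has roughly those moments at r = 0.25 and r = 1. The function names are hypothetical.

```python
import math
import random
import statistics


def scaled_partial_sums(T, fractions, rng):
    """One draw of sqrt(T) * X_T(r) / s at each r in `fractions`."""
    s = math.sqrt(1.0 / 3.0)      # standard deviation of a uniform(-1, 1) u_i
    cumulative = [0.0]            # cumulative[k] = u_1 + ... + u_k
    for _ in range(T):
        cumulative.append(cumulative[-1] + rng.uniform(-1.0, 1.0))
    # sqrt(T) * (cumulative[k] / T) / s simplifies to cumulative[k] / (s * sqrt(T))
    return [cumulative[math.floor(T * r)] / (s * math.sqrt(T)) for r in fractions]


def second_moments(T=200, r1=0.25, r2=1.0, replications=3000, seed=0):
    """Sample variances at r1 and r2 and the sample covariance between them."""
    rng = random.Random(seed)
    draws = [scaled_partial_sums(T, (r1, r2), rng) for _ in range(replications)]
    a = [d[0] for d in draws]
    b = [d[1] for d in draws]
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)
    return statistics.pvariance(a), statistics.pvariance(b), cov


if __name__ == "__main__":
    var_r1, var_r2, cov = second_moments()
    print("variance at r = 0.25:", round(var_r1, 3))
    print("variance at r = 1.00:", round(var_r2, 3))
    print("covariance:", round(cov, 3))
```

Matching second moments is of course much weaker than convergence of the whole process, so this is a plausibility check and not a verification of the theorem.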

In conclusion, the $u_i$'s must satisfy more or less the same conditions as those imposed in the CLT. Obviously, there will be as many versions of the FCLT as there are of the CLT. At this moment, the most complete and authoritative treatment in the literature is the monograph “Stochastic Limit Theory: An Introduction for Econometricians”, written by James Davidson and published by Oxford University Press in the series “Advanced Texts in Econometrics” in 1994. The book contains 4 chapters on the CLT and another 5 on the FCLT.
