**A PRIMER ON WOLD’S DECOMPOSITION**

A SUPPLEMENT READING FOR

AGECON 993.07: TIME SERIES ANALYSIS WORKSHOP

The Ohio State University, 1998

KRASSIMIR PETROV

This primer considers Wold’s decomposition. As in the previous primer, mathematical proofs will not be given; the major purposes of the primer are to explain the technical terms and their usage, and to clarify the meaning and interpretation of the theorems.

Textbook versions of this theorem usually give it in an amalgamated, fairly complicated form that obscures a property of stationary stochastic processes which is important in its own right. It also seems useful, at least from an instructive point of view, to split the theorem into two parts so that its meaning is more transparent. The discussion here will proceed with the two parts of the theorem stated as auxiliary facts (lemmas), will then state the theorem itself, and will conclude with two important applications clothed in the form of corollaries.

LEMMA 1. *Every stationary (in the broad sense) process x can be uniquely decomposed into orthogonal regular and singular processes x^{r} and x^{s}.*

This lemma is the heart of Wold’s decomposition. First of all, the decomposition applies only to processes that are stationary in the broad sense, which are defined and explained in the previous primer.

Next, the lemma states that the stochastic process can be decomposed into two different processes. To “decompose”, as used for stochastic processes in general, means to represent the process as a sum of several components that together equal the original. In our case, one stochastic process is decomposed into two: x_{n} = x^{r}_{n} + x^{s}_{n}.

Now, let us see what singularity means. A process is singular if its future values can be determined solely from its past. This is to be interpreted as meaning that there is no uncertainty in the prediction: the variance of all future predicted values is zero. It is the “magic” number zero that led mathematicians to call the process singular; for example, mathematicians say that a matrix is singular if its determinant is zero; or that a function is singular if its first derivative is zero almost everywhere; or that a random variable is singular if its variance is zero, i.e., if it is (almost surely) a constant. Returning to singular processes, these are processes which can be perfectly forecasted; that is why they are oftentimes referred to as ‘purely’ or ‘completely’ or ‘entirely’ deterministic processes.
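To make this concrete, here is a minimal sketch (not from the primer; the frequency and phase are arbitrary illustrative choices) of a singular process. A pure sinusoid satisfies an exact two-term linear recursion, so its entire future follows from just two past values with zero forecast error:

```python
import math

# A sinusoid x_n = cos(omega*n + phi) satisfies the exact recursion
# x_n = 2*cos(omega)*x_{n-1} - x_{n-2}, so every future value is
# determined by the past alone: the forecast variance is zero.
omega, phi = 0.7, 0.3          # arbitrary illustrative values
x = [math.cos(omega * n + phi) for n in range(100)]

c = 2 * math.cos(omega)
forecasts = [c * x[n - 1] - x[n - 2] for n in range(2, 100)]
max_err = max(abs(f - actual) for f, actual in zip(forecasts, x[2:]))
print(max_err)  # numerically zero (up to floating-point rounding)
```

The forecast "errors" here are pure rounding noise, which is exactly what singularity means: the predicted values carry no uncertainty at all.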

Regular processes, on the other hand, are those that do not have any predictable components in themselves. *Sometimes* they are referred to as ‘purely’, ‘completely’ or ‘entirely’ non-deterministic (or even random) processes. An excellent illustration of such a process is the white noise process, which was defined and explained in the previous primer.

Thus, what the lemma is saying is that every stationary process can be represented as a simple sum of a purely deterministic and a purely random process. Moreover, this representation is unique.

Finally, the two processes can be shown to be orthogonal. Orthogonality is an important concept in mathematics. It refers to elements of spaces endowed with a scalar product: two elements are orthogonal if their scalar product is zero. Now, the elements of these spaces could be ordinary points (for example, points on the real line, in the plane, or in 3-D space) or they could be functions. Since random variables are a special type of function, called measurable functions, the orthogonality of random variables is just that: their scalar product is zero. However, if we impose in addition that the random variables have zero mean, then the scalar product, as understood in real analysis and linear algebra, becomes equivalent to the concept of covariance (as used in probability, statistics, and econometrics). Since a discrete random process is defined as a sequence of random variables, two processes are defined to be orthogonal if the scalar product/covariance of any (Cartesian) combination of their elements/random variables is zero: in our notation, COV(x^{r}_{m}, x^{s}_{n}) = 0 for *any* m and n.
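As a quick numerical illustration (a stdlib sketch, not from the primer), once two samples are demeaned, their per-observation scalar product and their sample covariance are the same number:

```python
import random

random.seed(0)
n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]

# demean so that the zero-mean condition from the text holds exactly
mx, my = sum(x) / n, sum(y) / n
x = [v - mx for v in x]
y = [v - my for v in y]

# scalar product <x, y>/n versus the covariance formula E[XY] - E[X]E[Y]
scalar = sum(a * b for a, b in zip(x, y)) / n
mean_x, mean_y = sum(x) / n, sum(y) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
print(abs(scalar - cov))  # essentially zero: the two notions coincide
```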

This completes the discussion of the first lemma. The second lemma gives us a complete characterization of all regular processes in terms of white noise:

LEMMA 2. *A random process x is regular if and only if there exists an innovation process e = {e_{n}} and a sequence {a_{n}}, Σ_{n} a_{n}^{2} < ∞, such that x_{n} = Σ_{k} a_{k}e_{n-k}, where the summation is over all non-negative integers k.*

To start with, the mathematical definition of an innovation process is hedged around with gory technical details, but it is sufficient to note that white noise processes represent almost all innovation processes, and that for the purposes of applied econometrics and engineering the two may well be regarded as equivalent.
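The one-sided moving average form of Lemma 2 can be sketched as follows (a truncated simulation; the geometric weights a_k = 0.5^k are a hypothetical choice that satisfies the square-summability condition Σ a_k² < ∞):

```python
import random

random.seed(1)
N, K = 500, 60                                  # K truncates the infinite sum
e = [random.gauss(0, 1) for _ in range(N + K)]  # white-noise innovations
a = [0.5 ** k for k in range(K)]                # square-summable weights

# feasible (one-sided) MA: x_n depends only on current and past innovations
x = [sum(a[k] * e[K + n - k] for k in range(K)) for n in range(N)]

sum_sq = sum(w * w for w in a)   # finite: approximately 4/3 here
print(len(x), sum_sq)
```

Note that the loop index never reaches forward in time: only e_{n}, e_{n-1}, … enter x_{n}, which is exactly what makes the representation feasible.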

What this lemma is saying is that for a stochastic process to be regular it is both necessary and sufficient that it be representable in a one-way (one-sided) moving average form. A moving average form is one-way if it includes only past values of the (white noise) process. It is two-way if it also includes future values of the white noise process. The most general form includes infinitely many values from both the future and the past. Such processes have been analysed extensively by engineers under the heading of filters and filter theory, and the engineering literature has established the term feasible process to refer to one-sided moving average processes and infeasible (or non-feasible) processes to refer to processes which include future values of the process as components. Using this terminology, one may restate the lemma as saying *that a process is regular if (and only if) it is feasible*.

The discussion of this lemma will conclude with the observation that economists, just like engineers, do not work *explicitly* with non-feasible processes. This does not mean that economists work only with feasible processes (that would practically doom any economic and econometric modelling) but rather that they prefer to extract the predictable part somehow and keep the unpredictable part separate. A simple example with seasonality should fix this idea. Everybody knows that around Christmas sales rise by a certain percentage. This is treated econometrically with a dummy variable. The function of the dummy variable is to extract the predictable component in the model and to leave the error as white noise. Of course, we could have done it differently: we could have found what the errors are around Christmas (they will presumably be exorbitantly positive, e.g. for sales, profits, etc.) and then added the *forecasted* future values of the *error* terms to our forecast. Well, the point is that economists usually do things the former way, not the latter.
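A minimal numerical sketch of the dummy-variable idea (the baseline sales level and the size of the December bump are invented purely for illustration):

```python
import random

random.seed(2)
months = [m % 12 for m in range(240)]    # 20 years of monthly observations
# sales = baseline + predictable December bump + white-noise error
sales = [100 + (25 if m == 11 else 0) + random.gauss(0, 1) for m in months]

# the dummy-variable coefficient is the December / non-December mean gap
dec = [s for s, m in zip(sales, months) if m == 11]
rest = [s for s, m in zip(sales, months) if m != 11]
base = sum(rest) / len(rest)
bump = sum(dec) / len(dec) - base

# removing the predictable component leaves a mean-zero, noise-like error
residuals = [s - base - (bump if m == 11 else 0) for s, m in zip(sales, months)]
print(bump)  # close to the true bump of 25
```

The estimated dummy coefficient recovers the predictable Christmas effect, and the residual series has mean zero, which is the "leave the error as white noise" outcome described above.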

At this point we are ready to combine the two lemmas. The first was saying that every stationary process is decomposable into a singular and a regular process, and the second was saying that regular processes are built only from past innovation values. Therefore,

**WOLD’S DECOMPOSITION THEOREM.** *Every stationary sequence can be represented as the sum of a perfectly predictable process and a feasible moving average process: x_{n} = x_{n}^{s} + Σ_{k=0}^{∞} a_{k}e_{n-k}, where e is an innovation process.*

The significance of this theorem is that working with ARMA processes is not an arbitrary choice. Since AR processes can be represented as feasible MA processes, we can say that ARMA processes represent the most general form of stationary processes. To put it differently, every stationary process can be represented as an ARMA process; or we can say that the ARMA processes are all the stationary processes.

Finding Wold’s representation requires, at least in principle, fitting an infinite number of parameters a_{k} to the data. With a finite number of observations, this is simply impossible. A way out of this predicament is therefore to impose a structure (some form of relationship) among the coefficients a_{k}. An illustrative structure may be found in equation [4.8.4] on p. 109 of Hamilton’s textbook, while the rigorous definitions and proofs discussed in this section may be found in Ch. 6, Par. 5 of the 1984 and 1995 Springer editions of the textbook “Probability” by Albert N. Shiryaev. This is an *advanced* book worth having for all those who intend to pursue quantitative analysis of a stochastic nature in discrete time. There the reader will find a short but understandable treatment of the two (almost immediate) consequences of this theorem, which we state here as corollaries that should fit well with our intuition.

COROLLARY 1. *The forecasting error of a regular process is monotone, i.e., σ_{n} ≤ σ_{n+1} for all n, where σ_{n} represents the standard error of a forecast n periods into the future.*

COROLLARY 2. *In the limit, the forecast of a regular process is trivial, i.e., as n tends to infinity, the forecast coincides with the mean of the process.*

In common sense terms, our forecasting abilities dissipate as we try to predict further into the future.
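Both corollaries can be checked mechanically for a moving average process (a sketch assuming hypothetical Wold weights a_k = 0.8^k and unit innovation variance): the n-step-ahead forecast error variance is σ²Σ_{k<n} a_{k}², which can only grow with the horizon and converges to the full process variance, at which point the best forecast is just the mean.

```python
# n-step-ahead forecast error variance of x_n = sum_k a_k e_{n-k}:
# var_n = s2 * sum_{k<n} a_k**2  -- nondecreasing in n (Corollary 1)
# and converging to the total process variance (Corollary 2).
s2 = 1.0                                  # innovation variance (assumed)
a = [0.8 ** k for k in range(200)]        # hypothetical Wold weights
sigma = [(s2 * sum(w * w for w in a[:n])) ** 0.5 for n in range(1, 201)]

monotone = all(nxt >= prev for prev, nxt in zip(sigma, sigma[1:]))
long_run = (s2 * sum(w * w for w in a)) ** 0.5
print(monotone, abs(sigma[-1] - long_run))  # True, and a vanishing gap
```

Once σ_{n} reaches the long-run standard deviation, nothing about the future is predictable beyond the unconditional mean, which is the common-sense dissipation described above.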