RDP 7901: Estimation and Statistical Evaluation of an Economic Model

2. Evaluation of Econometric Models: A Brief Survey

Many econometric models include non-linear relationships between variables, and an increasing number of models have been estimated with full information simultaneous equation techniques. Since evaluation procedures for such models are not as readily available as those for single equations, there is little information on their statistical properties. Until a formal, well-developed set of criteria for the evaluation of simultaneously estimated models exists, the best model-building process can be termed an iterative research strategy.[1] Economic theory and available information are used to formulate the model, and the performance and reliability of the model should be gauged by a range of formal and informal measures. Not all of the available model evaluation procedures may be appropriate in every situation, but as much testing as possible should be carried out. Testing may reveal unsatisfactory parts of the model, which can then be reformulated with newly available data and information before the testing phase is repeated. Because of the pretesting and the informal use of prior information, this strategy will lead to a model whose final parameter estimates have unknown statistical properties. However, given the small samples of data and the complexity of the economic system to be modelled, it is unlikely that any research strategy could produce a valid model with known statistical properties.

Model evaluation procedures can be divided into eight main areas; these are discussed below under two headings.

Overall Model Selection

  1. Hypothesis testing procedures: If the models are nested, a test such as the likelihood ratio test can in principle be applied.[2] For non-nested models it may be possible to embed the alternative models in one general model and thus apply the likelihood ratio test, although this procedure is unlikely to be practical in most cases. Procedures for testing non-nested hypotheses have been developed by Atkinson and Cox (1974) and Quandt (1974) for application to linear regression models; further study of the properties and interrelationships of these procedures, and their extension to more general situations, is required.[3] (A sketch of the likelihood ratio mechanics follows this list.)
  2. Bayesian procedures: Probabilities are assigned to alternative models, and these probabilities can be updated as more information becomes available. These procedures have usually been seen as impractical for models of more than two or three equations. However, according to Giles (1977), recent developments in computing and in the cost of Bayesian procedures indicate that it may soon be practical to use prior information and to compare non-nested models for non-linear structures and simultaneously estimated systems.[4]
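
To make the likelihood ratio mechanics concrete, the following Python sketch compares a restricted and an unrestricted linear regression on simulated data. This is a minimal single-equation illustration, not the procedure of any study cited above; all series, sample sizes and coefficients are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: y depends on x1 only; x2 is an irrelevant regressor.
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(scale=0.8, size=n)

def gaussian_loglik(y, X):
    """Maximised Gaussian log-likelihood of an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)  # maximum likelihood error variance
    return -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1.0)

X_restricted = np.column_stack([np.ones(n), x1])        # H0: coefficient on x2 is zero
X_unrestricted = np.column_stack([np.ones(n), x1, x2])  # H1: x2 enters freely

# LR statistic: twice the log-likelihood gain from relaxing the restriction,
# referred to a chi-square with one degree of freedom (one restriction).
lr = 2.0 * (gaussian_loglik(y, X_unrestricted) - gaussian_loglik(y, X_restricted))
print(f"LR = {lr:.3f}, chi-square(1) p-value = {stats.chi2.sf(lr, df=1):.3f}")
```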

Partial or Informal Procedures

  1. Size, sign and significance of parameter estimates: The internal consistency of the model can be examined by checking whether the magnitudes and signs of the parameters are those expected from theory,[5] and by checking the significance of the estimated parameters as indicated by their sampling variance. It may be possible in a simultaneous equations system to conduct a formal statistical test on the significance of the parameters.[6] (A sketch of this ratio test appears at the end of this list.)
  2. Goodness-of-fit measures: Since the coefficient of determination used for single equation regression models is not applicable to simultaneous systems, a number of informal measures are often used. For example, in testing his permanent income model, Trevor (1978) uses five measures of goodness-of-fit at the estimation stage: a simple squared correlation coefficient and a coefficient of determination for each equation of both the structural and reduced form models, and a whole-model coefficient of determination (Carter and Nagar (1977)).
  3. Residuals analysis: The assumptions of uncorrelated residuals and of normality of the structural residuals can be examined by an analysis of the residuals of the estimated model. Most of the statistical tests for significant non-contemporaneous covariance of the structural residuals are designed for single equation analysis; the available tests for simultaneous systems generally involve computations which would be impractical for large systems, and their small sample properties are often unknown. Thus it is necessary to use informal measures such as the cross-correlation coefficient,[7] or the sign reversal test,[8] which has at least a large sample justification.[9] (A sketch of the cross-correlation check appears at the end of this list.) If these tests indicate the presence of autocorrelation, residuals analysis can be used to help trace mis-specification in the model.[10] The procedure of estimating autocorrelation schemes should be used with caution: the invalid representation of mis-specified dynamics by an autocorrelation process may worsen rather than improve the situation. However, Hendry and Mizon (1978) suggest that serial correlation adjustment can be interpreted as a convenient way of representing dynamic relationships and making use of long run information in the data.
  4. Testing restrictions: When restrictions have been imposed on a model, these restrictions should be tested relative to a less restrictive version of the model. The task of finding a suitable version of the model with a structure acceptable in economic terms may be difficult: ideally this model should be accepted relative to the unrestricted reduced form.[11] When the hypotheses are nested, the over-identifying restrictions should be tested systematically in increasing order of restrictiveness, using the likelihood ratio test if appropriate. A less satisfactory procedure is to start with the most restrictive model and to test sequentially the need to relax restrictions.[12] An advantage of this method is that it is not necessary to compute estimates for models less restricted than the accepted model. However, the procedure is arbitrary in that the complete framework for hypothesis testing is not specified at the outset, and thus the power of the procedure is unknown.[13]
  5. Simulation and eigensystem analysis: Since analytic dissection of a large, non-linear system is generally not possible, simulation analysis is used to examine the dynamic properties of the model. Simulations can be ex post or ex ante, single period or dynamic, and deterministic or stochastic.[14] If a non-linear model is linearised for estimation purposes,[15] simulations should where possible be carried out with the non-linear structural form of the model to obtain root mean square errors, to compare alternative model structures and to conduct policy analysis.

    Simulations that allow for the stochastic nature of the parameter estimates are often not computationally feasible, but simulations in which random shocks are added to the equations should be carried out wherever possible to obtain a distribution of outcomes, particularly when the model is non-linear.[16] (A toy illustration of this point appears at the end of this list.)

    Evaluation of the predictive ability of a model is essentially a goodness-of-fit problem. However, most of the applicable statistical tests require assumptions that do not hold when evaluating simulation results, and thus graphical techniques and simple summary measures (some of which can be tested for significance) are generally used. These include the root mean square error,[17] a statistical test on the coefficients from the regression of actual data on the predicted values, and comparison with the forecasts of mechanistic models. (A sketch of the first two measures appears at the end of this list.)

    The dynamic properties of a linearised version of a model can also be analysed through the eigenvalues of the estimated model. A positive real eigenvalue, or a complex eigenvalue with a positive real part, indicates local instability in the non-linear differential equation system; complex conjugate pairs of eigenvalues give information on the cyclical properties of the model.[18] (A sketch of this analysis appears at the end of this list.)

  6. Intertemporal stability of parameter estimates: Procedures for formally testing the equality of coefficients when the model is estimated over subperiods of the original estimation period have been developed mainly for the single equation case; for simultaneous systems, comparison of models estimated over subperiods would therefore generally be based on informal criteria such as those discussed above. Since it is unlikely that two full data sets would be available for cross-validation of the model, a procedure used, for example, by Hendry (1974) is to save the last few observations of the sample in order to carry out a post-sample parameter stability chi-square test[19] (sketched at the end of this list). The practice of checking the model against new data as it becomes available should be viewed with caution, in that the new data could still be subject to revision. The procedure of re-estimating a model with data not used for its specification is potentially useful when comparing independently developed models.[20]

    In evaluating models using the above criteria, consideration should be given to the purposes for which the model was developed, as the model evaluation criteria appropriate for a large scale forecasting model may differ from those for a small policy model. A more general principle that can be applied in choosing between models or procedures is a preference for the simpler alternative when a simple and a complex procedure are found to offer similar advantages.
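
A minimal sketch of the ratio test mentioned under point 1 and footnote 6: the parameter estimate divided by its asymptotic standard error is referred to the standard normal distribution. The numbers below are hypothetical.

```python
from scipy import stats

# Hypothetical point estimate and asymptotic standard error from a
# simultaneously estimated system.
estimate, std_error = 0.45, 0.18

# Ratio of estimate to standard error, referred to the asymptotic normal
# distribution (two-sided test).
z = estimate / std_error
print(f"z = {z:.2f}, p-value = {2 * stats.norm.sf(abs(z)):.3f}")
```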
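
The summary measures of predictive ability discussed under point 5 can be sketched as follows. The "actual" and "predicted" series here are simulated stand-ins, not output from any model discussed in the text; with unbiased forecasts, the regression of actual on predicted should give an intercept near zero and a slope near one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical actual series and model predictions over a simulation period.
actual = np.cumsum(rng.normal(0.5, 1.0, size=40))
predicted = actual + rng.normal(0.0, 0.7, size=40)  # stand-in for model output

# Root mean square error: the overall size of the simulation errors
# (footnote 17 notes it says nothing about their direction or about
# turning points).
rmse = np.sqrt(np.mean((actual - predicted) ** 2))

# Regression of actual on predicted: unbiased forecasts suggest an
# intercept near 0 and a slope near 1.
slope, intercept, r, p, se = stats.linregress(predicted, actual)
print(f"RMSE = {rmse:.3f}; actual = {intercept:.2f} + {slope:.2f} * predicted")
```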
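
A sketch of the informal cross-correlation check from point 3, applied to two hypothetical structural residual series constructed with a one-period lagged link. A large correlation at a non-zero lag would suggest non-contemporaneous covariance of the structural residuals.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical structural residuals from two equations of a system; u2 is
# built to respond to u1 with a one-period lag.
n = 80
u1 = rng.normal(size=n)
u2 = rng.normal(scale=0.9, size=n)
u2[1:] += 0.4 * u1[:-1]

def cross_corr(a, b, lag):
    """Correlation between a(t) and b(t - lag)."""
    if lag > 0:
        a, b = a[lag:], b[:-lag]
    elif lag < 0:
        a, b = a[:lag], b[-lag:]
    return np.corrcoef(a, b)[0, 1]

# The correlation should peak at lag +1, where the lagged link was built in.
for lag in range(-2, 3):
    print(f"lag {lag:+d}: corr(u2_t, u1_t-lag) = {cross_corr(u2, u1, lag):+.3f}")
```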
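
The point made in footnote 16, that the deterministic path of a non-linear model need not equal the mean of its stochastic paths, can be seen in a deliberately simple toy model, chosen so the discrepancy is visible; it does not represent the NIF or RBA76 models.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy non-linear model: log y_t = 0.9 log y_{t-1} + e_t. Because y = exp(log y),
# the deterministic path (shocks set to zero) is not the mean of the
# stochastic paths -- the point made by Howrey and Kelejian.
horizon, n_draws, sigma = 20, 5000, 0.2

def simulate(shocks):
    log_y = np.zeros(horizon)
    for t in range(1, horizon):
        log_y[t] = 0.9 * log_y[t - 1] + shocks[t]
    return np.exp(log_y)

deterministic = simulate(np.zeros(horizon))
stochastic = np.array([simulate(rng.normal(0.0, sigma, horizon))
                       for _ in range(n_draws)])

print(f"final-period level, deterministic run:       {deterministic[-1]:.3f}")
print(f"final-period level, mean of stochastic runs: {stochastic[:, -1].mean():.3f}")
```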
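
A sketch of the eigensystem analysis from point 5, for a hypothetical 2×2 linearised system dx/dt = Ax: an eigenvalue with a positive real part signals local instability, and a complex conjugate pair implies cycles with period 2π divided by the absolute imaginary part.

```python
import numpy as np

# Hypothetical coefficient matrix of a linearised system dx/dt = A x.
A = np.array([[-0.10,  0.40],
              [-0.30, -0.05]])

for lam in np.linalg.eigvals(A):
    verdict = "stable" if lam.real < 0 else "locally unstable"
    msg = f"eigenvalue {lam:.3f}: {verdict}"
    if abs(lam.imag) > 1e-12:
        # A complex conjugate pair implies cycles with this period.
        msg += f", cycle period = {2 * np.pi / abs(lam.imag):.1f} time units"
    print(msg)
```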
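
Finally, a simplified single-equation illustration of the idea behind the post-sample parameter stability test mentioned under point 6: fit on the first n1 observations, then ask whether the errors on the reserved n2 observations are larger than the within-sample fit suggests. It ignores parameter-estimation uncertainty in the forecasts, so it is only an approximation to the test used by Hendry (1974); the data are simulated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Hypothetical data: fit on the first n1 observations, reserve the last n2.
n1, n2 = 50, 8
x = rng.normal(size=n1 + n2)
y = 2.0 + 0.7 * x + rng.normal(scale=0.5, size=n1 + n2)
X = np.column_stack([np.ones(n1 + n2), x])

beta, *_ = np.linalg.lstsq(X[:n1], y[:n1], rcond=None)
resid_in = y[:n1] - X[:n1] @ beta
s2 = resid_in @ resid_in / (n1 - 2)  # estimated error variance (2 parameters)

# Scaled sum of squared post-sample forecast errors: approximately
# chi-square(n2) under parameter stability (parameter uncertainty ignored).
forecast_errors = y[n1:] - X[n1:] @ beta
chi2_stat = forecast_errors @ forecast_errors / s2
print(f"chi-square({n2}) = {chi2_stat:.2f}, "
      f"p-value = {stats.chi2.sf(chi2_stat, df=n2):.3f}")
```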

Footnotes

Zellner and Peck (1973). [1]

The minimum sample size required for application of the likelihood ratio test is a matter of uncertainty: work by Byron (1974) indicates that the test is not excessively conservative, although Phillips (1979) suggests that the chi-square distribution for the likelihood ratio test may not be satisfactorily approximated with samples of fewer than 100 observations. With reference to the F test for homogeneity of consumer demand, Laitinen (1978) shows by simulation experiment that this test is seriously biased toward rejecting the hypothesis, and gives a small sample interpretation of the test statistic which explains this bias. [2]

The Hausman specification test may be used when sets of estimates of a model have been obtained using alternative estimation techniques: Fair and Parke (1979) report an application of this test which is, however, only partly successful because of a small sample problem. [3]

The Theil-Goldberger mixed estimator (which combines prior and sample information) may be given a Bayesian interpretation, as shown in Frenkel and Clements (1978) where the estimator is applied to a small model of exchange rates. [4]

However, the criterion of adequate representation of accepted economic theory is not always relevant in that there are still gaps in economic theory, and thus models are sometimes used for additional exploration of economic hypotheses. [5]

Monte Carlo work by Phillips (1979) has shown that a good approximation of the distribution of point estimates to the asymptotic normal distribution could be achieved with a sample of size 60. Thus the ratio of the parameter estimate to its standard deviation can be tested against the asymptotic normal distribution. [6]

Trevor (1978). [7]

As used by Giles (1977). [8]

The results of such tests may not indicate the type of autocorrelation process that is present: positive first order autocorrelation could indicate either errors in the data, or the presence of a moving average process in the relationships. [9]

Hendry (1974) suggests that single equation estimation methods are “value for money” in that they can indicate the presence of mis-specification and provide clues to its rectification, although the methods cannot reveal problems with cross serial correlation. [10]

However, as pointed out by Helliwell (1977), this is improbable for realistic economic models. [11]

For single parameter restrictions this involves testing the significance of the parameter estimate after the restriction has been relaxed; the equality of two parameters can also be tested for significance by obtaining the standard error of the difference between the two estimates. [12]

Mizon (1977). [13]

The effects of shocks of various sizes and of different starting points can be examined, and simulations from the end of the estimation period obtained. The relative importance of the various channels of response in the model can be explored by suppressing particular channels (Helliwell and Higgins (1976)). [14]

Non-linear systems estimation of simultaneous equation models of the size of RBA76 is at present very expensive, or requires more than the available computer core. [15]

Sowey (1973). As Howrey and Kelejian (1969) show on theoretical grounds, the simulation of a non-linear model should be stochastic. However, it is necessary to obtain an indication of the bias associated with non-stochastic simulation of a particular non-linear model: for a version of the NIF model the difference between deterministic simulations and the average time paths of stochastic simulations is negligible; see FitzGerald (1973). [16]

One problem with this measure is that it gives no indication of whether over- or under-prediction occurs, or of the position of turning points. [17]

An application to the RBA76 model of the Australian economy is reported in Jonson, Evans and Moore (1978). [18]

Christ (1951) suggests that this process may be biased since the model-builder could already be familiar with the additional data observations. [19]

However, as is the case with the study by Cooper (1972), the relative performance of the models can be difficult to interpret when features such as tax rate changes and Korean war data may be used in the specification of some models but not in others. [20]