Notes from the Vault
Nikolay Gospodinov
January 2018

Economists are often criticized for their reliance on abstract mathematical models that greatly oversimplify the real world. One response is that the real world is incredibly complex. Economic models could not identify the most important relationships if they tried to capture every potentially relevant determinant. Indeed, it is doubtful economists could even know all the factors that go into any given set of economic decisions. However, by focusing on a relatively small set of the more important determinants, economists can provide useful insights and predictions to those making economic decisions.

Unfortunately, although the economists' defense of their models is reasonable, all too often its implications are forgotten when economists proceed to estimate their models using real-world data.1 That is, economists assume that the model or set of models they estimate includes the true model, ignoring the inherent incompleteness of virtually all their models. As a result, they typically discard information that could be used to improve the accuracy of their predictions.

In this post, I argue that any policy, stress testing, or forecasting exercise will benefit from explicitly recognizing the uncertainty arising from the inherent incompleteness of economic models. By combining and aggregating information from a collection of partially specified models, while accounting for the various sources of uncertainty surrounding data and economic decisions, a researcher has a better chance of achieving a robust and reliable approximation.

Extracting information from misspecified models
Obtaining as accurate a prediction as possible is important in a wide variety of settings—such as inflation dynamics, discount factors for pricing financial assets, or income inequality measures—even when all of the candidate models are false. Model selection, which chooses only one model from the whole set, is justified only when the true model is assumed to belong to that set. In this setup, the ambiguity about the true model is resolved asymptotically, and the model selection procedure (or some model averaging procedure) recovers the underlying process. If this assumption—which in my view is highly unrealistic—does not hold, selecting only one model results in suboptimal inference, as it discards potentially important information contained in the remaining models. By contrast, the model aggregation framework dispenses completely with the notion of a true model and treats all of the candidate models as potentially misspecified.2 The aggregation approach adapts better to the uncertainty surrounding the correct model: it delivers a more robust approximation when all models are false, and it subsumes the model selection procedure when one of the candidate models is indeed true.
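
A toy contrast may help fix ideas. The sketch below is entirely illustrative: the three "models" are hand-built misspecified predictors, and the exponential weighting is one common pooling scheme, not the aggregator from the papers discussed in this post. Because the two better models err in opposite directions, the pooled prediction beats the single selected one.

```python
import numpy as np

# Toy contrast between model selection and model aggregation. Everything here
# is illustrative: the three "models" are hand-built misspecified predictors,
# and the exponential weighting is one common pooling scheme, not the
# aggregator from the papers discussed in this post.
rng = np.random.default_rng(0)
T = 1000
signal = np.sin(np.linspace(0, 8 * np.pi, T))
truth = signal + 0.3 * rng.standard_normal(T)     # series we try to predict

preds = np.column_stack([
    signal + 0.5 + 0.3 * rng.standard_normal(T),  # model 1: biased upward
    signal - 0.5 + 0.3 * rng.standard_normal(T),  # model 2: biased downward
    np.zeros(T),                                  # model 3: uninformative
])
mse = ((preds - truth[:, None]) ** 2).mean(axis=0)

# Selection: all weight on the single best-performing model.
selected = preds[:, mse.argmin()]

# Aggregation: exponential weights based on measured performance, so the two
# oppositely biased models share the weight and their biases largely cancel.
w = np.exp(-10.0 * mse)
w /= w.sum()
pooled = preds @ w

print(f"MSE of selected model: {((selected - truth) ** 2).mean():.3f}")
print(f"MSE of pooled models : {((pooled - truth) ** 2).mean():.3f}")
```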

Model misspecification and asset pricing
Asset pricing models provide the perfect "laboratory" for illustrating the approach to misspecification, given the tight restrictions imposed by the absence of arbitrage opportunities within and across financial asset classes. The fundamental pricing equation—used for pricing equity, bonds, contingent claims, and so forth—states that asset prices are obtained by "discounting" future payoffs by a (positive) discount factor so that the expected present value of the payoff is equal to the current price. Note this process is similar to the standard discounted cash flow models often used to value corporate investments. Dynamic asset pricing theories suggest candidate discount factors with different functional forms and factor specifications whose parameters are chosen to minimize the risk (distance) between the candidate and the "true," but unobserved, discount factor. If no parameter value allows the candidate discount factor to price the test assets correctly, we refer to the model as misspecified. There is now overwhelming evidence that most, if not all, asset pricing models are strongly rejected by the data and, hence, are misspecified.3 Despite the convincing evidence of misspecification, however, it is common practice to proceed with statistical inference procedures developed under the assumption that the model is correctly specified.
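
In standard notation (my summary of the textbook setup, not an excerpt from the working papers cited below), the logic of this paragraph can be written compactly; the Hansen-Jagannathan distance shown last is one common choice for the "risk (distance)" being minimized:

```latex
% Fundamental pricing equation: price = expected discounted payoff,
% or, for a vector of N gross returns R, E[m R] = 1.
p_t = E_t\left[\, m_{t+1}\, x_{t+1} \right]
\qquad\Longleftrightarrow\qquad
E\left[\, m_{t+1}\, R_{t+1} \right] = 1_N .

% A candidate discount factor y(\theta) leaves pricing errors
e(\theta) = E\left[\, y_{t+1}(\theta)\, R_{t+1} \right] - 1_N ,

% and \theta is chosen to minimize a distance between candidate and truth,
% for example the Hansen-Jagannathan distance. The model is misspecified
% when even the best \theta leaves a positive distance:
\delta = \min_{\theta} \left( e(\theta)'\, E\left[ R_{t+1} R_{t+1}' \right]^{-1} e(\theta) \right)^{1/2} > 0 .
```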

The highly misleading inference that can arise in this case—especially when combined with some form of model underidentification—can be illustrated using an example from my recent Atlanta Fed working paper 2017-9 (written in collaboration with University of Toronto Professor Raymond Kan and University of Georgia Professor Cesare Robotti). Suppose we consider the standard capital asset pricing model (CAPM) and a model with a "new" factor that will be revealed later. The CAPM uses the market excess return (mkt) as a risk factor, and the test asset returns are the monthly gross returns on the popular 25 value-weighted Fama-French size and book-to-market ranked portfolios.4 The results (specification test, significance tests, goodness-of-fit [R²]), based on an optimal estimator for the period January 1967 to December 2012, are presented in the table below.

Evaluation of Asset Pricing Models
  CAPM CAPM+ "new" factor "new" factor only
t-statistic (mkt)
5.28
(0.0000)
0.85
(0.3954)
-
t-statistic (new)
-
5:08
(0.0000)
5.12
(0.0000)
specification test
62.20
(0.0000)
25.53
(0.2726)
25.96
(0.3029)
R2
0.2277
0.9928
0.9938

Note: The numbers in parentheses are p-values.
Source: Authors' calculations. For more details, see Atlanta Fed working paper 2017-9.

The results in the first column for the CAPM are consistent with the existing literature. The market factor is statistically significantly different from zero, as indicated by its p-value, which suggests it is priced. However, the goodness-of-fit statistic points to some, but not particularly strong, explanatory power, with over three-quarters of the variation left unexplained by the model. Moreover, the possibility that this is the true model is strongly rejected by the data, as indicated by the specification test p-value. The results for the models with the "new" factor (middle and right columns of the table), however, are striking. First, the models with the "new" factor exhibit an almost perfect fit, explaining over 99 percent of the variation. Second, based on the specification test, the models appear to be correctly specified, with p-values well above 0.05. Third, the "new" factor is highly statistically significant (priced). Moreover, the empirically successful and theoretically justified market factor becomes statistically insignificant (p-value well above 0.05) in the middle column. As a result, the market factor is driven out of what most economists would consider the best of these empirical models (the right column).

Given the exceptional performance of this new model, what exactly is this "new" factor? It turns out the factor is generated as a standard normal random variable that is independent of returns! The results of this numerical illustration are completely spurious, since by construction the "new" factor does not contribute to the pricing of the test assets. In summary, standard inference concludes that an arbitrarily bad model with one or more spurious factors is correctly specified, with a spectacular fit and priced factors. This example should serve as a warning about the dangers of conducting standard statistical inference in misspecified models. Some recent papers (for example, Gospodinov et al., 2017, my Econometrica article with Kan and Robotti) have developed tools and procedures to detect and deal with these statistical problems.
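
A stylized way to see how this can happen is to simulate it. The sketch below is my simplified two-pass (Fama-MacBeth-style) stand-in for the paper's GMM-based evidence, with all parameter values chosen purely for illustration: the estimated betas on a pure-noise factor inherit the cross-sectional pattern of the omitted true factor, so the noise factor typically appears "priced" far more often than the nominal 5 percent of the time.

```python
import numpy as np

# Monte Carlo sketch of the "useless factor" problem. A factor that is pure
# noise, independent of returns by construction, is evaluated with a simple
# two-pass procedure that ignores misspecification. This is a stylized
# stand-in for the paper's GMM-based evidence, not a replication of it.
rng = np.random.default_rng(42)
N, T, n_sim = 25, 550, 500                  # assets, months, simulations
betas_true = np.linspace(0.5, 1.5, N)       # loadings on the omitted true factor
prem_true = 0.005                           # 0.5 percent monthly premium

rejections = 0
for _ in range(n_sim):
    f = 0.045 * rng.standard_normal(T)               # true factor (not fitted)
    eps = 0.02 * rng.standard_normal((T, N))         # idiosyncratic noise
    R = prem_true * betas_true + f[:, None] * betas_true + eps
    z = rng.standard_normal(T)                       # useless factor: pure noise

    # First pass: time-series beta of each asset on the useless factor.
    zc = z - z.mean()
    b_z = zc @ (R - R.mean(axis=0)) / (zc @ zc)

    # Second pass: month-by-month cross-sectional slopes (Fama-MacBeth).
    x = np.column_stack([np.ones(N), b_z])
    lam = np.linalg.lstsq(x, R.T, rcond=None)[0][1]  # slope for each month
    t_stat = lam.mean() / (lam.std(ddof=1) / np.sqrt(T))
    rejections += abs(t_stat) > 1.96

print(f"nominal size: 5.0%, rejection rate for the noise factor: {rejections / n_sim:.1%}")
```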

But if all models are inherently misspecified, how should we use them for policy analysis, stress testing, and so on? Misspecified models can still be useful for informing policymakers in their decision process, provided the model uncertainty (in addition to data and parameter uncertainty) is adequately incorporated into the decision making.5 One popular approach is to choose the "least misspecified" member of the set by model or factor selection. As argued above, this approach is suboptimal: model selection is designed to choose only one of the models and ignores information in the remaining ones. Another popular alternative is to include a large number of plausible factors in the initial estimation and let the estimation results determine which factors have the most explanatory power. However, that approach to selecting the single best model is also likely to result in a loss of information, and it compromises the integrity of the individual models, each of which is guided by economic theory.

Aggregation of misspecified asset pricing models
An alternative approach (developed in my recent Atlanta Fed working paper 2017-10 with Emory University Professor Esfandiar Maasoumi) combines and aggregates information from a set of misspecified models with the goal of better eliciting some features of the unobserved discount factor. As explained earlier, this procedure is analogous to taking a weighted average of the models. The standard approach in studies that pick a single best model is effectively to assign all of the weight to one model. A simple alternative is to assign equal weight to each of the models, but that implies we think they are all equally good. A better approach takes advantage of the fact that we have some information about the models' performance and estimation risk that can help determine the weights.

Information theory provides a natural theoretical foundation for dealing with the types of uncertainty and partial specification underlying the candidate models. The optimal aggregator—which summarizes the beliefs about the individual models—takes a harmonic mean form, with geometric and linear weighting schemes as special cases. This mixing procedure, partly informed by theory and partly driven by data and statistics,6 assigns weights according to each model's contribution to the overall reduction of the pricing errors. One important property of the optimal aggregator is that it relaxes the perfect substitutability of the candidate models that is implicitly embedded in linear pooling procedures.
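
For concreteness, this family of pooling rules can be sketched as a weighted power mean (my generic formulation, which only gestures at the specific optimal aggregator derived in working paper 2017-10): p = 1 gives the linear pool, p → 0 the geometric pool, and p = -1 a harmonic pool.

```python
import numpy as np

# Weighted power-mean pooling of candidate discount factors: p = 1 is the
# linear pool, p -> 0 the geometric pool, and p = -1 a harmonic pool. A
# generic sketch of this family, not the paper's optimal aggregator itself.
def pool(m: np.ndarray, w: np.ndarray, p: float) -> np.ndarray:
    """Combine K positive candidate SDF series (T x K) into one length-T series."""
    if np.isclose(p, 0.0):                 # geometric pool as the limit p -> 0
        return np.exp(np.log(m) @ w)
    return (m ** p @ w) ** (1.0 / p)

# Toy usage: three positive candidate discount factors, performance-based weights.
rng = np.random.default_rng(1)
m = np.exp(0.01 * rng.standard_normal((240, 3)) - 0.0005)  # stays positive
w = np.array([0.5, 0.3, 0.2])                              # weights sum to one

for p in (1.0, 0.0, -1.0):
    print(f"p = {p:+.0f}: mean pooled discount factor = {pool(m, w, p).mean():.4f}")
```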

Regarding the practical performance of the aggregator, it is probably not surprising that this approach offers a substantial improvement in pricing performance. However, the approach tends to give the greatest weight to the best-performing models, and that may not be optimal in all circumstances. For example, one model may do relatively well in explaining the data over some time period. However, the model may be flawed and/or structurally unstable, as its parameters are subject to time variation and regime shifts over longer time periods. Thus, some weighting schemes seek to make the estimates more robust by distributing the weights more evenly across models. That is, some risk functions deliberately reduce the weight assigned to the best model and redistribute it to the other models. Even though the model with just the "new" factor provided a far superior fit in my example, the new factor's success was a fluke that is unlikely to persist. In terms of making future predictions, we would want to give significant weight to the CAPM (the first column in the table). This illustrates the "insurance" value of mixing: it penalizes the possibility of choosing catastrophically false individual models.
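
One simple way to implement this insurance effect, sketched below, is to shrink performance-based weights toward equal weights; the shrinkage intensity alpha is a judgment call, and this is a minimal illustration rather than the risk-function-based schemes the paper works with.

```python
import numpy as np

# Shrink performance-based weights toward equal weights so that no single
# model can dominate. Minimal illustration; alpha is an arbitrary choice.
def shrink_weights(w: np.ndarray, alpha: float = 0.25) -> np.ndarray:
    return (1 - alpha) * w + alpha / len(w)

# A dominant model keeps the largest weight but no longer drowns out the rest.
print(shrink_weights(np.array([0.90, 0.07, 0.03])))  # approx. [0.758 0.136 0.106]
```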

Conclusion
While the arguments made here have been developed in the context of asset pricing models, they are in no way limited to that particular setup. For example, model aggregation in macroeconomics and labor economics, as well as density forecast combination using a large set of diverse, partially specified models, would be natural applications of the proposed method. This eclectic, "wisdom of the crowd" approach can offer substantial advantages in a wide variety of economic problems where model specifications are incomplete and measurement error is ubiquitous.

References

Bernardo, José M., and Adrian F.M. Smith (1994). Bayesian Theory. Chichester, England: Wiley. Available behind a paywall at http://onlinelibrary.wiley.com/book/10.1002/9780470316870.

Canova, Fabio (1994). "Statistical Inference in Calibrated Models." Journal of Applied Econometrics 9 (S1), S123–S144. Available behind a paywall at http://onlinelibrary.wiley.com/doi/10.1002/jae.3950090508/abstract.

Gospodinov, Nikolay, Raymond Kan, and Cesare Robotti (2013). "Chi-squared Tests for Evaluation and Comparison of Asset Pricing Models." Journal of Econometrics 173, 108–125. Available behind a paywall at http://www.sciencedirect.com/science/article/pii/S0304407612002485.

Gospodinov, Nikolay, Raymond Kan, and Cesare Robotti (2017). "Spurious Inference in Reduced-Rank Asset-Pricing Models." Econometrica 85, 1613–1628. Available behind a paywall at http://onlinelibrary.wiley.com/doi/10.3982/ECTA13750/full.

Gospodinov, Nikolay, Raymond Kan, and Cesare Robotti (2018). "Asymptotic Variance Approximations for Invariant Estimators in Uncertain Asset-Pricing Models." Econometric Reviews, forthcoming. Available behind a paywall at http://www.tandfonline.com/doi/full/10.1080/07474938.2016.1165945.

Hall, Alastair R., and Atsushi Inoue (2003). "The Large Sample Behaviour of the Generalized Method of Moments Estimator in Misspecified Models." Journal of Econometrics 114, 361–394. Available behind a paywall at http://www.sciencedirect.com/science/article/pii/S0304407603000897.

Maasoumi, Esfandiar (1990). "How to Live with Misspecification if You Must." Journal of Econometrics 44, 67–86. Available behind a paywall at https://www.sciencedirect.com/science/article/pii/0304407690900733.

Watson, Mark W. (1993). "Measures of Fit for Calibrated Models." Journal of Political Economy 101, 1011–1041. Available behind a paywall at http://www.journals.uchicago.edu/doi/abs/10.1086/261913.

White, Halbert L. (1994). Estimation, Inference and Specification Analysis. New York: Cambridge University Press. Available behind a paywall at https://www.cambridge.org/core/books/estimation-inference-and-specification-analysis/9B5D4DED8AA37C8231EB942B935CEF55.

Nikolay Gospodinov is a finance economist and senior adviser at the Atlanta Fed. The author thanks Larry Wall for valuable comments and suggestions. The views expressed here are the author's and not necessarily those of the Federal Reserve Bank of Atlanta or the Federal Reserve System. If you wish to comment on this post, please email atl.nftv.mailbox@atl.frb.org.

_______________________________________

1 Most of the exceptions come from the econometrics literature and include Maasoumi (1990), Watson (1993), White (1994), and Canova (1994), among others. Gospodinov et al. (2013, 2018) develop a comprehensive framework for conducting inference in possibly misspecified asset pricing models.

2 See Bernardo and Smith (1994) for a useful taxonomy of the different views regarding model comparison and selection.

3 The two Atlanta Fed working papers mentioned below (2017-9 and 2017-10), and the references therein, report extensive evidence of misspecification of linear and nonlinear asset pricing models.

4 These are available from Kenneth R. French's website.

5 For details on how to perform misspecification-robust inference, see Hall and Inoue (2003) and Gospodinov et al. (2013, 2018).

6 Yet another alternative is a data-driven (model-free) approach to approximating the unknown model using nonparametric methods. This is better suited to a "machine learning" approach with no or only limited guidance from economic theory.