Many policymakers are now concerned about how the next wave of foreclosures will affect the housing market. Analysts have cited a large "shadow inventory" of homes, referring to the mass of delinquent mortgages that have yet to make their way through the foreclosure process. When these foreclosures occur, they could raise the number of homes for sale and put downward pressure on house prices. They could also impose negative externalities on other homes in the same neighborhoods, sending house prices even lower. (We recently blogged about the so-called contagion effects of foreclosures on surrounding properties.)
These potential effects seem intuitive, but measuring them is not easy. The main problem is what economists call "simultaneity." Foreclosures lead to an increased supply of homes for sale, which can lower prices—but lower prices also increase the probability that borrowers have negative equity, which can lead to foreclosure. Thus, there is simultaneous causality: foreclosures can reduce prices, and lower prices can cause the negative equity that leads to foreclosure. As a result, simply showing a correlation between foreclosures and falling house prices is not sufficient to measure—or even establish—a causal effect of foreclosures on prices.
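To make the simultaneity problem concrete, here is a minimal simulation sketch. It is not drawn from the paper: the coefficients, sample size, and variable names are all hypothetical. In this toy world, foreclosures have no effect at all on prices, yet falling prices raise foreclosures, and a naive regression of prices on foreclosures still returns a clearly negative slope.

```python
import random

def ols_slope(x, y):
    """Slope from a least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

def simulate_reverse_causality(n=20000, seed=0):
    """Prices drive foreclosures; foreclosures have NO effect on prices."""
    rng = random.Random(seed)
    foreclosures, prices = [], []
    for _ in range(n):
        price_change = rng.gauss(0, 1)  # pure demand shock (hypothetical scale)
        # Falling prices create negative equity, which raises foreclosures.
        # The coefficient of -1.0 is an assumption made for illustration.
        foreclosure_rate = -1.0 * price_change + rng.gauss(0, 1)
        foreclosures.append(foreclosure_rate)
        prices.append(price_change)
    return foreclosures, prices

foreclosures, prices = simulate_reverse_causality()
naive_slope = ols_slope(foreclosures, prices)
print(f"naive slope of prices on foreclosures: {naive_slope:.2f}")
```

Under these assumptions the population slope is -0.5 (a covariance of -1 divided by a variance of 2), even though the true causal effect of foreclosures on prices is zero by construction.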
A new study by Atif Mian, Amir Sufi, and Francesco Trebbi claims to have solved this econometric problem. Their paper reports a substantial causal impact of foreclosures on not only house prices, but also residential investment and automobile purchases. However, the authors make a major data error that, in our opinion, invalidates a large part of their analysis. In addition, there are important conceptual issues that raise deep questions about their identification strategy, even if it is possible to correct the data error.
Can simultaneity be solved by classifying states as judicial, nonjudicial?
The authors attack the simultaneity problem with a classic method: they use differences in state laws as an instrumental variable. The essential idea is that states vary randomly as to whether they are judicial or nonjudicial. Judicial states are typically characterized by longer foreclosure durations, since the mortgage servicer must navigate through the legal system to get court approval, which usually entails a significant amount of time (see Pennington-Cross 2010 for a nice discussion). If the judicial/nonjudicial classification is random with respect to the health of state-level housing markets, then state laws will generate random variation in the number of foreclosures across states. Under these assumptions, using the classification as an instrument yields consistent estimates of the effect of foreclosures on house prices.
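The logic of the instrument can be sketched with a small simulation. This is an illustration under stated assumptions, not the authors' specification: the structural coefficients are hypothetical, and the estimator shown is the simple Wald ratio for a binary instrument rather than the regressions in the paper. When the judicial flag is assigned at random, dividing the price gap between judicial and nonjudicial states by the foreclosure gap recovers the true effect despite the feedback from prices to foreclosures.

```python
import random

def mean(values):
    return sum(values) / len(values)

def wald_estimate(instrument, treatment, outcome):
    """IV estimate with a binary instrument: outcome gap over treatment gap."""
    y1 = [y for z, y in zip(instrument, outcome) if z == 1]
    y0 = [y for z, y in zip(instrument, outcome) if z == 0]
    x1 = [x for z, x in zip(instrument, treatment) if z == 1]
    x0 = [x for z, x in zip(instrument, treatment) if z == 0]
    return (mean(y1) - mean(y0)) / (mean(x1) - mean(x0))

# All three coefficients are hypothetical values chosen for illustration.
TRUE_EFFECT = -0.5      # effect of foreclosures on prices
LAW_EFFECT = -1.0       # judicial process lowers foreclosures
PRICE_FEEDBACK = -1.0   # falling prices raise foreclosures

rng = random.Random(1)
judicial, foreclosures, prices = [], [], []
for _ in range(50000):
    z = rng.randint(0, 1)  # 1 = judicial, assigned at random
    u = rng.gauss(0, 1)    # foreclosure shock
    v = rng.gauss(0, 1)    # price (demand) shock, independent of z
    # Structural system:
    #   f = LAW_EFFECT*z + PRICE_FEEDBACK*p + u
    #   p = TRUE_EFFECT*f + v
    # Substituting p into the first equation and solving for f:
    f = (LAW_EFFECT * z + PRICE_FEEDBACK * v + u) / (1 - PRICE_FEEDBACK * TRUE_EFFECT)
    p = TRUE_EFFECT * f + v
    judicial.append(z)
    foreclosures.append(f)
    prices.append(p)

iv_estimate = wald_estimate(judicial, foreclosures, prices)
print(f"IV estimate: {iv_estimate:.2f} (true effect: {TRUE_EFFECT})")
```

The estimate is close to the true effect only because the simulated price shock `v` is independent of the judicial flag. That independence is exactly the assumption in question.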
Of course, the classification of states into judicial and nonjudicial groups may not be random. It turns out that there is a strong regional component to this classification. Figure 3 in the Mian-Sufi-Trebbi paper shows that states in the Northeast and Midwest tend to be judicial, while the states in the South and West are mostly nonjudicial. It's no secret that problems in the U.S. housing market also have a strong regional character, with housing markets in Arizona, California, Florida, and Nevada (all located in the South and West) in particularly bad shape.
One way to check for the possibility of confounding effects across the two classifications of states is to compare their observable variables. The authors do this, and then claim that "states with a judicial foreclosure requirement are remarkably similar to other states in all attributes of interest except the propensity to foreclose" (p.3). But eyeballing their Figure 3 should give a reader pause. Nevada and Arizona, which are nonjudicial states, include the number one and two MSAs for new construction and for house price appreciation in the two years prior to the collapse of the mortgage market.1
Cross-state differences challenge regressions
Regional patterns in both state laws and housing markets cause problems for the authors' identification strategy. If we find that foreclosures tend to be more frequent in the nonjudicial states, this might be because foreclosing on delinquent homeowners is easier in those states, as the authors' identification strategy assumes. But high foreclosure rates in the nonjudicial states could also stem from negative shocks to housing demand in the parts of the country where the nonjudicial states happen to be located. Consequently, if we find that housing prices are lower and foreclosure rates are higher in nonjudicial states, then we can't be sure what's causing what. The high foreclosure rates could be causing the falling prices, as the authors claim. But it could also be true that low regional demand and falling prices in the South and West are causing the high foreclosure rate, which is the very possibility that the authors were hoping to rule out.
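A short simulation sketch shows how this confounding can mislead the instrument. The numbers are hypothetical and the setup is deliberately stark: foreclosures have zero effect on prices, but nonjudicial regions suffer a worse demand shock. The Wald ratio (price gap divided by foreclosure gap across the two groups of states) then reports a sizable negative "effect" that is purely an artifact of regional demand.

```python
import random

def mean(values):
    return sum(values) / len(values)

def group_gap(flag, values):
    """Mean of values where flag == 1 minus mean where flag == 0."""
    ones = [v for f, v in zip(flag, values) if f == 1]
    zeros = [v for f, v in zip(flag, values) if f == 0]
    return mean(ones) - mean(zeros)

rng = random.Random(2)
judicial, foreclosures, prices = [], [], []
for _ in range(50000):
    z = rng.randint(0, 1)  # 1 = judicial
    # Regional demand: nonjudicial (z = 0) regions are hit harder.
    # Foreclosures do NOT enter the price equation, so the true effect is zero.
    p = 1.0 * z + rng.gauss(0, 1)
    # Foreclosures respond to the law and to prices (hypothetical coefficients).
    f = -1.0 * z - 1.0 * p + rng.gauss(0, 1)
    judicial.append(z)
    foreclosures.append(f)
    prices.append(p)

spurious_iv = group_gap(judicial, prices) / group_gap(judicial, foreclosures)
print(f"IV estimate with a confounded instrument: {spurious_iv:.2f} (true effect: 0)")
```

With these assumed coefficients the population value of the ratio is 1 / (-2) = -0.5, so the procedure would attribute a large causal effect to foreclosures when none exists.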
The authors recognize that unobserved cross-state differences make the state-level experimental approach problematic, so they propose an alternative set of regressions that are not subject to such criticism. In addition to estimating the first set of regressions (which, in the manner described above, uses all the states in the country), they estimate a second set that includes only ZIP codes adjacent to borders between judicial and nonjudicial states. The idea is that while unobserved heterogeneity across states could potentially invalidate the first set of regressions, this heterogeneity is less likely to be a problem in the second. In other words, the housing market in Arizona may differ markedly from the housing market in Maine, and not just because Arizona is a nonjudicial state while Maine is judicial.
However, the ZIP codes just north of the Massachusetts-Rhode Island border are likely to have similar housing markets to the ZIP codes that are just south of this border. So, if the border ZIP codes in Massachusetts, which the authors label a judicial state, are experiencing higher foreclosures than the border ZIP codes in Rhode Island, a nonjudicial state, then differences in the two states' laws, and not unobserved differences in demand, are probably the reason why. And if the state laws are generating random variation in foreclosures, then the authors claim that this variation can be used to get a clean estimate of the causal effect of foreclosures on housing prices.
Problems in the data: Massachusetts, Wisconsin are misclassified
The authors find similar results in both sets of regressions. This similarity gives them some confidence that they have truly pinned down the direct effect of foreclosures on other economic outcomes. But here's where the data error comes in: the authors make a mistake in classifying at least two states as judicial or nonjudicial, which has major implications for their results. Specifically, they misclassify Massachusetts as judicial and Wisconsin as nonjudicial.2 Most sources, including the National Consumer Law Center (NCLC), reverse those classifications.
(For readers interested in the gory details, we show that for Massachusetts, there is no question that the NCLC is right.)
While the misclassification of two out of 50 states may seem minor, it turns out that Wisconsin and Massachusetts dominate the samples for the "border discontinuity" regressions. As the table shows, depending on the sample, using the alternative classification from the NCLC invalidates between 58 and 78 percent of the ZIP codes the authors use. Consider the sample that uses ZIP codes in 5-mile bands around state borders. Because it uses homes closest to state borders, this sample is least susceptible to unobservable differences between geographic areas, although we argue below that even 5-mile bands are inadequate to obtain clean identification. In this sample, classifying Massachusetts—correctly—as nonjudicial eliminates 70 percent of the comparisons.3
One response to this criticism would be to reclassify the states correctly and then reestimate both sets of regressions. The problem for the border regressions is that Massachusetts's and Wisconsin's borders with judicial and nonjudicial states, respectively, are sparsely populated and do not meet the authors' criteria for inclusion in the border sample. For example, farms and weekend homes make up most of the properties in border ZIP codes between western Massachusetts and southern Vermont.
Misclassification proves detrimental to the identification strategy
As the authors have written the paper, they claim to find big differences in ZIP-code-level outcomes based on the judicial/nonjudicial classification. However, they use regressions with the wrong classification for most of the comparisons. If the identification strategy worked as the authors had hoped, their regressions would have implied that there are no important differences on either side of most judicial/nonjudicial borders because these borders in fact separated states with similar laws. However, because the regressions instead reported significant differences, some other important sources of heterogeneity across the state lines must exist—and if the authors can't control for heterogeneity across, say, the Massachusetts–Rhode Island border, the reader can't be expected to have confidence in their ability to control for unobserved differences between Massachusetts and Nevada.
Another way of putting this is that the authors have inadvertently performed and failed a falsification, or placebo, test on their data. They estimated their regressions on a sample of borders that are, for the most part, not characterized by differences in foreclosure laws, at least in terms of the judicial/nonjudicial classification, and found large effects where they should have found none. In our opinion, this is very strong evidence against their claim that judicial/nonjudicial foreclosure laws are a valid instrument for foreclosure rates. Even if the authors correctly reclassify the states and reestimate the IV regressions for the border sample, this failed falsification test still casts doubt on the entire empirical strategy.
In addition to this primary critique, we also found some other important drawbacks in the analysis. For readers who are interested in learning more about these issues, here is a detailed discussion.
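The placebo logic can be illustrated with a stylized check. This is a sketch on made-up data, not a replication of the authors' regressions: two hypothetical states share the same foreclosure law, so a valid border design should find no systematic gap between their border ZIP codes. Here an unobserved state-level difference is built in, and a simple Welch t-statistic on the gap flags the failure.

```python
import math
import random

def welch_t(a, b):
    """Welch t-statistic for the difference in means between two samples."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

rng = random.Random(3)
# Hypothetical house price growth in border ZIP codes of two states that
# have the SAME foreclosure law. The -0.5 gap stands in for unobserved
# heterogeneity across the state line; it is an assumption, not an estimate.
state_a = [rng.gauss(-0.5, 1) for _ in range(500)]
state_b = [rng.gauss(0.0, 1) for _ in range(500)]

t_stat = welch_t(state_a, state_b)
# A conventional rule of thumb: |t| > 2 means the gap is unlikely to be noise.
placebo_failed = abs(t_stat) > 2
print(f"t-statistic: {t_stat:.1f}, placebo test failed: {placebo_failed}")
```

A significant gap across a border with no difference in law cannot be due to the law, so it must reflect something else that differs across the state line.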
We remain unconvinced by the authors' claim that exogenous increases in foreclosures substantially reduce housing prices. This issue, of the link between foreclosure and house prices, is of first-order importance to policymakers, who struggle not only with the foreclosure problem itself but also with the potential effects of foreclosures on the economic recovery. However, the authors' research strategy is unlikely to be helpful in addressing these problems given the deep conceptual issues it did not deal with and the poor data on which it is based.
Kris Gerardi
Research economist and assistant policy adviser at the Federal Reserve Bank of Atlanta
Paul Willen
Research economist and policy adviser at the Boston Fed
1 Moreover, one of the main stylized demographic facts about the United States in the last 50 years has been the spread of population south and west across the country. Indeed, for the past 25 years, population has consistently and steadily grown twice as fast in the states the authors identify as nonjudicial compared to the states they identify as judicial.
2 Arguably, the authors misclassify as many as six states: the two listed plus Maryland, Nebraska, New Mexico, and Iowa. However, as we explain below, it's the misclassification of Massachusetts and Wisconsin that dramatically affects their results.
3 The authors are aware that there are alternative classifications but view the discrepancies as minimal, relegating the following comment to a footnote: "The only states that differ across these three classifications are Massachusetts, Nebraska, Oklahoma, Rhode Island, and Wisconsin." It is unclear whether they were aware that two of those states accounted for most of their border sample and that their border sample specification was not robust to the alternatives.