RDP 2013-05: Liquidity Shocks and the US Housing Credit Crisis of 2007–2008

4. Data

4.1 The Home Mortgage Disclosure Act

The data underpinning the regression analysis are derived from the Home Mortgage Disclosure Act (HMDA) Loan Application Registry. Enacted by Congress in 1975, HMDA requires mortgage lenders located in metropolitan areas to collect data about their housing-related lending activity and make these data publicly available. The HMDA dataset is generally considered to be the most comprehensive source of mortgage data in the United States, and covers about 80 per cent of all home loans nationwide and a higher share of loans originated in metropolitan statistical areas. Whether a lender is covered depends on its size, the extent of its activity in a metropolitan statistical area, and the weight of residential mortgage lending in its portfolio.

The underlying sample of mortgage loan applications includes almost 300 million annual observations covering the period 2000–2010. For each application there is information on the loan application (e.g. the type of loan, the size of the loan, whether the loan is approved or not), the borrower (e.g. income, race, ethnicity), and the lending institution (e.g. the identity of the lender).[6] Most importantly for the purposes of this paper, I can identify whether a loan is sold to another financial institution or not. I assume that loan sales and loan securitisations are equivalent, so that I can directly observe the extent of securitisation activity by each mortgage lender.[7] I can also identify the type of institution that purchased the loan, which allows me to split loan securitisations into private and public securitisations. In particular, I identify public securitisations as any loans that were sold to the GSEs. I classify the remaining loan sales as private securitisations.
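The split into public and private securitisations can be illustrated with a minimal sketch. It assumes the purchaser-type coding of the pre-2018 public HMDA files (0 for loans not sold, 1 for Fannie Mae, 3 for Freddie Mac), which should be checked against the data dictionary for the years used. Treating only Fannie Mae and Freddie Mac as the GSEs is an assumption of this sketch; the text does not say which purchaser codes are counted. The function name `classify_sale` is hypothetical.

```python
# Purchaser-type codes assumed to follow the pre-2018 public HMDA coding
# (0 = not sold, 1 = Fannie Mae, 3 = Freddie Mac). Verify against the
# data dictionary before relying on these values.
NOT_SOLD = 0
GSE_CODES = frozenset({1, 3})  # assumption: only Fannie Mae and Freddie Mac


def classify_sale(purchaser_type: int) -> str:
    """Return 'held', 'public' or 'private' for one originated loan."""
    if purchaser_type == NOT_SOLD:
        return "held"      # loan stayed on the originator's balance sheet
    if purchaser_type in GSE_CODES:
        return "public"    # sold to a GSE: public securitisation
    return "private"       # every other sale: private securitisation


labels = [classify_sale(code) for code in (0, 1, 3, 5, 6)]
```

Under this rule, any sale to a non-GSE purchaser counts as a private securitisation, which mirrors the residual definition used in the text.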

The raw loan application data are not panel data as the behaviour of specific borrowers cannot be tracked over time, although a given lender can be observed each year. To create a pseudo-panel I aggregate the annual loan application data so that the data vary by lender and Census tract. This means that I track the lending of a given loan originator to the average borrower in a given Census tract across time. A Census tract is a very narrowly defined geographic region. The tracts are designed, for the purpose of taking the Census, to be relatively homogeneous units in terms of population characteristics, economic status, and living conditions. In the United States, there are about 73,000 Census tracts, and each tract has between 2,500 and 8,000 residents. Several tracts commonly exist within a county, with the boundaries of a tract usually coinciding with the limits of cities and towns. The very narrow geographic focus of Census tracts supports my identification strategy, as different borrowers in the same tract are likely to share similar characteristics. This ensures that two different lenders that originate loans in the same tract are likely to face the same demand conditions and borrower risk profiles.
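The aggregation step can be sketched as follows. This is an illustration under stated assumptions rather than the code used in the paper: the record fields (`lender_id`, `tract`, `year`, `income`, `originated`) and the tract labels are hypothetical stand-ins for the corresponding HMDA fields.

```python
from collections import defaultdict


def build_pseudo_panel(applications):
    """Collapse application-level records into (lender, tract, year) cells."""
    cells = defaultdict(lambda: {"applications": 0, "loans": 0, "income_sum": 0.0})
    for app in applications:
        cell = cells[(app["lender_id"], app["tract"], app["year"])]
        cell["applications"] += 1
        cell["income_sum"] += app["income"]
        if app["originated"]:
            cell["loans"] += 1

    # Each cell describes the "average borrower" a lender faces in a tract-year.
    return {
        key: {
            "applications": cell["applications"],
            "loans": cell["loans"],
            "mean_income": cell["income_sum"] / cell["applications"],
        }
        for key, cell in cells.items()
    }


# Hypothetical records: two lenders active in the same tract in 2005.
applications = [
    {"lender_id": "A", "tract": "T1", "year": 2005, "income": 60.0, "originated": True},
    {"lender_id": "A", "tract": "T1", "year": 2005, "income": 80.0, "originated": False},
    {"lender_id": "B", "tract": "T1", "year": 2005, "income": 70.0, "originated": True},
]
panel = build_pseudo_panel(applications)
```

Because individual borrowers cannot be followed over time, the unit that persists across years is the lender-tract pair, which is why the cell key starts with the lender and tract identifiers.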

The HMDA dataset covers bank and non-bank lenders (i.e. mortgage companies). The non-bank lenders are an important segment of the US mortgage market. Over the sample period, they originated more than half of all new residential mortgage loans. Moreover, with no access to depository funding, non-bank lenders typically operate under the originate-to-distribute (OTD) business model and hence are much more reliant on loan securitisation for funding. The market share of these lenders typically varies with the credit cycle, so including these non-bank lenders in the sample reduces the probability of sample selection bias in identifying the effect of financing conditions on credit supply.

4.2 Measuring Mortgage Lending and Bank Liquidity

As will be discussed in the next section, in the main regression model, the dependent variable is a measure of the change in lending activity by each mortgage lender in each tract during the crisis. My preferred measure of lending activity is the number of new loans.[8] I proxy the liquidity shock through the average propensity of each mortgage lender to securitise loans in the pre-crisis period. More specifically, for each lender and each year, I calculate the ratio of the number of new loans that are sold to the total number of new loans and then, for each mortgage lender, average across all the years of the pre-crisis period. This averaging process is partly aimed at transforming the flow of loan sales into an approximate measure of each lender's stock of loan sales in the pre-crisis period, as the stock determines each lender's exposure to the liquidity shock.[9] I define the pre-crisis period to be 2000 to 2006. However, the results are not sensitive to the length of the pre-crisis period. For example, similar results are obtained when the pre-crisis period is defined as 2004 to 2006.
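The construction of the securitisation propensity can be sketched in a few lines. The input layout (one tuple per originated loan) and the function name `sale_propensity` are assumptions made for illustration. Years in which a lender originates no loans are simply skipped here, which is also an assumption, since the text does not describe how such years are treated.

```python
from collections import defaultdict

PRE_CRISIS_YEARS = range(2000, 2007)  # 2000 to 2006 inclusive, as in the text


def sale_propensity(loans):
    """loans: iterable of (lender_id, year, sold) tuples for originated loans.

    Returns, for each lender, the mean over pre-crisis years of the annual
    ratio of loans sold to loans originated.
    """
    counts = defaultdict(lambda: [0, 0])  # (lender, year) -> [sold, total]
    for lender, year, sold in loans:
        if year not in PRE_CRISIS_YEARS:
            continue
        counts[(lender, year)][1] += 1
        if sold:
            counts[(lender, year)][0] += 1

    annual_ratios = defaultdict(list)
    for (lender, _year), (n_sold, n_total) in counts.items():
        annual_ratios[lender].append(n_sold / n_total)

    return {lender: sum(r) / len(r) for lender, r in annual_ratios.items()}


# Hypothetical data: lender A sells 1 of 2 loans in 2005 and 1 of 1 in 2006.
loans = [
    ("A", 2005, True), ("A", 2005, False),
    ("A", 2006, True),
    ("A", 2007, False),   # crisis year, ignored
    ("B", 2004, False),
]
propensity = sale_propensity(loans)
```

In this example lender A's propensity is the average of 0.5 and 1.0, not the pooled ratio of 2 sold loans to 3 originated, so each pre-crisis year carries equal weight regardless of lending volume.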

My set of control variables includes lender-tract controls, such as the average growth in income of the borrowing household and the share of minority household applicants faced by each lender in each tract, as well as lender-level controls, such as the average (log) number of loan applications, which acts as a proxy for lender size.[10] I exclude other lender-level variables, such as measures of profitability, as these data are unavailable for non-bank lenders. The non-bank lenders are an important segment of the US mortgage market and could be critical to the relationship between loan securitisation and credit supply given they typically operate under the OTD business model.

The second part of the analysis relies on a measure of borrower risk to test the flight to quality hypothesis. I measure the borrower risk faced by each lender through the share of high-priced loans originated by each lender in each tract. Information on the interest rate spread was added to the HMDA dataset in 2004. However, HMDA respondents are only required to report the interest rate spread on a subset of loans. Mortgages with a reported spread are ‘higher-priced’ loans.[11] As the interest rate spread on a loan largely reflects the credit risk of a borrower, the share of high-priced mortgages is often viewed as an indicator of risky or subprime lending (Mayer and Pence 2008). This measure of risky lending is only available since 2004, so this necessarily restricts the time series available to establish the pre-crisis period in the second part of the analysis.[12]
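A minimal sketch of the borrower-risk measure follows. It relies on the reporting rule described above: a loan counts as high-priced exactly when a rate spread is reported. The tuple layout and the use of `None` for an unreported spread are assumptions of the sketch.

```python
from collections import defaultdict


def high_priced_share(loans):
    """loans: iterable of (lender_id, tract, rate_spread) for originated loans,
    with rate_spread set to None when no spread was reported.

    Returns the share of high-priced loans for each (lender, tract) pair.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for lender, tract, rate_spread in loans:
        key = (lender, tract)
        totals[key] += 1
        if rate_spread is not None:  # a reported spread marks a high-priced loan
            flagged[key] += 1
    return {key: flagged[key] / totals[key] for key in totals}


# Hypothetical data: lender A reports a spread on 2 of its 4 loans in tract T1.
loans = [
    ("A", "T1", 3.4), ("A", "T1", None), ("A", "T1", None), ("A", "T1", 5.1),
    ("B", "T1", None),
]
risk = high_priced_share(loans)
```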

To test the flight to home hypothesis I construct a measure of each bank's average lending distance based on the detailed information in the HMDA dataset. The dataset includes the Census tract of residence of each loan applicant, as well as the full address of the headquarters of each mortgage lender. This allows me to estimate the geographic distance between each borrower and lender using geocoding software provided by Stata and Google Maps. I then calculate, for each lender in each year, the average distance across all its borrowers within a given Census tract, which provides a measure of ‘lending distance’ at the lender-tract level.
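Once both locations have been geocoded, the distance calculation can be approximated as below. This sketch swaps in the great-circle (haversine) formula for whatever distance the geocoding software returns, so it is not a reproduction of the paper's procedure. Representing each borrower by a latitude and longitude pair (for example, a tract centroid) is an assumption, as are the function names.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def lending_distance_km(borrower_coords, hq_coord):
    """Mean distance from a lender's headquarters to its borrowers."""
    distances = [haversine_km(lat, lon, hq_coord[0], hq_coord[1])
                 for lat, lon in borrower_coords]
    return sum(distances) / len(distances)


# Hypothetical coordinates: two borrowers one degree of longitude either
# side of a headquarters located on the equator.
avg_distance = lending_distance_km([(0.0, 1.0), (0.0, -1.0)], (0.0, 0.0))
```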

The set-up of the regression model implies that the sample is restricted to tracts in which there is at least one OTD lender and one non-OTD lender originating new loans. In other words, I exclude tracts in which there is only one type of lender. The final sample comprises about 5,000 mortgage lenders that lend to more than 60,000 tracts in the United States. Table 1 provides summary statistics for the key variables used in the panel regression.
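The sample restriction amounts to a filter on tracts, sketched below. This section does not state how a lender is classified as OTD, so the cut-off `OTD_THRESHOLD` on the pre-crisis sale share is purely hypothetical, as are the function and variable names.

```python
from collections import defaultdict

OTD_THRESHOLD = 0.5  # hypothetical cut-off, not taken from the paper


def tracts_with_both_types(lender_tracts, sale_share):
    """lender_tracts: iterable of (lender_id, tract) pairs with new loans.
    sale_share: dict mapping lender_id to its pre-crisis sale share.

    Returns the tracts served by at least one OTD and one non-OTD lender.
    """
    types_by_tract = defaultdict(set)
    for lender, tract in lender_tracts:
        is_otd = sale_share[lender] >= OTD_THRESHOLD
        types_by_tract[tract].add(is_otd)
    return {tract for tract, types in types_by_tract.items()
            if types == {True, False}}


# Hypothetical data: T1 has both lender types, T2 only an OTD lender.
sale_share = {"A": 0.9, "B": 0.1}
lender_tracts = [("A", "T1"), ("B", "T1"), ("A", "T2")]
kept = tracts_with_both_types(lender_tracts, sale_share)
```

Keeping only tracts with both lender types means every retained tract offers a within-tract comparison between lenders that differ in their reliance on securitisation.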

Table 1: Variable Summary Statistics

| Variable | Mean | Median | 25th pct | 75th pct | Std dev |
|---|---|---|---|---|---|
| Pre-crisis, 2000–2006 | | | | | |
| Sale share (%) | 36.1 | 14.3 | 0.0 | 80.2 | 40.5 |
| Private sale share (%) | 29.6 | 1.8 | 0.0 | 63.5 | 39.1 |
| Public sale share (%) | 6.5 | 0.0 | 0.0 | 0.4 | 17.6 |
| Minority ratio (%) | 12.0 | 0.0 | 0.0 | 13.3 | 22.7 |
| No of applications (log level) | 4.7 | 4.7 | 3.4 | 6.0 | 2.2 |
| Post-crisis, 2007–2008 | | | | | |
| No of loans (% change) | −18.7 | −2.4 | −67.2 | 14.4 | 66.7 |
| Household income (% change) | 7.1 | 6.8 | −17.6 | 31.1 | 48.6 |

Source: Home Mortgage Disclosure Act Loan Applications Registry

The summary statistics show that, on average, 36.1 per cent of all approved loans were sold in the pre-crisis period (2000–2006) and that 12.0 per cent of household loan applicants were from a minority group.[13] Moreover, over the crisis period (2007–2008), the number of new loans fell by 18.7 per cent and household income rose by 7.1 per cent, on average.
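Footnote 13 attributes the gap between the lender-level mean sale share and the aggregate share to the size distribution of lenders. The worked example below illustrates that mechanism with invented numbers: nine small lenders that sell nothing and one large lender that sells most of its loans. None of the figures come from the HMDA data.

```python
# (loans originated, loans sold) for each hypothetical lender
lenders = [(10, 0)] * 9 + [(910, 600)]

# Lender-level mean: every lender gets equal weight, whatever its size.
lender_mean = sum(sold / total for total, sold in lenders) / len(lenders)

# Aggregate share: total loans sold over total loans originated.
aggregate = sum(sold for _, sold in lenders) / sum(total for total, _ in lenders)

print(f"lender-level mean sale share: {lender_mean:.3f}")
print(f"aggregate sale share:         {aggregate:.3f}")
```

The aggregate share is 0.6 while the lender-level mean is below 0.07, because the many small lenders that sell few loans pull the unweighted mean down without contributing much to total lending.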

Footnotes

[6] I restrict the sample to conventional owner-occupier one-to-four family residential mortgages, which is consistent with numerous other studies.

[7] Loan sales and securitisations are separate but closely related concepts. A loan sale involves the lender selling the loan in its entirety to another institution. If that institution wants to sell the loan again, it has to find a buyer and negotiate a price. A loan securitisation involves the lender selling a loan (or portfolio of loans) to investors, with the loan (or portfolio) converted into rated securities that are publicly traded. In general, loan sales are a broader measure of how lending is financed than securitisations.

[8] I have also estimated the regressions using alternative measures of lending activity, such as the total value of new loans and the share of applications that are approved. The results are qualitatively very similar.

[9] The averaging process also smooths the data and minimises any ‘lumpiness’ in RMBS transactions by financial institutions.

[10] I define minority households as all African-American or Hispanic households.

[11] Higher-priced loans are those with an annual percentage rate (APR) at least 3 percentage points above the yield on a comparable-maturity Treasury security for first-lien mortgages, or at least 5 percentage points above that benchmark for junior liens. A lien is the legal claim of the lender upon the property for the purpose of securing debt repayments. The lien given the highest priority for repayment is the first lien; any other liens are junior liens. Because junior liens are less likely to be repaid, they are a higher risk to the lender than the first lien. In the US mortgage market, junior liens can include home equity loans and home equity lines of credit.

[12] There are at least two problems with using the share of high-priced loans as an indicator of risky or subprime lending. I discuss these issues in more detail in Appendix D.

[13] The estimated share of loan sales in the HMDA data (36 per cent) is significantly lower than the share of loan securitisations suggested by the US flow of funds (63 per cent) over the corresponding period. This is mainly because the flow of funds estimate is based on aggregate mortgage data while the HMDA estimate is based on bank-level mortgage data. The different estimates reflect the distribution of loan sales across lenders of different sizes. There is a large number of small banks in the United States that sell few loans, which implies that the bank-level mean estimate is lower than the aggregate estimate. Aggregating the HMDA data to the national level, the share of loans sold is about 60 per cent, which is similar to the flow of funds estimate.