# Factor Analysis

Financial analysts are concerned with common sources of risk that contribute to changes in security prices, called **factors**. By identifying these factors, analysts may be able to control a portfolio’s risk more efficiently, and perhaps even improve its return.

This post discusses two common approaches used to identify factors. The first, called **factor analysis**, allows analysts to isolate factors by observing common variations in the returns of different securities. These factors are merely statistical constructs that represent some underlying source of risk (which may or may not be observable). The second approach, called **cross-sectional regression analysis**, requires that we define a set of attributes that measure exposure to an underlying factor and determine whether or not differences across security returns correspond to differences in these security attributes.

## Factor Analysis

Let us begin with an analogy that highlights the insight behind factor analysis. Suppose we wish to determine whether or not there are common sources of intelligence in students, based on the grades of 100 students in the following nine courses: algebra, biology, calculus, chemistry, composition, French, geometry, literature, and physics.

* First, we compute the correlation between the algebra grades of all 100 students and their grades in each of the other eight courses
* Next, we compute the correlations between the biology grades of all 100 students and their grades in each of the other seven courses
* We continue until we have computed the correlations between the grades of every pair of courses, 36 in all (shown below).

| Course      | Bio. | Calc. | Chem. | Comp. | Fre. | Geo. | Lit. | Phy. |
| ----------- | :--: | :---: | :---: | :---: | :--: | :--: | :--: | :--: |
| Algebra     | 0.41 | 0.93  | 0.52  | 0.31  | 0.35 | 0.88 | 0.29 | 0.59 |
| Biology     |      | 0.39  | 0.94  | 0.49  | 0.44 | 0.50 | 0.31 | 0.90 |
| Calculus    |      |       | 0.42  | 0.29  | 0.33 | 0.95 | 0.38 | 0.60 |
| Chemistry   |      |       |       | 0.37  | 0.41 | 0.47 | 0.40 | 0.91 |
| Composition |      |       |       |       | 0.87 | 0.28 | 0.94 | 0.35 |
| French      |      |       |       |       |      | 0.32 | 0.89 | 0.46 |
| Geometry    |      |       |       |       |      |      | 0.38 | 0.55 |
| Literature  |      |       |       |       |      |      |      | 0.43 |

That all of these correlations are positive suggests the presence of a pervasive factor (probably related to study habits). In addition to this factor, there appear to be three other factors or commonalities in performance.

First, the variation in algebra grades is highly correlated with the variation in calculus and geometry grades. Moreover, performance in calculus is highly correlated with performance in geometry. The three grades, however, are not nearly as highly correlated with the grades in any of the other six courses. Therefore, we conclude that there is a common aptitude that underlies performance in these three courses.

Second, performance in biology is highly correlated with performance in chemistry and physics, and performance in chemistry is highly correlated with performance in physics. Again, performance in these courses does not correspond as closely with performance in any other course. We may therefore again conclude that there is a common source of aptitude associated with biology, chemistry, and physics.

Finally, the grades in composition, French, and literature are all highly correlated with each other, but not with the grades of any other courses. This leads us to deduce the presence of a third factor.
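The grouping described above can be reproduced mechanically from the correlation table. The following is a minimal sketch, assuming a cutoff of 0.8 for "highly correlated" (the cutoff and the helper name `correlated_groups` are our own choices, not part of the original analysis). It treats courses as linked whenever their correlation exceeds the cutoff and collects the linked courses into groups.

```python
import numpy as np

courses = ["Algebra", "Biology", "Calculus", "Chemistry", "Composition",
           "French", "Geometry", "Literature", "Physics"]

# Upper triangle of the correlation table above, row by row.
upper = {
    "Algebra":     [0.41, 0.93, 0.52, 0.31, 0.35, 0.88, 0.29, 0.59],
    "Biology":     [0.39, 0.94, 0.49, 0.44, 0.50, 0.31, 0.90],
    "Calculus":    [0.42, 0.29, 0.33, 0.95, 0.38, 0.60],
    "Chemistry":   [0.37, 0.41, 0.47, 0.40, 0.91],
    "Composition": [0.87, 0.28, 0.94, 0.35],
    "French":      [0.32, 0.89, 0.46],
    "Geometry":    [0.38, 0.55],
    "Literature":  [0.43],
}

# Fill a symmetric 9 x 9 matrix with ones on the diagonal.
n = len(courses)
corr = np.eye(n)
for i, name in enumerate(courses[:-1]):
    for offset, value in enumerate(upper[name]):
        j = i + 1 + offset
        corr[i, j] = corr[j, i] = value

THRESHOLD = 0.8  # assumed cutoff for "highly correlated"


def correlated_groups(corr, names, threshold):
    """Collect names into groups linked by correlations above the threshold."""
    unvisited = set(range(len(names)))
    groups = []
    while unvisited:
        stack, members = [min(unvisited)], set()
        while stack:
            k = stack.pop()
            if k in members:
                continue
            members.add(k)
            # Follow every link from course k to a course not yet in the group.
            stack.extend(m for m in unvisited
                         if m not in members and corr[k, m] > threshold)
        unvisited -= members
        groups.append(sorted(names[m] for m in members))
    return groups


groups = correlated_groups(corr, courses, THRESHOLD)
for group in groups:
    print(group)
```

With this cutoff the sketch recovers the same three clusters identified by inspection. A lower cutoff would begin to merge them, because every pair of courses is positively correlated.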

Our next task is to identify these factors, which is where our intuition comes into play. We may reasonably conclude that one of the common sources of scholastic aptitude is skill in mathematics or quantitative methods, because we observe high correlations between the three math courses. Aptitude in science appears to be another common factor, given the high correlations in the three science courses. Finally, verbal aptitude seems to be another common factor, due to the high correlations in French, composition, and literature.

We do not actually observe the underlying factors; we merely observe that a student who performs well in algebra is more likely to perform well in geometry or calculus than in French. From this observation, we infer that there is a particular aptitude that helps to explain performance in algebra, calculus, and geometry—but not in French. **The aptitude is the factor.**

We should note that these results do not imply that performance in a certain course is explained by a single factor. If such were the case, we would only observe correlations of 1 and 0. This point is underscored by the fact that the variation in physics grades (science) is more highly correlated with performance in math courses than it is with performance in French, literature, or composition. This result is intuitively pleasing in that physics depends more on mathematics than it does on French, literature, or composition. We may therefore conclude that performance in physics is primarily explained by aptitude in science, but that it also depends somewhat on aptitude in math.

### Factors in Stock Returns

Ok, so how do we apply this thought process to the stock market? Let’s assume instead that we wish to determine the factors that underlie performance in the stock market. We begin by calculating the daily returns of a representative sample of stocks during some period. *(In this study, the stocks are analogous to courses, the days in the period are analogous to students, and the returns are analogous to grades!).*

To isolate the factors that underlie stock market performance, we begin by computing the correlations between the daily returns of each stock and the returns on every other stock. Then, we seek out groups consisting of stocks that are highly correlated with each other, but not with stocks outside the group.

For example, we might observe that stock 1’s returns are highly correlated with the returns of stocks 12, 21, 39, 47, 55, 70, and 92, and that the returns of the other stocks in this group are all highly correlated with each other. From this observation, we may conclude that the returns of these stocks are explained, at least in part, by a common factor. We proceed to isolate groups of stocks whose returns are highly correlated with each other, until we isolate all the groups that seem to respond to a common source of risk.
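To make the analogy concrete, here is a small sketch of the first step, assuming simulated data rather than real prices. The six stocks, the two hidden sources of risk, and all volatilities are hypothetical. Rows of `returns` are days (the "students") and columns are stocks (the "courses").

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_stocks = 500, 6

# Hypothetical data: stocks 0-2 share one source of risk, stocks 3-5 another.
factor_a = rng.normal(0.0, 0.01, n_days)
factor_b = rng.normal(0.0, 0.01, n_days)
noise = rng.normal(0.0, 0.003, (n_days, n_stocks))

returns = noise.copy()
returns[:, :3] += factor_a[:, None]
returns[:, 3:] += factor_b[:, None]

# rowvar=False tells NumPy that each column (stock) is a variable.
corr = np.corrcoef(returns, rowvar=False)
print(np.round(corr, 2))
```

The printed matrix shows two blocks of high correlations, one for each hidden source of risk, with correlations near zero between the blocks. With real data the blocks are rarely this clean, which is why the grouping step relies on judgment.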

Our next task is to identify the underlying source of risk for each group. Suppose that a particular group consists of utility companies, financial companies, and a few other companies that come from miscellaneous industries but that all have especially high debt-to-equity ratios. We might reasonably conclude that interest rate risk is a common source of variation in the returns of this group of stocks. Another group might be dominated by stocks whose earnings depend on the level of energy prices; we may thus deduce that the price of energy is another source of risk. Yet another group might include companies across many different industries that all derive a large fraction of their earnings from foreign operations; we might conclude that exchange risk is another factor.

We must first rely on our intuition to identify the factor that underlies the common variation in returns among the member stocks. Then, we can test our intuition as follows:

1. We define a variable that serves as a proxy for the *unanticipated* change in the factor value
2. We regress the returns of stocks that seem to depend on our hypothesized factor on the unanticipated component of the factor value

It is important that we isolate the unanticipated component of the factor value, because stock prices should not respond to an anticipated change in a factor. It is new information that causes investors to reappraise the prospects of a company.

Suppose we identify inflation as a factor. If the Consumer Price Index is expected to rise 0.5% in a given month, and it rises precisely by that amount, the prices of inflation-sensitive stocks should not change in response. If, however, the CPI rises 1.5%, then the prices of these stocks should change in response. In order to test whether or not a particular time series represents a factor, we must model the unanticipated component of its changes.

A reasonable approach for modeling the unanticipated component of inflation is to regress inflation on its prior values, under the assumption that the market’s outlook is conditioned by past experience. The errors, or residuals, from this regression represent the unanticipated component of inflation. We then regress the returns of the stocks we believe to be dependent on an inflation factor on these residuals to determine whether inflation is indeed a factor.
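The two-step test can be sketched as follows. This is an illustration on simulated data, assuming that a single lag of inflation is enough to capture expectations. The inflation process, the stock's sensitivity of -2.0, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_months = 240

# Hypothetical monthly inflation (in percent) that depends on its prior value.
shocks = rng.normal(0.0, 0.2, n_months)
inflation = np.empty(n_months)
inflation[0] = 0.3
for t in range(1, n_months):
    inflation[t] = 0.1 + 0.6 * inflation[t - 1] + shocks[t]

# Step 1: regress inflation on its prior value.
# The residuals are the unanticipated component.
X = np.column_stack([np.ones(n_months - 1), inflation[:-1]])
y = inflation[1:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
surprise = y - X @ coef

# Hypothetical returns of an inflation-sensitive stock (true sensitivity -2.0).
stock_returns = 0.5 - 2.0 * surprise + rng.normal(0.0, 0.5, n_months - 1)

# Step 2: regress the stock's returns on the unanticipated component.
Z = np.column_stack([np.ones(n_months - 1), surprise])
(alpha, beta), *_ = np.linalg.lstsq(Z, stock_returns, rcond=None)
print(f"estimated sensitivity to unanticipated inflation: {beta:.2f}")
```

Because the first regression includes a constant, the residuals average to zero and are uncorrelated with lagged inflation, which is what we want from a measure of surprise.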

This approach is heuristic, designed to expose factors by identifying groups of stocks with common price variations. Its intuitive appeal is offset by the fact that it produces factors that explain only part of the variation in returns. Moreover, these factors are not necessarily independent of each other.

With a more advanced mathematical technique (*called maximum likelihood factor analysis*), we can identify several linear combinations of securities, comprising both long and short positions, that explain virtually all the covariation in the returns of a sample of securities. These linear combinations are called **eigenvectors**. The amount of variation that an eigenvector explains is its **eigenvalue**, and the sensitivity of a particular security to an eigenvector is called its **factor loading**.

Instead of groups of highly correlated stocks, this approach yields precise linear combinations of stocks that represent independent sources of common variation in returns. In effect, the eigenvectors are the factors. Not only are the factors derived in this fashion independent of each other, but we can derive as many factors as necessary to explain as much of the covariation in a portfolio as we would like.
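As a rough illustration of independent linear combinations, the sketch below uses an eigendecomposition of the sample covariance matrix (principal components) as a stand-in. It is not maximum likelihood factor analysis, and the simulated returns, exposures, and variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n_days, n_stocks = 1000, 5

# Hypothetical returns driven by two common sources of risk plus noise.
common = rng.normal(0.0, 0.01, (n_days, 2))
exposures = rng.normal(1.0, 0.5, (2, n_stocks))
returns = common @ exposures + rng.normal(0.0, 0.002, (n_days, n_stocks))

cov = np.cov(returns, rowvar=False)

# eigh handles symmetric matrices and returns eigenvalues in ascending order,
# so we reorder from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Each column of `eigenvectors` is a set of long and short weights.
factor_returns = (returns - returns.mean(axis=0)) @ eigenvectors
explained = eigenvalues / eigenvalues.sum()

print(np.round(explained, 3))
print(np.round(np.corrcoef(factor_returns, rowvar=False), 3))
```

The second printout is approximately the identity matrix: the returns of the derived combinations are uncorrelated with each other in the sample. Keeping more columns explains more of the covariation, at the cost of more factors to interpret.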

In order to label these factors, we proceed as described earlier. We determine whether or not the returns of these linear combinations of stocks correlate with the unanticipated changes in the variables that proxy for the factors. Within this context, we represent a security’s return as follows:

$$
R_i=\alpha_i+\beta_{i1}F_1+\beta_{i2}F_2+\cdots+\beta_{in}F_n+\xi_i
$$

where $$R_i$$ is the return of security $$i$$, $$\alpha_i$$ is a constant, $$\beta_{in}$$ is the sensitivity of security $$i$$ to the $$n^{th}$$ factor, $$F_n$$, and $$\xi_i$$ is the unexplained component of security $$i$$'s return.
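If the factor values were observed, the constant and the sensitivities in this equation could be estimated by regressing a security's returns on the factors over time. The sketch below assumes three simulated factors and made-up true parameters purely to show the mechanics.

```python
import numpy as np

rng = np.random.default_rng(3)
n_periods, n_factors = 600, 3

# Hypothetical factor values F_1..F_3 and one security's true parameters.
F = rng.normal(0.0, 1.0, (n_periods, n_factors))
true_alpha = 0.2
true_beta = np.array([1.0, -0.5, 0.3])
xi = rng.normal(0.0, 0.5, n_periods)          # unexplained component
R_i = true_alpha + F @ true_beta + xi

# Regress R_i on a constant and the factors to estimate alpha_i and beta_i.
X = np.column_stack([np.ones(n_periods), F])
estimates, *_ = np.linalg.lstsq(X, R_i, rcond=None)
alpha_hat, beta_hat = estimates[0], estimates[1:]

# Share of the return variation explained by the factors.
residual = R_i - X @ estimates
r_squared = 1.0 - residual.var() / R_i.var()

print(np.round(alpha_hat, 2), np.round(beta_hat, 2), round(r_squared, 2))
```

The residual from this regression plays the role of the unexplained component, and the fraction of variance it leaves behind indicates how much of the security's return the factors account for.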

### Issues of Interpretation

Factors derived through factor analysis, whether we employ the heuristic approach or the more formal approach, are not always amenable to interpretation. It may be that a particular factor cannot be proxied by a measurable economic or financial variable. Instead, the factor may reflect a combination of several influences, some perhaps offsetting, that came together in a particular way unique to the selected measurement period and the chosen sample of securities. In fact, factors may not be definable.

We thus face the following trade-off with factor analysis. Although we can account for nearly all of a sample’s common variation in return with independent factors, we may not be able to assign meaning to these factors, or even know if they represent the same sources of risk from period to period or sample to sample. Next, we’ll consider an alternative procedure called cross-sectional regression analysis.

## Cross-Sectional Regression Analysis

As we described earlier, factor analysis reveals covariation in returns, and challenges us to identify the sources of covariation. Cross-sectional regression analysis, on the other hand, requires us to specify the sources of return and *challenges us to affirm that these sources correspond to differences in return.*

We proceed as follows. Based on our intuition and prior research, we hypothesize attributes that we believe correspond to differences in stock returns. For example, we might believe that highly leveraged companies perform differently from companies with low debt, or that performance varies according to industry affiliation. In either case, we are defining an *attribute*—not a factor. The factor that causes low-debt companies to perform differently from high-debt companies most likely has something to do with interest rates. Industry affiliation, of course, measures sensitivity to factors that affect industry performance (such as military spending or competition).

Once we specify a set of attributes that we feel measure sensitivity to the common sources of risk, we perform the following regression. We regress the returns across a large sample of stocks during a given period—say a month—on the attribute values of the stocks as of the beginning of that month. Then, we repeat this regression over many different periods. If the coefficients of the attribute values are not zero and are significant in a sufficiently high number of the regressions, we conclude that differences in return across the stocks relate to the differences in their attribute values.
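The procedure can be sketched as a loop of monthly regressions. The example below is an illustration on simulated data: the two attributes, the sample sizes, and the way attribute 0 is rewarded are assumptions made for the example, and `cross_sectional_gammas` is a hypothetical helper name.

```python
import numpy as np


def cross_sectional_gammas(returns, attributes):
    """Regress one period's returns across stocks on their attribute values."""
    X = np.column_stack([np.ones(len(returns)), attributes])
    coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
    return coef[1:]  # drop the constant, keep one coefficient per attribute


rng = np.random.default_rng(4)
n_stocks, n_months, n_attrs = 300, 60, 2

true_gammas = np.zeros((n_months, n_attrs))
gammas = np.empty((n_months, n_attrs))
for month in range(n_months):
    # Attribute values as of the beginning of the month.
    theta = rng.normal(0.0, 1.0, (n_stocks, n_attrs))
    # Hypothetical: attribute 0 is rewarded differently each month,
    # while attribute 1 has no relation to returns.
    true_gammas[month, 0] = rng.normal(0.0, 2.0)
    returns = 1.0 + theta @ true_gammas[month] + rng.normal(0.0, 5.0, n_stocks)
    gammas[month] = cross_sectional_gammas(returns, theta)

print(np.round(np.abs(gammas).mean(axis=0), 2))
```

Each row of `gammas` holds one month's coefficients. The coefficients on attribute 0 are typically much larger in magnitude than those on attribute 1, which only picks up noise.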

According to this approach, a security’s return in a particular period equals:

$$
R_i=\alpha+\theta_{i1}\gamma_1+\theta_{i2}\gamma_2+\cdots+\theta_{in}\gamma_n+\xi_i
$$

where $$R_i$$ is the return of security $$i$$, $$\alpha$$ is a constant, $$\gamma_n$$ is the marginal return to attribute $$n$$, $$\theta_{in}$$ is the value of attribute $$n$$ for security $$i$$, and $$\xi_i$$ is the unexplained component of security $$i$$'s return.

It is not necessary for the coefficient $$\gamma$$ in the above formula to be significantly positive or negative on average over all the regressions. The attribute $$\theta$$ is a measure of sensitivity to some underlying factor. Suppose the attribute is affiliation with industries that benefit from military spending. If there is an unexpected increase in military spending in a particular period, the coefficient of this attribute will be positive. If military spending declines unexpectedly, the coefficient will be negative. The average value for the coefficient over many regressions may be zero, but the attribute would still be important if the coefficient were not zero in a large number of the regressions.

We can measure the extent to which a coefficient is significant in a particular regression by its t-statistic. The t-statistic equals the value of the coefficient divided by its standard error. If the true coefficient were zero, a t-statistic with an absolute value of 1.96 or greater would occur by chance only about 5% of the time. In order to be confident that a particular attribute helps to explain differences across security returns, we should therefore observe a t-statistic with an absolute value of 1.96 or greater in more than 5% of the regressions. Otherwise, it is possible that the attribute occasionally appears significant merely by chance.
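The sketch below illustrates this test on simulated data. It computes ordinary least squares t-statistics by hand and counts how often each attribute clears the 1.96 threshold in absolute value. The setup is hypothetical: attribute 0 earns +1.5 or -1.5 at random each month, so its average coefficient is near zero, and attribute 1 is unrelated to returns. `ols_t_stats` is a helper written for this example.

```python
import numpy as np


def ols_t_stats(X, y):
    """Return OLS coefficients and t-statistics (coefficient / standard error)."""
    n_obs, n_params = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n_obs - n_params)       # residual variance
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, coef / std_err


rng = np.random.default_rng(5)
n_stocks, n_months = 300, 200
gammas = np.empty((n_months, 2))
t_stats = np.empty((n_months, 2))

for month in range(n_months):
    theta = rng.normal(0.0, 1.0, (n_stocks, 2))
    gamma_0 = rng.choice([-1.5, 1.5])                 # sign flips at random
    returns = gamma_0 * theta[:, 0] + rng.normal(0.0, 5.0, n_stocks)
    X = np.column_stack([np.ones(n_stocks), theta])
    coef, t = ols_t_stats(X, returns)
    gammas[month], t_stats[month] = coef[1:], t[1:]

mean_gamma = gammas.mean(axis=0)
share_significant = (np.abs(t_stats) >= 1.96).mean(axis=0)

print("average coefficient:", np.round(mean_gamma, 2))
print("share of months with |t| >= 1.96:", np.round(share_significant, 2))
```

Attribute 0 is significant in nearly every month even though its average coefficient is close to zero, while attribute 1 is significant in roughly 5% of months, about what chance alone would produce.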

### Which is better?

There are pros and cons to both factor methods. Through factor analysis, we can isolate independent sources of common variation in returns that explain nearly all of a portfolio’s risk. It is not always possible, however, to attach meaning to these sources of risk. They may represent accidental and temporary confluences of myriad factors. Because we cannot precisely define these factors, it is difficult to know whether they are stable or simply an artifact of the chosen measurement period or sample.

As an alternative to factor analysis, we can define a set of security attributes we know are observable and readily measurable and, through cross-sectional regression analysis, test them to determine if they help explain differences in returns across securities. With this approach we know the identity of the attributes, but we are limited in the amount of return variation we are able to explain. Moreover, because the attributes are typically codependent, it is difficult to understand the true relationship between each attribute and the return. Which approach is more appropriate depends on the importance we attach to the identity of the factors versus the amount of return variation we hope to explain with independent factors.

## Why Bother With Factors?

At this point, you may be questioning why we bother to search for factors or attributes in the first place. Why not address risk by considering the entire covariance matrix instead?

There are two reasons why we might prefer to address risk through a limited number of factors. First, a security’s sensitivity to a common source of risk may be more stable than its sensitivity to the returns of all the other securities in the portfolio. If this is true, then we can control a portfolio’s risk more reliably by managing its exposure to these common sources.

The second reason has to do with parsimony. If we can limit the number of sources of risk, we might find that it is easier to control risk and to improve return simply because we are faced with fewer parameters to estimate.
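A quick count shows how large the saving can be. The sketch below compares the number of distinct entries in a full covariance matrix with the number of parameters in a factor model, assuming (as is standard, though not stated above) that the unexplained components are uncorrelated across securities. The function names are our own.

```python
def covariance_parameters(n_securities: int) -> int:
    """Distinct variances and covariances in a full covariance matrix."""
    return n_securities * (n_securities + 1) // 2


def factor_model_parameters(n_securities: int, n_factors: int) -> int:
    """Sensitivities, factor covariances, and security-specific variances."""
    sensitivities = n_securities * n_factors
    factor_covariances = n_factors * (n_factors + 1) // 2
    specific_variances = n_securities  # assumes uncorrelated residuals
    return sensitivities + factor_covariances + specific_variances


for n in (50, 500):
    print(n, covariance_parameters(n), factor_model_parameters(n, 5))
# 50 1275 315
# 500 125250 3015
```

The full matrix grows with the square of the number of securities, whereas the factor model grows only linearly for a fixed number of factors.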