Bayes' Bonds

It is so common as to be cliché to write an explainer of Bayes' theorem. Even as hordes of hypothetical doctors with dodgy diagnostics startle innumerate students with misdiagnoses on the pages of a textbook, Bayes never seems to stray further into our everyday lives. So here, rather than mysteriously trapping goats behind doors, I am going to focus on portfolios of loans. This simplified example illustrates how multiple assets interact, and so builds some intuition about risk in portfolio construction. Managing this risk is at the heart of how banks, funds, insurers, ETFs, and securitisations all function.

Ironically, we won't quite make it to Bayes. We will develop some intuition for his younger brother, conditional probability, as well as castmates from the Statistics Extended Universe: Ordinary Least Squares regression and correlation. By the end we'll understand how this maths went wrong and was partly responsible for the 2008 crash.

The Problem

Let's start with the problem: I have a portfolio which contains two loans. For simplicity we'll say that both loans have a value of \(1\), a \(10\%\) probability of default, and mature in a single time period. For not simplicity we'll say both of these loans are \(50\%\) correlated and return nothing on default. Draw the histogram of returns.

It was at this point that the analyst I first posed this to had a mini-meltdown (before, unbeknownst to her, a fun-filled day of mathematical adventure).

Ok, before going further, see if you can get it based on your innate genius.

[Interactive: drag the three bars to sketch the return distribution (total probability must sum to 100%)]

First we can add some structure to the problem and work out which bits we know, and which we don't. The probability tree below immediately shows us roughly what we're looking for: we need three bars, one for each of the three outcomes:

  1. I get all my money returned to me \([2]\)
  2. One of the loans defaults and half (\(1\)) is returned to me
  3. Nothing comes back - sad

Then it's also clear that we know the first line of probabilities: each loan defaults 10% of the time, so 90% of the time the first loan survives. What we don't know yet is how the next loan behaves as a result of the first surviving or dying. In other words, all we need now is the probability that the second loan lives given that the first lives (or vice versa).

Conditional Probability

The probability that one thing happens given another. The probability that Mike shagged Meghan given that Daisy told me. The probability that Trump isn't in the Epstein files given that Karoline Leavitt said so. The probability I'll be fired given my colleague just was. Conditional probability, updating your beliefs based on new information, is something we unknowingly and at times unwillingly encounter every day. Formally we write it as \(P(D_2 \mid D_1)\), the probability loan 2 defaults given loan 1 has defaulted.

Mathematically, computing a conditional probability amounts to shrinking our sample space to a particular event. Let's think about a simpler example to develop an intuition: rain passing through two coloured shelves.

When we look at A we ignore all the rain either side. The probability some rain filters through B as well, that rain filters through B given A, only depends on the amount of overlap between A and B. Through this lens the formula for conditional probability should make sense. For B given A, \(P(B \mid A)\):

  1. We look at the amount of overlap between A and B: \(P(A \cap B)\)
  2. We exclusively focus on A i.e. divide by the probability of A: \(P(A)\)

Armed with this intuition we can derive the formula for conditional probability:

\[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} \]
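The formula is easiest to trust when you can count the outcomes. A minimal sketch with a toy example of my own (a fair die, not the rain picture above): conditioning on "the roll is even" shrinks the sample space to three outcomes, and we just measure the overlap.

```python
from fractions import Fraction

# Sample space: a fair six-sided die.
outcomes = set(range(1, 7))
A = {o for o in outcomes if o % 2 == 0}   # event A: the roll is even
B = {o for o in outcomes if o > 4}        # event B: the roll is greater than 4

def p(event):
    return Fraction(len(event), len(outcomes))

# P(B | A) = P(A ∩ B) / P(A) = (1/6) / (1/2) = 1/3
p_b_given_a = p(A & B) / p(A)
```

Of the three even rolls {2, 4, 6}, only 6 is also in B, hence one third.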

Great, so all I do is plug in the probability that loan 1 defaults and loan 2 defaults, \(P(D_1 \cap D_2)\), divide by the probability loan 1 defaults, which I know is \(0.1\), and we're done. Ok: the probability that loan 1 defaults and loan 2 defaults. Well, that's just...

hmm. Rainclouds dropping red knickers into blue drawers or whatever were unrelated activities. My loans are correlated. What does that mean?

Regression

The two loans are 50% correlated, so we need to understand the relationship between them. Let's start with a simplified example of the linear relationship between two variables: Edgar Anderson's famous dataset of Iris plants' petals and sepals.

Ordinary Least Squares (OLS) answers the question: given my data on petals and sepals, what is the straight line (\(\hat{y} = \beta_0 + \beta_1 x\)) that best fits them? "Best" here means minimising the sum of the squared vertical distances (the residuals) between each observed value \(y_i\) and the line's prediction \(\hat{y}_i\). In standard OLS we treat \(x\) as the input and minimise the error in \(y\). Squaring does three things: 1) it means errors above and below the line (positive or negative) count equally, 2) it penalises larger errors more than smaller ones, and 3) because the "loss surface" it creates is a parabola there is one unique minimum - one answer to rule them all.
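A minimal OLS sketch, on made-up petal/sepal numbers (illustrative toy data, not Anderson's actual measurements). The least-squares solution from `numpy` agrees with the textbook closed form for the slope, \(\beta_1 = \text{Cov}(x, y) / \text{Var}(x)\):

```python
import numpy as np

# Toy stand-ins for petal length (x) and sepal length (y).
x = np.array([1.4, 1.7, 3.9, 4.4, 5.1, 5.9])
y = np.array([5.1, 5.4, 6.0, 6.3, 6.6, 7.1])

# Design matrix [1, x]; lstsq minimises the sum of squared residuals.
A = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)

# Closed-form slope: Cov(x, y) / Var(x) (same 1/n convention top and bottom).
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
```

Both routes give the same \(\beta_1\), which is one way to see that OLS really is just covariance scaled by the spread of \(x\).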

Now in applying OLS I've already made a few assumptions: 1) that the data is pulled from some continuous distribution, i.e. that my petals and sepals could have any length within, less than, or beyond what Anderson measured, 2) that the data is linear, i.e. there is a straight-line relationship between the variables. How good are these assumptions? Not great! For assumption 1) look at Sepal vs Petal length for all three species. Versicolor and Virginica look like they may come from one continuous distribution but Setosa is clearly doing its own thing. For assumption 2) the famous Anscombe's quartet neatly shows how linearity can be misleading. Always eyeball your data. Simplicity is a virtue here, but no one is going to thank you for giving them a wrong answer.

There are subtler ways yet of getting the wrong answer. At its heart, OLS assumes all of our measurements are equally wrong (or equally right, for the American readers!). Consider standing below a rocket (flame retardant) and eyeballing its height over time. As it gets further away our ability to judge its height decreases. This produces something economists love, heteroskedasticity, in which the variance of the model error is systematically related to the underlying data in the model. Anyway, let's move on with our lives.
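A quick simulation of the rocket (toy numbers of my own): if measurement noise scales with height, an OLS fit still lands near the true line, but the residuals fan out as the rocket climbs, which is exactly the pattern heteroskedasticity names.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(1, 10, 200)           # time since launch
true_height = 50 * t                   # the rocket's actual height
noise_sd = 0.1 * true_height           # eyeballing error grows with distance
observed = true_height + rng.normal(0, noise_sd)

# Plain OLS fit of height on time.
A = np.column_stack([np.ones_like(t), t])
beta, *_ = np.linalg.lstsq(A, observed, rcond=None)
resid = observed - A @ beta

# Residual spread early in the flight vs late in the flight.
near, far = resid[:100].std(), resid[100:].std()
```

The late-flight residuals are visibly wider than the early ones: the fit is fine on average, but its error bars are not constant.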

Correlation

Correlation is a measure of relatedness between two things we measure: for a change in one fish (red), what happens to two fish (blue)? Many observed correlations are, of course, nonsense. The Pearson correlation coefficient is computed as:

\[ \rho = \frac{\text{Cov}(X, Y)}{\text{SD}(X) \times \text{SD}(Y)} \]

where the covariance is:

\[ \text{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]

If we expand the product:

\[ = \frac{1}{n} \sum_i (x_i y_i - x_i \bar{y} - \bar{x} y_i + \bar{x}\bar{y}) \]

Split into four sums:

\[ = \frac{1}{n}\sum_i x_i y_i \;-\; \bar{y}\underbrace{\frac{1}{n}\sum_i x_i}_{=\,\bar{x}} \;-\; \bar{x}\underbrace{\frac{1}{n}\sum_i y_i}_{=\,\bar{y}} \;+\; \bar{x}\bar{y} \]

The middle two terms are both \(\bar{x}\bar{y}\), so one cancels with the last term:

\[ = \frac{1}{n}\sum_i x_i y_i \;-\; \bar{x}\bar{y} \]
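The derivation above is an exact algebraic identity, so it is easy to check numerically on arbitrary data (random numbers here, purely for illustration):

```python
import numpy as np

# Check: (1/n) Σ (x_i - x̄)(y_i - ȳ)  equals  (1/n) Σ x_i y_i  -  x̄ ȳ
rng = np.random.default_rng(42)
x = rng.normal(size=1_000)
y = 0.5 * x + rng.normal(size=1_000)   # y loosely related to x

lhs = np.mean((x - x.mean()) * (y - y.mean()))   # definition of covariance
rhs = np.mean(x * y) - x.mean() * y.mean()       # the rearranged form
```

The two agree up to floating point for any inputs, which is the point: nothing statistical happened in the derivation, only algebra.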

Now this is the sample analogue of \(E[XY] - E[X]E[Y]\): \(\bar{x}\) is a sample mean, while \(E[X]\) is the population expectation, a constant parameter rather than something measured from data. Back to our original problem: our defaults are \(0/1\) indicator variables, so \(E[XY] = P(\text{both default})\) and \(E[X] = P(D_1)\), which means Covariance \(= P(\text{both default}) - P(D_1) \times P(D_2)\): how much more often the loans default together than you'd expect if they were independent. The final piece of the puzzle is \(P(\text{both default})\). We know the correlation, so if we rearrange:

\[ \rho = \frac{\text{Cov}(X, Y)}{\text{SD}(X) \times \text{SD}(Y)} \]

Substituting our found formula for covariance:

\[ \rho = \frac{E[XY] - E[X]E[Y]}{\text{SD}(X) \times \text{SD}(Y)} \]

Rearranging for the joint expectation:

\[ E[XY] = \rho \cdot \text{SD}(X) \cdot \text{SD}(Y) + E[X] \cdot E[Y] \]

We know from the question that \(\rho = 0.5\), and \(E[X] = E[Y] = 0.1\). This article is already quite long so I will just tell you that if \(X\) is a Bernoulli trial with probability \(p\), then:

\[ E[X] = \sum_{k=0}^{1} k \, P(X=k) = 0 \cdot (1-p) + 1 \cdot p = p \]

\[ E[X^2] = \sum_{k=0}^{1} k^2 \, P(X=k) = 0^2 \cdot (1-p) + 1^2 \cdot p = p \]

Hence:

\[ \text{Var}(X) = E[X^2] - \{E[X]\}^2 = p - p^2 = p(1-p) \]

and so \(\text{SD}(X) = \sqrt{p(1-p)}\). Which means:

\[ P(\text{both default}) = \rho \cdot \sqrt{p(1-p) \cdot p(1-p)} + p \cdot p \]

\[ = 0.5 \times \sqrt{0.09 \times 0.09} + 0.01 = 0.5 \times 0.09 + 0.01 = 0.055 \]

Think of this as: the independent part (\(0.01\)) plus the extra clustering from correlation (\(0.045\)). Now we can use the formula for conditional probability to solve the rest:

Get \(P(D_2 \mid D_1)\):

\[ P(D_2 \mid D_1) = \frac{P(\text{both default})}{P(D_1)} = \frac{0.055}{0.10} = 0.55 \]

Get \(P(D_2 \mid \overline{D_1})\): The remaining default probability has to go somewhere:

\[ P(D_2 \mid \overline{D_1}) = \frac{P(D_2) - P(\text{both default})}{1 - P(D_1)} = \frac{0.10 - 0.055}{0.90} = 0.05 \]
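The whole chain of arithmetic above fits in a few lines. This sketch just replays the article's numbers (\(p = 0.1\), \(\rho = 0.5\)) and fills in the three bars of the histogram:

```python
import math

p, rho = 0.10, 0.5                      # PD of each loan, correlation
sd = math.sqrt(p * (1 - p))             # SD of a Bernoulli(p)

p_both = rho * sd * sd + p * p          # 0.5 * 0.09 + 0.01 = 0.055
p_d2_given_d1 = p_both / p              # 0.55
p_d2_given_s1 = (p - p_both) / (1 - p)  # 0.05

# The three bars of the return histogram:
p_return_2 = (1 - p) * (1 - p_d2_given_s1)                      # both survive
p_return_1 = (1 - p) * p_d2_given_s1 + p * (1 - p_d2_given_d1)  # exactly one defaults
p_return_0 = p_both                                             # both default
```

The three bars sum to one, as any self-respecting histogram must.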

So overall we can complete the tree: both loans survive \(0.9 \times 0.95 = 85.5\%\) of the time, exactly one defaults \(0.9 \times 0.05 + 0.1 \times 0.45 = 9\%\) of the time, and both default \(5.5\%\) of the time. Those are the three bars of the histogram we were after.

Higher Ground

What is interesting is that two loans is the largest portfolio for which we can solve this problem. With 2 loans, knowing the PDs (probabilities of default) and one pairwise correlation gives you \(P(\text{both default})\), and from that you can derive the entire tree, as we have just done. With three loans you know all three pairwise correlations, but you still don't know how all three behave together. Consider two hypotheticals:

[Interactive: two worlds with the same PDs (10%) and pairwise correlations (0.35) but different triple-default probabilities (0.020)]

Both worlds can have identical pairwise correlations but wildly different \(P(A \cap B \cap C)\). You'd need a third-order parameter to resolve the ambiguity. For \(n\) loans, you need parameters up to \(n\)th order. You can see that for any normal asset pool this explodes fast, and because of the curse of dimensionality it's effectively impossible to estimate \(n\)th-order parameters. Defaults, thankfully, are rare enough as it is.
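Here is the classic construction of that ambiguity, stripped to its bones. Note the hedge: these are fair coins (\(p = 0.5\)), not the article's 10%-PD loans, because the XOR trick makes the point in four lines. World A is three independent coins; in World B the third coin is forced to equal X XOR Y. Marginals and every pairwise moment match exactly, yet "all three are 1" happens 12.5% of the time in one world and never in the other.

```python
import math
from itertools import product

# World A: X, Y, Z are independent fair coins.
world_a = {(x, y, z): 1 / 8 for x, y, z in product((0, 1), repeat=3)}

# World B: Z is forced to X XOR Y (still a fair coin marginally).
world_b = {(x, y, x ^ y): 1 / 4 for x, y in product((0, 1), repeat=2)}

def moment(pmf, idx):
    """E[product of the variables at positions idx] under the pmf."""
    return sum(pr * math.prod(o[i] for i in idx) for o, pr in pmf.items())
```

Every pairwise moment \(E[XY]\), \(E[XZ]\), \(E[YZ]\) equals \(1/4\) in both worlds (so all pairwise correlations match), but \(E[XYZ]\) is \(1/8\) in World A and \(0\) in World B. No amount of pairwise data can tell them apart.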

Instead in finance we use a copula, Latin for link. This is an assumption that takes the pairwise correlations and fills in all the needed higher-order dependencies. It's an elegant piece of maths but a great example of how we can be blinded by our models - the 2008 financial crisis was partly due to the inappropriate use of a Gaussian copula.

Let's look at what it does to understand this:

[Interactive: simulated joint defaults under the Gaussian copula (PD 10%, correlation 0.55)]

The copula doesn't change how risky each loan is individually. It changes where the dots cluster, the higher order dependencies. More dots in the bottom-left red corner = more simultaneous defaults = more catastrophic losses. The Gaussian copula says: "generate the dot pattern using a multivariate normal distribution." Compare it with two other regularly used copulas: Student-t and Clayton.

[Interactive: the same simulation under Student-t and Clayton copulas (PD 10%, correlation 0.55)]
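A minimal sketch of the Gaussian copula's "dot pattern" idea, with toy parameters of my own choosing: give each loan a latent standard-normal "health" score, correlate the scores, and say a loan defaults when its score falls below the 10% quantile. One caveat worth flagging: the latent correlation here is a copula parameter, not the same number as the 50% correlation between the default indicators we computed earlier.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(7)
p, latent_rho, n = 0.10, 0.5, 200_000
threshold = NormalDist().inv_cdf(p)      # default when the latent score is this low

# Correlated latent "health" scores: a bivariate normal dot pattern.
cov = [[1.0, latent_rho], [latent_rho, 1.0]]
scores = rng.multivariate_normal([0.0, 0.0], cov, size=n)
defaults = scores < threshold            # bottom-left corner = joint defaults

p_each = defaults.mean(axis=0)           # each loan's simulated PD, ~0.10
p_both = (defaults[:, 0] & defaults[:, 1]).mean()   # simulated joint default rate
```

The marginals come out at 10% regardless of the copula; what the latent correlation controls is how much more often the two scores land in the bottom-left corner together than the independent \(1\%\) would suggest.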

In 2008, reality looked more like a Clayton copula, with its lower-tail dependence. Assets that were fine in normal times suddenly all defaulted together in the crash, far more often than the Gaussian copula predicted. When liquidity completely dries up nothing stays "safe", so assets which were simulated as fine in fact were not.

In 2026, as Trump bombs Iran, the headline of the FT reads: "Stocks and bonds slump in tandem as Iran shock leaves investors 'nowhere to hide'". Perhaps you can now see where the Gaussian went wrong.