# Why does the chaos game converge to the Sierpinski triangle? April 28, 2020

Here is a simple process— also known as the “chaos game”— to generate a shape:

- Draw an equilateral triangle on a piece of paper and mark a random initial point.
- Mark the next point midway to one of the vertices of the triangle, chosen randomly.
- Repeat step 2 ad infinitum or ad nauseum, whichever comes first.

If you haven’t seen this before (and maybe even if you have): what shape do you expect to emerge? Now, try simulating the game below:

From this point on, I’ll assume that you’ve either tried the simulation above or already know its result: a fascinating fractal shape— the Sierpinski triangle— appears to emerge from the game. While there are a number of resources online that explore the properties of the Sierpinski triangle, I haven’t yet found explanations for *why* the chaos game produces this specific shape, or any of these other related questions^{1}:

- Does the game work with non-equilateral triangles too?
- Is it important that the next point is chosen midway? Would other ratios also work?
- Do we have to pick vertices randomly? Could we pick them sequentially or via some other method instead?
- Does this game also work with squares, pentagons and other polygons?

Figuring out— deeply understanding— these answers is the objective of this post. I’ll try to combine intuition with interactive diagrams to make formal, rigorous arguments accessible.

Here is an outline: first, we’ll devise a formal representation to help us clearly pose the above questions; this representation is central to proving that the chaos game does indeed converge to a Sierpinski triangle. The proof follows in three parts (or lemmas) that each address a different facet of how the chaos game works. Finally, we’ll apply the insights from our proof to answer the above questions and more.

Let’s begin!

## What is a Sierpinski Triangle?

There are a couple of ways to construct a Sierpinski triangle. Perhaps the most common approach starts with an equilateral triangle and *removes* the inner triangle formed by the midpoints of the three sides. This results in three smaller triangles that are recursively edited in the same way (see Figure 2(a)). Another approach— one that we’ll be using— starts with the perimeter of an equilateral triangle, and then connects the midpoints of the three sides. This again results in three smaller triangles that are recursively edited in the same way (see Figure 2(b)).

With either approach, it’s clear this shape is recursive: each Sierpinski triangle is composed of three smaller Sierpinski “*sub-triangles*” which are each in turn composed of three even small sub-triangles and so on. Each Sierpinski sub-triangle looks the same as the original Sierpinski triangle except that it’s been scaled down and shifted towards one of three triangle vertices.

**How do we (mathematically) represent Sierpinski sub-triangles?** Let’s move from these visual constructions to a formal mathematical representation. We want to represent sub-triangles in a way that lets us define the Sierpinski triangle recursively as we have above. A geometric approach that specifies the coordinates of its vertices may appear straightforward, but doesn’t naturally recurse and quickly becomes unwieldy^{2}.

Instead, note that any Sierpinski sub-triangle \(x\) is fully specified by which of the three sectors (or “tridrants”) \(x\) belongs to in each sub-triangle bigger than \(x\). For example, in Figure 3, \(x\) is uniquely located by taking the first tridrant of the base triangle, followed by the third tridrant in the resulting sub-triangle and finally the second tridrant thereafter. In other words, we can identify any sub-triangle using a *ternary* string representing this path; in the above example \(x = \texttt{132}\). I encourage you to experiment with other ternary strings to get a feel for this representation.

We will shortly see how this representation recurses more naturally, but let’s formalize our representation first:

**Definition (Sub-triangle at prefix \(x\)).** * Suppose \(T(x)\) is a sub-triangle at prefix \(x\) with vertices \(\texttt{1}\), \(\texttt{2}\) and \(\texttt{3}\). The sub-triangle \(T(x \cdot \texttt{1})\) (and by analogy, \(T(x \cdot \texttt{2})\) and \(T(x \cdot \texttt{3})\)) is defined to be triangle formed by the vertex \(\texttt{1}\) and the midpoints of its two edges. The sub-triangle at the empty prefix \(T(\varnothing)\) is defined to be the given base triangle. We’ll use \(T\) without a prefix to mean \(T(\varnothing)\). *

We are now ready to formally define a Sierpinski sub-triangle as the union of the sub-triangles formed by its three tridrants:

**Definition (Sierpinski Triangle and Sierpinski Sub-triangle at prefix \(x\)).** * Let the \(T(x)\) be the (non-Sierpinski, simple) sub-triangle at prefix \(x\) defined above. The Sierpinski sub-triangle at prefix \(x\), \(S(x)\), is the union of \(T(x)\) and the Sierpinski sub-triangles in each tridrant of \(T(x)\):*

\[\begin{equation} S(x) = T(x) \cup S(x \cdot \texttt{1}) \cup S(x \cdot \texttt{2}) \cup S(x \cdot \texttt{3}). \end{equation}\]

*A Sierpinski triangle is then the Sierpinski sub-triangle at the empty prefix \(S(\varnothing) = T(\varnothing) \cup S(\texttt{1}) \cup S(\texttt{2}) \cup S(\texttt{3})\). We’ll use \(S\) without a prefix to mean \(S(\varnothing)\). *

## A proof in three parts

With that out of the way, we’re ready to prove that the chaos game converges to the Sierpinski triangle \(S\). At the high level, our proof consists of three key lemmas:

- First, we’ll show that a step of the chaos game always moves a point on the Sierpinski triangle to
*another*point on it: once the game touches the \(S\), it remains there. - Second, we’ll show that for any point off the Sierpinski triangle, the chaos game halves the distance to a point on it. In 10-20 steps, the point off \(S\) is practically indistinguishable from a similar point on \(S\).
- Finally, and this is subtle, we’ll show that randomness ensures the chaos game (eventually) get arbitrarily close to every point on \(S\). This guarantees that the chaos game generates the whole Sierpinski triangle.

##### Part 1: What starts on the triangle, stays on the triangle

Recall that a single step of the chaos game moves points midway to one of the three vertices of the triangle. Let’s characterize what happens to points *on the Sierpinski triangle \(S\)* under this transformation. Instead of reasoning about the trajectory of individual points on \(S\), we’ll leverage our sub-triangle representation to reason about how the entire sub-triangle \(T(x)\) is transformed: Without loss of generality, let’s say that the chaos game chose to move midway towards the vertex \(\texttt{1}\); every point in the triangle \(T(x)\) is translated to a similar point in the first tridrant of \(T\). As a result, the transformed \(T(x)\) is equivalent to a sub-triangle constructed by first choosing the tridrant \(\texttt{1}\) followed by the tridrants specified by \(x\): by definition, this is the sub-triangle \(T(\texttt{1}\cdot x)\), a point on the Sierpinski triangle! This logic applies to the remaining vertices as well, completing our argument. A more formal statement of this lemma follows, but see Figure 4 for a proof-by-picture.

**Lemma 1.** * Let \(p \in S\) be a point on the Sierpinski triangle and let \(x\) be the shortest prefix such that \(p \in T(x)\) and let \(q\) be \(p\) after a single step of the chaos game. Then, \(q \in T(a \cdot x) \subset S\) where \(a\) is the vertex chosen by the game. *

**Proof (by picture).** See Figure 4.

\(\blacksquare\)

##### Part 2: Meet the twin

Lemma 1 says that *if* we happen to start the chaos game from a point on the Sierpinski triangle \(S\), we are guaranteed to stay on it. What happens when we start from a point *not* on \(S\)?

It’s fairly obvious that if we start far away from \(S\), then moving midway to one of the vertices approximately *halves* the distance to the triangle. In the following lemma, we’ll show this is always the case; in fact, there is a *“twin point”* on the Sierpinski triangle which follows the same transformations that we are halving the distance to in each turn. **After about 10-20 iterations, our original point (which didn’t start on the Sierpinski triangle) and this twin point (which was always on the Sierpinski triangle) are practically indistinguishable from each other (Figure 5)**. Thus, we can safely ignore points outside of \(S\) when talking about the chaos game.

**Lemma 2.** * Let \(p_0\) be some point off the Sierpinski triangle, \(q_0 \in S\) be its closest point (its twin point) on the Sierpinski triangle and \(\epsilon_0 = \|p_0 - q_0\|\) be the distance between the two. If \(p_n\) is the point visited after \(n\) steps of the chaos game when starting from \(p_0\) and \(q_n\) the analogous point if we had started from \(q_0\) instead, then the distance between \(p_n\) and \(q_n\) decreases exponentially for each step of the game: \[\begin{align}
\| p_n - q_n \| &= \frac{\epsilon_0}{2^n}.
\end{align}\]*

*As a straightforward corollary, the closest distance of \(p_n\) to the Sierpinski triangle is at most \(\frac{\epsilon_0}{2^n}\). *

**Proof.** Each step of the chaos game moves the points \(p_{0}\) and \(q_{0}\) midway to the some vertex \(v_0\): \[\begin{align*}
p_1 &= \frac{p_{0} + v_0}{2} &\quad q_1 &= \frac{q_{0} + v_0}{2} \\
&\vdots &\quad &\vdots \\
p_n &= \frac{p_{n-1} + v}{2} &\quad q_n &= \frac{q_{n-1} + v}{2}.
\end{align*}\]

We can now express the distance between \(p_n\) and \(q_n\) using the above relations: \[\begin{align*} \| p_n - q_n \| &= \left\| \frac{p_{n-1} + v}{2} - \frac{q_{n-1} + v}{2} \right\| &= \frac{\| p_{n-1} - q_{n-1} \|}{2} \\ &= \frac{\left\| \frac{p_{n-2} + v}{2} - \frac{q_{n-2} + v}{2} \right\|}{2} &= \frac{\| p_{n-2} - q_{n-2} \|}{2 \times 2} \\ &\vdots \\ &= \frac{\| p_{0} - q_{0} \|}{2^n} \\ &= \frac{\epsilon_0}{2^n}. \end{align*}\]

\(\blacksquare\)

##### Part 3: One from all and all from one

Let’s recap: Lemma 1 and Lemma 2 together show that no matter where we start from, the chaos game (effectively) *only* visits points on the Sierpinski triangle. In this last part, we’ll address a more subtle question: does the chaos game reach *every* point on the Sierpinski triangle? The subtlety arises because there are infinite points on the triangle and it would take infinite time to visit them all— we need to clarify what it means to visit “every” point on the triangle. More precisely, we’re going to ask if we can get *arbitrarily close* (within any \(\epsilon\)) to any point \(p \in S\) on the Sierpinski triangle in *finite* steps. The arguments in this section are a bit more technically involved; if proofs aren’t your thing, I recommend you skim this section and treat Figure 6 as a proof-by-demonstration.

Before we begin, we’ll need to solve two more representational problems:

**Identifying a single point on the Sierpinski triangle.** First, we need to identify a single point \(p \in S\). The prefix representation \(T(x)\) only describes a *set* of points, but also using an angle \(\theta\) lets us specify a point on its perimeter. Try using \(x\) and \(\theta\) to pick a point in Figure 6.

**Representing a region “within \(\epsilon\)” of a point.** Next, we need to describe a region around a point \(p\) such that every other point in that region is at most \(\epsilon\) away from it.

To do so, note that the edge length of a sub-triangle *halves* every time we add another trit to its prefix. Because the edge length is the longest distance between two points on or inside an equilateral triangle, the distance between any two points in \(S(x)\) is at most \(2^{-n}\), where \(n\) is the length of the prefix \(x\).

Next, observe that for any \(n\), every point \(p \in S\) must belong to one of the \(3^n\) possible prefixes of length \(n\): if \(n\) is small enough, namely \(n = \lceil -\log_2 \epsilon \rceil\), and \(x\) is the length-\(n\) prefix of the sub-triangle containing \(p\), then every other point \(q \in S(x)\) is at most \(\epsilon\) from \(p\). We’ll denote this special prefix \(x(p, \epsilon)\).

The rest of the proof follows because randomly picking vertices guarantees that the prefix \(x(p, \epsilon)\) is generated:

**Lemma 3.** * Let \(q_n \in S\) be a point visited by the \(n\)-th step of the chaos game. For any point \(p \in S\), the probability that some \(q_m\) for \(m < n\) is within \(\epsilon\) of \(p\) converges to \(1\):*

\[\begin{align*} \mathbb{P}\left( \min_{m < n} \| p - q_m \| < \epsilon \right) \ge 1 - \left(1 - \frac{\epsilon^{\log_2 3}}{3}\right)^{\left\lfloor \frac{n}{-\log_2\epsilon} \right\rfloor} \to 1. \end{align*}\]

*To a first approximation, the chaos game almost certainly visits a point within \(\epsilon\) of \(p\) in \(5 \log_2\frac{1}{\epsilon} \epsilon^{-\log_2 3}\) steps: if \(\epsilon = 0.1\) (\(\approx 2^{-3}\)), this is about 600 steps and if \(\epsilon = 0.01\) (\(\approx 2^{-7}\)) in about 45,000 steps. *

**Proof.** Let \(l = \lceil -\log_2 \epsilon \rceil\). Without loss of generality, let the chaos game start at \(q_0 \in S(\alpha)\) for some prefix \(\alpha\) of length \(l\). After \(n\) steps, Lemma 1 shows that the game reaches \(q_n \in S(\beta_n \cdot \alpha)\), where \(\beta_n\) is a \(n\)-length prefix. If \(x(p, \epsilon)\) (which has length \(l\)) appears anywhere in the string \(\beta_n \cdot \alpha\), then we would have visited a \(q_m\) that is within \(\epsilon\) of \(p\) from the arguments above. What is the probability of this happening?

If we assume that \(\alpha\) was chosen randomly, then the sequence \(\beta_n \cdot \alpha\) consists of independent, random trits. The probability that a single length-\(l\) substring is equal to \(x(p, \epsilon)\) is simply \(3^{-l}\). However, because length-\(l\) substrings starting right next to each other overlap, they are not independent events; length-\(l\) substrings starting one after the other are. There are \(\lfloor \frac{n}{l} \rfloor\) of these; the probability that at least one of them are equal to \(x(p, \epsilon)\) is then \(1 - \left(1 - 3^{-l}\right)^{\lfloor \frac{n}{l} \rfloor}\). After substituting in the value of \(l\) and performing some arithmetic simplification, we get the desired result.

I’ll note that a more careful analysis can significantly tighten this bound, but this simplified version is more than sufficient for our purposes.

\(\blacksquare\)

## Revisiting questions

With all of this math behind us, let’s revisit the questions we began with. For each of these questions, you can use the advanced controls of Figure 1 to simulate what actually happens. I highly encourage you to develop a hypothesis of what would happen before trying each simulation.

**Why does this simple process lead to such a non-trivial result?**Lemma 1 shows us that the Sierpinski triangle is*invariant*to moving halfway to any vertex. This really is the defining characteristic of the Sierpinski triangle.*With this perspective, it’s almost intuitive the chaos game generates the Sierpinski triangle.***Does the game work with non-equilateral triangles as well?**Yes! None of our proofs rely on the positioning of vertices; they were entirely described using a symbolic abstraction, so any triangle will do. However, changing the positioning of the three vertices correspondingly skews the Sierpinski triangle that forms. You can simulate random triangles (and other polygons) using the control.**Is the random selection process essential?**Not really! The proof of Lemma 3 highlights that what really matters is visiting every possible permutation of vertices. Randomly picking vertices is probably the simplest way to ensure this condition, but we could also use the trits of an irrational number like \(\pi\)! You can simulate deterministic vertex sequences using the control.**How important is the fact that the next point is chosen midway?**It depends: if you want to get a proper looking Sierpinski triangle you have to choose the midpoint. All the same, let’s say that we chose our next vertex to be at \(p_n = (1-r) p_{n-1} + r v\) where \(v\) is a vertex on the triangle and \(r\) is the ratio of how much we move towards one of the vertices. The ratio \(r = \frac{1}{2}\) leads to the Sierpinski triangle \(S\), but for other ratios we can similarly define another recursive shape that our process would converge to. For example, if we chose an \(r > \frac{1}{2}\), we should expect a shape that looks very similar to the Sierpinski triangle, but “squished” towards each vertex. It turns out that for any \(0 < r < 1\), variants of the three lemmas hold, and we are guaranteed to converge to this new, unnamed shape. This is a lot of fun to play around with and I encourage you to try this out using the control.**Would the chaos game work with squares, pentagons and other polygons too?**Yes! Of course, we’ll get a different shape in each of these cases. The lemmas we proved easily extend to these new shapes and can provide a great deal of intuition for what happens. For example, if applied the chaos game to a square using the default ratio of \(\frac{1}{2}\), Lemma 1 would show that the game results in a shape equivalent to a simple boring square! On the other hand, using a ratio \(r > \frac{1}{2}\) (say \(\frac{3}{4}\)) results in a more interesting grid-like pattern. One of my favorite configurations is a hexagon with the ratio \(r = \frac{2}{3}\); it produces a perfectly tiled recursive shape also known as the Koch snowflake! Give it a try using the control.

## Endnotes

I hope you enjoyed this post! If you have follow up thoughts or questions, email me at `arunchaganty@gmail.com`

or tweet me at `@arunchaganty`

.

\(\blacksquare\)

A little bit of history about this post: I first read about the chaos game in James Gleick’s excellent book, Chaos: Making a New Science, while in high school. The book captivated my imagination and inspired me to learn how to code so that I could simulate the game myself; it also led to one of my first blog posts! While I had empirically answered the questions in this post, I was deeply unsatisfied without a theoretical explanation. Unfortunately, I lacked the mathematical maturity at the time to even formulate the questions properly. It wasn’t until I really learned to prove things during my PhD that I realized I could derive the theory myself.↩

Trust me, I’ve tried.↩