The Kuratowski definition of pairs
- sets (basic notation, cardinality)
- tuples (pairs)
Disclaimer: This unit is different from other background units in that it is quite opinionated. I will talk about the Kuratowski definition of pairs and how linguists’ attempts to leverage it for the study of sentence structure have serious problems. A lot of them stem from a misunderstanding of the status of mathematical definitions and their intended usage. After reading this unit, you will hopefully have a deeper appreciation of these issues and stay away from such misapplications of math.
Pairs as sets
As you have seen several times by now, mathematics provides many different ways of talking about the same thing. This includes the option to look at pairs as specific kinds of sets. In fact, there are many different ways of defining pairs as sets. Below is a far-from-comprehensive selection of such definitions.
We define the pair \(\left \langle a,b \right \rangle\) as the set
- Wiener
-
\(\left \{ \left \{ \left \{ a \right \}, \emptyset \right \}, \left \{ \left \{ b \right \} \right \} \right \}\)
- Hausdorff
-
\(\left \{ \left \{ a, 1 \right \}, \left \{ b, 2 \right \} \right \}\)
- Kuratowski (long)
-
\(\left \{ \left \{ a \right \}, \left \{ a,b \right \} \right \}\)
- Kuratowski (short)
-
\(\left \{ a, \left \{ a,b \right \} \right \}\)
Note that since these are definitions, they cannot be true or false. The only question is whether these are useful definitions of pairs. What exactly this means depends on why one would want to use sets instead of pairs. At the very least, though, a useful definition of pairs has to preserve the most fundamental property of pairs: \(\left \langle a,b \right \rangle = \left \langle c,d \right \rangle\) iff \(a = c\) and \(b = d\). Let us take a closer look at whether that holds for the two Kuratowski definitions.
The logic behind the long Kuratowski definition
According to the long Kuratowski definition, the pair \(\left \langle a,b \right \rangle\) is equivalent to the set \(\left \{ \left \{ a \right \}, \left \{ a,b \right \} \right \}\). The two are equivalent because they satisfy the same identity condition: \(\left \langle a,b \right \rangle = \left \langle c,d \right \rangle\) iff \(a = c\) and \(a = d\), and \(\left \{ \left \{ a \right \}, \left \{ a,b \right \} \right \} = \left \{ \left \{ c \right \}, \left \{ c, d \right \} \right \}\) iff \(a = c\) and \(b = d\). Let us verify that this indeed holds.
The right to left direction is trivial: if \(a = c\) and \(b = d\), then \(\left \{ \left \{ a \right \}, \left \{ a,b \right \} \right \} = \left \{ \left \{ c \right \}, \left \{ c,d \right \} \right \}\). In the other direction, we have to consider two separate cases depending on whether \(a = b\) or \(a \neq b\). In each case, we need to show that \(a = c\) and \(b = d\).
Case 1. Suppose \(a = b\). Then \(\{ \{a\}, \{a,b\} \} = \{ \{a\}, \{a, a\} \} = \{ \{a\}, \{a\} \} = \{ \{a\} \}\). But then \(\{ \{a\}, \{a,b\} \} = \{ \{c\}, \{c,d\} \}\) is the same as \(\{ \{a\} \} = \{ \{c\}, \{c,d\} \}\). This is possible only if \(\{ c \} = \{c ,d\}\), which implies \(c = d\). Hence \(\{ \{c\}, \{c,d\} \} = \{ \{c\}, \{c,c\} \} = \{ \{c\}, \{c\} \} = \{\{c\}\} = \{\{a\}\}\), which entails \(a = c\). As we already know that \(a = b\) and \(c = d\), we also conclude from \(a = c\) that \(b = d\).
Case 2. Now suppose that \(a \neq b\). Then either \(\left \{ a \right \} = \left \{ c,d \right \}\) or \(\left \{ a \right \} = \left \{ c \right \}\). We consider each subcase.
Subcase 2.1. Suppose towards a contradiction that \(\left \{ a \right \} = \left \{ c,d \right \}\). Since two sets with distinct cardinality cannot be identical, the equality \(\{a\} = \{c,d\}\) holds only if \(c = d\). Then \(\{ \{a\}, \{a,b\} \} = \{ \{c\}, \{c,d\} \} = \{ \{c\}, \{c\} \}\). But then we have both \(\left \{ a \right \} = \left \{ c \right \}\) and \(\left \{ a,b \right \} = \left \{ c \right \}\). This is impossible unless \(a = b\), but by our initial assumption for Case 2, \(a \neq b\). This is a contradiction, disproving the assumption of Subcase 2.1 that \(\left \{ a \right \} = \left \{ c,d \right \}\). We conclude that \(\left \{ a \right \} \neq \left \{ c,d \right \}\) after all.
Subcase 2.2. Assume, then, that \(\left \{ a \right \} = \left \{ c \right \} \neq \left \{ c,d \right \}\). This implies that \(a = c\), so it only remains for us to show that \(b = d\). Note that \(c \neq d\) because we assume \(\left \{ c \right \} \neq \left \{ c,d \right \}\). Given all these (in)equalities, \(\left \{ \left \{ a \right \}, \left \{ a,b \right \} \right \} = \left \{ \left \{ c \right \}, \left \{ c,d \right \} \right \}\) necessarily entails that \(\left \{ a,b \right \} = \left \{ c,d \right \}\), which in turn implies \(b = d\) because we already know that \(a = c\) and \(a \neq b\).
Show that the Wiener definition of pairs preserves identity. That is to say, prove that \(\left \{ \left \{ \left \{ a \right \}, \emptyset \right \}, \left \{ \left \{ b \right \} \right \} \right \} = \left \{ \left \{ \left \{ c \right \}, \emptyset \right \}, \left \{ \left \{ d \right \} \right \} \right \}\) iff \(a = c\) and \(b = d\).
It is tempting to generalize the long Kuratowski definition from pairs to tuples. For instance, the triple \(\left \langle a,b,c \right \rangle\) would correspond to the set \(\left \{ \left \{ a \right \}, \left \{ a,b \right \}, \left \{ a,b,c \right \} \right \}\). However, this definition does not preserve the identity requirement of tuples. Show that there are cases where \(\left \langle a,b,c \right \rangle \neq \left \langle d,e,f \right \rangle\) yet \(\left \{ \left \{ a \right \}, \left \{ a,b \right \}, \left \{ a,b,c \right \} \right \} = \left \{ \left \{ d \right \}, \left \{ d,e \right \}, \left \{ d,e,f \right \} \right \}\).
When does the Kuratowski definition work?
The proof above is mathematically sound, but it relies on specific assumptions of set theory. For example, we assumed that it is impossible for both \(c \neq d\) and \(\{a\} = \{c,d\}\) to be true at the same time because a set with one member can never be identical to a set with two members. While this is intuitive enough, it means that the proof is tied to a notion of sets that satisfies this property. Change how sets work, and the long Kuratowski definition will no longer be useful as an alternative view of pairs.
While the troubles with the long Kuratowski definition stop there, the short Kuratowski definition requires even more specialized assumptions. This definition defines \(\left \langle a,b \right \rangle\) as \(\left \{ a, \left \{ a,b \right \} \right \}\) instead of \(\left \{ \left \{ a \right \}, \left \{ a,b \right \} \right \}\). It is this definition in particular that seems to be a good fit for linguistics because of recent proposals that sentence structure involves sets of the shape \(\left \{ a, \left \{ a,b \right \} \right \}\).
Consider the sentence My cat snored. It is built from the verb snored and the noun phrase my cat. One common proposal is that this noun phrase is built by an operation called Merge. Given a head H and argument A, Merge combines them into the set \(\left \{ \mathit{H}, \left \{ \mathit{H}, \mathit{A} \right \} \right \}\). In the case at hand, Merge takes the head my and argument cat and combines them into \(\left \{ \mathit{my}, \left \{ \mathit{my}, \mathit{cat} \right \} \right \}\) (there are linguistic reasons for assuming my is the head rather than cat).
Applying the short Kuratowski definition to the set \(\left \{ \mathit{my}, \left \{ \mathit{my}, \mathit{cat} \right \} \right \}\) yields the pair \(\left \langle \mathit{my}, \mathit{cat} \right \rangle\), which is conceptually pleasing because it corresponds to the string my cat that we actually see in the sentence my cat snored.
What would be the set produced by merging \(\left \{ \mathit{my}, \left \{ \mathit{my}, \mathit{cat} \right \} \right \}\) with snored, assuming that snored is the head? Does this still produce the correct linear order? That is to say, does the order of words in the resulting pair correspond to the order of words in the sentence my cat snored?
Many linguists would argue that snored is itself built up from smaller parts which correspond to the verb snore and the past tense marker ed. Assuming that the past tense marker acts as the head and the verb as the argument, snored corresponds to the set \(\left \{ \mathit{ed}, \left \{ \mathit{snore}, \mathit{ed} \right \} \right \}\). What, then, would be the result of merging this set with \(\left \{ \mathit{my}, \left \{ \mathit{my}, \mathit{cat} \right \} \right \}\) as in the previous exercise? What pair does this correspond to?
Consider the sentence The dog hates my cat. Treating hates as a single element, and assuming that it is a head that first combines with my cat, we get the set \(H \mathrel{\mathop:}=\left \{ \mathit{hates}, \left \{ \mathit{hates}, \left \{ \mathit{my}, \left \{ \mathit{my}, \mathit{cat} \right \} \right \} \right \} \right \}\). Similarly, the dog corresponds to the set \(A \mathrel{\mathop:}=\left \{ \mathit{the}, \left \{ \mathit{the}, \mathit{dog} \right \} \right \}\). What is the result of combining the two if \(H\) is the head and \(A\) the argument? Can this set be interpreted as a pair via the short Kuratowski definition? If so, what is the corresponding pair?
At first glance it seems that the proof for the long Kuratowski definition also works for its short counterpart. But this is truly at first glance only. Consider the counterpart to Subcase 2.1 above. That is to say, we assume \(\left \{ a, \left \{ a,b \right \} \right \} = \left \{ c, \left \{ c,d \right \} \right \}\) and furthermore assume towards a contradiction that \(a = \{c, d\}\). The problem is that this does not result in a contradiction. We have two cases to consider: \(\left \{ a,b \right \} = c\) and \(\left \{ a,b \right \} = \left \{ c,d \right \}\). If \(\left \{ a,b \right \} = c\), then we have
\[\begin{align*} a &= \left \{ c, d \right \} \tag{Initial assumption that $a = \left \{ c,d \right \}$}\\ &= \left \{ \left \{ a,b \right \}, d \right \} \tag{Substitute $\left \{ a,b \right \}$ for $c$}\\ &= \left \{ \left \{ \left \{ c,d \right \} ,b \right \}, d \right \} \tag{Initial assumption that $a = \left \{ c,d \right \}$}\\ &= \left \{ \left \{ \left \{ \left \{ a,b \right \} ,d \right \} ,b \right \}, d \right \} \tag{Substitute $\left \{ a,b \right \}$ for $c$}\\ &\vdots\\ \end{align*}\]
And if \(\left \{ a,b \right \} = \left \{ c, d \right \}\), then by our previous assumption that \(a = \left \{ c,d \right \}\) we have
\[\begin{align*} a &= \left \{ c, d \right \} \tag{Initial assumption that $a = \left \{ c,d \right \}$}\\ &= \left \{ a, b \right \} \tag{Substitute\ $\left \{ a,b \right \}$ for $\left \{ c,b \right \}$}\\ &= \left \{ \left \{ a,b \right \}, b \right \} \tag{Substitute $\left \{ a,b \right \}$ for $a$}\\ &= \left \{ \left \{ \left \{ a, b \right \}, b \right \}, b \right \} \tag{Substitute $\left \{ a,b \right \}$ for $a$}\\ &\vdots\\ \end{align*}\]
Either way we end up with what looks like an infinite sequence of substitutions, building an infinitely deep set where we never reach the bottom. This may strike you as highly unintuitive and quite distinct from how we have used sets so far. But remember what we said about sets: a set can contain arbitrarily complex objects, including other sets. This does not rule out a set that contains a set that contains a set that contains a set, and so on, ad infinitum. We can rule it out, of course, but we don’t have to. Mathematicians have produced many different definitions of sets, which are called axiomatizations of sets. Some axiomatizations allow this kind of infinite nesting, and some do not.
The most commonly used axiomatization of sets is ZFC, which is short for Zermelo-Fraenkel with the axiom of choice. ZFC is very different from the naive notion of sets we have been using as it makes no explicit distinction between objects and collections of objects. In ZFC, everything is a set.
ZFC consists of multiple axioms. One of them, the axiom of regularity, does indeed rule out infinitely nested sets like the ones we are building above. However, this axiom isn’t very important for ZFC. One can remove it from the axioms and almost all the interesting theorems that have been established for sets in ZFC still hold.
Do we want infinitely nested sets?
There are over 100 years of mathematical research on sets that allow such infinite nesting. This branch of mathematics is known as non-well-founded set theory, and these infinite sets are sometimes referred to as hypersets. Hypersets aren’t just a mathematical curiosity, they have been used in linguistics as the mathematical foundation of situation semantics, which models how humans reason over situations. This is a very natural task that billions of humans perform effortlessly each day. It does not look like human cognition struggles with the concept of hypersets. On what grounds, then, should we ban hypersets for sentence structure?
Let \(S\) be defined as \(\left \{ a, S \right \}\). This is a hyperset. Explain why!
Can you think of an argument that sentences might be hypersets?
We can do it, of course, we can rule out hypersets with the stroke of a pen. But it would be a stipulation whose only motivation is to save the short Kuratowski definition as a convenient way of serializing the sets we see in sentence structure. It would be tantamount to claiming that human cognition does not involve hypersets when it comes to sentence structure, but at the same time it is so deeply rooted in mathematical sets that it even employs the short Kuratowski definition of pairs. That is a peculiar claim to make.
Moreover, making this claim does not tell us anything deep or profound about sentences. We are simply putting in place mathematical assumptions to get the result we want. These mathematical assumptions aren’t forced onto us by the empirical lay of the land. Whether we allow hypersets or not has no empirical consequences, it is a purely theory-internal decision. This does not constitute scientific progress, it is a mathematical mirage.
To the extent that this proposal is supposed to capture a real property, the math is at best orthogonal to it and at worst actively obscures it. The exercises earlier in this unit reveal that the short Kuratowski definition does not map the sets built by Merge to pairs with an obvious linguistic interpretation. Invoking the short Kuratowski definition has nothing to say on the problem of how the sets built by Merge are mapped to the linear sequences that we call sentences, and it is unclear what other property of language it could possibly be capturing in an insightful manner. In the absence of a clear empirical payoff, we should not open the can of worms that is hypersets, ZFC, the axiom of regularity, and correspondences between sets and pairs. And if a linguistic proposal lacks empirical substance, then it is worthless as a claim about language no matter how elegantly it was derived from mathematical set theory.