Sets: The basics

Sets are the fundamental building block of modern mathematics. Intuitively, a set is a collection of objects, but with two important twists:

Sets are unordered.
Sets contain no duplicates.

Suppose you want to keep a record of which words occur in a text. You aren’t interested in how often a given word occurred, just whether it occurs at all. Nor do you care in which order the words occurred in the text. So you are actually interested in the set of words that occur in the text.

Each property is explained in detail below, but let’s first put some helpful notation in place.

List notation

Sets are often written as lists with curly braces around them. So \(\left \{ a, b, c, d \right \}\) denotes the set containing \(a\), \(b\), \(c\), \(d\). Here \(a\), \(b\), \(c\), \(d\) are some arbitrary objects. This is known as list notation. More complex sets are defined with set-builder notation, which will be covered in a later unit.

Consider the string If John slept, then Mary left. Its set of words (ignoring sentence-initial capitalization) is \(\left \{ \text{if}, \text{John}, \text{left}, \text{Mary}, \text{slept}, \text{then} \right \}\).

Write the following as a set:

the first names of your three favorite actors/actresses,
the colors of the rainbow,
all even numbers between 1 and 11

Elements and set membership

The objects contained in a set are called its elements or members. One writes \(e \in S\) to indicate that \(e\) is an element of \(S\). The opposite is denoted \(e \notin S\): \(e\) is not an element of \(S\). The symbol \(\in\) thus indicates set membership.

Let \(W\) be the set of words in the string If John slept, then Mary left. Then it holds that \(\mathit{left} \in W\) and \(\mathit{right} \notin W\). But it is not the case that \(\mathit{then} \notin W\) or \(\mathit{awake} \in W\).

You might be wondering why we use the symbol \(\in\) for set membership. The following mnemonic, while historically inaccurate, may help you with remembering the notation: \(\in\) looks like a stylized E, and \(x \in S\) means that \(x\) is an Element of \(S\).

Sometimes \(\ni\) is used as the mirror image of \(\in\). For example, \(a \in S\) could also be written as \(S \ni a\).

Continuing the previous example, it is true that \(\mathit{left} \in W \ni \mathit{then}\). That is to say, both \(\mathit{left} \in W\) and \(\mathit{then} \in W\) are true.

Put \(\in\), \(\ni\), \(\notin\), \(\not\ni\) in the gaps below as appropriate:

\(5 \_ \left \{ 1,2,4,5,8 \right \}\)
\(6 \_ \left \{ 1,2,4,5,8 \right \}\)
\(\left \{ 5 \right \} \_ \left \{ 1,2,4,5,8 \right \}\)
\(5 \_ \left \{ 1,2,4,5,8 \right \} \_ 6\)

Sets can contain arbitrary objects, including other sets. However, the members of a set are just the elements immediately contained by the set, not what might in turn be contained inside of those elements.

The set \(\left \{ a, \left \{ b \right \} \right \}\) has two members: \(a\) and \(\left \{ b \right \}\). While \(b\) is an element of \(\left \{ b \right \}\), it is not an element of \(\left \{ a, \left \{ b \right \} \right \}\).

Put \(\in\), \(\ni\), \(\notin\), \(\not\ni\) in the gaps below as appropriate:

\(5 \_ \left \{ 1,\left \{ 2,4 \right \},5,8 \right \}\)
\(6 \_ \left \{ 1,\left \{ 2,4,5,8 \right \} \right \}\)
\(\left \{ 5 \right \} \_ \left \{ 1,\left \{ 2,4 \right \},\left \{ 5 \right \},8 \right \}\)
\(5 \_ \left \{ \left \{ 1,2,4,5,8 \right \} \right \} \_ 6\)

Lack of order

Even though we may write sets in a linear fashion as lists, they have no internal order. The set \(\left \{ a,b \right \}\) could also be written as \(\left \{ b,a \right \}\). So we have \(\left \{ a,b \right \} = \left \{ b,a \right \}\), and \(\left \{ a,b,c \right \} = \left \{ a,c,b \right \} = \left \{ b,a,c \right \} = \left \{ b,c,a \right \} = \left \{ c,a,b \right \} = \left \{ c,b,a \right \}\).

Consider the strings If John slept, then Mary left and If Mary left, then John slept. While they are clearly distinct sentences, their sets of words are identical.

For each one of the following, fill the gap with \(=\) or \(\neq\) as appropriate:

\(\left \{ a,b \right \} \_ \left \{ a,b \right \}\)
\(\left \{ b,a \right \} \_ \left \{ a,b \right \}\)
\(\left \{ b,a,c,d \right \} \_ \left \{ e,a,b,d \right \}\)

Lack of duplicates/Idempotency

Sets are idempotent, which means that duplicates are ignored. So \(\left \{ a,b \right \} = \left \{ a,a,b \right \} = \left \{ a,b,b,a,b,a,b,a,a \right \}\). It also holds that \(\left \{ a \right \} = \left \{ a,a \right \} = \left \{ a,a,a \right \}\), and so on.

Linguists distinguish between word types and word tokens. The sentence dogs love dogs contain two tokens of the type dogs, and one token of the type love. The sentences dogs love and dogs love dogs are different with respect to word tokens, but identical with respect to word types. So if you care about word types rather than word tokens, you’re dealing with a set because the only thing that matters is which words the text contains, not how many tokens of each word.

Consider the sentence If police police police, then police police police. Its set of words (ignoring capitalization) is \(\left \{ \text{if}, \text{police}, \text{then} \right \}\).

For each one of the following, fill the gap with \(=\) or \(\neq\) as appropriate:

\(\left \{ a,b \right \} \_ \left \{ a,a,b,b \right \}\)
\(\left \{ b,a \right \} \_ \left \{ a,b,a \right \}\)
\(\left \{ c,b,a,a,d,c \right \} \_ \left \{ a,a,b,d,c,c,c \right \}\)
\(\left \{ a \right \} \_ \left \{ a,a,a,a,a,a,c,a,a,a,a,a,a \right \}\)

The sentence If police police police, then police police police actually uses two different word types. It just just so happens that both are pronounced and spelled police. But one is the noun police, the other one the verb police. We can use the parts of speech N and V, respectively, to distinguish between the noun police and the verb police. Let us add parts of speech to all the words in the string (we use C for the complementizers if and then): If[C] police[N] police[V] police[N], then[C] police[N] police[V] police[N]. What would be the corresponding set of words for this string (i.e. where we now distinguish between police[N] and police[V])?

It is important to keep in mind that idempotency only holds with respect to the elements of a set, not what may be contained by those elements.

Even though \(\left \{ a \right \} = \left \{ a,a \right \}\), it is not the case that \(\left \{ a \right \} = \left \{ a, \left \{ a \right \} \right \}\). The former is the set containing \(a\), the other is the set containing \(a\) and the set of \(a\).

For each one of the following, fill the gap with \(=\) or \(\neq\) as appropriate:

\(\left \{ a,b \right \} \_ \left \{ a,a,b,\left \{ b \right \} \right \}\)
\(\left \{ \left \{ b \right \},\left \{ a \right \} \right \} \_ \left \{ \left \{ a \right \},\left \{ b \right \},\left \{ a \right \} \right \}\)
\(\left \{ c,b,\left \{ a,a \right \},d,c \right \} \_ \left \{ \left \{ a \right \},b,d,c,c,c \right \}\)
\(\left \{ a \right \} \_ \left \{ a,a,a,a,a,a,\left \{ a \right \},a,a,a,a,a,a \right \}\)

Recap

Sets are collections of arbitrary objects.
Sets are unordered and idempotent (= duplicates are ignored).
Sets can be defined with list notation, e.g. \(\left \{ a, b \right \}\).
The objects contained in a set are called its elements or members.
The symbols \(\in\) and \(\notin\) are used to indicate membership and non-membership, respectively.
Occasionally, \(\ni\) is used as the mirror image of \(\in\).
While sets may contain objects that are themselves collections of other objects, these objects inside objects are not considered for set membership. Remember: \(a \in \left \{ a \right \}\) and \(\left \{ a \right \} \in \left \{ \left \{ a \right \} \right \}\), but \(a \notin \left \{ \left \{ a \right \} \right \}\).