Parts of strings

  • strings (basic notation)

It is often important to refer to specific substructures of a string. The most important notion is that of substring, with the special cases of prefix and suffix. But for some applications, subsequences are also relevant.

Substrings

A substring is a continuous part of a string.

The string \(\mathit{abcd}\) has 11 substrings:

  • \(\varepsilon\)
  • \(\mathit{a}\)
  • \(\mathit{b}\)
  • \(\mathit{c}\)
  • \(\mathit{d}\)
  • \(\mathit{ab}\)
  • \(\mathit{bc}\)
  • \(\mathit{cd}\)
  • \(\mathit{abc}\)
  • \(\mathit{bcd}\)
  • \(\mathit{abcd}\)

Some authors like to write \(u \sqsubseteq v\) to indicate that \(u\) is a substring of \(v\).

Note that

  1. the empty string is a substring of every string, and
  2. every string is a substring of itself.

A substring \(u\) of \(v\) is a proper substring iff \(u \neq v\).

All the strings listed above are proper substrings of \(\mathit{abcd}\), except \(\mathit{abcd}\) itself.

If \(u\) is substring that spans from the very beginning of \(v\), we call it a prefix. And if \(u\) is a substring that spans to the end of \(v\), we call it a suffix. Make sure not to confuse these with the linguistic notions of prefix and suffix, which work very differently.

Among the strings listed above, all of the following are prefixes of \(\mathit{abcd}\):

  • \(\varepsilon\) (if you find this confusing, check the formal definition below)
  • \(\mathit{a}\)
  • \(\mathit{ab}\)
  • \(\mathit{abc}\)
  • \(\mathit{abcd}\)

And the all of the following are suffixes of \(\mathit{abcd}\):

  • \(\varepsilon\) (if you find this confusing, check the formal definition below)
  • \(\mathit{d}\)
  • \(\mathit{cd}\)
  • \(\mathit{bcd}\)
  • \(\mathit{abcd}\)

Substrings, prefixes, and suffixes are formally defined via concatenation.

Given \(\Sigma\)-strings \(u\) and \(v\), \(u\) is a substring of \(v\) (\(u \sqsubseteq v\)) iff there are \(x, y \in \Sigma^*\) such that \(v = x \cdot u \cdot y\). We furthermore call \(u\)

  • a proper substring iff \(u \neq v\),
  • a prefix iff \(y = \varepsilon\),
  • a suffix iff \(x = \varepsilon\).

For each one of the string pairs below, indicate whether the first string is a substring of the second string, a proper substring, or neither:

  • \(\mathit{a}\ \&\ \mathit{aaaa}\)
  • \(\mathit{a}\ \&\ \mathit{b}\)
  • \(\varepsilon\ \&\ \mathit{b}\)
  • \(\varepsilon\ \&\ \varepsilon\)
  • \(\mathit{aa}\ \&\ \mathit{abbbca}\)
  • \(\mathit{bc}\ \&\ \mathit{abbbca}\)
  • \(\mathit{cb}\ \&\ \mathit{abbbca}\)

For every string \(u\), there are two substrings that are both prefixes and suffixes of \(u\). What are they? For which string are these two substrings not distinct?

Subsequence

Whereas substrings must be continuous, subsequences are allowed to also be discontinuous. However, a subsequence need not be discontinuous.

The string \(\mathit{abcd}\) has 16 subsequences:

  • \(\varepsilon\)
  • \(\mathit{a}\)
  • \(\mathit{b}\)
  • \(\mathit{c}\)
  • \(\mathit{d}\)
  • \(\mathit{ab}\)
  • \(\mathit{ac}\)
  • \(\mathit{ad}\)
  • \(\mathit{bc}\)
  • \(\mathit{bd}\)
  • \(\mathit{cd}\)
  • \(\mathit{abc}\)
  • \(\mathit{abd}\)
  • \(\mathit{acd}\)
  • \(\mathit{bcd}\)
  • \(\mathit{abcd}\)

Note that \(\mathit{ca}\) is not a subsequence of \(\mathit{abcd}\), but it is a subsequence of \(\mathit{abcda}\).

List all distinct subsequences of the string \(\mathit{aaaa}\) (without duplicates).

Just like substrings, a subsequence \(u\) of \(v\) is proper iff \(u \neq v\).

The formal definition of subsequences is quite a bit more verbose than that of substrings. This is because the option of discontinuity requires the use of additional string variables that can be interleaved with the subsequence in order to obtain the original string.

Let \(v\) be a \(\Sigma\)-string and \(u \mathrel{\mathop:}=u_1 u_2 \cdots u_n\) a member of \(\Sigma^n\). Then \(u\) is a subsequence of \(v\) iff there are strings \(x_0, x_1, \ldots, x_{n+1} \in \Sigma^*\) such that

\[ v = x_0 \cdot u_1 \cdot x_1 \cdot u_2 \cdot x_2 \ldots \cdot u_n \cdot x_{n+1} \]

A subsequence \(u\) of \(v\) is proper iff \(u \neq v\).

For each one of the string pairs below, indicate whether the first string is a subsequence of the second string, a proper subsequence, or neither:

  • \(\mathit{a}\ \&\ \mathit{aaaa}\)
  • \(\mathit{a}\ \&\ \mathit{b}\)
  • \(\varepsilon\ \&\ \mathit{b}\)
  • \(\varepsilon\ \&\ \varepsilon\)
  • \(\mathit{aa}\ \&\ \mathit{abbbca}\)
  • \(\mathit{bc}\ \&\ \mathit{abbbca}\)
  • \(\mathit{cb}\ \&\ \mathit{abbbca}\)

Say whether the following is True or False: Every substring of some string \(s\) is also a subsequence of \(s\), but not the other way round. Justify your answer.

Recap