# Basic Formal Language Theory

Definition 1. (Alphabet) An alphabet $$\Sigma$$ is a well-defined set of symbols.

Example. $$\{\texttt{0},\texttt{1}\}$$, $$\{\texttt{0}, \texttt{1}, \dots, \texttt{9}\}$$, $$\{\texttt{a}, \texttt{b}, \dots, \texttt{z}\}$$, and $$\{\sqcup\}$$ are all valid alphabets.

Definition 2. (String) A string is a sequence of symbols.

Specifically, a string consisting of no symbol (i.e., a sequence of zero length) is called an empty string, denoted as $$\varepsilon$$.

Example. $$\texttt{1001}$$, $$\texttt{cat}$$, and $$\varepsilon$$ are all valid strings.

Definition 3. (Language) A language $$L$$ over the alphabet $$\Sigma$$ is any set of strings made up of symbols from $$\Sigma$$ (including the empty string $$\varepsilon$$ vacuously, if $$L$$ is nonempty).

Specifically, an empty set of strings is called an empty language, denoted as $$\varnothing$$. For every $$L \neq \varnothing$$, $$\varepsilon \in L$$ by definition.

The language of all strings over $$\Sigma$$ (including the empty string $$\varepsilon$$) is denoted as $$\Sigma^*$$, that is, $\Sigma^* = \{a^*\ |\ a \in \Sigma\}$

Example. $$\{\texttt{0},\texttt{1}\}^* = \{b^*\ |\ b \in \{\texttt{0}, \texttt{1}\}\} = \{\varepsilon,\texttt{0},\texttt{1},\texttt{00},\texttt{01},\texttt{10},\texttt{11},\texttt{000},\dots\}$$, $$\{\varepsilon, \texttt{0}, \texttt{1}\}$$, $$\{\varepsilon\}$$, and $$\varnothing$$ are all valid languages.

Note that every language $$L$$ over $$\Sigma$$ (including the empty language $$\varnothing$$) is a subset of $$\Sigma^*$$, that is, for every $$L$$ over $$\Sigma$$, $$L \subseteq \Sigma^* = \{a^*\ |\ a \in \Sigma\}$$.

If a language over $$\Sigma$$ contains every string of fixed length $$k$$, we denote the language as $$\Sigma^k$$.

Example. $$\{\texttt{0}, \texttt{1}\}^8$$ is the language of all 8-bit binary strings.

Definition 4. (Union) The union of two languages $$L_1$$ and $$L_2$$ is the language $L_1 \cup L_2 = \{ w\ |\ w \in L_1 \lor w \in L_2 \}$

Definition 5. (Intersection) The intersection of two languages $$L_1$$ and $$L_2$$ is the language $L_1 \cap L_2 = \{ w\ |\ w \in L_1 \land w \in L_2 \}$

Definition 6. (Complementation) The complement of a language $$L$$ over $$\Sigma$$ is the language $\overline{L} = \Sigma^* \setminus L$

Definition 7. (Concatenation) The concatenation of two languages $$L_1$$ and $$L_2$$ is the language $L_1 \circ L_2 = \{ w_1w_2\ |\ w_1 \in L_1 \land w_2 \in L_2 \}$

Additionally, we use $$L^k$$ to denote the language obtained by concatenating $$L$$ to itself $$k$$ times.

Definition 8. (Kleene star / Closure) The Kleene star or closure of a language $$L$$ is the language $L^* = \{\varepsilon\} \cup L \cup L^2 \cup L^3 \cup \cdots$