# Regular Languages

Mort Yao**Deterministic finite automaton (DFA).** A *deterministic finite automation* \(M\) is a 5-tuple \((Q, \Sigma, \delta, q_0, F)\), where

- \(Q\) is a finite set called the
*states*, - \(\Sigma\) is a finite set called the
*alphabet*, - \(\delta: Q \times \Sigma \to Q\) is the
*transition function*, - \(q_0 \in Q\) is the
*start state*(also called the*initial state*), - \(F \subseteq Q\) is the set of
*accept states*(also called*final states*).

We say that the DFA \(M = (Q, \Sigma, \delta, q_0, F)\) *accepts* a string \(w = a_1 \cdots a_n\) (where each \(a_i \in \Sigma\)) if and only if there exists a sequence of states \(r_0, \dots, r_n\) (where each \(r_i \in Q\)) such that:

- \(r_0 = q_0\),
- For every \(0 \leq i < n\), \(r_{i+1} = \delta(r_i, a_{i+1})\),
- \(r_n \in F\).

Otherwise, we say that the DFA \(M\) *rejects* the string \(w\).

\(L\) is the *language* of DFA \(M\), denoted as \(\mathcal{L}(M) = L\), if and only if \(L = \{w\ |\ w \text{ is a string accepted by } M\}\). We say that the DFA \(M\) *recognizes* the language \(L\).

**Regular language (RL).** A language \(L\) is called a *regular language* if there exists a DFA \(M\) that recognizes \(L\).

**Regular operations.** Given languages \(L_1\), \(L_2\) and \(L\), we define the *regular operations* as follows:

*Union*: \(L_1 \cup L_2 = \{ w\ |\ w \in L_1 \lor w \in L_2 \}\).*Concatenation*: \(L_1 \circ L_2 = \{ w_1w_2\ |\ w_1 \in L_1 \land w_2 \in L_2\}\).*Kleene star*: \(L^* = \{ w_1w_2 \dots w_k\ |\ k \geq 0 \land \forall i \in \{1, \dots, k\} : w_i \in L \}\).

**Theorem 1. (Closure properties)** The class of regular languages is closed under regular operations.

**Nondeterministic finite automaton (NFA).** A *nondeterministic finite automaton* is a 5-tuple \((Q, \Sigma, \delta, q_0, F)\), where

- \(Q\) is a finite set of states,
- \(\Sigma\) is a finite alphabet,
- \(\delta: Q \times \Sigma_\varepsilon \to \mathcal{P}(Q)\) is the transition function, (where \(\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}\))
- \(q_0 \in Q\) is the start state,
- \(F \subseteq Q\) is the set of accept states.

We say that the NFA \(M = (Q, \Sigma, \delta, q_0, F)\) *accepts* a string \(w\) if \(w\) may be written as \(w = a_1 \cdots a_n\) (where each \(a_i \in \Sigma_\varepsilon\)), and there exists a sequence of states \(r_0, \dots, r_n\) (where each \(r_i \in Q\)) such that:

- \(r_0 = q_0\),
- For every \(0 \leq i < n\), \(r_{i+1} \in \delta(r_i, a_{i+1})\),
- \(r_n \in F\).

Otherwise, we say that the NFA \(M\) *rejects* the string \(w\).

\(L\) is the language of NFA \(M\), denoted as \(\mathcal{L}(M) = L\), if and only if \(L = \{w\ |\ w \text{ is a string accepted by } M\}\). We say that the NFA \(M\) *recognizes* the language \(L\).

Clearly, every DFA has an equivalent NFA (that recognizes exactly the same language), by taking \(Q \times \Sigma \subset Q \times \Sigma_\varepsilon\) as the domain and \(Q \in \mathcal{P}(Q)\) as the range of the transition function.

**Theorem 2. (Rabin-Scott powerset construction)** Every NFA has an equivalent DFA.

**Corollary 3.** A language \(L\) is regular if and only if there exists an NFA \(M\) that recognizes \(L\).

**Regular expression (RE).** A *regular expression* \(R\) is defined as \[R ::= a\ |\ \varepsilon\ |\ \emptyset\ |\ (R_1 \cup R_2)\ |\ (R_1 \circ R_2)\ |\ (R_1^*)\] where \(a \in \Sigma\), \(R_1\) and \(R_2\) are regular expressions.

\(L\) is the language of regular expression \(R\), denoted as \(\mathcal{L}(R) = L\), if and only if \(L = \{ w\ |\ w \text{ is a string in the form of } R \}\).

**Theorem 4. (Kleeneās theorem)** A language \(L\) is regular if and only if there exists a regular expression \(R\) that describes \(L\).

**Theorem 5. (Pumping lemma)** If \(L\) is a regular language, then there is a number \(p\) (called the *pumping length*) such that if \(w \in L\) and \(|w| \geq p\), then \(w\) may be written as \(w = xyz\), under the following conditions:

- For every \(i \geq 0\), \(xy^iz \in L\),
- \(|y| > 0\),
- \(|xy| \leq p\).

**Theorem 6. (Myhill-Nerode theorem)** Let \(L\) be a language over \(\Sigma\). We say that strings \(x\) and \(y\) are *indistinguishable* by \(L\) if and only if for every string \(z\), we have both \(xz \in L\) and \(yz \in L\), or both \(xz \notin L\) and \(yz \notin L\), denoted as \(x \equiv_L y\). Then,

- \(L\) is regular if and only if the equivalence relation \(\equiv_L\) has a finite number of equivalence classes;
- There exists a DFA \(M\) with \(\mathcal{L}(M) = L\) having precisely one state for each equivalence class of \(\equiv_L\).