Brzozowski derivative

In theoretical computer science, in particular in formal language theory, the Brzozowski derivative $u^{- 1} S$ of a set $S$ of strings and a string $u$ is the set of all strings obtainable from a string in $S$ by cutting off the prefix $u$ . Formally:

u^{- 1} S = {v \in Σ^{*} ∣ u v \in S} .

For example,

c^{- 1} {cat, cow, dog} = {at, ow} .
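For a finite language, the example above can be reproduced directly from the definition. The following is a minimal sketch, assuming the language is stored as a Python set of strings. The function name `derivative` is chosen here for illustration and does not come from the literature.

```python
def derivative(u: str, language: set[str]) -> set[str]:
    """Return u^{-1} S for a finite language S given as a set of strings."""
    # Keep every string that starts with u, with that prefix cut off.
    return {w[len(u):] for w in language if w.startswith(u)}


print(derivative("c", {"cat", "cow", "dog"}))  # the set {'at', 'ow'} (order may vary)
```

Strings that do not start with the prefix contribute nothing, so `derivative("x", {"cat"})` is the empty set.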

The Brzozowski derivative has been introduced under various names since the late 1950s.^[1]^[2]^[3] Today it is named after the computer scientist Janusz Brzozowski, who investigated its properties and gave an algorithm to compute the derivative of a generalized regular expression.^[4]

Definition

Even though originally studied for regular expressions, the definition applies to arbitrary formal languages. Given any formal language $S$ over an alphabet $Σ$ and any string $u \in Σ^{*}$ , the derivative of $S$ with respect to $u$ is defined as:^[5]

u^{- 1} S = {v \in Σ^{*} ∣ u v \in S}

The Brzozowski derivative is a special case of left quotient by a singleton set containing only $u$ : $u^{- 1} S = {u} ∖ S$ .

Equivalently, for all $u, v \in Σ^{*}$ :

v \in u^{- 1} S \Leftrightarrow u v \in S .

From the definition, for all $u, v \in Σ^{*}$ :

(u v)^{- 1} S = v^{- 1} (u^{- 1} S)

since for all $w \in Σ^{*}$ , we have $w \in (u v)^{- 1} S \Leftrightarrow u v w \in S \Leftrightarrow v w \in u^{- 1} S \Leftrightarrow w \in v^{- 1} (u^{- 1} S)$ .

The derivative with respect to an arbitrary string reduces to successive derivatives over the symbols of that string, since for all $a \in Σ, u \in Σ^{*}$ : $\begin{matrix} (u a)^{- 1} S & = a^{- 1} (u^{- 1} S) \\ ε^{- 1} S & = S \end{matrix}$
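The reduction to single symbols amounts to a left fold over the string. A small sketch for finite languages, assuming they are stored as frozensets of strings (the names `derive_symbol` and `derive_string` are illustrative only):

```python
from functools import reduce


def derive_symbol(language: frozenset[str], a: str) -> frozenset[str]:
    """a^{-1} S for a single symbol a and a finite language S."""
    return frozenset(w[1:] for w in language if w[:1] == a)


def derive_string(language: frozenset[str], u: str) -> frozenset[str]:
    """u^{-1} S computed one symbol at a time, left to right."""
    # Folding over the empty string returns the language unchanged (ε^{-1} S = S).
    return reduce(derive_symbol, u, language)


S = frozenset({"cat", "cow", "dog"})
# (co)^{-1} S = o^{-1} (c^{-1} S) = {w}
assert derive_string(S, "co") == derive_symbol(derive_symbol(S, "c"), "o")
```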

A language $S \subseteq Σ^{*}$ is called nullable if and only if it contains the empty string $ε$. Each language $S$ is uniquely determined by the nullability of its derivatives:

w \in S \Leftrightarrow ε \in w^{- 1} S

A language can be viewed as a (potentially infinite) boolean-labelled tree (see also tree (set theory) and infinite-tree automaton). Each possible string $w \in Σ^{*}$ denotes a node in the tree, with label true when $w \in S$ and false otherwise. In this interpretation, the derivative with respect to a symbol $a$ corresponds to the subtree obtained by following the edge $a$ from the root. Decomposing a tree into the root and the subtrees $a^{- 1} S$ corresponds to the following equality, which holds for every language $S \subseteq Σ^{*}$ :

S = ({ε} \cap S) \cup ⋃_{a \in Σ} a (a^{- 1} S) .
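As a worked instance of this decomposition, take the alphabet $Σ = \{a, b\}$ and the finite language $S = \{ε, a, ab\}$ (both chosen here purely for illustration):

```latex
\begin{aligned}
\{ε\} \cap S &= \{ε\} \\
a^{-1} S &= \{ε, b\} \\
b^{-1} S &= \emptyset \\
(\{ε\} \cap S) \cup a\,(a^{-1} S) \cup b\,(b^{-1} S)
  &= \{ε\} \cup \{a, ab\} \cup \emptyset = \{ε, a, ab\} = S
\end{aligned}
```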

Derivatives of generalized regular expressions

When a language is given by a regular expression, the concept of derivatives leads to an algorithm for deciding whether a given word belongs to the language denoted by the regular expression.

Given a finite alphabet A of symbols,^[6] a generalized regular expression R denotes a possibly infinite set of finite-length strings over the alphabet A, called the language of R, denoted L(R).

A generalized regular expression can be one of the following (where a is a symbol of the alphabet A, and R and S are generalized regular expressions):

"∅" denotes the empty set: L(∅) = {},
"ε" denotes the singleton set containing the empty string: L(ε) = {ε},
"a" denotes the singleton set containing the single-symbol string a: L(a) = {a},
"R∨S" denotes the union of R and S: L(R∨S) = L(R) ∪ L(S),
"R∧S" denotes the intersection of R and S: L(R∧S) = L(R) ∩ L(S),
"¬R" denotes the complement of R (with respect to A*, the set of all strings over A): L(¬R) = A* \ L(R),
"RS" denotes the concatenation of R and S: L(RS) = L(R) · L(S),
"R*" denotes the Kleene closure of R: L(R*) = L(R)*.

In an ordinary regular expression, neither ∧ nor ¬ is allowed.
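The eight cases listed above map naturally onto an algebraic data type. One possible encoding is sketched below with Python dataclasses. All class and field names are assumptions made for this illustration.

```python
from dataclasses import dataclass


class Regex:
    """Base class for generalized regular expressions."""


@dataclass(frozen=True)
class Empty(Regex):    # ∅
    pass


@dataclass(frozen=True)
class Epsilon(Regex):  # ε
    pass


@dataclass(frozen=True)
class Symbol(Regex):   # a
    char: str


@dataclass(frozen=True)
class Or(Regex):       # R∨S
    left: Regex
    right: Regex


@dataclass(frozen=True)
class And(Regex):      # R∧S
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Not(Regex):      # ¬R
    inner: Regex


@dataclass(frozen=True)
class Concat(Regex):   # RS
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):     # R*
    inner: Regex


# (ab)* ∧ ¬ε: one or more repetitions of "ab"
example = And(Star(Concat(Symbol("a"), Symbol("b"))), Not(Epsilon()))
```

Frozen dataclasses compare structurally and are hashable, which is convenient when expressions are later used as dictionary keys or set members.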

Computation

For any given generalized regular expression R and any string u, the derivative u⁻¹R is again a generalized regular expression (denoting the language u⁻¹L(R)).^[7] It may be computed recursively as follows.^[8]

(ua)⁻¹R	= a⁻¹(u⁻¹R)	for a symbol a and a string u
ε⁻¹R	= R

Using the previous two rules, the derivative with respect to an arbitrary string reduces to derivatives with respect to single-symbol strings a. These can be computed as follows:^[9]

a⁻¹a	= ε
a⁻¹b	= ∅	for each symbol b≠a
a⁻¹ε	= ∅
a⁻¹∅	= ∅
a⁻¹(R*)	= (a⁻¹R)R*
a⁻¹(RS)	= (a⁻¹R)S ∨ ν(R)a⁻¹S
a⁻¹(R∧S)	= (a⁻¹R) ∧ (a⁻¹S)
a⁻¹(R∨S)	= (a⁻¹R) ∨ (a⁻¹S)
a⁻¹(¬R)	= ¬(a⁻¹R)
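As a worked illustration of these rules, consider the expression (ab)* (an example chosen here, with b a symbol different from a). It uses ν(a) = ∅, since the language of a single symbol does not contain ε:

```latex
\begin{aligned}
a^{-1}(ab) &= (a^{-1}a)\,b \lor ν(a)\,(a^{-1}b) = ε\,b \lor \emptyset\,\emptyset \\
a^{-1}\big((ab)^{*}\big) &= \big(a^{-1}(ab)\big)\,(ab)^{*}
  = (ε\,b \lor \emptyset\,\emptyset)\,(ab)^{*}
\end{aligned}
```

The result denotes the same language as b(ab)*. The rules themselves perform no simplification, so the computed expression is syntactically larger than this equivalent form.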

Here, $ν(R)$ is an auxiliary function yielding a generalized regular expression that evaluates to the empty string ε if R's language contains ε, and otherwise evaluates to ∅. This function can be computed by the following rules:^[10]

ν(a)	= ∅	for any symbol a
ν(ε)	= ε
ν(∅)	= ∅
ν(R*)	= ε
ν(RS)	= ν(R) ∧ ν(S)
ν(R ∧ S)	= ν(R) ∧ ν(S)
ν(R ∨ S)	= ν(R) ∨ ν(S)
ν(¬R)	= ε	if ν(R) = ∅
ν(¬R)	= ∅	if ν(R) = ε
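The rules for a⁻¹ and ν translate almost line by line into a recursive program. The sketch below makes several assumptions: expressions are encoded as tagged tuples such as `("cat", R, S)`; the names `nullable`, `derive` and `matches` are hypothetical; and ν is represented as a Boolean rather than as an expression. In the concatenation rule, the term ν(R)a⁻¹S is simply left out when ν(R) = ∅, which does not change the denoted language.

```python
EMPTY = ("empty",)  # ∅
EPS = ("eps",)      # ε


def nullable(r) -> bool:
    """True iff ν(r) = ε, i.e. the language of r contains the empty string."""
    tag = r[0]
    if tag in ("empty", "sym"):
        return False
    if tag in ("eps", "star"):
        return True
    if tag in ("cat", "and"):
        return nullable(r[1]) and nullable(r[2])
    if tag == "or":
        return nullable(r[1]) or nullable(r[2])
    if tag == "not":
        return not nullable(r[1])
    raise ValueError(f"unknown expression: {r!r}")


def derive(a: str, r):
    """a^{-1} r for a single symbol a."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "star":
        return ("cat", derive(a, r[1]), r)
    if tag == "cat":
        first = ("cat", derive(a, r[1]), r[2])
        # The term ν(R) a^{-1} S only contributes when R is nullable.
        return ("or", first, derive(a, r[2])) if nullable(r[1]) else first
    if tag in ("and", "or"):
        return (tag, derive(a, r[1]), derive(a, r[2]))
    if tag == "not":
        return ("not", derive(a, r[1]))
    raise ValueError(f"unknown expression: {r!r}")


def matches(r, word: str) -> bool:
    """Decide word ∈ L(r) by deriving symbol by symbol, then testing nullability."""
    for a in word:
        r = derive(a, r)
    return nullable(r)


AB_STAR = ("star", ("cat", ("sym", "a"), ("sym", "b")))  # (ab)*
print(matches(AB_STAR, "abab"), matches(AB_STAR, "aba"))  # True False
```

Because no simplification is applied, the intermediate expressions grow with the length of the input. Practical implementations usually normalize them after each step.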

Properties

A string u is a member of the string set denoted by a generalized regular expression R if and only if ε is a member of the string set denoted by the derivative u⁻¹R.^[11]

Considering all the derivatives of a fixed generalized regular expression R results in only finitely many different languages. If their number is denoted by d_R, all these languages can be obtained as derivatives of R with respect to strings of length less than d_R.^[12] Furthermore, there is a complete deterministic finite automaton with d_R states that recognises the regular language given by R, as stated by the Myhill–Nerode theorem.
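The finiteness property suggests building an automaton whose states are the derivatives themselves. The sketch below is restricted to ordinary regular expressions (no ∧ or ¬) for brevity and rests on stated assumptions: expressions are tagged tuples, unions are kept as flattened frozensets so that ∨ is treated as associative, commutative and idempotent, and the helper names `alt`, `cat` and `build_dfa` are hypothetical. Syntactic normalization does not detect every equality of languages, so in general the number of states found this way may exceed d_R.

```python
from collections import deque

EMPTY, EPS = ("empty",), ("eps",)


def alt(r, s):
    """R∨S with ∅ dropped and nested unions flattened into one set."""
    parts = set()
    for x in (r, s):
        parts |= x[1] if x[0] == "or" else {x}
    parts.discard(EMPTY)
    if not parts:
        return EMPTY
    return next(iter(parts)) if len(parts) == 1 else ("or", frozenset(parts))


def cat(r, s):
    """RS with the identities ∅S = R∅ = ∅ and εS = S applied."""
    if EMPTY in (r, s):
        return EMPTY
    return s if r == EPS else ("cat", r, s)


def nullable(r) -> bool:
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    if tag == "or":
        return any(nullable(x) for x in r[1])
    return False  # ∅ and single symbols


def derive(a, r):
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "star":
        return cat(derive(a, r[1]), r)
    if tag == "cat":
        first = cat(derive(a, r[1]), r[2])
        return alt(first, derive(a, r[2])) if nullable(r[1]) else first
    if tag == "or":
        result = EMPTY
        for part in r[1]:
            result = alt(result, derive(a, part))
        return result
    raise ValueError(f"unknown expression: {r!r}")


def build_dfa(r, alphabet):
    """Breadth-first exploration: each distinct normalized derivative is a state."""
    states, transitions, queue = {r}, {}, deque([r])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            target = derive(a, q)
            transitions[q, a] = target
            if target not in states:
                states.add(target)
                queue.append(target)
    accepting = {q for q in states if nullable(q)}
    return states, transitions, accepting


AB_STAR = ("star", ("cat", ("sym", "a"), ("sym", "b")))  # (ab)*
states, transitions, accepting = build_dfa(AB_STAR, "ab")
print(len(states))  # 3: (ab)*, b(ab)* and ∅
```

The start state is the original expression, and a state is accepting exactly when its expression is nullable.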

Derivatives of context-free languages

Derivatives are also effectively computable for recursively defined equations with regular expression operators, which are equivalent to context-free grammars. This insight was used to derive parsing algorithms for context-free languages.^[13] Implementations of such algorithms have been shown to have cubic time complexity,^[14] corresponding to the complexity of the Earley parser on general context-free grammars.

References

[6] Brzozowski (1964), p. 481, required A to consist of the 2ⁿ combinations of n bits, for some n.
[7] Brzozowski (1964), p. 483, Theorem 4.1
[8] Brzozowski (1964), p. 483, Theorem 3.2
[9] Brzozowski (1964), p. 483, Theorem 3.1
[10] Brzozowski (1964), p. 482, Definition 3.2
[11] Brzozowski (1964), p. 483, Theorem 4.2
[12] Brzozowski (1964), p. 484, Theorem 4.3

