
A (co)algebraic theory of succinct automata✩

Gerco van Heerdt (a), Joshua Moerman (a,b), Matteo Sammartino (a), Alexandra Silva (a)

(a) University College London
(b) Radboud University

arXiv:1905.05519v1 [cs.FL] 14 May 2019
Abstract
The classical subset construction for non-deterministic automata can be generalized to other side-effects captured by a monad. The key insight is that both the
state space of the determinized automaton and its semantics—languages over
an alphabet—have a common algebraic structure: they are Eilenberg-Moore algebras for the powerset monad. In this paper we study the reverse question to
determinization. We will present a construction to associate succinct automata
to languages based on different algebraic structures. For instance, for classical
regular languages the construction will transform a deterministic automaton
into a non-deterministic one, where the states represent the join-irreducibles of
the language accepted by a (potentially) larger deterministic automaton. Other
examples will yield alternating automata, automata with symmetries, CABA-structured automata, and weighted automata.
1. Introduction
Non-deterministic automata are often used to provide compact representations of regular languages. Take, for instance, the language
L = {w ∈ {a, b}* | |w| > 2 and the 3rd symbol from the right is an a}.
There is a simple non-deterministic automaton accepting it (below, top automaton) and it is not very difficult to see that the smallest deterministic automaton
(below, bottom automaton) will have 8 states.
[Figure: a non-deterministic automaton with states s1, s2, s3, s4: a self-loop on a, b at s1, followed by transitions s1 →a s2 →a,b s3 →a,b s4.]
✩ This work was partially supported by ERC starting grant ProFoundNet (679127) and a
Leverhulme Prize (PLP-2016-129).
[Figure: the minimal deterministic automaton with 8 states, labeled by subsets 1, 12, 13, 14, 123, 124, 134, 1234, and transitions on a and b.]
The labels we chose for the states of the deterministic automaton are not
coincidental—they represent the subsets of states of the non-deterministic automaton that would be obtained when constructing a deterministic one using
the classical subset construction.
The question we want to study in this paper has as its starting point precisely
the observation that non-deterministic automata provide compact representations of languages and hence are more amenable for use in algorithms and
promote scalability. In fact, the origin of our study goes back to our own work
on automata learning [15], where we encountered large nominal automata that,
in order for the algorithm to work for more realistic examples, had to be represented non-deterministically. In other recent work [7, 3], different forms of nondeterminism are used to learn compact representations of regular languages.
This left us wondering whether other side-effects could be used to overcome
scalability issues.
Moggi [16] introduced the idea that monads could be used as a general abstraction for side-effects. A monad is a triple (T, η, µ) in which T is an endofunctor
over a category whose objects can be thought of as capturing pure computations.
The monad is equipped with a unit η : X → T X, a natural transformation that
enables embedding any pure computation into an effectful one, and a multiplication µ : T T X → T X that allows flattening nested effectful computations.
Examples of monads capturing side-effects include powerset (non-determinism)
and distributions (randomness).
Monads have been used extensively in programming language semantics (see
e.g. [22] and references therein). More recently, they were used in categorical
studies of automata theory [6]. One example of a construction in which they
play a key role is a generalization of the classical subset construction to a class
of automata [21, 20], which we will describe next.
The classical subset construction, connecting non-deterministic and deterministic automata, can be described concisely by the following diagram.
        {−}                 l
   X --------> P(X) ---------------> 2^A
    \            |                    |
   δ \           | δ♯                 | ⟨ε?, ∂⟩
      v          v                    v
      2 × P(X)^A ---- id × l^A ----> 2 × (2^A)^A
We omit initial states and represent a non-deterministic automaton as a pair
(X, δ) where X is the state space and δ : X → 2 × P(X)^A is the transition
function which has in the first component the (non-)final state classifier. The
language semantics of a non-deterministic automaton (X, δ) is obtained by first
constructing a deterministic automaton (P(X), δ♯), which has a larger state space
consisting of subsets of the original state space, and then computing the accepted
language of the determinized automaton. The language map l associating the accepted language to a state is a universal map: for every deterministic automaton
(Q, Q → 2 × Q^A) the map l is the unique map into the automaton of languages
(2^A, ⟨ε?, ∂⟩ : 2^A → 2 × (2^A)^A).
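To make this concrete, the following sketch (ours, in Python; the function and variable names are our own) implements the classical subset construction the diagram describes: δ is extended to subsets of states, and acceptance is the first component of the transition structure.

```python
from itertools import chain

def determinize(delta, final):
    """Classical subset construction for an NFA.

    delta: dict mapping (state, symbol) to the set of successor states.
    final: set of accepting NFA states.
    Returns the DFA step function on frozensets (elements of P(X))
    and the acceptance test (the (non-)final state classifier).
    """
    def step(subset, symbol):
        # delta♯: the unique join-semilattice extension of delta to P(X)
        return frozenset(chain.from_iterable(
            delta.get((q, symbol), ()) for q in subset))

    def accepts(subset):
        return any(q in final for q in subset)

    return step, accepts

# The NFA from the introduction: the 3rd symbol from the right is an 'a'.
delta = {('s1', 'a'): {'s1', 's2'}, ('s1', 'b'): {'s1'},
         ('s2', 'a'): {'s3'}, ('s2', 'b'): {'s3'},
         ('s3', 'a'): {'s4'}, ('s3', 'b'): {'s4'}}
step, accepts = determinize(delta, {'s4'})

subset = frozenset({'s1'})           # initial subset {s1}
for symbol in 'abb':                 # read the word abb
    subset = step(subset, symbol)
print(sorted(subset), accepts(subset))   # ['s1', 's4'] True
```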
The universal property of the automaton of languages inspired the development of a categorical generalization of automata theory, including the subset
construction, which we detail below. In particular, we can consider general automata as pairs (X, t : X → F X) where the transition dynamics t is parametric on
a functor F. Such pairs are usually called coalgebras for the functor F [18]. For
a wide class of functors F, the category of coalgebras has a final object (Ω, ω),
the so-called final coalgebra, which plays a role analogous to that of languages.
The classical subset construction was generalized in previous work [21] by
replacing deterministic automata with coalgebras for a functor F and the powerset monad with a suitable monad T . As above, it can be summarized in a
diagram:
        η                l
   X --------> T X ------------> Ω
    \            |               |
   δ \           | δ♯            | ω
      v          v               v
      F T X ----- F l --------> F Ω
The monad T will be the structure we will explore to enable succinct representations. The crucial ingredient in generalizing the subset construction was the
observation that the target of the transition dynamics—2 × P(−)^A—and the
set of languages—2^A—both have a complete join-semilattice structure. This
enables one to define the determinized automaton as a unique lattice extension
of the non-deterministic one, and, moreover, the language map l preserves the
semantics: l({s1, s2}) = l({s1}) ∨ l({s2}).
This latter somewhat trivial observation was also exploited in the work of
Bonchi and Pous [8] in defining an efficient algorithm for language equivalence of
NFAs by using coinduction-up-to. Join-semilattices are precisely the Eilenberg-Moore algebras of the powerset monad, and one can show that if a functor has
a final coalgebra in Set, this can be lifted to the category of Eilenberg-Moore
algebras of a monad T (T-algebras). This makes it possible to construct the
more general diagram above, where the coalgebra structure is generalized using
a functor F and a monad T. The only assumptions for the existence of T-algebra
maps δ♯ and l are the existence of a final coalgebra for F in Set and that F T X
can be given a T-algebra structure.
In this paper we ask the reverse question—given a deterministic automaton,
if we assume the state space has a join-semilattice structure, can we build a corresponding succinct non-deterministic one? More generally, given an F -coalgebra
in the category of T -algebras, can we build a succinct F T -coalgebra in the base
category that represents the same behavior?
We will provide an abstract framework to understand this construction,
based on previous work by Arbib and Manes [4]. Our abstract framework relies on an alternative, more modern presentation of some of their results. Due to
our focus on set-based structures, we will conduct our investigation within the
category Set, which enables us to provide effective procedures. This does mean
that not all of the results due to Arbib and Manes will be given in their original
generality. We present a comprehensive set of examples that will illustrate the
versatility of the framework. We also discuss more algorithmic aspects that are
essential if the present framework is to be used as an optimization, for instance
as part of a learning algorithm.
After recalling basic facts about monads and structured automata in Section 2, the rest of this paper is organized as follows:
• In Section 3 we introduce a general notion of generators for a T-algebra,
and we show that automata whose state space forms a T-algebra—which we
call T-automata—admit an equivalent T-succinct automaton, defined over
generators. We also characterize minimal generators and give a condition
under which they are globally minimal in size.
• In Section 4 we give an effective procedure to find a minimal set of generators for a T -algebra, and we present an algorithm that uses that procedure
to compute the T -succinct version of a given T -automaton. The algorithm
works by first minimizing the T-automaton: the explicit algebraic structure
allows states that correspond to algebraic combinations of other states to
be detected, and then discarded when generators are computed.
• In Section 5 we show how the algorithm of Section 4 can be applied to
“plain” finite automata—without any algebraic structure—in order to derive an equivalent T -succinct automaton. We conclude with a result about
the compression power of our construction: it produces an automaton that
is at least as small as the minimal version of the original automaton.
• Finally, in Section 6 we give several examples, and in Section 7 we discuss
related and future work.
2. Preliminaries
Side-effects and different notions of non-determinism can be conveniently
captured as a monad T on a category C. A monad T = (T, µ, η) is a triple
consisting of an endofunctor T on C and two natural transformations: a unit
η : Id ⇒ T and a multiplication µ : T² ⇒ T. They satisfy the following laws:

    µ ◦ ηT = id = µ ◦ T η        and        µ ◦ µT = µ ◦ T µ.
An example is the triple (P, {−}, ⋃), where P denotes the powerset functor on
Set that assigns to each set the set of all its subsets, {−} is the function that
returns a singleton set, and ⋃ is just union of sets.
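Written out concretely on finite sets, the powerset triple looks as follows (a minimal sketch of ours, not taken from the paper); the two unit laws can then be spot-checked on examples.

```python
def unit(x):
    """η : X → P(X), the singleton map {−}."""
    return frozenset({x})

def mult(family):
    """µ : P(P(X)) → P(X), union of a set of sets."""
    return frozenset().union(*family)

def pmap(f, subset):
    """P(f) : P(X) → P(Y), direct image of a subset."""
    return frozenset(f(x) for x in subset)

# Spot-check the unit laws µ ◦ η = id = µ ◦ P(η) on one subset:
s = frozenset({1, 2, 3})
assert mult(unit(s)) == s            # µ(η_{P(X)}(s)) = s
assert mult(pmap(unit, s)) == s      # µ(P(η_X)(s)) = s
```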
Given a monad T, the category C^T of Eilenberg-Moore algebras over T, or
simply T-algebras, has as objects pairs (X, h) consisting of an object X, called
carrier, and a morphism h : T X → X such that h ◦ µX = h ◦ T h and h ◦ ηX = idX.
A T-homomorphism between two T-algebras (X, h) and (Y, k) is a morphism
f : X → Y such that f ◦ h = k ◦ T f.

We will often refer to a T-algebra (X, h) as X if h is understood or if its
specific definition is irrelevant. Given an object X, (T X, µX) is a T-algebra
called the free T-algebra on X. Given an object U and a T-algebra (V, v), there
is a bijective correspondence between T-algebra homomorphisms T U → V and
morphisms U → V: for a T-algebra homomorphism f : T U → V, define f† =
f ◦ η : U → V; for a morphism g : U → V, define g♯ = v ◦ T g : T U → V. Then
g♯ is a T-algebra homomorphism called the free T-extension of g, and we have

    f†♯ = f        and        g♯† = g.        (1)

Furthermore, for all objects S and morphisms h : S → U,

    g♯ ◦ T h = (g ◦ h)♯.        (2)
Example 2.1. For the monad P the associated Eilenberg-Moore category is
the category of (complete) join-semilattices. Given a set X, the free P-algebra
on X is the join-semilattice (P(X), ⋃) of subsets of X with the union operation
as join.
Although some results are completely abstract, the central definition of minimal generators in Section 3 is specific to monads T on the category Set. Therefore we restrict ourselves to this setting. More precisely, we consider automata
over a finite alphabet A with outputs in a set O. In order to define automata
in Set^T as (pointed) coalgebras for the functor O × (−)^A, we need to lift this
functor from Set to Set^T. Such a lifting corresponds to a distributive law of T
over O × (−)^A [see e.g., 13]. A distributive law of the monad T over a functor
F : Set → Set is a natural transformation ρ : T F ⇒ F T satisfying ρ ◦ ηF = F η
and F µ ◦ ρT ◦ T ρ = ρ ◦ µF. In most examples we will define a T-algebra
structure β : T O → O on O, which is well known to induce a distributive law
ρ : T(O × (−)^A) ⇒ O × T(−)^A given by

    ρX = T(O × X^A) --⟨T π1, T π2⟩--> T O × T(X^A) --β × ρ′X--> O × T(X)^A    (3)

for any set X, where ρ′X(U)(a) = T(λf : A → X. f(a))(U). In general, we assume
an arbitrary distributive law ρ : T(O × (−)^A) ⇒ O × T(−)^A, which gives us the
following notion of automaton.
Definition 2.2 (T-automaton). A T-automaton is a triple (X, i : 1 → X, δ : X →
O × X^A), where X is an object of Set^T denoting the state space of the automaton, i is a function designating the initial state, and δ is a T-algebra map
assigning an output and transitions to each state.

Notice that the initial state map i : 1 → X in the above definition is not
required to be a T-algebra map. However, it corresponds to the T-algebra map
i♯ : T 1 → X. Thus, a T-automaton is an automaton in Set^T.
The functor F(X) = O × X^A has a final coalgebra in Set^T [12] that can be
used to define the language accepted by a T-automaton.

Definition 2.3 (Language accepted). Given a T-automaton (X, i : 1 → X, δ : X →
O × X^A), the language accepted by X is l ◦ i : 1 → O^A, where l is the final
coalgebra map. In the diagram below, ω is the final coalgebra.

         i              l
    1 -------> X ---------------> O^A
               |                   |
             δ |                   | ω
               v                   v
          O × X^A -- id × l^A --> O × (O^A)^A

    ω(ϕ) = (ϕ(ε), λa. λw. ϕ(aw))
    l(x)(ε) = π1(δ(x))
    l(x)(aw) = l(π2(δ(x))(a))(w)

We use ε to denote the empty word.
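The two recursive equations for l translate directly into a program. As a sketch (ours), here is the language map of a deterministic automaton with outputs (a Moore machine), evaluated on a single word:

```python
def language(output, transition, state, word):
    """l(x)(ε) = π1(δ(x));  l(x)(aw) = l(π2(δ(x))(a))(w)."""
    if not word:
        return output[state]
    return language(output, transition, transition[(state, word[0])], word[1:])

# Two states over A = {a}, outputs in O = {0, 1}: the parity of |w|.
out = {'even': 1, 'odd': 0}
trans = {('even', 'a'): 'odd', ('odd', 'a'): 'even'}
print(language(out, trans, 'even', 'aa'))   # 1
```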
If the monad T is finitary, then the category Set^T is locally finitely presentable, and hence it admits (strong epi, mono)-factorizations [2]. As in [4], we
use these factorizations to quotient the state space of an automaton under language equivalence. The transition structure, γ, is obtained by diagonalization
via the factorization system. Diagrammatically:

        i           e              m
   1 -------> X ---------> M -------------> O^A
              |            |                 |
            δ |            | γ               | ω
              v            v                 v
         O × X^A -id×e^A-> O × M^A -id×m^A-> O × (O^A)^A        (4)

Here the epi e and mono m are obtained by factorizing the final coalgebra map
l : X → O^A, and j = e ◦ i : 1 → M. We call the quotient automaton (M, j, γ)
the observable quotient of (X, i, δ).
3. T -succinct automata
Given a T-automaton X = (X, i, δ), our aim is to obtain an equivalent
automaton in Set with transition function Y → O × T(Y)^A, where Y is smaller
than X.¹ The key idea is to find generators for X. Our definition of generators
is equivalent to the definition of a scoop due to Arbib and Manes [4, Section 7,
Definition 8].
Definition 3.1 (Generators for an algebra). We say that a set G is a set of
generators for a T-algebra X whenever there exists a function g : G → X such
that g♯ : T G → X is a split epi in Set.

The intuition of requiring a split epi is that every element of X can now be
decomposed into a "combination" (defined by T) of elements of G. We show two
simple results on generators, which will allow us to find initial sets of generators
for a given T-algebra.
Lemma 3.2. The carrier of any T-algebra X is a set of generators for it.

Proof. Let χ : T X → X be the T-algebra structure on X. Then idX satisfies
(idX)♯ = χ, and χ is a split epi because it is required to satisfy χ ◦ ηX = idX.

Lemma 3.3. Any set X is a set of generators for the free T-algebra T X.

Proof. Follows directly from the fact that ηX : X → T X satisfies (ηX)♯ = id_{T X}.
Once we have a set of generators G for X, we can define an equivalent free
representation of X , that is, an automaton whose state space is freely generated
from G.
1 Here, we are abusing notation and using O and A for both the objects in SetT and in the
base category Set. In particular, we use T C to also denote the free T -algebra over C.
Proposition 3.4 (Free representation of an automaton [4, Section 7, Proposition 9]). The free algebra T G forms the state space of an automaton equivalent
to X .
Proof. Let g : G → X witness G being a set of generators for X and let s : X →
T G be a right inverse of g♯. Recall that X = (X, i, δ) and define

    j = 1 --i--> X --s--> T G
    γ = G --g--> X --δ--> O × X^A --id × s^A--> O × (T G)^A
Then (T G, j, γ♯) is an automaton. We will show that g♯ : T G → X is an automaton homomorphism. We have g♯ ◦ j = g♯ ◦ s ◦ i = i. Writing F for the
functor O × (−)^A and χ for the T-algebra structure on X, note first that
unfolding the definitions gives γ♯ = F µ ◦ ρ_{T G} ◦ T F s ◦ T δ ◦ T g and
g♯ = χ ◦ T g. We then have the following chain of equalities:

    δ ◦ g♯ = δ ◦ χ ◦ T g                                 (definition of g♯)
           = F χ ◦ ρX ◦ T δ ◦ T g                        (1: δ is a T-algebra homomorphism)
           = F χ ◦ F T g♯ ◦ F T s ◦ ρX ◦ T δ ◦ T g       (s is right inverse to g♯)
           = F χ ◦ F T g♯ ◦ ρ_{T G} ◦ T F s ◦ T δ ◦ T g  (2: naturality of the distributive law ρ)
           = F g♯ ◦ F µ ◦ ρ_{T G} ◦ T F s ◦ T δ ◦ T g    (3: g♯ is a T-algebra homomorphism)
           = F g♯ ◦ γ♯.

We conclude that g♯ is an automaton homomorphism, which using the finality in
Definition 2.3 implies that (T G, j, γ♯) accepts the same language as X.
The state space T G of this free representation can be extremely large. Fortunately, the fact that T G is a free algebra allows for a much more succinct
version of this automaton.
Definition 3.5 (T-succinct automaton). Given an automaton of the form
(T X, i, δ), where T X is the free T-algebra on X, the corresponding T-succinct
automaton is the triple (X, i, δ ◦ η). The language accepted by the T-succinct
automaton is the language l ◦ i accepted by (T X, i, δ):

             i                  l
        1 -------> T X ----------------> O^A
                  ↗ |                     |
               η /  | δ                   | ω
                /   v                     v
   X -- δ◦η --> O × (T X)^A -id × l^A--> O × (O^A)^A
The goal of our construction is to build a T-succinct automaton from a set
of generators that is minimal in a way that we will define now. In what follows
we use the following piece of notation: if U and V are sets such that
U ⊆ V, then we write ι^U_V for the inclusion map U → V.
Definition 3.6 (Minimal generators). Given a T-algebra X and a set of generators G for X witnessed by g : G → X, we say that r ∈ G is redundant if there
exists a U ∈ T(G \ {r}) satisfying (g ◦ ι^{G\{r}}_G)♯(U) = g(r); all other elements
are said to be isolated [4]². We call G a minimal set of generators for X if G
contains no redundant elements.
A minimal set of generators is not necessarily minimal in size. However,
under certain conditions this is the case. The following result was mentioned
but not proved by Arbib and Manes [4], who showed that its conditions are
satisfied for any finitely generated P-algebra. We note that these conditions do
not apply (in general) to any of the further examples in Section 6.
Proposition 3.7. If a T-algebra X is generated by the isolated elements I of
the set of generators X (Lemma 3.2) with their inclusion map ι^I_X and I is finite,
then there is no set of generators for X smaller than I, and every minimal set
of generators for X has the same size as I.

Proof. Let g : G → X be a set of generators for X, and assume towards a contradiction that G is smaller than I. Then there must be an i ∈ I such that there
is no v ∈ G satisfying g(v) = i. Let g′ : G → X \ {i} be pointwise equal to
g. Because g♯ is a split epi and thus surjective, there is a U ∈ T G such that
g♯(U) = i. Note that by (2),

    g♯ = (ι^{X\{i}}_X ◦ g′)♯ = T G --T(g′)--> T(X \ {i}) --(ι^{X\{i}}_X)♯--> X.

Then (ι^{X\{i}}_X)♯(T(g′)(U)) = i, contradicting the fact that i is isolated in
the full set of generators X. Thus, G cannot be smaller than I. In fact, we see
that for every i ∈ I there is a v ∈ G satisfying g(v) = i. This yields a function
h : I → G such that g ◦ h = ι^I_X.

Suppose G is a minimal set of generators, and take any v ∈ G not in the
image of h. We will show that v is redundant in G. Since I constitutes a set of
generators for X, there exists a U ∈ T I such that (ι^I_X)♯(U) = g(v). Then

    g♯(T(h)(U)) = (g ◦ h)♯(U) = (ι^I_X)♯(U) = g(v).

It follows that v is redundant in G, which contradicts G being minimal. Therefore, h is surjective and G has the same size as I.
4. T -minimization
In this section we describe a construction to compute a “minimal” succinct
T -automaton equivalent to a given T -automaton. This crucially relies on a procedure that finds a minimal set of generators by removing redundant elements
one by one. All that needs to be done for specific monads is determining whether
an element is redundant.
2 Arbib and Manes [4] define isolated elements only for the full set X rather than relative
to a set of generators for X. Our refinement plays an important role in finding a minimal set
of generators.
Proposition 4.1 (Generator reduction). Given a T -algebra X and a set of
generators G for X, if r ∈ G is redundant, then G \ {r} is a set of generators
for X.
Proof. Let G′ = G \ {r} and let g′ : G′ → X be the restriction of g : G → X to
G′. Since r is redundant, there is a U ∈ T(G′) such that g′♯(U) = g(r). Define
e : G → T(G′) by

    e(x) = U       if x = r
    e(x) = η(x)    if x ≠ r.

We will show that g′♯ ◦ e = g. Consider any x ∈ G. If x = r, then

    g′♯(e(x)) = g′♯(e(r)) = g′♯(U) = g(r) = g(x).

If x ≠ r, then, using (1),

    g′♯(e(x)) = g′♯(η(x)) = g′♯†(x) = g′(x) = g(x).

Let χ : T X → X be the algebra structure on X and take any right inverse
s : X → T G of g♯. Then

    g′♯ ◦ e♯ ◦ s = g′♯ ◦ µ ◦ T e ◦ s      (definition of e♯)
                 = χ ◦ T(g′♯) ◦ T e ◦ s   (g′♯ is a T-algebra homomorphism)
                 = χ ◦ T(g′♯ ◦ e) ◦ s     (functoriality of T)
                 = χ ◦ T g ◦ s            (g′♯ ◦ e = g as shown above)
                 = g♯ ◦ s                 (definition of g♯)
                 = idX                    (s is right inverse to g♯).

We thus see that e♯ ◦ s is right inverse to g′♯, which means that G′ is a set of
generators for X.
If we determine that an element is isolated, there is no need to check this
again later when the set of generators has been reduced. This is thanks to the
following result.
Proposition 4.2. If g : G → X and g′ : G′ → X are sets of generators for a
T-algebra X such that G′ ⊆ G and g′ is the restriction of g to the domain G′, then
whenever an element r ∈ G′ is isolated in G, it is also isolated in G′.

Proof. We will show that elements redundant in G′ are also redundant in G.
If r ∈ G′ is redundant in G′, then there exists U ∈ T(G′ \ {r}) such that
(g′ ◦ ι^{G′\{r}}_{G′})♯(U) = g′(r). Note that g′ = g ◦ ι^{G′}_G. We have

    (g ◦ ι^{G\{r}}_G)♯(T(ι^{G′\{r}}_{G\{r}})(U)) = (g ◦ ι^{G\{r}}_G ◦ ι^{G′\{r}}_{G\{r}})♯(U)    (by (2))
                                                 = (g ◦ ι^{G′\{r}}_G)♯(U)
                                                 = (g′ ◦ ι^{G′\{r}}_{G′})♯(U)
                                                 = g′(r)
                                                 = g(r),

so r is redundant in G.
Finally, taking the observable quotient M of a T-automaton Q preserves
generators, considering that the T-automaton homomorphism m : Q → M is a
split epi in Set under the axiom of choice.

Proposition 4.3. If Q and M are T-algebras, m : Q → M is a T-algebra
homomorphism that is a split epi in Set, and g : G → Q is a set of generators for
Q, then m ◦ g : G → M is a set of generators for M.

Proof. Let a : T Q → Q be the T-algebra structure on Q and b : T M → M the
one on M. We have

    (m ◦ g)♯ = b ◦ T(m ◦ g) = b ◦ T(m) ◦ T(g) = m ◦ a ◦ T(g) = m ◦ g♯

using that m is a T-algebra homomorphism. It is well known that compositions
of split epis are split epis themselves, so G is a set of generators for M.
Now we are ready to define the construction that builds a T -succinct automaton accepting the same language as a T -automaton.
Construction 4.4 (T -minimization). Starting from a T -automaton (X, i, δ),
where X has a finite set of generators, we execute the following steps.
1. Take the observable quotient (M, i0 , δ0 ) of (X, i, δ).
2. Compute a minimal set of generators G of M by starting from the full set
M and applying Proposition 4.1.
3. Compute and return the corresponding T -succinct automaton as defined
in Definition 3.5 via Proposition 3.4.
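As a generic skeleton (our sketch, with assumed interfaces: `observable_quotient` stands for step 1, `is_redundant` for the monad-specific redundancy check discussed next, and `.states` for the carrier of the quotient), the construction might be organized as follows.

```python
def t_minimize(automaton, observable_quotient, is_redundant):
    """Skeleton of Construction 4.4.

    observable_quotient(automaton): merges language-equivalent states
        (step 1) and returns the quotient with its carrier .states.
    is_redundant(r, rest, quotient): monad-specific test whether r is
        an algebraic combination of the generators in rest (step 2).
    """
    m = observable_quotient(automaton)
    generators = set(m.states)            # start from the full carrier M
    for r in list(generators):
        if is_redundant(r, generators - {r}, m):
            generators.discard(r)         # Proposition 4.1: the rest still generates
        # otherwise r is isolated; by Proposition 4.2 it stays isolated
        # as the set shrinks, so it never needs to be re-checked
    return m, generators                  # step 3 builds the succinct automaton on these
```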
Generic minimization algorithms have been proposed in the literature. For
example, Adámek et al. give a general procedure to compute the observable
quotient [1], and König and Küpper provide a generic partition refinement algorithm for coalgebras, with a focus on instantiations to weighted automata [14].
None of these works provide any complexity analysis. Recently, Dorsch et al. [11]
have presented a coalgebraic Paige-Tarjan algorithm and provided a complexity analysis for a class of functors in categories with image-factorization. These
restrictions match well with the ones we make, and therefore their algorithm could
be applied in our first step. Given a finite set of generators G, the loop in the
second step involves considering each element of G and checking whether it is
redundant. If so, we remove the element from G and continue the loop.
The redundancy check is the only part for which computability needs to be
determined in each specific setting.
Example 4.5 (Join-semilattices). We give an example of the construction in
the category JSL of complete join-semilattices. We start from a minimal P-automaton (in JSL) that has 4 states and is depicted below on the left. The
dashed blue lines indicate the JSL structure.
[Figure: on the left, the minimal 4-state P-automaton in JSL with states ⊥, x, y, z and transitions on a and b; on the right, the corresponding 2-state non-deterministic automaton with states x and y.]
Since the automaton is minimal, it is isomorphic to its observable quotient.
We start from the full set of generators {⊥, x, y, z}. Note that z is the union
of x and y, so we can eliminate it. Additionally, ⊥ is the empty union and can
be removed as well. Both x and y are isolated elements and form the unique
minimal set of generators G = {x, y} (see the remark above Proposition 3.7).
These are exactly the join-irreducibles of M . They induce by Proposition 3.4
an automaton (T G, j, γ), where γ is the same transition structure as the above
automaton, but with {x, y} substituted for z; the initial state is the singleton set
{x}. The P-succinct automaton corresponding to this minimal set of generators
(Definition 3.5) is the non-deterministic automaton shown on the right.
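For the powerset monad the redundancy check is concrete: a generator is redundant exactly when it is the join of the other generators lying below it. A small sketch (ours) reproduces the reduction above, encoding the four JSL elements as sets with union as join (⊥ = ∅, z = x ∨ y):

```python
from functools import reduce

def minimal_generators(elements):
    """Remove redundant elements of a finite join-semilattice whose
    elements are encoded as frozensets with union as join."""
    gens = set(elements)
    for r in list(gens):
        below = [g for g in gens if g != r and g <= r]     # generators below r
        if reduce(frozenset.union, below, frozenset()) == r:
            gens.discard(r)                                # r is a join of others
    return gens

bot, x, y, z = frozenset(), frozenset('x'), frozenset('y'), frozenset('xy')
print(minimal_generators({bot, x, y, z}))
# {frozenset({'x'}), frozenset({'y'})}: the join-irreducibles x and y
```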
Note that the automaton defined in Proposition 3.4 depends
on the right inverse chosen for the extension of the generator map. When the
original JSL automaton is reachable (every state is reached by some set of
words, where a set of words reaches the join of the states reached by the words it
contains), this right inverse may be chosen in such a way as to recover the canonical
residual finite state automaton (RFSA), as well as the simplified canonical RFSA,
both due to Denis et al. [10]. Details are given in [23]. See [17] for conditions
under which the canonical RFSA, referred to as the jiromaton, is a state-minimal
NFA.
5. Main construction
In this section we present the main construction of the paper. Given a finite
automaton (X, i, δ) in Set, i.e., an automaton where X is finite, this construction
builds an equivalent T -succinct automaton.
The first step is taking the reachable part R of X and converting this automaton into a T -automaton recognising the same language.
Proposition 5.1. Let (T R, î, δ̂) be the T-automaton defined as follows, with
î = ηR ◦ i and δ̂ = ((id × ηR^A) ◦ δ)♯:

        i              ηR
   1 -------> R --------------> T R
              |                  |
            δ |                  | δ̂
              v                  v
         O × R^A -id × ηR^A--> O × T(R)^A

Then (R, i, δ) and (T R, î, δ̂) accept the same language.
Proof. The diagram above means that ηR is a coalgebra homomorphism, and
as such it preserves language. Explicitly: x ∈ R accepts the same language as
ηR (x), which in particular holds for i(⋆) and î(⋆).
Now we can T -minimize (T R, î, δ̂) (Construction 4.4), which yields an equivalent T -automaton. Notice that, R being finite, any quotient of T R has a finite
set of generators. This is a consequence of R being a set of generators for T R
(Lemma 3.3) and of generators being preserved by quotients (Proposition 4.3).
It follows that every step of the T -minimization construction terminates.
Proposition 5.2. The T -succinct automaton defined above is at least as small
as the minimal deterministic automaton equivalent to X.
Proof. The situation is summed up in the following commutative diagram, in
which the diagonal g is the generator map:

        ⊆         η          e           m
   G ------> R ------> T R ------> M ------> O^A
    \_____________________________↗
                   g

Here G is the final minimal set of generators for M resulting from the construction. Commutativity follows from G being a subset of the set of generators
R.
The minimal deterministic automaton equivalent to X is obtained from R
by merging language-equivalent states. Recalling (4) and the proof of Proposition 5.1, we see that e ◦ ηR is a coalgebra homomorphism. Together with
commutativity of the above diagram, this means that the language accepted by
r ∈ G (seen as a state of R) is given by (m ◦ g)(r). Since G is a subset of R,
to show that G is at least as small as the minimal deterministic automaton, we
only have to show that different states in G accept different languages. That is,
we will show that m ◦ g is injective. We know that m is injective by definition;
to see that g is injective, consider r1, r2 ∈ G such that g(r1) = g(r2). Then
g(r1) = g(r2) = g♯(η(r2)). Assuming r1 ≠ r2 leads to the contradiction that G
is not a minimal set of generators because in this case η(r2) ∈ T(G \ {r1}).
Computing the determinization T R is an expensive operation that only terminates if T preserves finite sets. One could devise an optimized version of
Construction 4.4 in which the determinization is not computed completely in
order to minimize it. Instead, we could choose to work with data structures as
Böllig et al. [7] did for non-deterministic automata, and which we generalized
in recent work [23]. In these papers, partial representations of the determinized
automaton are used in an iterative process to compute the generators of the
state space of the minimal one.
6. Examples
6.1. Monads preserving finite sets
If T preserves finite sets, then there is a naive method to find a redundant
element: assuming a finite set of generators G for a T-algebra X, the set T(G \
{r}) is also finite for any r ∈ G. Thus, we can loop over all U ∈ T(G \ {r}) and
check if the generator map g : G → X satisfies g♯(U) = g(r).
6.1.1. Alternating automata.
We now use our construction to get small alternating finite automata (AFAs)
over a finite alphabet A. AFAs generalize both non-deterministic and universal
automata, where the latter are the dual of non-deterministic automata: a word
is accepted when all paths reading it are accepting. In an AFA, reading a symbol
leads to a DNF formula (without negation) of next states.
We use the characterization of alternating automata due to Bertrand [5].
Given a partially ordered set (P, ≤), an upset is a subset U of P such that
whenever x ∈ U and x ≤ y, then y ∈ U . Given Q ⊆ P , we write ↑Q for the
upward closure of Q, that is the smallest upset of P containing Q. We consider
[Figure 1: Automata for the language {a, ba, bb, baa}. (a) A deterministic automaton with states q0, q1, q2, q3, q4. (b) A small corresponding AFA with states q0, q1, q2, in which a black square represents a conjunction of next states.]
the monad TAlt that maps a set X to the set of all upsets of P(X). Its unit is
given by ηX(x) = ↑{{x}} and its multiplication by

    µX(U) = {V ⊆ X | ∃W ∈ U. ∀Y ∈ W. ∃Z ∈ Y. Z ⊆ V}.

The sets of sets in TAlt(X) can be seen as DNF formulae over elements of X:
the outer powerset is interpreted disjunctively and the inner one conjunctively.
Accordingly, we define an algebra structure β : TAlt(2) → 2 on the output set
2 by letting β(U) = 1 if {1} ∈ U and β(U) = 0 otherwise. Recall from (3) in
Section 2 that such an algebra structure induces a distributive law.
We now explicitly spell out the T-minimization algorithm that turns a DFA
(X, i, δ) into a TAlt-succinct AFA.
1. Compute the reachable states R of (X, i, δ) via a standard visit of its
graph.
2. Compute the corresponding freely-generated TAlt-automaton (TAlt R, î, δ̂),
by generating all DNF formulae TAlt R on R.
3. Compute the observable quotient (M, i0, δ0) of (TAlt R, î, δ̂) via a standard minimization algorithm, such as the coalgebraic Paige-Tarjan algorithm [11].
4. Compute a minimal set of generators for M as follows. Consider the generator map idM : M → M, for which we have that id♯ is the algebra map
of M. Pick r ∈ M, and iterate over all DNF formulae ϕ over M \ {r}; if
there is a ϕ which is mapped to r by the algebra map of M (i.e., id♯), then r is
redundant and can be removed from M. Repeat until no more elements
are removed from M, which yields a minimal set of generators G.
5. Return the corresponding TAlt-succinct automaton on the state space G,
as in Definition 3.5 via Proposition 3.4.
Note that every step of this algorithm terminates, as X is finite and the size of
TAlt R is 2^(2^|R|).
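To illustrate the acceptance condition, a configuration of the AFA can be represented as a DNF formula, i.e. a set of conjunctive clauses over states; β then accepts iff some clause consists of accepting states only. A sketch (ours; the states and final set below are hypothetical):

```python
def eval_dnf(formula, truth):
    """Evaluate a DNF formula (a set of clauses, each a frozenset of
    states) under a valuation truth: state -> bool. Upward closure
    does not affect the result, so succinct formulae suffice."""
    return any(all(truth[q] for q in clause) for clause in formula)

final = {'q2'}                                         # hypothetical accepting states
config = {frozenset({'q0', 'q1'}), frozenset({'q2'})}  # (q0 ∧ q1) ∨ q2
truth = {q: q in final for q in ('q0', 'q1', 'q2')}
print(eval_dnf(config, truth))   # True: the clause {q2} is all-accepting
```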
Example 6.1. Consider the regular language over A = {a, b} given by the
finite set {a, ba, bb, baa}. The minimal DFA accepting this language is given in
Figure 1a.
According to our construction, we first construct a TAlt -automaton with
state space freely generated from this automaton (which is already reachable).
Then we TAlt -minimize it in order to obtain a small AFA. In this case, there is
a unique minimal subset of 3 generators: G = {q0 , q1 , q2 }. To see this, consider
the languages ⟦q⟧ accepted by states q of the deterministic automaton:

    ⟦q0⟧ = {a, ba, bb, baa}    ⟦q1⟧ = {a, b, aa}    ⟦q2⟧ = {ε}    ⟦q3⟧ = {ε, a}    ⟦q4⟧ = ∅

These languages generate the states of the minimal TAlt-automaton by interpreting joins as unions and meets as intersections. We note that ⟦q4⟧ is just an
empty join and ⟦q3⟧ = (⟦q0⟧ ∩ ⟦q1⟧) ∪ ⟦q2⟧.³ These are the only redundant generators. Removing them leads to the AFA in Figure 1b. Here the black square
represents a conjunction of next states.
6.1.2. Complete Atomic Boolean Algebras
We now consider the monad C given by the double contravariant powerset
functor, namely C(X) = 2^(2^X). Here the outer powerset is treated disjunctively as
in the case of TAlt, and the sets provided by the inner powerset are interpreted
as valuations. Thus, elements of C(X) can be seen as full DNF formulae over
X: every conjunctive clause contains for each x ∈ X either x or the negation ¬x
of x. The unit assigns to an element x the disjunction of all full conjunctions
containing x, and the multiplication turns formulae of formulae into full DNF
formulae in the usual way. Algebras for this monad are known as complete
atomic boolean algebras (CABAs).
Using the fact that 2 is a free CABA (2 ≅ C(∅)), we obtain the following
semantics for C-succinct automata: a set of sets of states is accepting if and
only if it contains the exact set F of accepting states. This is different from
alternating automata, where a subset of F is sufficient. Reading a symbol in a
C-succinct automaton works as follows. Suppose we are in a set of sets of states
S ∈ C(Q), where we read a symbol a. The resulting set of sets contains U ⊆ Q
if and only if there is a set V ∈ S such that every state in V transitions into a
set of sets containing U , and every state not in V does not transition into any
set of sets containing U .
Note that every DNF formula can be converted to a full DNF formula. This
implies that C-succinct automata can always be as small as the smallest AFAs
for a given language. With the following example we show that they can actually
be strictly smaller. The T-minimization algorithm for AFAs we have given in
the previous section applies to this setting as well (including negation in DNF
formulae).
Example 6.2. Consider the regular language of words over the singleton alphabet A = {a} whose length is non-zero and even. The minimal DFA accepting this
language is shown in Figure 2a. We start the algorithm with the C-automaton
with state space freely generated from this DFA and merge the language-equivalent states. Initially, the set of generators is the set of states of the original
DFA. By noting that the language accepted by q2 is the negation of the one accepted by q1, in full DNF form ⟦q2⟧ = (⟦q0⟧ ∩ ¬⟦q1⟧) ∪ (¬⟦q0⟧ ∩ ¬⟦q1⟧) (where for
any language U its complement is defined as ¬U = A* \ U), we see that q2 is
3 Strictly speaking, we should take the upwards-closure of this disjunction (adding any
possible set of elements to each conjunction as an additional clause). We choose to use the
equivalent succinct formula both here and in the subsequent AFA construction to aid readability.
[Figure 2: Automata for the language of non-zero even-length words over {a}. (a) The minimal deterministic automaton with states q0, q1, q2. (b) The C-succinct automaton with states q0, q1 and a black square for the conjunction of negated states.]
redundant. The set of generators {q0 , q1 } is minimal and corresponds to the Csuccinct automaton in Figure 2b. We depict C-succinct automata in the same
manner as AFAs, but note that their interpretation is different. Here the transition into the black square represents the transition into the conjunction of the
negations of q0 and q1 .
We now show that there is no AFA with two states accepting the same
language. Suppose such an AFA exists, and let the state space be X = {x0 , x1 }.
Since a and aaa are not in the language but aa is, one of these states must
be accepting and the other must be rejecting.4 Without loss of generality we
assume that x0 is rejecting and x1 is accepting. The empty word is not in the
language, so our initial configuration has to be ↑{{x0 }}. Since a is also not in
the language, x0 will have to transition to ↑{{x0 }} as well. However, this implies
that aa is not accepted by the AFA, which contradicts the assumption that it
accepts the right language.
Unfortunately, the fact that the transition behavior of a set of states depends
on states not in that set generally makes it difficult to work with C-succinct
automata by hand.
6.1.3. Symmetry
We now consider succinct automata that exploit symmetry present in their
accepted language. Given a finite group G, consider the monad G × (−), where
the unit pairs any element with the unit of G and the multiplication applies the
multiplication of G. The algebras for G × (−) are precisely left group actions.
We assume an action on the alphabet A; if no such action is relevant, one may
consider the trivial action π2 : G × A → A. We also assume an action on the
output set O. Group actions will be denoted by a centered dot. We consider the
distributive law ρ : G × (O × (−)^A) ⇒ O × (G × (−))^A given by

    ρX(g, (o, f)) = (g · o, λa. (g, f(g⁻¹ · a))).

We explain the resulting semantics of (G × (−))-succinct automata in an example.
4 If there were no rejecting states, the only way to reject a word is by ending up in the
empty set of sets of states. However, this means that extensions of that word are rejected
as well. Similarly, if there are no accepting states one can only accept by ending up in ↑{∅},
which accepts everything.
[Figure 3: Automata outputting the first symbol to appear twice in a row. (a) A deterministic automaton with states and outputs (q0, ⊥), (q1, ⊥), (q2, ⊥), (q3, a), (q4, b). (b) The corresponding (G × (−))-succinct automaton with states (q0, ⊥), (q1, ⊥), (q3, a) and transition labels of the form x/g, e.g. a/e and b/(ab).]
Example 6.3. Consider the group Perm({a, b}) = {e, (ab)} of permutations
over elements a and b. Here e is the identity and (ab) swaps a and b. We consider
the alphabet A = {a, b} with an action Perm(A) × A → A given by applying
the permutation to the element of A, and the output set O = A ∪ {⊥} with an
action given by

    (ab) · a = b        (ab) · b = a        (ab) · ⊥ = ⊥.
Figure 3a shows a deterministic automaton over the alphabet A with outputs
in O. States are labeled by pairs (q, o), where q is a state label and o the output
of the state. The recognized language is the one assigning to a word over A the
first input symbol appearing twice in a row, or ⊥ if no such symbol exists. This
deterministic automaton is in fact the minimal (Perm(A) × (−))-automaton. The
action on its state space is defined by

    (ab) · q0 = q0    (ab) · q1 = q2    (ab) · q2 = q1    (ab) · q3 = q4    (ab) · q4 = q3.
We note that in the set of generators given by the full state space, q1 , q2 , q3 , and
q4 are redundant. After removing q2 , only q3 and q4 are redundant. Subsequently
removing q4 leaves no redundant elements.
The final (G × (−))-succinct automaton is shown in Figure 3b. Its actual
configurations are pairs of a group element and a state. Transition labels are of
the form x/g, where x ∈ A and g ∈ Perm(A). If we are in a configuration (g, q)
and state q has an associated output o ∈ O, the actual output is g · o. On reading
a symbol x ∈ A, we find the outgoing transition whose label starts with
the symbol g⁻¹ · x. Supposing this label contains a group element g′ and leads to
a state q′, the resulting configuration is (gg′, q′). For example, consider reading
the word bb. We start in the configuration (e, q0). Reading b here simply takes
the transition corresponding to b, which brings us to ((ab), q1). Now reading the
second b, we actually read (ab)⁻¹ · b = (ab) · b = a. This brings us to ((ab), q3).
The output is then given by (ab) · a = b.
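This run can be reproduced mechanically. The sketch below (ours) represents the two permutations as dictionaries and transcribes the transitions as we read them off Figure 3b, so the table should be taken as illustrative:

```python
e  = {'a': 'a', 'b': 'b'}                  # identity
ab = {'a': 'b', 'b': 'a'}                  # the transposition (ab)

def compose(g, h):                         # (g·h)(x) = g(h(x))
    return {x: g[h[x]] for x in g}

def inverse(g):
    return {v: k for k, v in g.items()}

# state -> symbol -> (group element, next state), as read off Figure 3b
trans = {'q0': {'a': (e, 'q1'), 'b': (ab, 'q1')},
         'q1': {'a': (e, 'q3'), 'b': (ab, 'q1')},
         'q3': {'a': (e, 'q3'), 'b': (e, 'q3')}}
output = {'q0': None, 'q1': None, 'q3': 'a'}   # ⊥ rendered as None

def run(word):
    g, q = e, 'q0'                         # initial configuration (e, q0)
    for x in word:
        g2, q = trans[q][inverse(g)[x]]    # take the transition labeled g⁻¹·x
        g = compose(g, g2)                 # new configuration (g·g′, q′)
    o = output[q]
    return None if o is None else g[o]     # actual output is g·o

print(run('bb'))   # 'b', as in the run described above
```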
In general, sets of generators in this setting correspond to subsets in which
all orbits are represented. The orbits of a set X with a left group action are the
equivalence classes of the relation that identifies elements x, y ∈ X whenever
there exists g ∈ G such that g · x = y. Minimal sets of generators contain a single
representative for each orbit. The algorithm given for AFAs in Section 6.1.1 can
be applied to this setting as well: step 4 will remove elements until only orbit
representatives are left.
6.2. Vector Spaces
We now exploit vector space structures. Given a field F, consider the free
vector space monad V. It maps each set X to the set of functions X → F with
finite support (finitely many elements of X are mapped to a non-zero value). A
function f : X → Y is mapped to the function V(f) : V(X) → V(Y) given by

    V(f)(g)(y) = Σ_{x ∈ X, f(x) = y} g(x).

The unit η : X → V(X) and multiplication µ : V(V(X)) → V(X) of the monad
are given by

    η(x)(x′) = 1 if x = x′, and 0 if x ≠ x′;        µ(f)(x) = Σ_{g ∈ V(X)} f(g) · g(x) ∈ F.

Here 0 and 1, as well as addition and multiplication, are those of the field F.
Elements of V(X) can alternatively be written as formal sums v1·x1 + · · · + vn·xn
with vi ∈ F and xi ∈ X for all i. We will use this notation in the example below.

Algebras for the free vector space monad are precisely vector spaces. We use
the output set O = F, and the alphabet can be any finite set A. Instantiating
(3), this leads to a pointwise distributive law ρ : V(O × (−)^A) ⇒ O × V(−)^A
given at a set X by

    ρ(f) = ( Σ_{(o,g) ∈ O×X^A} f(o, g) · o,   λa. λx. Σ_{(o,g) ∈ O×X^A, g(a)=x} f(o, g) ).
With these definitions, the V -succinct automata are weighted automata. We
note that if F is infinite, any non-trivial V -automaton will also be infinite. However, we can still start from a given weighted automaton and apply a slight
modification of Construction 4.4: minimize from the succinct representation,
use the states of the succinct representation as initial set of generators, and
finally find a minimal set of generators. Moreover, we may add a reachability
analysis, which in this case cannot lead to a larger automaton. Thus, the resulting algorithm essentially comes down to the standard minimization algorithm
for weighted automata [19], where the process of removing redundant generators
is integrated into the minimization. If F is finite and we do want to start from
a deterministic automaton, we can consider this automaton as a weighted one
by assigning each transition a weight of 1.
Example 6.4. Consider for F = R the deterministic automaton in Figure 4a.
This is a minimal automaton in Set; the freely generated V -automaton is infinite, and so is its minimization. However, that minimization has the states
of the automaton in Figure 4a as a set of generators. To gain insight into this
minimization, we compute the languages accepted by those generators (apart
[Figure 4: Succinctness via a weighted automaton. (a) A deterministic automaton over {a, b, c} with states and outputs (q0, 0), (q1, 1), (q2, 1), (q3, 3), (q4, 0). (b) The succinct weighted automaton with states (q0, 0), (q1, 1), (q2, 1), including a transition labeled b, c/2.]
from q0):

    q1: ε ↦ 1, a ↦ 1, b ↦ 1, c ↦ 1        q2: ε ↦ 1, a ↦ 0, b ↦ 0, c ↦ 0
    q3: ε ↦ 3, a ↦ 1, b ↦ 1, c ↦ 1        q4: ε ↦ 0, a ↦ 0, b ↦ 0, c ↦ 0
Words not displayed are mapped to 0 by any state. The language of q0 is the
only one assigning non-zero values to certain words of length two, such as aa,
and therefore q0 cannot be a redundant generator. The other generators are
redundant: writing ⟦q⟧ for the language of a state q, ⟦q4⟧ is just a zero-ary sum,
and we have

    ⟦q1⟧ = ⟦q3⟧ − 2⟦q2⟧        ⟦q2⟧ = ½⟦q3⟧ − ½⟦q1⟧        ⟦q3⟧ = ⟦q1⟧ + 2⟦q2⟧.

Once q4 is removed, all other generators are still redundant. Further removing
q3 makes q1 and q2 isolated. Therefore, V-minimization yields the weighted
automaton shown in Figure 4b. Here a transition on an input x ∈ A with
weight w ∈ F receives the label x/w, or just x if w = 1. Weights multiply along
a path, and different possible paths add up to assign a value to a word. Reading
c from q0, for example, we move to q1 + 2q2, which has an output of 1 + 2 · 1 = 3.
In general, the (sub)sets of generators of a vector space are its subsets that
span the whole space, and such a set of generators is minimal precisely when
it forms a basis. The weighted automaton resulting from our algorithm is the
usual minimal weighted automaton for the language. Redundant elements can
be found using standard techniques such as Gaussian elimination.
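As a sketch (ours) of that redundancy check over F = R: a generator is redundant precisely if its language vector lies in the span of the others, which a rank comparison detects. We use the languages of q1, ..., q4 from Example 6.4, truncated to the words ε, a, b, c (in general a sufficiently large finite set of words must be chosen):

```python
import numpy as np

# Languages of q1..q4 restricted to the words ε, a, b, c (Example 6.4).
lang = {'q1': [1, 1, 1, 1], 'q2': [1, 0, 0, 0],
        'q3': [3, 1, 1, 1], 'q4': [0, 0, 0, 0]}

def is_redundant(r, others):
    """r is redundant iff its vector lies in the span of the others."""
    A = np.array([lang[q] for q in others], dtype=float)
    b = np.array(lang[r], dtype=float)
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.vstack([A, b]))

gens = ['q1', 'q2', 'q3', 'q4']
for r in ['q4', 'q3', 'q2', 'q1']:       # order chosen to match the example;
    rest = [q for q in gens if q != r]   # any order yields a basis
    if is_redundant(r, rest):
        gens.remove(r)
print(gens)   # ['q1', 'q2'], cf. Figure 4b
```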
7. Conclusions
We have presented a construction to obtain succinct representations of deterministic finite automata as automata with side-effects. This construction is
very general in that it is based on the abstract characterisation of side-effects
as monads. Nonetheless, it can be easily implemented. An essential part of our
construction is the computation of a minimal set of generators for an algebra.
We have provided an algorithm for this that works for any suitable Set monad.
We have applied the construction to several non-trivial examples: alternating automata, automata with symmetries, CABA-structured automata, and weighted
automata.
Related work. This work revamps and extends results of Arbib and Manes [4],
as discussed throughout the paper. We note that most of their results are formulated in a more general category, whereas here we work specifically in Set.
The reason for this is that we focus on the procedure for finding minimal sets
of generators by removing redundant elements, which are defined using set subtraction (Definition 3.6). This limitation is already present in the work of Arbib and Manes, who spend little time on the subject and only study the nondeterministic case in detail. Our main contribution, the general procedure for
finding a minimal set of generators, is not present in their work. It generalizes
several techniques to obtain compact automaton representations of languages,
some of them presented in the context of learning algorithms [10, 7, 3]. Preliminary results on generalizing succinct automaton constructions within a learning
algorithm can be found in [23].
In [17], Myers et al. present a coalgebraic construction of canonical nondeterministic automata. Their specific examples are the átomaton [9], obtained
from the atoms of the boolean algebra generated by the residual languages (the
languages accepted by the states of the minimal DFA); the canonical RFSA;
the minimal xor automaton [24], actually a weighted automaton over the field with
two elements rather than a non-deterministic one; and what they call the distromaton, obtained from the atoms of the distributive lattice generated by the
residual languages. They further provide specific algorithms for obtaining some
of their example succinct automata.
The underlying idea in the work of Myers et al. for finding succinct representations of algebras is similar to ours, and the deterministic structured automata
they start from are equivalent: in their paper the deterministic automata live
in a locally finite variety, which translates to the category of algebras for a
monad that preserves finite sets (such as those in Section 6.1). They also define the succinct automaton using a minimal set of generators for the algebra,
but instead of our algorithmic approach of getting to this set by removing redundant generators, they use a dual equivalence between finite algebras and a
suitable modification of the category of sets and relations between them. This
seems to restrict their work to non-deterministic automata, although there may
be an easy generalization: the equivalence would be with a modification of a
Kleisli category. A major difference with our work is that they have no general
algorithm to construct the succinct automata; as mentioned, specific ones are
provided for their examples. In fact, they provide no guidelines on how to find
a suitable equivalence for a given variety. On the other hand, their equivalences
guarantee uniqueness up to isomorphism of the succinct automata, which is a
desirable property for many applications.
The restriction in the work of Myers et al. to locally finite varieties means
that our example of weighted automata over an infinite field (Section 6.2) cannot
be captured in their work. Conversely, since both the átomaton and the distromaton are NFAs obtained from categories of algebras with
more structure than JSLs, these examples are not covered by our work. Their
other examples, however, the canonical RFSA and the minimal xor automaton,
are obtained using instances of our method as well. The fact that the problem
of finding a suitable equivalence in general is open means it is not trivial to
determine whether our approach can be seen as a special case of a generalized
version of theirs when we restrict to monads that preserve finite sets.
Future work. The main question that remains is under which conditions the
notion of a minimal set of generators actually describes a size-minimal set of
generators. Proposition 3.7 provides a partial answer to this question, but its
conditions fail to apply to the majority of our examples, even though in some of
these cases minimal does mean size-minimal. A related question is whether we
can find heuristics to increase the state space of a T -automaton in such a way
that the number of generators decreases. The reason the canonical RFSAs of
Denis et al. [10] are not always state-minimal NFAs is that the states of
these NFAs, seen as singletons in the determinized automaton, in general are
not reachable. Hence, removing unreachable states from a T -automaton may
increase the size of minimal sets of generators, which is why Construction 4.4
does not include a reachability analysis. Although finding state-minimal NFAs
is PSPACE-complete, a moderate gain might still be possible.
arXiv:2007.06327v1 [cs.LO] 13 Jul 2020
Generating Functions for Probabilistic Programs⋆

Lutz Klinkenberg¹ [0000-0002-3812-0572], Kevin Batz¹ [0000-0001-8705-2564],
Benjamin Lucien Kaminski¹,² [0000-0001-5185-2324], Joost-Pieter Katoen¹ [0000-0002-6143-1926],
Joshua Moerman¹ [0000-0001-9819-8374], and Tobias Winkler¹ [0000-0003-1084-6408]

¹ RWTH Aachen University, 52062 Aachen, Germany
² University College London, United Kingdom
{lutz.klinkenberg, kevin.batz, benjamin.kaminski, katoen, joshua, tobias.winkler}@cs.rwth-aachen.de
Abstract. This paper investigates the usage of generating functions
(GFs) encoding measures over the program variables for reasoning about
discrete probabilistic programs. To that end, we define a denotational
GF-transformer semantics for probabilistic while-programs, and show
that it instantiates Kozen's seminal distribution transformer semantics.
We then study the effective usage of GFs for program analysis. We show
that finitely expressible GFs enable checking super-invariants by means
of computer algebra tools, and that they can be used to determine termination probabilities. The paper concludes by characterizing a class of
— possibly infinite-state — programs whose semantics is a rational GF
encoding a discrete phase-type distribution.

Keywords: probabilistic programs · quantitative verification · semantics · formal power series.
1 Introduction
Probabilistic programs are sequential programs for which coin flipping is a first-class citizen. They are used e.g. to represent randomized algorithms, probabilistic
graphical models such as Bayesian networks, cognitive models, or security protocols. Although probabilistic programs are typically rather small, their analysis
is intricate. For instance, approximating expected values of program variables at
program termination is as hard as the universal halting problem [15]. Determining higher moments such as variances is even harder. Deductive program verification techniques based on a quantitative version of weakest preconditions [18]
enable reasoning about the outcomes of probabilistic programs, such as the
probability that a program variable equals a certain value. Dedicated analysis techniques have been developed to e.g., determine tail bounds [6], decide
almost-sure termination [19,8], or to compare programs [1].

⋆ This research was funded by the ERC AdG project FRAPPANT (787914) and the
DFG RTG 2236 UnRAVeL.
This paper aims at exploiting the well-tried potential of probability generating
functions (PGFs [14]) for the analysis of probabilistic programs. In our setting,
PGFs are power series representations — generating functions — encoding discrete probability mass functions of joint distributions over program variables.
PGF representations — in particular if finite — enable a simple extraction of
important information from the encoded distributions such as expected values,
higher moments, termination probabilities or stochastic independence of program variables.
To enable the usage of PGFs for program analysis, we define a denotational semantics of a simple probabilistic while-language akin to probabilistic
GCL [18]. Our semantics is defined in a forward manner: given an input distribution over program variables as a PGF, it yields a PGF representing the resulting subdistribution. The "missing" probability mass represents the probability
of non-termination. More accurately, our denotational semantics transforms formal power series (FPS). Those form a richer class than PGFs, which allows for
overapproximations of probability distributions. While-loops are given semantics
as least fixed points of FPS transformers. It is shown that our semantics is in
fact an instantiation of Kozen's seminal distribution-transformer semantics [16].

The semantics provides a sound basis for program analysis using PGFs. Using
Park's Lemma, we obtain a simple technique to prove whether a given FPS overapproximates a program's semantics, i.e., whether an FPS is a so-called superinvariant. Such upper bounds can be quite useful: for almost-surely terminating
programs, such bounds can provide exact program semantics, whereas, if the
mass of an overapproximation is strictly less than one, the program is provably
non-almost-surely terminating. This result is illustrated on a non-trivial random
walk and on examples illustrating that checking whether an FPS is a superinvariant can be automated using computer algebra tools.
In addition, we characterize a class of — possibly infinite-state — programs
whose PGF semantics is a rational function. These homogeneous bounded programs (HB programs) are characterized by loops in which each unbounded variable has no effect on the loop guard and is in each loop iteration incremented by
a quantity independent of its own value. Operationally speaking, HB programs
can be considered as finite-state Markov chains with rewards in which rewards
can grow unboundedly large. It is shown that the rational PGF of any program that is equivalent to an almost-surely terminating HB program represents
a multi-variate discrete phase-type distribution [22]. We illustrate this result by
obtaining a closed-form characterization for the well-studied infinite-state dueling cowboys example [18].
Related work. Semantics of probabilistic programs is a well-studied topic. This
includes the seminal works by Kozen [16] and McIver and Morgan [18]. Other
related semantics of discrete probabilistic while-programs are given e.g. in several other articles like [18,24,10,23,4]. PGFs have received scant attention in the
analysis of probabilistic programs. A notable exception is [5], in which generating
functions of finite Markov chains are obtained by Padé approximation. Computer
algebra systems have been used to transform probabilistic programs [7], and more
recently in the automated generation of moment-based loop invariants [2].
Organization of this paper. After recapping FPSs and PGFs in Sections 2–3,
we define our FPS transformer semantics in Section 4, discuss some elementary properties, and show it instantiates Kozen's distribution transformer semantics [16]. Section 5 presents our approach for verifying upper bounds to
loop invariants and illustrates this by various non-trivial examples. In addition,
it characterizes programs that are representable as finite-state Markov chains
equipped with rewards and presents the relation to discrete phase-type distributions. Section 6 concludes the paper. All proofs can be found in the appendix.
2 Formal Power Series
Our goal is to make the potential of probability generating functions available to the formal verification of probabilistic programs. The programs we consider will, without loss of generality, operate on a fixed set of k program variables. The valuations of those variables range over N. A program state σ is hence a vector in N^k. We denote the state (0, . . . , 0) by 0.
A prerequisite for understanding probability generating functions are (multivariate) formal power series — a special way of representing a potentially infinite k-dimensional array. For k = 1, this amounts to representing a sequence.
Definition 1 (Formal Power Series). Let X = X_1, . . . , X_k be a fixed sequence of k distinct formal indeterminates. For a state σ = (σ_1, . . . , σ_k) ∈ N^k, let X^σ abbreviate the formal multiplication X_1^{σ_1} · · · X_k^{σ_k}. The latter object is called a monomial and we denote the set of all monomials over X by Mon(X). A (multivariate) formal power series (FPS) is a formal sum

    F = Σ_{σ∈N^k} [σ]_F · X^σ,   where   [·]_F : N^k → R^∞_{≥0},

and where R^∞_{≥0} denotes the extended positive real line. We denote the set of all FPSs by FPS. Let F, G ∈ FPS. If [σ]_F < ∞ for all σ ∈ N^k, we denote this fact by F ≪ ∞. The addition F + G and the scaling r · F by a scalar r ∈ R^∞_{≥0} are defined coefficient-wise by

    F + G = Σ_{σ∈N^k} ([σ]_F + [σ]_G) · X^σ   and   r · F = Σ_{σ∈N^k} r · [σ]_F · X^σ.

For states σ = (σ_1, . . . , σ_k) and τ = (τ_1, . . . , τ_k), we define σ + τ = (σ_1 + τ_1, . . . , σ_k + τ_k). The multiplication F · G is given as their Cauchy product (or discrete convolution)

    F · G = Σ_{σ,τ∈N^k} [σ]_F · [τ]_G · X^{σ+τ}.

Drawing coefficients from the extended reals enables us to define a complete lattice on FPSs in Section 4. Our analyses in Section 5 will, however, only consider FPSs with F ≪ ∞.
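Since FPSs are just coefficient arrays indexed by N^k, the operations of Definition 1 can be implemented directly on finitely supported truncations. The following is a minimal sketch (not taken from the paper; all names are ours) in Python, with dicts mapping states σ to coefficients and float('inf') standing in for the extended reals.

# Sketch: truncated FPSs over k variables as dicts from tuples in N^k to
# coefficients. Missing keys represent coefficient 0.

def fps_add(F, G):
    """Coefficient-wise sum F + G."""
    return {s: F.get(s, 0) + G.get(s, 0) for s in set(F) | set(G)}

def fps_scale(r, F):
    """Coefficient-wise scaling r * F."""
    return {s: r * c for s, c in F.items()}

def fps_mul(F, G):
    """Cauchy product: [sigma+tau] accumulates [sigma]_F * [tau]_G."""
    H = {}
    for s, cf in F.items():
        for t, cg in G.items():
            st = tuple(a + b for a, b in zip(s, t))
            H[st] = H.get(st, 0) + cf * cg
    return H

# Example over k = 2 variables: F = 1/2 + 1/2*X1, G = X2.
F = {(0, 0): 0.5, (1, 0): 0.5}
G = {(0, 1): 1.0}
print(fps_mul(F, G))   # {(0, 1): 0.5, (1, 1): 0.5}, i.e. 1/2*X2 + 1/2*X1*X2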
3 Generating Functions

    A generating function is a device somewhat similar to a bag. Instead of carrying many little objects detachedly, which could be embarrassing, we put them all in a bag, and then we have only one object to carry, the bag.
    — George Pólya [25]
Formal power series pose merely a particular way of encoding an infinite k-dimensional array as yet another infinitary object, but we still carry all objects forming the array (the coefficients of the FPS) detachedly, and there seems to be no advantage in this particular encoding. It even seems more bulky. We will now, however, see that this bulky encoding can be turned into a one-object bag carrying all our objects: the generating function.
Definition 2 (Generating Functions). The generating function of a formal power series F = Σ_{σ∈N^k} [σ]_F · X^σ ∈ FPS with F ≪ ∞ is defined as the partial function

    f : [0, 1]^k ⇀ R_{≥0},   (x_1, . . . , x_k) ↦ Σ_{σ=(σ_1,...,σ_k)∈N^k} [σ]_F · x_1^{σ_1} · · · x_k^{σ_k}.

In other words: in order to turn an FPS into its generating function, we merely treat every formal indeterminate X_i as an "actual" indeterminate x_i, and the formal multiplications and the formal sum also as "actual" ones. The generating function f of F is uniquely determined by F as we require all coefficients of F to be non-negative, and so the ordering of the summands is irrelevant: for a given point x ∈ [0, 1]^k, the sum defining f(x) either converges absolutely to some positive real or diverges absolutely to ∞. In the latter case, f is undefined at x and hence f may indeed be partial.
Since generating functions stem from formal power series, they are infinitely often differentiable at 0 = (0, . . . , 0). Because of that, we can recover F from f as the (multivariate) Taylor expansion of f at 0.
Definition 3 (Multivariate Derivatives and Taylor Expansions). For σ = (σ_1, . . . , σ_k) ∈ N^k, we write f^(σ) for the function f differentiated σ_1 times in x_1, σ_2 times in x_2, and so on. If f is infinitely often differentiable at 0, then the Taylor expansion of f at 0 is given by

    Σ_{σ∈N^k} (f^(σ)(0) / (σ_1! · · · σ_k!)) · x_1^{σ_1} · · · x_k^{σ_k}.

If we replace every indeterminate x_i by the formal indeterminate X_i in the Taylor expansion of the generating function f of F, then we obtain the formal power series F. It is in precisely that sense that f generates F.
Example 1 (Formal Power Series and Generating Functions). Consider the infinite (1-dimensional) sequence 1/2, 1/4, 1/8, 1/16, . . .. Its (univariate) FPS — the entity carrying all coefficients detachedly — is given as

    1/2 + 1/4 X + 1/8 X² + 1/16 X³ + 1/32 X⁴ + 1/64 X⁵ + 1/128 X⁶ + 1/256 X⁷ + . . . .   (†)

On the other hand, its generating function — the bag — is given concisely by

    1/(2 − x).   (♭)

Figuratively speaking, (†) is itself the infinite sequence a_n := 1/2^n, whereas (♭) is a bag with the label "infinite sequence a_n := 1/2^n". The fact that (♭) generates (†) follows from the Taylor expansion of 1/(2 − x) at 0 being 1/2 + 1/4 x + 1/8 x² + . . ..
The potential of generating functions lies in the fact that manipulations of the functions — i.e. of the concise representations — are in one-to-one correspondence with the associated manipulations of FPSs [9]. For instance, if f(x) is the generating function of F encoding the sequence a_1, a_2, a_3, . . ., then the function f(x) · x is the generating function of F · X, which encodes the sequence 0, a_1, a_2, a_3, . . .. As another example of the correspondence between operations on FPSs and generating functions, if f(x) and g(x) are the generating functions of F and G, respectively, then f(x) + g(x) is the generating function of F + G.
Example 2 (Manipulating Generating Functions). Revisiting Example 1, if we multiply 1/(2 − x) by x, we change the label on our bag from "infinite sequence a_n := 1/2^n" to "a 0 followed by an infinite sequence a_{n+1} := 1/2^n", and — just by changing the label — the bag will now contain what it says on its label. Indeed, the Taylor expansion of x/(2 − x) at 0 is 0 + 1/2 x + 1/4 x² + 1/8 x³ + 1/16 x⁴ + . . ., encoding the sequence 0, 1/2, 1/4, 1/8, 1/16, . . ..
Due to the close correspondence of FPSs and generating functions [9], we use
both concepts interchangeably, as is common in most mathematical literature.
We mostly use FPSs for definitions and semantics, and generating functions in
calculations and examples.
Probability Generating Functions. We now use formal power series to represent probability distributions.

Definition 4 (Probability Subdistribution). A probability subdistribution (or simply subdistribution) over N^k is a function

    µ : N^k → [0, 1],   such that   |µ| = Σ_{σ∈N^k} µ(σ) ≤ 1.

We call |µ| the mass of µ. We say that µ is a (full) distribution if |µ| = 1, and a proper subdistribution if |µ| < 1. The set of all subdistributions on N^k is denoted by D_≤(N^k) and the set of all full distributions by D(N^k).
We need subdistributions for capturing non-termination. The "missing" probability mass 1 − |µ| precisely models the probability of non-termination. The generating function of a (sub-)distribution is called a probability generating function. Many properties of a distribution µ can be read off from its generating function G_µ in a simple way. We demonstrate how to extract a few common properties in the following example.
Example 3 (Geometric Distribution PGF). Recall Example 1. The presented formal power series encodes a geometric distribution µ_geo with parameter 1/2 of a single variable X. The fact that µ_geo is a proper probability distribution, for instance, can easily be verified by computing G_geo(1) = 1/(2 − 1) = 1. The expected value of X is given by G′_geo(1) = 1/(2 − 1)² = 1.
Extracting Common Properties. Important information about probability distributions is given, for instance, by the first and higher moments. In general, the k-th factorial moment of variable X_i can be extracted from a PGF G by computing (∂^k G / ∂X_i^k)(1, . . . , 1).³ This includes the mass |G| as the 0-th moment. The marginal distribution of variable X_i can simply be extracted from G as G(1, . . . , X_i, . . . , 1). We also note that PGFs can treat stochastic independence. For instance, for a bivariate PGF H we can check for stochastic independence of the variables X and Y by checking whether H(X, Y) = H(X, 1) · H(1, Y).

³ In general, one must take the limit X_i → 1 from below.
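As a small illustration, these extraction recipes become one-liners in a computer algebra system. The following sketch (ours, assuming SymPy; the PGF H below is our own example, two independent fair coin flips) reads off mass, first moment, marginal, and independence:

from sympy import symbols, Rational, diff, simplify

X, Y = symbols('X Y')
H = (Rational(1, 2) + Rational(1, 2) * X) * (Rational(1, 2) + Rational(1, 2) * Y)

mass = H.subs({X: 1, Y: 1})                        # 0th moment: |H| = 1
ex = diff(H, X).subs({X: 1, Y: 1})                 # first moment of X: 1/2
marg = H.subs(Y, 1)                                # marginal PGF of X
indep = simplify(H - H.subs(Y, 1) * H.subs(X, 1)) == 0  # True: X, Y independent
print(mass, ex, marg, indep)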
4 FPS Semantics for pGCL
In this section, we give denotational semantics to probabilistic programs in terms
of FPS transformers and establish some elementary properties useful for program
analysis. We begin by endowing FPSs and PGFs with an order structure:
Definition 5 (Order on FPS). For all F, G ∈ FPS, let

    F ⪯ G   iff   ∀σ ∈ N^k : [σ]_F ≤ [σ]_G.

Lemma 1 (Completeness of ⪯ on FPS). (FPS, ⪯) is a complete lattice.
4.1 FPS Transformer Semantics
Recall that we assume programs to range over exactly k variables with valuations in N^k. Our program syntax is similar to Kozen [16] and McIver & Morgan [18].

Definition 6 (Syntax of pGCL [16,18]). A program P in probabilistic Guarded Command Language (pGCL) adheres to the grammar

    P ::= skip | x_i := E | P ⨟ P | {P} [p] {P} | if (B) {P} else {P} | while (B) {P},

where x_i ∈ {x_1, . . . , x_k} is a program variable, E is an arithmetic expression over program variables, p ∈ [0, 1] is a probability, and B is a predicate (called guard) over program variables.
The FPS semantics of pGCL will be defined in a forward denotational style, where the program variables x_1, . . . , x_k correspond to the formal indeterminates X_1, . . . , X_k of FPSs.
For handling assignments, if-conditionals, and while-loops, we need some auxiliary functions on FPSs: for an arithmetic expression E over program variables, we denote by eval_σ(E) the evaluation of E in program state σ. For a predicate B ⊆ N^k and an FPS F, we define the restriction of F to B by

    ⟨F⟩_B := Σ_{σ∈B} [σ]_F · X^σ,

i.e. ⟨F⟩_B is the FPS obtained from F by setting all coefficients [σ]_F with σ ∉ B to 0. Using these prerequisites, our FPS transformer semantics is given as follows:
Definition 7 (FPS Semantics of pGCL). The semantics ⟦P⟧ : FPS → FPS of a loop-free pGCL program P is given according to the upper part of Table 1. The unfolding operator Φ_{B,P} for the loop while (B) {P} is defined by

    Φ_{B,P} : (FPS → FPS) → (FPS → FPS),   ψ ↦ λF. ⟨F⟩_{¬B} + ψ(⟦P⟧(⟨F⟩_B)).

The partial order (FPS, ⪯) extends to a partial order (FPS → FPS, ⊑) on FPS transformers by a point-wise lifting of ⪯. The least element of this partial order is the transformer 0 = λF. 0 mapping any FPS F to the zero series. The semantics of while (B) {P} is then given by the least fixed point (with respect to ⊑) of its unfolding operator, i.e.

    ⟦while (B) {P}⟧ = lfp Φ_{B,P}.
Table 1. FPS transformer semantics of pGCL programs.

    P                          ⟦P⟧(F)
    skip                       F
    x_i := E                   Σ_{σ∈N^k} [σ]_F · X_1^{σ_1} · · · X_i^{eval_σ(E)} · · · X_k^{σ_k}
    P_1 ⨟ P_2                  ⟦P_2⟧(⟦P_1⟧(F))
    {P_1} [p] {P_2}            p · ⟦P_1⟧(F) + (1 − p) · ⟦P_2⟧(F)
    if (B) {P_1} else {P_2}    ⟦P_1⟧(⟨F⟩_B) + ⟦P_2⟧(⟨F⟩_{¬B})
    while (B) {P}              (lfp Φ_{B,P})(F),  for  Φ_{B,P}(ψ) = λF. ⟨F⟩_{¬B} + ψ(⟦P⟧(⟨F⟩_B))
Example 4. Consider the program P = {x := 0} [1/2] {x := 1} ⨟ c := c + 1 and the input PGF G = 1, which denotes a point mass on state σ = 0. We use an annotation style where ⫽ G placed before a program fragment and ⫽ G′ placed after it denote that the fragment transforms G into G′ = ⟦P⟧(G). We calculate ⟦P⟧(G) as follows:

    ⫽ 1
    {x := 0} [1/2] {x := 1} ⨟
    ⫽ 1/2 + X/2
    c := c + 1
    ⫽ C/2 + CX/2

As for the semantics of c := c + 1, see Table 2.
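The calculation of Example 4 can also be replayed mechanically. The following sketch (ours, assuming SymPy; the helper names are our own and cover only the constructs used here, namely constant assignments to x, the increment of c from Table 2, and probabilistic choice) computes ⟦P⟧(1):

from sympy import symbols, Rational, S, expand

X, C = symbols('X C')

def assign_x_const(F, k):
    """x := k on a PGF F(X, C): forget x (substitute X = 1), re-attach X^k."""
    return X**k * F.subs(X, 1)

def incr_c(F):
    """c := c + 1 on F: multiply by C (cf. Table 2)."""
    return C * F

def pchoice(p, t1, t2, F):
    """{P1} [p] {P2}: convex combination of the two branch transformers."""
    return p * t1(F) + (1 - p) * t2(F)

G = S.One                                   # point mass on sigma = 0
G1 = pchoice(Rational(1, 2),
             lambda F: assign_x_const(F, 0),
             lambda F: assign_x_const(F, 1), G)
print(expand(incr_c(G1)))                   # C/2 + C*X/2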
Before we study how our FPS transformers behave on PGFs in particular, we first argue that our FPS semantics is well-defined. While this is evident for loop-free programs, for loops we appeal to the Kleene Fixed Point Theorem [17], which requires ω-continuous functions.

Theorem 1 (ω-continuity of pGCL Semantics). The semantic functional ⟦·⟧ is ω-continuous, i.e. for all programs P ∈ pGCL and all increasing ω-chains F_1 ⪯ F_2 ⪯ . . . in FPS,

    ⟦P⟧(sup_{n∈N} F_n) = sup_{n∈N} ⟦P⟧(F_n).

Theorem 2 (Well-definedness of FPS Semantics). The semantics functional ⟦·⟧ is well-defined, i.e. the semantics of any loop while (B) {P} exists uniquely and can be written as

    ⟦while (B) {P}⟧ = lfp Φ_{B,P} = sup_{n∈N} Φ^n_{B,P}(0).
4.2 Healthiness Conditions of FPS Transformers

In this section we show basic, yet important, properties which follow from [16]. For instance, for any input FPS F, the semantics of a program cannot yield as output an FPS with a mass larger than |F|, i.e. programs cannot create mass.
Table 2. Common assignments and their effects on the input PGF F(X, Y).

    P              ⟦P⟧(F)
    x := x + k     X^k · F(X, Y)
    x := k · x     F(X^k, Y)
    x := x + y     F(X, XY)
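Each row of Table 2 is simply a substitution on the generating function. The following sketch (ours, assuming SymPy; F below is our own example PGF, a fair mix of the states (0,0) and (1,1)) applies the three rows:

from sympy import symbols, Rational, expand

X, Y = symbols('X Y')
F = Rational(1, 2) + Rational(1, 2) * X * Y   # 1/2 on (x,y)=(0,0), 1/2 on (1,1)

print(expand(X**2 * F))           # x := x + 2  ->  X**2/2 + X**3*Y/2
print(expand(F.subs(X, X**3)))    # x := 3 * x  ->  1/2 + X**3*Y/2
print(expand(F.subs(Y, X * Y)))   # x := x + y  ->  1/2 + X**2*Y/2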
Theorem 3 (Mass Conservation). For every P ∈ pGCL and F ∈ FPS, we have |⟦P⟧(F)| ≤ |F|.

A program P is called mass conserving if |⟦P⟧(F)| = |F| for all F ∈ FPS. Mass conservation has important implications for FPS transformers acting on PGFs: given as input a PGF, the semantics of a program yields a PGF.

Corollary 1 (PGF Transformers). For every P ∈ pGCL and G ∈ PGF, we have ⟦P⟧(G) ∈ PGF.

Restricted to PGF, our semantics hence acts as a subdistribution transformer. Output masses may be smaller than input masses; the probability of non-termination of the program is captured by the "missing" probability mass.
As observed in [16], the semantics of probabilistic programs is fully defined by its effect on point masses, thus rendering probabilistic program semantics linear. In our setting, this generalizes to linearity of our FPS transformers.
linear. In our setting, this generalizes to linearity of our FPS transformers.
Definition 8 (Linearity). Let F, G ∈ FPS and r ∈ R∞
≥0 be a scalar. The function ψ : FPS → FPS is called a linear transformer (or simply linear), if
ψ(r · F + G) = r · ψ(F ) + ψ(G) .
Theorem 4 (Linearity of pGCL Semantics). For every program P and
guard B, the functions h · iB and JP K are linear. Moreover, the unfolding operator ΦB,P maps linear transformers onto linear transformers.
As a final remark, we can unroll while loops:

Lemma 2 (Loop Unrolling). For any FPS F,

    ⟦while (B) {P}⟧(F) = ⟨F⟩_{¬B} + ⟦while (B) {P}⟧(⟦P⟧(⟨F⟩_B)).

4.3 Embedding into Kozen's Semantics Framework
Kozen [16] defines a generic way of giving distribution transformer semantics based on an abstract measurable space (X^n, M^(n)). Our FPS semantics instantiates his generic semantics. The state space we consider is N^k, so that (N^k, P(N^k)) is our measurable space.⁴ A measure on that space is a countably-additive function µ : P(N^k) → [0, ∞] with µ(∅) = 0. We denote the set of all measures on our space by M. Although we represent measures by FPSs, the two notions are in bijective correspondence τ : FPS → M, given by

    τ(F) = λS. Σ_{σ∈S} [σ]_F.

This map preserves the linear structure and the order ⪯.
Kozen's syntax [16] is slightly different from pGCL. We compensate for this by a translation function T, which maps pGCL programs to Kozen's. The following theorem shows that our semantics agrees with Kozen's semantics.⁵
⁴ We note that we want each point σ to be measurable, which enforces a discrete measurable space.
⁵ Note that Kozen regards a program P itself as a function P : M → M.
Theorem 5. The FPS semantics of pGCL is an instance of Kozen's semantics, i.e. for all pGCL programs P, we have

    τ ∘ ⟦P⟧ = T(P) ∘ τ.

Equivalently, the following diagram commutes:

    FPS ──⟦P⟧──▸ FPS
     │            │
     τ            τ
     ▾            ▾
     M ──T(P)──▸  M
For more details about the connection between FPSs and measures, as well as
more information about the actual translation, see Appendix A.3.
5 Analysis of Probabilistic Programs
Our PGF semantics enables representing the effect of a pGCL program on a given PGF. As a next step, we investigate to what extent a program analysis can exploit such PGF representations. To that end, we consider the overapproximation of loops by invariants (Section 5.1) and provide examples showing that checking whether an FPS transformer overapproximates a loop can be automated using computer algebra tools. In addition, we determine a subclass of pGCL programs whose effect on an arbitrary input state is ensured to be a rational PGF encoding a phase-type distribution (Section 5.2).
5.1 Invariant-style Overapproximation of Loops
In this section, we seek to overapproximate loop semantics, i.e. for a given loop W = while (B) {P}, we want to find a (preferably simple) FPS transformer ψ such that ⟦W⟧ ⊑ ψ, meaning that for any input G we have ⟦W⟧(G) ⪯ ψ(G) (cf. Definition 7). Notably, even if G is a PGF, we do not require ψ(G) to be one. Instead, ψ(G) can have a mass larger than one. This is fine, because it still overapproximates the actual semantics coefficient-wise. Such overapproximations immediately carry over to reading off expected values (cf. Section 3), for instance

    ∂_X ⟦W⟧(G)(1) ≤ ∂_X ψ(G)(1).

We use invariant-style reasoning for verifying that a given ψ overapproximates the semantics ⟦W⟧. For that, we introduce the notion of a superinvariant and employ Park's Lemma, a well-known concept from fixed point theory, to obtain a conceptually simple proof rule for verifying overapproximations of while loops.
Theorem 6 (Superinvariants and Loop Overapproximations). Let Φ_{B,P} be the unfolding operator of while (B) {P} (cf. Definition 7) and ψ : FPS → FPS. Then

    Φ_{B,P}(ψ) ⊑ ψ   implies   ⟦while (B) {P}⟧ ⊑ ψ.
We call a ψ satisfying Φ_{B,P}(ψ) ⊑ ψ a superinvariant. We are interested in linear superinvariants, as our semantics is also linear (cf. Theorem 4). Furthermore, linearity allows defining ψ solely in terms of its effect on monomials, which makes reasoning considerably simpler:

Corollary 2. Given a function f : Mon(X) → FPS, let the linear extension f̂ of f be defined by

    f̂ : FPS → FPS,   F ↦ Σ_{σ∈N^k} [σ]_F · f(X^σ).

Let Φ_{B,P} be the unfolding operator of while (B) {P}. Then

    ∀σ ∈ N^k : Φ_{B,P}(f̂)(X^σ) ⪯ f̂(X^σ)   implies   ⟦while (B) {P}⟧ ⊑ f̂.

We call an f satisfying the premise of the above corollary a superinvariantlet. Notice that superinvariantlets and their extensions agree on monomials, i.e. f(X^σ) = f̂(X^σ). Let us examine a few examples of superinvariantlet-reasoning.
Example 5 (Verifying Precise Semantics). In Program 1.1, in each iteration a fair coin flip determines the value of x; subsequently, c is incremented by 1. Consider the following superinvariantlet:

    f(X^i C^j) = C^j · C/(2 − C),  if i = 1;
    f(X^i C^j) = X^i C^j,          if i ≠ 1.

To verify that f is indeed a superinvariantlet, we have to show that

    Φ_{B,P}(f̂)(X^i C^j) = ⟨X^i C^j⟩_{x≠1} + f̂(⟦P⟧(⟨X^i C^j⟩_{x=1})) ⪯ f̂(X^i C^j).

For i ≠ 1, we get

    Φ_{B,P}(f̂)(X^i C^j) = ⟨X^i C^j⟩_{x≠1} + f̂(⟦P⟧(0)) = X^i C^j = f(X^i C^j) = f̂(X^i C^j).

For i = 1, we get

    Φ_{B,P}(f̂)(X^1 C^j) = f̂(1/2 · X^0 C^{j+1} + 1/2 · X^1 C^{j+1})
    = 1/2 · f(X^0 C^{j+1}) + 1/2 · f(X^1 C^{j+1})            (by linearity of f̂)
    = 1/2 · C^{j+1} + 1/2 · C^{j+1} · C/(2 − C)              (by definition of f)
    = C^j · C/(2 − C) = f(X^1 C^j) = f̂(X^1 C^j).

Hence, Corollary 2 yields ⟦W⟧(X) ⪯ f(X) = C/(2 − C).
For this example, we can state even more. As the program is almost surely terminating and |f(X^i C^j)| = 1 for all (i, j) ∈ N², we conclude that f̂ is exactly the semantics of W, i.e. f̂ = ⟦W⟧.
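The least fixed point in Theorem 2 also suggests a direct way to approximate ⟦W⟧ for Program 1.1: compute the Kleene iterates Φⁿ(0). The sketch below (ours, assuming SymPy; the encoding of the loop body and of the guard restriction is our own) reproduces the first coefficients of C/(2 − C) on input X:

from sympy import symbols, expand, S

X, C = symbols('X C')

def body(F):
    """[[{x := 0} [1/2] {x := 1} ; c := c + 1]] on an x=1-restricted F."""
    m = F.subs(X, 1)                        # mass entering the loop body
    return C * (S(1) / 2 * m + S(1) / 2 * m * X)

def phi_n(n, F):
    """The n-th unfolding Phi^n(0)(F) of the while loop."""
    if n == 0:
        return S(0)
    FB = F.coeff(X, 1) * X                  # <F>_{x = 1}
    return (F - FB) + phi_n(n - 1, body(FB))

print(expand(phi_n(5, X)))                  # C/2 + C**2/4 + C**3/8 + C**4/16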
while ( x = 1 ) {
    {x := 0} [ 1/2 ] {x := 1} ⨟
    c := c + 1
}
Program 1.1. Geometric distribution generator.

while ( x > 0 ) {
    {x := x + 1} [ 1/2 ] {x := x - 1} ⨟
    c := c + 1
}
Program 1.2. Left-bounded 1-dimensional random walk.
Example 6 (Verifying Proper Overapproximations). Program 1.2 models a one-dimensional, left-bounded random walk. Given an input (i, j) ∈ N², it is evident that this program can only terminate in an even (if i is even) or odd (if i is odd) number of steps. This information can be encoded into the following superinvariantlet:

    f(X^0 C^j) = C^j   and, for i ≥ 1,   f(X^i C^j) = C^j · ( C/(1 − C²),  if i is odd;  1/(1 − C²),  if i is even ).

It is straightforward to verify that f is a proper superinvariantlet (proper because C/(1 − C²) = C + C³ + C⁵ + . . . is not a PGF), and hence f properly overapproximates the loop semantics. Another superinvariantlet for Program 1.2 is given by

    h(X^i C^j) = C^j · ((1 − √(1 − C²))/C)^i,  if i ≥ 1;   h(X^i C^j) = C^j,  if i = 0.

Given that the program terminates almost-surely [11] and that h is a superinvariantlet yielding only PGFs, it follows that the extension of h is exactly the semantics of Program 1.2. An alternative derivation of this formula for the case h(X) can be found, e.g., in [12].
For both f and h, we were able to prove automatically that they are indeed superinvariantlets, using the computer algebra library SymPy [20]. The code is included in Appendix B (Program 1.5).
Example 7 (Proving Non-almost-sure Termination). In Program 1.3, the branching probability of the choice statement depends on the value of a program variable. This notation is just syntactic sugar, as this behavior can be mimicked by loop constructs together with coin flips [3, pp. 115f].
while ( x > 0 ) {
    {x := x - 1} [ 1/x ] {x := x + 1}
}
Program 1.3. A non-almost-surely terminating loop.

while ( x < 1 and t < 2 ) {
    if ( t = 0 ) {
        {x := 1} [ a ] {t := 1} ⨟ c := c + 1
    } else {
        {x := 1} [ b ] {t := 0} ⨟ d := d + 1
    }
}
Program 1.4. Dueling cowboys.
To prove that Program 1.3 does not terminate almost-surely, we consider the following superinvariantlet:

    f(X^i) = 1 − (1/e) · Σ_{n=0}^{i−2} 1/n!,

where e = 2.71828. . . is Euler's number. Again, the superinvariantlet property was verified automatically, here using Mathematica [13]. Now consider, for instance, f(X³) = 1 − (1/e) · (1/0! + 1/1!) = 1 − 2/e < 1. This proves that the program terminates on X³ with probability strictly smaller than 1, witnessing that the program is not almost-surely terminating. △
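The bound 1 − 2/e ≈ 0.264 is easy to cross-check by simulation. The following Monte Carlo sketch (ours, not from the paper; runs are capped at a finite number of steps, so it is a sanity check rather than a proof) estimates the termination probability of Program 1.3 from x = 3:

import math
import random

def run(x, max_steps=1000):
    """One simulation of Program 1.3; capped runs count as non-terminating."""
    for _ in range(max_steps):
        if x == 0:
            return True
        if random.random() < 1.0 / x:   # probability 1/x: x := x - 1
            x -= 1
        else:                           # probability 1 - 1/x: x := x + 1
            x += 1
    return False

trials = 20_000
estimate = sum(run(3) for _ in range(trials)) / trials
print(estimate, 1 - 2 / math.e)         # both should be close to 0.264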
5.2 Rational PGFs
In several of the examples from the previous sections, we considered PGFs which were rational functions, that is, fractions of two polynomials. Since those are a particularly simple class of PGFs, it is natural to ask which programs have rational semantics. In this section, we present a semantic characterization of a class of while-loops whose output distribution is a (multivariate) discrete phase-type distribution [21,22]. This implies that the resulting PGF of such programs is an effectively computable rational function for any given input state. Let us illustrate this by an example.
Example 8 (Dueling Cowboys). Program 1.4 models two dueling cowboys [18]. The hit chance of the first cowboy is a percent and the hit chance of the second cowboy is b percent, where a, b ∈ [0, 1].⁶ The cowboys shoot at each other in turns, as indicated by the variable t, until one of them gets hit (x is set to 1).

⁶ These are not program variables.
The variable c counts the number of shots of the first cowboy and d those of the second cowboy.
We observe that Program 1.4 is somewhat independent of the value of c, in the sense that moving the statement c := c + 1 to either immediately before or immediately after the loop yields an equivalent program. In our notation, this is expressed as ⟦W⟧(C · H) = C · ⟦W⟧(H) for all PGFs H. By symmetry, the same applies to variable d. Unfolding the loop once on input 1 yields

    ⟦W⟧(1) = (1 − a)C · ⟦W⟧(T) + aCX.

A similar equation for ⟦W⟧(T) involving ⟦W⟧(1) on its right-hand side holds. This way we obtain a system of two linear equations, even though the program itself is infinite-state. The linear equation system has a unique solution ⟦W⟧(1) in the field of rational functions over the variables C, D, T, and X, namely the PGF

    G := (aCX + (1 − a)bCDTX) / (1 − (1 − b)(1 − a)CD).

From G we can easily read off the following: the probability that the first cowboy wins (x = 1 and t = 0) equals a/(1 − (1 − a)(1 − b)), and the expected total number of shots of the first cowboy is ∂_C G(1) = 1/(a + b − ab). Notice that this quantity equals ∞ if a and b are both zero, i.e. if both cowboys have zero hit chance.
If we write G_V for the PGF obtained by substituting all but the variables in V with 1, then we moreover see that G_C · G_D ≠ G_{C,D}. This means that C and D (as random variables) are stochastically dependent.
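The linear-equation view of this example is mechanizable. In the sketch below (ours, assuming SymPy) we solve the unfolding equations for ⟦W⟧(1) and ⟦W⟧(T); the second equation, ⟦W⟧(T) = (1 − b)D · ⟦W⟧(1) + bDTX, is our own unfolding of the t = 1 branch, derived analogously to the first:

from sympy import symbols, solve, simplify, factor, diff

a, b, C, D, T, X = symbols('a b C D T X')
G0, G1 = symbols('G0 G1')      # G0 stands for [[W]](1), G1 for [[W]](T)

sol = solve([G0 - ((1 - a) * C * G1 + a * C * X),
             G1 - ((1 - b) * D * G0 + b * D * T * X)],
            [G0, G1], dict=True)[0]

G = factor(sol[G0])
print(G)   # equals (a*C*X + (1-a)*b*C*D*T*X)/(1 - (1-a)*(1-b)*C*D), up to rearrangement
print(simplify(diff(G, C).subs({C: 1, D: 1, T: 1, X: 1})))   # equals 1/(a + b - a*b)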
The distribution encoded in the PGF ⟦W⟧(1) is a discrete phase-type distribution. Such distributions are defined as follows: a Markov reward chain is a Markov chain where each state is augmented with a reward vector in N^k. By definition, a (discrete) distribution on N^k is of phase-type iff it is the distribution of the total accumulated reward vector until absorption in a Markov reward chain with a single absorbing state and a finite number of transient states. In fact, Program 1.4 can be described as a Markov reward chain with two states (X⁰T⁰ and X⁰T¹) and 2-dimensional reward vectors corresponding to the "counters" (c, d): the reward in state X⁰T⁰ is (1, 0) and (0, 1) in the other state.
Each pGCL program describes a Markov reward chain [10]. It is not clear which (non-trivial) syntactic restrictions to impose to guarantee that such chains are finite. In the remainder of this section, we give a characterization of while-loops that are equivalent to finite Markov reward chains. The idea of our criterion is that each variable has to fall into one of the following two categories:
Definition 9 (Homogeneous and Bounded Variables). Let P ∈ pGCL be a program, B be a guard, and x_i be a program variable. Then:

– x_i is called homogeneous for P if ⟦P⟧(X_i · G) = X_i · ⟦P⟧(G) for all G ∈ PGF.
– x_i is called bounded by B if the set {σ_i | σ ∈ B} is finite.
Intuitively, homogeneity of x_i means that it does not matter whether one increments the variable before or after the execution of P. Thus, a homogeneous variable behaves like an increment-only counter, even if this may not be explicit in the syntax. In Example 8, the variables c and d in Program 1.4 are homogeneous (for both the loop body and the loop itself). Moreover, x and t are clearly bounded by the loop guard. We can now state our characterization.

Definition 10 (HB Loops). A loop while (B) {P} is called homogeneous-bounded (HB) if for all program states σ ∈ B, the PGF ⟦P⟧(X^σ) is a polynomial, and for each program variable x it either holds that

– x is homogeneous for P and the guard B is independent of x, or that
– x is bounded by the guard B.

In an HB loop, all the possible valuations of the bounded variables satisfying B span the finite transient state space of a Markov reward chain in which the dimension of the reward vectors equals the number of homogeneous variables. The additional condition that ⟦P⟧(X^σ) is a polynomial ensures that there are only finitely many terminal (absorbing) states. Thus, we have the following:
Proposition 1. Let W be a while-loop. Then ⟦W⟧(X^σ) is the (rational) PGF of a multivariate discrete phase-type distribution if and only if W is equivalent to an HB loop that almost-surely terminates on input σ.

To conclude, we remark that there are various simple syntactic conditions for HB loops: for example, if P is loop-free, then ⟦P⟧(X^σ) is always a polynomial. Similarly, if x only appears in assignments of the form x := x + k with k ≥ 0, then x is homogeneous. Such updates of variables are, e.g., essential in constant probability programs [?]. The crucial point is that such conditions are only sufficient but not necessary. Our semantic conditions thus capture the essence of phase-type distribution semantics more adequately while still being reasonably simple (albeit — being non-trivial semantic properties — undecidable in general).
6 Conclusion

We have presented a denotational distribution transformer semantics for probabilistic while-programs where the denotations are generating functions (GFs). Moreover, we have provided a simple invariant-style technique to prove that a given GF overapproximates the program's semantics, and identified a class of (possibly infinite-state) programs whose semantics is a rational GF encoding a phase-type distribution. Directions for future work include the (semi-)automated synthesis of invariants and the development of notions of how precise overapproximations by invariants actually are.
A Proofs of Section 4

A.1 Proofs of Section 4.1
Lemma 1 (Completeness of ⪯ on FPS). (FPS, ⪯) is a complete lattice.

Proof. We start by showing that (FPS, ⪯) is a partial order. Let F, G, H ∈ FPS and σ ∈ N^k. For reflexivity, consider the following:

    G ⪯ G   iff   ∀σ ∈ N^k : [σ]_G ≤ [σ]_G   iff   true.

For antisymmetry, consider the following:

    G ⪯ H and H ⪯ G
    implies   ∀σ ∈ N^k : [σ]_G ≤ [σ]_H and [σ]_H ≤ [σ]_G
    implies   ∀σ ∈ N^k : [σ]_G = [σ]_H
    implies   G = H.

For transitivity, consider the following:

    G ⪯ H and H ⪯ F
    implies   ∀σ ∈ N^k : [σ]_G ≤ [σ]_H and [σ]_H ≤ [σ]_F
    implies   ∀σ ∈ N^k : [σ]_G ≤ [σ]_F
    implies   G ⪯ F.

Next, we show that every set S ⊆ FPS has a supremum

    sup S = Σ_{σ∈N^k} (sup_{F∈S} [σ]_F) · X^σ

in FPS. In particular, notice that sup ∅ = Σ_{σ∈N^k} 0 · X^σ. The fact that sup S ∈ FPS is trivial since sup_{F∈S} [σ]_F ∈ R^∞_{≥0} for every σ ∈ N^k. Furthermore, the fact that sup S is an upper bound on S is immediate since ⪯ is defined coefficient-wise. Finally, sup S is also the least upper bound, since, by definition of ⪯, we have [σ]_{sup S} = sup_{F∈S} [σ]_F. □
The following proofs rely on the Monotone Sequence Theorem (MST), which we recall here: if (a_n)_{n∈N} is a monotonically increasing sequence in R^∞_{≥0}, then sup_n a_n = lim_{n→∞} a_n. In particular, if (a_n)_{n∈N} and (b_n)_{n∈N} are monotonically increasing sequences in R^∞_{≥0}, then

    sup_n a_n + sup_n b_n = lim_{n→∞} a_n + lim_{n→∞} b_n = lim_{n→∞} (a_n + b_n) = sup_n (a_n + b_n).
Theorem 1 (ω-continuity of pGCL Semantics). The semantic functional ⟦·⟧ is ω-continuous, i.e. for all programs P ∈ pGCL and all increasing ω-chains F_1 ⪯ F_2 ⪯ . . . in FPS,

    ⟦P⟧(sup_{n∈N} F_n) = sup_{n∈N} ⟦P⟧(F_n).

Proof. By induction on the structure of P. Let S = {F_1, F_2, . . .} be an increasing ω-chain in FPS. First, we consider the base cases.

The case P = skip. We have

    ⟦P⟧(sup S) = sup S = sup_{F∈S} F = sup_{F∈S} ⟦P⟧(F).
The case P = x_i := E. Let sup S = Ĝ = Σ_{σ∈N^k} [σ]_Ĝ · X^σ, where for each σ ∈ N^k we have [σ]_Ĝ = sup_{F∈S} [σ]_F. We calculate

    ⟦P⟧(sup S)
    = ⟦P⟧(Ĝ)
    = ⟦P⟧(Σ_{σ∈N^k} [σ]_Ĝ · X_1^{σ_1} · · · X_i^{σ_i} · · · X_k^{σ_k})
    = Σ_{σ∈N^k} [σ]_Ĝ · X_1^{σ_1} · · · X_i^{eval_σ(E)} · · · X_k^{σ_k}
    = Σ_{σ∈N^k} (sup_{F∈S} [σ]_F) · X_1^{σ_1} · · · X_i^{eval_σ(E)} · · · X_k^{σ_k}
    = sup_{F∈S} Σ_{σ∈N^k} [σ]_F · X_1^{σ_1} · · · X_i^{eval_σ(E)} · · · X_k^{σ_k}   (sup on FPS is defined coefficient-wise)
    = sup_{F∈S} ⟦P⟧(Σ_{σ∈N^k} [σ]_F · X_1^{σ_1} · · · X_i^{σ_i} · · · X_k^{σ_k})
    = sup_{F∈S} ⟦P⟧(F).
As the induction hypothesis, now assume that for some arbitrary but fixed programs P_1, P_2 and all increasing ω-chains S_1, S_2 in FPS it holds that both

    ⟦P_1⟧(sup S_1) = sup_{F∈S_1} ⟦P_1⟧(F)   and   ⟦P_2⟧(sup S_2) = sup_{F∈S_2} ⟦P_2⟧(F).

We continue with the induction step.

The case P = {P_1} [p] {P_2}. We have

    ⟦P⟧(sup S)
    = p · ⟦P_1⟧(sup S) + (1 − p) · ⟦P_2⟧(sup S)
    = p · sup_{F∈S} ⟦P_1⟧(F) + (1 − p) · sup_{F∈S} ⟦P_2⟧(F)        (I.H. on P_1 and P_2)
    = sup_{F∈S} (p · ⟦P_1⟧(F)) + sup_{F∈S} ((1 − p) · ⟦P_2⟧(F))    (scalar multiplication is defined pointwise)
    = sup_{F∈S} (p · ⟦P_1⟧(F) + (1 − p) · ⟦P_2⟧(F))                (apply MST coefficient-wise)
    = sup_{F∈S} ⟦{P_1} [p] {P_2}⟧(F)
    = sup_{F∈S} ⟦P⟧(F).
The case P = if (B) {P_1} else {P_2}. We have

    ⟦P⟧(sup S)
    = ⟦P_1⟧(⟨sup S⟩_B) + ⟦P_2⟧(⟨sup S⟩_{¬B})
    = ⟦P_1⟧(sup_{F∈S} ⟨F⟩_B) + ⟦P_2⟧(sup_{F∈S} ⟨F⟩_{¬B})    (restriction defined coefficient-wise)
    = sup_{F∈S} ⟦P_1⟧(⟨F⟩_B) + sup_{F∈S} ⟦P_2⟧(⟨F⟩_{¬B})    (I.H. on P_1 and P_2)
    = sup_{F∈S} (⟦P_1⟧(⟨F⟩_B) + ⟦P_2⟧(⟨F⟩_{¬B}))            (apply MST coefficient-wise)
    = sup_{F∈S} ⟦if (B) {P_1} else {P_2}⟧(F)
    = sup_{F∈S} ⟦P⟧(F).
The case P = while (B) {P_1}. Recall that for every G ∈ FPS,

    ⟦P⟧(G) = (lfp Φ_{B,P_1})(G) = (sup_{n∈N} Φ^n_{B,P_1}(0))(G).

Hence, it suffices to show that

    (sup_{n∈N} Φ^n_{B,P_1}(0))(sup S) = sup_{F∈S} ((sup_{n∈N} Φ^n_{B,P_1}(0))(F)).

Assume for the moment that for every n ∈ N and all increasing ω-chains S in FPS,

    Φ^n_{B,P_1}(0)(sup S) = sup_{F∈S} Φ^n_{B,P_1}(0)(F).   (1)

We then have

    (sup_{n∈N} Φ^n_{B,P_1}(0))(sup S)
    = sup_{n∈N} (Φ^n_{B,P_1}(0)(sup S))             (sup for Φ_{B,P_1} is defined pointwise)
    = sup_{n∈N} sup_{F∈S} Φ^n_{B,P_1}(0)(F)         (Equation 1)
    = sup_{F∈S} sup_{n∈N} Φ^n_{B,P_1}(0)(F)         (swap suprema)
    = sup_{F∈S} ((sup_{n∈N} Φ^n_{B,P_1}(0))(F)),    (sup for Φ_{B,P_1} is defined pointwise)

which is what we have to show. It remains to prove Equation 1 by induction on n.

Base case n = 0. We have

    Φ^0_{B,P_1}(0)(sup S) = 0(sup S) = 0 = sup_{F∈S} 0(F) = sup_{F∈S} Φ^0_{B,P_1}(0)(F).
Induction step. We have

    Φ^{n+1}_{B,P_1}(0)(sup S)
    = Φ_{B,P_1}(Φ^n_{B,P_1}(0))(sup S)                               (Def. of Φ_{B,P_1})
    = ⟨sup S⟩_{¬B} + Φ^n_{B,P_1}(0)(⟦P_1⟧(⟨sup S⟩_B))
    = ⟨sup S⟩_{¬B} + Φ^n_{B,P_1}(0)(sup_{F∈S} ⟦P_1⟧(⟨F⟩_B))          (I.H. on P_1)
    = ⟨sup S⟩_{¬B} + sup_{F∈S} Φ^n_{B,P_1}(0)(⟦P_1⟧(⟨F⟩_B))          (I.H. on n)
    = sup_{F∈S} (⟨F⟩_{¬B} + Φ^n_{B,P_1}(0)(⟦P_1⟧(⟨F⟩_B)))            (apply MST)
    = sup_{F∈S} Φ^{n+1}_{B,P_1}(0)(F).                               (Def. of Φ_{B,P_1})

This completes the proof. □
Theorem 2 (Well-definedness of FPS Semantics). The semantics functional ⟦·⟧ is well-defined, i.e. the semantics of any loop while (B) {P} exists uniquely and can be written as

    ⟦while (B) {P}⟧ = lfp Φ_{B,P} = sup_{n∈N} Φ^n_{B,P}(0).
Proof. First, we show that the unfolding operator Φ_{B,P} is ω-continuous. For that, let f_1 ⊑ f_2 ⊑ . . . be an ω-chain in FPS → FPS. Then,

    Φ_{B,P}(sup_{n∈N} {f_n})
    = λG. ⟨G⟩_{¬B} + (sup_{n∈N} {f_n})(⟦P⟧(⟨G⟩_B))
    = λG. ⟨G⟩_{¬B} + sup_{n∈N} {f_n(⟦P⟧(⟨G⟩_B))}     (sup on FPS → FPS is defined pointwise)
    = sup_{n∈N} {λG. ⟨G⟩_{¬B} + f_n(⟦P⟧(⟨G⟩_B))}     (apply MST coefficient-wise)
    = sup_{n∈N} {Φ_{B,P}(f_n)}.                      (Def. of Φ_{B,P})

Since Φ_{B,P} is ω-continuous and (FPS → FPS, ⊑) forms a complete lattice (Lemma 1), we get by the Kleene Fixed Point Theorem [17] that Φ_{B,P} has a unique least fixed point given by sup_{n∈N} Φ^n_{B,P}(0). □
Theorem 3 (Mass Conservation). For every P ∈ pGCL and F ∈ FPS, we have |⟦P⟧(F)| ≤ |F|.

Proof. By induction on the structure of P. For the loop-free cases, this is straightforward. For the case P = while (B) {P_1}, we proceed as follows. For every r ∈ R^∞_{≥0}, we define the set

    FPS_r = {F ∈ FPS | |F| ≤ r}

of all FPSs whose mass is at most r. First, we define the restricted unfolding operator

    Φ_{B,P_1,r} : (FPS_r → FPS_r) → (FPS_r → FPS_r),   ψ ↦ Φ_{B,P_1}(ψ).

Our induction hypothesis on P_1 implies that Φ_{B,P_1,r} is well-defined.
It is now only left to show that (FPS_r, ⪯) is an ω-complete partial order, because then Φ_{B,P_1,r} has a least fixed point in FPS_r for every r ∈ R^∞_{≥0}. The theorem then follows by letting r = |G|, because

    (lfp Φ_{B,P_1})(G) = (lfp Φ_{B,P_1,|G|})(G)   implies   |(lfp Φ_{B,P_1})(G)| ≤ |G|.

(FPS_r, ⪯) is an ω-complete partial order. The fact that (FPS_r, ⪯) is a partial order is immediate. It remains to show ω-completeness. For that, let f_1 ⪯ f_2 ⪯ . . . be an ω-chain in FPS_r. We have to show that sup_n f_n ∈ FPS_r, which is the case if and only if

    |sup_n f_n| = Σ_{σ∈N^k} sup_n [σ]_{f_n} ≤ r.

Now let g : N → N^k be some bijection from N to N^k. We have

    Σ_{σ∈N^k} sup_n [σ]_{f_n}
    = Σ_{i=0}^{∞} sup_n [g(i)]_{f_n}              (series converges absolutely)
    = sup_N Σ_{i=0}^{N} sup_n [g(i)]_{f_n}        (rewrite infinite series as supremum of partial sums)
    = sup_N sup_n Σ_{i=0}^{N} [g(i)]_{f_n}        (apply monotone sequence theorem)
    = sup_n sup_N Σ_{i=0}^{N} [g(i)]_{f_n}.       (swap suprema)

Now observe that sup_N Σ_{i=0}^{N} [g(i)]_{f_n} = |f_n|, which is a monotonically increasing sequence in n. Moreover, since f_n ∈ FPS_r, this sequence is bounded from above by r. Hence, the least upper bound sup_n |f_n| of the sequence |f_n| is no larger than r, too. This completes the proof. □
A.2 Proofs of Section 4.2
Lemma 3 (Representation of ⟦while⟧). Let W = while (B) {P} be a pGCL program. An alternative representation of its semantics is:

    ⟦W⟧ = λG. Σ_{i=0}^{∞} ⟨φ^i(G)⟩_{¬B},   where   φ(G) = ⟦P⟧(⟨G⟩_B).

Proof. First we show by induction that Φ^n_{B,P}(0)(G) = Σ_{i=0}^{n−1} ⟨φ^i(G)⟩_{¬B}.

Base case. We have

    Φ^0_{B,P}(0)(G) = 0 = Σ_{i=0}^{−1} ⟨φ^i(G)⟩_{¬B}.

Induction step. We have

    Φ^{n+1}_{B,P}(0)(G) = Φ_{B,P}(Φ^n_{B,P}(0))(G)
    = ⟨G⟩_{¬B} + Φ^n_{B,P}(0)(⟦P⟧(⟨G⟩_B))
    = ⟨G⟩_{¬B} + Φ^n_{B,P}(0)(φ(G))
    = ⟨G⟩_{¬B} + Σ_{i=0}^{n−1} ⟨φ^{i+1}(G)⟩_{¬B}
    = ⟨G⟩_{¬B} + Σ_{i=1}^{n} ⟨φ^i(G)⟩_{¬B}
    = Σ_{i=0}^{n} ⟨φ^i(G)⟩_{¬B}.

Overall, we thus get

    ⟦W⟧(G) = (sup_{n∈N} Φ^n_{B,P}(0))(G)
    = sup_{n∈N} Φ^n_{B,P}(0)(G)                     (sup on FPS → FPS is defined pointwise)
    = sup_{n∈N} Σ_{i=0}^{n−1} ⟨φ^i(G)⟩_{¬B}         (see above)
    = Σ_{i=0}^{∞} ⟨φ^i(G)⟩_{¬B}. □
Theorem 4 (Linearity of pGCL Semantics). For every program P and guard B, the functions ⟨·⟩_B and ⟦P⟧ are linear. Moreover, the unfolding operator Φ_{B,P} maps linear transformers onto linear transformers.

Proof. Linearity of ⟨·⟩_B. Writing G = Σ_{σ∈N^k} µ_σ X^σ and F = Σ_{σ∈N^k} ν_σ X^σ, we have

    ⟨a · G + F⟩_B = ⟨Σ_{σ∈N^k} (a · µ_σ + ν_σ) X^σ⟩_B
    = Σ_{σ∈B} (a · µ_σ + ν_σ) X^σ
    = Σ_{σ∈B} a · µ_σ X^σ + Σ_{σ∈B} ν_σ X^σ
    = a · Σ_{σ∈B} µ_σ X^σ + Σ_{σ∈B} ν_σ X^σ
    = a · ⟨G⟩_B + ⟨F⟩_B.
Linearity of ⟦P⟧. By induction on the structure of P. First, we consider the base cases.

The case P = skip. We have

    ⟦skip⟧(r · F + G) = r · F + G = r · ⟦skip⟧(F) + ⟦skip⟧(G).

The case P = x_i := E. We have

    ⟦x_i := E⟧(r · F + G)
    = Σ_{σ∈N^k} [σ]_{r·F+G} · X_1^{σ_1} · · · X_i^{eval_σ(E)} · · · X_k^{σ_k}
    = Σ_{σ∈N^k} (r · [σ]_F + [σ]_G) · X_1^{σ_1} · · · X_i^{eval_σ(E)} · · · X_k^{σ_k}    (+ and · defined coefficient-wise)
    = r · Σ_{σ∈N^k} [σ]_F · X_1^{σ_1} · · · X_i^{eval_σ(E)} · · · X_k^{σ_k} + Σ_{σ∈N^k} [σ]_G · X_1^{σ_1} · · · X_i^{eval_σ(E)} · · · X_k^{σ_k}
    = r · ⟦P⟧(F) + ⟦P⟧(G).

Next, we consider the induction step.
The case P = P_1 ⨟ P_2. We have

    ⟦P_1 ⨟ P_2⟧(r · F + G)
    = ⟦P_2⟧(⟦P_1⟧(r · F + G))
    = ⟦P_2⟧(r · ⟦P_1⟧(F) + ⟦P_1⟧(G))                 (I.H. on P_1)
    = r · ⟦P_2⟧(⟦P_1⟧(F)) + ⟦P_2⟧(⟦P_1⟧(G)).         (I.H. on P_2)

The case P = if (B) {P_1} else {P_2}. We have

    ⟦if (B) {P_1} else {P_2}⟧(r · F + G)
    = ⟦P_1⟧(⟨r · F + G⟩_B) + ⟦P_2⟧(⟨r · F + G⟩_{¬B})
    = ⟦P_1⟧(r · ⟨F⟩_B + ⟨G⟩_B) + ⟦P_2⟧(r · ⟨F⟩_{¬B} + ⟨G⟩_{¬B})          (linearity of ⟨·⟩_B and ⟨·⟩_{¬B})
    = r · (⟦P_1⟧(⟨F⟩_B) + ⟦P_2⟧(⟨F⟩_{¬B})) + ⟦P_1⟧(⟨G⟩_B) + ⟦P_2⟧(⟨G⟩_{¬B})   (I.H. on P_1 and P_2)
    = r · ⟦if (B) {P_1} else {P_2}⟧(F) + ⟦if (B) {P_1} else {P_2}⟧(G).
The case P = {P_1} [p] {P_2}. We have

    ⟦{P_1} [p] {P_2}⟧(r · F + G)
    = p · ⟦P_1⟧(r · F + G) + (1 − p) · ⟦P_2⟧(r · F + G)
    = p · (r · ⟦P_1⟧(F) + ⟦P_1⟧(G)) + (1 − p) · (r · ⟦P_2⟧(F) + ⟦P_2⟧(G))    (I.H. on P_1 and P_2)
    = r · (p · ⟦P_1⟧(F) + (1 − p) · ⟦P_2⟧(F)) + p · ⟦P_1⟧(G) + (1 − p) · ⟦P_2⟧(G)    (reorder terms)
    = r · ⟦{P_1} [p] {P_2}⟧(F) + ⟦{P_1} [p] {P_2}⟧(G).

The case P = while (B) {P_1}. We have

    ⟦while (B) {P_1}⟧(r · F + G)
    = (sup_{n∈N} Φ^n_{B,P_1}(0))(r · F + G)
    = sup_{n∈N} Φ^n_{B,P_1}(0)(r · F + G)                                    (sup on FPS → FPS defined pointwise)
    = sup_{n∈N} (r · Φ^n_{B,P_1}(0)(F) + Φ^n_{B,P_1}(0)(G))                  (by straightforward induction on n using I.H. on P_1)
    = r · sup_{n∈N} Φ^n_{B,P_1}(0)(F) + sup_{n∈N} Φ^n_{B,P_1}(0)(G)          (apply monotone sequence theorem coefficient-wise)
    = r · ⟦while (B) {P_1}⟧(F) + ⟦while (B) {P_1}⟧(G).
Linearity of Φ_{B,P}(f) for linear f. Let F = Σ_{σ∈N^k} µ_σ X^σ. Using (1) the linearity of ⟨·⟩_{¬B} and ⟨·⟩_B, and (2) the linearity of ⟦P⟧ shown above, we have

    Φ_{B,P}(f)(Σ_{σ∈N^k} µ_σ X^σ)
    = ⟨Σ_{σ∈N^k} µ_σ X^σ⟩_{¬B} + f(⟦P⟧(⟨Σ_{σ∈N^k} µ_σ X^σ⟩_B))
    = Σ_{σ∈N^k} µ_σ ⟨X^σ⟩_{¬B} + f(Σ_{σ∈N^k} µ_σ ⟦P⟧(⟨X^σ⟩_B))     (1. and 2.)
    = Σ_{σ∈N^k} µ_σ ⟨X^σ⟩_{¬B} + Σ_{σ∈N^k} µ_σ · f(⟦P⟧(⟨X^σ⟩_B))   (f linear)
    = Σ_{σ∈N^k} µ_σ · (⟨X^σ⟩_{¬B} + f(⟦P⟧(⟨X^σ⟩_B)))
    = Σ_{σ∈N^k} µ_σ · Φ_{B,P}(f)(X^σ). □
Lemma 2 (Loop Unrolling). For any FPS F,

    ⟦while (B) {P}⟧(F) = ⟨F⟩_{¬B} + ⟦while (B) {P}⟧(⟦P⟧(⟨F⟩_B)).

Proof. Let W = while (B) {P}. We have

    ⟦W⟧(G) = (lfp Φ_{B,P})(G)
    = Φ_{B,P}(lfp Φ_{B,P})(G)
    = ⟨G⟩_{¬B} + (lfp Φ_{B,P})(⟦P⟧(⟨G⟩_B))
    = ⟦if (B) {P ⨟ W} else {skip}⟧(G). □
A.3 Proofs of Section 4.3
Lemma 4. The mapping τ is a bijection. The inverse τ⁻¹ of τ is given by

    τ⁻¹ : M → FPS,   µ ↦ Σ_{σ∈N^k} µ({σ}) · X^σ.

Proof. We show this by showing τ⁻¹ ∘ τ = id and τ ∘ τ⁻¹ = id. For the first direction,

    τ⁻¹ ∘ τ (Σ_{σ∈N^k} α_σ X^σ) = τ⁻¹(λN. Σ_{σ∈N} α_σ) = Σ_{σ∈N^k} (Σ_{s∈{σ}} α_s) · X^σ = Σ_{σ∈N^k} α_σ X^σ.

For the second,

    τ ∘ τ⁻¹(µ) = τ(Σ_{σ∈N^k} µ({σ}) · X^σ) = λN. Σ_{σ∈N} µ({σ}) = λN. µ(N) = µ. □
Lemma 5. The mappings τ and τ⁻¹ are monotone linear maps.

Proof. First, we show that τ⁻¹ is linear (and hence τ is, due to bijectivity):

    τ⁻¹(µ + ν) = Σ_{σ∈N^k} (µ + ν)({σ}) · X^σ
    = Σ_{σ∈N^k} (µ({σ}) + ν({σ})) · X^σ        (as M forms a vector space with the standard +)
    = Σ_{σ∈N^k} (µ({σ}) · X^σ + ν({σ}) · X^σ)
    = Σ_{σ∈N^k} µ({σ}) · X^σ + Σ_{σ∈N^k} ν({σ}) · X^σ
    = τ⁻¹(µ) + τ⁻¹(ν).

Second, we show that τ is monotone. Assume G_µ ⪯ G_{µ′}. Then

    τ(G_µ) = τ(Σ_{σ∈N^k} µ({σ}) · X^σ) = λS. Σ_{σ∈S} µ({σ})
    ≤ λS. Σ_{σ∈S} µ′({σ})                     (as µ({σ}) ≤ µ′({σ}) per definition of ⪯)
    = τ(Σ_{σ∈N^k} µ′({σ}) · X^σ) = τ(G_{µ′}).

Third, we show that τ⁻¹ is monotone. Assume µ ≤ µ′. Then

    τ⁻¹(µ) = Σ_{σ∈N^k} µ({σ}) · X^σ
    ⪯ Σ_{σ∈N^k} µ′({σ}) · X^σ                 (as µ({σ}) ≤ µ′({σ}) per definition of ≤)
    = τ⁻¹(µ′). □
Lemma 6. Let f : (P, ≤) → (Q, ≤) be a monotone isomorphism for any partially ordered sets P and Q. Then

    f_* : Hom(P, P) → Hom(Q, Q),   φ ↦ f ∘ φ ∘ f⁻¹

is also a monotone isomorphism.

Proof. Let f be such a monotone isomorphism, and f_* the corresponding lifting. First, we note that f_* is also bijective; its inverse is given by (f_*)⁻¹ = (f⁻¹)_*. Second, f_* is monotone, as shown in the following calculation:

    φ ≤ ψ   implies   ∀x. φ(x) ≤ ψ(x)
            implies   ∀x. f ∘ φ(f⁻¹ ∘ f(x)) ≤ f ∘ ψ(f⁻¹ ∘ f(x))
            implies   ∀y. f ∘ φ(f⁻¹(y)) ≤ f ∘ ψ(f⁻¹(y))
            implies   f_*(φ) ≤ f_*(ψ). □
Lemma 7. Let P, Q be complete lattices, and τ : P → Q a monotone isomorphism. Also let lfp be the least fixed point operator. Then the following diagram commutes:

    Hom(P, P) ──τ_*──▸ Hom(Q, Q)
        │                  │
       lfp                lfp
        ▾                  ▾
        P ───────τ───────▸ Q
Proof. Let φ ∈ Hom(P, P) be arbitrary. Recall that lfp φ = inf {p | φ(p) = p}. Then

    τ(lfp φ) = τ(inf {p | φ(p) = p})
    = inf {τ(p) | φ(p) = p}
    = inf {τ(p) | φ(τ⁻¹ ∘ τ(p)) = τ⁻¹ ∘ τ(p)}
    = inf {τ(p) | τ ∘ φ(τ⁻¹ ∘ τ(p)) = τ(p)}
    = inf {q | τ ∘ φ(τ⁻¹(q)) = q}
    = inf {q | τ_*(φ)(q) = q}
    = lfp τ_*(φ). □
Definition 11. Let T be the program translation from pGCL to a (mildly extended) Kozen syntax, defined inductively:

    T(skip) = skip
    T(x_i := E) = x_i := f_E(x_1, . . . , x_k)
    T({P} [p] {Q}) = {T(P)} [p] {T(Q)}
    T(P ⨟ Q) = T(P); T(Q)
    T(if (B) {P} else {Q}) = if B then T(P) else T(Q) fi
    T(while (B) {P}) = while B do T(P) od,

where p is a probability, k = |Var(P)|, B is a Boolean expression, and P, Q are pGCL programs. The extended constructs skip and {P} [p] {Q} are only syntactic sugar and can be simulated by the original Kozen semantics. The intended semantics of these constructs is

    [skip] = id   and   [{P} [p] {Q}] = p · [T(P)] + (1 − p) · [T(Q)].
Lemma 8. For all guards B, the following identity holds: e_B ∘ τ = τ ∘ ⟨·⟩_B.

Proof. For all G_µ = Σ_{σ∈N^k} µ({σ}) · X^σ ∈ FPS:

    e_B ∘ τ(G_µ) = e_B(µ) = λS. µ(S ∩ B),

and

    τ(⟨G_µ⟩_B) = τ(Σ_{σ∈B} µ({σ}) · X^σ + Σ_{σ∉B} 0 · X^σ) = λS. µ(S ∩ B).

Hence ∀G_µ ∈ FPS : e_B ∘ τ(G_µ) = τ(⟨G_µ⟩_B). □
Theorem 5. The FPS semantics of pGCL is an instance of Kozen's semantics, i.e. for all pGCL programs P, we have

    τ ∘ ⟦P⟧ = T(P) ∘ τ.

Proof. The proof is done via induction on the program structure. We omit the loop-free cases, as they are straightforward.
By definition, T(while (B) {P}) = while B do T(P) od. Hence, the corresponding Kozen semantics is equal to lfp T_{B,P}, where

    T_{B,P} : (M → M) → (M → M),   S ↦ e_{¬B} + (S ∘ T(P) ∘ e_B).

First, we show that (τ⁻¹)_* ∘ T_{B,P} ∘ τ_* = Φ_{B,P}, where τ_* is the canonical lifting of τ, i.e., τ_*(S) = τ ∘ S ∘ τ⁻¹ for all S ∈ (FPS → FPS):

    ((τ⁻¹)_* ∘ T_{B,P} ∘ τ_*)(S)
    = (τ⁻¹)_*(T_{B,P}(τ ∘ S ∘ τ⁻¹))
    = (τ⁻¹)_*(e_{¬B} + τ ∘ S ∘ τ⁻¹ ∘ T(P) ∘ e_B)
    = τ⁻¹ ∘ e_{¬B} ∘ τ + τ⁻¹ ∘ τ ∘ S ∘ τ⁻¹ ∘ T(P) ∘ e_B ∘ τ
    = τ⁻¹ ∘ e_{¬B} ∘ τ + S ∘ τ⁻¹ ∘ T(P) ∘ e_B ∘ τ
    = τ⁻¹ ∘ τ ∘ ⟨·⟩_{¬B} + S ∘ τ⁻¹ ∘ T(P) ∘ τ ∘ ⟨·⟩_B    (Lemma 8)
    = ⟨·⟩_{¬B} + S ∘ τ⁻¹ ∘ τ ∘ ⟦P⟧ ∘ ⟨·⟩_B                (I.H. on P)
    = ⟨·⟩_{¬B} + S ∘ ⟦P⟧ ∘ ⟨·⟩_B
    = Φ_{B,P}(S).

Having this equality at hand, we can easily prove the correspondence of our while semantics to the one defined by Kozen:

    τ ∘ ⟦while (B) {P}⟧ = T(while (B) {P}) ∘ τ
    ⟺ τ ∘ lfp Φ_{B,P} = lfp T_{B,P} ∘ τ
    ⟺ lfp Φ_{B,P} = τ⁻¹ ∘ lfp T_{B,P} ∘ τ
    ⟺ lfp Φ_{B,P} = (τ⁻¹)_*(lfp T_{B,P})                 (definition of the lifting)
    ⟺ lfp Φ_{B,P} = lfp ((τ⁻¹)_* ∘ T_{B,P} ∘ τ_*)        (cf. Lemmas 6 and 7)
    ⟺ lfp Φ_{B,P} = lfp Φ_{B,P}. □

B Proofs of Section 5
Theorem 6 (Superinvariants and Loop Overapproximations). Let Φ_{B,P} be the unfolding operator of while (B) {P} (cf. Definition 7) and ψ : FPS → FPS. Then

    Φ_{B,P}(ψ) ⊑ ψ   implies   ⟦while (B) {P}⟧ ⊑ ψ.

Proof. Instance of Park's Lemma [?]. □
Corollary 2. Given a function f : Mon(X) → FPS, let the linear extension f̂ of f be defined by

    f̂ : FPS → FPS,   F ↦ Σ_{σ∈N^k} [σ]_F · f(X^σ).

Let Φ_{B,P} be the unfolding operator of while (B) {P}. Then

    ∀σ ∈ N^k : Φ_{B,P}(f̂)(X^σ) ⪯ f̂(X^σ)   implies   ⟦while (B) {P}⟧ ⊑ f̂.

Proof. Let G ∈ FPS be arbitrary. Then

    Φ_{B,P}(f̂)(G) = Σ_{σ∈N^k} [σ]_G · Φ_{B,P}(f̂)(X^σ)    (by Theorem 4)
    ⪯ Σ_{σ∈N^k} [σ]_G · f(X^σ)                            (by assumption)
    = f̂(G),

hence ⟦W⟧ ⊑ f̂ by Theorem 6. □
Proof of Example 6. For the superinvariantlet f, we compute

    Φ_{B,P}(f̂)(X^i C^j) = ⟨X^i C^j⟩_{i=0} + f̂((C/2) · ⟨X^i C^j⟩_{i>0} / X + (C/2) · ⟨X^i C^j⟩_{i>0} · X).

Case i = 0:   C^j + f̂(0) = C^j = f(X^0 C^j).

Case i > 0:

    Φ_{B,P}(f̂)(X^i C^j) = (C/2) · (f̂(X^{i−1} C^j) + f̂(X^{i+1} C^j))
    ⪯ C^j · ( 1/(1 − C²),  if i is even;  C/(1 − C²),  if i is odd )
    = f(X^i C^j),

where the ⪯ step is checked per subcase: for i = 1 it uses f(X^0 C^j) = C^j ⪯ C^j/(1 − C²); for even i it uses (C/2) · 2C/(1 − C²) = C²/(1 − C²) ⪯ 1/(1 − C²); for odd i > 1 it is an equality. Hence Φ_{B,P}(f̂)(X^i C^j) ⪯ f(X^i C^j), and thus f̂ is a superinvariant.

For h, we analogously compute, writing u = (1 − √(1 − C²))/C:

Case i = 0:   C^j + ĥ(0) = C^j = h(X^0 C^j).

Case i > 0:

    Φ_{B,P}(ĥ)(X^i C^j) = (C/2) · (ĥ(X^{i−1} C^j) + ĥ(X^{i+1} C^j))
    = (C^{j+1}/2) · (u^{i−1} + u^{i+1})
    = C^j · u^i                                  (since C · (1 + u²) = 2u)
    = h(X^i C^j).

Hence Φ_{B,P}(ĥ)(X^i C^j) = h(X^i C^j), so ĥ is even a fixed point and in particular a superinvariant. □
Verification Python Script

from sympy import *
from sympy.polys.polyerrors import PolificationFailed

init_printing()

x, c = symbols('x,c')
i, j = symbols('i,j', integer=True)

# define the higher-order transformer
def Phi(f):
    return c / 2 * (f.subs(i, i - 1) + f.subs(i, i + 1))

def compute_difference(f):
    return (Phi(f) - f).simplify()

# define polynomial verifier
def verify_poly(poly):
    print("Check coefficients for non-positivity:")
    for coeff in poly.coeffs():
        if coeff > 0:
            return False
    return True

# actual verification method
def verify(f):
    print("Check invariant:")
    pprint(f)
    result = compute_difference(f)
    if result.is_zero:
        print("Invariant is a fixpoint!")
        return True
    else:
        print("Invariant is not a fixpoint; check if remainder is Poly")
        try:
            return verify_poly(Poly(result))
        except PolificationFailed:
            print("Invariant is not a Poly!")
            return False
        except:
            print("Unexpected Error")
            raise

# define the loop invariant guess, (i != 0) case
f = c**j * ((c / (1 - c**2)) * (i % 2) + (1 / (1 - c**2)) * ((i + 1) % 2))
# second invariant:
h = c**j * ((1 - sqrt(1 - c**2)) / c)**i

print("Invariant verified" if verify(f) else "Unknown")
print("Invariant verified" if verify(h) else "Unknown")

Program 1.5. Python program checking the invariants.
Proof of Example 7. We have

    Φ_{B,P}(f̂)(X^i) = ⟨X^i⟩_{i=0} + (1/i) · f̂(⟨X^i⟩_{i>0} / X) + (1 − 1/i) · f̂(⟨X^i⟩_{i>0} · X).

Case i = 0:   1 + ∞ · f̂(0) + ∞ · f̂(0) = 1 + ∞ · 0 + ∞ · 0 = 1 = f(X^0).

Case i > 0:

    Φ_{B,P}(f̂)(X^i)
    = (1/i) · f̂(X^{i−1}) + (1 − 1/i) · f̂(X^{i+1})
    = (1/i) · (1 − (1/e) Σ_{n=0}^{i−3} 1/n!) + (1 − 1/i) · (1 − (1/e) Σ_{n=0}^{i−1} 1/n!)
    = 1 − (1/e) · ((1/i) Σ_{n=0}^{i−3} 1/n! + (1 − 1/i) Σ_{n=0}^{i−1} 1/n!)
    = 1 − (1/e) · (Σ_{n=0}^{i−3} 1/n! + (1 − 1/i) · (1/(i−2)! + 1/(i−1)!))
    = 1 − (1/e) · (Σ_{n=0}^{i−3} 1/n! + 1/(i−2)!)
    = 1 − (1/e) Σ_{n=0}^{i−2} 1/n!
    = f(X^i),

where empty sums are read as 0 (the cases i = 1, 2 can be checked directly), and the second-to-last step uses 1/(i−2)! + 1/(i−1)! = i/(i−1)!, so that (1 − 1/i) · i/(i−1)! = (i−1)/(i−1)! = 1/(i−2)!. Hence Φ_{B,P}(f̂)(X^i) = f(X^i), so f̂ is a fixed point and in particular a superinvariant. □

Mathematica input query:

    Input:   (1/k) · (1 − (1/e) Σ_{n=0}^{k−3} 1/n!) + (1 − 1/k) · (1 − (1/e) Σ_{n=0}^{k−1} 1/n!)
    Output:  1 − (1/e) Σ_{n=0}^{k−2} 1/n!
n-Complete Test Suites for IOCO

Petra van den Bos, Ramon Janssen, and Joshua Moerman
Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
{petra,ramonjanssen,joshua.moerman}@cs.ru.nl
Abstract. An n-complete test suite for automata guarantees to detect all faulty implementations with a bounded number of states. This principle is well-known when testing FSMs for equivalence, but the problem becomes harder for ioco conformance on labeled transition systems. Existing methods restrict the structure of specifications and implementations. We eliminate those restrictions, using only the number of implementation states, and fairness in test execution. We provide a formalization, a construction and a correctness proof for n-complete test suites for ioco.
1 Introduction
The holy grail of model-based testing is a complete test suite: a test suite that can detect any possible faulty implementation. For black-box testing, this is impossible: a tester can only make a finite number of observations, but for an implementation of unknown size, it is unclear when to stop. Often, a so-called n-complete test suite is used to tackle this problem, meaning it is complete for all implementations with at most n states.
For specifications modeled as finite state machines (FSMs) (also called Mealy
machines), this has already been investigated extensively. In this paper we
will explore how an n-complete test suite can be constructed for suspension
automata. We use the ioco relation [11] instead of equivalence of FSMs.
An n-complete test suite for FSM equivalence usually provides some way
to reach all states and transitions of the implementation. After reaching some
state, it is tested whether this is the correct state, by observing behavior which
is unique for that state, and hence distinguishing it from all other states.
Unlike FSM equivalence, ioco is not an equivalence relation, meaning that different implementations may conform to the same specification and, conversely,
an implementation may conform to different specifications. In this paper, we
focus on the problem of distinguishing states. For ioco, this cannot be done with
simple identification. If an implementation state conforms to multiple specification states, those states are defined to be compatible. Incompatible states can be handled in ways comparable to FSM methods, but distinguishing compatible states requires more effort.
P. van den Bos and R. Janssen—Supported by NWO project 13859 (SUMBAT).
In this paper, we give a structured approach for distinguishing incompatible
states. We also propose a strategy to handle compatible states. Obviously, they
cannot be distinguished in the sense of incompatible states. We thus change the
aim of distinguishing: instead of forcing a non-conformance to either specification
state, we may also prove conformance to both. As our only tool in proving this
is by further testing, this is a recursive problem: during complete testing, we are
required to prove conformance to multiple states by testing. We thus introduce a
recursively defined test suite. We give examples where this still gives a finite test
suite, together with a completeness proof for this approach. To show an upper
bound for the required size of a test suite, we also show that an n-complete test
suite with finite size can always be constructed, albeit an inefficient one.
Related Work. Testing methods for Finite State Machines (FSMs) have been
analyzed thoroughly, and n-complete test suites are already known for quite
a while. A survey is given in [3]. Progress has been made on generalizing these
testing methods to nondeterministic FSMs, for example in [6,9]. FSM-based work
that more closely resembles ioco is reduction of non-deterministic FSMs [4].
Complete testing has received less attention in ioco theory than in FSM theory. The original test generation method [11] is an approach in which test cases are generated randomly. The method is complete in the sense that any fault can be found, but there is no upper bound on the required number and length of test cases.
In [8], complete test suites are constructed for Mealy-IOTSes. Mealy-IOTSes
are a subclass of suspension automata, but are similar to Mealy machines as
(sequences of) outputs are coupled to inputs. This makes the transition from
FSM testing more straightforward.
The work most similar to ours [10] works on deterministic labeled transition systems, adding quiescence afterwards, as usual for ioco. Non-deterministic
models are thus not considered, and cannot be handled implicitly through determinization, as determinization can only be done after adding quiescence. Some
further restrictions are made on the specification domains. In particular, all
specification states should be reachable without depending on choices for output transitions of the implementation. Furthermore, all states should be mutually incompatible. In this sense, our test suite construction can be applied to
a broader set of systems, but will potentially be much less efficient. Thus, we
prioritize exploring the bounds of n-complete test suites for ioco, whereas [10]
aims at efficient test suites, by restricting the models which can be handled.
2 Preliminaries
The original ioco theory is defined for labeled transition systems, which may contain internal transitions, be nondeterministic, and may have states without outputs [11]. To every state without outputs, a self-loop with quiescence is added as an artificial output. The resulting labeled transition system is then determinized to create a suspension automaton, which is equivalent to the initial labeled transition system with respect to ioco [13]. In this paper, we will consider a slight generalization of suspension automata, such that our results hold for ioco in general: quiescent transitions usually have some restrictions, but we do not require them, and we will treat quiescence as any other output. We will define them in terms of general automata with inputs and outputs.
Definition 1. An I/O-automaton is a tuple (Q, L_I, L_O, T, q_0) where

– Q is a finite set of states,
– L_I is a finite set of input labels,
– L_O is a finite set of output labels,
– T : Q × (L_I ∪ L_O) ⇀ Q is the (partial) transition function,
– q_0 ∈ Q is the initial state.

We denote the domain of I/O-automata for L_I and L_O with A(L_I, L_O).
For the remainder of this paper we fix L_I and L_O as disjoint sets of input and output labels respectively, with L = L_I ∪ L_O, and omit them if clear from the context. Furthermore, we use a, b as input symbols and x, y, z as output symbols.
Definition 2. Let S = (Q, L_I, L_O, T, q_0) ∈ A, q ∈ Q, B ⊆ Q, µ ∈ L and σ ∈ L*. Then we define:

    q after µ = ∅ if T(q, µ) = ⊥, and q after µ = {T(q, µ)} otherwise
    q after ε = {q}
    q after µσ = (q after µ) after σ
    B after µ = ⋃_{q′∈B} q′ after µ
    B after σ = ⋃_{q′∈B} q′ after σ
    out(B) = {x ∈ L_O | B after x ≠ ∅}
    in(B) = {a ∈ L_I | B after a ≠ ∅}
    init(B) = in(B) ∪ out(B)
    Straces(B) = {σ′ ∈ L* | B after σ′ ≠ ∅}
    S is output-enabled if ∀p ∈ Q : out(p) ≠ ∅;   SA = {S ∈ A | S is output-enabled}
    S is input-enabled if ∀p ∈ Q : in(p) = L_I;   SA_IE = {S ∈ SA | S is input-enabled}

We interchange singleton sets with their element, e.g. we write out(q) instead of out({q}). Definitions on states will sometimes be used for automata as well, acting on their initial states. Similarly, definitions on automata will be used for states, acting on the automaton with that state as its initial state. For example, for S = (Q, L_I, L_O, T, q_0) ∈ A and q ∈ Q, we may write S after µ instead of q_0 after µ, and we may write that q is input-enabled if S is input-enabled.
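The operations of Definition 2 have a direct executable reading. Below is a small sketch (ours, not from the paper) for the deterministic case, with the transition function T as a Python dict; the example automaton is an arbitrary toy, not one from the figures:

LI = {'a'}                 # inputs
LO = {'x', 'y'}            # outputs (quiescence would just be another output)
T = {(1, 'a'): 2, (1, 'x'): 1, (2, 'x'): 1, (2, 'y'): 2}

def after(B, sigma):
    """B after sigma, lifted to sets of states and words (lists of labels)."""
    for mu in sigma:
        B = {T[(q, mu)] for q in B if (q, mu) in T}
    return B

def out(B):
    return {x for x in LO for q in B if (q, x) in T}

def inp(B):
    return {a for a in LI for q in B if (q, a) in T}

print(after({1}, ['a', 'x']), out(after({1}, ['a'])), inp({1}))
# {1} {'x', 'y'} {'a'}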
In this paper, specifications are suspension automata in SA, and implementations are input-enabled suspension automata in SA_IE. The ioco relation formalizes when implementations conform to specifications. We give a definition relating suspension automata, following [11], and the coinductive definition [7] relating states. Both definitions have been proven to coincide.

Definition 3. Let S ∈ SA and I ∈ SA_IE. Then we say that I ioco S if ∀σ ∈ Straces(S) : out(I after σ) ⊆ out(S after σ).
Definition 4. Let S = (Q_s, L_I, L_O, T_s, q_0^s) ∈ SA and I = (Q_i, L_I, L_O, T_i, q_0^i) ∈ SA_IE. Then for q_i ∈ Q_i, q_s ∈ Q_s, we say that q_i ioco q_s if there exists a coinductive ioco relation R ⊆ Q_i × Q_s such that (q_i, q_s) ∈ R, and for all (q, p) ∈ R:

– ∀a ∈ in(p) : (q after a, p after a) ∈ R
– ∀x ∈ out(q) : x ∈ out(p) ∧ (q after x, p after x) ∈ R
In order to define complete test suites, we require execution of tests to be fair: if a trace σ is performed often enough, then every output x appearing in the implementation after σ will eventually be observed. Furthermore, the implementation may give an output after σ before the tester can supply an input. We then assume that the tester will eventually succeed in performing this input after σ. This fairness assumption is unavoidable for any notion of completeness in testing suspension automata: a fault can never be detected if an implementation always chooses paths that avoid this fault.
3 Distinguishing Experiments

An important part of n-complete test suites for FSM equivalence is the distinguishing sequence, used to identify an implementation state. As ioco is not an equivalence relation, there does not have to be a one-to-one correspondence between specification and implementation states.
3.1 Equivalence and Compatibility
We first describe equivalence and compatibility relations between states, in order
to define distinguishing experiments. We consider two specifications to be equivalent, denoted S1 ≈ S2 , if they have the same implementations conforming to
them. Then, for all implementations I, we have I ioco S1 iff I ioco S2 . For two
inequivalent specifications, there is thus an implementation which conforms to
one, but not the other.
Intuitively, equivalence relates states with the same traces. However, implicit
underspecification by absent inputs should be handled equivalently to explicit
underspecification with chaos. This is done by using chaotic completion [11].
This definition of equivalence is inspired by the relation wioco [12], which relates
specifications based on their sets of traces.
Definition 5. Let (Q, LI, LO, T, q0) ∈ SA. Define chaos, a specification to
which every implementation conforms, as X = ({χ}, LI, LO, {(χ, x, χ) | x ∈
L}, χ). Let QX = Q ∪ {χ}. The relation ≈ ⊆ QX × QX relates all equivalent
states. It is the largest relation for which it holds that q ≈ q′ if:

out(q) = out(q′) ∧ (∀μ ∈ init(q) ∩ init(q′) : q after μ ≈ q′ after μ)
∧ (∀a ∈ in(q)\in(q′) : q after a ≈ χ) ∧ (∀a ∈ in(q′)\in(q) : q′ after a ≈ χ)
For two inequivalent specifications, there may still exist an implementation
that conforms to both. In that case, we define the specifications to be compatible,
following the terminology introduced in [9,10]. We introduce an explicit
relation for compatibility.
Definition 6. Let (Q, LI, LO, T, q0) ∈ SA. The relation ♦ ⊆ Q × Q relates all
compatible states. It is the largest relation for which it holds that q ♦ q′ if:

(∀a ∈ in(q) ∩ in(q′) : q after a ♦ q′ after a)
∧ (∃x ∈ out(q) ∩ out(q′) : q after x ♦ q′ after x)
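Since ♦ is defined as a largest relation, it can be computed by the same deletion scheme as before. A sketch, again over the hypothetical IOAutomaton encoding (our names, not the paper's):

    def compatible(spec):
        # Greatest fixpoint for Definition 6: delete pairs violating either clause.
        D = {(q, p) for q in spec.states for p in spec.states}

        def step(q, mu):
            s = spec.after_label({q}, mu)
            return next(iter(s)) if s else None

        changed = True
        while changed:
            changed = False
            for (q, p) in list(D):
                ok_in = all((step(q, a), step(p, a)) in D
                            for a in spec.ins({q}) & spec.ins({p}))
                ok_out = any((step(q, x), step(p, x)) in D
                             for x in spec.out({q}) & spec.out({p}))
                if not (ok_in and ok_out):
                    D.discard((q, p))
                    changed = True
        return D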
Compatibility is symmetric and reflexive, but not transitive. Conversely, two
specifications are incompatible if there exists no implementation conforming to
both. When q1 ♦ q2, we can indeed easily make an implementation which conforms
to both q1 and q2: the set of outputs of the implementation state can
simply be out(q1) ∩ out(q2), which is non-empty by definition of ♦. Upon such
an output transition or any input transition, the two successor states are again
compatible, thus the implementation can keep picking transitions in this manner.
For example, in Fig. 1, compatible states 2 and 3 of the specification are
both implemented by state 2 of the implementation.
Fig. 1. A specification, an implementation, and a merge of two states: (a) Specification S with 2 ♦ 3; (b) An implementation of S; (c) The merge of specification states 2 and 3.
Beneš et al. [1] describe the construction of merging specifications. For specification
states qs and qs′, their merge is denoted qs ∧ qs′. For any implementation
state qi, it holds that qi ioco qs ∧ qi ioco qs′ ⇐⇒ qi ioco (qs ∧ qs′). Intuitively, a
merge of two states thus only allows behavior allowed by both states. Figure 1c
shows the merge of specification states 2 and 3. The merge of qs and qs′ can be
implemented if and only if qs ♦ qs′: indeed, for incompatible states, the merge
has states without any output transitions, which is denoted invalid in [1].
3.2 Distinguishing Trees
When an implementation is in state qi, two incompatible specification states qs
and qs′ are distinguished by showing to which of the two qi conforms, assuming
that it conforms to one. Conversely, we can say that we have to show a non-conformance
of qi to qs or qs′. Generally, a set of states D is distinguished by
showing non-conformance to all its states, possibly except one. As a base case,
if |D| ≤ 1, then D is already distinguished. We will construct a distinguishing
tree as an input-enabled automaton which distinguishes D after reaching pass.
Definition 7. Let μ be a symbol and D a set of states. Then injective(μ, D) if
μ ∈ ⋃{in(q) | q ∈ D} ∪ LO ∧ ∀q, q′ ∈ D : q ≠ q′ ∧ μ ∈ init(q) ∩ init(q′) =⇒
q after μ ≠ q′ after μ. This is extended to sets of symbols Σ as injective(Σ, D) if
∀μ ∈ Σ : injective(μ, D).
Definition 8. Let (Q, LI, LO, T, q0) ∈ SA(LI, LO), and D ⊆ Q a set of mutually
incompatible states. Then define DT(LI, LO, D) ⊆ A(LO, LI) inductively
as the domain of input-enabled distinguishing trees for D, such that for every
Y ∈ DT(LI, LO, D) with initial state t0:
– if |D| ≤ 1, then t0 is the verdict state pass, and
– if |D| > 1, then t0 has either
  • a transition for a single input a ∈ LI to a Y′ ∈ DT(LI, LO, D after a)
    such that injective(a, D), and transitions to a verdict state reset for all
    x ∈ LO, or
  • a transition for every output x ∈ LO to a Y′ ∈ DT(LI, LO, D after x)
    such that injective(x, D).
Furthermore, pass or reset is always reached after a finite number of steps,
and these states are sink states, i.e. contain transitions only to themselves.
A distinguishing tree can synchronize with an implementation to reach a
verdict state. As an implementation is output-enabled and the distinguishing
tree is input-enabled, this never blocks. If the tree performs an input, the implementation may provide an output first, resulting in reset: another attempt is
needed to perform the input. If no input is performed by the tree, it waits for
any output, after which it can continue. In this way, the tester is guaranteed to
steer the implementation to a pass, where the specification states disagree on
the allowed outputs: the implementation has to choose an output, thus has to
choose which specifications (not) to implement.
For a set D of mutually incompatible states, such a tree may not exist. For
example, consider states 1, 3 and 5 in Fig. 2. States 1 and 3 both lead to the
same state after a, and can therefore not be distinguished. Similarly, states 3
and 5 cannot be distinguished after b. Labels a and b are therefore not injective
according to Definition 7 and should not be used. This concept is similar in FSM
testing [5]. A distinguishing tree always exists when |D| = 2 (Lemma 9). When |D| > 2,
we can thus use multiple experiments to separate all states pairwise.
Lemma 9. Let S ∈ SA. Let q and q′ be two states of S, such that ¬(q ♦ q′). Then
there exists a distinguishing tree for q and q′.

Proof. Since ¬(q ♦ q′), we know that:

(∃a ∈ in(q) ∩ in(q′) : ¬(q after a ♦ q′ after a))
∨ (∀x ∈ out(q) ∩ out(q′) : ¬(q after x ♦ q′ after x))

So we have that some input or all outputs, enabled in both q and q′, lead to
incompatible states, for which this holds again. Hence, we can construct a tree
with nodes that either have a child for an enabled input of both states, or
children for all outputs enabled in the states (children for not enabled outputs
are distinguishing trees for ∅), as in the second case of Definition 8. If this tree
were infinite, then it would describe infinite sequences of labels. Since
S is finite, such a sequence would be a cycle in S. This would mean that q ♦ q′,
which is not the case. Hence we have that the tree is finite, as required by
Definition 8. □
Fig. 2. No distinguishing tree exists for {1, 3, 5}.
3.3 Distinguishing Compatible States
Distinguishing techniques such as described in Sect. 3.2 rely on incompatibility
of two specifications, by steering the implementation to a point where the specifications
disagree on the allowed outputs. This technique fails for compatible specifications,
as an implementation state may conform to both specifications. A tester then
cannot steer the implementation to showing a non-conformance to either.
We thus extend the aim of a distinguishing experiment: instead of showing a
non-conformance to any of two compatible states qs and qs′, we may also prove
conformance to both. This can be achieved with an n-complete test suite for
qs ∧ qs′; this will be explained in Sect. 4.1. Note that even for an implementation
which does not conform to one of the specifications, n-complete testing is needed.
Such an implementation may be distinguished, but it is unknown how, due to
compatibility. See for example the specification and implementation of Fig. 1.
State 2 of the implementation can only be distinguished from state 3 by observing
ax, which is non-conforming behavior for state 2. Although y would also be
non-conforming for state 2, this behavior is not observed.
If a non-conformance to the merged specification is found with an
n-complete test suite, the outcome is similar to that of a distinguishing tree
for incompatible states: we have disproven conformance to one of the individual
specifications (or to both).
4 Test Suite Definition
The number n of an n-complete test suite T of a specification S tells how many
states an implementation I is allowed to have to give the guarantee that I ioco
S after passing T (we will define passing a test suite later). To do this, we must
only count the states relevant for conformance.
Definition 10. Let S = (Qs, LI, LO, Ts, q0s) ∈ SA, and I = (Qi, LI, LO, Ti, q0i) ∈
SAIE. Then,
– A state qs ∈ Qs is reachable if ∃σ ∈ L∗ : S after σ = qs.
– A state qi ∈ Qi is specified if ∃σ ∈ Straces(S) : I after σ = qi. A transition
  (qi, μ, qi′) ∈ Ti is specified if qi is specified, and if either μ ∈ LO, or μ ∈
  LI ∧ ∃σ ∈ L∗ : I after σ = qi ∧ σμ ∈ Straces(S).
We denote the number of reachable states of S with |S|, and the number of
specified, reachable states of I with |I|.
Definition 11. Let S ∈ SA be a specification. Then a test suite T for S is
n-complete if for each implementation I: I passes T =⇒ (I ioco S ∨ |I| > n).
In particular, |S|-complete means that if an implementation passes the test
suite, then the implementation is correct (w.r.t. ioco) or it has strictly more
states than the specification. Some authors use the convention that n denotes
the number of extra states (so the above would be called 0-completeness).
To define a full complete test suite, we first define sets of distinguishing
experiments.
Definition 12. Let (Q, LI, LO, T, q0) ∈ SA. For any state q ∈ Q, we choose a
set W(q) of distinguishing experiments, such that for all q′ ∈ Q with q ≠ q′:
– if ¬(q ♦ q′), then W(q) contains a distinguishing tree for some D ⊆ Q s.t. q, q′ ∈ D.
– if q ♦ q′, then W(q) contains a complete test suite for q ∧ q′.
Moreover, we need sequences to access all specified, reachable implementation
states. After such sequences, distinguishing experiments can be executed. We will
defer the explicit construction of the set of access sequences. For now we assume
some set P of access sequences to exist.
Definition 13. Let S ∈ SA and I ∈ SAIE. Let P be a set of access sequences
and let P+ = {σ ∈ P ∪ P · L | S after σ ≠ ∅}. Then the distinguishing test suite
is defined as T = {στ | σ ∈ P+, τ ∈ W(q0 after σ)}. An element t ∈ T is a test.
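Given P and a choice function W (both assumed to exist, per Definition 12 and Sect. 4.3), assembling the suite is mechanical. A sketch, with words as tuples of labels and W mapping a state to an iterable of experiments:

    def distinguishing_suite(spec, P, W):
        labels = spec.inputs | spec.outputs
        candidates = set(P) | {sigma + (mu,) for sigma in P for mu in labels}
        # P+ keeps only the candidates that are traces of the specification
        P_plus = {sigma for sigma in candidates if spec.after({spec.init}, sigma)}
        return [(sigma, tau)
                for sigma in P_plus
                for tau in W(next(iter(spec.after({spec.init}, sigma))))]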
4.1 Distinguishing Experiments for Compatible States
The distinguishing test suite relies on executing distinguishing experiments. If
a specification contains compatible states, the test suite contains distinguishing
experiments which are themselves n-complete test suites. This is thus a recursive
construction: we need to show that such a test suite is finite. For particular
specifications, recursive repetition of the distinguishing test suite as described
above is already finite. For example, specification S in Fig. 1 contains compatible
states, but in the merge of every two compatible states, no further compatible
states remain. A test suite for S needs to distinguish states 2 and 3. For this
n-Complete Test Suites for IOCO
99
purpose, it uses an n-complete test suite for 2 ∧ 3, which contains no compatible
states, and thus terminates by only containing distinguishing trees.
However, the merge of two compatible states may in general again contain
compatible states. In these cases, recursive repetition of distinguishing test suites
may not terminate. An alternative unconditional n-complete test suite may be
constructed using state counting methods [4], as shown in the next section.
Although inefficient, it shows the possibility of unconditional termination. The
recursive strategy thus may serve as a starting point for other, efficient constructions for n-complete test suites.
Unconditional n-complete Test Suites. We introduce Lemma 16 to bound
test suite execution. We first give some auxiliary definitions.
Definition 14. Let S ∈ SA, σ ∈ L∗, and x ∈ LO. Then σx is an ioco-counterexample if S after σ ≠ ∅ and x ∉ out(S after σ).
Naturally, I ioco S if and only if Straces(I) contains no ioco-counterexample.
Definition 15. Let S = (Qs, LI, LO, Ts, qs0) ∈ SA and I ∈ SAIE. A trace σ ∈
Straces(S) is short if ∀qs ∈ Qs : |{ρ | ρ is a prefix of σ ∧ qs0 after ρ = qs}| ≤ |I|.
Lemma 16. Let S ∈ SA and I ∈ SAIE. If ¬(I ioco S), then Straces(I) contains
a short ioco-counterexample.

Proof. If ¬(I ioco S), then Straces(I) must contain an ioco-counterexample σ. If
σ is short, the proof is trivial, so assume it is not. Hence, there exists a state
qs with at least |I| + 1 prefixes of σ leading to qs. At least two of those prefixes
ρ and ρ′ must lead to the same implementation state, i.e. it holds that qi0 after
ρ = qi0 after ρ′ and qs0 after ρ = qs0 after ρ′. Assuming |ρ| < |ρ′| without loss
of generality, we can thus create an ioco-counterexample σ′ shorter than σ by
replacing ρ′ by ρ. If σ′ is still not short, we can repeat this process until it is. □
We can use Lemma 16 to bound exhaustive testing to obtain n-completeness.
When any specification state is visited |I| + 1 times with any trace, then any
extensions of this trace will not be short, and we do not need to test them.
Fairness allows us to test all short traces which are present in the implementation.
Corollary 17. Given a specification S, the set of all traces of length at most
|S| · n is an n-complete test suite.
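Corollary 17 can be rendered as a brute-force enumeration. In the sketch below (ours), |S| is approximated by the total number of states, and candidate words are filtered by membership in Straces(S); the construction is exponential, as the text warns.

    from itertools import product

    def unconditional_suite(spec, n):
        labels = list(spec.inputs | spec.outputs)
        bound = len(spec.states) * n       # |S| * n (assuming all states reachable)
        return [w for k in range(bound + 1)
                  for w in product(labels, repeat=k)
                  if spec.after({spec.init}, w)]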
Example 18. Figure 3 shows an example of a non-conforming implementation
with a counterexample yyxyyxyyxyyx, of maximal length 4 · 3 = 12.

Fig. 3. A specification, and a non-conforming implementation: (a) Specification S; (b) Implementation I.
4.2 Execution of Test Suites
A test στ is executed by following σ, and then executing the distinguishing
experiment τ . If the implementation chooses any output deviating from σ, then
the test gives a reset and should be reattempted. Finishing τ may take several
executions: a distinguishing tree may give a reset, and an n-complete test suite
to distinguish compatible states may contain multiple tests. Therefore σ needs
to be run multiple times, in order to allow full execution of the distinguishing
experiment. By assuming fairness, every distinguishing experiment is guaranteed
to terminate, and thus also every test.
The verdict of a test suite T for specification S is concluded simply by checking
for observed ioco-counterexamples to S during execution. When executing
a distinguishing experiment w as part of T, the verdict of w is ignored when
concluding a verdict for T: we only require w to be fully executed, i.e. be reattempted
if it gives a reset, until it gives a pass or fail. For example, if σ leads
to specification state q, and q needs to be distinguished from compatible state
q′, a test suite T′ for q ∧ q′ is needed to distinguish q and q′. If T′ finds a
non-conformance to either q or q′, it yields fail. Only in the former case, T will
also yield fail; in the latter case, T will continue with other tests: q and
q′ have been successfully distinguished, but no non-conformance to q has been
found. If all tests have been executed in this manner, T will conclude pass.
4.3 Access Sequences
In FSM-based testing, the set P for reaching all implementation states is taken
care of rather efficiently. The set P is constructed by choosing a word σ for
each specification state, such that σ leads to that state (note that FSMs are fully
deterministic). By passing the tests P · W, where W is a set of distinguishing
experiments for every reached state, we know the implementation has at least
some number of states (by observing that many different behaviors). By passing
tests P · L · W we also verify that every transition has the correct destination state.
By extending these tests to P · L≤k+1 · W (where L≤k+1 = ⋃m∈{0,...,k+1} Lm),
we can reach all implementation states if the implementation has at most k
more states than the specification. For suspension automata, however, things
are more difficult for two reasons: (1) A specification state may be reachable
only if an implementation chooses to implement a particular, optional output
transition (in which case this state is not certainly reachable [10]), and (2) if
the specification has compatible states, the implementation may implement two
specification states with a single implementation state.
Consider Fig. 4 for an example. An implementation can omit state 2 of the
specification, as shown in Fig. 4b. Now Fig. 4c shows a fault not found by a test
suite P · L≤1 · W : if we take y ∈ P , z ∈ L, and observe z ∈ W (3), we do not
reach the faulty y transition in the implementation. So by leaving out states, we
introduce an opportunity to make a fault without needing more states than the
specification. This means that we may need to increase the size of the test suite
in order to obtain the desired completeness. In this example, however, a test
suite P · L≤2 · W is enough, as the test suite will contain a test with yzz ∈ P · L2
after which the faulty output y ∈ W (3) will be observed.
Fig. 4. A specification with not certainly reachable states 2 and 3: (a) Specification S; (b) Conforming implementation; (c) Non-conforming implementation.
Clearly, we reach all states in an n-state implementation for any specification
S, by taking P to be all traces in Straces(S) of length at most n. This set P
can be constructed by simple enumeration. We then have that the traces in the
set P will reach all specified, reachable states in all implementations I such that
|I| ≤ n. In particular this means that P+ reaches all specified transitions.
Although this generates exponentially many sequences, the length is substantially
shorter than the sequences obtained by the unconditional n-complete test
suite. We conjecture that a much more efficient construction is possible with a
careful analysis of the compatible states and the not certainly reachable states.
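The simple enumeration mentioned above can be done by breadth-first unfolding of the specification. A sketch (ours, over the IOAutomaton encoding used earlier):

    def access_sequences(spec, n):
        # All traces in Straces(S) of length at most n, as tuples of labels.
        P, frontier = {()}, {()}
        labels = spec.inputs | spec.outputs
        for _ in range(n):
            frontier = {sigma + (mu,)
                        for sigma in frontier for mu in labels
                        if spec.after(spec.after({spec.init}, sigma), (mu,))}
            P |= frontier
        return P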
4.4 Completeness Proof for Distinguishing Test Suites
We let T be the distinguishing test suite as defined in Definition 13. As discussed
before, if q and q  are compatible, the set W (q) can be defined using another
complete test suite. If the test suite is again a distinguishing test suite, completeness of it is an induction hypothesis. If, on the other hand, the unconditional
n-complete test suite is used, completeness is already guaranteed (Corollary 17).
Theorem 19. Let S = (Qs , LI , LO , Ts , q0s ) ∈ SA be a specification. Let T be a
distinguishing test suite for S. Then T is n-complete.
Proof. We will show that for any implementation of the correct size which
passes the test suite we can build a coinductive ioco relation which contains the
initial states. As a basis for that relation we take the states which are reached by
the set P. This may not be an ioco relation, but by extending it (in two steps)
we obtain a full ioco relation. Extending the relation is an instance of a so-called
up-to technique; we will use terminology from [2].
More precisely, let I = (Qi, LI, LO, Ti, q0i) ∈ SAIE be an implementation
with |I| ≤ n which passes T. By construction of P, all reachable specified implementation
states are reached by P, and so all specified transitions are reached
by P+.
The set P defines a subset of Qi × Qs, namely R = {(q0i after σ, q0s
after σ) | σ ∈ P}. We add pairs for all equivalent states: R′ = {(i, s) |
(i, s′) ∈ R, s ∈ Qs, s ≈ s′}. Furthermore, let J = {(i, s, s′) | i ∈ Qi, s, s′ ∈
Qs such that i ioco s ∧ i ioco s′} and let Ri,s,s′ be the ioco relation for i ioco s ∧
i ioco s′; now define R′′ = R′ ∪ ⋃(i,s,s′)∈J Ri,s,s′. We want to show that R′′ defines
a coinductive ioco relation. We do this by showing that R progresses to R′′.
Let (i, s) ∈ R. We assume that we have seen all of out(i) and that
out(i) ⊆ out(s) (this is taken care of by the test suite and the fairness assumption).
Then, because we use P+, we also reach the transitions after i. We need
to show that the input and output successors are again related.
– Let a ∈ LI. Since I is input-enabled we have a transition for a with i after
  a = i2. Suppose there is a transition for a from s: s after a = s2 (if not, then
  we are done). We have to show that (i2, s2) ∈ R′′.
– Let x ∈ LO. Suppose there is a transition for x: i after x = i2. Then (since
  out(i) ⊆ out(s)) there is a transition for x from s: s after x = s2. We have to
  show that (i2, s2) ∈ R′′.
In both cases we have a successor (i2, s2) which we have to prove to be in R′′. Now
since P reaches all states of I, we know that (i2, s2′) ∈ R for some s2′. If s2 ≈ s2′
then (i2, s2) ∈ R′ ⊆ R′′ holds trivially, so suppose that s2 ≉ s2′. Then there exists
a distinguishing experiment w ∈ W(s2) ∩ W(s2′) which has been executed in i2,
namely in two tests: a test σw for some σ ∈ P+ with S after σ = s2, and a test
σ′w for some σ′ ∈ P with S after σ′ = s2′. Then there are two cases:
– If ¬(s2 ♦ s2′) then w is a distinguishing tree separating s2 and s2′. Then there is
  a sequence ρ taken in w of the test σw, i.e. w after ρ reaches a pass state
  of w, and similarly there is a sequence ρ′ that is taken in w of the test σ′w.
  By construction of distinguishing trees, ρ must be an ioco-counterexample for
  either s2 or s2′, but because T passed this must be s2′. Similarly, ρ′ disproves
  s2. One implementation state can implement at most one of {ρ, ρ′}. This
  contradicts that the two tests passed, so this case cannot happen.
– If s2 ♦ s2′ (but s2 ≉ s2′ as assumed above), then w is a test suite itself for
  s2 ∧ s2′. If w passed in both tests then i2 ioco s2 and i2 ioco s2′, and hence
  (i2, s2) ∈ Ri2,s2,s2′ ⊆ R′′. If w failed in one of the tests σw or σ′w, then i2 does
  not conform to both s2 and s2′, and hence w also fails in the other test. So
  again, there is a counterexample ρ for s2 and ρ′ for s2′. One implementation
  state can implement at most one of {ρ, ρ′}. This contradicts that the two
  tests passed, so this case cannot happen.
We have now seen that R progresses to R′′. It is clear that R′ progresses to R′′
too. Then, since each Ri,s,s′ is an ioco relation, they progress to Ri,s,s′ ⊆ R′′. And
so the union, R′′, progresses to R′′, meaning that R′′ is a coinductive ioco relation.
Furthermore, we have (i0, s0) ∈ R′′ (because ε ∈ P), concluding the proof. □
We remark that if the specification does not contain any compatible states,
the proof can be simplified considerably. In particular, we do not need n-complete
test suites for merges of states, and we can use the relation R′ instead of R′′.
5 Constructing Distinguishing Trees
Lee and Yannakakis proposed an algorithm for constructing adaptive distinguishing
sequences for FSMs [5]. With a partition refinement algorithm, a splitting
tree is built, from which the actual distinguishing sequence is extracted.
A splitting tree is a tree of which each node is identified with a subset of the
states of the specification. The set of states of a child node is a (strict) subset of
the states of its parent node. In contrast to splitting trees for FSMs, siblings may
overlap: the tree does not describe a partition refinement. We define leaves(Y )
as the set of leaves of a tree Y . The algorithm will split the leaf nodes, i.e. assign
children to every leaf node. If all leaves are identified with a singleton set of
states, we can distinguish all states of the root node.
Additionally, every non-leaf node is associated with a set of labels from L. We
denote the labels of node D with labels(D). The distinguishing tree that is going
to be constructed from the splitting tree is built up from these labels. As argued
in Sect. 3.2, we require injective distinguishing trees, thus our splitting trees only
contain injective labels, i.e. injective(labels(D), D) for all non-leaf nodes D.
Below we list three conditions that describe when it is possible to split the
states of a leaf D, i.e. by taking some transition, we are able to distinguish some
states from the other states of D. We will see later how a split is done. If the
first condition is true, at least one state is immediately distinguished from all
other states. The other two conditions describe that a leaf D can be split if after
an input or all outputs some node D′ is reached that already is split, i.e. D′ is
a non-leaf node. Consequently, a split for condition 1 should be done whenever
possible, and otherwise a split for condition 2 or 3 can be done. Depending on
the implementation one is testing, one may prefer splitting with either condition
2 or 3, when both conditions are true.
We present each condition by first giving an intuitive description in words,
and then a more formal definition. With Π(A) we denote the set of all non-trivial
partitions of a set of states A.
Definition 20. A leaf D of tree Y can be split if one of the following conditions
holds:
1. All outputs are enabled in some but not in all states.
   ∀x ∈ out(D) : injective(x, D) ∧ ∃d ∈ D : d after x = ∅
2. Some states reach different leaves than other states for all outputs.
   ∀x ∈ out(D) : injective(x, D) ∧ ∃P ∈ Π(D), ∀d, d′ ∈ P :
   (d ≠ d′ =⇒ ∀l ∈ leaves(Y) : l ∩ d after x = ∅ ∨ l ∩ d′ after x = ∅)
3. Some states reach different leaves than other states for some input.
   ∃a ∈ in(D) : injective(a, D) ∧ ∃P ∈ Π(D), ∀d, d′ ∈ P :
   (d ≠ d′ =⇒ ∀l ∈ leaves(Y) : l ∩ d after a = ∅ ∨ l ∩ d′ after a = ∅)
Algorithm 1 shows how to split a single leaf of the splitting tree (we chose
arbitrarily to give condition 2 a preference over condition 3). A splitting tree is
constructed in the following manner. Initially, the splitting tree is a single leaf node
containing the state set of the specification. Then, the full splitting tree is constructed by
splitting leaf nodes with Algorithm 1 until no further splits can be made. If all
leaves in the resulting splitting tree are singletons, the splitting tree is complete
and a distinguishing tree can be constructed (described in the next section).
Otherwise, no distinguishing tree exists. Note that the order of the splits is left
unspecified.
Input: A specification S = (Q, LI, LO, T, q0) ∈ SA
Input: The current (unfinished) splitting tree Y
Input: A leaf node D from Y

if Condition 1 holds for D then
    P := {D after x | x ∈ out(D)};
    labels(D) := out(D);
    Add the partition blocks of P as children of D;
else if Condition 2 holds for D then
    labels(D) := out(D);
    foreach x ∈ out(D) do
        P := the finest partition for Condition 2 with D and x;
        Add the partition blocks of P as children of D;
    end
else if Condition 3 holds for D with input a then
    P := the finest partition for Condition 3 with D and a;
    labels(D) := {a};
    Add the partition blocks of P as children of D;
return Y;

Algorithm 1. Algorithm for splitting a leaf node of a splitting tree.
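A driver loop for Algorithm 1 might look as follows. This is only the outer skeleton (ours): try_split, which checks the three conditions of Definition 20 and attaches children to a leaf, is a hypothetical helper passed in as a parameter.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        states: frozenset
        labels: set = field(default_factory=set)
        children: list = field(default_factory=list)

    def leaves(node):
        if not node.children:
            return [node]
        return [l for c in node.children for l in leaves(c)]

    def build_splitting_tree(spec, try_split):
        # Repeatedly split leaves until no leaf can be split any further.
        root = Node(states=frozenset(spec.states))
        changed = True
        while changed:
            changed = any(try_split(spec, root, leaf)
                          for leaf in leaves(root) if len(leaf.states) > 1)
        return root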
Example 21. Let us apply Algorithm 1 on the suspension automaton in Fig. 5a.
Figure 5b shows the resulting splitting tree.

Fig. 5. Specification and its splitting tree: (a) Example specification with mutually incompatible states; (b) Splitting tree of Fig. 5a.

We initialize the root node to
{1, 2, 3, 4, 5}. Condition 1 applies, since states 1 and 5 only have output y
enabled, while states 2, 3 and 4 only have outputs x and z enabled. Thus, we
add leaves {1, 5} and {2, 3, 4}.
We can split {1, 5} by taking an output transition for y according to condition
2, as 1 after y = 4 ∈ {2, 3, 4}, while 5 after y = 1 ∈ {1, 5}, i.e. 1 and 5 reach
different leaves. Condition 2 also applies for {2, 3, 4}. We have that {2, 3} after
x = {2, 4} ⊆ {2, 3, 4} while 4 after x = 5 ∈ {5}. Hence we obtain children {4}
and {2, 3} for output x. For z we have that 2 after z = 1 ∈ {1} while {3, 4}
after z = {3, 4} ⊆ {2, 3, 4}, so we obtain children {2} and {3, 4} for z.
We can split {2,3} by taking input transition a according to condition 3,
since 2 after a = 4 and 3 after a = 2, and no leaf of the splitting tree contains
both state 2 and state 4. Note that we could also have split on output transitions
x and z. Node {3, 4} cannot be split for output transition z, since {3, 4} after
z = {3, 4} which is a leaf, and hence condition 2 does not hold. However node
{3, 4} can be split for input transition a, as 3 after a = 2 and 4 after a = 4.
Now all leaves are singletons, so we can distinguish all states with this tree.
A distinguishing tree Y ∈ DT(LI, LO, D) for D can be constructed from a
splitting tree with singleton leaf nodes. This follows the structure in Definition 8,
and we only need to choose whether to provide an input, or whether to observe
outputs. We look at the lowest node D′ in the splitting tree such that D ⊆ D′.
If labels(D′) has an input, then Y has a transition for this input, and a transition
to reset for all outputs. If labels(D′) contains outputs, then Y has a transition for
all outputs. In this manner, we recursively construct states of the distinguishing
tree until |D| ≤ 1, in which case we have reached a pass state. Figure 6 shows
the distinguishing tree obtained from the splitting tree in Fig. 5b.

Fig. 6. Distinguishing tree of Fig. 5a. The states are named by the sets of states which they distinguish. Singleton and empty sets are the pass states. Self-loops in verdict states have been omitted for brevity.
6 Conclusions
We firmly embedded theory on n-complete test suites into ioco theory, without making any restrictive assumptions. We have identified several problems
where classical FSM techniques fail for suspension automata, in particular for
compatible states. An extension of the concept of distinguishing states has been
introduced such that compatible states can be handled, by testing the merge
of such states. This requires that the merge itself does not contain compatible
states. Furthermore, upper bounds for several parts of a test suite have been
given, such as reaching all states in the implementation.
These upper bounds are exponential in the number of states, and may limit
practical applicability. Further investigation is needed to efficiently tackle these
parts of the test suite. Alternatively, looser notions for completeness may circumvent these problems. Furthermore, experiments are needed to compare our
testing method and random testing as in [11] quantitatively, in terms of efficiency
of computation and execution time, and the ability to find bugs, preferably on
a real world case study.
Residual Nominal Automata
Joshua Moerman
RWTH Aachen University, Germany
Matteo Sammartino
Royal Holloway University of London, UK
University College London, UK
Abstract
We are motivated by the following question: which nominal languages admit an active learning
algorithm? This question was left open in previous work, and is particularly challenging for languages
recognised by nondeterministic automata. To answer it, we develop the theory of residual nominal
automata, a subclass of nondeterministic nominal automata. We prove that this class has canonical
representatives, which can always be constructed via a finite number of observations. This property
enables active learning algorithms, and makes up for the fact that residuality (a semantic property)
is undecidable for nominal automata. Our construction for canonical residual automata is based on
a machine-independent characterisation of residual languages, for which we develop new results in
nominal lattice theory. Studying residuality in the context of nominal languages is a step towards a
better understanding of learnability of automata with some sort of nondeterminism.
2012 ACM Subject Classification Theory of computation → Automata over infinite objects; Theory
of computation → Automated reasoning
Keywords and phrases nominal automata, residual automata, derivative language, decidability,
closure, exact learning, lattice theory
Digital Object Identifier 10.4230/LIPIcs.CONCUR.2020.44
Related Version Full version at https://arxiv.org/abs/1910.11666.
Funding ERC AdG project 787914 FRAPPANT, EPSRC Standard Grant CLeVer (EP/S028641/1).
Acknowledgements We would like to thank Gerco van Heerdt for providing examples similar to
that of Lr in the context of probabilistic automata. We thank Borja Balle for references on residual
probabilistic languages, and Henning Urbat for discussions on nominal lattice theory. Lastly, we
thank the reviewers of a previous version of this paper for their interesting questions and suggestions.
1 Introduction
Formal languages over infinite alphabets have received considerable attention recently. They
include data languages for reasoning about XML databases [32], trace languages for analysis
of programs with resource allocation [18], and behaviour of programs with data flows [19].
Typically, these languages are accepted by register automata, first introduced in the seminal
paper [20]. Another appealing model is that of nominal automata [6]. While nominal automata
are as expressive as register automata, they enjoy convenient properties. For example, the
deterministic ones admit canonical minimal models, and the theory of formal languages and
many textbook algorithms generalise smoothly.
In this paper, we investigate the properties of so-called residual nominal automata. An
automaton accepting a language L is residual whenever the language of each state is a
derivative of L. In the context of regular languages over finite alphabets, residual finite state
automata (RFSAs) are a subclass of nondeterministic finite automata (NFAs) introduced by
Denis et al. [14] as a solution to the well-known problem of NFAs not having unique minimal
representatives. They show that every regular language L admits a unique canonical RFSA.
Figure 1 Relationship between classes of nominal languages. Edges are strict inclusions. With ·− we denote classes where automata are not allowed to guess values, i.e., to store symbols in registers without explicitly reading them.
Residual automata play a key role in the context of exact learning¹, in which one computes
an automaton representation of an unknown language via a finite number of observations.
The defining property of residual automata allows one to (eventually) observe the semantics
of each state independently. In the finite-alphabet setting, residuality underlies the seminal
algorithm L⋆ for learning deterministic automata [1] (deterministic automata are always
residual), and enables efficient algorithms for learning nondeterministic [8] and alternating
automata [2, 3]. Residuality has also been studied for learning probabilistic automata [13].
Existence of canonical residual automata is crucial for the convergence of these algorithms.
Our investigation of residuality in the nominal setting is motivated by the following
question: which nominal languages admit an exact learning algorithm? In previous work [28],
we have shown that the L⋆ algorithm generalises smoothly to nominal languages, meaning that
deterministic nominal automata can be learned. However, the general non-deterministic case
proved to be significantly more challenging. In fact, in stark contrast with the finite-alphabet
case, nondeterministic nominal automata are strictly more expressive than deterministic
ones, thus residual automata are not just succinct representations of deterministic languages.
As a consequence, our attempt to generalise the NL⋆ algorithm for nondeterministic finite
automata to the nominal setting did not fully succeed: we could only prove that it works for
deterministic languages, leaving the nondeterministic case open. By investigating residual
languages, and how they relate to deterministic and nondeterministic ones, we are finally
able to settle this case.
In summary, our contributions are as follows:
– Section 3: We refine nominal languages as depicted in Figure 1, by giving separating
  languages for each class.
– Section 4: We develop new results of nominal lattice theory, and we provide the main
  characterisation theorem (Theorem 4.10), showing that the class of residual languages
  allows for canonical automata which: a) are minimal in their respective class and unique
  (up to isomorphism); b) can be constructed via a finite number of observations of the
  language. Both properties are crucial for learning. We prove this important result by
  a machine-independent characterisation of those classes of languages. We also give an
  analogous result for non-guessing languages (Theorem 4.16).
– Section 5: We study decidability and closure properties. Many decision problems, such as
  equivalence and universality, are known to be undecidable for nondeterministic nominal
  automata. For residual automata, we show that universality becomes decidable. However,
  the problem of whether an automaton is residual is undecidable.
– Section 6: We settle important open questions about exact learning of nominal languages.
  We show that residuality does not imply convergence of existing algorithms, and we give
  a (modified) NL⋆-style algorithm that works precisely for residual languages.

¹ Exact learning is also known as query learning or active (automata) learning [1].
This research mirrors that of residual probabilistic automata [13]. There, too, one has
distinct classes of which the deterministic and residual ones admit canonical automata
and have an algebraic characterisation. We believe that our results contribute to a better
understanding of learnability of automata with some sort of nondeterminism.
2 Preliminaries
We recall the notions of nominal sets [33] and nominal automata [6]. Let A be a countably
infinite set of atoms² and let Perm(A) be the set of permutations on A, i.e., the bijective
functions π : A → A. Permutations form a group where the unit is given by the identity
function, the inverse by functional inverse, and multiplication by function composition.
A nominal set is a set X equipped with a function · : Perm(A) × X → X, interpreting
permutations over X. This function must be a group action of Perm(A), i.e., it must satisfy
id · x = x and π · (π′ · x) = (π ∘ π′) · x. We say that a set A ⊂ A supports x ∈ X whenever
π · x = x for all π fixing A, i.e., such that π|A = idA. We require for nominal sets that each
element x has a finite support. We denote by supp(x) the smallest finite set supporting x.
The orbit orb(x) of x ∈ X is the set of elements in X reachable from x via permutations:
orb(x) := {π · x | π ∈ Perm(A)}. X is orbit-finite whenever it is a finite union of orbits.
Orbit-finite sets are finitely-representable, hence algorithmically tractable [5].
Given a nominal set X, a subset Y ⊆ X is equivariant if it is preserved by permutations,
i.e., π · Y = Y , for all π ∈ Perm(A), where π acts element-wise. This definition extends
to relations and functions. For instance, a function f : X → Y between nominal sets is
equivariant whenever π · f (x) = f (π · x). Given a nominal set X, the nominal power set is
defined as Pfs (X) := {U ⊆ X | U is finitely supported}.
We recall the notion of nominal automaton from [6]. The theory of nominal automata seamlessly extends classical automata theory by having orbit-finite nominal sets and equivariant
functions in place of finite sets and functions.
Definition 2.1. A (nondeterministic) nominal automaton A consists of: an orbit-finite
nominal set Σ, the alphabet; an orbit-finite nominal set of states Q; equivariant subsets
I, F ⊆ Q of initial and final states; and an equivariant subset δ ⊆ Q × Σ × Q of transitions.
The usual notions of acceptance and language apply. We denote the language of A
by L(A), and the language accepted by a state q ∈ Q by L(q). Note that the language
L(A) ∈ Pfs (Σ∗ ) is equivariant, and that L(q) ∈ Pfs (Σ∗ ) need not be equivariant, but it is
supported by supp(q).
We recall the notion of derivative language [14].³

Definition 2.2. Given a language L and a word u ∈ Σ∗, we define the derivative of L w.r.t.
u as u⁻¹L := {w | uw ∈ L} and the set of all derivatives as Der(L) := {u⁻¹L | u ∈ Σ∗}.

These definitions seamlessly extend to the nominal setting. Note that w⁻¹L is finitely
supported whenever L is.
² Sometimes these are called data values.
³ This is sometimes called a residual language or left quotient. We do not use the term residual language
here, because residual language will mean a language accepted by a residual automaton.
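For intuition, the derivative operation is easy to compute when a language is given as a membership predicate on words. A toy Python sketch (ours), with atoms modelled as integers and using the language Ld that appears later in Section 3:

    def derivative(lang, u):
        # u^{-1}L as a predicate: w is accepted iff uw is in L
        return lambda w: lang(tuple(u) + tuple(w))

    # L_d: first symbol equals last symbol (words of length >= 2)
    L_d = lambda w: len(w) >= 2 and w[0] == w[-1]

    d = derivative(L_d, (1,))               # (1)^{-1} L_d: non-empty words ending in 1
    assert d((2, 3, 1)) and not d((2, 3, 4)) and not d(())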
Of special interest are the deterministic, residual, and non-guessing nominal automata,
which we introduce next.

Definition 2.3. A nominal automaton A is:
– Deterministic if I = {q0}, and for each q ∈ Q and a ∈ Σ there is a unique q′ such that
  (q, a, q′) ∈ δ. In this case, the relation is in fact functional δ : Q × Σ → Q.
– Residual if each state q ∈ Q accepts a derivative of L(A), formally: L(q) = w⁻¹L(A) for
  some word w ∈ Σ∗. The words w such that L(q) = w⁻¹L(A) are called characterising
  words for the state q.
– Non-guessing if supp(q0) = ∅, for each q0 ∈ I, and supp(q′) ⊆ supp(q) ∪ supp(a), for each
  (q, a, q′) ∈ δ.

Observe that the transition function of a deterministic automaton preserves supports (i.e., if
C supports (q, a) then C also supports δ(q, a)). Consequently, all deterministic automata are
non-guessing. For the sake of succinctness, in the following we drop the qualifier "nominal"
when referring to these classes of nominal automata.
Finally, we recall the Myhill-Nerode theorem for nominal automata.
Theorem 2.4 ([6, Theorem 5.2]). Let L be a language. Then L is accepted by a deterministic
automaton if and only if Der(L) is orbit-finite.
3 Separating languages
Deterministic, nondeterministic and residual automata have the same expressive power when
dealing with finite alphabets. The situation is more nuanced in the nominal setting. We
now give one language for each class in Figure 1. For the sake of simplicity, we will use the
one-orbit nominal set of atoms A as alphabet. These languages separate the different classes,
meaning that they belong to the respective class, but not to the classes below or beside it.
For each example language L, we depict: a nominal automaton recognising L (on the
left); the set of derivatives Der(L) (on the right). We make explicit the poset structure of
Der(L): grey rectangles represent orbits of derivatives, and lines stand for set inclusions (we
grey out irrelevant ones). This poset may not be orbit-finite, in which case we depict a small,
indicative part. Observing the poset structure of Der(L) explicitly is important for later,
where we show that the existence of residual automata depends on it. We write aa⁻¹L to
mean (aa)⁻¹L. Variables a, b, . . . are always atoms and u, w, . . . are always words.
Deterministic: First symbol equals last symbol

Consider the language Ld := {awa | a ∈ A, w ∈ A∗}. This is accepted by the deterministic
nominal automaton in Figure 2. The automaton is actually infinite-state, but we represent
it symbolically using a register-like notation, where we annotate each state with the current
value of a register. Note that the derivatives a⁻¹Ld, b⁻¹Ld, . . . are in the same orbit. In total
Der(Ld) has three orbits, which correspond to the three orbits of states in the deterministic
automaton. The derivative awa⁻¹Ld, for example, equals aa⁻¹Ld.

Figure 2 A deterministic automaton accepting Ld, and the poset Der(Ld).
Non-guessing residual: Some atom occurs twice

The language is Lng,r := {uavaw | u, v, w ∈ A∗, a ∈ A}. The poset Der(Lng,r) is not
orbit-finite, so by the nominal Myhill-Nerode theorem there is no deterministic automaton
accepting Lng,r. However, derivatives of the form ab⁻¹Lng,r can be written as a union
ab⁻¹Lng,r = a⁻¹Lng,r ∪ b⁻¹Lng,r. In fact, we only need an orbit-finite set of derivatives to
recover Der(Lng,r). These orbits are highlighted in the diagram in Figure 3. Selecting the
"right" derivatives is the key idea behind constructing residual automata in Theorem 4.10.
Figure 3 A (non-residual) nondeterministic automaton accepting Lng,r, and the poset Der(Lng,r).
Nondeterministic: Last letter is unique

The language is Ln := {wa | a not in w} ∪ {ε}. Derivatives a⁻¹Ln are again unions of smaller
languages: a⁻¹Ln = ⋃b≠a ab⁻¹Ln. (We have omitted languages like aa⁻¹Ln, as they only
differ from a⁻¹Ln on the empty word.) However, the poset Der(Ln) has an infinite descending
chain of languages (with an increasing support), namely a⁻¹Ln ⊃ ab⁻¹Ln ⊃ abc⁻¹Ln ⊃ . . . The
existence of such a chain implies that Ln cannot be accepted by a residual automaton. This
is a consequence of Theorem 4.10, as we shall see later.
Figure 4 A nondeterministic automaton accepting Ln, and the poset Der(Ln).
Residual: Last letter is unique but anchored

Consider the alphabet Σ = A ∪ {Anc(a) | a ∈ A}, where Anc is nothing more than a label.
We add the transitions (a, Anc(a), a) to the automaton in the previous example. We obtain
the language Lr = L(Ar). Here, we have forced the automaton to be residual, by adding an
anchor to the first state. Nevertheless, guessing is still necessary. In the poset, we note that all
elements in the descending chain can now be obtained as unions of Anc(a)⁻¹Lr. For instance,
a⁻¹Lr = ⋃b≠a Anc(b)⁻¹Lr. Note that Anc(a)Anc(b)⁻¹Lr = ∅ and Anc(a)a⁻¹Lr = {ε}.
Figure 5 A residual automaton accepting Lr, and the poset Der(Lr).
Non-guessing nondeterministic: Repeated atom with different successor

The language is Lng := {uabvac | u, v ∈ A∗, a, b, c ∈ A, b ≠ c}. (We allow a = b or a = c.)
This is a language which can be accepted by a non-guessing automaton. However, there is no
residual automaton for this language. The poset structure of Der(Lng) is very complicated.
We will return to this example after Theorem 4.10.
Figure 6 A non-guessing nondeterministic automaton accepting Lng, and the poset Der(Lng).
4 Canonical Residual Nominal Automata
In this section we will give a characterisation of canonical residual automata. We will first
introduce notions of nominal lattice theory, then we will state our main result (Theorem 4.10).
We conclude the section by providing similar results for non-guessing automata.
4.1 Nominal lattice theory
We abstract away from words and languages and consider the set Pfs(Z) for an arbitrary
nominal set Z. This is a Boolean algebra of which the operations ∧, ∨, ¬ are all equivariant
maps [17]. Moreover, the finitely supported union

⋁ : Pfs(Pfs(Z)) → Pfs(Z)

is also equivariant. We note that this is more general than a binary union, but it is not a
complete join semi-lattice. Hereafter, we shall denote set inclusion by ≤ (< when strict).
Definition 4.1. Given a nominal set Z and X ⊆ Pfs(Z) equivariant⁴, we define the set
generated by X as

⟨X⟩ := {⋁ U | U ⊆ X finitely supported} ⊆ Pfs(Z).

Remark 4.2. The set ⟨X⟩ is closed under the operation ⋁, and moreover is the smallest
equivariant set closed under ⋁ containing X. In other words, ⟨·⟩ defines a closure operator.
We will often say "X generates Y", by which we mean Y ⊆ ⟨X⟩.
Definition 4.3. Let X ⊆ Pfs(Z) equivariant and x ∈ X, we say that x is join-irreducible
in X if it is non-empty and x = ⋁ U =⇒ x ∈ U, for every finitely supported U ⊆ X. The set
of all join-irreducible elements is denoted by

JI(X) := {x ∈ X | x join-irreducible in X}.

This is again an equivariant set.
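In a finite family of finite sets, join-irreducibility admits a simple concrete test: x is join-irreducible iff it is non-empty and differs from the union of the strictly smaller members of the family. A Python sketch (ours):

    def join_irreducibles(family):
        # family: iterable of sets; returns the join-irreducible members.
        fam = {frozenset(s) for s in family}
        ji = set()
        for x in fam:
            below = [y for y in fam if y < x]
            union_below = frozenset().union(*below) if below else frozenset()
            if x and x != union_below:
                ji.add(x)
        return ji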
Remark 4.4. In lattice and order theory, join-irreducible elements are usually defined only
for a lattice (see, e.g., [11]). However, we define them for arbitrary subsets of a lattice. (Note
that a subset of a lattice is merely a poset.) This generalisation will be needed later, when
we consider the poset Der(L) which is not a lattice, but is contained in the lattice Pfs(Σ∗).

Remark 4.5. The notion of join-irreducible, as we have defined here, corresponds to the
notion of prime in [8, 14, 28]. Unfortunately, the word prime has a slightly different meaning
in lattice theory. We stick to the terminology of lattice theory.
If a set Y is well-behaved, then its join-irreducible elements will actually generate the set Y.
This is normally proven with a descending chain condition. We first restrict our attention to
orbit-finite sets. The following lemma extends [11, Lemma 2.45] to the nominal setting.
Lemma 4.6. Let X ⊆ Pfs(Z) be an orbit-finite and equivariant set.
1. Let a ∈ X, b ∈ Pfs(Z) and a ≰ b. Then there is a join-irreducible x ∈ X such that x ≤ a
   and x ≰ b.
2. Let a ∈ X, then a = ⋁{x ∈ X | x join-irreducible in X and x ≤ a}.

⁴ A similar definition could be given for finitely supported X. In fact, all results in this section generalise
to finitely supported sets. But we use equivariance for convenience.
Corollary 4.7. Let X ⊆ Pfs(Z) be an orbit-finite equivariant subset. The join-irreducibles
of X generate X, i.e., X ⊆ ⟨JI(X)⟩.
So far, we have defined join-irreducible elements relative to some fixed set. We will now
show that these elements remain join-irreducible when considering them in a bigger set, as
long as the bigger set is generated by the smaller one. This will later allow us to talk about
the join-irreducible elements.
Lemma 4.8. Let Y ⊆ X ⊆ Pfs(Z) equivariant and suppose that X ⊆ ⟨JI(Y)⟩. Then
JI(Y) = JI(X).

In other words, the join-irreducibles of X are the smallest set generating X.
Corollary 4.9. If an orbit-finite set Y generates X, then JI(X) ⊆ Y.
4.2 Characterising Residual Languages
We are now ready to state and prove the main theorem of this paper. We fix the alphabet
Σ. Recall that the nominal Myhill-Nerode theorem tells us that a language is accepted
by a deterministic automaton if and only if Der(L) is orbit-finite. Here, we give a similar
characterisation for languages accepted by residual automata. Moreover, the following result
gives a canonical construction.
Theorem 4.10. Given a language L ∈ Pfs(Σ∗), the following are equivalent:
1. L is accepted by a residual automaton.
2. There is some orbit-finite set J ⊆ Der(L) which generates Der(L).
3. The set JI(Der(L)) is orbit-finite and generates Der(L).
Proof. We prove three implications:

(1 ⇒ 2) Take the set of languages accepted by the states: J := {L(q) | q ∈ Q}. This
is clearly orbit-finite, since Q is. Moreover, each derivative is generated as follows:
w⁻¹L = ⋁{L(q) | q ∈ δ(I, w)}.

(2 ⇒ 3) We can apply Lemma 4.8 with Y = J and X = Der(L). Now it follows that
JI(Der(L)) is orbit-finite (since it is a subset of J) and generates Der(L).

(3 ⇒ 1) We can construct the following residual automaton, whose language is exactly L:

Q := JI(Der(L))
I := {w⁻¹L ∈ Q | w⁻¹L ≤ L}
F := {w⁻¹L ∈ Q | ε ∈ w⁻¹L}
δ(w⁻¹L, a) := {v⁻¹L ∈ Q | v⁻¹L ≤ wa⁻¹L}

First, note that A := (Σ, Q, I, F, δ) is a well-defined nominal automaton. In fact, all the
components are orbit-finite, and equivariance of ≤ implies equivariance of δ. Second, we
show by induction on words that each state q = w⁻¹L accepts its corresponding language,
namely L(q) = w⁻¹L.

ε ∈ L(w⁻¹L) ⇐⇒ w⁻¹L ∈ F ⇐⇒ ε ∈ w⁻¹L

au ∈ L(w⁻¹L) ⇐⇒ u ∈ L(δ(w⁻¹L, a))
⇐⇒ u ∈ L({v⁻¹L ∈ Q | v⁻¹L ≤ wa⁻¹L})
⇐⇒(i) u ∈ ⋁{v⁻¹L ∈ Q | v⁻¹L ≤ wa⁻¹L}
⇐⇒ ∃v⁻¹L ∈ Q with v⁻¹L ≤ wa⁻¹L and u ∈ v⁻¹L
⇐⇒(ii) u ∈ wa⁻¹L ⇐⇒ au ∈ w⁻¹L

At step (i) we have used the induction hypothesis (u is a shorter word than au) and
the fact that L(−) preserves unions. At step (ii, right-to-left) we have used that v⁻¹L is
join-irreducible. The other steps are unfolding definitions.
Finally, note that L = ⋁{w⁻¹L ∈ Q | w⁻¹L ≤ L}, since the join-irreducible languages
generate all languages. In particular, the initial states (together) accept L. ◀
Corollary 4.11. The construction above defines a canonical residual automaton with the
following uniqueness property: it has the minimal number of orbits of states and the maximal
number of orbits of transitions.
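To see the construction of Theorem 4.10 at work, one can instantiate it for a finite language over a finite alphabet, where all derivatives and inclusions are computable. The sketch below is ours (it reuses join_irreducibles from the sketch in Section 4.1); words are tuples.

    def derivative_set(L, u):
        # u^{-1}L for a finite language L given as a set of tuples
        return frozenset(w[len(u):] for w in L if w[:len(u)] == u)

    def all_derivatives(L):
        # For finite L, non-empty derivatives arise only from prefixes of words in L
        prefixes = {w[:i] for w in L for i in range(len(w) + 1)}
        return {derivative_set(L, u) for u in prefixes} | {frozenset()}

    def canonical_residual_automaton(L):
        L0 = frozenset(L)
        Q = join_irreducibles(all_derivatives(L0))   # states: join-irreducible derivatives
        I = {q for q in Q if q <= L0}                # initial: states below L
        F = {q for q in Q if () in q}                # final: states containing epsilon
        def delta(q, a):
            qa = frozenset(w[1:] for w in q if w and w[0] == a)   # a^{-1} q
            return {p for p in Q if p <= qa}
        return Q, I, F, delta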
For finite alphabets, the classes of languages accepted by DFAs and NFAs are the same
(by determinising an NFA). This means that Der(L) is always finite if L is accepted by an
NFA, and we can always construct the canonical RFSA. Here, this is not the case, which is why
we need to stipulate (in Theorem 4.10) that the set JI(Der(L)) is orbit-finite and actually
generates Der(L). Either condition may fail, as we will see in Example 4.13.
Example 4.12. In this example we show that residual automata can also be used to
compress deterministic automata. The language L := {abb . . . b | a ≠ b} can be accepted by
a deterministic automaton of 4 orbits, and this is minimal. (A zero amount of bs is also
accepted in L.) The minimal residual automaton, however, has only 2 orbits, given by the
join-irreducible languages:

ε⁻¹L = {abb . . . b | a ≠ b}    ab⁻¹L = {bb . . . b}    (a, b ∈ A distinct)

The trick in defining the automaton is that the a-transition from ε⁻¹L to ab⁻¹L guesses the
value b. In the next section (Section 4.3), we will define the canonical non-guessing residual
automaton, which has 3 orbits.
Example 4.13. We return to the examples Ln and Lng from Section 3. We claim that
neither language can be accepted by a residual automaton.
For Ln we note that there is an infinite descending chain of derivatives

Ln > a⁻¹Ln > ab⁻¹Ln > abc⁻¹Ln > · · ·

Each of these languages can be written as a union of smaller derivatives. For instance,
a⁻¹Ln = ⋃b≠a ab⁻¹Ln. This means that JI(Der(Ln)) = ∅, hence it does not generate Der(Ln)
and by Theorem 4.10 there is no residual automaton.
In the case of Lng, we have an infinite ascending chain

Lng < a⁻¹Lng < ba⁻¹Lng < cba⁻¹Lng < · · ·

This in itself is not a problem: the language Lng,r also has an infinite ascending chain.
However, for Lng, none of the languages in this chain are a union of smaller derivatives. Put
differently: all the languages in this chain are join-irreducible (see appendix for the details).
So the set JI(Der(Lng)) is not orbit-finite. By Theorem 4.10, we conclude that there is no
residual automaton accepting Lng.
Remark 4.14. For arbitrary (nondeterministic) languages there is also a characterisation in
the style of Theorem 4.10. Namely, L is accepted by an automaton iff there is an orbit-finite
set Y ⊆ Pfs(Σ∗) which generates the derivatives. However, note that the set Y need not be a
subset of the set of derivatives. In these cases, we do not have a canonical construction for
the automaton. Different choices for Y define different automata and there is no way to pick
Y naturally.
4.3 Automata without guessing
We reconsider the above results for non-guessing automata. Nondeterminism in nominal
automata allows naturally for guessing, meaning that the automaton may store symbols
in registers without explicitly reading them. However, the original definition of register
automata in [20] does not allow for guessing, and non-guessing automata remain actively
researched [29]. Register automata with guessing were introduced in [21], because it was
realised that non-guessing automata are not closed under reversal.
To adapt to non-guessing automata, we redefine join-irreducible elements. As we would
like to remove states which can be written as a “non-guessing” union of other states, we only
consider joins of sets of elements where all elements are supported by the same support.
I Definition 4.15. Let X ⊆ Pfs(Z) be equivariant and x ∈ X. We say that x is join-irreducible in X if x = ⋁U implies x ∈ U, for every finitely supported U ⊆ X such that supp(x′) ⊆ supp(x) for each x′ ∈ U. The set of all join-irreducible elements is denoted by

JIᵘ(X) := {x ∈ X | x join-irreducible in X}.

The only change required is the additional condition on the elements and supports in U. In particular, the sets U are uniformly supported sets. Unions of such sets are called uniformly supported unions.
All the lemmas from the previous section are proven similarly. We state the main result
for non-guessing automata.
I Theorem 4.16. Given a language L ∈ Pfs (Σ∗ ), the following are equivalent:
1. L is accepted by a non-guessing residual automaton.
2. There is some orbit-finite set J ⊆ Der(L) which generates Der(L) by uniformly supported
unions.
3. The set JIᵘ(Der(L)) is orbit-finite and generates Der(L) by uniformly supported unions.
Proof. The proof is similar to that of Theorem 4.10. However, we need a slightly different
definition of the canonical automaton. It is defined as follows.
Q := JIᵘ(Der(L))
I := {w⁻¹L ∈ Q | w⁻¹L ≤ L, supp(w⁻¹L) ⊆ supp(L)}
F := {w⁻¹L ∈ Q | ε ∈ w⁻¹L}
δ(w⁻¹L, a) := {v⁻¹L ∈ Q | v⁻¹L ≤ (wa)⁻¹L, supp(v⁻¹L) ⊆ supp((wa)⁻¹L)}
Note that, in particular, the initial states have empty support since L is equivariant. This
means that the automaton cannot guess any values at the start. Similarly, the transition
relation does not allow for guessing.
J
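For intuition, the same construction can be carried out for classical regular languages over a finite alphabet, where the support conditions disappear and joins are plain unions. The following Python sketch is a brute-force illustration only: languages are approximated by their sets of words up to a fixed length, and all names are ours, not from the paper.

```python
from itertools import product

SIGMA = "ab"
MAXLEN = 4
UNIVERSE = ["".join(w) for k in range(MAXLEN + 1)
            for w in product(SIGMA, repeat=k)]

def derivative(lang, w):
    # w^{-1}L = { v | wv in L }, truncated to the finite universe
    return frozenset(v for v in UNIVERSE if w + v in lang)

def derivatives(lang):
    return {derivative(lang, w) for w in UNIVERSE}

def join_irreducibles(langs):
    # x is join-irreducible iff x is not the union of the elements strictly below it
    jis = set()
    for x in langs:
        union_below = frozenset().union(*(y for y in langs if y < x))
        if union_below != x:
            jis.add(x)
    return jis

def canonical_rfsa(lang):
    # states: join-irreducible derivatives; transitions are saturated
    states = join_irreducibles(derivatives(lang))
    initial = {q for q in states if q <= lang}
    final = {q for q in states if "" in q}
    def delta(q, a):
        # all join-irreducible states below the a-derivative of q
        return {p for p in states if p <= derivative(q, a)}
    return states, initial, final, delta
```

For example, with lang = frozenset(w for w in UNIVERSE if w.endswith("a")) this computes the canonical RFSA of the truncated language. In the nominal setting the loops over UNIVERSE would become orbit-wise computations with atoms, and the support side-conditions of the construction above would be checked in addition.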
To better understand the structure of the canonical non-guessing residual automaton, we
recall the following fact (see [33] for details) and its consequence on non-guessing automata.
I Lemma 4.17. Let X be an orbit-finite nominal set and let A be a finite set of atoms.
The set {x ∈ X | A supports x} is finite.
I Corollary 4.18. The transition relation δ of non-guessing automata can equivalently be described as a function δ : Q × Σ → Pfin(Q), where Pfin(Q) is the set of finite subsets of Q.
In particular, this shows that the canonical non-guessing residual automaton has finite
nondeterminism. It also shows that it is sufficient to consider finite unions in Theorem 4.16,
instead of uniformly supported unions.
5 Decidability and Closure Results
In this section we investigate decidability and closure properties. First, a positive result:
universality is decidable for residual automata. This is in contrast to the nondeterministic
case, where universality is undecidable, even for non-guessing automata [4].
I Proposition 5.1. Universality for residual nominal automata is decidable. Formally: given
a residual automaton A, it is decidable whether L(A) = Σ∗ .
Second, a negative result: determining whether an automaton is residual is undecidable.
In other words, residuality cannot be characterised as a syntactic property. This adds value to
learning techniques, as they are able to provide automata that are residual by construction,
thus “getting around” this undecidability issue.
I Proposition 5.2. The problem of determining whether a given nondeterministic nominal
automaton is residual is undecidable.
The above result is obtained by reducing the universality problem for general nondeterministic nominal automata to the residuality problem. Given an automaton A, we construct another automaton A′ which is residual if and only if A is universal (see appendix for details). This result also holds for the subclass of non-guessing automata, as the construction of A′ does not introduce any guessing, and universality for non-guessing nondeterministic nominal automata is undecidable.
I Remark 5.3. Equivalence between residual nominal automata is still an open problem. The
usual proof of undecidability of equivalence is via a reduction from universality. This proof does
not work anymore, because universality for residual automata is decidable (Proposition 5.1).
We conjecture that equivalence remains undecidable for residual automata.
Closure properties
We will now show that several closure properties fail for residual languages. Interestingly, this
parallels the situation for probabilistic languages: residual ones are not even closed under
convex sums. We emphasise that residual automata were devised for learning purposes, where
closure properties play no significant role. In fact, one typically exploits closure properties
of the wider class of nondeterministic models, e.g., for automata-based verification. The
following results show that, in our setting, resorting to the wider class is indeed unavoidable.
Consider the alphabet Σ = A ∪ {Anc(a) | a ∈ A} and the residual language Lr from Section 3. We consider a second language L2 = A∗, which can be accepted by a deterministic (hence residual) automaton. We have the following non-closure results:
Union: The language L = Lr ∪ L2 cannot be accepted by a residual automaton. In fact, although derivatives of the form Anc(a)⁻¹L are still join-irreducible (see Section 3, residual case), they have no summand A∗, which means that they cannot generate a⁻¹L = A∗ ∪ ⋃_{b≠a} Anc(b)⁻¹L. By Theorem 4.10(3) it follows that L is not residual.
Intersection: The language L = Lr ∩ L2 = Ln cannot be accepted by a residual automaton,
as we have seen in Section 3.
Concatenation: The language L = L2 · Lr cannot be accepted by a residual automaton, for
similar reasons as the union.
Reversal: The language {aw | a not in w} is residual (even deterministic), but its reverse
language is Ln and cannot be accepted by a residual automaton.
Complement: Consider the language Lng,r of words where some atom occurs twice. Its complement is the language of words in which all atoms are fresh, which cannot even be recognised by a nondeterministic nominal automaton [6].
Closure under Kleene star is yet to be settled.
6 Exact learning
In our previous paper on learning nominal automata [28], we provided an exact learning
algorithm for nominal deterministic languages. Moreover, we observed experimentally that the algorithm was also able to learn certain nondeterministic languages. However, several
questions on nominal languages remained open, most importantly:
Which nominal languages can be characterised via a finite set of observations?
Which nominal languages admit an Angluin-style learning algorithm?
In this section we will answer these questions using the theory developed in the previous
sections.
6.1 Angluin-style learning
We briefly review the classical automata learning algorithms L* by Angluin [1] for deterministic automata, and NL* by Bollig et al. [8] for residual automata. Then we discuss convergence in the nominal setting.
Both algorithms can be seen as a game between two players: the learner and the teacher.
The learner aims to construct the minimal automaton for an unknown language L over a
finite alphabet Σ. In order to do this, it may ask the teacher, who knows about the language,
two types of queries:
Membership query: Is a given word w in the target language, i.e., w ∈ L?
Equivalence query: Does a given hypothesis automaton H recognise the target language,
i.e., L = L(H)?
If the teacher replies yes to an equivalence query, then the algorithm terminates, as the
hypothesis H is correct. Otherwise, the teacher must supply a counterexample, that is, a word
in the symmetric difference of L and L(H). Availability of equivalence queries may seem like
a strong assumption, and in fact it is often weakened by allowing only random sampling
(see [22] or [35] for details).
Observations about the language made by the learner via queries are stored in an observation table T. This is a table where rows and columns range over two finite sets of words S, E ⊆ Σ∗ respectively, and T(u, v) = 1 if and only if uv ∈ L. Intuitively, each row of T approximates a derivative of L; in fact we have T(u) ⊆ u⁻¹L. However, the information contained in T may be incomplete: some derivatives w⁻¹L have not been reached yet because no membership queries for w have been posed, and some pairs of rows T(u), T(v) may seem equal to the learner, because no word has been seen yet which distinguishes them. The learning algorithm will add new words to S when new derivatives are discovered, and to E when words distinguishing two previously identical derivatives are discovered.
The table T is closed whenever one-letter extensions of derivatives are already in the table, i.e., T has a row for (ua)⁻¹L, for all u ∈ S, a ∈ Σ. If the table is closed,⁵ L* is able to construct an automaton from T, where states are distinct rows (i.e., derivatives). The construction follows the classical one for the canonical automaton of a language from its derivatives [31]. The NL* algorithm uses a modified notion of closedness, where one is allowed to take unions (i.e., a one-letter extension can be written as a union of rows in T), and hence is able to learn an RFSA accepting the target language.
⁵ L* also needs the table to be consistent. We do not need that in our discussion here.
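To make the table concrete, here is a small Python sketch of an observation table with a closedness check, assuming only a membership oracle; the class and method names are illustrative and not taken from any existing library.

```python
class ObservationTable:
    def __init__(self, alphabet, member):
        self.alphabet = alphabet  # finite iterable of symbols
        self.member = member      # membership oracle: word -> bool
        self.S = {""}             # row labels (access words)
        self.E = {""}             # column labels (distinguishing suffixes)

    def row(self, u):
        # the finite approximation of the derivative u^{-1}L
        return tuple(self.member(u + e) for e in sorted(self.E))

    def closedness_defect(self):
        # a one-letter extension whose row is not among the S-rows, or None
        s_rows = {self.row(s) for s in self.S}
        for s in self.S:
            for a in self.alphabet:
                if self.row(s + a) not in s_rows:
                    return s + a
        return None
```

For the NL*-style notion, closedness_defect would instead check whether each extension row is a componentwise join (union) of S-rows, which is exactly where join-irreducible rows enter the picture.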
When the table is not closed, then a derivative is missing, and a corresponding row needs
to be added. Once an automaton is constructed, it is submitted in an equivalence query. If
a counterexample is returned, then again the table is extended,⁶ after which the process is
repeated iteratively.
6.2 The nominal case
In [28] we have given nominal versions of L* and NL*, called νL* and νNL* respectively. They seamlessly extend the original algorithms by operating on orbit-finite sets. The algorithm νL* always terminates for deterministic languages, because distinct derivatives, and hence distinct rows in the observation table, are orbit-finitely many (see Theorem 2.4). However, it will never terminate for languages not accepted by deterministic automata (such as residual or nondeterministic languages).
I Theorem 6.1 ([27]). νL* converges if and only if Der(L) is orbit-finite, in which case
it outputs the canonical deterministic automaton accepting L. Moreover, at most O(nk)
equivalence queries are needed, where n is the number of orbits of the minimal deterministic
automaton, and k is the maximum support size of its states.
The nondeterministic case is more interesting. Using Theorem 4.10, we can finally establish
which nondeterministic languages can be characterised via orbit-finitely-many observations.
I Corollary 6.2 (of Theorem 4.10). Let L be a nondeterministic nominal language. Then L
can be represented via an observation table with orbit-finitely-many rows and columns if and
only if L is residual. Rows of this table correspond to join-irreducible derivatives.
This explains why in [28] νNL* was able to learn some residual nondeterministic automata: an orbit-finite observation table exists, which allows νNL* to construct the canonical residual automaton. Unfortunately, the current νNL* algorithm is not guaranteed to find this orbit-finite observation table. We only have that guarantee for deterministic languages. The following example shows that νNL* may indeed diverge when trying to close the table.
I Example 6.3. Suppose νNL* tries to learn the residual language L accepted by the automaton below over the alphabet Σ = A ∪ {Anc(a) | a ∈ A}. This is a slight modification of the residual language of Section 3.
(Figure: a nondeterministic automaton over Σ with transitions labelled a, ≠ a, Anc(a) and Anc(≠ a), including a transition that guesses the atom a.)
The algorithm starts by considering the row for the empty word ε, and its one-letter extensions ε · a = a and ε · Anc(a) = Anc(a). These rows correspond to the derivatives ε⁻¹L = L, a⁻¹L and Anc(a)⁻¹L. Column labels are initialised to the empty word ε. At this point a⁻¹L and Anc(a)⁻¹L appear identical, as the only column ε does not distinguish them. However, they appear different from ε⁻¹L, so the algorithm will add the row for either a or Anc(a) in order
⁶ L* and NL* adopt different counterexample-handling strategies: the former adds a new row, the latter a new column. Both result in a new derivative being detected.
to close the table. Suppose the algorithm decides to add a. Then it will consider the one-letter extensions ab, abc, abcd, and so on. Since these correspond to different derivatives, each strictly smaller than the previous one, the algorithm will get stuck in an attempt to close the table. At no point will it try to close the table with the word Anc(a), since it stays equivalent to a. So in this case νNL* will not terminate. However, if the algorithm instead adds Anc(a) to the row labels, it will then also add Anc(a)Anc(b), which is a characterising word for the initial state. In that case, νNL* will terminate.
While there is no hope of convergence in the non-residual case, as no orbit-finite observation table characterising the derivatives exists, we now propose a modification of νNL* which guarantees termination for residual languages.
I Theorem 6.4. There is an algorithm which query learns residual nominal languages.
Proof (Sketch). When the algorithm adds a word w to the set of rows, it also adds all other words of length |w|.⁷ Since all words of bounded length are added, the algorithm
will eventually find all words that are characterising for states of the canonical residual
automaton, and it will therefore be able to reconstruct this automaton. See appendix for
details.
J
Unfortunately, considering all words bounded by a certain length requires many membership
queries. In fact, characterising words can be exponential in length [14], meaning that this
algorithm may need doubly exponentially many membership queries.
I Remark 6.5. We note that nondeterministic automata can be enumerated, and hence can be
learned via equivalence queries only. This would result in a highly inefficient algorithm. This
parallels the current understanding of learning probabilistic languages. Although efficient
(learning in the limit) learning algorithms for deterministic and residual languages exist [12],
the general case is still open.
7 Conclusions, related and future work
In this paper we have investigated a subclass of nondeterministic automata over infinite
alphabets. This class naturally arises in the context of query learning, where automata have
to be constructed from finitely many observations. Although there are many classes of data languages, we have shown that our class of residual languages admits canonical automata.
The states of these automata correspond to join-irreducible elements.
In the context of learning, we show that convergence of standard Angluin-style algorithms
is not guaranteed, even for residual languages. We propose a modified algorithm which
guarantees convergence at the expense of an increase in the number of observations.
We emphasise that, unlike other algorithms based on residuality such as NL* [8] and AL* [2], our algorithm does not depend on the size, or even the existence, of the minimal
deterministic automaton for the target language. This is a crucial difference, since dependence
on the minimal deterministic automaton hinders generalisation to nondeterministic nominal
automata, which are strictly more expressive. Ideally, in the residual case, one would like
an algorithm for which the complexity depends only on the length of characterising words,
which is an intrinsic feature of residual automata. To the best of our knowledge, no such
algorithm exists in the finite setting.
⁷ The set {w ∈ Σ∗ | |w| = k} is orbit-finite, for any fixed k ∈ N.
We also show that universality is decidable for residual automata, in contrast to undecidability in the general nondeterministic case. As future work, we plan to attack the language
inclusion/equivalence problem for residual automata. This is a well-known and challenging
problem for data languages, which has been answered for specific subclasses [9, 10, 29, 34].
Of special interest is the subclass of unambiguous automata [10, 29]. We note that
residual languages are orthogonal to unambiguous languages. For instance, the language
Ln is unambiguous but not residual, whereas Lng,r is residual but ambiguous. Moreover,
their intersection has neither property, and every deterministic language has both properties.
One interesting fact is that if a canonical residual automaton is unambiguous, then the
join-irreducibles form an anti-chain.
Other related work concerns nominal languages/expressions with an explicit notion of binding [15, 25, 26, 34]. Although these are sub-classes of nominal languages, binding is an
important construct, e.g., to represent resource-allocation. Availability of a notion of derivatives [25] suggests that residuality may prove beneficial for learning these languages.
Residual automata over finite alphabets also have a categorical characterisation [30].
We see no obstructions in generalising those results to nominal sets. This would amount
to finding the right notion of nominal (complete) join-semilattice, with either finitely or
uniformly supported joins.
Finally, in [16, 17] aspects of nominal lattices and Boolean algebras are investigated.
To the best of our knowledge, our results on nominal lattice theory, especially those on
join-irreducibles, are new.
Omitted proofs
I Remark 4.2. The set ⟨X⟩ is closed under the operation ⋁, and moreover is the smallest equivariant set closed under ⋁ containing X. In other words, ⟨−⟩ defines a closure operator. We will often say “X generates Y”, by which we mean Y ⊆ ⟨X⟩.
Proof. Take any finitely supported U ⊆ ⟨X⟩. Every x ∈ U is of the form ⋁Yₓ, for some finitely supported Yₓ ⊆ X. Consider the finitely supported set T = {y | y ∈ Yₓ, x ∈ U} ⊆ X. Then we see that ⋁U = ⋁T ∈ ⟨X⟩, meaning that ⟨X⟩ is closed under ⋁. The second part of the claim is easy: any set closed under ⋁ and containing X must also contain ⟨X⟩.
J
I Lemma 4.6. Let X ⊆ Pfs(Z) be an orbit-finite and equivariant set.
1. Let a ∈ X, b ∈ Pfs(Z) and a ≰ b. Then there is a join-irreducible x ∈ X such that x ≤ a and x ≰ b.
2. Let a ∈ X. Then a = ⋁{x ∈ X | x join-irreducible in X and x ≤ a}.
Proof. In this proof we need a technicality. Let P be a finitely supported, non-empty poset (i.e., both P and ≤ are supported by a finite set of atoms A). If P is A-orbit-finite, then P has a minimal element, as we can consider the finite poset of A-orbits and find a minimal A-orbit. Here we use the notion of an A-orbit, i.e., an orbit defined over permutations that fix A. (See [33, Chapter 5] for details.)
Ad 1. Consider the set S = {x ∈ X | x ≤ a, x ≰ b}. This is a finitely supported and supp(S)-orbit-finite set, hence it has some minimal element m ∈ S. We shall prove that m is join-irreducible in X. Let U ⊆ X be finitely supported and assume that x′ < m for each x′ ∈ U. Note that x′ < m ≤ a, and so x′ ∉ S (otherwise m would not be minimal). Hence x′ ≤ b (by definition of S). So ⋁U ≤ b and thus ⋁U ∉ S, from which we conclude that ⋁U ≠ m, and so ⋁U < m as required.
Ad 2. Consider the set T = {x ∈ JI(X) | x ≤ a}. This set is finitely supported, so we may define the element b = ⋁T ∈ Pfs(Z). It is clear that b ≤ a; we shall prove equality by contradiction. Suppose a ≰ b. Then by (1.) there is a join-irreducible x such that x ≤ a and x ≰ b. By the first property of x we have x ∈ T, so that x ≰ b = ⋁T is a contradiction. We conclude that a = b, i.e., a = ⋁T as required.
J
I Lemma 4.8. Let Y ⊆ X ⊆ Pfs(Z) be equivariant and suppose that X ⊆ ⟨JI(Y)⟩. Then JI(Y) = JI(X).
Proof. (⊇) Let x ∈ X be join-irreducible in X. Suppose that x = ⋁V for some finitely supported V ⊆ Y. Note that also V ⊆ X. Then x = y₀ for some y₀ ∈ V, and so x is join-irreducible in Y.
(⊆) Let y ∈ Y be join-irreducible in Y. Suppose that y = ⋁V for some finitely supported V ⊆ X. Note that every element x ∈ V is a union of elements in JI(Y) (by the assumption X ⊆ ⟨JI(Y)⟩). Take Yₓ = {y′ ∈ JI(Y) | y′ ≤ x}; then we have x = ⋁Yₓ and

y = ⋁V = ⋁{⋁Yₓ | x ∈ V} = ⋁{y′ | y′ ∈ Yₓ, x ∈ V}.

The last set is a finitely supported subset of Y, and so there is a y₀ in it such that y = y₀. Moreover, this y₀ is below some x₀ ∈ V, which gives y₀ ≤ x₀ ≤ y. We conclude that y = x₀ for some x₀ ∈ V.
J
I Corollary 4.11. The construction above defines a canonical residual automaton with the
following uniqueness property: it has the minimal number of orbits of states and the maximal
number of orbits of transitions.
Proof. State minimality follows from Corollary 4.9, where we note that the states of any residual automaton accepting L form a generating subset of Der(L). Maximality of transitions follows from the fact that the automaton is saturated, meaning that no transitions can be added without changing the language.
J
I Example 4.13. All the languages in the following ascending chain are join-irreducible.

Lng < a⁻¹Lng < (ba)⁻¹Lng < (cba)⁻¹Lng < ⋯
Proof. Consider the word w = aₖ…a₁a₀ with k ≥ 1 and all aᵢ distinct atoms. We will prove that w⁻¹Lng is join-irreducible in Der(Lng), by considering all u⁻¹Lng ⊆ w⁻¹Lng.
Observe that if u is a suffix of w, then u⁻¹Lng ⊆ w⁻¹Lng. This is easily seen from the given automaton, since it may skip any prefix. We now show that u being a suffix of w is also a necessary condition.
First, suppose that u contains an atom a different from all aᵢ. If it is the last symbol of u, then aaa₀ ∈ u⁻¹Lng, but aaa₀ ∉ w⁻¹Lng. If a is succeeded by b (not necessarily distinct), then either aa or aa₀ is in u⁻¹Lng. But neither aa nor aa₀ is in w⁻¹Lng. This shows that for u⁻¹Lng ⊆ w⁻¹Lng, we necessarily have u ∈ {a₀, …, aₖ}∗. (This also means that automatically supp(u⁻¹Lng) ⊆ supp(w⁻¹Lng).)
Second, when u = ε, we have u⁻¹Lng ⊆ w⁻¹Lng. And for |u| = 1: if u = a₀, then u⁻¹Lng ⊆ w⁻¹Lng; if u = aᵢ with i > 0, then aᵢaᵢaᵢ₋₁ ∈ u⁻¹Lng, but that word is not in w⁻¹Lng. This shows that for u⁻¹Lng ⊆ w⁻¹Lng with |u| ≤ 1, we necessarily have that u is a suffix of w.
Third, we prove the same for |u| ≥ 2. We first consider which bigrams may occur in u. Suppose that u contains a bigram aᵢaⱼ with i > 0 and j ≠ i − 1. Then aᵢaᵢ₋₁ is in u⁻¹Lng, but not in w⁻¹Lng. Suppose that u contains a₀aᵢ (i > 0) or a₀a₀; then u⁻¹Lng contains either a₀a₀ or a₀a₁ respectively. Neither of these words is in w⁻¹Lng. This shows that u⁻¹Lng ⊆ w⁻¹Lng implies that u may only contain the bigrams aᵢaᵢ₋₁. In particular, these bigrams compose in a unique way. So u is a (contiguous) subword of w, whenever u⁻¹Lng ⊆ w⁻¹Lng.
Continuing, suppose that u ends in the bigram aᵢ₊₁aᵢ with i > 0. Then we have aᵢaᵢaᵢ₋₁ in u⁻¹Lng, but not in w⁻¹Lng. This shows that u has to end in a₁a₀. That is, for u⁻¹Lng ⊆ w⁻¹Lng with |u| ≥ 2, we necessarily have that u is a suffix of w.
So far, we have shown that

{u | u⁻¹Lng ⊆ w⁻¹Lng} = {u | u is a suffix of w}.

To see that w⁻¹Lng is indeed join-irreducible, we consider the join X = ⋁{u⁻¹Lng | u is a strict suffix of w}. Note that aₖaₖ ∉ X, but aₖaₖ ∈ w⁻¹Lng. We conclude that w⁻¹Lng ≠ ⋁{u⁻¹Lng | u⁻¹Lng ⊊ w⁻¹Lng} as required.
J
I Proposition 5.1. Universality for residual nominal automata is decidable. Formally: given
a residual automaton A, it is decidable whether L(A) = Σ∗ .
Proof. In the constructions below, we use computation with atoms. This is a computation paradigm which allows algorithmic manipulation of infinite (but orbit-finite) nominal sets. For instance, it allows looping over such a set in finite time. Important here is that this paradigm is equivalent to regular computability (see [7]) and that implementations exist to compute with atoms [23, 24].
We will sketch an algorithm that, given a residual automaton A, answers whether L(A) = Σ∗. The algorithm decides negatively in the following cases:
I = ∅. In this case the language accepted by A is empty.
There is a q ∈ Q with q ∉ F. By residuality we have L(q) = w⁻¹L(A) for some w. Note that q is not accepting, so that ε ∉ w⁻¹L(A). Put differently: w ∉ L(A). (We note that w is not used by the algorithm. It is only needed for the correctness.)
There is a q ∈ Q and a ∈ Σ such that δ(q, a) = ∅. Again L(q) = w⁻¹L(A) for some w. Note that a is not in L(q). This means that wa is not in the language.
When none of these three cases holds, the algorithm decides positively. We shall prove that this is indeed the correct decision. If none of the above conditions hold, then I ≠ ∅, Q = F, and for all q ∈ Q, a ∈ Σ we have δ(q, a) ≠ ∅. Here we can prove that the language of each state is L(q) = Σ∗. Given that there is an initial state, the automaton accepts Σ∗.
Note that the operations on sets performed in the above cases all terminate, because they all involve orbit-finite sets.
J
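The three cases translate directly into code. Below is a sketch for the classical analogue over a finite alphabet; it assumes the input automaton is residual (the check is unsound otherwise), and the representation is ours, not from the paper.

```python
def residual_is_universal(states, alphabet, initial, final, delta):
    # delta(q, a) returns the (possibly empty) set of successor states;
    # correctness relies on residuality: every state's language is a
    # derivative of L(A)
    if not initial:
        return False  # the accepted language is empty
    for q in states:
        if q not in final:
            return False  # some derivative misses the empty word
        for a in alphabet:
            if not delta(q, a):
                return False  # some derivative misses all words starting with a
    return True  # every state accepts all words, and an initial state exists
```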
I Proposition 5.2. The problem of determining whether a given nondeterministic nominal
automaton is residual is undecidable.
Proof. The construction is inspired by [14, Proposition 8.4].⁸ We show undecidability by reducing the universality problem for nominal automata to the residuality problem.
Let A = (Σ, Q, I, F, δ) be a nominal (nondeterministic) automaton on the alphabet Σ. We first extend the alphabet:

Σ′ = Σ ∪ {q | q ∈ Q} ∪ {q̄ | q ∈ Q} ∪ {$, #},

where we assume the new symbols to be disjoint from Σ. We define A′ = (Σ′, Q′, I′, F′, δ′) by

Q′ = {q | q ∈ Q} ∪ {q̄ | q ∈ Q} ∪ {⊤, x, y}
I′ = {q̄ | q ∈ Q} ∪ {x, y}
F′ = {q | q ∈ F} ∪ {⊤}
δ′ = {(q, a, q′) | (q, a, q′) ∈ δ} ∪ {(q̄, q, q) | q ∈ Q} ∪ {(q̄, q̄, q̄) | q ∈ Q}
  ∪ {(x, $, ⊤), (x, #, x), (y, #, y)} ∪ {(⊤, a, ⊤) | a ∈ Σ} ∪ {(y, $, i) | i ∈ I}
See Figure 7 for a sketch of the automaton A′. The blue part is a copy of the original automaton. The red part forces the original states to be residual, by providing anchors to each state. Finally, the orange part is the interesting part. The key players are the states x and y with their languages L(y) ⊆ L(x). Note that their languages are equal if and only if A is universal.
Before we assume anything about A, let us analyse A′. In particular, let us consider whether the residuality property holds for each state. For the original states of A the property holds, as we can provide anchors: all the states q and q̄ are anchored by the words q and q̄ respectively. Then we consider the states x and ⊤; their languages are L(⊤) = Σ∗ = $⁻¹L(A′) and L(x) = #⁻¹L(A′) (see Figure 7). The only remaining state for which we do not yet know whether the residuality property holds is state y.
If L(A) = Σ∗ (i.e. the original automaton is universal), then we note that L(y) = L(x). In this case, L(y) = #⁻¹L(A′). So, in this case, A′ is residual.
⁸ They prove that checking residuality for NFAs is PSpace-complete via a reduction from universality. Instead of using NFAs, they use a union of n DFAs. This would not work in the nominal setting.
Figure 7 Sketch of the automaton A′ constructed in the proof of Proposition 5.2.
Suppose that A′ is residual. Then L(y) = w⁻¹L(A′) for some word w. Provided that L(A) is not empty, there is some u ∈ L(A). So we know that $u ∈ L(y). This means that the word w cannot start with a ∈ Σ, with q or q̄ for q ∈ Q, or with $, as their derivatives do not contain $u. The only possibility is that w = #ᵏ for some k > 0. This implies L(y) = L(x), meaning that the language of A is universal.
This proves that A is universal iff A′ is residual. Moreover, the construction A ↦ A′ is effective, as it performs computations with orbit-finite sets.
J
I Theorem 6.4. There is an algorithm which query learns residual nominal languages.
Proof. As explained in the text, we modify the νNL* algorithm from [28]: when the table is not closed, we not only add the missing words, but all the words of the same length. This guarantees that the algorithm finds rows for all join-irreducible derivatives, i.e., all states of the canonical residual automaton.
The pseudocode is given in Algorithm 1, where the modifications to νNL* are marked. We briefly explain the notation. An observation table T is defined by a set of row (resp. column) indices S (resp. E). The value T(s, e) is given by L(se) (we may obtain this via membership queries). We denote the set of rows by Rows(S, E) := {row(s) | s ∈ S ∪ SΣ},
Algorithm 1 Modified nominal NL* algorithm for Theorem 6.4.
Modified νNL* learner
1:  S, E = {ε}
2:  repeat
3:      while (S, E) is not residually-closed or not residually-consistent
4:          if (S, E) is not residually-closed
5:              find s ∈ S, a ∈ A such that row(sa) ∈ JI(Rows(S, E)) \ Rows⊤(S, E)
6:              k = length of the word sa            (modified)
7:              S = S ∪ Σ≤k                          (modified)
8:          if (S, E) is not residually-consistent
9:              find s1, s2 ∈ S, a ∈ A, and e ∈ E such that row(s1) ⊑ row(s2) and L(s1 ae) = 1, L(s2 ae) = 0
10:             E = E ∪ orb(ae)
11:     Make the conjecture N(S, E)
12:     if the Teacher replies no, with a counter-example t
13:         E = E ∪ {orb(t′) | t′ is a suffix of t}
14: until the Teacher replies yes to the conjecture N(S, E).
15: return N(S, E)
where row(s)(e) = T(s, e). Note that Rows(S, E) also includes rows for one-letter extensions. The set of rows labelled by S is denoted by Rows⊤(S, E) := {row(s) | s ∈ S}. The set Rows(S, E) is a poset, ordered by r1 ⊑ r2 iff r1(e) ≤ r2(e) for all e ∈ E. To construct a hypothesis N(S, E), we use the construction from Theorem 4.10, where Rows(S, E) plays the role of Der(L).
We can give a bound on the number of equivalence queries. Given an orbit-finite nominal set X, let |X| be its number of orbits. Then the number of equivalence queries is bounded by O(m + |Σ≤m+1| × k), where m is the length of the longest characterising word and k is the maximum support size of the canonical residual automaton. Intuitively, each of the rows in the table could be a separate state, and for each state there is some work to be done, concerning learning the right support and local symmetries (see [27] for details on this).
J
Proceedings of Machine Learning Research 93:54–66, 2019
International Conference on Grammatical Inference
Learning Product Automata
Joshua Moerman
joshua.moerman@cs.ru.nl
Institute for Computing and Information Sciences
Radboud University
Nijmegen, the Netherlands
Editors: Olgierd Unold, Witold Dyrka, and Wojciech Wieczorek
Abstract
We give an optimisation for active learning algorithms, applicable to learning Moore machines with decomposable outputs. These machines can be decomposed themselves by
projecting on each output. This results in smaller components that can then be learnt with
fewer queries. We give experimental evidence that this is a useful technique which can reduce the number of queries substantially. Only in some cases is the performance worsened by the slight overhead. Compositional methods are widely used throughout engineering,
and the decomposition presented in this article promises to be particularly interesting for
learning hardware systems.
Keywords: query learning, product automata, composition
1. Introduction
Query learning (or, active learning) is becoming a valuable tool in engineering of both
hardware and software systems (Vaandrager, 2017). Indeed, it has been applied in a broad range of settings: finding bugs in network protocols as shown by Fiterău-Broştean et al. (2016, 2017), assisting with refactoring legacy software as shown by Schuts et al. (2016), and reverse engineering bank cards by Chalupar et al. (2014).
These learning techniques originate from the field of grammatical inference. One of the
crucial steps for applying these to black box systems was to move from deterministic finite
automata to deterministic Moore or Mealy machines, capturing reactive systems with any
kind of output. With few adaptations, the algorithms work well, as shown by the many applications. This is remarkable, since little specific knowledge is used besides the input alphabet of actions.
Realising that composition techniques are ubiquitous in engineering, we aim to use more
structure of the system during learning. In the present paper we use the simplest type of
composition; we learn product automata, where outputs of several components are simply
paired. Other types of compositions, such as sequential composition of Mealy machines, are
discussed in Section 5.
To the best of the author's knowledge, this has not been done before explicitly. Furthermore, libraries such as LearnLib (see Isberner et al., 2015) and libalf (see Bollig et al.,
2010b) do not include such functionality out of the box. Implicitly, however, it has been
done before. Rivest and Schapire (1994) use two tricks to reduce the size of some automata
in their paper “Diversity-based inference of finite automata”. The first trick is to look at
© 2019 J. Moerman.
Figure 1: A Moore machine with two outputs (left) can be equivalently seen as two (potentially smaller) Moore machines with a single output each (right).
the reversed automaton (in their terminology, the diversity-based automaton). The second
trick (which is not explicitly mentioned, unfortunately) is to have a different automaton
for each observable (i.e., output). In one of their examples the two tricks combined give a reduction from ±10¹⁹ states to just 54 states.
In this paper, we isolate this trick and use it in query learning. We give an extension
of L* which handles products directly and we give a second algorithm which simply runs
two learners simultaneously. Furthermore, we argue that this is particularly interesting
in the context of model learning of hardware, as systems are commonly engineered in a
compositional way. We give preliminary experimental evidence that the technique works
and improves the learning process. As benchmarks, we learn (simulated) circuits which
provide several output bits.
2. Preliminaries
We use the formalism of Moore machines to describe our algorithms. Nonetheless, the
results can also be phrased in terms of Mealy machines.
Definition 1 A Moore machine is a tuple M = (Q, I, O, δ, o, q0 ) where Q, I and O are
finite sets of states, inputs and outputs respectively, δ : Q × I → Q is the transition function,
o : Q → O is the output function, and q0 ∈ Q is the initial state. We define the size of the
machine, |M |, to be the cardinality of Q.
We extend the definition of the transition function to words as δ : Q × I∗ → Q. The behaviour of a state q is the map ⟦q⟧ : I∗ → O defined by ⟦q⟧(w) = o(δ(q, w)). Two states q, q′ are equivalent if ⟦q⟧ = ⟦q′⟧. A machine is minimal if all states have different behaviour and all states are reachable. We will often write ⟦M⟧ to mean ⟦q₀⟧ and say that machines are equivalent if their initial states are equivalent.
Definition 2 Given two Moore machines with equal input sets, M1 = (Q1, I, O1, δ1, o1, q0,1) and M2 = (Q2, I, O2, δ2, o2, q0,2), we define their product M1 × M2 by:

M1 × M2 = (Q1 × Q2, I, O1 × O2, δ, o, (q0,1, q0,2)),

where δ((q1, q2), a) = (δ1(q1, a), δ2(q2, a)) and o((q1, q2)) = (o1(q1), o2(q2)).
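A minimal Python rendering of Definitions 1 and 2 may be helpful; the names are ours, and the state set is left implicit since the functions carry all the structure needed here.

```python
class Moore:
    def __init__(self, inputs, delta, out, init):
        self.inputs = inputs  # input alphabet
        self.delta = delta    # transition function: (state, input) -> state
        self.out = out        # output function: state -> output
        self.init = init      # initial state

    def behaviour(self, word):
        # the output after reading the word from the initial state
        q = self.init
        for a in word:
            q = self.delta(q, a)
        return self.out(q)

def product(m1, m2):
    # synchronous product: run both machines in parallel, pair the outputs
    return Moore(
        inputs=m1.inputs,  # assumed equal to m2.inputs
        delta=lambda pq, a: (m1.delta(pq[0], a), m2.delta(pq[1], a)),
        out=lambda pq: (m1.out(pq[0]), m2.out(pq[1])),
        init=(m1.init, m2.init),
    )
```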
Figure 2: A state of the 8-bit register machine.
The product is formed by running both machines in parallel and letting I act on both machines synchronously. The output of both machines is observed. Note that the product
Moore machine might have unreachable states, even if the components are reachable. The
product of more than two machines is defined by induction.
Let M be a machine with outputs in O1 × O2. By post-composing the output function with projection functions we get two machines, called components, M1 and M2 with outputs in O1 and O2 respectively. This is depicted in Figure 1. Note that M is equivalent to M1 × M2. If M and its components Mi are taken to be minimal, then we have |M| ≤ |M1| · |M2| and |Mi| ≤ |M|. In the best case we have |Mi| = √|M|, and so the behaviour of M can be described using only 2√|M| states, which is less than |M| (if |M| > 4). With iterated products the reduction can be more substantial, as shown in the following example. This reduction in state space is beneficial for learning algorithms.
We introduce basic notation: πi : A1 × A2 → Ai are the usual projection functions. On
a function f : X → A1 × A2 we use the shorthand πi f to denote πi ◦ f . As usual, uv denotes
concatenation of strings u and v, and this is lifted to sets of strings U V = {uv | u ∈ U, v ∈ V }.
We define the set [n] = {1, . . . , n} and the set of Boolean values B = {0, 1}.
2.1. Example
We take the n-bit register machine example from Rivest and Schapire (1994). The state
space of the n-bit register machine Mn is given by n bits and a position of the reading/writing head, see Figure 2. The inputs are commands to control the position of the
head and to flip the current bit. The output is the current bit vector. Formally it is defined
as Mn = (Bⁿ × [n], {L, R, F}, Bⁿ, δ, o, i), where the initial state is i = ((0, …, 0), 1) and the output is o(((b1, …, bn), k)) = (b1, …, bn). The transition function is defined such that L moves the head to the left, R moves the head to the right (and wraps around at either end), and F flips the current bit. Formally,
δ(((b1, …, bn), k), L) = ((b1, …, bn), k − 1) if k > 1, and ((b1, …, bn), n) if k = 1,
δ(((b1, …, bn), k), R) = ((b1, …, bn), k + 1) if k < n, and ((b1, …, bn), 1) if k = n,
δ(((b1, …, bn), k), F) = ((b1, …, ¬bk, …, bn), k).
The machine Mn is minimal and has n · 2ⁿ states. So although this machine has very simple behaviour, learning it will require a lot of queries because of its size. Luckily, the machine can be decomposed into smaller components. For each bit l, we define a component Mnˡ = (B × [n], {L, R, F}, B, δˡ, π1, (0, 1)) which only stores one bit and the head position. The transition function δˡ is defined similarly as before on L and R, but F only flips the bit if the head is at position l (i.e., δˡ((b, l), F) = (¬b, l) and δˡ((b, k), F) = (b, k) if k ≠ l).
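Reusing the Moore sketch from the preliminaries above, the register machine and its components can be written down directly (again with illustrative names):

```python
def register_machine(n):
    # M_n: states are (bits, head) with bits a tuple of length n, head in 1..n
    def delta(state, cmd):
        bits, k = state
        if cmd == "L":
            return (bits, k - 1) if k > 1 else (bits, n)
        if cmd == "R":
            return (bits, k + 1) if k < n else (bits, 1)
        # cmd == "F": flip the bit under the head
        return (bits[:k - 1] + (1 - bits[k - 1],) + bits[k:], k)
    return Moore("LRF", delta, out=lambda s: s[0], init=((0,) * n, 1))

def component(n, l):
    # M_n^l: stores only bit l and the head position
    def delta(state, cmd):
        b, k = state
        if cmd == "L":
            return (b, k - 1) if k > 1 else (b, n)
        if cmd == "R":
            return (b, k + 1) if k < n else (b, 1)
        return ((1 - b) if k == l else b, k)  # F flips only at position l
    return Moore("LRF", delta, out=lambda s: s[0], init=(0, 1))
```

As a sanity check, register_machine(n).behaviour(w)[l - 1] should coincide with component(n, l).behaviour(w) for every word w over {L, R, F}.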
The product Mn¹ × ⋯ × Mnⁿ is equivalent to Mn. Each of the components Mnˡ is minimal and has only 2n states. So by this decomposition, we only need 2n² states to describe the whole behaviour of Mn. Note, however, that the product Mn¹ × ⋯ × Mnⁿ is not minimal; many states are unreachable.
3. Learning
We describe two approaches for active learning of product machines. One is a direct extension of the well-known L* algorithm. The other reduces the problem to any active learning
algorithm, so that one can use more optimised algorithms.
We fix an unknown target machine M with a known input alphabet I and output
alphabet O = O1 × O2 . The goal of the learning algorithm is to infer a machine equivalent
to M , given access to a minimally adequate teacher as introduced by Angluin (1987). The
teacher will answer the following two types of queries.
• Membership queries (MQs): The query consists of a word w ∈ I∗ and the teacher will answer with the output ⟦M⟧(w) ∈ O.
• Equivalence queries (EQs): The query consists of a Moore machine H, the hypothesis, and the teacher will answer with YES if M and H are equivalent; otherwise she will answer with a word w such that ⟦M⟧(w) ≠ ⟦H⟧(w).
3.1. Learning product automata with an L* extension
We can use the general framework for automata learning as set up by van Heerdt et al.
(2017). The general account does not directly give concrete algorithms, but it does give
generalised definitions for closedness and consistency. The main data structure for the
algorithm is an observation table.
Definition 3 An observation table is a triple (S, E, T) where S, E ⊆ I∗ are finite sets of words and T : S ∪ SI → O^E is defined by T(s)(e) = ⟦M⟧(se).
During the L* algorithm the sets S, E grow and T encodes the knowledge of ⟦M⟧ so far.
Definition 4 Let (S, E, T ) be an observation table.
• The table is product-closed if for all t ∈ SI there exist s1 , s2 ∈ S such that
πi T (t) = πi T (si ) for i = 1, 2.
• The table is product-consistent if for i = 1, 2 and for all s, s′ ∈ S we have πi T(s) = πi T(s′) implies πi T(sa) = πi T(s′a) for all a ∈ I.
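In code, the first condition reads as follows; T is assumed to be a function from words to rows, where a row is a tuple of output pairs indexed by E (an illustrative interface, not the one of any implementation):

```python
def product_closed(S, alphabet, T):
    # product-closed: every one-letter extension matches some S-row in each
    # projection separately (possibly a different S-row per component)
    def proj(i, row):
        return tuple(outputs[i] for outputs in row)
    for s in S:
        for a in alphabet:
            t_row = T(s + a)
            for i in (0, 1):
                if all(proj(i, t_row) != proj(i, T(s2)) for s2 in S):
                    return False
    return True
```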
These definitions are related to the classical definitions of closedness and consistency as
shown in the following lemma. The converses of the first two points do not necessarily hold.
We also prove that if an observation table is product-closed and product-consistent, then a well-defined product machine can be constructed which is consistent with the table.
Algorithm 1 The product-L* algorithm.
1: Initialise S and E to {ε}
2: Initialise T with MQs
3: repeat
4:     while (S, E, T) is not product-closed or -consistent do
5:         if (S, E, T) not product-closed then
6:             find t ∈ SI such that there is no s ∈ S with πi T(t) = πi T(s) for some i
7:             add t to S and fill the new row using MQs
8:         if (S, E, T) not product-consistent then
9:             find s, s′ ∈ S, a ∈ I and e ∈ E such that πi T(s) = πi T(s′) but πi T(sa)(e) ≠ πi T(s′a)(e) for some i
10:            add ae to E and fill the new column using MQs
11:    Construct H (by Lemma 6)
12:    if EQ(H) gives a counterexample w then
13:        add w and all its prefixes to S
14:        fill the new rows with MQs
15: until EQ(H) = YES
16: return H
Lemma 5 Let OT = (S, E, T) be an observation table and let πi OT = (S, E, πi T) be a component. The following implications hold.
1. OT is closed =⇒ OT is product-closed.
2. OT is consistent ⇐= OT is product-consistent.
3. OT is product-closed ⇐⇒ πi OT is closed for each i.
4. OT is product-consistent ⇐⇒ πi OT is consistent for each i.
Proof (1) If OT is closed, then each t ∈ SI has an s ∈ S such that T(t) = T(s). This implies in particular that πi T(t) = πi T(s), as required. (In terms of the definition, this means we can take s1 = s2 = s.)
(2) Let OT be product-consistent and s, s′ ∈ S such that T(s) = T(s′). We then know that πi T(s) = πi T(s′) for each i and hence πi T(sa) = πi T(s′a) for each i and a. This means that T(sa) = T(s′a) as required.
Statements (3) and (4) just rephrase the definitions.
Lemma 6 Given a product-closed and -consistent table we can define a product Moore
machine consistent with the table, where each component is minimal.
Proof If the table OT is product-closed and -consistent, then by the previous lemma, the
tables πi OT are closed and consistent in the usual way. For these tables we can use the
construction of Angluin (1987). As a result we get a minimal machine Hi which is consistent
with table πi OT . Taking the product of these gives a machine which is consistent with OT .
(Beware that this product is not necessarily the minimal machine consistent with OT .)
Algorithm 2 Learning product machines with other learners.
1: Initialise two learners L1 and L2
2: repeat
3:     while Li queries MQ(w) do
4:         forward MQ(w) to the teacher and get output o
5:         return πi o to Li
       {at this point both learners have constructed a hypothesis}
6:     Let Hi be the hypothesis of Li
7:     Construct H = H1 × H2
8:     if EQ(H) returns a counterexample w then
9:         if ⟦H1⟧(w) ≠ π1 ⟦M⟧(w) then
10:            return w to L1
11:        if ⟦H2⟧(w) ≠ π2 ⟦M⟧(w) then
12:            return w to L2
13: until EQ(H) = YES
14: return YES to both learners
15: return H
The product-L* algorithm (Algorithm 1) resembles the original L* algorithm, but uses
the new notions of closed and consistent. Its termination follows from the fact that L*
terminates on both components.
By Lemma 5 (1) we note that the algorithm does not need more rows than we would
need by running L* on M . By point (4) of the same lemma, we find that it does not need
more columns than L* would need on each component combined. This means that in the worst case, the table is twice as big as the one the original L* would use. However, in good cases (such as the running example), the table is much smaller, as the number of rows is smaller for each component and the columns needed for each component may be similar.
3.2. Learning product automata via a reduction
The previous algorithm constructs two machines from a single table. This suggests that we
can also run two learning algorithms to construct two machines. We lose the fact that the
data structure is shared between the learners, but we gain that we can use more efficient
algorithms than L* without any effort.
Algorithm 2 is the algorithm for learning product automata via this reduction. It runs
two learning algorithms at the same time. All membership queries are passed directly to the
teacher and only the relevant output is passed back to the learner. (In the implementation,
the query is cached, so that if the other learner poses the same query, it can be immediately answered.) If both learners are done posing membership queries, they will pose an
equivalence query at which point the algorithm constructs the product automaton. If the
equivalence query returns a counterexample, the algorithm forwards it to the learners.
The crucial observation is that a counterexample is necessarily a counterexample for at
least one of the two learners. (If at a certain stage only one learner makes an error, we keep
the other learner suspended, as we may obtain a counterexample for that one later on.)
This observation means that at least one of the learners makes progress and will eventually
converge. Hence, the whole algorithm will converge.
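The dispatch step (lines 8–12 of Algorithm 2) can be sketched as follows, assuming each learner exposes a counterexample method and target answers a membership query with the pair of outputs; the interface is hypothetical.

```python
def dispatch_counterexample(w, target, h1, h2, learner1, learner2):
    # w refutes the product hypothesis, so at least one branch below fires;
    # a learner whose hypothesis is still correct on w stays suspended
    o1, o2 = target(w)  # one membership query for the pair of outputs
    if h1.behaviour(w) != o1:
        learner1.counterexample(w)
    if h2.behaviour(w) != o2:
        learner2.counterexample(w)
```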
In the worst case, twice as many queries will be posed, compared to learning the whole machine at once. (This is because learning the full machine also learns its components.) In good cases, such as the running example, it requires far fewer queries. Typical learning algorithms require roughly O(n²) membership queries, where n is the number of states in the minimal machine. For the example Mn this bound gives O((n · 2ⁿ)²) = O(n² · 2²ⁿ) queries. When learning the components Mnˡ with the above algorithm, the bound gives just O((2n)² + ⋯ + (2n)²) = O(n³) queries.
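Plugging in concrete values makes the gap tangible; the following back-of-the-envelope computation simply evaluates the two bounds above (constants omitted):

```python
def mq_bound_monolithic(n):
    return (n * 2 ** n) ** 2     # O(states^2) with n * 2^n states

def mq_bound_decomposed(n):
    return n * (2 * n) ** 2      # n components of 2n states each

for n in (4, 8, 16):
    print(n, mq_bound_monolithic(n), mq_bound_decomposed(n))
# for n = 8 this gives 4194304 versus 2048
```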
4. Experiments
We have implemented the algorithm via reduction in LearnLib.¹ As we expect the reduction algorithm to be more efficient and simpler, we leave an implementation of the direct
extension of L* as future work. The implementation handles products of any size (as opposed
to only products of two machines). Additionally, the implementation also works on Mealy
machines and this is used for some of the benchmarks.
In this section, we compare the product learner with a regular learning algorithm. We
use the TTT algorithm by Isberner et al. (2014) for the comparison and also as the learners
used in Algorithm 2. We measure the number of membership and equivalence queries. The
results can be found in Table 1.
The equivalence queries are implemented by random sampling so as to imitate the intended application of learning black-box systems. This way, an exact learning algorithm turns into a PAC (probably approximately correct) algorithm. Efficiency is typically measured by the total number of input actions, which also accounts for the length of the membership queries (including the resets). This is a natural measure in the context of learning black-box systems, as each action requires some amount of time to perform.
We evaluated the product learning algorithm on the following two classes of machines.
n-bit register machine The machines Mn are as described before. We note that the
product learner is much more efficient, as expected.
Circuits In addition to the (somewhat artificial) examples Mn , we use circuits which
appeared in the logic synthesis workshops (LGSynth89/91/93), part of the ACM/SIGDA
benchmarks.² These models have been used as benchmarks before for FSM-based testing
methods by Hierons and Türker (2015) and describe the behaviour of real-world circuits.
The circuits have bit vectors as outputs, and can hence naturally be decomposed by taking each bit individually. As an example, Figure 3 depicts one of the circuits (bbara). The behaviour of this particular circuit can be modelled with seven states, but when restricting to each individual output bit, we obtain two machines of just four states. For the circuits bbsse and mark1, we additionally regrouped bits together in order to see how the performance changes when we decompose differently.
1. The implementation and models can be found on-line at https://gitlab.science.ru.nl/moerman/learning-product-automata.
2. The original files describing these circuits can be found at https://people.engr.ncsu.edu/brglez/CBL/benchmarks/.
Figure 3: The bbara circuit (left) has two output bits. This can be decomposed into two smaller circuits with a single output bit (middle and right).
For some circuits the number of membership queries is reduced compared to a regular
learner. Unfortunately, the results are not as impressive as for the n-bit register machine.
An interesting case is ex3 where the number of queries is slightly increased, but the total
amount of actions performed is substantially reduced. The number of actions needed in
total is actually reduced in all cases, except for bbsse. This exception can be explained
by the fact that the biggest component of bbsse still has 25 states, which is close to the
original 31 states. We also note that the choice of decomposition matters: for both mark1 and bbsse it was beneficial to regroup components.
In Figure 4, we look at the size of each hypothesis generated during the learning process.
We note that, although each component grows monotonically, the number of reachable
states in the product does not grow monotonically. In this particular instance where we
learn mark1 there was a hypothesis of 58 128 states, much bigger than the target machine of
202 states. This is not an issue, as the teacher will allow it and answer the query regardless.
Even in the PAC model with membership queries, this poses no problem as we can still
efficiently determine membership. However, in some applications the equivalence queries
are implemented with a model checker (e.g., in the work by Fiterău-Broştean et al., 2016)
or a sophisticated test generation tool. In these cases, the increased size of intermediate
hypotheses may be undesirable.
5. Discussion
We have shown two query learning algorithms which exploit a decomposable output. If the output can be split, then the machine itself can also be decomposed into components.
As the preliminary experiments show, this can be a very effective optimization for learning
black box reactive systems. It should be stressed that the improvement of the optimization
depends on the independence of the components. For example, the n-bit register machine
has nearly independent components and the reduction in the number of queries is big. The
more realistic circuits did not show such drastic improvements in terms of queries. When
Machine   States   Components   |  Product learner          |  TTT learner
                                |  EQs   MQs      Actions   |  EQs   MQs      Actions
M2             8            2   |    3      100       621   |    5      115       869
M3            24            3   |    3      252     1 855   |    5      347     2 946
M4            64            4   |    8      456     3 025   |    6    1 058    13 824
M5           160            5   |    6      869     7 665   |   17    2 723    34 657
M6           384            6   |   11    1 383    12 870   |   25    6 250    90 370
M7           896            7   |   11    2 087    24 156   |   52   14 627   226 114
M8          2048            8   |   13    3 289    41 732   |  160   34 024   651 678
bbara          7            2   |    3      167     1 049   |    3      216     1 535
keyb          41            2   |   25   12 464   153 809   |   24    6 024   265 805
ex3           28            2   |   24    1 133     9 042   |   18      878    91 494
bbsse         31            7   |   20   14 239   111 791   |    8    4 872    35 469
mark1        202           16   |   30   16 712   145 656   |   67   15 192   252 874
bbsse*        31            4   |   19   11 648    89 935   |    8    4 872    35 469
mark1*       202            8   |   22   13 027   117 735   |   67   15 192   252 874
Table 1: Comparison of the product learner with an ordinary learner.
Figure 4: The number of states for each hypothesis while learning mark1.
taking the length of the queries into account as well (i.e., counting all actions performed on the system), we see an improvement for most of the test cases.
In the remainder of this section we discuss related ideas and future work.
5.1. Measuring independence
As the results show, the proposed technique is often beneficial, but not always. It would be interesting to know when to use decomposition, and how to (quantitatively) measure the independence of the components. Such a measure could potentially be used by the learning algorithm to determine whether or not to decompose.
5.2. Generalisation to subsets of products
In some cases, we might know even more about our output alphabet. The output set O may
be a proper subset of O1 × O2 , indicating that some outputs can only occur “synchronised”.
For example, we might have O = {(0, 0)} {(a, b) | a, b ∈ [3]}, that is, the output 0 for
either component can only occur if the other component is also 0.
In such cases we can use the above algorithm still, but we may insist that the teacher
only accepts machines with output in O for the equivalence queries (as opposed to outputs
in {0, 1, 2, 3}2 ). When constructing H = H1 × H2 in line 7 of Algorithm 2, we can do a
reachability analysis on H to check for non-allowed outputs. If such traces exist, we know
it is a counterexample for at least one of the two learners. With such traces we can fix the
defect ourselves, without having to rely on the teacher.
5.3. Product DFAs
For two DFAs (Q1, δ1, F1, q0,1) and (Q2, δ2, F2, q0,2), a state in the product automaton is accepting if both components are accepting. In the formalism of Moore machines, the final states are determined by their characteristic function, and this means that the output is given by o(q1, q2) = o1(q1) ∧ o2(q2). Again, the components may be much smaller than the product, and this motivated Heinz and Rogers (2013) to learn (a subclass of) product DFAs. This
and this motivated Heinz and Rogers (2013) to learn (a subclass of) product DFAs. This
type of product is more difficult to learn as the two components are not directly observable.
Such automata are also relevant in model checking and some of the (open) problems are
discussed by Kupferman and Mosheiff (2015).
5.4. Learning automata in reverse
The main result of Rivest and Schapire (1994) was to exploit the structure of the so-called “diversity-based” automaton. This automaton may also be called the reversed Moore machine. Reversing provides a duality between reachability and equivalence. This duality is theoretically explored by Rot (2016) and Bonchi et al. (2014) in the context of Brzozowski's minimization algorithm.
Let Mᴿ denote the reverse of M; then we have ⟦Mᴿ⟧(w) = ⟦M⟧(wᴿ). This allows us to give an L* algorithm which learns Mᴿ by posing membership queries with the words reversed. We computed Mᴿ for the circuit models and all but one of them was much larger
than the original. This suggests that it might not be useful as an optimisation in learning
hardware or software systems. However, a more thorough investigation is desired.
Figure 5: The sequential composition A; B of two Mealy machines A and B.
5.5. Other types of composition
The case of learning a sequential composition is investigated by Abel and Reineke (2016).
In their work, there are two Mealy machines, A and B, and the output of A is fed into B, see
Figure 5. The goal is to learn a machine for B, assuming that A is known (i.e., white box).
The oracle only answers queries for the sequential composition, which is defined formally as
⟦A; B⟧(w) = ⟦B⟧(⟦A⟧(w)). Since B can only be interacted with through A, we cannot use
L* directly. The authors show how to learn B using a combination of L* and SAT solvers.
Moreover, they give evidence that this is more efficient than learning A; B as a whole.
An interesting generalisation of the above is to consider A as an unknown as well.
The goal is to learn A and B simultaneously, while observing the outputs of B and the
communication between the components. The authors conjecture that this would indeed
be possible and result in a learning algorithm which is more efficient than learning A; B
(private communication).
Another type of composition is used by Bollig et al. (2010a). Here, several automata
are put in parallel and communicate with each other. The goal is not to learn a black box
system, but to use learning when designing such a system. Instead of words, the teacher
(i.e., designer in this case) receives message sequence charts which encode the processes and
actions. Furthermore, they exploit partial order reduction in the learning algorithm.
We believe that a combination of our and the above compositional techniques can improve the scalability of learning black box systems. Especially in the domain of software
and hardware we expect such techniques to be important, since the systems themselves are
often designed in a modular way.
Acknowledgments
We would like to thank Nathanaël Fijalkow, Ramon Janssen, Gerco van Heerdt, Harco Kuppens, Alexis Linard, Alexandra Silva, Rick Smetsers, and Frits Vaandrager for proofreading
this paper and providing useful feedback. Thanks to Andreas Abel for discussing the case
of learning a sequential composition of two black box systems. Also thanks to anonymous
reviewers for interesting references and comments.
Nominal Techniques and
Black Box Testing for
Automata Learning
Joshua Moerman
Work in the thesis has been carried out under the auspices of the research school IPA
(Institute for Programming research and Algorithmics)
Printed by Gildeprint, Enschede
Typeset using ConTeXt MkIV
ISBN: 9789463236966
IPA Dissertation series: 2019-06
Copyright © Joshua Moerman, 2019
www.joshuamoerman.nl
Nominal Techniques and Black Box
Testing for Automata Learning
Doctoral thesis
to obtain the degree of doctor
from Radboud University Nijmegen
on the authority of the rector magnificus prof. dr. J.H.J.M. van Krieken,
according to the decision of the Council of Deans,
to be defended in public
on Monday 1 July 2019
at 16:30 precisely
by
Joshua Samuel Moerman
born on 1 October 1991
in Utrecht
iv
Supervisors:
prof. dr. F.W. Vaandrager
prof. dr. A. Silva (University College London, United Kingdom)
Co-supervisor:
dr. S.A. Terwijn
Manuscript committee:
prof. dr. B.P.F. Jacobs
prof. dr. A.R. Cavalli (Télécom SudParis, France)
prof. dr. F. Howar (Technische Universität Dortmund, Germany)
prof. dr. S. Lasota (Uniwersytet Warszawski, Poland)
dr. D. Petrișan (Université Paris Diderot, France)
Paranymphs:
Alexis Linard
Tim Steenvoorden
Samenvatting
Automata learning plays an ever greater role in the verification of software. During learning, a learning algorithm explores the behaviour of software. In principle this happens fully automatically, and the algorithm picks up interesting properties of the software on its own. This makes it possible to build a reasonably precise model of the workings of the piece of software under scrutiny. Errors and unexpected behaviour of software can be exposed in this way.
In this thesis we first look at techniques for test generation. These are needed to give the learning algorithm a helping hand. After automatically exploring behaviour, the learning algorithm formulates a hypothesis which does not yet model the software well enough. To refine the hypothesis and continue learning, we need tests. Efficiency is central here: we want to test as little as possible, because testing costs time. On the other hand, we do have to test exhaustively: if there is a discrepancy between the learned model and the software, we want to be able to point it out with a test.
In the first few chapters we show how the testing of automata works. We give a theoretical framework in which to compare several existing n-complete test generation methods. Based on this, we describe a new, efficient algorithm. This new algorithm is central to an industrial case study in which we learn a model of complex printer software from Océ. We also show how one of the subproblems, distinguishing states with inputs that are as short as possible, can be solved efficiently.
The second theme in this thesis is the theory of formal languages and automata with infinite alphabets. This, too, is useful for learning automata. Software, and in particular internet communication protocols, often use “identifiers”, for instance to distinguish different users. Preferably, we assume infinitely many such identifiers, since we do not know in advance how many are needed for learning the automaton.
We show how the learning algorithms can easily be generalised to infinite alphabets by making use of nominal sets. In particular, this allows us to learn register automata. We then develop the theory of nominal automata further. We show how these structures can be implemented efficiently, and we give a special class of nominal automata which have a much smaller representation. This could be used to learn such automata more quickly.
Summary
Automata learning plays an increasingly prominent role in the field of software
verification. Learning algorithms are able to automatically explore the behaviour of
software. By revealing interesting properties of the software, these algorithms can
create models of the, otherwise unknown, software. These learned models can, in
turn, be inspected and analysed, which often leads to finding bugs and inconsistencies
in the software.
An important tool which we need when learning software is test generation. This
is the topic of the first part of this thesis. After the learning algorithm has learned a
model and constructed a hypothesis, test generation methods are used to validate this
hypothesis. Efficiency is key: we want to test as little as possible, as testing may take
valuable time. However, our tests have to be complete: if the hypothesis fails to model
the software well, we had better have a test which shows this discrepancy.
The first few chapters explain black box testing of automata. We present a theoretical framework in which we can compare existing n-complete test generation methods.
From this comparison, we are able to define a new, efficient algorithm. In an industrial
case study on embedded printer software, we show that this new algorithm works
well for finding counterexamples for the hypothesis. Besides the test generation, we
show that one of the subproblems, finding the shortest sequences to separate states,
can be solved very efficiently.
The second part of this thesis is on the theory of formal languages and automata
with infinite alphabets. This, too, is discussed in the context of automata learning. Many
pieces of software make use of identifiers or sequence numbers. These are used, for
example, in order to distinguish different users or messages. Ideally, we would like to
model such systems with infinitely many identifiers, as we do not know beforehand
how many of them will be used.
Using the theory of nominal sets, we show that learning algorithms can easily be
generalised to automata with infinite alphabets. In particular, this shows that we can
learn register automata. Furthermore, we deepen the theory of nominal sets. First,
we show that, in a special case, these sets can be implemented in an efficient way.
Second, we give a subclass of nominal automata which allow for a much smaller
representation. This could be useful for learning such automata more quickly.
Acknowledgements
Foremost, I would like to thank my supervisors. Having three of them ensured that
there were always enough ideas to work on, theory to understand, papers to review,
seminars to attend, and chats to have. Frits, thank you for being a very motivating
supervisor, pushing creativity, and being only a few meters away. It started with a
small puzzle (trying a certain test algorithm to help with a case study), which was a
great, hands-on start of my Ph.D. You introduced me to the field of model learning
in a way that showcases both the theoretical and practical aspects.
Alexandra, thanks for introducing me to abstract reasoning about state machines,
the coalgebraic way. Although not directly shown in this thesis, this way of thinking
has helped, and you pushed me to pursue clear reasoning. Besides the theoretical
things I've learned, you have also taught me many personal lessons inside and outside
of academia; thanks for inviting me to London, Caribbean islands, hidden cocktail
clubs, and the best food. And thanks for leaving me with Daniela and Matteo, who
introduced me to nominal techniques, while you were on sabbatical.
Bas, thanks for broadening my understanding of the topics touched upon in this
thesis. Unfortunately, we have no papers together, but the connections you showed to
logic, computational learning, and computability theory have influenced the thesis
nevertheless. I am grateful for the many nice chats we had.
I would like to thank the members of the manuscript committee, Bart, Ana, Falk,
Sławek, and Daniela. Reading a thesis is undoubtedly a lot of work, so thank you for
the effort and feedback you have given me. Thanks, also, to the additional members
coming to Nijmegen to oppose during the defence, Jan Friso, Jorge, and Paul.
On the first floor of the Mercator building, I had the pleasure of spending four
years with fun office mates. Michele, thanks for introducing me to the Ph.D. life, by
always joking around. Hopefully, we can play a game of Briscola again. Alexis, many
thanks for all the tasty proeverijen (tastings), whether it was beers, wines, poffertjes, kroketten,
or anything else. Your French influences will be missed. Niels, thanks for the abstract
nonsense and bashing on politics.
Next to our office was the office of Tim, with whom I had the pleasure of
working from various coffee houses in Nijmegen. Further down the corridor, there
was the office of Paul and Rick. Paul, thanks for being the kindest colleague I've
had and for inviting us to your musical endeavours. Rick, thanks for the algorithmic
sparring, we had a great collaboration. Was there a more iconic duo on our floor? A
good contender would be Petra and Ramon. Thanks for the fun we had with ioco,
together with Jan and Mariëlle. Nils, thanks for steering me towards probabilistic
things and opening a door to Aachen. I am also very grateful to Jurriaan for bringing
back some coalgebra and category theory to our floor, and hosting me in London. My
other co-authors, Wouter, David, Bartek, Michał, and David, also deserve many credits
for all the interesting discussions we had. Harco, thanks for the technical support.
Special thanks go to Ingrid, for helping with the often-overlooked, but important,
administrative matters.
Doing a Ph.D. would not be complete without a good amount of playing kicker,
having borrels, and eating cakes at the iCIS institute. Thanks to all of you, Markus,
Bram, Marc, Sam, Bas, Joost, Dan, Giso, Baris, Simone, Aleks, Manxia, Leon, Jacopo,
Gabriel, Michael, Paulus, Marcos, Bas, and Henning.1
Thanks to the people I have met across the channel (which hopefully will remain
part of the EU): Benni, Nath, Kareem, Rueben, Louis, Borja, Fred, Tobias, Paul, Gerco,
and Carsten, for the theoretical adventure, but also for joining me to Phonox and other
parties in London. I am especially thankful to Matteo and Emanuela for hosting me
many times and for Hillary and Justin for accommodating me for three months each.
I had a lot of fun at the IPA events. I'm very thankful to Tim and Loek for organising
these events. Special thanks to Nico and Priyanka for organising a Halloween social
event with me. Also thanks to all the participants in the IPA events, you made it
a lot of fun! My gratitude extends to all the people I have met at summer schools
and conferences. I had a lot of fun learning about different cultures, languages, and
different ways of doing research. Hope we meet again!
Besides all the fun research, I had a great time with my friends and family. We went
to nice parties, had excellent dinners, and much more; thanks, Nick, Edo, Gabe, Saskia,
Stijn, Sandra, Geert, Marco, Carmen, and Wesley. Thanks to Marlon, Hannah, Wouter,
Dennis, Christiaan, and others from #RU for borrels, bouldering, and jams. Thanks to
Ragnar, Josse, Julian, Jeroen, Vincent, and others from the BAPC for algorithmic fun.
Thanks to my parents, Kees and Irene, and my brother, David, and his wife, Germa,
for their love and support. My gratitude extends to my family in law, Ine, Wim, Jolien
and Jesse. My final words of praise go to Tessa, my wife, I am very happy to have you
on my side. You inspire me in many ways, and I enjoy doing all the fun stuff we do.
Thank you a lot.
1 In no particular order. These lists are randomised.
Contents

Samenvatting . . . v
Summary . . . vii
Acknowledgements . . . ix

1 Introduction . . . 1
  Model Learning . . . 1
  Applications of Model Learning . . . 4
  Research challenges . . . 5
  Black Box Testing . . . 5
  Nominal Techniques . . . 7
  Contributions . . . 10
  Conclusion and Outlook . . . 14

Part 1: Testing Techniques . . . 17

2 FSM-based Test Methods . . . 19
  Mealy machines and sequences . . . 19
  Test generation methods . . . 26
  Hybrid ADS method . . . 31
  Overview . . . 35
  Proof of completeness . . . 36
  Related Work and Discussion . . . 38

3 Applying Automata Learning to Embedded Control Software . . . 41
  Engine Status Manager . . . 44
  Learning the ESM . . . 48
  Verification . . . 52
  Conclusions and Future Work . . . 56

4 Minimal Separating Sequences for All Pairs of States . . . 59
  Preliminaries . . . 60
  Minimal Separating Sequences . . . 64
  Optimising the Algorithm . . . 67
  Application in Conformance Testing . . . 70
  Experimental Results . . . 71
  Conclusion . . . 72

Part 2: Nominal Techniques . . . 73

5 Learning Nominal Automata . . . 75
  Overview of the Approach . . . 77
  Preliminaries . . . 84
  Angluin's Algorithm for Nominal DFAs . . . 86
  Learning Non-Deterministic Nominal Automata . . . 93
  Implementation and Preliminary Experiments . . . 101
  Related Work . . . 105
  Discussion and Future Work . . . 107

6 Fast Computations on Ordered Nominal Sets . . . 109
  Nominal sets . . . 111
  Representation in the total order symmetry . . . 113
  Implementation and Complexity of ONS . . . 118
  Results and evaluation in automata theory . . . 120
  Related work . . . 126
  Conclusion and Future Work . . . 128

7 Separation and Renaming in Nominal Sets . . . 131
  Monoid actions and nominal sets . . . 133
  A monoidal construction from Pm-sets to Sb-sets . . . 137
  Nominal and separated automata . . . 143
  Related and future work . . . 149

Bibliography . . . 151
Curriculum Vitae . . . 169
Chapter 1
Introduction
When I was younger, I often learned how to play with new toys by messing about
with them, by pressing buttons at random, observing their behaviour, pressing more
buttons, and so on. I resorted to the manual, or asked “experts”, only to confirm
my beliefs on how the toys work. Now that I am older, I do mostly the same with new
devices, new tools, and new software. However, now I know that this is an established
computer science technique, called model learning.
Model learning2 is an automated technique to construct a state-based model, often
a type of automaton, from a black box system. The goal of this technique can
be manifold: it can be used to reverse-engineer a system, to find bugs in it, to verify
properties of the system, or to understand the system in one way or another. It is not
just random testing: the information learned during the interaction with the system
is actively used to guide following interactions. Additionally, the information learned
can be inspected and analysed.
This thesis is about model learning and related techniques. In the first part, I
present results concerning black box testing of automata. Testing is a crucial part in
learning software behaviour and often remains a bottleneck in applications of model
learning. In the second part, I show how nominal techniques can be used to learn
automata over structured infinite alphabets. The study on nominal automata was
directly motivated by work on learning network protocols which rely on identifiers or
sequence numbers.
But before we get ahead of ourselves, we should first understand what we mean by
learning, as learning means very different things to different people. In educational
science, learning may involve concepts such as teaching, blended learning, and interdisciplinarity. Data scientists may think of data compression, feature extraction, and
neural networks. In this thesis we are mostly concerned with software verification.
But even in the field of verification several types of learning are relevant.
1 Model Learning
In the context of software verification, we often look at stateful computations with
inputs and outputs. For this reason, it makes sense to look at words, or traces. For an
alphabet Σ, we denote the set of words by Σ∗ .
2 There are many names for this type of learning, such as active automata learning. The generic name “model learning” is chosen as a counterpoint to model checking.
The learning problem is defined as follows. There is some fixed, but unknown,
language ℒ ⊆ Σ∗. This language may define the behaviour of a software component,
a property in model checking, a set of traces from a protocol, etc. We wish to infer a
description of ℒ after only having observed a small part of this language. For example,
we may have seen a hundred words belonging to the language and a few which do not
belong to the language. Then concluding with a good description of ℒ is difficult, as
we are missing information about the infinitely many words we have not observed.
Such a learning problem can be stated and solved in a variety of ways. In the
applications carried out in our research group, we often try to infer a model of a software
component. (Chapter 3 describes such an application.) In these cases, a learning algorithm can interact with the software. So it makes sense to study a learning paradigm
which allows for queries, and not just a data set of samples.
A typical query learning framework was established by Angluin (1987). In her
framework, the learning algorithm may pose two types of queries to a teacher, or oracle:
Membership queries (MQ) The learner poses such a query by providing a word
w ∈ Σ∗ to the teacher. The teacher will then reply whether w ∈ ℒ or not. This
type of query is often generalised to more general outputs; in these cases we consider
ℒ : Σ∗ → O and the teacher replies with ℒ(w). In some papers, such a query is then
called an output query.
Equivalence queries (EQ) The learner can provide a hypothesised description H of ℒ
to the teacher. If the hypothesis is correct, the teacher replies with yes. If, however,
the hypothesis is incorrect, the teacher replies with no together with a counterexample,
i.e., a word which is in ℒ but not in the hypothesis, or vice versa.
By posing many such queries, the learning algorithm is supposed to converge to
a correct model. This type of learning is hence called exact learning. Angluin (1987)
showed that one can do this efficiently for deterministic finite automata (DFAs), when
ℒ is in the class of regular languages.
It should be clear why this is called query learning or active learning. The learning
algorithm initiates interaction with the teacher by posing queries; it may construct its
own data points and ask for their corresponding labels. Active learning is in contrast
to passive learning where all observations are given to the algorithm up front.
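To fix intuitions, the two queries can be rendered as a small interface; the sketch below (ours, in Python) replaces the impossible exhaustive equivalence check by a finite pool of test words:

    class Teacher:
        """A minimal teacher answering MQs and EQs for a target language,
        given here as a membership predicate. A real equivalence query
        cannot enumerate all words, so this sketch searches a finite pool
        of candidate words for a counterexample instead."""

        def __init__(self, target, test_words):
            self.target = target          # target: word -> bool
            self.test_words = test_words  # finite pool of candidate words

        def membership_query(self, word):
            return self.target(word)

        def equivalence_query(self, hypothesis):
            """hypothesis: word -> bool; returns a counterexample or None."""
            for w in self.test_words:
                if hypothesis(w) != self.target(w):
                    return w
            return None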
Another paradigm which is relevant for our type of applications is PAC-learning
with membership queries. Here, the algorithm can again use MQs as before, but the EQs
are replaced by random sampling. So the allowed query is:
Random sample queries (EX) If the learner poses this query (there are no parameters), the teacher responds with a random word w together with its label, i.e., whether
w ∈ ℒ or not. (Here, random means that the words are sampled from some probability
distribution known to the teacher.)
Instead of requiring that the learner exactly learns the model, we only require the
following. The learner should probably return a model which is approximately correct with
respect to the target. This gives the name probably approximately correct (PAC). Note that there are
two uncertainties: the probable and the approximate part. Both parts are bounded by
parameters, so one can determine the confidence.
As with many problems in computer science, we are also interested in the efficiency
of learning algorithms. Instead of measuring time or space, we analyse the number of
queries posed by an algorithm. Efficiency often means that we require a polynomial
number of queries. But polynomial in what? The learner has no input, other than the
access to a teacher. We ask the algorithms to be polynomial in the size of the target (i.e.,
the size of the description which has yet to be learned). In the case of PAC learning
we also require it to be polynomial in the two parameters for confidence.
Deterministic automata can be efficiently learned in the PAC model. In fact, any efficient exact learning algorithm with MQs and EQs can be transformed into an efficient
PAC algorithm with MQs (see Kearns & Vazirani, 1994, exercise 8.1). For this reason,
we mostly focus on the former type of learning in this thesis. The transformation
from exact learning to PAC learning is implemented by simply testing the hypothesis
with random samples. This can be postponed until we actually implement a learning
algorithm and apply it.
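As a sketch of this transformation: the i-th equivalence query is replaced by a batch of random samples whose size follows the usual bound from the exact-to-PAC reduction (see Kearns and Vazirani, 1994); the function names here are illustrative.

    import math

    def sampling_eq(target, hypothesis, sample, epsilon, delta, i):
        """Replace the i-th equivalence query by random samples (PAC-style).
        `sample()` draws a word from the fixed distribution; the sample size
        is the usual bound for the i-th query in the exact-to-PAC reduction."""
        m = math.ceil((math.log(1.0 / delta) + i * math.log(2.0)) / epsilon)
        for _ in range(m):
            w = sample()
            if hypothesis(w) != target(w):
                return w   # counterexample: refine the hypothesis
        return None        # hypothesis is probably approximately correct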
When using only EQs, only MQs, or only EXs, there are hardness results for
exact learning of DFAs. So the combinations MQs + EQs (for exact learning) and MQs
+ EXs (for PAC learning) have been carefully picked: they provide a minimal basis
for efficient learning. See the book of Kearns and Vazirani (1994) for such hardness
results and more information on PAC learning.
So far, all the queries are assumed to be just there. Somehow, these are existing
procedures which we can invoke with MQ(w), EQ(H), or EX(). This is a useful abstraction
when designing a learning algorithm. One can analyse the complexity (in terms of
number of queries) independently of how these queries are resolved. Nevertheless,
at some point in time one has to implement them. In our case of learning software
behaviour, membership queries are easily implemented: simply provide the word w
to a running instance of the software and observe the output.3 Equivalence queries,
however, are in general not doable. Even if we have the (machine) code, it is often way
too complicated to check equivalence. That is why we resort to testing with EX queries.
The EX query from PAC learning normally assumes a fixed, unknown probability
distribution on words. In our case, we choose and implement a distribution to test
against. This cuts both ways: on the one hand, it allows us to test only behaviour
we really care about; on the other hand, the results are only as good as our choice of
distribution. We deviate even further from the PAC-model as we sometimes change
3 In reality, it is a bit harder than this. There are plenty of challenges to solve, such as timing, choosing your alphabet, choosing the kind of observations to make, and being able to reliably reset the software.
our distribution while learning. Yet, as applications show, this is a useful way of
learning software behaviour.
2 Applications of Model Learning
Since this thesis contains only one real-world application of learning in Chapter 3,
it is good to mention a few others. Although we remain in the context of learning
software behaviour, the applications are quite different from each other. This is by no
means a complete list.
Bug finding in protocols. A prominent example is by Fiterău-Broștean, et al. (2016).
They learn models of TCP implementations, both client and server sides. Interestingly, they found bugs in the (closed source) Windows implementation. Later,
Fiterău-Broștean and Howar (2017) also found a bug in the sliding window of the
Linux implementation of TCP. Other protocols have been learned as well, such as the
MQTT protocol by Tappler, et al. (2017), TLS by de Ruiter and Poll (2015), and SSH
by Fiterău-Broștean, et al. (2017). Many of these applications reveal bugs by learning
a model and subsequently applying model checking. The combination of learning and
model checking was first described by Peled, et al. (2002).
Bug finding in smart cards. Aarts, et al. (2013) learn the software on smart cards
of several Dutch and German banks. These cards use the EMV protocol, which is
run on the card itself. So this is an example of a real black box system, where no
other monitoring is possible and no code is available. No vulnerabilities were found,
although each card had a slightly different state machine. The e.dentifier, a card
reader implementing a challenge-response protocol, has been learned by Chalupar, et
al. (2014). They built a Lego machine which could automatically press buttons and
the researchers found a security flaw in this card reader.
Regression testing. Hungar, et al. (2003) describe the potential of automata learning
in regression testing. The aim is not to find bugs, but to monitor the development
process of a system. By considering the differences between models at different stages,
one can generate regression tests.
Refactoring legacy software. Model learning can also be used in order to verify
refactored software. Schuts, et al. (2016) have applied this at a project within Philips.
They learn both an old version and a new version of the same component. By comparing the learned models, some differences could be seen. This gave developers
opportunities to solve problems before replacing the old component by the new one.
Introduction
5
3 Research challenges
In this thesis, we will mostly see learning of deterministic automata or Mealy machines.
Although this is limited, as many pieces of software require richer models, it has been
successfully applied in the above examples. The limitations include the following.
- The system behaves deterministically.
- One can reliably reset the system.
- The system can be modelled with a finite state space. This also means that the model does not incorporate time or data.
- The input alphabet is finite.
- One knows when the target is reached.
Research challenge 1: Approximating equivalence queries. Having confidence
in a learned model is difficult. We have PAC guarantees (as discussed before), but
sometimes we may want to draw other conclusions. For example, we may require the
hypothesis to be correct, provided that the real system is implemented with a certain
number of states. Efficiency is important here: We want to obtain those guarantees
fast and we want to quickly find counterexamples when the hypothesis is wrong. Test
generation methods are the topic of the first part of this thesis. We will review existing
algorithms and discuss new algorithms for test generation.
Research challenge 2: Generalisation to infinite alphabets. Automata over infinite
alphabets are very useful for modelling protocols which involve identifiers or timestamps. Not only is the alphabet infinite in these cases, the state space is as well, since
the values have to be remembered. In the second part of this thesis, we will see how
nominal techniques can be used to tackle this challenge.
Being able to learn automata over an infinite alphabet is not new. It has been
tackled, for instance, by Howar, et al. (2012), Bollig, et al. (2013) and in the theses
of Aarts (2014), Cassel (2015), and Fiterău-Broștean (2018). In the first thesis, the
problem is solved by considering abstractions, which reduces the alphabet to a finite
one. These abstractions are automatically refined when a counterexample is presented
to the algorithms. Fiterău-Broștean (2018) extends this approach to cope with “fresh
values”, crucial for protocols such as TCP. In the thesis by Cassel (2015), another
approach is taken. The queries are changed to tree queries. The approach in my thesis
will be based on symmetries, which gives yet another perspective into the problem of
learning such automata.
4 Black Box Testing
An important step in automata learning is equivalence checking. Normally, this is
abstracted away and done by an oracle, but we intend to implement such an oracle
ourselves for our applications. Concretely, the problem we need to solve is that of
conformance checking4 as it was first described by Moore (1956).
The problem is as follows: Given the description of a finite state machine and a
black box system, does the system behave exactly as the description? We wish to
determine this by running experiments on the system (as it is black box). It should
be clear that this is a hopelessly difficult task, as an error can be hidden arbitrarily
deep in the system. That is why we often assume some knowledge of the system. In
this thesis we often assume a bound on the number of states of the system. Under these
conditions, Moore (1956) already solved the problem. Unfortunately, his experiment
is exponential in size, or in his own words: “fantastically large.”
Years later, Chow (1978) and Vasilevskii (1973) independently designed efficient
experiments. In particular, the set of experiments is polynomial in the number of
states. These techniques will be discussed in detail in Chapter 2. More background
and other related problems, as well as their complexity results, are well exposed in a
survey of Lee and Yannakakis (1994).
Figure 1.1 Behaviour of a record player modelled as a finite state machine. (Diagram: four states, two non-spinning, one slow spinning and one fast spinning, with transitions for the start-stop and speed buttons.)
To give an example of conformance checking, we model a record player as a finite
state machine. We will not model the audible output, as that would depend not only
on the device, but also on the record one chooses to play5. Instead, the only observation
we can make is how fast the turntable spins. The device has two buttons: a
4 Also known as machine verification or fault detection.
5 In particular, we would have to add time to the model, as one side of a record only lasts for roughly 25 minutes. Unless we take a record with sound on the locked groove, such as the Sgt. Pepper's Lonely Hearts Club Band album by The Beatles.
start-stop button and a speed button, which toggles between 33⅓ rpm and
45 rpm. When turned on, the system starts playing immediately at 33⅓ rpm; this
is useful for DJing. The intended behaviour of the record player has four states, as
depicted in Figure 1.1.
Let us consider some faults which could be present in an implementation with
four states. In Figure 1.2, two flawed record players are given. In the first (Figure 1.2a),
the sequence speed speed leads us to the wrong state. However, this is not immediately
observable: the turntable is in a non-spinning state, as it should be. The fault is only
visible when we press start-stop once more: now the turntable is spinning fast instead of
slow. The sequence speed speed start-stop is a counterexample. In the second example (Figure 1.2b),
the fault is again not immediately obvious: after pressing speed we are in the wrong
state, as observed by pressing start-stop. Here, the counterexample is speed start-stop.
When a model of the implementation is given, it is not hard to find counterexamples. However, in a black box setting we do not have such a model. In order to test
whether a black box system is equivalent to a model, we somehow need to test all
possible counterexamples. In this example, a test suite should include sequences such
as speed speed start-stop and speed start-stop.
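For readers who prefer code, the specification can be encoded as follows. This sketch is our reconstruction from the description above: the state names, the letters v (speed button) and s (start-stop button), and the assumption that the speed setting is remembered while the turntable is stopped are all ours; the precise transition structure is that of Figure 1.1.

    # Specification of the record player; the observation is the spinning speed.
    SPEC_INIT = 'off33'
    SPEC_DELTA = {
        ('off33', 'v'): 'off45', ('off33', 's'): 'on33',
        ('off45', 'v'): 'off33', ('off45', 's'): 'on45',
        ('on33', 'v'): 'on45',   ('on33', 's'): 'off33',
        ('on45', 'v'): 'on33',   ('on45', 's'): 'off45',
    }
    SPIN = {'off33': 'none', 'off45': 'none', 'on33': 'slow', 'on45': 'fast'}

    def observe(delta, spin, state, presses):
        """Apply a sequence of button presses and record the observable
        spinning speed after each press."""
        result = []
        for b in presses:
            state = delta[(state, b)]
            result.append(spin[state])
        return result

    # A test compares these observations against those of the black box:
    # observe(SPEC_DELTA, SPIN, SPEC_INIT, 'vvs') ends with 'slow'; a faulty
    # implementation as in Figure 1.2a would end with 'fast' instead.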
Figure 1.2 Two faulty record players, (a) and (b).
5 Nominal Techniques
In the second part of this thesis, I will present results related to nominal automata.
Usually, nominal techniques are introduced in order to solve problems involving
name binding in topics like lambda calculus. However, we use them in automata
theory, specifically to model register automata. These are automata which have an
infinite alphabet, often thought of as input actions with data. The control flow of the
automaton may actually depend on the data. However, the data cannot be used in an
arbitrary way as this would lead to many decision problems, such as emptiness and
equivalence, being undecidable.6 A principal concept in nominal techniques is that of
symmetries.
To motivate the use of symmetries, we will look at an example of a register automaton. In the following automaton we model a (not-so-realistic) login system for a
single person. The alphabet consists of the following actions:
sign-up(p)
login(p)
logout()
view()
The sign-up action allows one to set a password p. This can only be done when the
system is initialised. The login and logout actions speak for themselves and the
view action allows one to see the secret data (we abstract away from what the user
actually gets to see here). A simple automaton with roughly this behaviour is given
in Figure 1.3. We will only informally discuss its semantics for now.
Figure 1.3 A simple register automaton, with states q0, q1 and q2: sign-up(p) moves from q0 to q1 and sets the register r ≔ p; login(p) moves from q1 to q2 if r = p; logout() returns from q2 to q1; and view() loops on q2. Any input otherwise not specified is rejected. The r in states q1 and q2 is a register.
To model the behaviour, we want the domain of passwords to be infinite. After all,
one should allow arbitrarily long passwords to be secure. This means that a register
automaton is actually an automaton over an infinite alphabet.
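The behaviour sketched in Figure 1.3 can be rendered as a small program; the following is an illustrative sketch only (the class, state names and method names are ours), where the register holds the password and the Boolean output indicates success or failure.

    class LoginSystem:
        """A sketch of the register automaton of Figure 1.3: control states
        q0 (initial), q1 (signed up) and q2 (logged in), with one register r."""

        def __init__(self):
            self.state, self.r = 'q0', None

        def step(self, action, p=None):
            """Process one input action; returns True (accept) or False."""
            if self.state == 'q0' and action == 'sign-up':
                self.state, self.r = 'q1', p          # set r := p
                return True
            if self.state == 'q1' and action == 'login' and p == self.r:
                self.state = 'q2'                     # only if r = p
                return True
            if self.state == 'q2' and action == 'logout':
                self.state = 'q1'
                return True
            if self.state == 'q2' and action == 'view':
                return True
            return False  # any input not otherwise specified is rejected

Running it on sign-up(hello) login(hello) accepts, just as with bye in place of hello; the mixed trace sign-up(hello) login(bye) is rejected, matching the symmetry discussion below.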
Common algorithms for automata, such as learning, will not work with an infinite
alphabet. Any loop which iterates over the alphabet will diverge. In order to cope
with this, we will use the symmetries present in the alphabet.
Let us continue with the example and look at its symmetries. If a person signs up
with a password “hello” and subsequently logs in with “hello”, then this is not distinguishable from a person signing up and logging in with “bye”. This is an example
of symmetry: the values “hello” and “bye” can be permuted, or interchanged. Note,
however, that the trace sign-up(hello) login(bye) is different from the two before:
6 The class of automata with arbitrary data operations is sometimes called extended finite state machines.
no permutation of “hello” and “bye” will bring us to a logged-in state with that trace.
So we see that, despite the symmetry, we cannot simply identify the value “hello”
and “bye”. For this reason, we keep the alphabet infinite and explicitly mention its
symmetries.
Using symmetries in automata theory is not a new idea. In the context of model
checking, the first to use symmetries were Emerson and Sistla (1996) and Ip and
Dill (1996). But only Ip and Dill (1996) used it to deal with infinite data domains.
For automata learning with infinite domains, symmetries were used by Sakamoto
(1997). He devised an L* learning algorithm for register automata, much like the
one presented in Chapter 5. The symmetries are crucial to reduce the problem to
a finite alphabet and use the regular L* algorithm. (Chapter 5 shows how to do it
with more general symmetries.) Around the same time Ferrari, et al. (2005) worked
on automata theoretic algorithms for the π-calculus. Their approach was based on
the same symmetries and they developed a theory of named sets to implement their
algorithms. Named sets are equivalent to nominal sets. However, nominal sets are
defined in a more elementary way. The nominal sets we will soon see were introduced
by Gabbay and Pitts (2002) to solve certain problems in name binding in abstract
syntaxes. Although this is not really related to automata theory, it was picked up
by Bojańczyk, et al. (2014), who provide an equivalence between register automata
and nominal automata. (This equivalence is exposed in more detail in the book of
Bojańczyk, 2018.) Additionally, they generalise the work on nominal sets to other
symmetries.
The symmetries we encounter in this thesis are listed below, but other symmetries
can be found in the literature. The symmetry directly corresponds to the data values
(and operations) used in an automaton. The data values are often called atoms.
The equality symmetry. Here the domain can be any countably infinite set. We can
take, for example, the set of strings we used before as the domain from which we
take passwords. No further structure is used on this domain, meaning that any
value is just as good as any other. The symmetries therefore consist of all bijections
on this domain.
The total order symmetry. In this case, we take a countably infinite set with a dense
total order. Typically, this means we use the rational numbers, ℚ, as data values
and symmetries which respect the ordering.
5.1 What is a nominal set?
So what exactly is a nominal set? I will not define it here and leave the formalities to
the corresponding chapters. It suffices, for now, to think of nominal sets as abstract
sets (often infinite) on which a group of symmetries acts. This action makes it possible
to interpret the symmetries of the data values in the abstract set. For automata, this
allows us to talk about symmetries on the state space, the set of transitions, and the
alphabet.
In order to implement these sets algorithmically, we impose two finiteness requirements. Both properties can be expressed using only the group action.
- Each element is finitely supported. A way to think of this requirement is that each element is “constructed” out of finitely many data values.
- The set is orbit-finite. This means that we can choose finitely many elements such that any other element is a permuted version of one of those elements.
If we wish to model the automaton from Figure 1.3 as a nominal automaton, then we
can simply define the state space as
Q = {q0} ∪ {q1,a | a ∈ 𝔸} ∪ {q2,a | a ∈ 𝔸},
where 𝔸 is the set of atoms. In this example, 𝔸 is the set of all possible passwords.
The set Q is infinite, but satisfies the two finiteness requirements.
The upshot of doing this is that the set Q (and transition structure) corresponds
directly to the semantics of the automaton. We do not have to encode how values relate
or how they interact. Instead, the set (and transition structure) defines all we need
to know. Algorithms, such as reachability, minimisation, and learning, can be run on
such automata, despite the sets being infinite. These algorithms can be implemented
rather easily by using libraries such as Nλ, Lois, or Ons from Chapter 6. These
libraries implement a data structure for nominal sets, and provide ways to iterate over
such (infinite) sets.
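To give a flavour of such an implementation, the sketch below (our own simplification, not the actual interface of Nλ, Lois, or Ons) stores the orbit-finite set Q of the example by its three orbits; elements are enumerated per finite support, which is all an algorithm ever needs to iterate over.

    from itertools import combinations

    # Each orbit of Q is stored once, as a name plus the size of the support
    # of its elements.
    ORBITS = {'q0': 0, 'q1': 1, 'q2': 1}

    def elements_with_support(atoms):
        """Enumerate the finitely many elements of Q supported by a given
        finite set of atoms, even though Q itself is infinite."""
        for orbit, k in ORBITS.items():
            for support in combinations(sorted(atoms), k):
                yield (orbit, support)

    # list(elements_with_support({'bye', 'hello'})) yields q0 (empty support)
    # and q1, q2 for each of the two atoms: five elements in total.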
One has to be careful as not all results from automata theory transfer to nominal automata. A notable example is the powerset construction, which converts a
non-deterministic automaton into a deterministic one. The problem here is that the
powerset of a set is generally not orbit-finite and so the finiteness requirement is not
met. Consequently, languages accepted by nominal DFAs are not closed under Kleene
star, or even concatenation.
6 Contributions
This thesis is split into two parts. Part 1 contains material about black box testing, while
Part 2 is about nominal techniques. The chapters can be read in isolation. However,
the chapters do get more technical and mathematical, especially in Part 2.
Detailed discussions of related work and future directions of research are presented
in each chapter.
Chapter 2: FSM-based test methods. This chapter introduces test generation methods which can be used for learning or conformance testing. The methods are presented
in a uniform way, which allows us to give a single proof of completeness for all these methods. Moreover, the uniform presentation gives room to develop new test generation
methods. The main contributions are:
- Uniform description of known methods: Theorem 26 (p. 35)
- A new proof of completeness: Section 5 (p. 36)
- New algorithm (hybrid ADS) and its implementation: Section 3.2 (p. 34)
Chapter 3: Applying automata learning to embedded control software. In this
chapter we will apply model learning to an industrial case study. It is a unique
benchmark as it is much bigger than any of the applications seen before (3410 states
and 77 inputs). This makes it challenging to learn a model and the main obstacle is
finding counterexamples. The main contributions are:
- Application of the hybrid ADS algorithm: Section 2.2 (p. 49)
- Successfully learning a large-scale system: Section 2.3 (p. 51)
This is based on the following publication:
Smeenk, W., Moerman, J., Vaandrager, F. W., & Jansen, D. N. (2015). Applying Automata Learning to Embedded Control Software. In Formal Methods and Software
Engineering - 17th International Conference on Formal Engineering Methods, ICFEM, Proceedings. Springer. doi:10.1007/978-3-319-25423-4_5.
Chapter 4: Minimal separating sequences for all pairs of states. Continuing on
test generation methods, this chapter presents an efficient algorithm to construct
separating sequences. Not only is the algorithm efficient (it runs in 𝒪(n log n) time), it
also constructs minimal-length sequences. The algorithm is inspired by a minimisation
algorithm by Hopcroft (1971), but extending it to construct witnesses is non-trivial.
The main contributions are:
- Efficient algorithm for separating sequences: Algorithms 4.2 & 4.4 (p. 66 & 68)
- Applications to black box testing: Section 4 (p. 70)
- Implementation: Section 5 (p. 71)
This is based on the following publication:
Smetsers, R., Moerman, J., & Jansen, D. N. (2016). Minimal Separating Sequences for
All Pairs of States. In Language and Automata Theory and Applications - 10th International
Conference, LATA, Proceedings. Springer. doi:10.1007/978-3-319-30000-9_14.
Chapter 5: Learning nominal automata. In this chapter, we show how to learn
automata over infinite alphabets. We do this by translating the L* algorithm directly to
a nominal version, νL*. The correctness proofs mimic the original proofs by Angluin
(1987). Since our new algorithm is close to the original, we are able to translate
variants of the L* algorithm as well. In particular, we provide a learning algorithm for
nominal non-deterministic automata. The main contributions are:
- L* algorithm for nominal automata: Section 3 (p. 86)
- Its correctness and complexity: Theorem 7 & Corollary 11 (p. 89 & 93)
- Generalisation to non-deterministic automata: Section 4.2 (p. 96)
- Implementation in Nλ: Section 5.2 (p. 103)
This is based on the following publication:
Moerman, J., Sammartino, M., Silva, A., Klin, B., & Szynwelski, M. (2017). Learning
nominal automata. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles
of Programming Languages, POPL. ACM. doi:10.1145/3009837.3009879.
Chapter 6: Fast computations on ordered nominal sets. In this chapter, we provide a
library to compute with nominal sets. We restrict our attention to nominal sets over the
total order symmetry. This symmetry allows for a rather easy characterisation of orbits,
and hence an easy implementation. We experimentally show that it is competitive
with existing tools, which are based on SMT solvers. The main contributions are:
- Characterisation theorem of orbits: Table 6.1 (p. 118)
- Complexity results: Theorems 18 & 21 (p. 119 and 123)
- Implementation: Section 3 (p. 118)
This is based on the following publication:
Venhoek, D., Moerman, J., & Rot, J. (2018). Fast Computations on Ordered Nominal Sets. In Theoretical Aspects of Computing - ICTAC - 15th International Colloquium,
Proceedings. Springer. doi:10.1007/978-3-030-02508-3_26.
Chapter 7: Separation and Renaming in Nominal Sets. We investigate how to
reduce the size of certain nominal automata. This is based on the observation that
some languages (with outputs) are not just invariant under symmetries, but invariant
under arbitrary transformations, or renamings. We define a new type of automaton, the
separated nominal automaton, and show that they exactly accept those languages which
are closed under renamings. All of this is shown by using a theoretical framework:
we establish a strong relationship between nominal sets on one hand, and nominal
renaming sets on the other. The main contributions are:
- Adjunction between nominal sets and renaming sets: Theorem 16 (p. 138)
- This adjunction is monoidal: Theorem 17 (p. 139)
- Separated automata have reduced state space: Example 36 (p. 147)
This is based on a paper under submission:
Moerman, J. & Rot, J. (2019). Separation and Renaming in Nominal Sets. (Under submission).
Besides these chapters in this thesis, I have published the following papers. These
are not included in this thesis, but a short summary of those papers is presented
below.
Complementing Model Learning with Mutation-Based Fuzzing. Our group at
the Radboud University participated in the RERS challenge 2016. This is a challenge
where reactive software is provided and researchers have to assess the validity of certain
properties (given as LTL specifications). We approached this with model learning:
Instead of analysing the source code, we simply learned the external behaviour, and
then used model checking on the learned model. This has worked remarkably well,
as the models of the external behaviour are not too big. Our results were presented at
the RERS workshop (ISOLA 2016). The report can be found on arXiv:
Smetsers, R., Moerman, J., Janssen, M., & Verwer, S. (2016). Complementing Model
Learning with Mutation-Based Fuzzing. CoRR, abs/1611.02429. Retrieved from http://arxiv.org/abs/1611.02429.
n-Complete test suites for IOCO.
In this paper, we investigate complete test suites
for labelled transition systems (LTSs), instead of deterministic Mealy machines. This
is a much harder problem than conformance testing of deterministic systems. The
system may adversarially avoid certain states the tester wishes to test. We provide a test
suite which is n-complete (provided the implementation is a suspension automaton).
My main personal contribution here is the proof of completeness, which resembles the
proof presented in Chapter 2 closely. The conference paper was presented at ICTSS:
van den Bos, P., Janssen, R., & Moerman, J. (2017). n-Complete Test Suites for IOCO.
In ICTSS 2017 Proceedings. Springer. doi:10.1007/978-3-319-67549-7_6.
An extended version has appeared in:
van den Bos, P., Janssen, R., & Moerman, J. (2018). n-Complete Test Suites for IOCO.
Software Quality Journal. Advanced online publication. doi:10.1007/s11219-018-9422-x.
Learning Product Automata. In this article, we consider Moore machines with
multiple outputs. These machines can be decomposed by projecting on each output,
resulting in smaller components that can be learned with fewer queries. We give
experimental evidence that this is a useful technique which can reduce the number of
queries substantially. This is all motivated by the idea that compositional methods are
widely used throughout engineering and that we should use this in model learning.
This work was presented at ICGI 2018:
Moerman, J. (2019). Learning Product Automata. In International Conference on Grammatical Inference, ICGI, Proceedings. Proceedings of Machine Learning Research. (To
appear).
7 Conclusion and Outlook
With the current tools for model learning, it is possible to learn big state machines
of black box systems. This involves using clever algorithms for learning (such as
the TTT algorithm by Isberner, 2015) and efficient testing methods (see Chapter 2).
However, as the industrial case study from Chapter 3 shows, the bottleneck is often
in conformance testing.
In order to improve on this bottleneck, one possible direction is to consider grey
box testing. The methods discussed in this thesis are all black box methods, which could
be considered too pessimistic. Often, we do have (parts of the) source code and
we do know relationships between different inputs. A question for future research
is how this additional information can be integrated in a principled manner in the
learning and testing of systems.
Black box testing still has theoretical challenges. Current generalisations to non-deterministic systems or language inclusion (such as black box testing for IOCO) often
need exponentially big test suites. Whether this is necessary is unknown (to me): we
only have upper bounds but no lower bounds. An interesting approach could be to
see if there exists a notion of reduction between test suites. This is analogous to the
reductions used in complexity theory to prove hardness of problems, or reductions
used in PAC theory to prove learning problems to be inherently unpredictable.
Another path taken in this thesis is the research on nominal automata. This was
motivated by the problem of learning automata over infinite alphabets. So far, the
results on nominal automata are mostly theoretical in nature. Nevertheless, we show
that the nominal algorithms can be implemented and that they can be run concretely
on black box systems (Chapter 5). The advantage of using the foundations of nominal
sets is that the algorithms are closely related to the original L* algorithm. Consequently,
variations of L* can easily be implemented. For instance, we show that the NL* algorithm
for non-deterministic automata works in the nominal case too. (We have not attempted
to implement more recent algorithms such as TTT.) The nominal learning algorithms
can be implemented in just a few hundred lines of code, much less than the approach
taken by, e.g., Fiterău-Broștean (2018).
In this thesis, we tackle some efficiency issues when computing with nominal sets.
In Chapter 6 we characterise orbits in order to give an efficient representation (for the
total-order symmetry). Another result is the fact that some nominal automata can be
compressed to separated automata, which can be exponentially smaller (Chapter 7).
However, the nominal tools still leave much to be desired in terms of efficiency.
Last, it would be interesting to marry the two paths taken in this thesis. I am
not aware of n-complete test suites for register automata or nominal automata. The
results on learning nominal automata in Chapter 5 show that this should be possible,
as an observation table gives a test suite.7 However, there is an interesting twist to
this problem. The test methods from Chapter 2 can all account for extra states. For
nominal automata, we should be able to cope with extra states and extra registers. It
would be interesting to see how the test suite grows as these two dimensions increase.
7 The rows of a table are access sequences, and the columns provide a characterisation set.
Part 1:
Testing Techniques
Chapter 2
FSM-based Test Methods
In this chapter, we will discuss some of the theory of test generation methods for black
box conformance checking. Since the systems we consider are black box, we cannot
simply determine equivalence with a specification. The only way to gain confidence
is to perform experiments on the system. A key aspect of test generation methods is
the size and completeness of the test suites. On one hand, we want to cover as much
of the specification as possible, hopefully ensuring that we find mistakes in any faulty
implementation. On the other hand: testing takes time, so we want to minimise the
size of a test suite.
The test methods described here are well-known in the literature of FSM-based
testing. They all share similar concepts, such as access sequences and state identifiers. In
this chapter we will define these concepts, relate them with one another and show
how to build test suites from these concepts. This theoretical discussion is new and
enables us to compare the different methods uniformly. For instance, we can prove
all these methods to be n-complete with a single proof.
The discussion also inspired a new algorithm: the hybrid ADS method. This method
is applied to an industrial case study in Chapter 3. It combines the strength of the
ADS method (which is not always applicable) with the generality of the HSI method.
This chapter starts with the basics: Mealy machines, sequences and what it means
to test a black box system. Then, starting from Section 1.3 we define several concepts,
such as state identifiers, in order to distinguish one state from another. These concepts
are then combined in Section 2 to derive test suites. In a similar vein, we define a novel
test method in Section 3 and we discuss some of the implementation details of the
hybrid-ads tool. We summarise the various test methods in Section 4. All methods
are proven to be n-complete in Section 5. Finally, in Section 6, we discuss related work.
1 Mealy machines and sequences
We will focus on Mealy machines, as those capture many protocol specifications and
reactive systems.
We fix finite alphabets I and O of inputs and outputs respectively. We use the usual
notation for operations on sequences (also called words): uv for the concatenation of
two sequences u, v ∈ I* and |u| for the length of u. For a sequence w = uv we say that
u and v are a prefix and suffix respectively.
Definition 1. A (deterministic and complete) Mealy machine M consists of a finite
set of states S, an initial state s0 ∈ S and two functions:
a transition function δ : S × I → S, and
an output function λ : S × I → O.
Both the transition function and output function are extended inductively to sequences
as δ : S × I* → S and λ : S × I* → O*:
δ(s, ϵ) = s
δ(s, aw) = δ(δ(s, a), w)
λ(s, ϵ) = ϵ
λ(s, aw) = λ(s, a)λ(δ(s, a), w)
The behaviour of a state s is given by the output function λ(s, −) : I* → O*. Two states
s and t are equivalent if they have equal behaviours, written s ∼ t, and two Mealy
machines are equivalent if their initial states are equivalent.
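To make these definitions concrete, the following minimal Python sketch represents a Mealy machine together with the inductively extended functions. The class and all names are our own choice for illustration (words are strings of single-character inputs, which suffices for alphabets such as I = {a, b, c}):

class Mealy:
    # A Mealy machine: a set of states, an initial state, and two dicts
    # mapping (state, input) pairs to the successor state and the output.
    def __init__(self, states, s0, delta, lam):
        self.states = states  # S
        self.s0 = s0          # initial state
        self.delta = delta    # transition function delta as a dict
        self.lam = lam        # output function lambda as a dict

    def delta_star(self, s, w):
        # delta(s, eps) = s and delta(s, aw) = delta(delta(s, a), w)
        for a in w:
            s = self.delta[(s, a)]
        return s

    def lambda_star(self, s, w):
        # lambda(s, eps) = eps and lambda(s, aw) = lambda(s, a) lambda(delta(s, a), w)
        out = []
        for a in w:
            out.append(self.lam[(s, a)])
            s = self.delta[(s, a)]
        return "".join(out)

Two states are then equivalent precisely when their lambda_star functions agree on all words.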
Remark 2. We will use the following conventions and notation. We often write
s ∈ M instead of s ∈ S and for a second Mealy machine M′ its constituents are denoted
S′, s′0, δ′ and λ′. Moreover, if we have a state s ∈ M, we silently assume that s is not
a member of any other Mealy machine M′. (In other words, the behaviour of s is
determined by the state itself.) This eases the notation since we can write s ∼ t without
needing to introduce a context.
An example Mealy machine is given in Figure 2.1.
Figure 2.1 An example specification with input I = {a, b, c} and output O = {0, 1}.
1.1 Testing
In conformance testing we have a specification modelled as a Mealy machine and
an implementation (the system under test, or SUT) which we assume to behave as
a Mealy machine. Tests, or experiments, are generated from the specification and
applied to the implementation. We assume that we can reset the implementation
before every test. If the output is different from the specified output, then we know
FSM-based Test Methods
21
the implementation is flawed. The goal is to test as little as possible, while covering
as much as possible.
A test suite is nothing more than a set of sequences. We do not have to encode
outputs in the test suite, as those follow from the deterministic specification.
Definition 3. A test suite is a finite subset T ⊆ I*.
A test t ∈ T is called maximal if it is not a proper prefix of another test s ∈ T. We denote
the set of maximal tests of T by max(T ). The maximal tests are the only tests in T we
actually have to apply to our SUT as we can record the intermediate outputs. In the
examples of this chapter we will show max(T ) instead of T.
We define the size of a test suite as usual (Dorofeeva, et al., 2010 and Petrenko,
et al., 2014). The size of a test suite is measured as the sum of the lengths of all its
maximal tests plus one reset per test.
Definition 4. The size of a test suite T is defined to be ‖T‖ = Σt∈max(T) (|t| + 1).
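As a small sketch of these two notions (tests as strings, continuing the representation above): a test is maximal precisely when it is not a proper prefix of another test, and each maximal test contributes its length plus one reset.

def maximal_tests(T):
    # max(T): the tests that are not a proper prefix of another test in T
    return {t for t in T if not any(s != t and s.startswith(t) for s in T)}

def size(T):
    # ||T|| = sum of (|t| + 1) over t in max(T); the +1 counts the reset
    return sum(len(t) + 1 for t in maximal_tests(T))

For example, size({'aaaa', 'aba', 'abba', 'baaa', 'bba'}) yields 23, matching the UIOv test suite computed later in this chapter.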
1.2 Completeness of test suites
Example 5. No test suite is complete. Consider the specification in Figure 2.2a.
This machine will always output a cup of coffee when given money. For any test
suite we can make a faulty implementation which passes the test suite. A faulty
implementation might look like Figure 2.2b, where the machine starts to output beers
after n steps (signalling that it is the end of the day), where n is larger than the length
of the longest sequence in the suite. This shows that no test suite can be complete and
it justifies the following definition.
Figure 2.2 A basic example showing that finite test suites are incomplete. The implementation on the right will pass any test suite if we choose n big enough.
Definition 6. Let M be a Mealy machine and T be a test suite. We say that T is
m-complete (for M) if for all inequivalent machines M′ with at most m states there
exists a t ∈ T such that λ(s0, t) ≠ λ′(s′0, t).
We are often interested in the case of m-completeness, where m = n + k for some
k ∈ ℕ and n is the number of states in the specification. Here k will stand for the
number of extra states we can test.
Note the order of the quantifiers in the above definition. We ask for a single test
suite which works for all implementations of bounded size. This is crucial for black
box testing, as we do not know the implementation, so the test suite has to work for
all of them.
1.3 Separating Sequences
Before we construct test suites, we discuss several types of useful sequences. All the
following notions are standard in the literature, and the corresponding references
will be given in Section 2, where we discuss the test generation methods using these
notions. We fix a Mealy machine M for the remainder of this chapter.
Definition 7. We define the following kinds of sequences.
Given two states s, t in M we say that w is a separating sequence if λ(s, w) ≠ λ(t, w).
For a single state s in M, a sequence w is a unique input output sequence (UIO) if for
every inequivalent state t in M we have λ(s, w) ≠ λ(t, w).
Finally, a (preset) distinguishing sequence (DS) is a single sequence w which separates all states of M, i.e., for every pair of inequivalent states s, t in M we have
λ(s, w) ≠ λ(t, w).
The above list is ordered from weaker to stronger notions, i.e., every distinguishing
sequence is an UIO sequence for every state. Similarly, an UIO for a state s is a
separating sequence for s and any inequivalent t. Separating sequences always exist
for inequivalent states and finding them efficiently is the topic of Chapter 4. On the
other hand, UIOs and DSs do not always exist for a machine.
A machine M is minimal if every distinct pair of states is inequivalent (i.e.,
s ∼ t ⟹ s = t). We will not require M to be minimal, although this is often
done in the literature. Minimality is sometimes convenient, as one can write every other
state t instead of every inequivalent state t.
Example 8. For the machine in Figure 2.1, we note that states s0 and s2 are separated
by the sequence aa (but not by any shorter sequence). In fact, the sequence aa is an
UIO for state s0 since it is the only state outputting 10 on that input. However, state
s2 has no UIO: if the sequence were to start with b or c, states s3 and s4 respectively
have equal transitions, which makes it impossible to separate those states after the first
symbol. If it starts with an a, states s3 and s4 are swapped and we make no progress
in distinguishing these states from s2 . Since s2 has no UIO, the machine as a whole
does not admit a DS.
In this example, all other states actually have UIOs. For the states s0 , s1 , s3 and s4 ,
we can pick the sequences aa, a, c and ac respectively. In order to separate s2 from
the other states, we have to pick multiple sequences. For instance, the set {aa, ac, c}
will separate s2 from all other states.
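Finding separating sequences efficiently is the topic of Chapter 4; for a machine of this size, however, a brute-force search over words of increasing length already suffices. A sketch, assuming the small Mealy class sketched earlier:

from itertools import product

def separating_sequence(m, s, t, max_len=8):
    # search for a shortest w with lambda(s, w) != lambda(t, w);
    # exponential in max_len, for illustration only
    alphabet = sorted({a for (_, a) in m.delta})
    for n in range(1, max_len + 1):
        for tup in product(alphabet, repeat=n):
            w = "".join(tup)
            if m.lambda_star(s, w) != m.lambda_star(t, w):
                return w
    return None  # no separating sequence found up to max_len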
1.4 Sets of separating sequences
As the example shows, we need sets of sequences and sometimes even sets of sets of
sequences called families.8
Definition 9. We define the following kinds of sets of sequences. We require that all
sets are prefix-closed, however, we only show the maximal sequences in examples.9
A set of sequences W is called a characterisation set if it contains a separating
sequence for each pair of inequivalent states in M.
A state identifier for a state s ∈ M is a set Ws such that for every inequivalent t ∈ M
a separating sequence for s and t exists in Ws .
A set of state identifiers {Ws }s is harmonised if Ws ∩ Wt contains a separating
sequence for inequivalent states s and t. This is also called a separating family.
A state identifier Ws will be used to test against a single state. In contrast to a characterisation set, it only includes sequences which are relevant for s. The property of
being harmonised might seem a bit strange. This property ensures that the same tests
are used for different states. This extra consistency within a test suite is necessary for
some test methods. We return to this notion in more detail in Example 22.
We may obtain a characterisation set by simply considering every pair of states and
looking for a difference. However, it turns out a harmonised set of state identifiers exists
for every machine and this can be constructed very efficiently (Chapter 4). From a
set of state identifiers we may obtain a characterisation set by taking the union of all
those sets.
Example 10. As mentioned before, state s2 from Figure 2.1 has a state identifier
{aa, ac, b}. In fact, this set is a characterisation set for the whole machine. Since the
other states have UIOs, we can pick singleton sets as state identifiers. For example,
state s0 has the UIO aa, so a state identifier for s0 is W0 = {aa}. Similarly, we can take
W1 = {a} and W3 = {c}. But note that such a family will not be harmonised since the
sets {a} and {c} have no common separating sequence.
One more type of state identifier is of interest to us: the adaptive distinguishing sequence.
It is the strongest type of state identifier, and as a result not many machines have
one. Like DSs, adaptive distinguishing sequences can identify a state using a single
word. We give a slightly different (but equivalent) definition than the one of Lee and
Yannakakis (1994).
Definition 11. A separating family is an adaptive distinguishing sequence (ADS) if
each set max(Hs ) is a singleton.
8. A family is often written as {Xs}s∈M or simply {Xs}s, meaning that for each state s ∈ M we have a set Xs.
9. Taking these sets to be prefix-closed makes many proofs easier.
It is called an adaptive sequence, since it has a tree structure which depends on the
output of the machine. To see this tree structure, consider the first symbols of each of
the sequences in the family. Since the family is harmonised and each set is essentially
given by a single word, there is only one first symbol. Depending on the output after
the first symbol, the sequence continues.
Example 12. In Figure 2.3 we see a machine with an ADS. The ADS is given as
follows:
H0 = {aba}
H1 = {aaba}
H2 = {aba}
H3 = {aaba}
Note that all sequences start with a. This already separates s0 , s2 from s1 , s3 . To
further separate the states, the sequences continue with either a b or another a. And
so on.
Figure 2.3 (a): A Mealy machine with an ADS and (b): the tree structure of this ADS.
Given an ADS, there exists an UIO for every state. The converse (if every state has
an UIO, then the machine admits an ADS) does not hold. The machine in Figure 2.1
admits no ADS, since s2 has no UIO.
1.5 Partial equivalence
Definition 13. We define the following notation.
Let W be a set of sequences. Two states x, y are W-equivalent, written x ∼W y, if
λ(x, w) = λ(y, w) for all w ∈ W.
Let 𝒲 be a family. Two states x, y are 𝒲-equivalent, written x ∼𝒲 y, if λ(x, w) =
λ(y, w) for all w ∈ Wx ∩ Wy.
The relation ∼W is an equivalence relation and W ⊆ V implies that V separates more
states than W, i.e., x ∼V y ⟹ x ∼W y. Clearly, if two states are equivalent (i.e., s ∼ t),
then for any set W we have s ∼W t.
Lemma 14. The relations W and 𝒲 can be used to define characterisation sets and
separating families. Concretely:
W is a characterisation set if and only if for all s, t in M, s ∼W t implies s ∼ t.
𝒲 is a separating family if and only if for all s, t in M, s ∼𝒲 t implies s ∼ t.
Proof.
W is a characterisation set by definition means s ̸∼ t ⟹ s ̸∼W t as W contains a
separating sequence (if it exists at all). This is equivalent to s ∼W t ⟹ s ∼ t.
Let 𝒲 be a separating family and s ̸∼ t. Then there is a sequence w ∈ Ws ∩ Wt
such that λ(s, w) ≠ λ(t, w), i.e., s ̸∼𝒲 t. We have shown s ̸∼ t ⟹ s ̸∼𝒲 t, which is
equivalent to s ∼𝒲 t ⟹ s ∼ t. The converse is proven similarly.
1.6 Access sequences
Besides sequences which separate states, we also need sequences which bring a
machine to specified states.
Definition 15. An access sequence for s is a word w such that δ(s0 , w) = s. A set P
consisting of an access sequence for each state is called a state cover. If P is a state cover,
then the set {pa | p ∈ P, a ∈ I} is called a transition cover.
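A state cover consisting of shortest access sequences can be computed with a breadth-first search from the initial state; a sketch in the same style as before (function names are ours):

from collections import deque

def state_cover(m, alphabet):
    # a shortest access sequence for every reachable state, found breadth-first
    access = {m.s0: ""}
    queue = deque([m.s0])
    while queue:
        s = queue.popleft()
        for a in alphabet:
            t = m.delta[(s, a)]
            if t not in access:
                access[t] = access[s] + a
                queue.append(t)
    return set(access.values())

def transition_cover(P, alphabet):
    # Q = { pa | p in P, a in I }
    return {p + a for p in P for a in alphabet}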
1.7 Constructions on sets of sequences
In order to define a test suite modularly, we introduce notation for combining sets of
words. For sets of words X and Y, we define
their concatenation X ⋅ Y = {xy | x ∈ X, y ∈ Y},
iterated concatenation X0 = {ϵ} and Xn+1 = X ⋅ Xn , and
bounded concatenation X≤n = ⋃i≤n Xi.
On families we define
flattening: ⋃𝒳 = {x | x ∈ Xs, s ∈ S},
union: 𝒳 ∪ 𝒴 is defined point-wise: (𝒳 ∪ 𝒴)s = Xs ∪ Ys,
concatenation:10 X ⊙ 𝒴 = {xy | x ∈ X, y ∈ Yδ(s0,x)}, and
refinement: 𝒳; 𝒴 defined by11
10. We will often see the combination P ⋅ I ⊙ 𝒳; this should be read as (P ⋅ I) ⊙ 𝒳.
11. We use the convention that ∩ binds stronger than ∪. In fact, all the operators here bind stronger than ∪.
(𝒳; 𝒴)s = Xs ∪ (Ys ∩ ⋃{ Yt | s ∼𝒳 t and s ̸∼𝒴 t }).
The latter construction is new and will be used to define a hybrid test generation
method in Section 3. It refines a family 𝒳, which need not be separating, by including
sequences from a second family 𝒴. It only adds those sequences to states if 𝒳 does not
distinguish those states. This is also the reason behind the ;-notation: first the tests
from 𝒳 are used to distinguish states, and then for the remaining states 𝒴 is used.
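These operations translate directly into code. In the following sketch (all names ours) a family is a dict from states to sets of words, and the refinement operator follows the formula above:

def concat(X, Y):
    # X . Y
    return {x + y for x in X for y in Y}

def odot(m, X, fam):
    # X (circled dot) fam = { xw | x in X, w in fam[delta(s0, x)] }
    return {x + w for x in X for w in fam[m.delta_star(m.s0, x)]}

def equiv_on(m, s, t, W):
    # s ~_W t: s and t agree on every word in W
    return all(m.lambda_star(s, w) == m.lambda_star(t, w) for w in W)

def refine(m, X, Y):
    # (X; Y)_s = X_s ∪ (Y_s ∩ ⋃{ Y_t | s ~_X t and not s ~_Y t })
    result = {}
    for s in m.states:
        others = set()
        for t in m.states:
            if equiv_on(m, s, t, X[s] & X[t]) and not equiv_on(m, s, t, Y[s] & Y[t]):
                others |= Y[t]
        result[s] = X[s] | (Y[s] & others)
    return result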
Lemma 16. For all families 𝒳 and 𝒴:
𝒳; 𝒳 = 𝒳,
𝒳; 𝒴 = 𝒳, whenever 𝒳 is a separating family, and
𝒳; 𝒴 is a separating family whenever 𝒴 is a separating family.
Proof. For the first item, note that there are no states t such that s ∼𝒳 t and s ̸∼𝒳 t.
Consequently, the union is empty, and the expression simplifies to
(𝒳; 𝒳)s = Xs ∪ (Xs ∩ ∅) = Xs.
If 𝒳 is a separating family, then the only t for which s ∼𝒳 t holds are t such that s ∼ t
(Lemma 14). But s ∼ t is ruled out by s ̸∼𝒴 t, and so again
(𝒳; 𝒴)s = Xs ∪ (Ys ∩ ∅) = Xs.
For the last item, suppose that s ∼𝒳;𝒴 t. Then s and t agree on every sequence in
(𝒳; 𝒴)s ∩ (𝒳; 𝒴)t. We distinguish two cases:
Suppose s ∼𝒳 t, then Ys ∩ Yt ⊆ (𝒳; 𝒴)s ∩ (𝒳; 𝒴)t. And so s and t agree on Ys ∩ Yt,
meaning s ∼𝒴 t. Since 𝒴 is a separating family, we have s ∼ t.
Suppose s ̸∼𝒳 t. This contradicts s ∼𝒳;𝒴 t, since Xs ∩ Xt ⊆ (𝒳; 𝒴)s ∩ (𝒳; 𝒴)t.
We conclude that s ∼ t. This proves that 𝒳; 𝒴 is a separating family.
2 Test generation methods
In this section, we review the classical conformance testing methods: the W, Wp, UIO,
UIOv, HSI, ADS methods. At the end of this section, we construct the test suites for
the running example. Our hybrid ADS method uses a similar construction.
There are many more test generation methods. Literature shows, however, that
not all of them are complete. For example, the method by Bernhard (1994) is falsified
by Petrenko (1997), and the UIO-method from Sabnani and Dahbura (1988) is shown
to be incomplete by Chan, et al. (1989). For that reason, completeness of the correct
methods is shown in Theorem 26. The proof is general enough to capture all the
methods at once. We fix a state cover P throughout this section and take the transition
cover Q = P ⋅ I.
2.1 W-method (Chow, 1978 and Vasilevskii, 1973)
After the work of Moore (1956), it was unclear whether a test suite of polynomial
size could exist. He presented a finite test suite which was complete; however, it was
exponential in size. Both Chow (1978) and Vasilevskii (1973) independently prove
that test suites of polynomial size exist.12 The W-method is a very structured test suite
construction. It is called the W-method as the characterisation set is often called W.
Definition 17. Given a characterisation set W, we define the W test suite as
TW = (P ∪ Q) ⋅ I≤k ⋅ W.
This and all following methods test the machine in two phases. For simplicity, we
explain these phases when k = 0. The first phase consists of the tests P ⋅ W and tests
whether all states of the specification are (roughly) present in the implementation.
The second phase is Q ⋅ W and tests whether the successor states are correct. Together,
these two phases put enough constraints on the implementation to know that the
implementation and specification coincide (provided that the implementation has no
more states than the specification).
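With the covers from Section 1.6 and the operations sketched in Section 1.7, the W test suite is a short set comprehension. This is only an illustration of the formula, not the implementation discussed later in this chapter:

def upto(I, k):
    # I^{<=k}: all words over I of length at most k
    words, result = {""}, {""}
    for _ in range(k):
        words = {w + a for w in words for a in I}
        result |= words
    return result

def w_method(P, Q, W, I, k):
    # T_W = (P ∪ Q) . I^{<=k} . W, using concat from the earlier sketch
    return concat(concat(P | Q, upto(I, k)), W)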
2.2 The Wp-method (Fujiwara, et al., 1991)
Fujiwara, et al. (1991) realised that one needs fewer tests in the second phase of the
W-method. Since we already know the right states are present after phase one, we
only need to check if the state after a transition is consistent with the expected state.
This justifies the use of state identifiers for each state.
Definition 18. Let 𝒲 be a family of state identifiers. The Wp test suite is defined as
TWp = P ⋅ I≤k ⋅ ⋃𝒲 ∪ Q ⋅ I≤k ⊙ 𝒲.
Note that ⋃𝒲 is a characterisation set as defined for the W-method. It is needed for
completeness to test states with the whole set ⋃𝒲. Once states are tested as such, we
can use the smaller sets Ws for testing transitions.
2.3 The HSI-method (Luo, et al., 1995 and Petrenko, et al., 1993)
The Wp-method in turn was refined by Luo, et al. (1995) and Petrenko, et al. (1993).
They make use of harmonised state identifiers, allowing state identifiers to be used in the
initial phase of the test suite.
Definition 19. Let ℋ be a separating family. We define the HSI test suite by
THSI = (P ∪ Q) ⋅ I≤k ⊙ ℋ.
12. More precisely: the size of TW is polynomial in the size of the specification for each fixed k.
Our hybrid ADS method is an instance of the HSI-method as we define it here. However, Luo, et al. (1995) and Petrenko, et al. (1993) describe the HSI-method together
with a specific way of generating the separating families. Namely, the set obtained by
a splitting tree with shortest witnesses. The hybrid ADS method does not refine the
HSI-method defined in the more restricted sense.
2.4 The ADS-method (Lee & Yannakakis, 1994)
As discussed before, when a Mealy machine admits an adaptive distinguishing sequence, only a single test has to be performed for identifying a state. This is exploited
in the ADS-method.
Definition 20. Let 𝒵 be an adaptive distinguishing sequence. The ADS test suite is
defined as
TADS = (P ∪ Q) ⋅ I≤k ⊙ 𝒵.
2.5 The UIOv-method (Chan, et al., 1989)
Some Mealy machines which do not admit an adaptive distinguishing sequence, may
still admit state identifiers which are singletons. These are exactly UIO sequences and
give rise to the UIOv-method. In a way this is a generalisation of the ADS-method,
since the requirement that state identifiers are harmonised is dropped.
Definition 21. Let 𝒰 = {a single UIO for s}s∈S be a family of UIO sequences, the
UIOv test suite is defined as
TUIOv = P ⋅ I≤k ⋅ ⋃𝒰 ∪ Q ⋅ I≤k ⊙ 𝒰.
One might think that using a single UIO sequence instead of the set ⋃𝒰 to verify the
state is enough. In fact, this idea was used for the UIO-method which defines the test
suite (P ∪ Q) ⋅ I≤k ⊙ 𝒰. The following is a counterexample, due to Chan, et al. (1989),
to this conjecture.
Example 22. The Mealy machines in Figure 2.4 show that the UIO-method does not
define a 3-complete test suite. Take for example the UIOs u0 = aa, u1 = a, u2 = ba for
the states s0 , s1 , s2 respectively. The test suite then becomes {aaaa, abba, baaa, bba}
and the faulty implementation passes this suite. This happens because the sequence
u2 is not an UIO in the implementation, and the state s′2 simulates both UIOs u1 and
u2. Hence we also want to check that a state does not behave as one of the other states,
and therefore we use 𝒰. With the same UIOs as above, the resulting UIOv test suite
for the specification in Figure 2.4 is {aaaa, aba, abba, baaa, bba} of size 23. (Recall
that we also count resets when measuring the size.)
Figure 2.4 An example where the UIO-method is not complete (left: the specification, right: a faulty implementation).
2.6 All test suites for Figure 2.1
Let us compute all the previous test suites on the specification in Figure 2.1. We will
be testing without extra states, i.e., we construct 5-complete test suites. We start by
defining the state and transition cover. For this, we take all shortest sequences from
the initial state to the other states. This state cover is depicted in Figure 2.5. The
transition cover is simply constructed by extending each access sequence with another
symbol.
P = {ϵ, a, aa, b, ba}
Q = P ⋅ I = {a, b, c, aa, ab, ac, aaa, aab, aac, ba, bb, bc, baa, bab, bac}
Figure 2.5 A state cover for the specification from Figure 2.1.
As shown earlier, the set W = {aa, ac, c} is a characterisation set. The W-method,
which simply combines P ∪ Q with W, gives the following test suite of size 169:
TW = {
aaaaa, aaaac, aaac, aabaa, aabac, aabc, aacaa,
aacac, aacc, abaa, abac, abc, acaa, acac, acc, baaaa,
baaac, baac, babaa, babac, babc, bacaa, bacac, bacc,
bbaa, bbac, bbc, bcaa, bcac, bcc, caa, cac, cc }
With the Wp-method we get to choose a different state identifier per state. Since many
states have an UIO, we can use them as state identifiers. This defines the following
family 𝒲:
W0 = {aa}
W1 = {a}
W2 = {aa, ac, c}
W3 = {c}
W4 = {ac}
For the first part of the Wp test suite we need ⋃𝒲 = {aa, ac, c}. For the second part,
we only combine the sequences in the transition cover with the corresponding suffixes.
All in all we get a test suite of size 75:
TWp = {
aaaaa, aaaac, aaac, aabaa, aacaa, abaa,
acaa, baaac, baac, babaa, bacc, bbac, bcaa, caa
}
For the HSI-method we need a separating family ℋ. We pick the following sets:
H0 = {aa, c}
H1 = {a}
H2 = {aa, ac, c}
H3 = {a, c}
H4 = {aa, ac, c}
(We repeat that these sets are prefix-closed, but we only show the maximal sequences.)
Note that these sets are harmonised, unlike the family 𝒲. For example, the separating
sequence a is contained in both H1 and H3. This ensures that we do not have to consider ⋃ℋ in the first part of the test suite. When combining this with the corresponding
prefixes, we obtain the HSI test suite of size 125:
THSI = {
aaaaa, aaaac, aaac, aabaa, aabc, aacaa, aacc,
abaa, abc, acaa, acc, baaaa, baaac, baac, babaa,
babc, baca, bacc, bbaa, bbac, bbc, bcaa, bcc, caa, cc
}
On this particular example the Wp-method outperforms the HSI-method. The reason
is that many states have UIOs and we picked those to be the state identifiers. In general,
however, UIOs may not exist (and finding them is hard).
The UIO-method and ADS-method are not applicable in this example because
state s2 does not have an UIO.
Figure 2.6 A faulty implementation for the specification in Figure 2.1.
We can run these test suites on the faulty implementation shown in Figure 2.6. Here,
the a-transition from state s2 transitions to the wrong target state. It is not an obvious
mistake, since the faulty target s0 has very similar transitions to s2. Yet, all the test
suites detect this error. When choosing the prefix aaa (included in the transition
cover), and suffix aa (included in the characterisation set and state identifiers for s2 ),
we see that the specification outputs 10111 and the implementation outputs 10110.
The sequence aaaaa is the only sequence (in any of the test suites here) which detects
this fault.
Alternatively, if the a-transition from s2 were to transition to s4, we would need the suffix ac,
as aa will not detect the fault. Since the sequence ac is included in the state identifier
for s2 , this fault would also be detected. This shows that it is sometimes necessary to
include multiple sequences in the state identifier.
Another approach to testing would be to enumerate all sequences up to a certain
length. In this example, we need sequences of at least length 5. Consequently, the test
suite contains 243 sequences and this boils down to a size of 1458. Such a brute-force
approach is not scalable.
3 Hybrid ADS method
In this section, we describe a new test generation method for Mealy machines. Its
completeness will be proven in Theorem 26, together with completeness for all methods defined in the previous section. From a high-level perspective, the method uses
the algorithm by Lee and Yannakakis (1994) to obtain an ADS. If no ADS exists, their
algorithm still provides some sequences which separate some inequivalent states.
Our extension is to refine the set of sequences by using pairwise separating sequences.
Hence, this method is a hybrid between the ADS-method and HSI-method.
The reason we do this is the fact that the ADS-method generally constructs small
test suites as experiments by Dorofeeva, et al. (2010) suggest. The test suites are small
since an ADS can identify a state with a single word, instead of a set of words which
is generally needed. Even if the ADS does not exist, using the partial result of Lee and
Yannakakis' algorithm can reduce the size of test suites.
We will now see the construction of this hybrid method. Instead of manipulating
separating families directly, we use a splitting tree. This is a data structure which is
used to construct separating families or adaptive distinguishing sequences.
Definition 23. A splitting tree (for M) is a rooted tree where each node u has
a non-empty set of states l(u) ⊆ M, and
if u is not a leaf, a sequence σ(u) ∈ I*.
We require that if a node u has children C(u) then
the sets of states of the children of u partition l(u), i.e., the set P(u) = {l(v) | v ∈ C(u)}
is a non-trivial partition of l(u), and
32
Chapter 2
the sequence σ(u) witnesses the partition P(u), meaning that for all p, q ∈ P(u) we
have p = q iff λ(s, σ(u)) = λ(t, σ(u)) for all s ∈ p, t ∈ q.
A splitting tree is called complete if all inequivalent states belong to different leaves.
Efficient construction of a splitting tree is described in more detail in Chapter 4. Briefly,
the splitting tree records the execution of a partition refinement algorithm (such as
Moore's or Hopcroft's algorithm). Each non-leaf node encodes a split together with
a witness, which is a separating sequence for its children. From such a tree we can
construct a state identifier for a state by locating the leaf containing that state and
collecting all the sequences you read when traversing to the root.
For adaptive distinguishing sequences an additional requirement is put on the
splitting tree: for each non-leaf node u, the sequence σ(u) defines an injective map
x ↦ (δ(x, σ(u)), λ(x, σ(u))) on the set l(u). Lee and Yannakakis (1994) call such splits
valid. Figure 2.7 shows both valid and invalid splits. Validity precisely ensures that
after performing a split, the states are still distinguishable. Hence, sequences of such
splits can be concatenated.
Figure 2.7 A complete splitting tree with shortest witnesses for the specification of Figure 2.1. Only the splits a, aa, and ac are valid.
The following lemma is a result of Lee and Yannakakis (1994).
Lemma 24. A complete splitting tree with only valid splits exists if and only if there
exists an adaptive distinguishing sequence.
Our method uses the exact same algorithm as the one by Lee and Yannakakis. However, we also apply it in the case when the splitting tree with valid splits is not complete
(and hence no adaptive distinguishing sequence exists). Their algorithm still produces
a family of sets, but it is not necessarily a separating family.
In order to recover separability, we refine that family. Let 𝒵′ be the result of Lee
and Yannakakis' algorithm (to distinguish from their notation, we add a prime) and
let ℋ be a separating family extracted from an ordinary splitting tree. The hybrid ADS
family is defined as 𝒵′; ℋ, and can be computed as sketched in Algorithm 2.1 (the
algorithm works on splitting trees instead of separating families). By Lemma 16 we
note the following: in the best case this family is an adaptive distinguishing sequence;
in the worst case it is equal to ℋ; and in general it is a combination of the two families.
In all cases, the result is a separating family because ℋ is.
Require: A Mealy machine M
Ensure: A separating family 𝒵
T1 ← splitting tree for Moore's minimisation algorithm
T2 ← splitting tree with valid splits (see Lee & Yannakakis, 1994)
𝒵 ← (incomplete) family constructed from T2
for all inequivalent states s, t in the same leaf of T2 do
    u ← lca(T1, s, t)
    Zs ← Zs ∪ {σ(u)}
    Zt ← Zt ∪ {σ(u)}
end for
return 𝒵
Algorithm 2.1 Obtaining the hybrid separating family 𝒵′; ℋ
With the hybrid family we can define the test suite as follows. Its m-completeness is
proven in Section 5.
Definition 25. Let P be a state cover, 𝒵′ be a family of sets constructed with the Lee
and Yannakakis algorithm, and ℋ be a separating family. The hybrid ADS test suite is
Th-ADS = (P ∪ Q) ⋅ I≤k ⊙ (𝒵′; ℋ).
3.1 Example
In Figure 2.8 we see the (unique) result of Lee and Yannakakis' algorithm. We note
that the states s2, s3, s4 are not split, so we need to refine the family for those states.
We take the separating family ℋ from before. From the incomplete ADS in Figure 2.8(b) we obtain the family 𝒵′. These families and the refinement 𝒵′; ℋ are
given below.
Figure 2.8 (a): Largest splitting tree with only valid splits for Figure 2.1. (b): Its incomplete adaptive distinguishing tree.
H0 = {aa, c}        Z′0 = {aa}    (𝒵′; ℋ)0 = {aa}
H1 = {a}            Z′1 = {a}     (𝒵′; ℋ)1 = {a}
H2 = {aa, ac, c}    Z′2 = {aa}    (𝒵′; ℋ)2 = {aa, ac, c}
H3 = {a, c}         Z′3 = {aa}    (𝒵′; ℋ)3 = {aa, c}
H4 = {aa, ac, c}    Z′4 = {aa}    (𝒵′; ℋ)4 = {aa, ac, c}
With the separating family 𝒵′; ℋ we obtain the following test suite of size 96:
Th-ADS = {
aaaaa, aaaac, aaac, aabaa, aacaa, abaa, acaa,
baaaa, baaac, baac, babaa, bacaa, bacc, bbaa, bbac,
bbc, bcaa, caa }
We note that this is indeed smaller than the HSI test suite. In particular, we have a
smaller state identifier for s0: {aa} instead of {aa, c}. As a consequence, there are fewer
combinations of prefixes and suffixes. We also observe that one of the state identifiers
grew in length: {aa, c} instead of {a, c} for state s3 .
3.2 Implementation
All the algorithms concerning the hybrid ADS-method have been implemented and
can be found at https://github.com/Jaxan/hybrid-ads. We note that Algorithm 2.1 is
implemented a bit more efficiently, as we can walk the splitting trees in a particular order. For constructing the splitting trees in the first place, we use Moore's minimisation
algorithm and the algorithms by Lee and Yannakakis (1994). We keep all relevant
sets prefix-closed by maintaining a trie data structure. A trie data structure also allows
us to immediately obtain the set of maximal tests only.
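A trie gives both properties almost for free: inserting a word implicitly inserts all of its prefixes, and the maximal tests are exactly the words ending in a leaf. A minimal sketch of the idea (the hybrid-ads tool itself is written in C++; this is only an illustration):

class Trie:
    def __init__(self):
        self.children = {}

    def insert(self, word):
        # walking the word creates every prefix as an internal node
        node = self
        for a in word:
            node = node.children.setdefault(a, Trie())

    def maximal_tests(self, prefix=""):
        # the leaves correspond exactly to the maximal tests
        if not self.children:
            if prefix:
                yield prefix
        else:
            for a, child in self.children.items():
                yield from child.maximal_tests(prefix + a)

Inserting aa, a and ac, for instance, yields the maximal tests aa and ac only.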
3.3 Randomisation
Many steps in the test suite generation can be randomised. There may exist
many shortest access sequences to a state and we can randomly pick any. Also in the
FSM-based Test Methods
35
construction of state identifiers many steps in the algorithm are non-deterministic: the
algorithm may ask to find any input symbol which separates a set of states. The tool
randomises many such choices. We have noticed that this can have a huge influence
on the size of the test suite. However, a decent statistical investigation is still lacking at the
moment.
In many of the applications such as learning, no bound on the number of states of
the SUT is known. In such cases it is possible to randomly select test cases from an
infinite test suite. Unfortunately, we lose the theoretical guarantees of completeness
with random generation. Still, as we will see in Chapter 3, this can work really well.
We can sample random test cases as follows. In the above definition for the hybrid ADS
test suite we replace I≤k by I* to obtain an infinite test suite. Then we sample tests as
follows (a sketch in code follows the list):
1. sample an element p from P uniformly,
2. sample a word w from I* with a geometric distribution, and
3. sample uniformly from (𝒵′; ℋ)s for the state s = δ(s0, pw).
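In code the three steps look as follows; the continuation probability p of the geometric distribution is a tuning parameter (the value below is an arbitrary choice of ours), and Z_H stands for the hybrid family as a dict from states to sets of suffixes:

import random

def sample_test(m, P, Z_H, alphabet, p=0.9):
    # 1. a uniformly random access sequence from the state cover P
    prefix = random.choice(sorted(P))
    # 2. a random middle part from I* with geometrically distributed length
    infix = ""
    while random.random() < p:
        infix += random.choice(alphabet)
    # 3. a uniformly random suffix identifying the state reached so far
    s = m.delta_star(m.s0, prefix + infix)
    return prefix + infix + random.choice(sorted(Z_H[s]))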
4 Overview
We give an overview of the aforementioned test methods. We classify them in two
directions:
whether they use harmonised state identifiers or not, and
whether they use singleton state identifiers or not.
Theorem 26. Assume M to be minimal, reachable, and of size n. The following test
suites are all n + k-complete:
                   Arbitrary                                Harmonised
Many / pairwise    Wp:   P ⋅ I≤k ⋅ ⋃𝒲 ∪ Q ⋅ I≤k ⊙ 𝒲        HSI:  (P ∪ Q) ⋅ I≤k ⊙ ℋ
Hybrid                                                      Hybrid ADS:  (P ∪ Q) ⋅ I≤k ⊙ (𝒵′; ℋ)
Single / global    UIOv: P ⋅ I≤k ⋅ ⋃𝒰 ∪ Q ⋅ I≤k ⊙ 𝒰        ADS:  (P ∪ Q) ⋅ I≤k ⊙ 𝒵
Proof. See Corollaries 33 and 35.
Each of the methods in the right column can be written more simply as P ⋅ I≤k+1 ⊙ ℋ, since
Q = P ⋅ I. This makes them very easy to implement.
It should be noted that the ADS-method is a specific instance of the HSI-method
and similarly the UIOv-method is an instance of the Wp-method. What is generally
meant by the Wp-method and HSI-method is the above formula together with a
particular way to obtain the (harmonised) state identifiers.
We are often interested in the size of the test suite. In the worst case, all methods
generate a test suite with a size in 𝒪(pn³) and this bound is tight (Vasilevskii, 1973).
Nevertheless we expect intuitively that the right column performs better, as we are
using a more structured set (given a separating family for the HSI-method, we can
always forget about the common prefixes and apply the Wp-method, which will never
be smaller if constructed in this way). Also we expect the bottom row to perform
better as there is a single test for each state. Small-scale experiments confirm this
intuition (Dorofeeva, et al., 2010).
On the example in Figure 2.1, we computed all applicable test suites in Sections 2.6
and 3.1. The UIO and ADS methods are not applicable. For the W, Wp, HSI and
hybrid ADS methods we obtained test suites of size 169, 75, 125 and 96 respectively.
5 Proof of completeness
In this section, we will prove n-completeness of the discussed test methods. Before we
dive into the proof, we give some background on the proof-principle of bisimulation.
The original proofs of completeness often involve an inductive argument (on the
length of words) inlined with arguments about characterisation sets. This can be hard
to follow and so we prefer a proof based on bisimulations, which defers the inductive
argument to a general statement about bisimulation. Many notions of bisimulation
exist in the theory of labelled transition systems, but for Mealy machines there is just
one simple definition. We give the definition and the main proof principle, all of
which can be found in a paper by Rutten (1998).
Definition 27. Let M be a Mealy machine. A relation R ⊆ S × S is called a bisimulation
if for every (s, t) ∈ R we have
equal outputs: λ(s, a) = λ(t, a) for all a ∈ I, and
related successor states: (δ(s, a), δ(t, a)) ∈ R for all a ∈ I.
Lemma 28. If two states s, t are related by a bisimulation, then s ∼ t.13
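Definition 27 can be checked directly. A small sketch that tests whether a given set R of state pairs is a bisimulation on a single machine (for two machines one would first form their disjoint union, in line with Remark 2):

def is_bisimulation(m, R, alphabet):
    # check both clauses of Definition 27 for every pair in R
    return all(
        m.lam[(s, a)] == m.lam[(t, a)]
        and (m.delta[(s, a)], m.delta[(t, a)]) in R
        for (s, t) in R
        for a in alphabet
    )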
We use a slight generalisation of the bisimulation proof technique, called bisimulation
up-to. This allows one to give a smaller set R which extends to a bisimulation. A good
introduction to these up-to techniques is given by Bonchi and Pous (2015) or the thesis
of Rot (2015). In our case we use bisimulation up-to ∼-union. The following lemma
can be found in the given references.
13. The converse (which we do not need here) also holds, as ∼ is a bisimulation.
Definition 29. Let M be a Mealy machine. A relation R ⊆ S × S is called a bisimulation
up-to ∼-union if for every (s, t) ∈ R we have
equal outputs: λ(s, a) = λ(t, a) for all a ∈ I, and
related successor states: (δ(s, a), δ(t, a)) ∈ R or δ(s, a) ∼ δ(t, a) for all a ∈ I.
Lemma 30. Any bisimulation up-to ∼-union is contained in a bisimulation.
We fix a specification M which has a minimal representative with n states and an
implementation M′ with at most n + k states. We assume that all states are reachable
from the initial state in both machines (i.e., both are connected).
The next proposition gives sufficient conditions for a test suite of a certain shape
to be complete. We then prove that these conditions hold for the test suites in this
chapter.
Proposition 31. Let 𝒲 and 𝒲′ be two families of words and P a state cover for M.
Let T = P ⋅ I≤k ⊙ 𝒲 ∪ P ⋅ I≤k+1 ⊙ 𝒲′ be a test suite. If
1. for all x, y ∈ M : x ∼Wx∩Wy y implies x ∼ y,
2. for all x, y ∈ M and z ∈ M′ : x ∼W′x z and z ∼Wy y implies x ∼ y, and
3. the machines M and M′ agree on T,
then M and M′ are equivalent.
Proof. First, we prove that P ⋅ I≤k reaches all states in M′. For p, q ∈ P and x =
δ(s0, p), y = δ(s0, q) such that x ̸∼Wx∩Wy y, we also have δ′(s′0, p) ̸∼Wx∩Wy δ′(s′0, q) in
the implementation M′. By (1) this means that there are at least n different behaviours
in M′, hence at least n states.
Now n states are reached by the previous argument (using the set P). By assumption M′ has at most k extra states. If those extra states are reachable, they are reachable
from an already visited state in at most k steps. So we can reach all states of M′ by
using I≤k after P.
Second, we show that the reached states are bisimilar. Define the relation R =
{(δ(s0, p), δ′(s′0, p)) | p ∈ P ⋅ I≤k}. Note that for each (s, i) ∈ R we have s ∼Ws i. For each
state i ∈ M′ there is a state s ∈ M such that (s, i) ∈ R, since we reach all states in both
machines by P ⋅ I≤k. We will prove that this relation is in fact a bisimulation up-to
∼-union.
For output, we note that (s, i) ∈ R implies λ(s, a) = λ′(i, a) for all a, since the
machines agree on P ⋅ I≤k+1. For the successors, let (s, i) ∈ R and a ∈ I and consider
the successors s2 = δ(s, a) and i2 = δ′(i, a). We know that there is some t ∈ M with
(t, i2) ∈ R. We also know that we tested i2 with the set Wt. So we have:
s2 ∼W′s2 i2 ∼Wt t.
By the second assumption, we conclude that s2 ∼ t. So s2 ∼ t and (t, i2) ∈ R, which
means that R is a bisimulation up-to ∼-union. Moreover, R contains the pair (s0, s′0).
By using Lemmas 30 and 28, we conclude that the initial states s0 and s′0 are equivalent. □
Before we show that the conditions hold for the test methods, we reflect on the above
proof first. This proof is very similar to the completeness proof by Chow (1978).14 In
the first part we argue that all states are visited by using some sort of counting and
reachability argument. Then in the second part we show the actual equivalence. To the
best of the author's knowledge, this is the first m-completeness proof which explicitly uses
the concept of a bisimulation. Using a bisimulation allows us to slightly generalise
and use bisimulation up-to ∼-union, dropping the often-assumed requirement
that M is minimal.
14. In fact, it is also similar to Lemma 4 by Angluin (1987), which proves termination in the L* learning algorithm. This correspondence was noted by Berg, et al. (2005).
Lemma 32. Let 𝒲′ be a family of state identifiers for M. Define the family 𝒲 by
Ws = ⋃𝒲′. Then the conditions (1) and (2) in Proposition 31 are satisfied.
Proof. For the first condition we note that Wx ∩ Wy = Wx = Wy, and so x ∼Wx∩Wy y
implies x ∼Wx y; now by definition of state identifier we get x ∼ y.
For the second condition, let x ∼W′x z ∼Wy y. Then we note that W′x ⊆ Wy and
so we get x ∼W′x z ∼W′x y. By transitivity we get x ∼W′x y and so by definition of state
identifier we get x ∼ y.
Corollary 33. The W, Wp, and UIOv test suites are n + k-complete.
Lemma 34. Let ℋ be a separating family and take 𝒲 = 𝒲′ = ℋ. Then the conditions
(1) and (2) in Proposition 31 are satisfied.
Proof. Let x ∼Hx∩Hy y, then by definition of separating family x ∼ y. For the second
condition, let x ∼Hx z ∼Hy y. Then we get x ∼Hx∩Hy z ∼Hx∩Hy y and so by transitivity
x ∼Hx∩Hy y, hence again x ∼ y.
Corollary 35. The HSI, ADS and hybrid ADS test suites are n + k-complete.
6 Related Work and Discussion
In this chapter, we have mostly considered classical test methods which are all based
on prefixes and state identifiers. There are more recent methods which almost fit in
the same framework. We mention the P (Simão & Petrenko, 2010), H (Dorofeeva,
et al., 2005), and SPY (Simão, et al., 2009) methods. The P method constructs a test
suite by carefully considering sufficient conditions for a p-complete test suite (here
p ≤ n, where n is the number of states). It does not generalise to extra states, but it
seems to construct very small test suites. The H method is a refinement of the HSI-method where state identifiers for testing transitions are reconsidered. (Note that
Proposition 31 allows for a different family when testing transitions.) Last, the SPY
method builds upon the HSI-method and changes the prefixes in order to minimise the
size of a test suite, exploiting overlap in test sequences. We believe that this technique
is independent of the HSI-method and can in fact be applied to all methods presented
in this chapter. As such, the SPY method should be considered as an optimisation
technique, orthogonal to the work in this chapter.
Recently, Hierons and Türker (2015) devised a novel test method which is based
on incomplete distinguishing sequences and is similar to the hybrid ADS method.
They use sequences which can be considered to be adaptive distinguishing sequences
on a subset of the state space. With several of those one can cover the whole state
space, obtaining an m-complete test suite. This is a bit dual to our approach, as our
“incomplete” adaptive distinguishing sequences define a coarse partition of the complete state space. Our method becomes complete by refining the tests with pairwise
separating sequences.
Some work is put into minimising the adaptive distinguishing sequences themselves. Türker and Yenigün (2014) describe greedy algorithms which construct small
adaptive distinguishing sequences. Moreover, they show that finding the minimal
adaptive distinguishing sequence is NP-complete in general, even approximation is
NP-complete. We expect that similar heuristics also exist for the other test methods
and that they will improve the performance. Note that minimal separating sequences
do not guarantee a minimal test suite. In fact, we see that the hybrid ADS method
outperforms the HSI-method on the example in Figure 2.1 since it prefers longer, but
fewer, sequences.
Some of the assumptions made at the start of this chapter have also been challenged. For non-deterministic Mealy machines, we mention the work of Petrenko and
Yevtushenko (2014). We also mention the work of van den Bos, et al. (2017) and
Simão and Petrenko (2014) for input/output transition systems with the ioco relation.
In both cases, the test suites are still defined in the same way as in this chapter: prefixes followed by state identifiers. However, for non-deterministic systems, guiding
an implementation into a state is harder as the implementation may choose its own
path. For that reason, sequences are often replaced by automata, so that the testing
can be adaptive. This adaptive testing is game-theoretic and the automaton provides
a strategy. This game theoretic point of view is further investigated by van den Bos
and Stoelinga (2018). The test suites generally are of exponential size, depending on
how non-deterministic the systems are.
The assumption that the implementation is resettable is also challenged early on.
If the machine has no reliable reset (or the reset is too expensive), one tests the system with a single checking sequence. Lee and Yannakakis (1994) give a randomised
algorithm for constructing such a checking sequence using adaptive distinguishing sequences. There is a similarity with the randomised algorithm by Rivest and Schapire
(1993) for learning non-resettable automata. Recently, Groz, et al. (2018) give a
deterministic learning algorithm for non-resettable machines based on adaptive distinguishing sequences.
Many of the methods described here are benchmarked on small or random Mealy
machines by Dorofeeva, et al. (2010) and Endo and Simão (2013). The benchmarks
are of limited scope; the machine from Chapter 3, for instance, is neither small nor
random. For this reason, we started to collect more realistic benchmarks at
http://automata.cs.ru.nl/.
Chapter 3
Applying Automata Learning to Embedded Control Software
Wouter Smeenk
Océ Technologies B.V.
Joshua Moerman
Radboud University
Frits Vaandrager
Radboud University
David N. Jansen
Radboud University
Abstract
Using an adaptation of state-of-the-art algorithms for black-box automata learning, as implemented in the LearnLib tool, we succeeded in learning a model of the
Engine Status Manager (ESM), a software component that is used in printers
and copiers of Océ. The main challenge that we encountered was that LearnLib, although effective in constructing hypothesis models, was unable to find
counterexamples for some hypotheses. In fact, none of the existing FSM-based
conformance testing methods that we tried worked for this case study. We
therefore implemented an extension of the algorithm of Lee & Yannakakis for
computing an adaptive distinguishing sequence. Even when an adaptive distinguishing sequence does not exist, Lee & Yannakakis' algorithm produces
an adaptive sequence that “almost” identifies states. In combination with a
standard algorithm for computing separating sequences for pairs of states, we
managed to verify states with on average 3 test queries. Altogether, we needed
around 60 million queries to learn a model of the ESM with 77 inputs and 3,410
states. We also constructed a model directly from the ESM software and established equivalence with the learned model. To the best of our knowledge, this is
the first paper in which active automata learning has been applied to industrial
control software.
This chapter is based on the following publication:
Smeenk, W., Moerman, J., Vaandrager, F. W., & Jansen, D. N. (2015). Applying Automata Learning to Embedded Control Software. In Formal Methods and Software
Engineering - 17th International Conference on Formal Engineering Methods, ICFEM, Proceedings. Springer. doi:10.1007/978-3-319-25423-4_5
Once they have high-level models of the behaviour of software components, software
engineers can construct better software in less time. A key problem in practice, however, is the construction of models for existing software components, for which no or
only limited documentation is available.
The construction of models from observations of component behaviour can be
performed using regular inference, also known as automata learning (see Angluin,
1987; de la Higuera, 2010; Steffen, et al., 2011). The most efficient such techniques use
the set-up of active learning, illustrated in Figure 3.1, in which a “learner” has the task
to learn a model of a system by actively asking questions to a “teacher”.
Figure 3.1 Active learning of reactive systems.
The core of the teacher is a System Under Test (SUT), a reactive system to which one can
apply inputs and whose outputs one may observe. The learner interacts with the SUT
to infer a model by sending inputs and observing the resulting outputs (“membership
queries”). In order to find out whether an inferred model is correct, the learner may
pose an “equivalence query”. The teacher uses a model-based testing (MBT) tool to
try and answer such queries: Given a hypothesised model, an MBT tool generates a
long test sequence using some conformance testing method. If the SUT passes this
test, then the teacher informs the learner that the model is deemed correct. If the
outputs of the SUT and the model differ, this constitutes a counterexample, which
is returned to the learner. Based on such a counterexample, the learner may then
construct an improved hypothesis. Hence, the task of the learner is to collect data by
interacting with the teacher and to formulate hypotheses, and the task of the MBT
tool is to establish the validity of these hypotheses. It is important to note that it may
occur that an SUT passes the test for an hypothesis, even though this hypothesis is
not valid.
Triggered by various theoretical and practical results, see for instance the work by
Aarts (2014); Berg, et al. (2005); Cassel, et al. (2015); Howar, et al. (2012); Leucker
(2006); Merten, et al. (2012); Raffelt, et al. (2009), there is a fast-growing interest in
automata learning technology. In recent years, automata learning has been applied
successfully, e.g., to regression testing of telecommunication systems (Hungar, et al.,
2003), checking conformance of communication protocols to a reference implementation (Aarts, et al., 2014), finding bugs in Windows and Linux implementations of TCP
(Fiterău-Broștean, et al., 2014), analysis of botnet command and control protocols
(Cho, et al., 2010), and integration testing (Groz, et al., 2008 and Li, et al., 2006).
In this chapter, we explore whether LearnLib by Raffelt, et al. (2009), a state-of-the-art automata learning tool, is able to learn a model of the Engine Status Manager
(ESM), a piece of control software that is used in many printers and copiers of Océ.
Software components like the ESM can be found in many embedded systems in one
form or another. Being able to retrieve models of such components automatically is
potentially very useful. For instance, if the software is fixed or enriched with new
functionality, one may use a learned model for regression testing. Also, if the source
code of software is hard to read and poorly documented, one may use a model of the
software for model-based testing of a new implementation, or even for generating an
implementation on a new platform automatically. Using a model checker one may
also study the interaction of the software with other components for which models
are available.
The ESM software is actually well documented, and an extensive test suite exists.
The ESM, which has been implemented using Rational Rose Real-Time (RRRT), is
stable and has been in use for 10 years. Due to these characteristics, the ESM is an
excellent benchmark for assessing the performance of automata learning tools in
this area. The ESM has also been studied in other research projects: Ploeger (2005)
modelled the ESM and other related managers and verified properties based on the
official specifications of the ESM, and Graaf and van Deursen (2007) have checked
the consistency of the behavioural specifications defined in the ESM against the RRRT
definitions.
Learning a model of the ESM turned out to be more complicated than expected.
The top level UML/RRRT statechart from which the software is generated only has
16 states. However, each of these states contains nested states, and in total there are
70 states that do not have further nested states. Moreover, the C++ code contained
in the actions of the transitions also creates some complexity, and this explains why
the minimal Mealy machine that models the ESM has 3,410 states. LearnLib has been
used to learn models with tens of thousands of states by Raffelt, et al. (2009), and
therefore we expected that it would be easy to learn a model for the ESM. However,
finding counterexamples for incorrect hypotheses turned out to be challenging due
to the large number of 77 inputs. The test algorithms implemented in LearnLib, such
as random testing, the W-method by Chow (1978) and Vasilevskii (1973) and the
Wp-method by Fujiwara, et al. (1991), failed to deliver counterexamples within an
acceptable time. Automata learning techniques have been successfully applied to
case studies in which the total number of input symbols is much larger, but in these
cases it was possible to reduce the number of inputs to a small number (less than 10)
using abstraction techniques (Aarts, et al., 2015 and Howar, et al., 2011). In the case of
ESM, use of abstraction techniques only allowed us to reduce the original 156 concrete
actions to 77 abstract actions.
We therefore implemented an extension of an algorithm of Lee and Yannakakis
(1994) for computing adaptive distinguishing sequences. Even when an adaptive
distinguishing sequence does not exist, Lee & Yannakakis algorithm produces an
adaptive sequence that “almost” identifies states. In combination with a standard
algorithm for computing separating sequences for pairs of states, we managed to verify
states with on average 3 test queries and to learn a model of the ESM with 77 inputs
and 3,410 states. We also constructed a model directly from the ESM software and
established equivalence with the learned model. To the best of our knowledge, this is
the first paper in which active automata learning has been applied to industrial control
software. Preliminary evidence suggests that our adaptation of Lee & Yannakakis'
algorithm outperforms existing FSM-based conformance algorithms.
During recent years most researchers working on active automata learning focused
their efforts on efficient algorithms and tools for the construction of hypothesis models.
Our work shows that if we want to further scale automata learning to industrial
applications, we also need better algorithms for finding counterexamples for incorrect
hypotheses. Following Berg, et al. (2005), our work shows that the context of automata
learning provides both new challenges and new opportunities for the application of
testing algorithms. All the models for the ESM case study together with the learning
and testing statistics are available at http://www.mbsd.cs.ru.nl/publications/papers/fvaan/ESM/,
as a benchmark for both the automata learning and testing communities.
It is now also included in the automata wiki at http://automata.cs.ru.nl/.
1 Engine Status Manager
The focus of this article is the Engine Status Manager (ESM), a software component
that is used to manage the status of the engine of Océ printers and copiers. In this
section, the overall structure and context of the ESM will be explained.
1.1 ESRA
The requirements and behaviour of the ESM are defined in a software architecture
called Embedded Software Reference Architecture (ESRA). The components defined
in this architecture are reused in many of the products developed by Océ and form an
important part of these products. This architecture is developed for cut-sheet printers
or copiers. The term cut-sheet refers to the use of separate sheets of paper as opposed
to a continuous feed of paper.
An engine refers to the printing or scanning part of a printer or copier. Other
products can be connected to an engine that pre- or postprocess the paper, for example
a cutter, folder, stacker or stapler.
Figure 3.2 Global overview of the engine software.
Figure 3.2 gives an overview of the software in a printer or copier. The controller
communicates the required actions to the engine software. This includes transport
of digital images, status control, print or scan actions and error handling. The controller is responsible for queuing, processing the actions received from the network
and operators and delegating the appropriate actions to the engine software. The
managers communicate with the controller using the external interface adapters. These
adapters translate the external protocols to internal protocols. The managers manage
the different functions of the engine. They are divided by the different functionalities
such as status control, print or scan actions or error handling they implement. In
order to do this, a manager may communicate with other managers and functions. A
function is responsible for a specific set of hardware components. It translates commands from the managers to the function hardware and reports the status and other
information of the function hardware to the managers. This hardware can for example
be the printing hardware or hardware that is not part of the engine hardware such as
a stapler. Other functionalities such as logging and debugging are orthogonal to the
functions and managers.
1.2 ESM and connected components
The ESM is responsible for the transition from one status of the printer or copier to
another. It coordinates the functions to bring them in the correct status. Moreover, it
informs all its connected clients (managers or the controller) of status changes. Finally,
it handles status transitions when an error occurs.
Figure 3.3 shows the different components to which the ESM is connected. The Error Handling Manager (EHM), Action Control Manager (ACM) and other clients request
engine statuses. The ESM decides whether a request can be honoured immediately, has
to be postponed, or must be ignored. If the requested action is processed, the ESM requests the
functions to go to the appropriate status. The EHM has the highest priority and its
requests are processed first. The EHM can request the engine to go into the defect
status. The ACM has the next highest priority. The ACM requests the engine to switch
between running and standby status. The other clients request transitions between the
other statuses, such as idle, sleep, standby and low power. All the other clients have
the same lowest priority. The Top Capsule instantiates the ESM and communicates
with it during the initialisation of the ESM. The Information Manager provides some
parameters during the initialisation.
Figure 3.3 Overview of the managers and clients connected to the ESM.
There are more managers connected to the ESM but they are of less importance
and are thus not mentioned here.
1.3 Rational Rose RealTime
The ESM has been implemented using Rational Rose RealTime (RRRT). In this tool
so-called capsules can be created. Each of these capsules defines a hierarchical statechart
diagram. Capsules can be connected with each other using structure diagrams. Each
capsule contains a number of ports that can be connected to ports of other capsules by
adding connections in the associated structure diagram. Each of these ports specifies
which protocol should be used. This protocol defines which messages may be sent
to and from the port. Transitions in the statechart diagram of the capsule can be
triggered by arriving messages on a port of the capsule. Messages can be sent to
these ports using the action code of the transition. The transitions between the states,
actions and guards are defined in C++ code. From the state diagram, C++ source
files are generated.
The RRRT language and semantics are based on UML (Object Management Group
(OMG), 2004) and ROOM (Selic, et al., 1994). One important concept used in RRRT
is the run-to-completion execution model (Eshuis, et al., 2002). This means that when
a received message is processed, the execution cannot be interrupted by other arriving
messages. These messages are placed in a queue to be processed later.
1.4 The ESM state diagram
Figure 3.4 Top states and transitions of the ESM.
Figure 3.4 shows the top states of the ESM statechart. The statuses that can be requested by the clients and managers correspond to grey states. The other states are
so-called transitory states. In transitory states the ESM is waiting for the functions to
report that they have moved to the corresponding status. Once all functions have
reported, the ESM moves to the corresponding status.
The idle status indicates that the engine has started up but that it is still cold
(uncontrolled temperature). The standby status indicates that the engine is warm
and ready for printing or scanning. The running status indicates that the engine is
printing or scanning. The transitions from the overarching state to the goingToSleep
and goingToDefect states indicate that it is possible to move to the sleep or defect
status from any state. In some cases it is possible to awake from sleep status, in other
cases the main power is turned off. The medium status is designed for diagnostics. In
this status the functions can each be in a different status. For example one function is
in standby status while another function is in idle status.
The statechart diagram in Figure 3.4 may seem simple, but it hides many details.
Each of the states has up to 5 nested states. In total there are 70 states that do not
have further nested states. The C++ code contained in the actions of the transitions
is in some cases non-trivial. The possibility to transition from any state to the sleep or
defect state also complicates the learning.
2 Learning the ESM
In order to learn a model of the ESM, we connected it to LearnLib by Merten, et al.
(2011), a state-of-the-art tool for learning Mealy machines developed at the University
of Dortmund. A Mealy machine is a tuple M = (I, O, Q, q0 , δ, λ), where
I is a finite set of input symbols,
O is a finite set of output symbols,
Q is a finite set of states,
q0 ∈ Q is an initial state,
δ : Q × I → Q is a transition function, and
λ : Q × I → O is an output function.
The behaviour of a Mealy machine is deterministic, in the sense that the outputs are
fully determined by the inputs. Functions δ and λ are extended to accept sequences
in the standard way. We say that Mealy machines M = (I, O, Q, q0, δ, λ) and M′ =
(I′, O′, Q′, q′0, δ′, λ′) are equivalent if they generate an identical sequence of outputs for
every sequence of inputs, that is, if I = I′ and, for all w ∈ I*, λ(q0, w) = λ′(q′0, w). If
the behaviour of an SUT is described by a Mealy machine M, then the task of LearnLib
is to learn a Mealy machine M′ that is equivalent to M.
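To make this concrete, the following small Python sketch (an illustration with made-up names, not LearnLib code) represents a Mealy machine as two dictionaries and extends the output function to input sequences; equivalence can then be tested, up to a bounded word length, by comparing outputs.

    from itertools import product

    class Mealy:
        def __init__(self, q0, delta, lam):
            self.q0 = q0         # initial state
            self.delta = delta   # dict: (state, input) -> next state
            self.lam = lam       # dict: (state, input) -> output

        def outputs(self, word):
            """Extended output function: output sequence for an input word."""
            q, out = self.q0, []
            for a in word:
                out.append(self.lam[(q, a)])
                q = self.delta[(q, a)]
            return out

    def agree_up_to(m1, m2, alphabet, k):
        """Bounded equivalence check: compare outputs on all words of length
        at most k. This is a finite test, not a proof of equivalence."""
        return all(m1.outputs(w) == m2.outputs(w)
                   for n in range(k + 1) for w in product(alphabet, repeat=n))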
2.1 Experimental set-up
A clear interface to the ESM has been defined in RRRT. The ESM defines ports from
which it receives a predefined set of inputs and to which it can send a predefined
set of outputs. However, this interface can only be used within RRRT. In order to
communicate with the LearnLib software a TCP connection was set up. An extra
capsule was created in RRRT which connects to the ports defined by the ESM. This
capsule opened a TCP connection to LearnLib. Inputs and outputs are translated to
and from a string format and sent over the connection. Before each membership query,
the learner needs to bring the SUT back to its initial state. In other words, LearnLib
needs a way to reset the SUT.
Some inputs and outputs sent to and from the ESM carry parameters. These parameters are enumerations of statuses, or integers bounded by the number of functions
connected to the ESM. Currently, LearnLib cannot handle inputs with parameters;
therefore, we introduced a separate input action for every parameter value. Based on
domain knowledge and discussions with the Océ engineers, we could group some of
these inputs together and reduce the total number of inputs. When learning the ESM
using one function, 83 concrete inputs are grouped into four abstract inputs. When
using two functions, 126 concrete inputs can be grouped. When an abstract input
needs to be sent to the ESM, one concrete input of the represented group is randomly
selected, as in the approach of Aarts, et al. (2015). This is a valid abstraction because
all the inputs in the group have exactly the same behaviour in any state of the ESM.
This has been verified by doing code inspection. No other abstractions were found
during the research. After the inputs are grouped a total of 77 inputs remain when
learning the ESM using 1 function, and 105 inputs remain when using 2 functions.
It was not immediately obvious how to model the ESM by a Mealy machine, since
some inputs trigger no output, whereas other inputs trigger several outputs. In order
to resolve this, we benefited from the run-to-completion execution model used in
RRRT. Whenever an input is sent, all the outputs are collected until quiescence is
detected. Next, all the outputs are concatenated and are sent to LearnLib as a single
aggregated output. In model-based testing, quiescence is usually detected by waiting
for a fixed time-out period. However, this causes the system to be mostly idle while
waiting for the time-out, which is inefficient. In order to detect quiescence faster, we
exploited the run-to-completion execution model used by RRRT: we modified the ESM
to respond to a new low-priority test input with a (single) special output. This test
input is sent after each normal input. Only after the normal input is processed and
all the generated outputs have been sent, the test input is processed and the special
output is generated; upon its reception, quiescence can be detected immediately and
reliably.
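The resulting query loop can be sketched as follows; conn, send, receive and the two special symbols are hypothetical stand-ins for the TCP adapter described above, not the actual interface.

    SYNC_INPUT = "test"     # hypothetical name for the low-priority test input
    SYNC_OUTPUT = "sync"    # hypothetical name for the special output

    def apply_input(conn, symbol):
        """Send one input followed by the test input and collect all outputs
        produced before quiescence into one aggregated output."""
        conn.send(symbol)
        conn.send(SYNC_INPUT)
        outputs = []
        while True:
            o = conn.receive()        # blocking read of the next output
            if o == SYNC_OUTPUT:      # run-to-completion: this arrives last
                return "+".join(outputs) if outputs else "quiescence"
            outputs.append(o)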
2.2 Test selection strategies
In the ESM case study the most challenging problem was finding counterexamples
for the hypotheses constructed during learning.
LearnLib implements several algorithms for conformance testing, one of which is
a random walk algorithm. The random walk algorithm works by first selecting the
length of the test query according to a geometric distribution, cut off at a fixed upper
bound. Each of the input symbols in the test query is then randomly selected from
the input alphabet I from a uniform distribution. In order to find counterexamples,
a specific sequence of input symbols is needed to arrive at the state in the SUT that
differentiates it from the hypothesis. The upper bound for the size of this search
space is |I|ⁿ, where |I| is the size of the input alphabet used, and n the length of the
counterexample that needs to be found. If this sequence is long the chance of finding
it is small. Because the ESM has many different input symbols to choose from, finding
the correct one is hard. When learning the ESM with 1 function there are 77 possible
input symbols. If for example the length of the counterexample needs to be at least 6
inputs to identify a certain state, then the upper bound on the number of test queries
would be around 2 × 10¹¹. An average test query takes around 1 ms, so it would take
about 7 years to execute these test queries.
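This back-of-the-envelope estimate is easily reproduced; the constants below are the ones quoted above.

    inputs = 77                  # input symbols when learning with 1 function
    length = 6                   # length of the counterexample to be found
    queries = inputs ** length   # upper bound on the number of test queries
    seconds = queries * 0.001    # at roughly 1 ms per test query
    print(queries, seconds / (3600 * 24 * 365))  # ~2.1e11 queries, ~6.6 years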
Augmented DS-method (later called the hybrid ADS-method). In order to reduce the
number of tests, Chow (1978) and Vasilevskii (1973) pioneered the so-called W-method.
In their framework a test query
consists of a prefix p bringing the SUT to a specific state, a (random) middle part
m and a suffix s assuring that the SUT is in the appropriate state. This results in a
test suite of the form PI≤k W, where P is a set of (shortest) access sequences, I≤k the
set of all sequences of length at most k, and W is a characterisation set. Classically,
this characterisation set is constructed by taking the set of all (pairwise) separating
sequences. For k = 1 this test suite is complete in the sense that if the SUT passes all
tests, then either the SUT is equivalent to the specification or the SUT has strictly more
states than the specification. By increasing k we can check additional states.
We tried using the W-method as implemented by LearnLib to find counterexamples. The generated test suite, however, was still too big in our learning context.
Fujiwara, et al. (1991) observed that it is possible to let the set W depend on the state
the SUT is supposed to be in. This allows us to take only the subset of W that is relevant
for a specific state. This slightly reduces the test suite without losing the power of
the full test suite. This method is known as the Wp-method. More importantly, this
observation allows for generalisations where we can carefully pick the suffixes.
In the presence of an (adaptive) distinguishing sequence one can take W to be
a single suffix, greatly reducing the test suite. Lee and Yannakakis (1994) describe
an algorithm (which we will refer to as the LY algorithm) to efficiently construct
this sequence, if it exists. In our case, unfortunately, most hypotheses did not admit
an adaptive distinguishing sequence. In these cases the incomplete result
from the LY algorithm still contained a lot of information which we augmented by
pairwise separating sequences.
Figure 3.5 A small part of an incomplete distinguishing sequence as produced by the LY algorithm. Leaves contain a set of possible initial states, inner nodes have input sequences and edges correspond to different output symbols (of which we only drew some), where Q stands for quiescence.
As an example we show an incomplete adaptive distinguishing sequence for one of
the hypotheses in Figure 3.5. When we apply the input sequence I46 I6.0 I10 I19 I31.0
I37.3 I9.2 and observe outputs O9 O3.3 Q … O28.0, we know for sure that the SUT was
in state 788. Unfortunately, not all paths lead to a singleton set. When for instance
we apply the sequence I46 I6.0 I10 and observe the outputs O9 O3.14 Q, we know for
sure that the SUT was in one of the states 18, 133, 1287 or 1295. In these cases we have
to perform more experiments and we resort to pairwise separating sequences.
We note that this augmented DS-method is in the worst case not any better than
the classical Wp-method. In our case, however, it greatly reduced the test suites.
Once we have our set of suffixes, which we call Z now, our test algorithm works
as follows. The algorithm first exhausts the set P I≤1 Z. If this does not provide a
counterexample, we will randomly pick test queries from P I² I* Z, where the algorithm samples uniformly from P, I² and Z (if Z contains more than 1 sequence for the
supposed state) and with a geometric distribution on I*.
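A sketch of this randomised phase is given below. The helper suffixes_for is hypothetical: it should return the suffixes in Z for the state the hypothesis is supposed to be in after the given word.

    import random

    def random_test_query(P, I, suffixes_for, stop_prob=0.05):
        """Sample one test query from P I² I* Z: a uniform access sequence,
        two uniform inputs, a geometric-length random middle part, and a
        uniformly chosen suffix for the supposed state."""
        prefix = tuple(random.choice(P))
        middle = [random.choice(I), random.choice(I)]
        while random.random() > stop_prob:    # geometric distribution on I*
            middle.append(random.choice(I))
        word = prefix + tuple(middle)
        return word + tuple(random.choice(suffixes_for(word)))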
Sub-alphabet selection. Using the above method the algorithm still failed to learn
the ESM. By looking at the RRRT-based model we were able to see why the algorithm
failed to learn. In the initialisation phase, the controller gives exceptional behaviour
when providing a certain input eight times consecutively. Of course such a sequence
is hard to find with the above testing method. With this knowledge we could construct
a single counterexample by hand, by means of which the algorithm was able to learn the
ESM.
In order to automate this process, we defined a sub-alphabet of actions that are
important during the initialisation phase of the controller. This sub-alphabet will be
used a bit more often than the full alphabet. We do this as follows. We start testing with
the alphabet which provided a counterexample for the previous hypothesis (for the
first hypothesis we take the sub-alphabet). If no counterexample can be found within
a specified query bound, then we repeat with the next alphabet. If both alphabets
do not produce a counterexample within the bound, the bound is increased by some
factor and we repeat the process. This method only marginally increases the number of tests.
But it did find the right counterexample we first had to construct by hand.
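The alternation scheme can be sketched as follows, where run_tests is a hypothetical routine that tests one hypothesis with a given alphabet and query bound, returning a counterexample or None.

    def find_counterexample(first, second, run_tests, bound, factor=10):
        """Alternate between two alphabets, starting with the one that found
        the previous counterexample (or the sub-alphabet initially); if both
        fail within the bound, increase the bound and repeat. A real
        implementation should also enforce a global testing budget."""
        while True:
            for alphabet in (first, second):
                cex = run_tests(alphabet, bound)
                if cex is not None:
                    return cex
            bound *= factor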
2.3 Results
Using the learning set-up discussed in Section 2.1 and the test selection strategies
discussed in Section 2.2, a model of the ESM using 1 function could be learned. After
an additional eight hours of testing no counterexample was found and the experiment
was stopped. The following list gives the most important statistics gathered during
the learning:
The learned model has 3,410 states.
Altogether, 114 hypotheses were generated.
The time needed for learning the final hypothesis was 8 h, 26 min, and 19 s.
29,933,643 membership queries were posed (on average 35.77 inputs per query).
30,629,711 test queries were required (on average 29.06 inputs per query).
3 Verification
To verify the correctness of the model that was learned using LearnLib, we checked
its equivalence with a model that was generated directly from the code.
3.1 Approach
As mentioned already, the ESM has been implemented using Rational Rose RealTime
(RRRT). Thus a statechart representation of the ESM is available. However, we have
not been able to find a tool that translates RRRT models to Mealy machines, allowing us
to compare the RRRT-based model of the ESM with the learned model. We considered
several formalisms and tools that were proposed in the literature to flatten statecharts
to state machines. The first one was a tool for hierarchical timed automata (HTA) by
David, et al. (2002). However, we found it hard to translate the output of this tool, a
network of Uppaal timed automata, to a Mealy machine that could be compared to the
learned model. The second tool that we considered has been developed by Hansen, et
al. (2010). This tool misses some essential features, for example the ability to assign
new values to state variables on transitions. Finally, we considered a formalism called
object-oriented action systems (OOAS) by Krenn, et al. (2009), but no tools supporting
this formalism could be found.
In the end we decided to implement the required model transformations ourselves.
Figure 3.6 displays the different formats for representing models that we used and
the transformations between those formats.
Figure 3.6 Formats for representing models and transformations between formats.
We used the bisimulation checker of CADP by Garavel, et al. (2011) to check the
equivalence of labelled transition system models in .aut format. The Mealy machine
models learned by LearnLib are represented as .dot files. A small script converts
these Mealy machines to labelled transition systems in .aut format. We used the
Uppaal tool by Behrmann, et al. (2006) as an editor for defining extended finite state
machines (EFSM), represented as .xml files. A script developed in the ITALIA project
(http://www.italia.cs.ru.nl/) converts these EFSM models to LOTOS, and then
CADP takes care of the conversion from LOTOS to the .aut format.
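As an illustration of the .dot-to-.aut step, a conversion could look as follows; the edge pattern is an assumption about the generated .dot files, and we assume the initial state is the first one encountered.

    import re

    def dot_to_aut(dot_lines):
        """Turn Mealy edges such as s0 -> s1 [label="in/out"] into an .aut
        file: a header des (initial, #transitions, #states) followed by one
        (source, "label", target) line per transition."""
        edge = re.compile(r'(\w+)\s*->\s*(\w+)\s*\[label="([^"]*)"\]')
        numbers, transitions = {}, []
        def num(state):
            return numbers.setdefault(state, len(numbers))
        for line in dot_lines:
            m = edge.search(line)
            if m:
                src, tgt, label = m.groups()
                transitions.append((num(src), label, num(tgt)))
        header = 'des (0, %d, %d)' % (len(transitions), len(numbers))
        return [header] + ['(%d, "%s", %d)' % t for t in transitions]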
The Uppaal syntax is not sufficiently expressive to directly encode the RRRT definition of the ESM, since this definition makes heavy use of UML (Object Management
Group (OMG), 2004) concepts such as state hierarchy and transitions from composite
states, concepts which are not present in Uppaal. Using Uppaal would force us to
duplicate many transitions and states.
We decided to manually create an intermediate hierarchical EFSM (HEFSM) model
using the UML drawing tool PapyrusUML (Lanusse, et al., 2009). The HEFSM model
closely resembles the RRRT UML model, but many elements used in UML state machines are left out, because they are not needed for modelling the ESM and would complicate
the transformation process.
3.2 Model transformations
We explain the transformation from the HEFSM model to the EFSM model using
examples. The transformation is divided into five steps, which are executed in order:
1. combine transitions without input or output signal,
2. transform supertransitions,
3. transform internal transitions,
4. add input signals that do not generate an output, and
5. replace invocations of the next function.
1. Empty transitions. In order to make the model more readable, and to make it
easy to model if and switch statements in the C++ code, the HEFSM model allows
for transitions without a signal. These transitions are called empty transitions. An
empty transition can still contain a guard and an assignment. However, these kinds of
transitions are only allowed on states whose outgoing transitions are all empty.
This was done to make the transformation easy and the model easy to read.
In order to transform a state with empty transitions all the incoming and outgoing
transitions are collected. For each combination of incoming transition a and outgoing
transition b a new transition c is created with the source of a as source and the target of
b as target. The guard for transition c evaluates to true if and only if the guard of a and
b both evaluate to true. The assignment of c is the concatenation of the assignment of
a and b. The signal of c will be the signal of a because b cannot have a signal. Once
all the new transitions are created all the states with empty transitions are removed
together with all their incoming and outgoing transitions.
Figure 3.7 shows an example model with empty transitions and its transformed
version. Each of the incoming transitions from the state B is combined with each of
the outgoing transitions. This results in two new transitions. The old transitions
and state B are removed.
Figure 3.7 Example of empty transition transformation. On the left the original version. On the right the transformed version.
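In code, the combination step can be sketched as follows, with transitions encoded as tuples and guards and assignments kept as C++ code strings (an illustrative encoding, not the actual tool).

    def eliminate_empty_state(incoming, outgoing):
        """Combine each incoming transition a with each empty outgoing
        transition b of a state whose outgoing transitions are all empty.
        A transition is (source, signal, guard, assignment, target);
        empty transitions have signal None."""
        combined = []
        for (src, sig, g_a, asg_a, _) in incoming:
            for (_, _none, g_b, asg_b, tgt) in outgoing:
                guard = "(%s) && (%s)" % (g_a, g_b)   # true iff both hold
                assignment = asg_a + asg_b            # concatenated action code
                combined.append((src, sig, guard, assignment, tgt))
        return combined   # the state and its old transitions are then removed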
2. Supertransitions. The RRRT model of the ESM contains many transitions originating from a composite state. Informally, these supertransitions can be taken in each
of the substates of the composite state if the guard evaluates to true. In order to model
the ESM as closely as possible, supertransitions are also supported in the HEFSM
model.
In RRRT transitions are evaluated from bottom to top. This means that first the
transitions from the leaf state are considered, then transitions from its parent state
and then from its parent's parent state, etc. Once a transition for which the guard
evaluates to true and the correct signal has been found it is taken. When flattening the
statechart, we modified the guards of supertransitions to ensure the correct priorities.
Figure 3.8 Example of supertransition transformation. On the left the original version. On the right the transformed version.
Figure 3.8 shows an example model with supertransitions and its transformed version.
The supertransition from state A can be taken at each of A's leaf states B and C. The
transformation removes the original supertransition and creates a new transition at
states B and C using the same target state. For leaf state C this is easy because it does not
contain a transition with the input signal IP. In state B the transition to state C would be
taken if a signal IP was processed and the state variable a equals 1. The supertransition
can only be taken if the other transition cannot be taken. This is why the negation of
the other guard is added to the new transition. If the original supertransition is an
internal transition the model needs further transformation after this transformation.
This is described in the next paragraph. If the original supertransition is not an internal
transition the new transitions will have the initial state of A as target.
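Under the same illustrative encoding, the flattening of a (non-internal) supertransition can be sketched as follows; transitions_of is a hypothetical lookup of a leaf's own transitions.

    def flatten_supertransition(leaves, signal, guard, target, transitions_of):
        """Copy a supertransition down to every leaf substate of the composite
        state, conjoining the negated guards of the leaf's own transitions for
        the same signal, so that RRRT's bottom-to-top priority is preserved."""
        for leaf in leaves:
            negations = ["!(%s)" % g
                         for (sig, g, _) in transitions_of(leaf) if sig == signal]
            new_guard = " && ".join(["(%s)" % guard] + negations)
            yield (leaf, signal, new_guard, target)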
3. Internal transitions. The ESM model also makes use of internal transitions in
RRRT. Using such a transition the current state does not change. If such a transition
is defined on a composite state it can be taken from all of the substates and return to
the same leaf state it originated from. If defined on a composite state it is thus also
a supertransition. This is also possible in the HEFSM model. In order to transform
an internal transition it is first seen as a supertransition and the above transformation
is applied. Then the target of the transition is simply set to the leaf state it originates
from. An example can be seen in Figure 3.8. If the supertransition from state A is also
defined to be an internal transition the transformed version on the right would need
another transformation. The new transitions that now have the target state A would
be transformed to have the same target state as their current source state.
4. Quiescent transitions. In order to reduce the number of transitions in the HEFSM
model quiescent transitions are added automatically. For every state all the transitions
for each signal are collected in a set T. A new self transition a is added for each signal.
The guard for transition a evaluates to true if and only if none of the guards of the
transitions in T evaluates to true. This makes the HEFSM input-enabled without
having to specify all the transitions.
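This completion step is equally mechanical; a sketch in the same style:

    def quiescent_transitions(states, signals, transitions_of):
        """For every state and signal, add a self-loop whose guard holds
        exactly when none of the existing guards for that signal holds,
        making the HEFSM input-enabled."""
        for state in states:
            for signal in signals:
                guards = [g for (sig, g, _) in transitions_of(state)
                          if sig == signal]
                guard = (" && ".join("!(%s)" % g for g in guards)
                         if guards else "true")
                yield (state, signal, guard, state)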
5. The next function. In RRRT it is possible to write the guard and assignment in
C++ code. It is thus possible that the value of a variable changes while an input signal
is processed. In the HEFSM however all the assignments only take effect after the
input signal is processed. In order to simulate this behaviour the next function is used.
This function takes a variable name and evaluates to the value of this variable after
the transition.
3.3 Results
Figure 3.9 shows a visualisation of the learned model that was generated using Gephi
(Bastian, et al., 2009). States are coloured according to the strongly connected components. The number of transitions between two states is represented by the thickness of
the edge. The large number of states (3,410) and transitions (262,570) makes it hard
to visualise this model. Nevertheless, the visualisation does provide insight into the
behaviour of the ESM. The three protrusions at the bottom of Figure 3.9 correspond to
deadlocks in the model. These deadlocks are “error” states that are present in the ESM
by design. According to the Océ engineers, the sequences of inputs that are needed to
drive the ESM into these deadlock states will always be followed by a system power
reset. The protrusion at the top right of the figure corresponds to the initialisation
phase of the ESM. This phase is performed only once and thus only transitions from
the initialisation cluster to the main body of states are present.
Figure 3.9 Final model of the ESM.
During the construction of the RRRT-based model, the ESM code was thoroughly
inspected. This resulted in the discovery of missing behaviour in one transition of the
ESM code. An Océ software engineer confirmed that this behaviour is a (minor) bug,
which will be fixed. We have verified the equivalence of the learned model and the
RRRT-based model by using CADP (Garavel, et al., 2011).
4 Conclusions and Future Work
Using an extension of the algorithm by Lee and Yannakakis (1994) for adaptive distinguishing sequences, we succeeded in learning a Mealy machine model of a piece of
widely used industrial control software. Our extension of the Lee & Yannakakis algorithm is rather obvious, but nevertheless appears to be new. Preliminary evidence
suggests that it outperforms existing conformance testing algorithms. We are currently performing experiments in which we compare the new algorithm with other
test algorithms on a number of realistic benchmarks.
There are several possibilities for extending the ESM case study. To begin with,
one could try to learn a model of the ESM with more than one function. Another
interesting possibility would be to learn models of the EHM, ACM, and other managers
connected to the ESM. Using these models some of the properties discussed by Ploeger
(2005) could be verified at a more detailed level. We expect that the combination
of LearnLib with the extended Lee & Yannakakis algorithm can be applied to learn
models of many other software components.
In the specific case study described in this article, we know that our learning
algorithm has succeeded in learning the correct model, since we established equivalence
with a reference model that was constructed independently from the RRRT model of
the ESM software. In the absence of a reference model, we can never guarantee that
the actual system behaviour conforms to a learned model. In order to deal with this
problem, it is important to define metrics that quantify the difference (or distance)
between a hypothesis and a correct model of the SUT and to develop test generation
algorithms that guarantee an upper bound on this difference. Preliminary work in
this area is reported by Smetsers, et al. (2014).
Acknowledgements
We thank Lou Somers for suggesting the ESM case study and for his support of our
research. Fides Aarts and Harco Kuppens helped us with the use of LearnLib and
CADP, and Jan Tretmans gave useful feedback.
Chapter 4
Minimal Separating Sequences
for All Pairs of States
Rick Smetsers
Radboud University
Joshua Moerman
Radboud University
David N. Jansen
Radboud University
Abstract
Finding minimal separating sequences for all pairs of inequivalent states
in a finite state machine is a classic problem in automata theory. Sets of
minimal separating sequences, for instance, play a central role in many
conformance testing methods. Moore has already outlined a partition
refinement algorithm that constructs such a set of sequences in 𝒪(mn)
time, where m is the number of transitions and n is the number of
states. In this chapter, we present an improved algorithm based on the
minimisation algorithm of Hopcroft that runs in 𝒪(m log n) time. The
efficiency of our algorithm is empirically verified and compared to the
traditional algorithm.
This chapter is based on the following publication:
Smetsers, R., Moerman, J., & Jansen, D. N. (2016). Minimal Separating Sequences for
All Pairs of States. In Language and Automata Theory and Applications - 10th International
Conference, LATA, Proceedings. Springer. doi:10.1007/978-3-319-30000-9_14
In diverse areas of computer science and engineering, systems can be modelled by finite
state machines (FSMs). One of the cornerstones of automata theory is minimisation of
such machines, and its many variations. In this process one obtains an equivalent
minimal FSM, where states are different if and only if they have different behaviour.
The first to develop an algorithm for minimisation was Moore (1956). His algorithm
has a time complexity of 𝒪(mn), where m is the number of transitions, and n is
the number of states of the FSM. Later, Hopcroft (1971) improved this bound to
𝒪(m log n).
Minimisation algorithms can be used as a framework for deriving a set of separating sequences that show why states are inequivalent. The separating sequences in
Moore's framework are of minimal length (Gill, 1962). Obtaining minimal separating
sequences in Hopcroft's framework, however, is a non-trivial task. In this chapter, we
present an algorithm for finding such minimal separating sequences for all pairs of
inequivalent states of an FSM in 𝒪(m log n) time.
Coincidentally, Bonchi and Pous (2013) recently introduced a new algorithm for
the equally fundamental problem of proving equivalence of states in non-deterministic
automata. As both their and our work demonstrate, even classical problems in automata theory can still offer surprising research opportunities. Moreover, new ideas
for well-studied problems may lead to algorithmic improvements that are of practical
importance in a variety of applications.
One such application for our work is in conformance testing. Here, the goal is
to test if a black box implementation of a system is functioning as described by a
given FSM. It consists of applying sequences of inputs to the implementation, and
comparing the output of the system to the output prescribed by the FSM. Minimal
separating sequences are used in many test generation methods (Dorofeeva, et al.,
2010). Therefore, our algorithm can be used to improve these methods.
1 Preliminaries
We define an FSM as a Mealy machine M = (I, O, S, δ, λ), where I, O and S are finite
sets of inputs, outputs and states respectively, δ : S × I → S is a transition function and
λ : S × I → O is an output function. The functions δ and λ are naturally extended to
δ : S × I* → S and λ : S × I* → O*. Moreover, given a set of states S′ ⊆ S and a sequence
x ∈ I*, we define δ(S′, x) = {δ(s, x) | s ∈ S′} and λ(S′, x) = {λ(s, x) | s ∈ S′}. The inverse
transition function δ⁻¹ : S × I → 𝒫(S) is defined as δ⁻¹(s, a) = {t ∈ S | δ(t, a) = s}.
Observe that Mealy machines are deterministic and input-enabled (i.e., complete)
by definition. The initial state is not specified because it is of no importance in what
follows. For the remainder of this chapter we fix a machine M = (I, O, S, δ, λ). We
use n to denote its number of states, that is n = |S|, and m to denote its number of
transitions, that is m = |S| × |I|.
Definition 1. States s and t are equivalent if λ(s, x) = λ(t, x) for all x in I*.
We are interested in the case where s and t are not equivalent, i.e., inequivalent. If
all pairs of distinct states of a machine M are inequivalent, then M is minimal. An
example of a minimal FSM is given in Figure 4.1.
Definition 2. A separating sequence for states s and t in S is a sequence x ∈ I* such
that λ(s, x) ≠ λ(t, x). We say x is minimal if |y| ≥ |x| for all separating sequences y for s
and t.
A separating sequence always exists if two states are inequivalent, and there might
be multiple minimal separating sequences. Our goal is to obtain minimal separating
sequences for all pairs of inequivalent states of M.
1.1 Partition Refinement
In this section we will discuss the basics of minimisation. Both Moore's algorithm
and Hopcroft's algorithm work by means of partition refinement. A similar treatment
(for DFAs) is given by Gries (1973).
A partition P of S is a set of pairwise disjoint non-empty subsets of S whose union
is exactly S. Elements in P are called blocks. If P and P′ are partitions of S, then P′ is a
refinement of P if every block of P′ is contained in a block of P. A partition refinement
algorithm constructs the finest partition under some constraint. In our context the
constraint is that equivalent states belong to the same block.
Definition 3.
A partition is valid if equivalent states are in the same block.
Partition refinement algorithms for FSMs start with the trivial partition P = {S}, and
iteratively refine P until it is the finest valid partition (where all states in a block are
equivalent). The blocks of such a complete partition form the states of the minimised
FSM, whose transition and output functions are well-defined because states in the
same block are equivalent.
Let B be a block and a be an input. There are two possible reasons to split B (and
hence refine the partition). First, we can split B with respect to output after a if the set
λ(B, a) contains more than one output. Second, we can split B with respect to the state
after a if there is no single block B′ containing the set δ(B, a). In both cases it is obvious
what the new blocks are: in the first case each output in λ(B, a) defines a new block,
in the second case each block containing a state in δ(B, a) defines a new block. Both
types of refinement preserve validity.
Partition refinement algorithms for FSMs first perform splits w.r.t. output, until
there are no such splits to be performed. This is precisely the case when the partition
is acceptable.
Definition 4. A partition is acceptable if for all pairs s, t of states contained in the
same block and for all inputs a in I, λ(s, a) = λ(t, a).
Any refinement of an acceptable partition is again acceptable. The algorithm continues
performing splits w.r.t. state, until no such splits can be performed. This is exactly the
case when the partition is stable.
Definition 5. A partition is stable if it is acceptable and for any input a in I and states
s and t that are in the same block, states δ(s, a) and δ(t, a) are also in the same block.
Since an FSM has only finitely many states, partition refinement will terminate. The
output is the finest valid partition which is acceptable and stable. For a more formal
treatment of partition refinement we refer to Gries (1973).
1.2 Splitting Trees and Refinable Partitions
Both types of splits described above can be used to construct a separating sequence
for the states that are split. In a split w.r.t. the output after a, this sequence is simply
a. In a split w.r.t. the state after a, the sequence starts with an a and continues with
the separating sequence for states in δ(B, a). In order to systematically keep track of
this information, we maintain a splitting tree. The splitting tree was introduced by Lee
and Yannakakis (1994) as a data structure for maintaining the operational history of
a partition refinement algorithm.
Definition 6. A splitting tree for M is a rooted tree T with a finite set of nodes with
the following properties:
Each node u in T is labelled by a subset of S, denoted l(u).
The root is labelled by S.
For each inner node u, l(u) is partitioned by the labels of its children.
Each inner node u is associated with a sequence σ(u) that separates states contained
in different children of u.
We use C(u) to denote the set of children of a node u. The lowest common ancestor (lca)
for a set S′ ⊆ S is the node u such that S′ ⊆ l(u) and S′ ⊈ l(v) for all v ∈ C(u), and
is denoted by lca(S′). For a pair of states s and t we use the shorthand lca(s, t) for
lca({s, t}).
The labels l(u) can be stored as a refinable partition data structure (Valmari &
Lehtinen, 2008). This is an array containing a permutation of the states, ordered
so that states in the same block are adjacent. The label l(u) of a node then can be
indicated by a slice of this array. If node u is split, some states in the slice l(u) may be
moved to create the labels of its children, but this will not change the set l(u).
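A simplified Python sketch of such a structure is given below; it deviates from Valmari & Lehtinen in that the block being split keeps the remaining states, but it shows the constant-time moves and lookups that matter later.

    class RefinablePartition:
        """All states in one array, ordered so that every block is a
        contiguous slice; each block records its slice and its parent."""
        def __init__(self, states):
            self.array = list(states)
            self.position = {s: i for i, s in enumerate(self.array)}
            self.blocks = {0: (0, len(self.array), None)}  # id -> (begin, end, parent)
            self.block_of = {s: 0 for s in states}

        def split(self, b, members):
            """Carve members (a non-empty proper subset of block b) out of
            b's slice into a new child block; b keeps the other states."""
            begin, end, parent = self.blocks[b]
            pivot = end - len(members)
            for i, s in enumerate(members):         # swap members to the back
                j, k = self.position[s], pivot + i
                t = self.array[k]
                self.array[j], self.array[k] = t, s
                self.position[s], self.position[t] = k, j
            self.blocks[b] = (begin, pivot, parent)
            child = max(self.blocks) + 1
            self.blocks[child] = (pivot, end, b)    # pointer to the parent
            for s in members:
                self.block_of[s] = child
            return child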
A splitting tree T can be used to record the history of a partition refinement algorithm because at any time the leaves of T define a partition on S, denoted P(T ). We say
a splitting tree T is valid (resp. acceptable, stable, complete) if P(T ) is as such. A leaf
can be expanded in one of two ways, corresponding to the two ways a block can be
split. Given a leaf u and its block B = l(u) we define the following two splits:
(split-output) Suppose there is an input a such that B can be split w.r.t. output after
a. Then we set σ(u) = a, and we create a node for each subset of B that produces the
same output x on a. These nodes are set to be children of u.
(split-state) Suppose there is an input a such that B can be split w.r.t. the state after a.
Then instead of splitting B as described before, we proceed as follows. First, we locate
the node v = lca(δ(B, a)). Since v cannot be a leaf, it has at least two children whose
labels contain elements of δ(B, a). We can use this information to expand the tree as
follows. For each node w in C(v) we create a child of u labelled {s ∈ B | δ(s, a) ∈ l(w)}
if the label contains at least one state. Finally, we set σ(u) = aσ(v).
A straightforward adaptation of partition refinement for constructing a stable
splitting tree for M is shown in Algorithm 4.1. The termination and the correctness of
the algorithm outlined in Section 1.1 are preserved. It follows directly that states are
equivalent if and only if they are in the same label of a leaf node.
Require: An FSM M
Ensure: A valid and stable splitting tree T
initialise T to be a tree with a single node labelled S
repeat
find a ∈ I, B ∈ P(T ) such that we can split B w.r.t. output λ(⋅, a)
expand the u ∈ T with l(u) = B as described in (split-output)
until P(T ) is acceptable
repeat
find a ∈ I, B ∈ P(T ) such that we can split B w.r.t. state δ(⋅, a)
expand the u ∈ T with l(u) = B as described in (split-state)
until P(T ) is stable
Algorithm 4.1 Constructing a stable splitting tree.
Example 7. Figure 4.1 shows an FSM and a complete splitting tree for it. This tree is
constructed by Algorithm 4.1 as follows. First, the root node is labelled by {s0 , …, s5 }.
The even- and odd-numbered states produce different outputs after a, hence the root node
is split. Then we note that s4 produces a different output after b than s0 and s2 , so
{s0 , s2 , s4 } is split as well. At this point T is acceptable: no more leaves can be split
w.r.t. output. Now, the states δ({s1 , s3 , s5 }, a) are contained in different leaves of T.
Therefore, {s1 , s3 , s5 } is split into {s1 , s5 } and {s3 } and associated with sequence ab. At
this point, δ({s0 , s2 }, a) contains states that are in both children of {s1 , s3 , s5 }, so {s0 , s2 }
is split and the associated sequence is aab. We continue until T is complete.
Figure 4.1 An FSM (a) and a complete splitting tree for it (b).
2 Minimal Separating Sequences
In Section 1.2 we have described an algorithm for constructing a complete splitting
tree. This algorithm is non-deterministic, as there is no prescribed order on the splits.
In this section we order them to obtain minimal separating sequences.
Let u be a non-root inner node in a splitting tree, then the sequence σ(u) can also be
used to split the parent of u. This allows us to construct splitting trees where children
will never have shorter sequences than their parents, as we can always split with those
sequences first. Trees obtained in this way are guaranteed to be layered, which means
that for all nodes u and all u′ ∈ C(u), |σ(u)| ≤ |σ(u′)|. Each layer consists of nodes for
which the associated separating sequences have the same length.
Our approach for constructing minimal sequences is to ensure that each layer is
as large as possible before continuing to the next one. This idea is expressed formally
by the following definitions.
Definition 8. A splitting tree T is k-stable if for all states s and t in the same leaf we
have λ(s, x) = λ(t, x) for all x ∈ I≤k .
Definition 9. A splitting tree T is minimal if for all states s and t in different leaves
λ(s, x) ≠ λ(t, x) implies |x| ≥ |σ(lca(s, t))| for all x ∈ I*.
Minimality of a splitting tree can be used to obtain minimal separating sequences for
pairs of states. If the tree is in addition stable, we obtain minimal separating sequences
for all inequivalent pairs of states. Note that if a minimal splitting tree is (n − 1)-stable
(n is the number of states of M), then it is stable (Definition 5). This follows from the
well-known fact that n − 1 is an upper bound for the length of a minimal separating
sequence (Moore, 1956).
Algorithm 4.2 ensures a stable and minimal splitting tree. The first repeat-loop
is the same as before (in Algorithm 4.1). Clearly, we obtain a 1-stable and minimal
splitting tree here. It remains to show that we can extend this to a stable and minimal
splitting tree. Algorithm 4.3 will perform precisely one such step towards stability,
while maintaining minimality. Termination follows for the same reason as for Algorithm 4.1. Correctness of this algorithm is shown by the following key lemma.
We will denote the input tree by T and the tree after performing Algorithm 4.3 by T′.
Observe that T is an initial segment of T′.
Lemma 10.
Algorithm 4.3 ensures a (k + 1)-stable minimal splitting tree.
Proof. Let us prove stability. Let s and t be in the same leaf of T′ and let x ∈ I* be such
that λ(s, x) ≠ λ(t, x). We show that |x| > k + 1.
Suppose for the sake of contradiction that |x| ≤ k + 1. Let u be the leaf containing
s and t and write x = ax′. We see that δ(s, a) and δ(t, a) are separated by k-stability of
T. So the node v = lca(δ(l(u), a)) has children and an associated sequence σ(v). There
are two cases:
If |σ(v)| < k, then aσ(v) separates s and t and is of length ≤ k. This contradicts
the k-stability of T.
If |σ(v)| = k, then the loop in Algorithm 4.3 will consider this case and split. Note
that this may not split s and t (it may occur that aσ(v) splits different elements in
l(u)). We can repeat the above argument inductively for the newly created leaf
containing s and t. By finiteness of l(u), the induction will stop and, in the end, s
and t are split.
Both cases end in contradiction, so we conclude that |x| > k + 1.
Let us now prove minimality. It suffices to consider only newly split states in T′.
Let s and t be two states with |σ(lca(s, t))| = k + 1. Let x ∈ I* be a sequence such that
λ(s, x) ≠ λ(t, x). We need to show that |x| ≥ k + 1. Since x ≠ ϵ we can write x = ax′ and
consider the states s′ = δ(s, a) and t′ = δ(t, a), which are separated by x′. Two things
can happen:
If the states s′ and t′ are in the same leaf in T, then by k-stability of T we get
λ(s′, y) = λ(t′, y) for all y ∈ I≤k. So |x′| > k.
If the states s′ and t′ are in different leaves in T, let u = lca(s′, t′). Then aσ(u)
separates s and t. Since s and t are in the same leaf in T we get |aσ(u)| ≥ k + 1 by
k-stability. This means that |σ(u)| ≥ k and by minimality of T we get |x′| ≥ k.
In both cases we have shown that |x| = 1 + |x′| ≥ k + 1, as required.
Example 11. Figure 4.2a shows a stable and minimal splitting tree T for the machine
in Figure 4.1. This tree is constructed by Algorithm 4.2 as follows. It executes the same
as Algorithm 4.1 until we consider the node labelled {s0 , s2 }. At this point k = 1. We
observe that the sequence of lca(δ({s0 , s2 }, a)) has length 2, which is too long, so we
continue with the next input. We find that we can indeed split w.r.t. the state after b,
so the associated sequence is ba. Continuing, we obtain the same partition as before,
but with smaller witnesses.
The internal data structure (a refinable partition) is shown in Figure 4.2(b): the
array with the permutation of the states is at the bottom, and every block includes
an indication of the slice containing its label and a pointer to its parent (as our final
algorithm needs to find the parent block, but never the child blocks).

Require: An FSM M with n states
Ensure: A stable, minimal splitting tree T
initialise T to be a tree with a single node labelled S
repeat
find a ∈ I, B ∈ P(T) such that we can split B w.r.t. output λ(⋅, a)
expand the u ∈ T with l(u) = B as described in (split-output)
until P(T) is acceptable
for k = 1 to n − 1 do
invoke Algorithm 4.3 or Algorithm 4.4 on T for k
end for
Algorithm 4.2 Constructing a stable and minimal splitting tree.

Require: A k-stable and minimal splitting tree T
Ensure: T is a (k + 1)-stable, minimal splitting tree
for all leaves u ∈ T and all inputs a ∈ I do
v ← lca(δ(l(u), a))
if v is an inner node and |σ(v)| = k then
expand u as described in (split-state) (this generates new leaves)
end if
end for
Algorithm 4.3 A step towards the stability of a splitting tree.

Figure 4.2 (a) A complete and minimal splitting tree for the FSM in Figure 4.1 and (b) its internal refinable partition data structure.
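For concreteness, the following Python sketch implements the layered construction of Algorithms 4.2 and 4.3 directly on sets, recomputing leaves and lowest common ancestors naively. It therefore runs in 𝒪(mn) time rather than 𝒪(m log n); names and representation are illustrative only.

    def minimal_splitting_tree(states, inputs, delta, lam):
        """Build a stable and minimal splitting tree for the FSM given by
        delta[(s, a)] and lam[(s, a)]; sequences sigma(u) are tuples."""
        class Node:
            def __init__(self, label):
                self.label, self.children, self.seq = set(label), [], None

        root = Node(states)

        def leaves(u):
            return [u] if not u.children else \
                   [x for c in u.children for x in leaves(c)]

        def lca(targets):
            u = root
            while True:
                below = [c for c in u.children if targets <= c.label]
                if not below:
                    return u
                u = below[0]

        def split_output(u, a):                        # (split-output)
            groups = {}
            for s in u.label:
                groups.setdefault(lam[(s, a)], set()).add(s)
            if len(groups) < 2:
                return False
            u.seq, u.children = (a,), [Node(g) for g in groups.values()]
            return True

        def split_state(u, a, v):                      # (split-state)
            for w in v.children:
                part = {s for s in u.label if delta[(s, a)] in w.label}
                if part:
                    u.children.append(Node(part))
            u.seq = (a,) + v.seq

        # First repeat-loop: split w.r.t. outputs until the tree is acceptable.
        while any(split_output(u, a) for u in leaves(root) for a in inputs):
            pass
        # Then, for k = 1, ..., n - 1, split w.r.t. states using only
        # candidates v with |sigma(v)| = k; the inner loop repeats so that
        # leaves created by a split are examined too (cf. Lemma 10).
        for k in range(1, len(states)):
            changed = True
            while changed:
                changed = False
                for u in leaves(root):
                    for a in inputs:
                        v = lca({delta[(s, a)] for s in u.label})
                        if v.children and len(v.seq) == k:
                            split_state(u, a, v)
                            changed = True
                            break
        return root

A minimal separating sequence for two inequivalent states s and t can then be read off as the σ of their lowest common ancestor.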
3 Optimising the Algorithm
In this section, we present an improvement on Algorithm 4.3 that uses two ideas
described by Hopcroft (1971) in his seminal paper on minimising finite automata:
using the inverse transition set, and processing the smaller half. The algorithm that we
present is a drop-in replacement, so that Algorithm 4.2 stays the same except for
some bookkeeping. This way, we can establish correctness of the new algorithms
more easily. The variant presented in this section reduces the amount of redundant
computations that were made in Algorithm 4.3.
Using Hopcroft's first idea, we turn our algorithm upside down: instead of searching for the lca for each leaf, we search for the leaves u for which l(u) ⊆ δ⁻¹(l(v), a),
for each potential lca v and input a. To keep the order of splits as before, we define
k-candidates.
Definition 12.
A k-candidate is a node v with |σ(v)| = k.
A k-candidate v and an input a can be used to split a leaf u if v = lca(δ(l(u), a)),
because in this case there are at least two states s, t in l(u) such that δ(s, a) and δ(t, a)
are in labels of different nodes in C(v). Refining u this way is called splitting u with
respect to (v, a). The set C(u) is constructed according to (split-state), where each child
w ∈ C(v) defines a child uw of u with states
l(uw) = {s ∈ l(u) | δ(s, a) ∈ l(w)} = l(u) ∩ δ⁻¹(l(w), a).   (4.1)
In order to perform the same splits in each layer as before, we maintain a list Lk of
k-candidates. We keep the list in order of the construction of nodes, because when
we split w.r.t. a child of a node u before we split w.r.t. u, the result is not well-defined.
Indeed, the order on Lk is the same as the order used by Algorithm 4.2. So far, the
improved algorithm would still have time complexity 𝒪(mn).
To reduce the complexity we have to use Hopcroft's second idea of processing the
smaller half. The key idea is that, when we fix a k-candidate v, all leaves are split
with respect to (v, a) simultaneously. Instead of iterating over all leaves to refine
them, we iterate over s ∈ δ⁻¹(l(w), a) for all w in C(v) and look up in which leaf it is
contained, in order to move s out of it. From Lemma 8 by Knuutila (2001) it follows that we can
skip one of the children of v. This lowers the time complexity to 𝒪(m log n). In order
to move s out of its leaf, each leaf u is associated with a set of temporary children
C′(u) that is initially empty, and will be finalised after iterating over all s and w.
In Algorithm 4.4 we use the ideas described above. For each k-candidate v and
input a, we consider all children w of v, except for the largest one (in case of multiple
largest children, we skip one of these arbitrarily). For each state s ∈ δ⁻¹(l(w), a) we
consider the leaf u containing it.
Require: A k-stable and minimal splitting tree T, and a list Lk
Ensure: T is a (k + 1)-stable and minimal splitting tree, and a list Lk+1
1 Lk+1 ← ∅
2 for all k-candidates v in Lk in order do
3 let w′ be a node in C(v) with |l(w′)| ≥ |l(w)| for all nodes w ∈ C(v)
4 for all inputs a in I do
5 for all nodes w in C(v) ∖ {w′} do
6 for all states s in δ⁻¹(l(w), a) do
7 locate leaf u such that s ∈ l(u)
8 if C′(u) does not contain node uw then
9 add a new node uw to C′(u)
10 end if
11 move s from l(u) to l(uw)
12 end for
13 end for
14 for all leaves u with C′(u) ≠ ∅ do
15 if |l(u)| = 0 then
16 if |C′(u)| = 1 then
17 recover u by moving its elements back and clear C′(u)
18 continue with the next leaf
19 end if
20 set p = u and C(u) = C′(u)
21 else
22 construct a new node p and set C(p) = C′(u) ∪ {u}
23 insert p in the tree in the place where u was
24 end if
25 set σ(p) = aσ(v)
26 append p to Lk+1 and clear C′(u)
27 end for
28 end for
29 end for
Algorithm 4.4 A better step towards the stability of a splitting tree.
If this leaf does not have an associated temporary child for w, we create such a child (line 9); if this child exists, we move s into that child (line 11).
Once we have done the simultaneous splitting for the candidate v and input a,
we finalise the temporary children. This is done at lines 14–26. If there is only one
temporary child with all the states, no split has been made and we recover this node
(line 17). In the other case we make the temporary children permanent.
The states remaining in u are those for which δ(s, a) is in the child of v that we
have skipped; therefore we will call it the implicit child. We should not touch these
states to keep the theoretical time bound. Therefore, we construct a new parent node
p that will “adopt” the children in C′(u) together with u (line 20).
We will now explain why considering all but the largest children of a node lowers
the algorithm's time complexity. Let T be a splitting tree in which we colour all children
of each node blue, except for the largest one. Then:
Lemma 13.
A state s is in at most (log2 n) − 1 labels of blue nodes.
Proof. Observe that every blue node u has a sibling u′ such that |l(u′)| ≥ |l(u)|. So the
parent p(u) has at least 2|l(u)| states in its label, and the largest blue node has at most
n/2 states.
Suppose a state s is contained in m blue nodes. When we walk up the tree starting
at the leaf containing s, we will visit these m blue nodes. With each visit we can double
the lower bound of the number of states. Hence n/2 ≥ 2ᵐ and m ≤ (log2 n) − 1. □
Corollary 14. A state s is in at most log2 n sets δ⁻¹(l(u), a), where u is a blue node
and a is an input in I.
If we now quantify over all transitions, we immediately get the following result. We
note that the number of blue nodes is at most n − 1, but since this fact is not used, we
leave this to the reader.
Corollary 15.
Let ℬ denote the set of blue nodes and define
𝒳 = {(b, a, s) | b ∈ ℬ, a ∈ I, s ∈ δ⁻¹(l(b), a)}.
Then 𝒳 has at most m log2 n elements.
The important observation is that when using Algorithm 4.4 we iterate in total over
every element in 𝒳 at most once.
Theorem 16.
Algorithm 4.2 using Algorithm 4.4 runs in 𝒪(m log n) time.
Proof. We prove that bookkeeping does not increase time complexity by discussing
the implementation.
Inverse transition.
δ⁻¹ can be constructed as a preprocessing step in 𝒪(m).
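In Python, for instance, this is a single pass over the m = |S| × |I| transitions (an illustrative sketch, not the Go implementation):

    def inverse_transitions(states, inputs, delta):
        """Compute the inverse transition function in O(m):
        inv[(s, a)] = { t in S | delta[(t, a)] = s }."""
        inv = {(s, a): set() for s in states for a in inputs}
        for t in states:
            for a in inputs:
                inv[(delta[(t, a)], a)].add(t)
        return inv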
State sorting. As described in Section 1.2, we maintain a refinable partition data
structure. Each time a new pair of a k-candidate v and input a is considered, leaves are
split by performing a bucket sort.
First, buckets are created for each node w ∈ C(v) ∖ {w′} and each leaf u that
contains one or more elements from δ⁻¹(l(w), a), where w′ is a largest child of v. The
buckets are filled by iterating over the states in δ⁻¹(l(w), a) for all w. Then, a pivot is
set for each leaf u such that exactly the states that have been placed in a bucket can
be moved right of the pivot (and untouched states in δ⁻¹(l(w′), a) end up left of the
pivot). For each leaf u, we iterate over the states in its buckets and the corresponding
indices right of its pivot, and we swap the current state with the one that is at the
current index. For each bucket a new leaf node is created. The refinable partition is
updated such that the current state points to the most recently created leaf.
This way, we assure constant time lookup of the leaf for a state, and we can update
the array in constant time when we move elements out of a leaf.
Largest child. For finding the largest child, we maintain counts for the temporary
children and a current biggest one. On finalising the temporary children we store (a
reference to) the biggest child in the node, so that we can skip this node later in the
algorithm.
Storing sequences. The operation on line 25 is done in constant time by using a
linked list.
4 Application in Conformance Testing
A splitting tree can be used to extract relevant information for two classical test generation methods: a characterisation set for the W-method and a separating family for
the HSI-method. For an introduction and comparison of FSM-based test generation
methods we refer to Dorofeeva, et al. (2010) or Chapter 2.
Definition 17. A set W ⊂ I* is called a characterisation set if for every pair of inequivalent states s, t there is a sequence w ∈ W such that λ(s, w) ≠ λ(t, w).
Lemma 18. Let T be a complete splitting tree, then the set {σ(u) | u ∈ T } is a characterisation set.
Proof. Let W = {σ(u) | u ∈ T }. Let s, t ∈ S be inequivalent states, then by completeness
s and t are contained in different leaves of T. Hence u = lca(s, t) exists and σ(u) ∈ W
separates s and t. This shows that W is a characterisation set.
Lemma 19. A characterisation set with minimal length sequences can be constructed
in time 𝒪(m log n).
Proof. By Lemma 18 the sequences associated with the inner nodes of a splitting tree
form a characterisation set. By Theorem 16, such a tree can be constructed in time
𝒪(m log n). Traversing the tree to obtain the characterisation set is linear in the number
of nodes (and hence linear in the number of states).
Definition 20. A collection of sets {Hs}s∈S is called a separating family if for every
pair of inequivalent states s, t there is a sequence h such that λ(s, h) ≠ λ(t, h) and h is
a prefix of some hs ∈ Hs and some ht ∈ Ht.
Lemma 21. Let T be a complete splitting tree, the sets {σ(u) | s ∈ l(u), u ∈ T }s∈S form
a separating family.
Proof. Let Hs = {σ(u) | s ∈ l(u)}. Let s, t ∈ S be inequivalent states, then by completeness s and t are contained in different leaves of T. Hence u = lca(s, t) exists. Since
both s and t are contained in l(u), the separating sequence σ(u) is contained in both
sets Hs and Ht. Therefore, it is a (trivial) prefix of some word hs ∈ Hs and some
ht ∈ Ht. Hence {Hs}s∈S is a separating family.
Lemma 22. A separating family with minimal length sequences can be constructed
in time 𝒪(m log n + n²).
Proof. The separating family can be constructed from the splitting tree by collecting all
sequences of all parents of a state (by Lemma 21). Since we have to do this for every
state, this takes 𝒪(n²) time.
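Reading both objects off a completed splitting tree is straightforward. The sketch below assumes the Node representation (fields label, children, seq) of the earlier sketch:

    def characterisation_set(root):
        """W = { sigma(u) | u an inner node }, cf. Lemma 18."""
        W, stack = set(), [root]
        while stack:
            u = stack.pop()
            if u.children:
                W.add(u.seq)
                stack.extend(u.children)
        return W

    def separating_family(root, states):
        """H_s = { sigma(u) | u an inner node with s in l(u) }, cf. Lemma 21."""
        H = {s: set() for s in states}
        stack = [root]
        while stack:
            u = stack.pop()
            if u.children:
                for s in u.label:
                    H[s].add(u.seq)
                stack.extend(u.children)
        return H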
For test generation one also needs a transition cover. This can be constructed in linear
time with a breadth first search. We conclude that we can construct all necessary
information for the W-method in time 𝒪(m log n), as opposed to the 𝒪(mn) algorithm
used by Dorofeeva, et al. (2010). Furthermore, we conclude that we can construct
all the necessary information for the HSI-method in time 𝒪(m log n + n²), improving
on the reported bound 𝒪(mn³) by Hierons and Türker (2015). The original HSI-method was formulated differently and might generate smaller sets. We conjecture
that our separating family has the same size if we furthermore remove redundant
prefixes. This can be done in 𝒪(n2 ) time using a trie data structure.
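One simple way to remove such redundant prefixes is sketched below (ours; it uses lexicographic sorting rather than an explicit trie, exploiting that after sorting every extension of a word follows it immediately):

    import Data.List (isPrefixOf, sort)

    -- Drop every word that is a proper prefix of another word in the set.
    -- In lexicographic order, if w is a prefix of any other word then it
    -- is a prefix of its immediate successor, so adjacent checks suffice.
    removeRedundantPrefixes :: Ord a => [[a]] -> [[a]]
    removeRedundantPrefixes ws = go (sort ws)
      where
        go (u : v : rest)
          | u `isPrefixOf` v = go (v : rest)
          | otherwise        = u : go (v : rest)
        go us = us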
5 Experimental Results
We have implemented Algorithms 4.3 and 4.4 in Go, and we have compared their running
time on two sets of FSMs.16 The first set is from Smeenk, et al. (2015a), where FSMs
for embedded control software were automatically constructed. These FSMs are of
increasing size, varying from 546 to 3 410 states, with 78 inputs and up to 151 outputs.
The second set is inferred from Hopcroft (1971), where two classes of finite automata,
A and B, are described that serve as a worst case for Algorithms 4.3 and 4.4 respectively.
The FSMs that we have constructed for these automata have 1 input, 2 outputs, and
2² up to 2¹⁵ states. The running times in seconds on an Intel Core i5-2500 are plotted in Figure 4.3.
We note that different slopes imply different complexity classes, since both axes have
a logarithmic scale.
16
Available at https://github.com/Jaxan/partition.
[Two log-log plots of running times; panel (a): embedded control software; panel (b): class A (dashed) and class B (solid).]
Figure 4.3    Running time in seconds of Algorithm 4.3 in grey and Algorithm 4.4 in black.
6 Conclusion
In this chapter we have described an efficient algorithm for constructing a set of
minimal-length sequences that pairwise distinguish all states of a finite state machine.
By extending Hopcroft's minimisation algorithm, we are able to construct such sequences in 𝒪(m log n) for a machine with m transitions and n states. This improves
on the traditional 𝒪(mn) method that is based on the classic algorithm by Moore. As
an upshot, the sequences obtained form a characterisation set and a separating family,
which play a crucial role in conformance testing.
Two key observations were required for a correct adaptation of Hopcroft's algorithm. First, it is required to perform splits in order of the length of their associated
sequences. This guarantees minimality of the obtained separating sequences. Second,
it is required to consider a node as a candidate before any one of its children is considered as a candidate. This order follows naturally from the construction of a splitting
tree.
Experimental results show that our algorithm outperforms the classic approach
for both worst-case finite state machines and models of embedded control software.
Applications of minimal separating sequences such as the ones described by Dorofeeva, et al. (2010) and Smeenk, et al. (2015a) therefore show that our algorithm is
useful in practice.
Part 2:
Nominal Techniques
Chapter 5
Learning Nominal Automata
Joshua Moerman (Radboud University)
Matteo Sammartino (University College London)
Bartek Klin (University of Warsaw)
Alexandra Silva (University College London)
Michał Szynwelski (University of Warsaw)
Abstract
We present an Angluin-style algorithm to learn nominal automata, which
are acceptors of languages over infinite (structured) alphabets. The abstract approach we take allows us to seamlessly extend known variations
of the algorithm to this new setting. In particular, we can learn a subclass
of nominal non-deterministic automata. An implementation using a recently developed Haskell library for nominal computation is provided
for preliminary experiments.
This chapter is based on the following publication:
Moerman, J., Sammartino, M., Silva, A., Klin, B., & Szynwelski, M. (2017). Learning
nominal automata. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles
of Programming Languages, POPL. ACM. doi:10.1145/3009837.3009879
Automata are a well established computational abstraction with a wide range of
applications, including modelling and verification of (security) protocols, hardware,
and software systems. In an ideal world, a model would be available before a system
or protocol is deployed in order to provide ample opportunity for checking important
properties that must hold and only then the actual system would be synthesised from
the verified model. Unfortunately, this is not at all the reality: Systems and protocols
are developed and coded in short spans of time and if mistakes occur they are most
likely found after deployment. In this context, it has become popular to infer or learn
a model from a given system just by observing its behaviour or response to certain
queries. The learned model can then be used to ensure the system complies with
desired properties or to detect bugs and design possible fixes.
Automata learning, or regular inference, is a widely used technique for creating
an automaton model from observations. The original algorithm by Angluin (1987)
works for deterministic finite automata, but since then has been extended to other
types of automata, including Mealy machines and I/O automata (see Niese, 2003, §8.5,
and Aarts & Vaandrager, 2010), and even a special class of context-free grammars
(see Isberner, 2015, §6). Angluin's algorithm is sometimes referred to as active learning,
because it is based on direct interaction of the learner with an oracle (“the Teacher”)
that can answer different types of queries. This is in contrast with passive learning,
where a fixed set of positive and negative examples is given and no interaction with
the system is possible.
In this chapter, staying in the realm of active learning, we will extend Angluin's
algorithm to a richer class of automata. We are motivated by situations in which a
program model, besides control flow, needs to represent basic data flow, where data
items are compared for equality (or for other theories such as total ordering). In these
situations, values for individual symbols are typically drawn from an infinite domain
and automata over infinite alphabets become natural models, as witnessed by a recent
trend (Aarts, et al., 2015; Bojańczyk, et al., 2014; Bollig, et al., 2013; Cassel, et al., 2016;
D'Antoni & Veanes, 2014).
One of the foundational approaches to formal language theory for infinite alphabets uses the notion of nominal sets (Bojańczyk, et al., 2014). The theory of nominal
sets originates from the work of Fraenkel in 1922, and they were originally used to
prove the independence of the axiom of choice and other axioms. They have been
rediscovered in Computer Science by Gabbay and Pitts (see Pitts, 2013 for historical
notes), as an elegant formalism for modelling name binding, and since then they form
the basis of many research projects in the semantics and concurrency community. In
a nutshell, nominal sets are infinite sets equipped with symmetries which make them
finitely representable and tractable for algorithms. We make crucial use of this feature
in the development of a learning algorithm.
Our main contributions are the following.
A generalisation of Angluin's original algorithm to nominal automata. The generalisation follows a generic pattern for transporting computation models from
finite sets to nominal sets, which leads to simple correctness proofs and opens the
door to further generalisations. The use of nominal sets with different symmetries also creates potential for generalisation, e.g., to languages with time features
(Bojańczyk & Lasota, 2012) or data dependencies represented as graphs (Montanari & Sammartino, 2014).
An extension of the algorithm to nominal non-deterministic automata (nominal
NFAs). To the best of our knowledge, this is the first learning algorithm for nondeterministic automata over infinite alphabets. It is important to note that, in the
nominal setting, NFAs are strictly more expressive than DFAs. We learn a subclass
of the languages accepted by nominal NFAs, which includes all the languages
accepted by nominal DFAs. The main advantage of learning NFAs directly is
that they can provide exponentially smaller automata when compared to their
deterministic counterpart. This can be seen both as a generalisation and as an
optimisation of the algorithm.
An implementation using a recently developed Haskell library tailored to nominal
computation NLambda, or Nλ, by Klin and Szynwelski (2016). Our implementation is the first non-trivial application of a novel programming paradigm of
functional programming over infinite structures, which allows the programmer to
rely on convenient intuitions of searching through infinite sets in finite time.
This chapter is organised as follows. In Section 1, we present an overview of our
contributions (and the original algorithm) highlighting the challenges we faced in
the various steps. In Section 2, we revise some basic concepts of nominal sets and
automata. Section 3 contains the core technical contributions: The new algorithm
and proof of correctness. In Section 4, we describe an algorithm to learn nominal
non-deterministic automata. Section 5 contains a description of NLambda, details
of the implementation, and results of preliminary experiments. Section 6 contains a
discussion of related work. We conclude this chapter with a discussion section where
also future directions are presented.
1 Overview of the Approach
In this section, we give an overview through examples. We will start by explaining
the original algorithm for regular languages over finite alphabets, and then explain
the challenges in extending it to nominal languages.
Angluin's algorithm L∗ provides a procedure to learn the minimal DFA accepting
a certain (unknown) language ℒ. The algorithm has access to a teacher which answers
two types of queries:
membership queries, consisting of a single word w ∈ A∗, to which the teacher will
reply whether w ∈ ℒ or not;
equivalence queries, consisting of a hypothesis DFA H, to which the teacher replies
yes if ℒ(H) = ℒ, and no otherwise, providing a counterexample w ∈ ℒ(H) △ ℒ
(where △ denotes the symmetric difference of two languages).
The learning algorithm works by incrementally building an observation table, which at
each stage contains partial information about the language ℒ. The algorithm is able to
fill the table with membership queries. As an example, and to set notation, consider
the following table (over the alphabet A = {a, b}).
Here the upper part is indexed by S = {ϵ}, the lower part by S⋅A = {a, b}, and the
columns by E = {ϵ, a, aa}:

          ϵ   a   aa
    ϵ     0   0   1
    a     0   1   0
    b     0   0   0

row : S ∪ S⋅A → 2E
row(u)(v) = 1 ⟺ uv ∈ ℒ

This table indicates that ℒ contains at least aa and definitely does not contain the
words ϵ, a, b, ba, baa, aaa. Since row is fully determined by the language ℒ, we will
from now on refer to an observation table as a pair (S, E), leaving the language ℒ
implicit.
Given an observation table (S, E) one can construct a deterministic automaton
M(S, E) = (Q, q0 , δ, F) where
Q = {row(s) | s ∈ S} is a finite set of states;
F = {row(s) | s ∈ S, row(s)(ϵ) = 1} ⊆ Q is the set of final states;
q0 = row(ϵ) is the initial state;
δ : Q × A → Q is the transition function given by δ(row(s), a) = row(sa).
For this to be well-defined, we need to have ϵ ∈ S (for the initial state) and ϵ ∈ E (for
final states), and for the transition function there are two crucial properties of the table
that need to hold: Closedness and consistency. An observation table (S, E) is closed if
for all t ∈ S⋅A there exists an s ∈ S such that row(t) = row(s). An observation table
(S, E) is consistent if, whenever s1 and s2 are elements of S such that row(s1 ) = row(s2 ),
for all a ∈ A, row(s1 a) = row(s2 a). Each time the algorithm constructs an automaton,
it poses an equivalence query to the teacher. It terminates when the answer is yes,
otherwise it extends the table with the counterexample provided.
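For concreteness, the following Haskell sketch (our own illustration for a finite alphabet, not the implementation discussed later) spells out row, closedness and consistency exactly as defined above, with the language given as a membership oracle:

    -- row(s), restricted to the suffixes in E, for a membership oracle.
    rowOf :: (String -> Bool) -> [String] -> String -> [Bool]
    rowOf member e s = [ member (s ++ suffix) | suffix <- e ]

    -- (S, E) is closed if every row of S·A already occurs as a row of S.
    closed :: (String -> Bool) -> [String] -> [String] -> [Char] -> Bool
    closed member s e alphabet =
      and [ rowOf member e (u ++ [a]) `elem` map (rowOf member e) s
          | u <- s, a <- alphabet ]

    -- (S, E) is consistent if rows that are equal stay equal after
    -- appending any single letter.
    consistent :: (String -> Bool) -> [String] -> [String] -> [Char] -> Bool
    consistent member s e alphabet =
      and [ rowOf member e (s1 ++ [a]) == rowOf member e (s2 ++ [a])
          | s1 <- s, s2 <- s, rowOf member e s1 == rowOf member e s2
          , a <- alphabet ]

    -- Example: for L1 = {aa, bb} and S = E = {ϵ}, the table of Step 1
    -- below is closed and (trivially) consistent.
    main :: IO ()
    main = do
      let member w = w `elem` ["aa", "bb"]
      print (closed member [""] [""] "ab")
      print (consistent member [""] [""] "ab")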
1.1 Simple Example of Execution
Angluin's algorithm is displayed in Algorithm 5.1. Throughout this section, we will
consider the language(s)
ℒn = {ww | w ∈ A∗ , |w| = n} .
If the alphabet A is finite then ℒn is regular for any n ∈ ℕ, and there is a finite DFA
accepting it.
1   S, E ← {ϵ}
2   repeat
3       while (S, E) is not closed or not consistent do
4           if (S, E) is not closed then
5               find s1 ∈ S, a ∈ A such that row(s1 a) ≠ row(s) for all s ∈ S
6               S ← S ∪ {s1 a}
7           end if
8           if (S, E) is not consistent then
9               find s1 , s2 ∈ S, a ∈ A, and e ∈ E such that
10                  row(s1 ) = row(s2 ) and ℒ(s1 ae) ≠ ℒ(s2 ae)
11              E ← E ∪ {ae}
12          end if
13      end while
14      Make the conjecture M(S, E)
15      if the Teacher replies no, with a counter-example t then
16          S ← S ∪ pref(t)
17      end if
18  until the Teacher replies yes to the conjecture M(S, E)
19  return M(S, E)

Algorithm 5.1    The L∗ learning algorithm from Angluin (1987).
The language ℒ1 = {aa, bb} looks trivial, but the minimal DFA recognising it has as
many as 5 states. Angluin's algorithm will terminate in (at most) 5 steps. We illustrate
some relevant ones.
Step 1 We start from S, E = {ϵ}, and we fill the entries of the table below by asking
membership queries for ϵ, a and b. The table is closed and consistent, so we construct
the hypothesis 𝒜1 , where q0 = row(ϵ) = {ϵ ↦ 0}:
          ϵ
    ϵ     0
    a     0
    b     0

𝒜1 : a single non-accepting initial state q0 with transitions q0 −a,b→ q0 .
The Teacher replies no and gives the counterexample aa, which is in ℒ1 but it is not
accepted by 𝒜1 . Therefore, line 16 of the algorithm is triggered and we set S = {ϵ, a, aa}.
Step 2 The table becomes the one on the left below. It is closed, but not consistent:
Rows ϵ and a are identical, but appending a leads to different rows, as depicted.
80
Chapter 5
Therefore, line 10 is triggered and an extra column a, highlighted in red, is added.
The new table is closed and consistent and a new hypothesis 𝒜2 is constructed.
Before the column a is added:        After the column a is added:

          ϵ                                    ϵ   a
    ϵ     0                              ϵ     0   0
    a     0                              a     0   1
    aa    1                              aa    1   0
    b     0                              b     0   0
    ab    0                              ab    0   0
    aaa   0                              aaa   0   0
    aab   0                              aab   0   0

𝒜2 : states q0 (initial), q1 and q2 (accepting), with transitions q0 −a→ q1 ,
q0 −b→ q0 , q1 −a→ q2 , q1 −b→ q0 , and q2 −a,b→ q0 .
The Teacher again replies no and gives the counterexample bb, which should be
accepted by 𝒜2 but it is not. Therefore we put S ← S ∪ {b, bb}.
Step 3 The new table is the one on the left. It is closed, but ϵ and b violate consistency,
when b is appended. Therefore we add the column b and we get the table on the right,
which is closed and consistent. The new hypothesis is 𝒜3 .
Before the column b is added:        After the column b is added:

          ϵ   a                                ϵ   a   b
    ϵ     0   0                          ϵ     0   0   0
    a     0   1                          a     0   1   0
    aa    1   0                          aa    1   0   0
    b     0   0                          b     0   0   1
    bb    1   0                          bb    1   0   0
    ab    0   0                          ab    0   0   0
    aaa   0   0                          aaa   0   0   0
    aab   0   0                          aab   0   0   0
    ba    0   0                          ba    0   0   0
    bba   0   0                          bba   0   0   0
    bbb   0   0                          bbb   0   0   0

𝒜3 : states q0 (initial), q1 = row(a), q2 = row(b) and q3 = row(aa) = row(bb)
(accepting), with transitions q0 −a→ q1 , q0 −b→ q2 , q1 −a→ q3 , q1 −b→ q0 ,
q2 −a→ q0 , q2 −b→ q3 , and q3 −a,b→ q0 .
The Teacher replies no and provides the counterexample babb, so S ← S ∪ {ba, bab}.
Step 4
One more step brings us to the correct hypothesis 𝒜4 (details are omitted).
𝒜4 : states q0 (initial), q1 , q2 , q3 (accepting) and a sink q4 , with transitions
q0 −a→ q1 , q0 −b→ q2 , q1 −a→ q3 , q1 −b→ q4 , q2 −a→ q4 , q2 −b→ q3 ,
q3 −a,b→ q4 , and q4 −a,b→ q4 .
1.2 Learning Nominal Languages
Consider now an infinite alphabet A = {a, b, c, d, …}. The language ℒ1 becomes
{aa, bb, cc, dd, …}. Classical theory of finite automata does not apply to this kind
of languages, but one may draw an infinite deterministic automaton that recognises
ℒ1 in the standard sense:
𝒜5 : from the initial state q0 there is a transition q0 −a→ qa for every a ∈ A, and
further transitions qa −a→ q3 , qa −≠a→ q4 , q3 −A→ q4 and q4 −A→ q4 , with q3
accepting; here −A→ and −≠a→ stand for the infinitely-many transitions labelled by
elements of A and A ∖ {a}, respectively. This automaton is infinite, but it can be
finitely presented in a variety of ways, for example:
𝒜6 :  ∀x ∈ A:  q0 −x→ qx ,  qx −x→ q3 ,  qx −≠x→ q4 ,  q3 −A→ q4 ,  q4 −A→ q4 ,
with q3 accepting.
One can formalise the quantifier notation above (or indeed the “dots” notation above
that) in several ways. A popular solution is to consider finite register automata (Demri
& Lazic, 2009 and Kaminski & Francez, 1994), i.e., finite automata equipped with a
finite number of registers where alphabet letters can be stored and later compared
for equality. Our language ℒ1 is recognised by a simple automaton with four states
and one register. The problem of learning register automata has been successfully
attacked before by, for instance, Howar, et al. (2012).
In this chapter, however, we consider nominal automata by Bojańczyk, et al. (2014)
instead. These automata ostensibly have infinitely many states, but the set of states
can be finitely presented in a way open to effective manipulation. More specifically,
in a nominal automaton the set of states is subject to an action of permutations of a
set 𝔸 of atoms, and it is finite up to that action. For example, the set of states of 𝒜5 is:
{q0 , q3 , q4 } ∪ {qa | a ∈ A}
and it is equipped with a canonical action of permutations π : 𝔸 → 𝔸 that maps every
qa to qπ(a) and leaves q0 , q3 and q4 fixed. Technically speaking, the set of states
has four orbits (one infinite orbit and three fixed points) of the action of the group of
permutations of 𝔸. Moreover, it is required that in a nominal automaton the transition
relation is equivariant, i.e., closed under the action of permutations. The automaton 𝒜5
has this property: For example, it has a transition qa −a→ q3 , and for any π : 𝔸 → 𝔸
there is also a transition π(qa ) = qπ(a) −π(a)→ q3 = π(q3 ).
Nominal automata with finitely many orbits of states are equi-expressive with finite
register automata (Bojańczyk, et al., 2014), but they have an important theoretical
advantage: They are a direct reformulation of the classical notion of finite automaton,
where one replaces finite sets with orbit-finite sets and functions (or relations) with
equivariant ones. A research programme advocated by Bojańczyk, et al. is to transport
various computation models, algorithms and theorems along this correspondence.
This can often be done with remarkable accuracy, and our results are a witness to this.
Indeed, as we shall see, nominal automata can be learned with an algorithm that is
almost a verbatim copy of Angluin's classical one.
Indeed, consider applying Angluin's algorithm to our new language ℒ1 . The key
idea is to change the basic data structure: Our observation table (S, E) will be such
that S and E are equivariant subsets of A∗, i.e., they are closed under the canonical action
of atom permutations. In general, such a table has infinitely many rows and columns, so
the following aspects of Algorithm 5.1 seem problematic:
line 4 and line 8: finding witnesses for closedness or consistency violations potentially require checking all infinitely many rows;
line 16: every counterexample t has infinitely many prefixes, so it is not clear how
one constructs an infinite set S in finite time. However, an infinite S is necessary
for the algorithm to ever succeed, because no finite automaton recognises ℒ1 .
At this stage, we need to observe that due to equivariance of S, E and ℒ1 , the following
crucial properties hold:
(P1) the sets S, S⋅A and E admit a finite representation up to permutations;
(P2) the function row is such that row(π(s))(π(e)) = row(s)(e), for all s ∈ S and
e ∈ E, so the observation table admits a finite symbolic representation.
Intuitively, checking closedness and consistency, and finding a witness for their
violations, can be done effectively on the representations up to permutations (P1).
This is sound, as row is invariant w.r.t. permutations (P2).
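As a one-line instance of (P2) for the language ℒ1 and the permutation π swapping a and b:

    \mathrm{row}(a)(a) \;=\; [\, aa \in \mathcal{L}_1 \,] \;=\; 1
    \;=\; [\, bb \in \mathcal{L}_1 \,] \;=\; \mathrm{row}(\pi(a))(\pi(a)).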
We now illustrate these points through a few steps of the algorithm for ℒ1 .
Step 1 We start from S, E = {ϵ}. We have S⋅A = A, which is infinite but admits a
finite representation. In fact, for any a ∈ A, we have A = {π(a) | π is a permutation}.
Then, by (P2), row(π(a))(ϵ) = row(a)(ϵ) = 0, for all π, so the first table can be written
as:
          ϵ
    ϵ     0
    a     0

𝒜1 : a single non-accepting state q0 with transitions q0 −A→ q0 .
It is closed and consistent. Our hypothesis is 𝒜1 , where δ𝒜1 (row(ϵ), x) = row(x) = q0 ,
for all x ∈ A. As in Step 1, the Teacher replies with the counterexample aa.
Step 2 By equivariance of ℒ1 , the counterexample tells us that all words
of length 2 with two repeated letters are accepted. Therefore we extend S
with the (infinite!) set of such words. The new symbolic table is depicted
below:

          ϵ
    ϵ     0
    a     0
    aa    1
    ab    0
    aaa   0
    aab   0

The lower part stands for elements of S⋅A. For instance, ab stands for
words obtained by appending a fresh letter to words of length 1 (row a).
It can be easily verified that all cases are covered. Notice that the table is
different from that of Step 2: A single b is not in the lower part, because
it can be obtained from a via a permutation. The table is closed.
Now, for consistency we need to check row(ϵx) = row(ax), for all a, x ∈ A. Again,
by (P2), it is enough to consider rows of the table above. Consistency is violated,
because row(a) ≠ row(aa). We found a “symbolic” witness a for such violation. In
order to fix consistency, while keeping E equivariant, we need to add columns for all
π(a). The resulting table is
          ϵ   a   b   c   ⋯
    ϵ     0   0   0   0
    a     0   1   0   0
    aa    1   0   0   0
    ab    0   0   0   0
    aaa   0   0   0   0
    aab   0   0   0   0
where non-specified entries are 0. Only finitely many entries of the table are relevant:
row(s) is fully determined by its values on letters in s and on just one letter not in s.
For instance, we have row(a)(a) = 1 and row(a)(a′) = 0, for all a′ ∈ A ∖ {a}. The table
is trivially consistent.
Notice that this step encompasses both Step 2 and 3, because the rows b and bb
added by Step 2 are already represented by a and aa. The hypothesis automaton is
𝒜2 :  ∀x ∈ A:  q0 −x→ qx ,  qx −x→ q2 ,  qx −≠x→ q0 ,  q2 −A→ q0 ,
with q2 accepting.
This is again incorrect, but one additional step will give the correct hypothesis automaton 𝒜6 .
1.3 Generalisation to Non-Deterministic Automata
Since our extension of Angluin's L∗ algorithm stays close to her original development,
exploring extensions of other variations of L∗ to the nominal setting can be done in
a systematic way. We will show how to extend the algorithm NL∗ for learning NFAs
by Bollig, et al. (2009). This has practical implications: It is well-known that NFAs
are exponentially more succinct than DFAs. This is true also in the nominal setting.
However, there are challenges in the extension that require particular care.
Nominal NFAs are strictly more expressive than nominal DFAs. We will show
that the nominal version of NL∗ terminates for all nominal NFAs that have a corresponding nominal DFA and, more surprisingly, that it is capable of learning some
languages that are not accepted by nominal DFAs.
Language equivalence of nominal NFAs is undecidable. This does not affect the
correctness proof, as it assumes a teacher which is able to answer equivalence
queries accurately. For our implementation, we will describe heuristics that produce correct results in many cases.
For the learning algorithm the power of non-determinism means that we can make
some shortcuts during learning: If we want to make the table closed, we were previously required to find an equivalent row in the upper part; now we may find a sum
of rows which, together, are equivalent to an existing row. This means that in some
cases fewer rows will be added for closedness.
2 Preliminaries
We recall the notions of nominal sets, nominal automata and nominal regular languages. We refer to Bojańczyk, et al. (2014) for a detailed account.
Let 𝔸 be a countable set and let Perm(𝔸) be the set of permutations on 𝔸, i.e.,
the bijective functions π : 𝔸 → 𝔸. Permutations form a group where the identity
permutation id is the unit element, inverse is functional inverse and multiplication is
function composition.
A nominal set (Pitts, 2013) is a set X together with a function ⋅ : Perm(𝔸) × X → X,
interpreting permutations over X. Such function must be a group action of Perm(𝔸),
i.e., it must satisfy id ⋅ x = x and π ⋅ (π′ ⋅ x) = (π ∘ π′ ) ⋅ x. We say that a finite A ⊂ 𝔸
supports x ∈ X whenever, for all π acting as the identity on A, we have π ⋅ x = x. In
other words, permutations that only move elements outside A do not affect x. The
support of x ∈ X, denoted supp(x), is the smallest finite set supporting x. We require
nominal sets to have finite support, meaning that supp(x) exists for all x ∈ X.
The orbit of x, denoted orb(x), is the set of elements in X reachable from x via
permutations, explicitly
orb(x) = {π ⋅ x | π ∈ Perm(𝔸)}.
We say that X is orbit-finite whenever it is a union of finitely many orbits.
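A small Haskell sketch of this action (our illustration; atoms are modelled as integers, and a finitely supported permutation by its graph on the atoms it moves):

    import qualified Data.Map as Map
    import qualified Data.Set as Set

    -- Atoms are modelled as integers; a permutation is represented by
    -- its graph on the finitely many atoms it moves.
    type Atom = Int
    type Perm = Map.Map Atom Atom

    -- Apply a permutation to an atom: atoms outside the map are fixed.
    applyAtom :: Perm -> Atom -> Atom
    applyAtom p a = Map.findWithDefault a a p

    -- The pointwise action of Perm(𝔸) on words over 𝔸.
    applyWord :: Perm -> [Atom] -> [Atom]
    applyWord p = map (applyAtom p)

    -- The support of a word is the finite set of atoms occurring in it:
    -- any permutation fixing these atoms fixes the word.
    support :: [Atom] -> Set.Set Atom
    support = Set.fromList

    -- Example: the swap (1 2) maps the word [1,1,3] to [2,2,3].
    main :: IO ()
    main = print (applyWord (Map.fromList [(1,2),(2,1)]) [1,1,3])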
Given a nominal set X, a subset Y ⊆ X is equivariant if it is preserved by permutations, i.e., π ⋅ y ∈ Y, for all y ∈ Y. In other words, Y is a union of some orbits of X.
This definition extends to the notion of an equivariant relation R ⊆ X × Y, by setting
π⋅(x, y) = (π⋅x, π⋅y), for (x, y) ∈ R; similarly for relations of greater arity. The dimension
of nominal set X is the maximal size of supp(x), for any x ∈ X. Every orbit-finite set
has finite dimension.
We define 𝔸(k) = {(a1 , …, ak ) | ai ≠ aj for i ≠ j}. For every single-orbit nominal
set X with dimension k, there is a surjective equivariant map
fX : 𝔸(k) → X.
This map can be used to get an upper bound for the number of orbits of X1 × ⋯ × Xn ,
for Xi a nominal set with li orbits and dimension ki . Suppose Oi is an orbit of Xi .
Then we have a surjection
fO1 × ⋯ × fOn : 𝔸(k1 ) × ⋯ × 𝔸(kn ) → O1 × ⋯ × On
stipulating that the codomain cannot have more orbits than the domain. Let f𝔸 ({ki })
denote the number of orbits of 𝔸(k1 ) × ⋯ × 𝔸(kn ) , for any finite sequence of natural
numbers {ki }. We can form at most l = l1 l2 ⋯ ln tuples of the form O1 × ⋯ × On , so
X1 × ⋯ × Xn has at most l f𝔸 (k1 , …, kn ) orbits.
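A minimal worked instance of this orbit count:

    % 𝔸 × 𝔸 = 𝔸^(1) × 𝔸^(1) has exactly two orbits, the diagonal and the
    % pairs of distinct atoms, so f_𝔸(1,1) = 2.
    \mathbb{A} \times \mathbb{A}
      = \{ (a,a) \mid a \in \mathbb{A} \}
      \cup \{ (a,b) \mid a, b \in \mathbb{A},\ a \neq b \},
    \qquad f_{\mathbb{A}}(1,1) = 2.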
For X single-orbit, the local symmetries are defined by the group
{g ∈ Sk | fX (x1 , …, xk ) = fX (xg(1) , …, xg(k) ) for all (x1 , …, xk ) ∈ 𝔸(k) },
where k is the dimension of X and Sk is the symmetric group of permutations over k
distinct elements.
NFAs on sets have a finite state space. We can define nominal NFAs, with the
requirement that the state space is orbit-finite and the transition relation is equivariant.
A nominal NFA is a tuple (Q, A, Q0 , F, δ), where:
Q is an orbit-finite nominal set of states;
A is an orbit-finite nominal alphabet;
Q0 , F ⊆ Q are equivariant subsets of initial and final states;
δ ⊆ Q × A × Q is an equivariant transition relation.
A nominal DFA is a special case of nominal NFA where Q0 = {q0 } and the transition
relation is an equivariant function δ : Q × A → Q. Equivariance here can be rephrased
as requiring δ(π ⋅ q, π ⋅ a) = π ⋅ δ(q, a). In most examples we take the alphabet to be
A = 𝔸, but it can be any orbit-finite nominal set. For instance, A = Act × 𝔸, where Act
is a finite set of actions, represents actions act(x) with one parameter x ∈ 𝔸 (actions
with arity n can be represented via n-fold products of 𝔸).
A language is nominal regular if it is recognised by a nominal DFA. The theory
of nominal regular languages recasts the classical one using nominal concepts. A
nominal Myhill-Nerode-style syntactic congruence is defined: w, w′ ∈ A∗ are equivalent
w.r.t. ℒ, written w ≡ℒ w′, whenever
wv ∈ ℒ ⟺ w′v ∈ ℒ
for all v ∈ A∗. This relation is equivariant and the set of equivalence classes [w]≡ℒ is a
nominal set.
nominal set.
Theorem 1. (Myhill-Nerode theorem for nominal sets by Bojańczyk, et al., 2014)
Let ℒ be a nominal language. The following conditions are equivalent:
1. the set of equivalence classes of ≡ℒ is orbit-finite;
2. ℒ is recognised by a nominal DFA.
Unlike what happens for ordinary regular languages, nominal NFAs and nominal
DFAs are not equi-expressive. Here is an example of a language accepted by a nominal
NFA, but not by a nominal DFA:
ℒeq = {a1 ⋯ an | ai = aj , for some i < j ∈ {1, …, n}} .
In the theory of nominal regular languages, several problems are decidable: Language
inclusion and minimality test for nominal DFAs. Moreover, orbit-finite nominal sets
can be finitely-represented, and so can be manipulated by algorithms. This is the key
idea underpinning our implementation.
2.1 Different Atom Symmetries
An important advantage of nominal set theory as considered by Bojańczyk, et al.
(2014) is that it retains most of its properties when the structure of atoms 𝔸 is replaced with an arbitrary infinite relational structure subject to a few model-theoretic
assumptions. An example alternative structure of atoms is the total order of rational
numbers (ℚ, <), with the group of monotone bijections of ℚ taking the role of the
group of all permutations. The theory of nominal automata remains similar, and an
example nominal language over the atoms (ℚ, <) is:
{a1 ⋯ an | ai ≤ aj , for some i < j ∈ {1, …, n}}
which is recognised by a nominal DFA over those atoms.
To simplify the presentation, in this chapter we concentrate on the “equality atoms”
only. However, both the theory and the implementation can be generalised to other
atom structures, with the “ordered atoms” (ℚ, <) as the simplest other example. We
investigate the total order symmetry (ℚ, <) in Chapter 6.
3 Angluin's Algorithm for Nominal DFAs
In our algorithm, we will assume a teacher as described at the start of Section 1. In
particular, the teacher is able to answer membership queries and equivalence queries,
now in the setting of nominal languages. We fix a target language ℒ, which is assumed
to be a nominal regular language.
The learning algorithm for nominal automata, νL∗ , will be very similar to L∗ in
Algorithm 5.1. In fact, we only change the following lines:
6    S ← S ∪ orb(sa)
11   E ← E ∪ orb(ae)
16   S ← S ∪ pref(orb(t))
(5.1)
The basic data structure is an observation table (S, E, T) where S and E are orbit-finite
subsets of A∗ and T : (S ∪ S⋅A) × E → 2 is an equivariant function defined by T(s, e) = ℒ(se)
for each s ∈ S ∪ S⋅A and e ∈ E. Since T is determined by ℒ we omit it from the notation.
Let row : S ∪ S⋅A → 2E denote the curried counterpart of T. Let u ∼ v denote the relation
row(u) = row(v).
Definition 2. The table is called closed if for each t ∈ S⋅A there is an s ∈ S with t ∼ s.
The table is called consistent if for each pair s1 , s2 ∈ S with s1 ∼ s2 we have s1 a ∼ s2 a
for all a ∈ A.
The above definitions agree with the abstract definitions given by Jacobs and Silva
(2014) and we may use some of their results implicitly. The intuition behind the
definitions is as follows. Closedness assures us that for each state we have a successor
state for each input. Consistency assures us that each state has at most one successor
for each input. Together it allows us to construct a well-defined minimal automaton
from the observations in the table.
The algorithm starts with a trivial observation table and tries to make it closed and
consistent by adding orbits of rows and columns, filling the table via membership
queries. When the table is closed and consistent it constructs a hypothesis automaton
and poses an equivalence query.
The pseudocode for the nominal version is the same as listed in Algorithm 5.1,
modulo the changes displayed in (5.1). However, we have to take care to ensure that
all manipulations and tests on the (possibly) infinite sets S, E and A terminate in finite
time. We refer to Bojańczyk, et al. (2014) and Pitts (2013) for the full details on how
to represent these structures and provide a brief sketch here. The sets S, E, A and S⋅A
can be represented by choosing a representative for each orbit. The function T in turn
can be represented by cells Ti,j : orb(si ) × orb(ej ) → 2 for each representative si and
ej . Note, however, that the product of two orbits may consist of several orbits, so that
Ti,j is not a single boolean value. Each cell is still orbit-finite and can be filled with
only finitely many membership queries. Similarly the curried function row can be
represented by a finite structure.
To check whether the table is closed, we observe that if we have a corresponding
row s ∈ S for some t ∈ S⋅A, this holds for any permutation of t. Hence it is enough
to check the following: For all representatives t ∈ S⋅A there is a representative s ∈ S
with row(t) = π ⋅ row(s) for some permutation π. Note that we only have to consider
finitely many permutations, since the support is finite and so we can decide this
property. Furthermore, if the property does not hold, we immediately find a witness
represented by t.
Consistency is a bit more complicated, but it is enough to consider the set of
inconsistencies, {(s1 , s2 , a, e) | row(s1 ) = row(s2 ) ∧ row(s1 a)(e) ≠ row(s2 a)(e)}. It is
an equivariant subset of S × S × A × E and so it is orbit-finite. Hence we can decide
emptiness and obtain representatives if it is non-empty.
Constructing the hypothesis happens in the same way as before (Section 1), where
we note the state space is orbit-finite since it is a quotient of S. Moreover, the function
row is equivariant, so all structure (Q0 , F and δ) is equivariant as well.
The representation given above is not the only way to represent nominal sets. For
example, first-order definable sets can be used as well (Klin & Szynwelski, 2016). From
now on we assume to have set theoretic primitives so that each line in Algorithm 5.1
is well defined.
3.1 Correctness
To prove correctness we only have to prove that the algorithm terminates, that is,
only finitely many hypotheses will be produced. Correctness follows trivially from
termination since the last step of the algorithm is an equivalence query to the teacher
inquiring whether an hypothesis automaton accepts the target language. We start out
by listing some facts about observation tables.
Lemma 3. The relation ∼ is an equivariant equivalence relation. Furthermore, for all
u, v ∈ S we have that u ≡ℒ v implies u ∼ v.
This lemma implies that at any stage of the algorithm the number of orbits of S/∼ does
not exceed the number of orbits of the minimal acceptor with state space A∗/≡ℒ (recall
that ≡ℒ is the nominal Myhill-Nerode equivalence relation). Moreover, the following
lemma shows that the dimension of the state space never exceeds the dimension of
the minimal acceptor. Recall that the dimension is the maximal size of the support of
any state, which is different than the number of orbits.
Lemma 4.
We have supp([u]∼ ) ⊆ supp([u]≡ℒ ) ⊆ supp(u) for all u ∈ S.
Lemma 5. The automaton constructed from a closed and consistent table is minimal.
Proof. Follows from the categorical perspective by Jacobs and Silva (2014).
We note that the constructed automaton is consistent with the table (we use that the
set S is prefix-closed and E is suffix-closed (Angluin, 1987)). The following lemma
shows that there are no strictly “smaller” automata consistent with the table. So the
automaton is not just minimal, it is minimal w.r.t. the table.
Lemma 6. Let H be the automaton associated with a closed and consistent table (S, E).
If M is an automaton consistent with (S, E) (meaning that se ∈ ℒ(M) ⟺ se ∈ ℒ(H)
for all s ∈ S ∪ S⋅A and e ∈ E) and M has at most as many orbits as H, then there is a
surjective map f : QM → QH . If moreover
M's dimension is bounded by the dimension of H, i.e., supp(m) ⊆ supp(f(m)) for
all m ∈ QM , and
M has no fewer local symmetries than H, i.e., π ⋅ f(m) = f(m) implies π ⋅ m = m
for all m ∈ QM ,
then f defines an isomorphism M ≅ H of nominal DFAs.
Proof. (All maps in this proof are equivariant.) Define a map row′ : QM → 2E by
restricting the language map QM → 2A∗ to E. First, observe that row′(δ′(q0 , s)) =
row(s) for all s ∈ S ∪ S⋅A, since ϵ ∈ E and M is consistent with the table. Second, we
have {row′(δ′(q0 , s)) | s ∈ S} ⊆ {row′(q) | q ∈ M}.
Let n be the number of orbits of H. The former set has n orbits by the first observation, the latter set has at most n orbits by assumption. We conclude that the two sets
(both being equivariant) must be equal. That means that for each q ∈ M there is an
s ∈ S such that row′(q) = row(s). We see that row′ : M → {row′(δ′(q0 , s)) | s ∈ S} = H
is a surjective map. Since a surjective map cannot increase the dimensions of orbits
and the dimensions of M are bounded, we note that the dimensions of the orbits in H
and M have to agree. Similarly, surjective maps preserve local symmetries. This map
must hence be an isomorphism of nominal sets. Note that row′(q) = row′(δ′(q0 , s))
implies q = δ′(q0 , s).
It remains to prove that it respects the automaton structures. It preserves the initial
state: row′(q0 ) = row(δ′(q0 , ϵ)) = row(ϵ). Now let q ∈ M be a state and s ∈ S such
that row′(q) = row(s). It preserves final states: q ∈ F′ ⟺ row′(q)(ϵ) = 1 ⟺
row(s)(ϵ) = 1. Finally, it preserves the transition structure:
row′(δ′(q, a)) = row′(δ′(δ′(q0 , s), a)) = row′(δ′(q0 , sa)) = row(sa) = δ(row(s), a)
The above proof is an adaptation of Angluin's proof for automata over sets. We will
now prove termination of the algorithm by proving that all steps are productive.
Theorem 7.
The algorithm terminates and is hence correct.
Proof. Provided that the if-statements and set operations terminate, we are left proving
that the algorithm adds (orbits of) rows and columns only finitely often. We start by
proving that a table can be made closed and consistent in finite time.
If the table is not closed, we find a row s1 ∈ S⋅A such that row(s1 ) ≠ row(s) for all
s ∈ S. The algorithm then adds the orbit containing s1 to S. Since s1 was nonequivalent
to all rows, we find that (S ∪ orb(s1 ))/∼ has strictly more orbits than S/∼. Since orbits of
S/∼ cannot be more than those of A∗/≡ℒ , this happens finitely often.
Columns are added in case of an inconsistency. Here the algorithm finds two
elements s1 , s2 ∈ S with row(s1 ) = row(s2 ) but row(s1 ae) ≠ row(s2 ae) for some a ∈ A
and e ∈ E. Adding ae to E will ensure that row′(s1 ) ≠ row′(s2 ) (row′ is the function
belonging to the updated observation table). If the two elements row′(s1 ), row′(s2 )
are in different orbits, the number of orbits is increased. If they are in the same orbit,
we have row′(s2 ) = π ⋅ row′(s1 ) for some permutation π. Using row(s1 ) = row(s2 )
and row′(s1 ) ≠ row′(s2 ) we have:
row(s1 ) = π ⋅ row(s1 )
row′(s1 ) ≠ π ⋅ row′(s1 )
Consider all such π and suppose there is a π and x ∈ supp(row(s1 )) such that π ⋅ x ∉
supp(row(s1 )). Then we find that π ⋅ x ∈ supp(row′(s1 )), and so the support of the
row has grown. By Lemma 4 this happens finitely often. Suppose such π and x do not
exist, then we consider the finite group R = {ρ|supp([s1 ]∼ ) | row(s1 ) = ρ ⋅ row(s1 )}. We
see that {ρ|supp([s1 ]∼ ) | row′(s1 ) = ρ ⋅ row′(s1 )} is a proper subgroup of R. So, adding
a column in this case decreases the size of the group R, which can happen only finitely
often. In this case a local symmetry is removed.
In short, the algorithm will succeed in producing a hypothesis in each round. It
remains to prove that it needs only finitely many equivalence queries.
Let (S, E) be the closed and consistent table and H its corresponding hypothesis.
If it is incorrect, then a second hypothesis H′ will be constructed which is consistent
with the old table (S, E). The two hypotheses are nonequivalent, as H′ will handle
the counterexample correctly and H does not. Therefore, H′ will have at least one
orbit more, one local symmetry less, or one orbit will have strictly bigger dimension
(Lemma 6), all of which can only happen finitely often.
We remark that all the lemmas and proofs as above are close to the original ones of
Angluin. However, two things are crucially different. First, adding a column does
not always increase the number of (orbits of) states. It can happen that by adding
a column a bigger support is found or that a local symmetry is broken. Second, the
new hypothesis does not necessarily have more states, again it might have bigger
dimensions or less local symmetries.
From the proof of Theorem 7 we observe moreover that the way we handle counterexamples is not crucial. Any other method which ensures a nonequivalent hypothesis
will work. In particular our algorithm is easily adapted to include optimisations
such as the ones by Maler and Pnueli (1995) and Rivest and Schapire (1993), where
counterexamples are added as columns.17
17
The additional optimisation of omitting the consistency check (Rivest & Schapire, 1993) cannot be done:
we always add a whole orbit to S (to keep the set equivariant) and inconsistencies can arise within an orbit.
[Figure 5.1 displays the example automaton, with states q0 , q1,x and q2,x,y for
distinct atoms x, y and transitions guarded by (in)equalities of atoms, together
with the three symbolic observation tables T1 , T2 and T3 . The cells of T2 and T3
contain case distinctions (e.g., 1 if a′ = a and 0 otherwise) specifying, per orbit,
which entries equal 1.]
Figure 5.1 Example automaton to be learnt and three subsequent tables computed by νL∗ . In the automaton, x, y, z denote distinct atoms.
3.2 Example
Consider the target automaton in Figure 5.1 and an observation table T1 at some stage
during the algorithm. We remind the reader that the table is represented in a symbolic
way: The sequences in the rows and columns stand for whole orbits and the cells
denote functions from the product of the orbits to 2. Since the cells can consist of
multiple orbits, where each orbit is allowed to have a different value, we use a formula
to specify which orbits have a 1.
The table T1 has to be checked for closedness and consistency. We note that it is
definitely closed. For consistency we check the rows row(ϵ) and row(a) which are
equal. Observe, however, that row(ϵb)(ϵ) = 0 and row(ab)(ϵ) = 1, so we have an
inconsistency. The algorithm adds the orbit orb(b) as column and extends the table,
obtaining T2 . We note that, in this process, the number of orbits did grow, as the two
rows are split. Furthermore, we see that both row(a) and row(ab) have empty support
in T1 , but not in T2 , because row(a)(a′) depends on a′ being equal or different from a,
similarly for row(ab)(a′).
The table T2 is still not consistent as we see that row(ab) = row(ba) but row(abb)(c) =
1 and row(bab)(c) = 0. Hence the algorithm adds the columns orb(bc), obtaining
table T3 . We note that in this case, no new orbits are obtained and no support has
grown. In fact, the only change here is that the local symmetry between row(ab) and
row(ba) is removed. This last table, T3 , is closed and consistent and will produce the
correct hypothesis.
3.3 Query Complexity
In this section, we will analyse the number of queries made by the algorithm in the
worst case. Let M be the minimal target automaton with n orbits and of dimension k.
We will use log in base two.
Lemma 8.
The number of equivalence queries En,k is 𝒪(nk log k).
Proof. By Lemma 6 each hypothesis will be either 1) bigger in the number of orbits,
which is bounded by n, or 2) bigger in the dimension of an orbit, which is bounded
by k or 3) smaller in local symmetries of an orbit. For the last part we want to know
how long a subgroup series of the permutation group Sk can be. This is bounded by
the length of a chain of divisors of k!, as the order of each subgroup divides the order of the group. We
can easily bound the length of a chain of divisors of any m by log m and so one can take a
subgroup at most log(k!) ≤ k log k times when starting with Sk .18
18
After publication we found a better bound by Cameron, et al. (1989): the length of the longest chain of
subgroups of Sk is ⌈3k/2⌉ − b(k) − 1, where b(k) is the number of ones in the binary representation of k. This
gives a linear bound in k, instead of the linearithmic bound.
Since the hypothesis will grow monotonically in the number of orbits and for each
orbit will grow monotonically w.r.t. the remaining two dimensions, the number of
equivalence queries is bounded by n + n(k + k log k).
Next we will give a bound for the size of the table.
Lemma 9. The table has at most n + mEn,k orbits in S with sequences of at most
length n+m, where m is the length of the longest counter example given by the teacher.
The table has at most n(k + k log k + 1) orbits in E, with sequences of length at most n(k + k log k + 1).
Proof. In the termination proof we noted that rows are added at most n times. In
addition (all prefixes of) counterexamples are added as rows, which adds another
mEn,k rows. Obviously counterexamples are of length at most m and are extended
at most n times, making the length at most m + n in the worst case.
For columns we note that one of three dimensions approaches a bound similarly
to the proof of Lemma 8. So at most n(k + k log k + 1) columns are added. Since they
are suffix closed, the length is at most n(k + k log k + 1).
Let p and l denote respectively the dimension and the number of orbits of A.
Lemma 10. The number of orbits in the lower part of the table, S⋅A, is bounded by
(n + mEn,k )lf𝔸 (p(n + m), p).
Proof. Any sequence in S is of length at most n + m, so it contains at most p(n + m)
distinct atoms. When we consider S⋅A, the extension can either reuse atoms from
those p(n + m), or none at all. Since the extra letter has at most p distinct atoms, the set
𝔸(p(n+m)) × 𝔸(p) gives a bound f𝔸 (p(n + m), p) for the number of orbits of OS × OA ,
with OX an orbit of X. Multiplying by the number of such ordered pairs, namely
(n + mEn,k )l, gives a bound for S⋅A.
Let Cn,k,m = (n+mEn,k )(lf𝔸 (p(n+m), p)+1)n(k+k log k+1) be the maximal number
of cells in the table. We note that this number is polynomial in k, l, m and n but it is
not polynomial in p.
Corollary 11.
The number of membership queries is bounded by
Cn,k,m f𝔸 (p(n + m), pn(k + k log k + 1)).
4 Learning Non-Deterministic Nominal Automata
In this section, we introduce a variant of νL∗ , which we call νNL∗ , where the learnt
automaton is non-deterministic. It will be based on the NL∗ algorithm by Bollig, et
al. (2009), an Angluin-style algorithm for learning NFAs. The algorithm is shown
in Algorithm 5.2. We first illustrate NL∗ , then we discuss its extension to nominal
automata.
NL∗ crucially relies on the use of residual finite-state automata (RFSA) (Denis, et
al., 2002), which are NFAs admitting unique minimal canonical representatives. The
states of this automaton correspond to Myhill-Nerode right-congruence classes, but
can be exponentially smaller than the corresponding minimal DFA: Composed states,
language-equivalent to sets of other states, can be dropped.
1   S, E ← {ϵ}
2   repeat
3       while (S, E) is not RFSA-closed or not RFSA-consistent do
4           if (S, E) is not RFSA-closed then
5               find s ∈ S, a ∈ A such that row(sa) ∈ PR(S, E) ∖ PR⊤(S, E)
6               S ← S ∪ {sa}
7           end if
8           if (S, E) is not RFSA-consistent then
9               find s1 , s2 ∈ S, a ∈ A, and e ∈ E such that
10                  row(s1 ) ⊑ row(s2 ) and ℒ(s1 ae) = 1, ℒ(s2 ae) = 0
11              E ← E ∪ {ae}
12          end if
13      end while
14      Make the conjecture N(S, E)
15      if the Teacher replies no, with a counter-example t then
16          E ← E ∪ suff(t)
17      end if
18  until the Teacher replies yes to the conjecture N(S, E)
19  return N(S, E)

Algorithm 5.2    Algorithm for learning NFAs by Bollig, et al. (2009).
The algorithm NL∗ equips the observation table (S, E) with a union operation, allowing
for the detection of composed and prime rows.
Definition 12. Let (row(s1 ) ⊔ row(s2 ))(e) = row(s1 )(e) ∨ row(s2 )(e) (regarding cells
as booleans). This operation induces an ordering between rows: row(s1 ) ⊑ row(s2 )
whenever row(s1 )(e) = 1 implies row(s2 )(e) = 1, for all e ∈ E.
A row row(s) is composed if row(s) = row(s1 ) ⊔ ⋯ ⊔ row(sn ), for row(si ) ≠ row(s).
Otherwise it is prime. We denote by PR⊤ (S, E) the rows in the top part of the table
(ranging over S) which are prime w.r.t. the whole table (not only w.r.t. the top part).
We write PR(S, E) for all the prime rows of (S, E).
As in L∗ , states of hypothesis automata will be rows of (S, E) but, as the aim is to
construct a minimal RFSA, only prime rows are picked. New notions of closedness
and consistency are introduced, to reflect features of RFSAs.
Definition 13. A table (S, E) is:
RFSA-closed if, for all t ∈ S⋅A, row(t) = ⨆{row(s) ∈ PR⊤ (S, E) | row(s) ⊑ row(t)};
RFSA-consistent if, for all s1 , s2 ∈ S and a ∈ A, row(s1 ) ⊑ row(s2 ) implies
row(s1 a) ⊑ row(s2 a).
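The following Haskell sketch (again ours, for a finite table) makes the join, the ordering ⊑, primality, and the RFSA-closedness test of Definition 13 concrete:

    import qualified Data.Map as Map

    -- A row assigns a boolean to every column (suffix) e ∈ E.
    type Row = Map.Map String Bool

    -- The join ⊔ of Definition 12: pointwise disjunction of two rows.
    join :: Row -> Row -> Row
    join = Map.unionWith (||)

    -- The ordering ⊑: r1 ⊑ r2 iff every 1-entry of r1 is a 1-entry of r2.
    below :: Row -> Row -> Bool
    below r1 r2 = and [ Map.findWithDefault False e r2 | (e, True) <- Map.toList r1 ]

    -- A row is composed if it equals the join of the rows strictly below
    -- it; otherwise it is prime.
    composed :: [Row] -> Row -> Bool
    composed table r =
      let smaller = [ r' | r' <- table, r' `below` r, r' /= r ]
      in  not (null smaller) && foldr1 join smaller == r

    prime :: [Row] -> Row -> Bool
    prime table = not . composed table

    -- RFSA-closedness: every lower row is the join of the prime upper
    -- rows below it (possibly including the row itself).
    rfsaClosed :: [Row] -> [Row] -> Bool
    rfsaClosed upper lower = all ok lower
      where
        allRows = upper ++ lower
        primesBelow r = [ p | p <- upper, prime allRows p, p `below` r ]
        ok r = foldr join (Map.map (const False) r) (primesBelow r) == r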
If (S, E) is not RFSA-closed, then there is a row in the bottom part of the table which is
prime, but not contained in the top part. This row is then added to S (line 5). If (S, E) is
not RFSA-consistent, then there is a suffix which does not preserve the containment of
two existing rows, so those rows are actually incomparable. A new column is added to
distinguish those rows (line 10). Notice that counterexamples supplied by the teacher
are added to columns (line 16). Indeed, it is shown by Bollig, et al. (2009) that treating
the counterexamples as in the original L , namely adding them to rows, does not lead
to a terminating algorithm.
Definition 14. Given a RFSA-closed and RFSA-consistent table (S, E), the conjecture
automaton is N(S, E) = (Q, Q0 , F, δ), where:
Q = PR⊤ (S, E);
Q0 = {r ∈ Q | r ⊑ row(ϵ)};
F = {r ∈ Q | r(ϵ) = 1};
the transition relation is given by δ(row(s), a) = {r ∈ Q | r ⊑ row(sa)}.
As observed by Bollig, et al. (2009), N(S, E) is not necessarily a RFSA, but it is a
canonical RFSA if it is consistent with (S, E). If the algorithm terminates, then N(S, E)
must be consistent with (S, E), which ensures correctness. The termination argument
is more involved than that of L∗ , but still it relies on the minimal DFA.
Developing an algorithm to learn nominal NFAs is not an obvious extension of
NL∗ : Non-deterministic nominal languages strictly contain nominal regular languages,
so it is not clear what the developed algorithm should be able to learn. To deal with
this, we introduce a nominal notion of RFSAs. They are a proper subclass of nominal
NFAs, because they recognise nominal regular languages. Nonetheless, they are more
succinct than nominal DFAs.
4.1 Nominal Residual Finite-State Automata
Let ℒ be a nominal language and u be a finite string. The derivative of ℒ w.r.t. u is
u⁻¹ℒ = {v ∈ A∗ | uv ∈ ℒ}.
A language ℒ′ ⊆ A∗ is a residual of ℒ if there is u with ℒ′ = u⁻¹ℒ. Note that a residual
might not be equivariant, but it does have a finite support. We write R(ℒ) for the set of
residuals of ℒ. Residuals form an orbit-finite nominal set: They are in bijection with
the state-space of the minimal nominal DFA for ℒ.
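For example, for the language ℒ1 = {aa, bb, cc, …} of Section 1.2 the residuals can be computed directly; note that a⁻¹ℒ1 = {a} is finitely supported (by {a}) but not equivariant:

    \epsilon^{-1}\mathcal{L}_1 = \mathcal{L}_1, \qquad
    a^{-1}\mathcal{L}_1 = \{a\}, \qquad
    (aa)^{-1}\mathcal{L}_1 = \{\epsilon\}, \qquad
    (ab)^{-1}\mathcal{L}_1 = \emptyset .

These four (orbits of) residuals correspond exactly to the states q0 , qa , q3 and q4 of 𝒜5 .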
A nominal residual finite-state automaton for ℒ is a nominal NFA whose states are
subsets of such minimal automaton. Given a state q of an automaton, we write ℒ(q)
for the set of words leading from q to a set of states containing a final one.
Definition 15. A nominal residual finite-state automaton (nominal RFSA) is a nominal
NFA 𝒜 such that ℒ(q) ∈ R(ℒ(𝒜)), for all q ∈ Q𝒜 .
Intuitively, all states of a nominal RFSA recognise residuals, but not all residuals are
recognised by a single state: There may be a residual ℒ′ and a set of states Q′ such
that ℒ′ = ⋃q∈Q′ ℒ(q), but no state q′ is such that ℒ(q′) = ℒ′. A residual ℒ′ is called
composed if it is equal to the union of the components it strictly contains, explicitly
ℒ′ = ⋃{ℒ″ ∈ R(ℒ) | ℒ″ ⊊ ℒ′};
otherwise it is called prime. In an ordinary RFSA, composed residuals have finitely
many components. This is not the case in a nominal RFSA. However, the set of components of ℒ′ always has a finite support, namely supp(ℒ′).
The set of prime residuals PR(ℒ) is an orbit-finite nominal set, and can be used
to define a canonical nominal RFSA for ℒ, which has the minimal number of states
and the maximal number of transitions. This can be regarded as obtained from the
minimal nominal DFA, by removing composed states and adding all initial states and
transitions that do not change the recognised language. This automaton is necessarily
unique.
Lemma 16. Let the canonical nominal RFSA of ℒ be (Q, Q0 , F, δ) such that:
Q = PR(ℒ);
Q0 = {ℒ′ ∈ Q | ℒ′ ⊆ ℒ};
F = {ℒ′ ∈ Q | ϵ ∈ ℒ′};
δ(ℒ1 , a) = {ℒ2 ∈ Q | ℒ2 ⊆ a⁻¹ℒ1 }.
It is a well-defined nominal NFA accepting ℒ.
4.2 νNL∗
Our nominal version of NL∗ again makes use of an observation table (S, E) where S
and E are equivariant subsets of A∗ and row is an equivariant function. As in the basic
algorithm, we equip (S, E) with a union operation ⊔ and row containment relation ⊑,
defined as in Definition 12. It is immediate to verify that ⊔ and ⊑ are equivariant.
Our algorithm is a simple modification of the algorithm in Algorithm 5.2, where a
few lines are replaced:
6    S ← S ∪ orb(sa)
11   E ← E ∪ orb(ae)
16   E ← E ∪ suff(orb(t))
Switching to nominal sets, several decidability issues arise. The most critical one
is that rows may be the union of infinitely many component rows, as happens for
residuals of nominal languages, so finding all such components can be challenging.
We adapt the notion of composed to rows: row(t) is composed whenever
row(t) = ⨆{row(s) | row(s) ⊏ row(t)},
where ⊏ is strict row inclusion; otherwise row(t) is prime.
We now check that three relevant parts of our algorithm terminate.
1. Row containment check. The basic containment check row(s) ⊑ row(t) is decidable, as row(s) and row(t) are supported by the finite supports of s and t respectively.
2. RFSA-Closedness and RFSA-Consistency Checks. (Line 3)
We first show that prime rows form orbit-finite nominal sets.
Lemma 17. PR(S, E), PR⊤ (S, E) and PR(S, E) ∖ PR⊤ (S, E) are orbit-finite nominal sets.
Consider now RFSA-closedness. It requires computing the set C(row(t)) of components of row(t) contained in PR⊤ (S, E) (possibly including row(t)). This may not be
equivariant under permutations Perm(𝔸), but it is if we pick a subgroup.
Lemma 18. The set C(row(t)) has the following properties:
supp(C(row(t))) ⊆ supp(row(t)).
it is equivariant and orbit-finite under the action of the group
Gt = {π ∈ Perm(𝔸) | π|supp(row(t)) = id}
of permutations fixing supp(row(t)).
We established that C(row(t)) can be effectively computed, and the same holds for
⨆ C(row(t)). In fact, ⨆ is equivariant w.r.t. the whole Perm(𝔸) and then, in particular,
w.r.t. Gt , so it preserves orbit-finiteness. Now, to check row(t) = ⨆ C(row(t)), we can
just pick one representative of every orbit of S⋅A, because we have C(π ⋅ row(t)) =
π ⋅ C(row(t)) and permutations distribute over ⊔, so permuting both sides of the
equation gives again a valid equation.
For RFSA-consistency, consider the two sets
N = {(s1 , s2 ) ∈ S × S | row(s1 ) ⊑ row(s2 )}, and
M = {(s1 , s2 ) ∈ S × S | ∀a ∈ A : row(s1 a) ⊑ row(s2 a)}.
They are both orbit-finite nominal sets, by equivariance of row, ⊑ and A. We can check
RFSA-consistency in finite time by picking orbit representatives from N and M. For
each representative n ∈ N, we look for a representative m ∈ M and a permutation π
such that n = π ⋅ m. If no such m and π exist, then n does not belong to any orbit of
M, so it violates RFSA-consistency.
3. Finding Witnesses for Violations. (Lines 5 and 10) We can find witnesses by
comparing orbit representatives of orbit-finite sets, as we did with RFSA-consistency.
Specifically, we can pick representatives in S × A and S × S × A × E and check them
against the following orbit-finite nominal sets:
{(s, a) ∈ S × A | row(sa) ∈ PR(S, E) ∖ PR⊤ (S, E)};
{(s1 , s2 , a, e) ∈ S × S × A × E | row(s1 a)(e) = 1, row(s2 a)(e) = 0, row(s1 ) ⊑ row(s2 )}.
4.3 Correctness
Now we prove correctness and termination of the algorithm. First, we prove that
hypothesis automata are nominal NFAs.
Lemma 19. The hypothesis automaton N(S, E) (see Definition 14) is a nominal NFA.
N(S, E), as in ordinary NL∗ , is not always a nominal RFSA. However, we have the
following.
Theorem 20. If the table (S, E) is RFSA-closed, RFSA-consistent and N(S, E) is consistent with (S, E), then N(S, E) is a canonical nominal RFSA.
This is proved by Bollig, et al. (2009) for ordinary RFSAs, using the standard theory
of regular languages. The nominal proof is exactly the same, using derivatives of
nominal regular languages and nominal RFSAs as defined in Section 4.1.
Lemma 21. The table (S, E) cannot have more than n orbits of distinct rows, where
n is the number of orbits of the minimal nominal DFA for the target language.
Proof. Rows are residuals of ℒ, which are states of the minimal nominal DFA for ℒ, so
orbits cannot be more than n.
Theorem 22. The algorithm νNL∗ terminates and returns the canonical nominal
RFSA for ℒ.
Proof. If the algorithm terminates, then it must return the canonical nominal RFSA
for ℒ by Theorem 20. We prove that a table can be made RFSA-closed and RFSA-consistent in finite time. This is similar to the proof of Theorem 7 and is inspired by
the proof of Theorem 2 of Bollig, et al. (2009).
If the table is not RFSA-closed, we find a row s ∈ S⋅A such that row(s) ∈ PR(S, E) ∖
PR⊤ (S, E). The algorithm then adds orb(s) to S. Since s was nonequivalent to all upper
prime rows, and thus from all the rows indexed by S, we find that (S ∪ orb(s))/∼ has
strictly more orbits than S/∼ (recall that s ∼ t ⟺ row(s) = row(t)). This addition
can only be done finitely many times, because the number of orbits of S/∼ is bounded,
by Lemma 21.
Now, the case of RFSA-consistency needs some additional notions. Let R be the
(orbit-finite) nominal set of all rows, and let I = {(r, r′) ∈ R × R | r ⊏ r′} be the set of all
inclusion relations among rows. The set I is orbit-finite. In fact, consider
J = {(s, t) ∈ (S ∪ S⋅A) × (S ∪ S⋅A) | row(s) ⊏ row(t)}.
This set is an equivariant, thus orbit-finite, subset of (S ∪ S⋅A) × (S ∪ S⋅A). The set I is
the image of J via row × row, which is equivariant, so it preserves orbit-finiteness.
Now, suppose the algorithm finds two elements s1 , s2 ∈ S with row(s1 ) ⊑ row(s2 )
but row(s1 a)(e) = 1 and row(s2 a)(e) = 0 for some a ∈ A and e ∈ E. Adding a column
to fix RFSA-consistency may: C1) increase orbits of (S ∪ S⋅A)/∼, or; C2) decrease orbits
of I, or; C3) decrease local symmetries/increase dimension of one orbit of rows. In
fact, if no new rows are added (C1), we have two cases.
If row(s1 ) ⊏ row(s2 ), i.e., (row(s1 ), row(s2 )) ∈ I, then row (s1 ) ̸⊏ row (s2 ), where
row is the new table. Therefore the orbit of (row (s1 ), row (s2 )) is not in I. Moreover, row (s) ⊏ row (t) implies row(s) ⊏ row(t) (as no new rows are added), so
no new pairs are added to I. Overall, I has less orbits (C2).
If row(s1 ) = row(s2 ), then we must have row(s1 ) = π ⋅ row(s1 ), for some π, because
lines 47 forbids equal rows in different orbits. In this case row (s1 ) ≠ π ⋅ row (s1 )
and we can use part of the proof of Theorem 7 to see that the orbit of row (s1 ) has
bigger dimension or less local symmetries than that of row(s1 ) (C3).
Orbits of (S ∪ S⋅A)/≡ and of I are finitely many, by Lemma 21 and what we proved above. Moreover, local symmetries can decrease finitely many times, and the dimension of each orbit of rows is bounded by the dimension of the minimal DFA state space. Therefore all the above changes can happen finitely many times.
We have proved that the table eventually becomes RFSA-closed and RFSA-consistent.
Now we prove that a finite number of equivalence queries is needed to reach the final
hypothesis automaton. To do this, we cannot use a suitable version of Lemma 6, because this relies on N(S, E) being consistent with (S, E), which in general is not true
(see (Bollig, et al., 2008) for an example of this). We can, however, use an argument
similar to that for RFSA-consistency, because the algorithm adds columns in response
to counterexamples. Let w be the counterexample provided by the teacher. When line 16
is executed, the table must change. In fact, by Lemma 2 of Bollig, et al. (2009), if it
does not, then w is already correctly classified by N(S, E), which is absurd. We have
the following cases. E1) The orbits of (S ∪ S⋅A)/≡ increase (C1). Or, E2) one of the following happens: the orbits in PR(S, E) increase, the orbits in I decrease (C2), or the local symmetries/dimension of an orbit of rows change (C3). In fact, if E1 does not happen
and PR(S, E), I and local symmetries/dimension of orbits of rows do not change, the
automaton 𝒜 for the new table coincides with N(S, E). But N(S, E) = 𝒜 is a contradiction, because 𝒜 correctly classifies w (by Lemma 2 of Bollig, et al. (2009), as w
now belongs to columns), whereas N(S, E) does not. Both E1 and E2 can only happen
finitely many times.
4.4 Query Complexity
We now give bounds for the number of equivalence and membership queries needed
by νNL*. Let n be the number of orbits of the minimal DFA M for the target language
and let k be the dimension (i.e., the size of the maximum support) of its nominal set
of states.
Lemma 23. The number of equivalence queries En,k is O(n² f𝔸(k, k) + nk log k).
Proof. In the proof of Theorem 22, we saw that equivalence queries lead to more orbits in (S ∪ S⋅A)/≡ or in PR(S, E), fewer orbits in I, or fewer local symmetries/bigger dimension for an orbit. Clearly the first two can happen at most n times. We now estimate how many times I can decrease. Suppose (S ∪ S⋅A)/≡ has d orbits and o orbits are added to it. Recall that, given an orbit O of rows of dimension at most m, f𝔸(m, m) is an upper bound for the number of orbits in the product O × O. Since the support of rows is bounded by k, we can give a bound for the number of orbits added to I: d·o·f𝔸(k, k), for new pairs r ⊏ r′ with r in a new orbit of rows and r′ in an old one (or vice versa); plus (o(o − 1)/2)f𝔸(k, k), for r and r′ both in (distinct) new orbits; plus o·f𝔸(k, k), for r and r′ in the same new orbit. Notice that, if PR(S, E) grows but (S ∪ S⋅A)/≡ does not, I does not increase. By Lemma 21, o, d ≤ n, so I cannot decrease more than (n² + n(n − 1)/2 + n)f𝔸(k, k) times.
Local symmetries of an orbit of rows can decrease at most k log k times (see the proof of Lemma 8), and its dimension can increase at most k times. Therefore n(k log k + k) is a bound for all the orbits of rows, which are at most n, by Lemma 21. Summing up,
we get the main result.
Lemma 24. Let m be the length of the longest counterexample given by the teacher.
Then the table has:
at most n orbits in S, with words of length at most n;
at most mEn,k orbits in E, with words of length at most mEn,k .
Proof. By Lemma 21, the number of orbits of rows indexed by S is at most n. Now,
notice that line 5 does not add orb(sa) to S if sa ∈ S, and lines 16 and 11 cannot identify
rows, so S has at most n orbits. The length of the longest word in S must be at most
n, as S = {ϵ} when the algorithm starts, and line 6 adds words that are one symbol longer than those already in S.
For columns, we note that both fixing RFSA-consistency and adding counterexamples increase the number of columns, but this can happen at most En,k times (see
proof of Lemma 23). Each time at most m suffixes are added to E.
We compute the maximum number of cells as in Section 3.3.
Lemma 25. The number of orbits in the lower part of the table, S⋅A, is bounded by
nlf𝔸 (pn, p).
Then Cn,k,m = n(lf𝔸 (pn, p) + 1)mEn,k is the maximal number of cells in the table.
This bound is polynomial in n, m and l, but not in k and p.
Corollary 26. The number of membership queries is bounded by Cn,k,m f𝔸(pn, pmEn,k).
5 Implementation and Preliminary Experiments
Our algorithms for learning nominal automata operate on infinite sets of rows and
columns, and hence it is not immediately clear how to actually implement them on
a computer. We have used NLambda, a recently developed Haskell library by Klin and Szynwelski (2016) designed to allow direct manipulation of infinite (but orbit-finite) nominal sets, within the functional programming paradigm. The semantics of NLambda is based on the work of Bojańczyk, et al. (2012), and the library itself is inspired
by Fresh OCaml by Shinwell (2006), a language for functional programming over
nominal data structures with binding.
5.1 NLambda
NLambda extends Haskell with a new type Atoms. Values of this type are atomic
values that can be compared for equality and have no other discernible structure. They
correspond to the elements of the infinite alphabet 𝔸 described in Section 2.
Furthermore, NLambda provides a unary type constructor Set. This appears
similar to the Data.Set type constructor from the standard Haskell library, but its
semantics is markedly different: Whereas the latter is used to construct finite sets, the
former has orbit-finite sets as values. The new constructor Set can be applied to a range
of equality types that include Atoms, but also the tuple type (Atoms, Atoms), the list
type [Atoms], the set type Set Atoms, and other types that provide basic infrastructure
necessary to speak of supports and orbits. All these are instances of a type class
NominalType specified in NLambda for this purpose.
NLambda, in addition to all the standard machinery of Haskell, offers primitives
to manipulate values of any nominal types τ, σ:
empty : Set τ, returns the empty set of any type;
atoms : Set Atoms, returns the (infinite but single-orbit) set of all atoms;
insert : τ → Set τ → Set τ, adds an element to a set;
map : (τ → σ) → (Set τ → Set σ), applies a function to every element of a set;
sum : Set (Set τ) → Set τ, computes the union of a family of sets;
isEmpty : Set τ → Formula, checks whether a set is empty.
The type Formula has the role of a Boolean type. For technical reasons, it is distinct
from the standard Haskell type Bool, but it provides standard logical operations, e.g.,
not : Formula → Formula,
or : Formula → Formula → Formula,
as well as a conditional operator ite : Formula → τ → τ → τ that mimics the
standard if-then-else construction. It is also the result type of a built-in equality
test on atoms:
eq : Atoms → Atoms → Formula.
Using these primitives, one builds more functions to operate on orbit-finite sets, such
as a function to build singleton sets:
singleton : τ → Set τ
singleton x = insert x empty
or a filtering function to select elements that satisfy a given predicate:
filter : (τ → Formula) → Set τ → Set τ
filter p s = sum (map (λx. ite (p x) (singleton x) empty) s)
or functions to quantify a predicate over a set:
exists, forall : (τ → Formula) → Set τ → Formula
exists p s = not (isEmpty (filter p s))
forall p s = isEmpty (filter (λx. not (p x)) s)
and so on. Note that these functions are written in exactly the same way as they would
be for finite sets and the standard Data.Set type. This is not an accident, and indeed
the programmer can use the convenient set-theoretic intuition of NLambda primitives.
For example, one could conveniently construct various orbit-finite sets such as the set
of all pairs of atoms:
atomPairs = sum (map (λx. map (λy. (x, y)) atoms) atoms),
the set of all pairs of distinct atoms:
distPairs = filter (λ(x, y). not (eq x y)) atomPairs
and so on.
It should be stressed that all these constructions terminate in finite time, even
though they formally involve infinite sets. To achieve this, values of orbit-finite set
types Set τ are internally not represented as lists or trees of elements of type τ. Instead, they are stored and manipulated symbolically, using first-order formulas over
variables that range over atom values. For example, the value of distPairs above is
stored as the formal expression:
{(a, b) | a, b ∈ 𝔸, a ≠ b}
or, more specifically, as a triple:
a pair (a, b) of “atom variables”,
a list [a, b] of those atom variables that are bound in the expression (in this case,
the expression contains no free variables),
a formula a ≠ b over atom variables.
All the primitives listed above, such as isEmpty, map and sum, are implemented on
this internal representation. In some cases, this involves checking the satisfiability of
certain formulas over atoms. In the current implementation of NLambda, an external
SMT solver Z3 (de Moura & Bjørner, 2008) is used for that purpose. For example, to
evaluate the expression isEmpty distPairs, NLambda makes a system call to the
SMT solver to check whether the formula a ≠ b is satisfiable in the first-order theory
of equality and, after receiving the affirmative answer, returns the value False.
For more details about the semantics and implementation of NLambda, see Klin and Szynwelski (2016). The library itself can be downloaded from https://www.mimuw.edu.pl/~szynwelski/nlambda/.
5.2 Implementation of νL* and νNL*
Using NLambda we implemented the algorithms from Sections 3 and 4. We note that
the internal representation is slightly different than the one discussed in Section 3.
Instead of representing the table (S, E) with actual representatives of orbits, the sets
are represented logically as described above. Furthermore, the control flow of the
algorithm is adapted to fit in the functional programming paradigm. In particular,
recursion is used instead of a while loop. In addition to the nominal adaptation of Angluin's algorithm, νL*, we implemented a variant, νL*col, which adds counterexamples to the columns instead of the rows.
Target automata are defined using NLambda as well, using the automaton data
type provided by the library. Membership queries are already implemented by the
library. Equivalence queries are implemented by constructing a bisimulation (recall
that bisimulation implies language equivalence), where a counterexample is obtained
when two DFAs are not bisimilar. For nominal NFAs, however, we cannot implement
a complete equivalence query as their language equivalence is undecidable. We
approximated the equivalence by bounding the depth of the bisimulation for nominal
NFAs. As an optimisation, we use bisimulation up to congruence as described by
Bonchi and Pous (2015). Having an approximate teacher is a minor issue since in
many applications no complete teacher can be implemented and one relies on testing
(Aarts, et al., 2015 and Bollig, et al., 2013). For the experiments listed here the bound
was chosen large enough for the learner to terminate with the correct automaton.
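To make the bounded check concrete, here is a small self-contained Haskell sketch of the idea (ours, not the NLambda code: it works on ordinary finite NFAs and assumes the alphabet is given as a finite list of representatives). It searches for a word of length at most d separating two NFAs by stepping through their determinised state spaces in lockstep:

import qualified Data.Set as Set
import Data.Maybe (listToMaybe)

-- Sketch: look for a word of length at most d distinguishing the macro-states
-- x and y of two NFAs with transition functions d1, d2 and final-state
-- predicates f1, f2. Returns Just w for a counterexample w, otherwise Nothing.
boundedEq :: (Ord q, Ord r) => Int -> [a]
          -> (q -> a -> Set.Set q) -> (q -> Bool)
          -> (r -> a -> Set.Set r) -> (r -> Bool)
          -> Set.Set q -> Set.Set r -> Maybe [a]
boundedEq d as d1 f1 d2 f2 x y
  | any f1 (Set.toList x) /= any f2 (Set.toList y) = Just []  -- disagree on the empty word
  | d == 0    = Nothing                                       -- depth bound reached
  | otherwise = listToMaybe
      [ a : w
      | a <- as
      , Just w <- [boundedEq (d - 1) as d1 f1 d2 f2 (step d1 x a) (step d2 y a)] ]
  where
    step delta s a = Set.unions [ delta q a | q <- Set.toList s ]

The actual implementation differs in that it works on orbit-finite nominal sets, represents the alphabet symbolically, and builds the bisimulation relation up to congruence instead of exploring word by word.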
The code can be found at https://github.com/Jaxan/nominal-lstar.
5.3 Test Cases
To provide a benchmark for future improvements, we tested our algorithms on simple
automata described below. We report results in Table 5.1. The experiments were
performed on a machine with an Intel Core i5 (Skylake, 2.4 GHz) and 8 GB RAM.
Model | DFA (orbits, dim) | νL* (s) | νL*col (s) | RFSA (orbits, dim) | νNL* (s)
FIFO0 | 2, 0 | 1.9 | 1.9 | 2, 0 | 2.4
FIFO1 | 3, 1 | 12.9 | 7.4 | 3, 1 | 17.3
FIFO2 | 5, 2 | 45.6 | 22.6 | 5, 2 | 70.3
FIFO3 | 10, 3 | 189 | 107 | 10, 3 | 476
FIFO4 | 25, 4 | 370 | 267 | 25, 4 | 1230
FIFO5 | 77, 5 | 1337 | 697 | ∞ | ∞
ℒ0 | 2, 0 | 1.3 | 1.4 | 2, 0 | 1.4
ℒ1 | 4, 1 | 29.6 | 4.7 | 4, 1 | 8.9
ℒ2 | 7, 2 | 229 | 23.1 | 7, 2 | 84.7
ℒ′0 | 3, 1 | 4.4 | 4.9 | 3, 1 | 11.3
ℒ′1 | 5, 1 | 15.4 | 15.4 | 4, 1 | 66.4
ℒ′2 | 9, 1 | 46.3 | 40.5 | 5, 1 | 210
ℒ′3 | 17, 1 | 89.0 | 66.8 | 6, 1 | 566
ℒeq | n/a | n/a | n/a | 3, 1 | 16.3

Table 5.1 Results of experiments. The column DFA (resp. RFSA) shows the number of orbits and the dimension of the learnt minimal DFA (resp. canonical RFSA). We use ∞ when the running time is too high.
Queue Data Structure. A queue is a data structure to store elements which can later
be retrieved in a first-in, first-out order. It has two operations: push and pop. We define
the alphabet ΣFIFO = {push(a), pop(a) | a ∈ 𝔸}. The language FIFOn contains all valid
traces of push and pop using a bounded queue of size n. The minimal nominal DFA
for FIFO2 is given in Figure 5.2.
The state reached from q1,x via push(x) is omitted: Its outgoing transitions are
those of q2,x,y , where y is replaced by x. Similar benchmarks appear in (Aarts, et al.,
2015 and Isberner, et al., 2014).
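To make the semantics of FIFOn concrete, here is a small Haskell sketch (ours, not part of the benchmark code; we assume that pop(a) is valid exactly when a is the value leaving the queue): a trace is in FIFOn precisely when it can be executed by a queue of capacity n.

-- Sketch: a word over {push(a), pop(a)} is in FIFO_n iff it can be executed
-- by a first-in first-out queue holding at most n elements.
data Op a = Push a | Pop a

validFifo :: Eq a => Int -> [Op a] -> Bool
validFifo n = go []
  where
    go _ [] = True
    go q (Push x : ops) | length q < n = go (q ++ [x]) ops  -- enqueue if not full
    go (y : q) (Pop x : ops) | x == y = go q ops            -- dequeue must match the head
    go _ _ = False                                          -- overflow, underflow or mismatch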
Double Word. ℒn = {ww | w ∈ 𝔸^n} from Section 1.
[Figure 5.2 (diagram): A nominal automaton accepting FIFO2, with states q0, q1,x, q2,x,y and transitions labelled push(x), push(y), push(𝔸), pop(x), pop(≠ x) and pop(𝔸).]
NFA. Consider the language ℒeq = ⋃_{a∈𝔸} 𝔸*a𝔸*a𝔸* of words where some letter appears twice. This is accepted by an NFA which guesses the position of the first occurrence of a repeated letter a and then waits for the second a to appear. The language is not accepted by a DFA (Bojańczyk, et al., 2014). Despite this, νNL* is able to learn the automaton shown in Figure 5.3.
[Figure 5.3 (diagram): A nominal NFA accepting ℒeq, with states q0, q1,x and q2. Here, the transition from q2 to q1,x is defined as δ(q2, a) = {q1,b | b ∈ 𝔸}.]
𝐧-last Position. A prototypical example of regular languages which are accepted by very small NFAs is the set of words where a distinguished symbol a appears on the n-last position (Bollig, et al., 2009). We define a similar nominal language ℒ′n = ⋃_{a∈𝔸} a𝔸*a𝔸^n. To accept such words non-deterministically, one simply guesses the n-last position. This language is also accepted by a much larger deterministic automaton.
6 Related Work
This section compares νL* with other algorithms from the literature. We stress that no comparison is possible for νNL*, as it is the first learning algorithm for non-deterministic
automata over infinite alphabets.
The first one to consider learning automata over infinite alphabets was Sakamoto (1997). In his work the problem is reduced to L* with some finite sub-alphabet. The sub-alphabet grows in stages and L* is rerun at every stage, until the alphabet is big enough to capture the whole language. In Sakamoto's approach, any learning algorithm can be used as a back-end. This, however, comes at a cost: It has to be rerun at every stage, and each symbol is treated in isolation, which might require more queries. Our algorithm νL*, instead, works with the whole alphabet from the very start, and it exploits its symmetry. An example is in Sections 1.1 and 1.2: The ordinary learner uses four equivalence queries, whereas the nominal one, using the symmetry, only needs three. Moreover, our algorithm is easier to generalise to other alphabets and computational models, such as non-determinism.
More recently, papers appeared on learning register automata by Cassel, et al. (2016) and Howar, et al. (2012). Their register automata are as expressive as our deterministic nominal automata. The state space is similar to our orbit-wise representation: It is formed by finitely many locations with registers. Transitions are defined symbolically using propositional logic. We remark that the most recent paper by Cassel, et al. (2016) generalises the algorithm to alphabets with different structures (which correspond to different atom symmetries in our work), but at the cost of changing Angluin's framework. Instead of membership queries, the algorithm requires more sophisticated tree queries. In our approach, using a different symmetry affects neither the algorithm nor its correctness proof. Tree queries can be reduced to membership queries by enumerating all n-types for some n (n-types in logic correspond to orbits in the set of n-tuples). Keeping that in mind, their complexity results are roughly the same as ours, although this is hard to verify, as they do not give bounds on the length of individual tree queries. Finally, our approach lends itself better to being extended to other variations on L* (of which many exist), as it is closer to Angluin's original work.
Another class of learning algorithms for systems with large alphabets is based
on abstraction and refinement, which is orthogonal to the approach in this thesis; connections and possible transference of techniques are worth exploring in the
future. Aarts, et al. (2015) reduce the alphabet to a finite alphabet of abstractions, and
L* for ordinary DFAs over such a finite alphabet is used. Abstractions are refined by
counterexamples. Other similar approaches are by Howar, et al. (2011) and Isberner,
et al. (2013), where global and local per-state abstractions of the alphabet are used,
and by Mens (2017), where the alphabet can also have additional structure (e.g., an
ordering relation). We also mention that Botincan and Babic (2013) give a framework
for learning symbolic models of software behaviour.
Berg, et al. (2006 and 2008) cope with an infinite alphabet by running L* (adapted
to Mealy machines) using a finite approximation of the alphabet, which may be
augmented when equivalence queries are answered. A smaller symbolic model is
derived subsequently. Their approach, unlike ours, does not exploit the symmetry
over the full alphabet. The symmetry allows our algorithm to reduce queries and to
produce the smallest possible automaton at every step.
Finally we compare with results on session automata (Bollig, et al., 2013). Session
automata are defined over finite alphabets just like the work by Sakamoto. However,
session automata are more restrictive than deterministic nominal automata. For example, the model cannot capture an acceptor for the language of words where consecutive
data values are distinct. This language can be accepted by a three-orbit nominal DFA,
which can be learned by our algorithm.
We implemented our algorithms in the nominal library NLambda as sketched
before. Other implementation options include Fresh OCaml (Shinwell, 2006), a functional programming language designed for programming over nominal data structures with binding, and Lois by Kopczyński and Toruńczyk (2016 and 2017), a C++
library for imperative nominal programming. We chose NLambda for its convenient
set-theoretic primitives, but the other options remain to be explored, in particular the
low-level Lois could be expected to provide more efficient implementations.
7 Discussion and Future Work
In this chapter we defined and implemented extensions of several versions of L* and of NL* for nominal automata.
We highlight two features of our approach:
It has strong theoretical foundations: The theory of nominal languages, covering different alphabets and symmetries (see Section 2.1); category theory, where nominal
automata have been characterised as coalgebras (Ciancia & Montanari, 2010 and
Kozen, et al., 2015) and many properties and algorithms (e.g., minimisation) have
been studied at this abstract level.
It follows a generic pattern for transporting computation models and algorithms
from finite sets to nominal sets, which leads to simple correctness proofs.
These features pave the way to several extensions and improvements.
Future work includes a general version of νNL*, parametric in the notion of side-effect (an example is non-determinism). Different notions will yield models with different degrees of succinctness w.r.t. deterministic automata. The key observation
here is that many forms of non-determinism and other side effects can be captured via
the categorical notion of monad, i.e., an algebraic structure, on the state-space. Monads
allow generalising the notion of composed and prime state: A state is composed
whenever it is obtained from other states via an algebraic operation. Our algorithm
νNL* is based on the powerset monad, representing classical non-determinism. We
are currently investigating a substitution monad, where the operation is “applying a
(possibly non-injective) substitution of atoms in the support”. A minimal automaton
over this monad, akin to a RFSA, will have states that can generate all the states of the
associated minimal DFA via a substitution, but cannot be generated by other states
(they are prime). For instance, we can give an automaton over the substitution monad
that recognises ℒ2 from Section 1:
[Diagram: an automaton over the substitution monad, with states q0, qx, qxy, qy, q1 and q2; its edges carry labels such as x, y, ≠x, ≠y and 𝔸, and one transition into qxy is labelled x, [y ↦ x].]
Here [y ↦ x] means that, if that transition is taken, qxy (hence its language) is subject
to y ↦ x. In general, the size of the minimal DFA for ℒn grows more than exponentially
with n, but an automaton with substitutions on transitions, like the one above, only
needs 𝒪(n) states. This direction is investigated in Chapter 7.
In principle, thanks to the generic approach we have taken, all our algorithms
should work for various kinds of atoms with more structure than just equality, as
advocated by Bojańczyk, et al. (2014). Details, such as precise assumptions on the
underlying structure of atoms necessary for proofs to go through, remain to be checked.
In the next chapter (Chapter 6), we investigate learning with the total order symmetry.
We implement this in NLambda, as well as a new tool for computing with nominal
sets over the total order symmetry.
The efficiency of our current implementation, as measured in Section 5.3, leaves
much to be desired. There is plenty of potential for running-time optimisation, ranging from improvements in the learning algorithms themselves, to optimisations in the NLambda library (such as replacing the external and general-purpose SMT solver with a purpose-built, internal one, or a tighter integration of nominal mechanisms with the underlying Haskell language as it was done by Shinwell, 2006), to giving up the functional programming paradigm for an imperative language such as Lois (Kopczyński & Toruńczyk, 2016 and 2017).
Acknowledgements
We thank Frits Vaandrager and Gerco van Heerdt for useful comments and discussions.
We also thank the anonymous reviewers.
Chapter 6
Fast Computations on Ordered Nominal Sets
David Venhoek
Radboud University
Joshua Moerman
Radboud University
Jurriaan Rot
Radboud University
Abstract
We show how to compute efficiently with nominal sets over the total order symmetry by developing a direct representation of such nominal sets
and basic constructions thereon. In contrast to previous approaches, we
work directly at the level of orbits, which allows for an accurate complexity analysis. The approach is implemented as the library Ons (Ordered
Nominal Sets).
Our main motivation is nominal automata, which are models for
recognising languages over infinite alphabets. We evaluate Ons in two
applications: minimisation of automata and active automata learning.
In both cases, Ons is competitive compared to existing implementations
and outperforms them for certain classes of inputs.
This chapter is based on the following publication:
Venhoek, D., Moerman, J., & Rot, J. (2018). Fast Computations on Ordered Nominal Sets. In Theoretical Aspects of Computing - ICTAC - 15th International Colloquium,
Proceedings. Springer. doi:10.1007/978-3-030-02508-3_26
Automata over infinite alphabets are natural models for programs with unbounded
data domains. Such automata, often formalised as register automata, are applied in
modelling and analysis of communication protocols, hardware, and software systems
(see Bojańczyk, et al., 2014; D'Antoni & Veanes, 2017; Grigore & Tzevelekos, 2016;
Kaminski & Francez, 1994; Montanari & Pistore, 1997; Segoufin, 2006 and references
therein). Typical infinite alphabets include sequence numbers, timestamps, and identifiers. This means one can model data flow in such automata besides the basic control
flow provided by ordinary automata. Recently, it has been shown in a series of papers
that such models are amenable to learning (Aarts, et al., 2015; Bollig, et al., 2013;
Cassel, et al., 2016; Drews & D'Antoni, 2017; Moerman, et al., 2017; Vaandrager, 2017)
with the verification of (closed source) TCP implementations by Fiterău-Broștean, et
al. (2016) as a prominent example.
A foundational approach to infinite alphabets is provided by the notion of nominal
set, originally introduced in computer science as an elegant formalism for name binding (Gabbay & Pitts, 2002 and Pitts, 2016). Nominal sets have been used in a variety
of applications in semantics, computation, and concurrency theory (see Pitts, 2013 for
an overview). Bojańczyk, et al. (2014) introduce nominal automata, which allow one to
model languages over infinite alphabets with different symmetries. Their results are
parametric in the structure of the data values. Important examples of data domains
are ordered data values (e.g., timestamps) and data values that can only be compared
for equality (e.g., identifiers). In both data domains, nominal automata and register
automata are equally expressive.
Important for applications of nominal sets and automata are implementations. A
couple of tools exist to compute with nominal sets. Notably, Nλ (Klin & Szynwelski,
2016) and Lois (Kopczyński & Toruńczyk, 2016 and 2017) provide a general purpose
programming language to manipulate infinite sets.19 Both tools are based on SMT
solvers and use logical formulas to represent the infinite sets. These implementations
are very flexible, and the SMT solver does most of the heavy lifting, which makes the
implementations themselves relatively straightforward. Unfortunately, this comes at a
cost as SMT solving is in general Pspace-hard. Since the formulas used to describe sets
tend to grow as more calculations are done, running times can become unpredictable.
In this chapter, we use a direct representation based on symmetries and orbits, to
represent nominal sets. We focus on the total order symmetry, where data values are
rational numbers and can be compared for their order. Nominal automata over the
total order symmetry are more expressive than automata over the equality symmetry
(i.e., traditional register automata of Kaminski & Francez, 1994). A key insight is
that the representation of nominal sets from Bojańczyk, et al. (2014) becomes rather
simple in the total order symmetry; each orbit is represented solely by a natural number,
intuitively representing the number of variables or registers.
19
Other implementations of nominal techniques that are less directly related to our setting (Mihda, Fresh
OCaml, and Nominal Isabelle) are discussed in Section 5.
Our main contributions include the following.
We develop the representation theory of nominal sets over the total order symmetry.
We give concrete representations of nominal sets, their products, and equivariant
maps.
We provide time complexity bounds for operations on nominal sets such as intersections and membership. Using those results we give the time complexity of Moore's
minimisation algorithm (generalised to nominal automata) and prove that it is
polynomial in the number of orbits.
Using the representation theory, we are able to implement nominal sets in a C++
library Ons. The library includes all the results from the representation theory
(sets, products, and maps).
We evaluate the performance of Ons and compare it to Nλ and Lois, using two
algorithms on nominal automata: minimisation (Bojańczyk & Lasota, 2012) and
automata learning (Moerman, et al., 2017). We use randomly generated automata
as well as concrete, logically structured models such as FIFO queues. For random
automata, our methods are drastically faster than the other tools. On the other
hand, Lois and Nλ are faster in minimising the structured automata as they exploit
their logical structure. In automata learning, the logical structure is not available
a-priori, and Ons is faster in most cases.
The structure of this chapter is as follows. Section 1 contains background on nominal sets and their representation. Section 2 describes the concrete representation of
nominal sets, equivariant maps and products in the total order symmetry. Section 3
describes the implementation Ons with complexity results, and Section 4 the evaluation of Ons on algorithms for nominal automata. Related work is discussed in
Section 5, and future work in Section 6.
1 Nominal sets
Nominal sets are infinite sets that carry certain symmetries, allowing a finite representation in many interesting cases. We recall their formalisation in terms of group
actions, following Bojańczyk, et al. (2014) and Pitts (2013), to which we refer for an
extensive introduction.
1.1 Group actions
Let G be a group and X be a set. A (left) G-action is a function ⋅ : G × X → X satisfying 1 ⋅ x = x and (hg) ⋅ x = h ⋅ (g ⋅ x) for all x ∈ X and g, h ∈ G. A set X with a G-action is
called a G-set and we often write gx instead of g ⋅ x. The orbit of an element x ∈ X is
the set {gx | g ∈ G}. A G-set is always a disjoint union of its orbits (in other words, the
orbits partition the set). We say that X is orbit-finite if it has finitely many orbits, and
we denote the number of orbits by N(X).
A map f : X → Y between G-sets is called equivariant if it preserves the group action,
i.e., for all x ∈ X and g ∈ G we have g ⋅ f(x) = f(g ⋅ x). If an equivariant map f is
bijective, then f is an isomorphism and we write X ≅ Y. A subset Y ⊆ X is equivariant
if the corresponding inclusion map is equivariant. The product of two G-sets X and
Y is given by the Cartesian product X × Y with the point-wise group action on it, i.e.,
g(x, y) = (gx, gy). Union and intersection of X and Y are well-defined if the two actions
agree on their common elements.
1.2 Nominal sets
A data symmetry is a pair (𝒟, G) where 𝒟 is a set and G is a subgroup of Sym(𝒟),
the group of bijections on 𝒟. Note that the group G naturally acts on 𝒟 by defining
gx = g(x). In the most studied instance, called the equality symmetry, 𝒟 is a countably
infinite set and G = Sym(𝒟). In this chapter, we focus on the total order symmetry given
by 𝒟 = ℚ and G = {π | π ∈ Sym(ℚ), π is monotone}.
Let (𝒟, G) be a data symmetry and X be a G-set. A set of data values S ⊆ 𝒟 is
called a support of an element x ∈ X if for all g ∈ G with ∀s ∈ S : gs = s we have gx = x.
A G-set X is called nominal if every element x ∈ X has a finite support.
Example 1. We list several examples for the total order symmetry. The set ℚ² is nominal as each element (q1, q2) ∈ ℚ² has the finite set {q1, q2} as its support. The set ℚ² has the following three orbits:
{(q1, q2) | q1 < q2},  {(q1, q2) | q1 = q2},  {(q1, q2) | q1 > q2}.
For a set X, the set of all subsets of size n ∈ ℕ is denoted by 𝒫n(X) = {Y ⊆ X | #Y = n}. The set 𝒫n(ℚ) is a single-orbit nominal set for each n, with the action defined by direct image: gY = {gy | y ∈ Y}. The group of monotone bijections also acts by direct image on the full power set 𝒫(ℚ), but this is not a nominal set. For instance, the set ℤ ∈ 𝒫(ℚ) of integers has no finite support.
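The three orbits of ℚ² are distinguished exactly by the relative order of the two components, which a one-line Haskell sketch (ours) makes explicit:

-- Sketch: the orbit of a pair of rationals is determined by comparing its
-- components, since monotone bijections preserve precisely this information.
orbitOfPair :: (Rational, Rational) -> Ordering
orbitOfPair (q1, q2) = compare q1 q2

-- orbitOfPair (1, 2) == LT  -- the orbit {(q1, q2) | q1 < q2}
-- orbitOfPair (3, 3) == EQ  -- the diagonal orbit
-- orbitOfPair (2, 1) == GT  -- the orbit {(q1, q2) | q1 > q2}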
If S ⊆ 𝒟 is a support of an element x ∈ X, then any set S′ ⊆ 𝒟 such that S ⊆ S′ is also a support of x. A set S ⊂ 𝒟 is a least finite support of x ∈ X if it is a finite support of x and S ⊆ S′ for any finite support S′ of x. The existence of least finite supports is crucial
for representing orbits. Unfortunately, even when elements have a finite support, in
general they do not always have a least finite support. A data symmetry (𝒟, G) is said
to admit least supports if every element of every nominal set has a least finite support.
Both the equality and the total order symmetry admit least supports. (Bojańczyk, et al.,
2014 give additional (counter)examples of data symmetries admitting least supports.)
Having least finite supports is useful for a finite representation. Henceforth, we will
write least support to mean least finite support.
Given a nominal set X, the size of the least support of an element x ∈ X is denoted
by dim(x), the dimension of x. We note that all elements in the orbit of x have the same
dimension. For an orbit-finite nominal set X, we define dim(X) = max{dim(x) | x ∈ X}.
For a single-orbit set O, observe that dim(O) = dim(x) where x is any element x ∈ O.
1.3 Representing nominal orbits
We represent nominal sets as collections of single orbits. The finite representation
of single orbits is based on the theory of Bojańczyk, et al. (2014), which uses the
technical notions of restriction and extension. We only briefly report their definitions
here. However, the reader can safely move to the concrete representation theory in
Section 2 with only a superficial understanding of Theorem 2 below.
The restriction of an element π ∈ G to a subset C ⊆ 𝒟, written as π|C , is the
restriction of the function π : 𝒟 → 𝒟 to the domain C. The restriction of a group G to
a subset C ⊆ 𝒟 is defined as G|C = {π|C | π ∈ G, πC = C}. The extension of a subgroup
S ≤ G|C is defined as extG (S) = {π ∈ G | π|C ∈ S}. For C ⊆ 𝒟 and S ≤ G|C , define
[C, S]ec = {{gs | s ∈ extG (S)} | g ∈ G}, i.e., the set of right cosets of extG (S) in G. Then
[C, S]ec is a single-orbit nominal set.
Using the above, we can formulate the representation theory from Bojańczyk, et
al. (2014). This gives a finite description for all single-orbit nominal sets X, namely a
finite set C together with some of its symmetries.
Theorem 2. Let X be a single-orbit nominal set for a data symmetry (𝒟, G) that
admits least supports and let C ⊆ 𝒟 be the least support of some element x ∈ X. Then
there exists a subgroup S ≤ G|C such that X ≅ [C, S]ec .
The proof by Bojańczyk, et al. (2014) uses a bit of category theory: it establishes an
equivalence of categories between single-orbit sets and the pairs (C, S). We will not
use the language of category theory much in order to keep the chapter self-contained.
2 Representation in the total order symmetry
This section develops a concrete representation of nominal sets over the total order
symmetry, as well as their equivariant maps and products. It is based on the abstract
representation theory from Section 1.3. From now on, by nominal set we always refer
to a nominal set over the total order symmetry. Hence, our data domain is ℚ and we
take G to be the group of monotone bijections.
2.1 Orbits and nominal sets
From the representation in Section 1.3, we find that any single-orbit set X can be
represented as a tuple (C, S). Our first observation is that the finite group S of local
symmetries in this representation is always trivial, i.e., S = I, where I = {1} is the
trivial group. This follows from the following lemma and S ≤ G|C .
Lemma 3. For every finite subset C ⊂ ℚ, we have G|C = I.
Immediately, we see that (C, S) = (C, I), and hence that the orbit is fully represented
by the set C. A further consequence of Lemma 3 is that each element of an orbit can be
uniquely identified by its least support. This leads us to the following characterisation
of [C, I]ec .
Lemma 4. Given a finite subset C ⊂ ℚ, we have [C, I]ec ≅ 𝒫#C(ℚ).
By Theorem 2 and the above lemmas, we can represent an orbit by a single integer
n, the size of the least support of its elements. This naturally extends to (orbit-finite)
nominal sets with multiple orbits by using a multiset of natural numbers, representing
the size of the least support of each of the orbits. These multisets are formalised here
as functions f : ℕ → ℕ.
Definition 5. Given a function f : ℕ → ℕ, we define a nominal set [f]o by
[f]o = ⋃_{n∈ℕ} ⋃_{1≤i≤f(n)} {i} × 𝒫n(ℚ).
Proposition 6. For every orbit-finite nominal set X, there is a function f : ℕ → ℕ such that X ≅ [f]o and the set {n | f(n) ≠ 0} is finite. Furthermore, the mapping between X and f is one-to-one (up to isomorphism of nominal sets) when restricting to those f : ℕ → ℕ for which the set {n | f(n) ≠ 0} is finite.
The presentation in terms of a function f : ℕ → ℕ enforces that there are only finitely
many orbits of any given dimension. The first part of the above proposition generalises
to arbitrary nominal sets by replacing the codomain of f by the class of all sets and
adapting Definition 5 accordingly. However, the resulting correspondence will no
longer be one-to-one.
As a brief example, let us consider the set ℚ × ℚ. The elements (a, b) split into three orbits, one for a < b, one for a = b and one for a > b. These have dimension 2, 1 and 2 respectively, so the set ℚ × ℚ is represented by the multiset {1, 2, 2}.
2.2 Equivariant maps
We show how to represent equivariant maps, using two basic properties. Let f : X → Y
be an equivariant map. The first property is that the direct image of an orbit (in X)
is again an orbit (in Y), that is to say, f is defined orbit-wise. Second, equivariant
maps cannot introduce new elements in the support (but they can drop them). More
precisely:
Lemma 7. Let f : X → Y be an equivariant map, and O ⊆ X a single orbit. The direct
image f(O) = {f(x) | x ∈ O} is a single-orbit nominal set.
Lemma 8. Let f : X → Y be an equivariant map between two nominal sets X and Y.
Let x ∈ X and let C be a support of x. Then C supports f(x).
Hence, equivariant maps are fully determined by associating two pieces of information
for each orbit in the domain: the orbit on which it is mapped and a string denoting
which elements of the least support of the input are preserved. These ingredients are
formalised in the first part of the following definition. The second part describes how
these ingredients define an equivariant function. Proposition 10 then states that every
equivariant function can be described in this way.
Definition 9. Let H = {(I1, F1, O1), …, (In, Fn, On)} be a finite set of tuples where the Ii are disjoint single-orbit nominal sets, the Oi are single-orbit nominal sets with dim(Oi) ≤ dim(Ii), and the Fi are bit strings of length dim(Ii) with exactly dim(Oi) ones.
Given a set H as above, we define fH : ⋃ Ii → ⋃ Oi as the unique equivariant
function such that, given x ∈ Ii with least support C, fH (x) is the unique element of
Oi with support {C(j) | Fi (j) = 1}, where Fi (j) is the j-th bit of Fi and C(j) is the j-th
smallest element of C.
Proposition 10. For every equivariant map f : X → Y between orbit-finite nominal
sets X and Y there is a set H as in Definition 9 such that f = fH .
Consider the example function min : 𝒫3(ℚ) → ℚ which returns the smallest element of a 3-element set. Note that both 𝒫3(ℚ) and ℚ are single orbits. Since for the orbit 𝒫3(ℚ) we only keep the smallest element of the support, we can thus represent the function min with H = {(𝒫3(ℚ), 100, ℚ)}.
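Concretely, applying fH to an element amounts to filtering its least support with the bit string. A minimal Haskell sketch (ours; least supports are modelled as ascending lists of rationals):

-- Sketch: given the bit string F of a map and the least support C of the
-- input (sorted ascending), compute the least support of the output.
applyBits :: String -> [Rational] -> [Rational]
applyBits bits support = [ c | (b, c) <- zip bits support, b == '1' ]

-- Example: min on P_3(Q) is represented by the string "100", and
-- applyBits "100" [1, 5, 9] == [1], the least support of min {1, 5, 9}.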
2.3 Products
The product X × Y of two nominal sets is again a nominal set and hence, it can be
represented itself in terms of the dimension of each of its orbits as shown in Section 2.1.
However, this approach has some disadvantages.
Example 11. We start by showing that the orbit structure of products can be non-trivial. Consider the product of X = ℚ and the set Y = {(a, b) ∈ ℚ² | a < b}. This
product consists of five orbits, more than one might naively expect from the fact that
both sets are single-orbit:
{(a, (b, c)) | a, b, c ∈ ℚ, a < b < c},
{(a, (a, b)) | a, b ∈ ℚ, a < b},
{(b, (a, c)) | a, b, c ∈ ℚ, a < b < c},
{(b, (a, b)) | a, b ∈ ℚ, a < b},
{(c, (a, b)) | a, b, c ∈ ℚ, a < b < c}.
We find that this product is represented by the multiset {2, 2, 3, 3, 3}. Unfortunately,
this is not sufficient to accurately describe the product as it abstracts away from the
relation between its elements with those in X and Y. In particular, it is not possible to
reconstruct the projection maps from such a representation.
The essence of our representation of products is that each orbit O in the product
X × Y is described entirely by the dimension of O together with the two (equivariant)
projections π1 : O → X and π2 : O → Y. This combination of the orbit and the two
projection maps can already be represented using Propositions 6 and 10. However, as
we will see, a combined representation for this has several advantages. For discussing
such a representation, let us first introduce what it means for tuples of a set and two
functions to be isomorphic:
Definition 12. Given nominal sets X, Y, Z1 and Z2, and equivariant functions l1 : Z1 → X, r1 : Z1 → Y, l2 : Z2 → X and r2 : Z2 → Y, we define (Z1, l1, r1) ≅ (Z2, l2, r2) if there exists an isomorphism h : Z1 → Z2 such that l1 = l2 ∘ h and r1 = r2 ∘ h.
Our goal is to have a representation that, for each orbit O, produces a tuple (A, f1 , f2 )
isomorphic to the tuple (O, π1 , π2 ). The next lemma gives a characterisation that can
be used to simplify such a representation.
Lemma 13. Let X and Y be nominal sets and (x, y) ∈ X × Y. If C, Cx , and Cy are the
least supports of (x, y), x, and y respectively, then C = Cx ∪ Cy.
With Proposition 10 we represent the maps π1 and π2 by tuples (O, F1 , O1 ) and
(O, F2 , O2 ) respectively. Using Lemma 13 and the definitions of F1 and F2 , we see
that at least one of F1 (i) and F2 (i) equals 1 for each i.
We can thus combine the strings F1 and F2 into a single string P ∈ {L, R, B}* as follows. We set P(i) = L when only F1(i) is 1, P(i) = R when only F2(i) is 1, and P(i) = B when both are 1. The string P fully describes the strings F1 and F2. This process for constructing the string P gives it two useful properties. The number of Ls and Bs in the string gives the dimension of O1. Similarly, the number of Rs and Bs in the string gives the dimension of O2. We will call strings with that property valid.
In conclusion, to describe a single orbit of the product X × Y, a valid string P together
with the images of π1 and π2 is sufficient.
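The construction of P from F1 and F2 is a simple position-wise merge, as in the following Haskell sketch (ours):

-- Sketch: merge the projection bit strings F1 and F2 into the product
-- string P. By Lemma 13 every position is in at least one of the supports.
combineBits :: String -> String -> String
combineBits = zipWith merge
  where
    merge '1' '0' = 'L'  -- position only in the left support
    merge '0' '1' = 'R'  -- position only in the right support
    merge '1' '1' = 'B'  -- position in both supports
    merge _   _   = error "invalid pair of bit strings"

-- For instance, combineBits "10" "11" == "BR" (a pair of strings that
-- reappears in Example 17 below).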
Definition 14. Let P ∈ {L, R, B}*, and O1 ⊆ X, O2 ⊆ Y be single-orbit sets. Given a tuple (P, O1, O2), where the string P is valid, define
[(P, O1, O2)]t = (𝒫|P|(ℚ), fH1, fH2),
where Hi = {(𝒫|P|(ℚ), Fi, Oi)} and the string F1 is defined as the string P with Ls and Bs replaced by 1s and Rs by 0s. The string F2 is similarly defined with the roles of L and R swapped.
Proposition 15. There exists a one-to-one correspondence between the orbits O ⊆
X × Y, and tuples (P, O1 , O2 ) satisfying O1 ⊆ X, O2 ⊆ Y, and where P is a valid string,
such that [(P, O1 , O2 )]t ≅ (O, π1 |O , π2 |O ).
From the above proposition it follows that we can generate the product X × Y simply
by enumerating all valid strings P for all pairs of orbits (O1 , O2 ) of X and Y. Given
this, we can calculate the multiset representation of a product from the multiset
representations of both factors.
Theorem 16. For X ≅ [f]o and Y ≅ [g]o we have X × Y ≅ [h]o, where
h(n) = Σ_{0 ≤ i, j ≤ n, i+j ≥ n} f(i) g(j) C(n, j) C(j, n − i),
with C(a, b) denoting the binomial coefficient “a choose b”.
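The formula can be evaluated directly; the following Haskell sketch (ours; the names h, f and g match the theorem) computes the number of orbits of each dimension in a product. Instantiated with f = g = the representation of ℚ, it reproduces the multiset {1, 2, 2} of ℚ × ℚ computed in Section 2.1:

-- Sketch: number of orbits of dimension n in the product, per Theorem 16.
-- f and g map a dimension to the number of orbits of that dimension.
h :: (Int -> Integer) -> (Int -> Integer) -> Int -> Integer
h f g n = sum [ f i * g j * choose n j * choose j (n - i)
              | i <- [0 .. n], j <- [0 .. n], i + j >= n ]
  where
    choose a b | b < 0 || b > a = 0
               | otherwise = fact a `div` (fact b * fact (a - b))
    fact m = product [1 .. toInteger m]

-- With fQ d = if d == 1 then 1 else 0 (the representation of Q):
-- h fQ fQ 1 == 1 and h fQ fQ 2 == 2, i.e., the multiset {1, 2, 2}.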
Example 17. To illustrate some aspects of the above representation, let us use it to calculate the product of Example 11. First, we observe that both ℚ and S = {(a, b) ∈ ℚ² | a < b} consist of a single orbit. Hence any orbit of the product corresponds to a triple (P, ℚ, S), where the string P satisfies |P|L + |P|B = dim(ℚ) = 1 and |P|R + |P|B = dim(S) = 2. We can now find the orbits of the product ℚ × S by enumerating all strings satisfying these equations. This yields
LRR, corresponding to the orbit {(a, (b, c)) | a, b, c ∈ ℚ, a < b < c},
RLR, corresponding to the orbit {(b, (a, c)) | a, b, c ∈ ℚ, a < b < c},
RRL, corresponding to the orbit {(c, (a, b)) | a, b, c ∈ ℚ, a < b < c},
RB, corresponding to the orbit {(b, (a, b)) | a, b ∈ ℚ, a < b}, and
BR, corresponding to the orbit {(a, (a, b)) | a, b ∈ ℚ, a < b}.
Each product string fully describes the corresponding orbit. To illustrate, consider the
string BR. The corresponding bit strings for the projection functions are F1 = 10 and
F2 = 11. From the length of the string we conclude that the dimension of the orbit is 2. The string F1 further tells us that the left element of the tuple consists only of the
smallest element of the support. The string F2 indicates that the right element of the
tuple is constructed from both elements of the support. Combining this, we find that
the orbit is {(a, (a, b)) | a, b ∈ ℚ, a < b}.
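Enumerating the orbits of a product thus amounts to enumerating valid strings. A brute-force Haskell sketch (ours) reproduces the five strings of this example:

-- Sketch: all valid product strings for a pair of orbits of dimensions d1
-- and d2. A valid string of length n has d1+d2-n Bs, n-d2 Ls and n-d1 Rs,
-- so n ranges over max d1 d2 .. d1 + d2.
productStrings :: Int -> Int -> [String]
productStrings d1 d2 =
  [ s | n <- [max d1 d2 .. d1 + d2]
      , s <- allStrings n
      , count 'L' s + count 'B' s == d1
      , count 'R' s + count 'B' s == d2 ]
  where
    allStrings 0 = [""]
    allStrings n = [ c : s | c <- "LRB", s <- allStrings (n - 1) ]
    count c = length . filter (== c)

-- productStrings 1 2 == ["RB","BR","LRR","RLR","RRL"]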
2.4 Summary
We summarise our concrete representation in the following table. Propositions 6, 10
and 15 correspond to the three rows in the table.
Notice that in the case of maps and products, the orbits are inductively represented
using the concrete representation. As a base case we can represent single orbits by
their dimension.
Object | Representation
Single orbit O | Natural number n = dim(O)
Nominal set X = ⋃i Oi | Multiset of these numbers
Map from single orbit f : O → Y | The orbit f(O) and a bit string F
Equivariant map f : X → Y | Set of tuples (O, F, f(O)), one for each orbit
Orbit in a product O ⊆ X × Y | The corresponding orbits of X and Y, and a string P relating their supports
Product X × Y | Set of tuples (P, OX, OY), one for each orbit

Table 6.1 Overview of representation.
3 Implementation and Complexity of ONS
The ideas outlined above have been implemented in a C++ library, Ons, and a Haskell
library, Ons-hs.20 We focus here on the C++ library only, as the Haskell one is very similar. The library can represent orbit-finite nominal sets and their products, (disjoint)
unions, and maps. A full description of the possibilities is given in the documentation
included with Ons.
As an example, the following program computes the product from Example 11.
Initially, the program creates the nominal set A, containing the entirety of ℚ. Then it creates a nominal set B, such that it consists of the orbit containing the element (1, 2) ∈ ℚ × ℚ. For this, the library determines to which orbit of the product ℚ × ℚ the element (1, 2) belongs, and then stores a description of the orbit as described in
Section 2. Note that this means that it internally never needs to store the element
used to create the orbit. The function nomset_product then uses the enumeration of
product strings mentioned in Section 2.3 to calculate the product of A and B. Finally,
it prints a representative element for each of the orbits in the product. These elements
are constructed based on the description of the orbits stored, filled in to make their
support equal to sets of the form {1, 2, …, n}.
nomset<rational> A = nomset_rationals();
nomset<pair<rational, rational>> B({rational(1), rational(2)});
auto AtimesB = nomset_product(A, B);  // compute the product
for (auto orbit : AtimesB)
    cout << orbit.getElement() << " ";
Running this gives the following output (where /1 signifies the denominator):
(1/1,(2/1,3/1)) (1/1,(1/1,2/1)) (2/1,(1/1,3/1))
(2/1,(1/1,2/1)) (3/1,(1/1,2/1))
20 Ons can be found at https://github.com/davidv1992/ONS and Ons-hs can be found at https://github.com/Jaxan/ons-hs/.
Internally, orbit is implemented following the theory presented in Section 2,
storing the dimension of the orbit it represents. It also contains sufficient information
to reconstruct elements given their least support, such as the product string for orbits
resulting from a product. The class nomset then uses a standard set data structure to
store the collection of orbits contained in the nominal set it represents.
In a similar way, eqimap stores equivariant maps by associating each orbit in the
domain with the image orbit and the string representing which of the least support to
keep. This is stored using a map data structure. For both nominal sets and equivariant
maps, the underlying data structure is currently implemented using trees.
3.1 Complexity of operations
Using the concrete representation of nominal sets, we can determine the complexity
of common operations. To simplify such an analysis, we will make the following
assumptions:
The comparison of two orbits takes O(1).
Constructing an orbit from an element takes O(1).
Checking whether an element is in an orbit takes O(1).
These assumptions are justified as each of these operations takes time proportional
to the size of the representation of an individual orbit, which in practice is small
and approximately constant. For instance, the orbit 𝒫n(ℚ) is represented by just the
integer n and its type.
Theorem 18. If nominal sets are implemented with a tree-based set structure (as in
Ons), the complexity of the following set operations is as follows. Recall that N(X)
denotes the number of orbits of X. We use p and f to denote functions implemented
in whatever way the user wants, which we assume to take O(1) time. The software
assumes these are equivariant, but this is not verified.
Operation | Complexity
Test x ∈ X | O(log N(X))
Test X ⊆ Y | O(min(N(X) + N(Y), N(X) log N(Y)))
Calculate X ∪ Y | O(N(X) + N(Y))
Calculate X ∩ Y | O(N(X) + N(Y))
Calculate {x ∈ X | p(x)} | O(N(X))
Calculate {f(x) | x ∈ X} | O(N(X) log N(X))
Calculate X × Y | O(N(X × Y)) ⊆ O(3^(dim(X)+dim(Y)) N(X) N(Y))

Table 6.2 Time complexity of operations on nominal sets.
Proof. Since most parts are proven similarly, we only include proofs for the first and
last item.
Membership. To decide x ∈ X, we first construct the orbit containing x, which is
done in constant time. Then we use a logarithmic lookup to decide whether this orbit
is in our set data structure. Hence, membership checking is O(log(N(X))).
Products. Calculating the product of two nominal sets is the most complicated
construction. For each pair of orbits in the original sets X and Y, all product strings
need to be generated. Each product orbit itself is constructed in constant time. By
generating these orbits in-order, the resulting set takes O(N(X × Y)) time to construct.
We can also give an explicit upper bound for the number of orbits in terms of
the input. Recall that orbits in a product are represented by strings of length at
most dim(X) + dim(Y). (If the string is shorter, we pad it with one of the symbols.)
Since there are three symbols (L, R and B), the product of X and Y will have at most 3^(dim(X)+dim(Y)) N(X)N(Y) orbits. It follows that taking products has time complexity of O(3^(dim(X)+dim(Y)) N(X)N(Y)).
4 Results and evaluation in automata theory
In this section we consider applications of nominal sets to automata theory. As mentioned in the introduction, nominal sets are used to formalise languages over infinite
alphabets. These languages naturally arise as the semantics of register automata. The
definition of register automata is not as simple as that of ordinary finite automata.
Consequently, transferring results from automata theory to this setting often requires
non-trivial proofs. Nominal automata, instead, are defined as ordinary automata by
replacing finite sets with orbit-finite nominal sets. The theory of nominal automata
is developed by Bojańczyk, et al. (2014) and it is shown that many algorithms, such
as minimisation (based on the Myhill-Nerode equivalence), from automata theory
transfer to nominal automata. Not all algorithms work: e.g., the subset construction
fails for nominal automata.
As an example we consider the following language on rational numbers:
ℒint = {a1 b1 ⋯ an bn | ai, bi ∈ ℚ, ai < ai+1 < bi+1 < bi for all i}.
We call this language the interval language as a word w ∈ ℚ* is in the language when it denotes a sequence of nested intervals. This language contains arbitrarily long words. For this language it is crucial to work with an infinite alphabet as for each finite set C ⊂ ℚ, the restriction ℒint ∩ C* is just a finite language. Note that the language is equivariant: w ∈ ℒint ⟺ gw ∈ ℒint for any monotone bijection g, because nested intervals are preserved by monotone maps.21 Indeed, ℒint is a nominal set, although it is not orbit-finite.
Informally, the language ℒint can be accepted by the automaton depicted in Figure 6.1. Here we allow the automaton to store rational numbers and compare them to
21
The G-action on words is defined point-wise: g(w1 …wn ) = (gw1 )…(gwn ).
new symbols. For example, the transition from q2 to q3 is taken if any value c between
a and b is read and then the currently stored value a is replaced by c. For any other
value read at state q2 the automaton transitions to the sink state q4 . Such a transition
structure is made precise by the notion of nominal automata.
[Figure 6.1 (diagram): Example automaton that accepts the language ℒint, with states q0, q1(a), q2(a, b), q3(a, b) and a sink state q4; edges are guarded by comparisons such as b > a, a < c < b (with updates a ← c and b ← c), c ≤ a and c ≥ b.]
Definition 19. A nominal language is an equivariant subset L ⊆ A* where A is an orbit-finite nominal set.
Definition 20. A nominal deterministic finite automaton is a tuple (S, A, F, δ), where S is
an orbit-finite nominal set of states, A is an orbit-finite nominal set of symbols, F ⊆ S
is an equivariant subset of final states, and δ : S × A → S is the equivariant transition
function.
Given a state s ∈ S, we define the usual acceptance condition: a word w ∈ A* is accepted if w denotes a path from s to a final state.
The automaton in Figure 6.1 can be formalised as a nominal deterministic finite automaton as follows. Let S = {q0, q4} ∪ {q1(a) | a ∈ ℚ} ∪ {q2(a, b) | a < b ∈ ℚ} ∪ {q3(a, b) | a < b ∈ ℚ} be the set of states, where the group action is defined as one would expect. The transition we described earlier can now be formally defined as δ(q2(a, b), c) = q3(c, b) for all a < c < b ∈ ℚ. By defining δ on all states accordingly and defining the final states as F = {q2(a, b) | a < b ∈ ℚ}, we obtain a nominal deterministic automaton (S, ℚ, F, δ). The state q0 accepts the language ℒint.
We implement two algorithms on nominal automata, minimisation and learning,
to benchmark Ons. The performance of Ons is compared to two existing libraries for
computing with nominal sets, Nλ and Lois. The following automata will be used.
Random automata
As a primary test suite, we generate random automata as follows. The input alphabet
is always ℚ and the number of orbits and the dimension k of the state space S are fixed.
For each orbit in the set of states, its dimension is chosen uniformly at random between
0 and k, inclusive. Each orbit has a probability 1/2 of consisting of accepting states.
To generate the transition function δ, we enumerate the orbits of S × ℚ and choose a target state uniformly from the orbits of S with small enough dimension. The bit string
indicating which part of the support is preserved is then sampled uniformly from all
valid strings. We will denote these automata as randN(S),k . The choices made here are
arbitrary and only provide basic automata. We note that the automata are generated
orbit-wise and this may favour our tool.
Structured automata
Besides random automata we wish to test the algorithms on more structured automata.
We define the following automata.
FIFO(𝐧): Automata accepting valid traces of a finite FIFO data structure of size n. The alphabet is defined by two orbits: {Put(a) | a ∈ ℚ} and {Get(a) | a ∈ ℚ}.
𝐰𝐰(𝐧): Automata accepting the language of words of the form ww, where w ∈ ℚ^n.
𝓛max: The language 𝓛max = {wa ∈ ℚ* | a = max(w1, …, wn)}, where the last symbol is the maximum of the previous symbols.
𝓛int: The language accepting a series of nested intervals, as defined before.
In Table 6.3 we report the number of orbits for each automaton. The first two
classes of automata are described in Chapter 5. These two classes are also equivariant
w.r.t. the equality symmetry.
Extra structure allows the automata to be encoded more efficiently, as we do not
need to encode a transition for each orbit in S × A. Instead, a more symbolic encoding
is possible. Both Lois and Nλ allow the use of this more symbolic representation. Our
tool, Ons, only works with nominal sets and the input data needs to be provided
orbit-wise. Where applicable, the automata listed above were generated using the
code from Moerman, et al. (2017), ported to the other libraries as needed.
4.1 Minimising nominal automata
For languages recognised by nominal DFAs, a Myhill-Nerode theorem holds which
relates states to right congruence classes. This guarantees the existence of unique
minimal automata. We say an automaton is minimal if its set of states has the least
number of orbits and each orbit has the smallest dimension possible.22 We generalise Moore's minimisation algorithm to nominal DFAs (Algorithm 6.1) and analyse its time complexity using the bounds from Section 3.
22 Abstractly, an automaton is minimal if it has no proper quotients. Minimal deterministic automata are unique up to isomorphism.
Require: Nominal automaton M = (S, A, F, δ)
Ensure: Minimal nominal automaton equivalent to M
1:  i ← 0
2:  ≡−1 ← S × S
3:  ≡0 ← F × F ∪ (S∖F) × (S∖F)
4:  while ≡i ≠ ≡i−1 do
5:      ≡i+1 ← {(q1, q2) | (q1, q2) ∈ ≡i ∧ ∀a ∈ A, (δ(q1, a), δ(q2, a)) ∈ ≡i}
6:      i ← i + 1
7:  end while
8:  E ← S/≡i
9:  FE ← {e ∈ E | ∀s ∈ e, s ∈ F}
10: Let δE be the map such that, if s ∈ e and δ(s, a) ∈ e′, then δE(e, a) = e′
11: return (E, A, FE, δE)
Algorithm 6.1: Moore's minimisation algorithm for nominal DFAs.
Theorem 21. The runtime complexity of Moore's algorithm on nominal deterministic automata is O(3^{5k} k N(S)^3 N(A)), where k = dim(S ⊎ A).
Proof. This is shown by counting operations, using the complexity results of set operations stated in Theorem 18. We first focus on the while loop on lines 4–7. The runtime of an iteration of the loop is determined by line 5, as this is the most expensive step. Since the dimensions of S and A are at most k, computing S × S × A takes O(N(S)^2 N(A) 3^{5k}). Filtering S × S using that then takes O(N(S)^2 3^{2k}). The time to compute S × S × A dominates, hence each iteration of the loop takes O(N(S)^2 N(A) 3^{5k}).
Next, we need to count the number of iterations of the loop. Each iteration of the
loop gives rise to a new partition, refining the previous partition. Furthermore, every
partition generated is equivariant. Note that this implies that each refinement of the
partition does at least one of two things: distinguish between two orbits of S previously
in the same element(s) of the partition, or distinguish between two members of the
same orbit previously in the same element of the partition. The former can happen
only N(S) − 1 times, as after that there are no more orbits lumped together. The latter
can only happen dim(S) times per orbit, because each such distinction between
elements is based on splitting on the value of one of the elements of the support.
Hence, after dim(S) times on a single orbit, all elements of the support are used up.
Combining this, the longest chain of partitions of S has length at most O(kN(S)).
Since each partition generated in the loop is unique, the loop cannot run for more
iterations than the length of the longest chain of partitions on S. It follows that
there are at most O(kN(S)) iterations of the loop, giving the loop a complexity of
O(k N(S)^3 N(A) 3^{5k}).
The remaining operations outside the loop have a lower complexity than that of the loop, hence the complexity of Moore's minimisation algorithm for a nominal automaton is O(k N(S)^3 N(A) 3^{5k}).
The above theorem shows in particular that minimisation of nominal automata is fixed-parameter tractable (FPT) with the dimension as the fixed parameter. The complexity of Algorithm 6.1 for nominal automata is very similar to the O((#S)^3 #A) bound given by a naive implementation of Moore's algorithm for ordinary DFAs. This suggests that it is possible to further optimise an implementation with techniques similar to those used for ordinary automata.
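For comparison, the following is a minimal runnable sketch of the naive Moore algorithm on an ordinary finite DFA (our own illustration, not code from any of the three libraries); Algorithm 6.1 refines exactly the same loop, with orbit-wise set operations replacing the finite ones.

def moore_minimise(states, alphabet, delta, final):
    # Naive Moore minimisation, mirroring Algorithm 6.1. delta maps
    # (state, letter) -> state. At most #S refinement rounds, each
    # scanning all pairs and letters: O((#S)^3 #A) overall.
    eq = {(p, q) for p in states for q in states
          if (p in final) == (q in final)}  # the initial partition, as in line 3
    while True:
        refined = {(p, q) for (p, q) in eq
                   if all((delta[p, a], delta[q, a]) in eq for a in alphabet)}
        if refined == eq:
            break
        eq = refined
    # Build the quotient automaton on equivalence classes (lines 8-11).
    cls = {p: frozenset(q for q in states if (p, q) in eq) for p in states}
    E = set(cls.values())
    deltaE = {(cls[p], a): cls[delta[p, a]] for p in states for a in alphabet}
    finalE = {e for e in E if e & final}
    return E, deltaE, finalE

# Example: a 3-state DFA in which states 1 and 2 are equivalent.
states, alphabet, final = {0, 1, 2}, {'a'}, {1, 2}
delta = {(0, 'a'): 1, (1, 'a'): 2, (2, 'a'): 1}
E, deltaE, finalE = moore_minimise(states, alphabet, delta, final)
print(len(E))  # 2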
Implementations
We implemented the minimisation algorithm in Ons. For Nλ and Lois we used their
implementations of Moore's minimisation algorithm (Klin & Szynwelski, 2016 and
Kopczyński & Toruńczyk, 2016 and 2017). For each of the libraries, we wrote routines
to read in an automaton from a file and, for the structured test cases, to generate the
requested automaton. For Ons, all automata were read from file. The output of these
programs was manually checked to see if the minimisation was performed correctly.
Results
The results (shown in Table 6.3) for random automata show a clear advantage for Ons, which is capable of running all supplied test cases in less than one second. This is in contrast to both Lois and Nλ, which take more than two hours on the largest random
automata.
The results for structured automata show a clear effect of the extra structure. Both
Nλ and Lois remain capable of minimising the automata in reasonable amounts of
time for larger sizes. In contrast, Ons benefits little from the extra structure. Despite
this, it remains viable: even for the larger cases it falls behind significantly only for
the largest FIFO automaton and the two largest ww automata.
4.2 Learning nominal automata
Another application that we implemented in Ons is automata learning. The aim of automata learning is to infer an unknown regular language 𝓛. We use the framework of active learning as set up by Angluin (1987), where a learning algorithm can query an oracle to gather information about 𝓛. Formally, the oracle can answer two types of queries:
– membership queries, where a query consists of a word w ∈ A∗ and the oracle replies whether w ∈ 𝓛, and
– equivalence queries, where a query consists of an automaton 𝒜 and the oracle replies positively if 𝓛(𝒜) = 𝓛, or provides a counterexample if 𝓛(𝒜) ≠ 𝓛.
With these queries, the L⋆ algorithm can learn regular languages efficiently (Angluin, 1987). In particular, it learns the unique minimal automaton for 𝓛 using only finitely many queries. The L⋆ algorithm has been generalised to νL⋆ in order to learn nominal regular languages. In particular, it learns a nominal DFA (over an infinite alphabet) using only finitely many queries. We implement νL⋆ in the presented library and compare it to its previous implementation in Nλ. The algorithm is not polynomial, unlike the minimisation algorithm described above. However, the authors conjecture that there is a polynomial algorithm.23 For the correctness, termination, and comparison with other learning algorithms see Chapter 5.

Type             N(S)   N(Smin)   Ons (s)   Gen (s)   Nλ (s)   Lois (s)
rand5,1 (x10)       5    n/a        0.02     n/a        0.82     3.14
rand10,1 (x10)     10    n/a        0.03     n/a       17.03    92
rand10,2 (x10)     10    n/a        0.09     n/a      2114       ∞
rand15,1 (x10)     15    n/a        0.04     n/a       87      620
rand15,2 (x10)     15    n/a        0.11     n/a      3346       ∞
rand15,3 (x10)     15    n/a        0.46     n/a        ∞        ∞
FIFO(2)            13    6          0.01     0.01       1.37     0.24
FIFO(3)            65    19         0.38     0.09      11.59     2.4
FIFO(4)           440    94        39.11     1.60      76       14.95
FIFO(5)          3686    635         ∞      39.78     402       71
ww(2)               8    8          0.00     0.00       0.14     0.03
ww(3)              24    24         0.19     0.02       0.88     0.16
ww(4)             112    112       26.44     0.25       3.41     0.61
ww(5)             728    728         ∞       6.37      10.54     1.80
𝓛max                5    3          0.00     0.00       2.06     0.03
𝓛int                5    5          0.00     0.00       1.55     0.03

Table 6.3: Running times for Algorithm 6.1 implemented in the three libraries. N(S) is the size of the input and N(Smin) the size of the minimal automaton. For Ons, the time used to generate the automaton is reported separately in the Gen column. Timeouts are indicated by ∞.
23 See https://joshuamoerman.nl/papers/2017/17popl-learning-nominal-automata.html for a sketch of the polynomial algorithm.
Implementations
Both implementations in Nλ and Ons are direct implementations of the pseudocode for νL⋆ with no further optimisations. The authors of Lois implemented νL⋆ in their library as well.24 They reported performance similar to that of the implementation in Nλ (private communication). Hence we focus our comparison on Nλ and Ons. We use the variant of νL⋆ where counterexamples are added as columns instead of prefixes.
The implementation in Nλ has the benefit that it can work with different symmetries. Indeed, the structured examples, FIFO and ww, are equivariant w.r.t. the equality symmetry as well as the total order symmetry. For that reason, we run the Nλ implementation using both the equality symmetry and the total order symmetry on those languages. For the languages 𝓛max, 𝓛int and the random automata, we can only use the total order symmetry.
To run the νL⋆ algorithm, we implement an external oracle for the membership queries. This is akin to the application of learning black box systems (see Vaandrager, 2017). For equivalence queries, we constructed counterexamples by hand. All implementations receive the same counterexamples. We measure CPU time instead of real time, so that we do not account for the external oracle.
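A minimal sketch of such an oracle interface is given below (the class and method names are our own, not part of Ons's API); membership queries are forwarded to an external program, and equivalence queries are answered by the experimenter.

class Teacher:
    # Sketch of the "minimally adequate teacher" behind L*-style learners;
    # the names here are ours, not Ons's API.

    def __init__(self, membership_oracle):
        self.membership_oracle = membership_oracle  # e.g., a subprocess wrapper

    def membership(self, word):
        # Is the word in the unknown language L? Forwarded externally.
        return self.membership_oracle(word)

    def equivalence(self, hypothesis):
        # None means "equivalent"; otherwise a counterexample word on which
        # the hypothesis and L disagree, typed in by the experimenter.
        answer = input(f"Counterexample for {hypothesis} (empty if none): ")
        return answer.split() if answer else None

# Usage: teacher = Teacher(lambda w: external_query(w)), where external_query
# is a (hypothetical) wrapper around the system under learning.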
Results
The results (Table 6.4) for random automata show an advantage for Ons. Additionally,
we report the number of membership queries, which can vary for each implementation
as some steps in the algorithm depend on the internal ordering of set data structures.
In contrast to the case of minimisation, the results suggest that Nλ cannot exploit the logical structure of FIFO(n), 𝓛max and 𝓛int as it is not provided a priori. For ww(2) we inspected the output of Nλ and saw that it learned some logical structure (e.g., it outputs {(a, b) | a ≠ b} as a single object instead of two orbits {(a, b) | a < b} and {(a, b) | b < a}). This may explain why Nλ is still competitive. For languages which
are equivariant for the equality symmetry, the Nλ implementation using the equality
symmetry can learn with much fewer queries. This is expected as the automata
themselves have fewer orbits. It is interesting to see that these languages can be
learned more efficiently by choosing the right symmetry.
5 Related work
As stated in the introduction, Nλ by Klin and Szynwelski (2016) and Lois by Kopczyński
and Toruńczyk (2016) use first-order formulas to represent nominal sets and use SMT
solvers to manipulate them. This makes both libraries very flexible and they indeed
24 It can be found at https://github.com/eryxcc/lois/blob/master/tests/learning.cpp.
Model     N(S)   dim(S)   Ons: time (s)  MQs     Nλord: time (s)  MQs    Nλeq: time (s)  MQs
rand5,1      4     1        127          2321     2391            1243     –               –
rand5,1      5     1        0.12          404     2434             435     –               –
rand5,1      3     0        0.86          499     1819             422     –               –
rand5,1      5     1         ∞            n/a       ∞              n/a     –               –
rand5,1      4     1        0.08          387     2097             387     –               –
FIFO(1)      3     1        0.04          119     3.17             119     1.76           51
FIFO(2)      6     2        1.73         2655      392            3818    40.00          434
FIFO(3)     19     3        2794       298400       ∞              n/a    2047          8151
ww(1)        4     1        0.42          134     2.49              77     1.47           30
ww(2)        8     2         266         3671      228            2140    30.58          237
ww(3)       24     3         ∞            n/a       ∞              n/a      ∞             n/a
𝓛max         3     1        0.01           54     3.58              54     –               –
𝓛int         5     2        0.59          478       83             478     –               –

Table 6.4: Running times and number of membership queries for the νL⋆ algorithm. For Nλ we used two versions: Nλord uses the total order symmetry and Nλeq uses the equality symmetry. Timeouts are indicated by ∞; a dash means the equality symmetry is not applicable.
implement the equality symmetry as well as the total order symmetry. As their representation is not unique, the efficiency depends on how the logical formulas are
constructed. As such, they do not provide complexity results. In contrast, our direct representation allows for complexity results (Section 3) and leads to different
performance characteristics (Section 4).
A second big difference is that both Nλ and Lois implement a “programming paradigm” instead of just a library. This means that they overload natural programming
constructs in their host languages (Haskell and C++ respectively). For programmers
this means they can think of infinite sets without having to know about nominal sets.
It is worth mentioning that an older (unreleased) version of Nλ implemented
nominal sets with orbits instead of SMT solvers (Bojańczyk, et al., 2012). However,
instead of characterising orbits (e.g., by their dimension), they represent orbits by a
representative element. Klin and Szynwelski (2016) reported that the current version
is faster.
The theoretical foundation of our work is the main representation theorem by
Bojańczyk, et al. (2014). We improve on that by instantiating it to the total order
symmetry and distil a concrete representation of nominal sets. As far as we know, we
provide the first implementation of their representation theory.
Another tool using nominal sets is Mihda by Ferrari, et al. (2005). Here, only the
equality symmetry is implemented. This tool implements a translation from π-calculus
to history-dependent automata (HD-automata) with the aim of minimisation and
checking bisimilarity. The implementation in OCaml is based on named sets, which
are finite representations for nominal sets. The theory of named sets is well-studied
128
Chapter 6
and has been used to model various behavioural models with local names. For those
results, the categorical equivalences between named sets, nominal sets and a certain
(pre)sheaf category have been exploited (Ciancia, et al., 2010 and Ciancia & Montanari,
2010). The total order symmetry is not mentioned in their work. We do, however,
believe that similar equivalences between categories can be stated. Interestingly, the
product of named sets is similar to our representation of products of nominal sets:
pairs of elements together with data which denotes the relation between data values.
Fresh OCaml by Shinwell and Pitts (2005) and Nominal Isabelle by Urban and
Tasson (2005) are both specialised in name-binding and α-conversion used in proof
systems. They only use the equality symmetry and do not provide a library for
manipulating nominal sets. Hence they are not suited for our applications.
On the theoretical side, there are many complexity results for register automata
(Grigore & Tzevelekos, 2016 and Murawski, et al., 2015). In particular, we note that
problems such as emptiness and equivalence are NP-hard depending on the type of
register automaton. Recently, Murawski, et al. (2018) showed that equivalence of
unique-valued deterministic register automata can be decided in polynomial time.
These results do not easily compare to our complexity results for minimisation. One
difference is that we use the total order symmetry, where the local symmetries are
always trivial (Lemma 3). As a consequence, all the complexity required to deal with
groups vanishes. Rather, the complexity is transferred to the input of our algorithms,
because automata over the equality symmetry require more orbits when expressed
over the total order symmetry. Another difference is that register automata allow for
duplicate values in the registers. In nominal automata, such configurations will be
encoded in different orbits.
Orthogonal to nominal automata, there is the notion of symbolic automata (D'Antoni & Veanes, 2017 and Maler & Mens, 2017). These automata are also defined
over infinite alphabets but they use predicates on transitions, instead of relying on
symmetries. Symbolic automata are finite state (as opposed to infinite state nominal
automata) and do not allow for storing values. However, they do allow for general
predicates over an infinite alphabet, including comparison to constants.
6 Conclusion and Future Work
We presented a concrete finite representation for nominal sets over the total order
symmetry. This allowed us to implement a library, Ons, and provide complexity
bounds for common operations. The experimental comparison of Ons against existing solutions for automata minimisation and learning shows that our implementation is
much faster in many instances. As such, we believe Ons is a promising implementation
of nominal techniques.
A natural direction for future work is to consider other symmetries, such as the
equality symmetry. Here, we may take inspiration from existing tools such as Mihda
Fast Computations on Ordered Nominal Sets
129
(see Section 5). Another interesting question is whether it is possible to translate a
nominal automaton over the total order symmetry which accepts an equality language
to an automaton over the equality symmetry. This would allow one to efficiently move
between symmetries. Finally, our techniques can potentially be applied to timed
automata by exploiting the intriguing connection between the nominal automata that
we consider and timed automata (Bojańczyk & Lasota, 2012).
Acknowledgement
We would like to thank Szymon Toruńczyk and Eryk Kopczyński for their prompt
help when using the Lois library. For general comments and suggestions we would
like to thank Ugo Montanari and Niels van der Weide. Lastly, we want to thank the anonymous reviewers for their comments.
Chapter 7
Separation and Renaming in Nominal Sets
Joshua Moerman
Radboud University
Jurriaan Rot
Radboud University
Abstract
Nominal sets provide a foundation for reasoning about names. They
are used primarily in syntax with binders, but also, e.g., to model automata over infinite alphabets. In this chapter, nominal sets are related
to nominal renaming sets, which involve arbitrary substitutions rather
than permutations, through a categorical adjunction. In particular, the
separated product of nominal sets is related to the Cartesian product
of nominal renaming sets. Based on these results, we define the new
notion of separated nominal automata. These efficiently recognise nominal
languages, provided these languages are renaming sets. In such cases,
moving from the existing notion of nominal automata to separated automata can lead to an exponential reduction of the state space.
This chapter is based on the following submission:
Moerman, J. & Rot, J. (2019). Separation and Renaming in Nominal Sets. (Under submission)
Nominal sets are abstract sets which allow one to reason over sets with names, in terms
of permutations and symmetries. Since their introduction in computer science by
Gabbay and Pitts (1999), they have been widely used for implementing and reasoning
over syntax with binders (see the book of Pitts, 2013). Further, nominal techniques
have been related to computability theory (Bojańczyk, et al., 2013) and automata
theory (Bojańczyk, et al., 2014), where they provide an elegant means of studying
languages over infinite alphabets. This embeds nominal techniques in a broader
setting of symmetry aware computation (Pitts, 2016).
Gabbay, one of the pioneers of nominal techniques, described a variation on the theme: nominal renaming sets (Gabbay, 2007 and Gabbay & Hofmann, 2008). Nominal
renaming sets are equipped with a monoid action of arbitrary (possibly non-injective)
substitution of names, in contrast to nominal sets, which only involve a group action
of permutations.
In this paper, the motivation for using nominal renaming sets comes from automata
theory over infinite alphabets. Certain languages form nominal renaming sets, which
means that they are closed under all possible substitutions on atoms. In order to obtain
efficient automata-theoretic representations of such languages, we systematically
relate nominal renaming sets to nominal sets.
We start by establishing a categorical adjunction in Section 2:

F : Pm-Nom ⇄ Sb-Nom : U    (F ⊣ U),
where Pm-Nom is the usual category of nominal sets and Sb-Nom the category of
nominal renaming sets. The right adjoint U simply forgets the action of non-injective
substitutions. The left adjoint F freely extends a nominal set with elements representing the application of such substitutions. For instance, F maps the nominal set 𝔸(∗) of all words consisting of distinct atoms to the nominal renaming set 𝔸∗ consisting of all words over the atoms.
In fact, the latter follows from one of the main results of this paper: F maps the separated product X ∗ Y of nominal sets to the Cartesian product of nominal renaming sets. Additionally, under certain conditions, U maps the exponent to the magic wand X −∗ Y, which is the right adjoint of the separated product. The separated product consists of those pairs whose elements have disjoint supports. This is relevant for name abstraction (Pitts, 2013), and has also been studied in the setting of presheaf categories, aimed towards separation logic (O'Hearn, 2003).
We apply these connections between nominal sets and renaming sets in the context
of automata theory. Nominal automata are an elegant model for recognising languages
over infinite alphabets. They are expressively equivalent to the more classical register
automata (Bojańczyk, 2018, Theorem 6.5), and have appealing properties that register
automata lack, such as unique minimal automata. However, moving from register
automata to nominal automata can lead to an exponential blow-up in the number of
states.25
As a motivating example, we consider a language modelling an n-bounded FIFO
queue. The input alphabet is given by Σ = {Put(a) | a ∈ 𝔸} ∪ {Pop}, and the output alphabet by O = 𝔸 ∪ {⊥} (here ⊥ is a null value). The language Ln : Σ∗ → O maps
a sequence of queue operations to the resulting top element when starting from the
empty queue, or to ⊥ if this is undefined. The language Ln can be recognised by a
nominal automaton, but this requires an exponential number of states in n, as the
automaton distinguishes internally between all possible equalities among elements in
the queue.
Based on the observation that Ln is a nominal renaming set, we can come up with
a linear automata-theoretic representation. To this end, we define the new notion of
separated nominal automaton, where the transition function is only defined for pairs
of states and letters with a disjoint support (Section 3). Using the aforementioned
categorical framework, we find that such separated automata recognise languages
which are nominal renaming sets. Although separated nominal automata are not as
expressive as classical nominal automata, they can be much smaller. In particular, in
the FIFO example, the reachable part of the separated automaton obtained from the
original nominal automaton has n + 1 states, thus dramatically reducing the number
of states. We expect that such a reduction is useful in many applications, such as
automata learning (Chapter 5).
In summary, the main contributions of this paper are the adjunction between
nominal sets and nominal renaming sets, the relation between separated product and
the Cartesian product of renaming sets, and the application to automata theory. We
conclude with a coalgebraic account of separated automata in Section 3.1. In particular,
we justify the semantics of separated automata by showing how it arises through a
final coalgebra, obtained by lifting the adjunction to categories of coalgebras. The last
section is orthogonal to the other results, and background knowledge of coalgebra is
needed only there.
1 Monoid actions and nominal sets
In order to capture both the standard notion of nominal sets by Pitts (2013) and sets
with more general renaming actions by Gabbay and Hofmann (2008), we start by
defining monoid actions.
Definition 1. Let (M, ⋅, 1) be a monoid. An M-set is a set X together with a function
⋅ : M × X → X such that 1 ⋅ x = x and m ⋅ (n ⋅ x) = (m ⋅ n) ⋅ x for all m, n ∈ M and
x ∈ X. The function ⋅ is called an M-action and m ⋅ x is often written by juxtaposition
mx. A function f : X → Y between two M-sets is M-equivariant if m ⋅ f(x) = f(m ⋅ x) for all m ∈ M and x ∈ X. The class of M-sets together with equivariant maps forms a category M-Set.
25 Here, the number of states refers to the number of orbits in the state space.
Let 𝔸 = {a, b, c, …} be a countably infinite set of atoms. The two main instances of M considered in this paper are the monoid

Sb = {m : 𝔸 → 𝔸 | m(a) ≠ a for finitely many a}

of all (finite) substitutions (with composition as multiplication), and the monoid
Pm = {g ∈ Sb | g is a bijection}
of all (finite) permutations. Since Pm is a submonoid of Sb, any Sb-set is also a Pm-set;
and any Sb-equivariant map is also Pm-equivariant. This gives rise to a forgetful
functor
U : Sb-Set → Pm-Set.
The set 𝔸 is an Sb-set by defining m ⋅ a = m(a). Given an M-set X, the set 𝒫(X) of
subsets of X is an M-set, with the action defined by direct image.
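As a concrete, finite illustration of these two monoids and their actions (a sketch of our own, with atoms modelled as strings and a substitution stored by its non-identity part):

def apply(m, a):
    # Action of a finite substitution m on an atom; identity off its domain.
    return m.get(a, a)

def act(m, x):
    # Pointwise action on words over the atoms (an Sb-set).
    return tuple(apply(m, a) for a in x)

def compose(m, n):
    # Monoid multiplication: compose(m, n)(a) = m(n(a)).
    composite = {a: apply(m, apply(n, a)) for a in set(m) | set(n)}
    return {a: b for a, b in composite.items() if a != b}  # keep non-identity part

def is_permutation(m):
    # m lies in Pm iff it is a bijection of the atoms; storing only the
    # non-identity part, this holds iff m maps its domain onto itself.
    return set(m.values()) == set(m)

m = {'a': 'b'}            # non-injective overall: both a and b go to b
g = {'a': 'b', 'b': 'a'}  # the swap (a b), an element of Pm
print(is_permutation(m), is_permutation(g))  # False True
print(compose(g, g))                         # {}: the identity, so g . g = 1
print(act(m, ('a', 'b', 'c')))               # ('b', 'b', 'c')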
For a Pm-set X, the orbit of an element x is the set orb(x) = {g ⋅ x | g ∈ Pm}. We say X is orbit-finite if the set {orb(x) | x ∈ X} is finite.
For any monoid M, the category M-Set is symmetric monoidal closed. The product
of two M-sets is given by the Cartesian product, with the action defined pointwise:
m ⋅ (x, y) = (m ⋅ x, m⋅y). In M-Set, the exponent X →M Y is given by the set {f : M×X →
Y | f is equivariant}.26 The action on such an f : M × X → Y is defined by (m ⋅ f)(n, x) =
f(mn, x). A good introduction to the construction of the exponent is given by Simmons
(n.d.). If M is a group, a simpler description of the exponent may be given, carried by
the set of all functions f : X → Y, with the action given by (g ⋅ f)(x) = g ⋅ f(g⁻¹ ⋅ x).
1.1 Nominal M-sets
The notion of nominal set is usually defined w.r.t. a Pm-action. Here, we use the
generalisation to Sb-actions from Gabbay and Hofmann (2008). Throughout this
section, let M denote a submonoid of Sb.
Definition 2. Let X be an M-set, and x ∈ X an element. A set C ⊂ 𝔸 is an (M-)support
of x if for all m1 , m2 ∈ M s.t. m1 |C = m2 |C we have m1 x = m2 x. An M-set X is called
nominal if every element x has a finite M-support.
Nominal M-sets and equivariant maps form a full subcategory of M-Set, denoted by
M-Nom. The M-set 𝔸 of atoms is nominal. The powerset 𝒫(X) of a nominal set is not
nominal in general; the restriction to finitely supported elements is.
26 If we write a regular arrow →, then we mean a map in the category. Exponent objects will always be
denoted by annotated arrows.
If M is a group, then the notion of support can be simplified by using inverses.
To see this, first note that, given elements g1, g2 ∈ M, g1|C = g2|C can equivalently be written as g1g2⁻¹|C = id|C. Second, the statement xg1 = xg2 can be expressed as xg1g2⁻¹ = x. Hence, C is a support iff g|C = id|C implies gx = x for all g, which is the
standard definition for nominal sets over a group (Pitts, 2013). Surprisingly, Gabbay
and Hofmann (2008) show a similar characterisation also holds for Sb-sets. Moreover,
recall that every Sb-set is also a Pm-set; the associated notions of support coincide on
nominal Sb-sets, as shown by the following result. In particular, this means that the
forgetful functor restricts to U : Sb-Nom → Pm-Nom.
Lemma 3. (Theorem 4.8 from Gabbay, 2007) Let X be a nominal Sb-set, x ∈ X, and
C ⊂ 𝔸. Then C is an Sb-support of x iff it is a Pm-support of x.
Remark 4. It is not true that any Pm-support is an Sb-support. The condition that
X is nominal, in the above lemma, is crucial. Let X = 𝔸 + 1 and define the following Sb-action: m ⋅ a = m(a) if m is injective, m ⋅ a = ∗ if m is non-injective, and m ⋅ ∗ = ∗. This is a well-defined Sb-set, but it is not nominal. Now consider U(X); this is the Pm-set 𝔸 + 1 with the natural action, which is a nominal Pm-set! In particular, as a Pm-set each element has a finite support, but as an Sb-set the supports are infinite.
This counterexample is similar to the “exploding nominal sets” of Gabbay (2007),
but even worse behaved. We like to call them nuclear sets, since an element will collapse
when hit by a non-injective map, no matter how far away the non-injectivity occurs.
For M ∈ {Sb, Pm}, any element x ∈ X of a nominal M-set X has a least finite support
(w.r.t. set inclusion). We denote the least finite support of an element x ∈ X by
supp(x). Note that by Lemma 3, the set supp(x) is independent of whether a nominal
Sb-set X is viewed as an Sb-set or a Pm-set. The dimension of X is given by dim(X) =
max{|supp(x)| | x ∈ X}, where |supp(x)| is the cardinality of supp(x).
We list some basic properties of nominal M-sets, which have known counterparts
for the case that M is a group (Bojańczyk, et al., 2014), and when M = Sb (Gabbay &
Hofmann, 2008).
Lemma 5. Let X be an M-nominal set. If C supports an element x ∈ X, then m ⋅ C
supports m ⋅ x for all m ∈ M. Moreover, any g ∈ Pm preserves least supports: g ⋅
supp(x) = supp(gx).
The latter equality does not hold in general for a monoid M. For instance, the exploding
nominal renaming sets by Gabbay and Hofmann (2008) give counterexamples for
M = Sb.
Lemma 6. Given M-nominal sets X, Y and a map f : X → Y, if f is M-equivariant and
C supports an element x ∈ X, then C supports f(x).
136
Chapter 7
The category M-Nom is symmetric monoidal closed, with the product inherited from M-Set, thus simply given by the Cartesian product. The exponent is given by the restriction of the exponent X →M Y in M-Set to the set of finitely supported functions, denoted by X →M_fs Y. This is similar to the exponents of nominal sets with 01-substitutions from Pitts (2014).
Remark 7. Gabbay and Hofmann (2008) give a different presentation of the exponent in M-Nom, based on a certain extension of partial functions. We prefer the
previous characterisation, as it is derived in a straightforward way from the exponent
in M-Set.
1.2 Separated product
Definition 8. Two elements x, y ∈ X of a Pm-nominal set are called separated, denoted
by x # y, if there are disjoint sets C1 , C2 ⊂ 𝔸 such that C1 supports x and C2 supports
y. The separated product of Pm-nominal sets X and Y is defined as
X ∗ Y = {(x, y) | x # y}.

We extend the separated product to the separated power, defined by X(0) = 1 and X(n+1) = X(n) ∗ X, and the set of separated words X(∗) = ⋃i X(i). The separated product is an equivariant subset X ∗ Y ⊆ X × Y. Consequently, we have equivariant projection maps X ∗ Y → X and X ∗ Y → Y.
Example 9. Two finite sets C, D ⊂ 𝔸 are separated precisely when they are disjoint.
An important example is the set 𝔸(∗) of separated words over the atoms: it consists of those words where all letters are distinct.
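For words over the atoms, the support is simply the set of letters that occur, so separation amounts to letter-disjointness. A small sketch of our own, necessarily restricted to a finite window of 𝔸:

from itertools import product

def supp(word):
    # For words over atoms, the least support is the set of letters.
    return set(word)

def separated(x, y):
    # x # y: the supports are disjoint.
    return supp(x).isdisjoint(supp(y))

atoms = ['a', 'b', 'c']  # a finite window into the infinite set of atoms
words2 = list(product(atoms, repeat=2))
sep2 = [w for w in words2 if len(supp(w)) == 2]  # A^(2): all letters distinct
print(len(words2), len(sep2))                    # 9 6
print(separated(('a', 'b'), ('c',)))             # True
print(separated(('a', 'b'), ('b',)))             # False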
The separated product gives rise to another symmetric closed monoidal structure on Pm-Nom, with 1 as unit, and the exponential object given by the magic wand X −∗ Y. An explicit characterisation of X −∗ Y is not needed in the remainder of this chapter, but for a complete presentation we briefly recall the description from Schöpp (2006) (see also the book of Pitts, 2013 and the paper of Clouston, 2013). First, define a Pm-action on the set of partial functions f : X ⇀ Y by (g ⋅ f)(x) = g ⋅ f(g⁻¹ ⋅ x) if f(g⁻¹ ⋅ x) is defined. Now, such a partial function f : X ⇀ Y is called separating if f is finitely supported, f(x) is defined iff f # x, and supp(f) = ⋃x∈dom(f) supp(f(x)) ∖ supp(x). Finally, X −∗ Y = {f : X ⇀ Y | f is separating}. We refer to the thesis of Schöpp (2006) (Section 3.3.1) for a proof and explanation.
Remark 10. The special case 𝔸 −∗ Y coincides with [𝔸]Y, the set of name abstractions (Pitts, 2013). The latter is generalised to [X]Y by Clouston (2013), but it is shown there that the coincidence [X]Y ≅ (X −∗ Y) only holds under strong assumptions (including that X is single-orbit).
Remark 11. An analogue of the separated product does not seem to exist for nominal Sb-sets. For instance, consider the set 𝔸 × 𝔸. As a Pm-set, it has four equivariant subsets: ∅, Δ(𝔸) = {(a, a) | a ∈ 𝔸}, 𝔸 ∗ 𝔸, and 𝔸 × 𝔸. However, the set 𝔸 ∗ 𝔸 is not an equivariant subset when considering 𝔸 × 𝔸 as an Sb-set.
2 A monoidal construction from Pm-sets to Sb-sets
In this section, we provide a free construction, extending nominal Pm-sets to nominal
Sb-sets. We use this as a basis to relate the separated product and exponent (in
Pm-Nom) to the product and exponent in Sb-Nom. The main results are:
– the forgetful functor U : Sb-Nom → Pm-Nom has a left adjoint F (Theorem 16);
– this F is monoidal: it maps separated products to products (Theorem 17);
– U maps the exponent object in Sb-Nom to the right adjoint of the separated product, if the domain has dimension ≤ 1 (Theorem 24, Corollary 25).
Together, these results form the categorical infrastructure to relate nominal languages
to separated languages and automata in Section 3.
Definition 12. Given a Pm-nominal set X, we define a nominal Sb-set F(X) as follows. Define the set

F(X) = {(m, x) | m ∈ Sb, x ∈ X}/∼,

where ∼ is the least equivalence relation containing:

(m, gx) ∼ (mg, x),
(m, x) ∼ (m′, x)   if m|C = m′|C for a Pm-support C of x,

for all x ∈ X, m, m′ ∈ Sb and g ∈ Pm. The equivalence class of a pair (m, x) is denoted by [m, x]. We define an Sb-action on F(X) as n ⋅ [m, x] = [nm, x].
Well-definedness is proved as part of Proposition 15 below. Informally, an equivalence class [m, x] ∈ F(X) behaves "as if m acted on x". The first equation of ∼ ensures compatibility with the Pm-action on x, and the second equation ensures that [m, x] only depends on the relevant part of m. The following characterisation of ∼ is useful in proofs.
Lemma 13. We have (m1, x1) ∼ (m2, x2) iff there is a permutation g ∈ Pm such that gx1 = x2 and m1|C = m2g|C for some Pm-support C of x1.
Remark 14. The first relation of in Definition 12 comes from the construction of
“extension of scalars” in commutative algebra (see Atiyah & MacDonald, 1969). In
that context, one has a ring homomorphism f : A → B and an A-module M and wishes
to obtain a B-module. This is constructed by the tensor product B ⊗A M and it is here that the relation (b, am) ∼ (ba, m) is used (B is a right A-module via f).
Proposition 15. The construction F in Definition 12 extends to a functor F : Pm-Nom → Sb-Nom, defined on an equivariant map f : X → Y by F(f)([m, x]) = [m, f(x)] ∈ F(Y).
Proof. We first prove well-definedness and then the functoriality.
𝐅(𝐗) is an Sb-set. To this end we check that the Sb-action is well-defined. Let
[m1 , x1 ] = [m2 , x2 ] ∈ F(X) and let m ∈ Sb. By Lemma 13, there is some permutation g such that gx1 = x2 and m1 |C = m2 g|C for some support C of x1 . By postcomposition with m we get mm1 |C = mm2 g|C , which means (again by the lemma)
that [mm1 , x1 ] = [mm2 , x2 ]. Thus m[m1 , x1 ] = m[m2 , x2 ], which concludes well-
definedness.
For associativity and unitality of the Sb-action, we simply note that it is directly
defined by left multiplication of Sb which is associative and unital. This concludes
that F(X) is an Sb-set.
𝐅(𝐗) is a nominal Sb-set. Given an element [m, x] ∈ F(X) and a Pm-support C of x,
we will prove that m ⋅ C is an Sb-support for [m, x]. Suppose that we have m1 , m2 ∈ Sb
such that m1 |m⋅C = m2 |m⋅C . By pre-composition with m we get m1 m|C = m2 m|C and
this leads us to conclude [m1 m, x] = [m2 m, x]. So m1 [m, x] = m2 [m, x] as required.
Functoriality. Let f : X → Y be a Pm-equivariant map. To see that F(f) is well-defined, consider [m1, x1] = [m2, x2]. By Lemma 13, there is a permutation g such that gx1 = x2 and m1|C = m2g|C for some support C of x1. Applying F(f) gives on
one hand [m1 , f(x1 )] and on the other hand [m2 , f(x2 )] = [m2 , f(gx1 )] = [m2 , gf(x1 )] =
[m2 g, f(x1 )] (we used equivariance in the second step). Since m1 |C = m2 g|C and f
preserves supports we have [m2 g, f(x1 )] = [m1 , f(x1 )].
For Sb-equivariance we consider both n ⋅ F(f)([m, x]) = n[m, f(x)] = [nm, f(x)] and
F(f)(n ⋅ [m, x]) = F(f)([nm, x]) = [nm, f(x)]. This shows that nF(f)([m, x]) = F(f)(n[m, x])
and concludes that we have a map F(f) : F(X) → F(Y).
The fact that F preserves the identity function and composition follows from the
definition directly.
Theorem 16. The functor F is left adjoint to U: F : Pm-Nom ⇄ Sb-Nom : U.
Proof. We show that, for every nominal set X, there is a map ηX : X → UF(X) with
the necessary universal property: for every Pm-equivariant f : X → U(Y) there is
a unique Sb-equivariant map f♯ : FX → Y such that U(f♯ ) ∘ ηX = f. Define ηX by
ηX (x) = [id, x]. This is equivariant: g ⋅ ηX (x) = g[id, x] = [g, x] = [id, gx] = ηX (gx).
Now, for f : X → U(Y), define f♯ ([m, x]) = m ⋅ f(x) for x ∈ X and m ∈ Sb. Then
U(f♯ ) ∘ ηX (x) = f♯ ([id, x]) = id ⋅ f(x) = f(x).
To show that f♯ is well-defined, consider [m1 , x1 ] = [m2 , x2 ] (we have to prove
that m1 ⋅ f(x1 ) = m2 ⋅ f(x2 )). By Lemma 13, there is a g ∈ Pm such that gx1 = x2
and m2g|C = m1|C for a Pm-support C of x1. Now C is also a Pm-support of f(x1) and hence an Sb-support of f(x1) (Lemma 3). We conclude that m2 ⋅ f(x2) = m2 ⋅ f(gx1) = m2g ⋅ f(x1) = m1 ⋅ f(x1) (we use Pm-equivariance in the second-to-last step and the Sb-support property in the last step). Finally, Sb-equivariance of f♯ and uniqueness are
straightforward calculations.
The counit ϵ : FU(Y) → Y is defined by ϵ([m, x]) = m ⋅ x. For the inverse of −♯ , let
g : F(X) → Y be an Sb-equivariant map; then g♭ : X → U(Y) is given by g♭ (x) = g([id, x]).
Note that the unit η is a Pm-equivariant map, hence it preserves supports (i.e., any
support of x also supports [id, x]). This also means that if C is a support of x, then
m ⋅ C is a support of [m, x] (by Lemma 5).
2.1 On (separated) products
The functor F not only preserves coproducts, being a left adjoint, but it also maps the
separated product to products:
Theorem 17. The functor F is strong monoidal, from the monoidal category (Pm-Set, ∗, 1) to (Sb-Set, ×, 1). In particular, the map p given by

p = ⟨F(π1), F(π2)⟩ : F(X ∗ Y) → F(X) × F(Y)

is an isomorphism, natural in X and Y.
Proof. We prove that p is an isomorphism. It suffices to show that p is injective and surjective. Note that p([m, (x, y)]) = ([m, x], [m, y]).
Surjectivity. Let ([m1, x], [m2, y]) be an element of F(X) × F(Y). We take an element y′ ∈ Y such that y′ # supp(x) and y′ = gy for some g ∈ Pm. Now we have an element (x, y′) ∈ X ∗ Y. By Lemma 5, we have supp(y′) = supp(y). Define the map

m(c) = m1(c)       if c ∈ supp(x)
       m2(g⁻¹(c))  if c ∈ supp(y′)
       c           otherwise.

(Observe that supp(x) # supp(y′), so the cases are not overlapping.) The map m is an element of Sb. Now consider the element z = [m, (x, y′)] ∈ F(X ∗ Y). Applying p to z gives the element ([m, x], [m, y′]). First, we note that [m, x] = [m1, x] by the definition of m. Second, we show that [m, y′] = [m2, y]. Observe that mg|supp(y) = m2|supp(y) by definition of m. Since supp(y) is a support of y, we have [mg, y] = [m2, y], and since [mg, y] = [m, gy] = [m, y′] we are done. Hence p([m, (x, y′)]) = ([m, x], [m, y′]) = ([m1, x], [m2, y]), so p is surjective.
Injectivity. Let [m1, (x1, y1)] and [m2, (x2, y2)] be two elements. Suppose that they are mapped to the same element, i.e., [m1, x1] = [m2, x2] and [m1, y1] = [m2, y2]. Then there are permutations gx, gy such that x2 = gx x1 and y2 = gy y1. Moreover, let C = supp(x1) and D = supp(y1); then we have m1|C = m2gx|C and m1|D = m2gy|D.
In order to show the two original elements are equal, we have to provide a single permutation g. Define, for z ∈ C ∪ D,

g0(z) = gx(z)  if z ∈ C
        gy(z)  if z ∈ D.

(Again, C and D are disjoint.) The function g0 is injective since the least supports of x2 and y2 are disjoint. Hence g0 defines a local isomorphism from C ∪ D to g0(C ∪ D). By homogeneity (Pitts, 2013), the map g0 extends to a permutation g ∈ Pm with g(z) = gx(z) for z ∈ C and g(z) = gy(z) for z ∈ D. In particular we get (x2, y2) = g(x1, y1). We also obtain m1|C∪D = m2g|C∪D. This proves that [m1, (x1, y1)] = [m2, (x2, y2)], and so the map p is injective.
Unit and coherence. To show that F preserves the unit, we note that [m, 1] = [m′, 1] for every m, m′ ∈ Sb, as the empty set supports 1 and so m|∅ = m′|∅ vacuously holds. We conclude that F(1) is a singleton. Using the definition p([m, (x, y)]) = ([m, x], [m, y]), one can check the coherence axioms by elementary calculations.
Since F also preserves coproducts (being a left adjoint), we obtain that F maps the set of separated words to the set of all words.
Corollary 18. For any Pm-nominal set X, we have F(X(∗)) ≅ (FX)∗.
As we will show below, the functor F preserves the set 𝔸 of atoms. This is an instance
of a more general result about preservation of one-dimensional objects.
Lemma 19. The functors F and U are equivalences on ≤ 1-dimensional objects. Concretely, for X ∈ Pm-Nom and Y ∈ Sb-Nom:
– If dim(X) ≤ 1, then the unit η : X → UF(X) is an isomorphism.
– If dim(Y) ≤ 1, then the counit ϵ : FU(Y) → Y is an isomorphism.
Before we prove this lemma, we need the following technical property of ≤ 1-dimensional
Sb-sets.
Lemma 20. Let Y be a nominal Sb-set. If an element y ∈ Y is supported by a singleton
set (or even the empty set), then
{my | m ∈ Sb} = {gy | g ∈ Pm}.
Proof. Let y ∈ Y be supported by {a} and let m ∈ Sb. Now consider b = m(a) and the
bijection g = (a b). Now m|{a} = g|{a} , meaning that my = gy. So the set {my | m ∈ Sb}
is contained in {gy | g ∈ Pm}. The inclusion in the other direction is trivial, hence {my | m ∈ Sb} = {gy | g ∈ Pm}.
Proof of Lemma 19. It is easy to see that η : x ↦ [id, x] is injective. Now to see that η is surjective, let [m, x] ∈ UF(X) and consider a support {a} of x (this is a singleton or empty since dim(X) ≤ 1). Let b = m(a) and consider the swap g = (a b). Now [m, x] = [mg⁻¹, gx] and note that {b} supports gx and mg⁻¹|{b} = id|{b}. We continue with [mg⁻¹, gx] = [id, gx], which concludes that gx is the preimage of [m, x]. Hence η is an isomorphism.
To see that ϵ : [m, y] ↦ my is surjective, just consider m = id. To see that ϵ is injective, let [m, y], [m′, y′] ∈ FU(Y) be two elements such that my = m′y′. Then by using Lemma 20 we find g, g′ ∈ Pm such that gy = my = m′y′ = g′y′. This means that y and y′ are in the same orbit (of U(Y)) and have the same dimension. Case 1: supp(y) = supp(y′) = ∅; then [m, y] = [id, y] = [id, y′] = [m′, y′]. Case 2: supp(y) = {a} and supp(y′) = {b}; then supp(gy) = {g(a)} (Lemma 5). In particular we now know that m and g map a to c = g(a); likewise m′ and g′ map b to c. Now [m, y] = [m, g⁻¹g′y′] = [mg⁻¹g′, y′] = [m′, y′], where we used mg⁻¹g′(b) = c = m′(b) in the last step. This means that ϵ is injective and hence an isomorphism.
By Lemma 19, we may consider the set 𝔸 as both Sb-set and Pm-set (abusing notation).
And we get an isomorphism F(𝔸) ≅ 𝔸 of nominal Sb-sets. To appreciate the above
results, we give a concrete characterisation of one-dimensional nominal sets:
Lemma 21. Let X be a nominal M-set, for M ∈ {Sb, Pm}. Then dim(X) ≤ 1 iff there
exist (discrete) sets Y and I such that X ≅ Y + ∐I 𝔸.
In particular, the one-dimensional objects include the alphabets used for data words,
consisting of a product S × 𝔸 of a discrete set S of action labels and the set of atoms.
These alphabets are very common in the study of register automata (see, e.g., Isberner,
et al., 2014).
By the above and Theorem 17, F maps separated powers of 𝔸 to powers, and the
set of separated words over 𝔸 to the Sb-set of words over 𝔸.
Corollary 22. We have F(𝔸(n)) ≅ 𝔸ⁿ and F(𝔸(∗)) ≅ 𝔸∗.
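To see Corollary 22 in the smallest interesting case, here is a worked example of our own, writing p for the composite isomorphism F(𝔸(2)) ≅ 𝔸² obtained from Theorem 17 and Lemma 19:

\begin{align*}
  F(\mathbb{A}^{(2)}) &\xrightarrow{\;\cong\;} \mathbb{A}^2,
    & p([m,(a,b)]) &= (m(a), m(b)),\\
  (c,d) &= p([m,(a,b)]) && \text{for any } m \text{ with } m(a)=c,\ m(b)=d,\\
  (c,c) &= p([m,(a,b)]) && \text{for } m \text{ with } m(a)=m(b)=c.
\end{align*}

The last line is where non-injective substitutions enter: a diagonal pair (c, c) arises from a separated pair (a, b), which is exactly why F turns the separated product into the full Cartesian product.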
2.2 On exponents
We have described how F and U interact with (separated) products. In this section, we
establish a relationship between the magic wand (−∗) and the exponent of nominal Sb-sets (→Sb_fs).
Definition 23. Let X ∈ Pm-Nom and Y ∈ Sb-Nom. We define a Pm-equivariant map

ϕ : (X −∗ U(Y)) → U(F(X) →Sb_fs Y)

by using the composition

F(X −∗ U(Y)) × F(X) --p⁻¹--> F((X −∗ U(Y)) ∗ X) --F(ev)--> FU(Y) --ϵ--> Y,

where p⁻¹ is the inverse of the isomorphism p from Theorem 17 and ev is the evaluation map of the exponent −∗. By currying and the adjunction we arrive at ϕ:

F(X −∗ U(Y)) × F(X) → Y
F(X −∗ U(Y)) → (F(X) →Sb_fs Y)        by currying
ϕ : (X −∗ U(Y)) → U(F(X) →Sb_fs Y)    by Theorem 16
With this map we can prove a generalisation of Theorem 16. In particular, the following
theorem generalises the one-to-one correspondence between maps X → U(Y) and
maps F(X) → Y. First, it shows that this correspondence is Pm-equivariant. Second, it
extends the correspondence to all finitely supported maps and not just the equivariant
ones.
Theorem 24. The sets X −∗ U(Y) and U(F(X) →Sb_fs Y) are naturally isomorphic via ϕ as nominal Pm-sets.
Proof. We define some additional maps in order to construct the inverse of ϕ. First, from Theorem 16 we get the following isomorphism:

q : U(X × Y) ≅ U(X) × U(Y)

Second, with this map and currying, we obtain the following two natural maps:

U(F(X) →Sb_fs Y) × UF(X) --q⁻¹--> U((F(X) →Sb_fs Y) × F(X)) --U(ev)--> U(Y)
α : U(F(X) →Sb_fs Y) → (UF(X) →Pm_fs U(Y))        by currying

(UF(X) →Pm_fs U(Y)) × X --id×η--> (UF(X) →Pm_fs U(Y)) × UF(X) --ev--> U(Y)
β : (UF(X) →Pm_fs U(Y)) → (X →Pm_fs U(Y))         by currying

Last, we note that the inclusion A ∗ B ⊆ A × B induces a restriction map r : (B →Pm_fs C) → (B −∗ C) (again by currying). A calculation shows that r ∘ β ∘ α is the inverse of ϕ.
Note that this theorem gives an alternative characterisation of the magic wand in terms of the exponent in Sb-Nom, if the codomain is U(Y). Moreover, for a 1-dimensional object X in Sb-Nom, we obtain the following special case of the theorem (using the counit isomorphism from Lemma 19):

Corollary 25. Let X, Y be nominal Sb-sets. For 1-dimensional X, the nominal Pm-set U(X) −∗ U(Y) is naturally isomorphic to U(X →Sb_fs Y).
Remark 26. The set 𝔸 −∗ U(X) coincides with the atom abstraction [𝔸]UX (Remark 10). Hence, as a special case of Corollary 25, we recover Theorem 34 of Gabbay and Hofmann (2008), which states a bijective correspondence between [𝔸]UX and U(𝔸 →Sb_fs X).
3 Nominal and separated automata
In this section, we study nominal automata, which recognise languages over infinite
alphabets. After recalling the basic definitions, we introduce a new variant of automata
based on the separated product, which we call separated nominal automata. These
automata represent nominal languages which are Sb-equivariant, essentially meaning
they are closed under substitution. Our main result is that, if a classical nominal
automaton (over Pm) represents a language L which is Sb-equivariant, then L can also
be represented by a separated nominal automaton. The latter can be exponentially
smaller (in number of orbits) than the original automaton, as we show in a concrete
example.
Remark 27. We will work with a general output set O instead of just acceptance.
The reason for this is that Sb-equivariant functions L : 𝔸∗ → 2 are not very interesting: they are defined purely by the length of the input. By using a more general output O, we may still capture interesting behaviour, e.g., the automaton in Example 29.
Definition 28. Let Σ, O be Pm-sets, called input/output alphabet respectively.
A (Pm)-nominal language is an equivariant map of the form L : Σ∗ → O.
A nominal (Moore) automaton 𝒜 = (Q, δ, o, q0 ) consists of a nominal set of states Q,
an equivariant transition function δ : Q × Σ → Q, an equivariant output function
o : Q → O, and an initial state q0 ∈ Q with an empty support.
The language semantics is the map l : Q × Σ∗ → O, defined inductively by
l(x, ε) = o(x) ,
l(x, aw) = l(δ(x, a), w)
for all x ∈ Q, a ∈ Σ and w ∈ Σ∗ .
For l♭ : Q → (Σ∗ →Pm_fs O) the transpose of l, we have that l♭(q0) : Σ∗ → O is equivariant; this is called the language accepted by 𝒜.
Note that the language accepted by an automaton can equivalently be characterised
by considering paths through the automaton from the initial state.
If the state space Q and the alphabets Σ, O are orbit-finite, this allows us to run algorithms (reachability, minimisation, etc.) on such automata, but there is no need to
assume this for now. For an automaton 𝒜 = (Q, δ, o, q0 ), we define the set of reachable
states as the least set R(𝒜) ⊆ Q such that q0 ∈ R(𝒜) and for all x ∈ R(𝒜) and a ∈ Σ,
δ(x, a) ∈ R(𝒜).
Example 29. We model a bounded FIFO queue of size n as a nominal Moore automaton, explicitly handling the data in the automaton structure.27 The input alphabet Σ
and output alphabet O are as follows:
Σ = {Put(a) | a ∈ 𝔸} ∪ {Pop},      O = 𝔸 ∪ {⊥}.
The input alphabet encodes two actions: putting a new value on the queue and
popping a value. The output is either a value (the front of the queue) or ⊥ if the
queue is empty. A queue of size n is modelled by the automaton (Q, δ, o, q0 ) defined
as follows.
Q = 𝔸≤n ∪ {⊥},      q0 = ε,

δ(a1…ak, Put(b)) = a1…ak b   if k < n,
                   ⊥         otherwise,

δ(a1…ak, Pop) =    a2…ak     if k > 0,
                   ⊥         otherwise,

o(a1…ak) =         a1        if k ≥ 1,
                   ⊥         otherwise,

δ(⊥, x) = ⊥.
The automaton is depicted in Figure 7.1 for the case n = 3. The language accepted by this automaton assigns to a word w the first element of the queue after executing the instructions in w from left to right, and ⊥ if the input is ill-behaved, i.e., Pop is applied to an empty queue or Put(a) to a full queue.
27 We use a reactive version of the queue data structure which is slightly different from the versions of Isberner, et al. (2014) and Moerman, et al. (2017).
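The definitions above translate directly into a small executable sketch (our own encoding: atoms are strings, None plays the role of the sink, and '⊥' the output ⊥):

def make_fifo(n):
    # Returns the component functions (delta, output, initial state) of the
    # bounded FIFO Moore automaton from Example 29.
    def delta(q, letter):
        if q is None:                         # the sink: delta(bot, x) = bot
            return None
        if letter == 'Pop':
            return q[1:] if q else None       # Pop on the empty queue: bot
        _, a = letter                         # letter = ('Put', a)
        return q + (a,) if len(q) < n else None

    def output(q):
        return q[0] if q else '⊥'             # front of the queue, or ⊥

    return delta, output, ()                  # initial state: the empty queue

def run(n, word):
    delta, output, q = make_fifo(n)
    for letter in word:
        q = delta(q, letter)
    return output(q)

print(run(3, [('Put', 'a'), ('Put', 'b'), 'Pop']))  # b
print(run(3, ['Pop']))                              # ⊥ (Pop on empty queue)
print(run(1, [('Put', 'a'), ('Put', 'b')]))         # ⊥ (Put on full queue)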
Definition 30. Let Σ, O be Pm-sets. A separated language is an equivariant map of the form Σ(∗) → O. A separated automaton 𝒜 = (Q, δ, o, q0) consists of Q, o and q0 defined as in a nominal automaton, and an equivariant transition function δ : Q ∗ Σ → Q. The separated language semantics of such an automaton is given by the map s : Q ∗ Σ(∗) → O, defined by

s(x, ε) = o(x),      s(x, aw) = s(δ(x, a), w)
[Diagram omitted: four states ε, a, ab, abc with Put/Pop transitions and a ⊥ sink.]
Figure 7.1: The FIFO automaton from Example 29 with n = 3. The right-most state consists of five orbits, as we can take a, b, c distinct, all the same, or two of them equal in three different ways. Consequently, the complete state space has ten orbits. The output of each state is denoted in the lower part.
for all x ∈ Q, a ∈ Σ and w ∈ Σ(∗) such that x # aw and a # w.
Let s♭ : Q → (Σ(∗) −∗ O) be the transpose of s. Then s♭(q0) : Σ(∗) → O corresponds to a separated language; this is called the separated language accepted by 𝒜.
By definition of the separated product, the transition function is only defined on a
state x and letter a ∈ Σ if x # a. In Example 36 below, we describe the bounded FIFO
as a separated automaton, and describe its accepted language.
First, we show how the language semantics of separated nominal automata extends
to a language over all words, provided that both the input alphabet Σ and the output
alphabet O are Sb-sets.
Definition 31. Let Σ and O be nominal Sb-sets. An Sb-equivariant function L : Σ∗ →
O is called an Sb-language.
Notice the difference between an Sb-language L : Σ∗ → O and a Pm-language L : (UΣ)∗ → U(O). They are both functions from Σ∗ to O, but the latter is only Pm-equivariant, while the former satisfies the stronger property of Sb-equivariance. Languages over separated words and Sb-languages are connected as follows.
Theorem 32. Suppose Σ, O are both nominal Sb-sets, and suppose dim(Σ) ≤ 1. There is a one-to-one correspondence

S : (UΣ)(∗) → UO    Pm-equivariant
─────────────────────────────────
S̄ : Σ∗ → O          Sb-equivariant

between separated languages and Sb-nominal languages. From S̄ to S, this is given by application of the forgetful functor and restricting to the subset of separated words. For the converse direction, given w = a1…an ∈ Σ∗, let b1, …, bn ∈ Σ be such that w # bi for all i, and bi # bj for all i, j with i ≠ j. Define m ∈ Sb by

m(a) = ai   if a = bi for some i
       a    otherwise

Then S̄(a1a2a3⋯an) = m ⋅ S(b1b2b3⋯bn).
Proof. There is the following chain of one-to-one correspondences, from the results of the previous section:

(UΣ)(∗) → UO
F((UΣ)(∗)) → O    by Theorem 16
(FUΣ)∗ → O        by Corollary 18
Σ∗ → O            by Lemma 19

Thus, every separated automaton over U(Σ), U(O) gives rise to an Sb-language S̄, corresponding to the separated language S accepted by the automaton.
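The converse direction of Theorem 32 is effectively an algorithm: rename the letters of a word to fresh, pairwise-distinct atoms, query the separated language, and push the result back through the substitution m. Here is a sketch of ours for the FIFO instance, assuming the run function from the sketch after Example 29 is in scope:

def sb_semantics(sep_lang, word):
    # Evaluate the Sb-extension of a separated language on an arbitrary word.
    # Data values occur only in Put letters; Pop carries no atom.
    fresh = (f'x{i}' for i in range(len(word)))
    m = {}                                  # the substitution m of Theorem 32
    renamed = []
    for letter in word:
        if letter == 'Pop':
            renamed.append('Pop')
        else:
            _, a = letter
            b = next(fresh)                 # b_i: fresh and pairwise distinct
            m[b] = a
            renamed.append(('Put', b))
    out = sep_lang(renamed)                 # query S on a separated word
    return m.get(out, out)                  # apply m to the output atom (or ⊥)

fifo3 = lambda w: run(3, w)
# Put(a) Put(a) Pop has repeated data, yet is handled via the separated word
# Put(x0) Put(x1) Pop and the substitution m = [x0 -> a, x1 -> a]:
print(sb_semantics(fifo3, [('Put', 'a'), ('Put', 'a'), 'Pop']))  # a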
Any nominal automaton 𝒜 restricts to a separated automaton, formally described
in Definition 33. It turns out that if the (Pm)-language accepted by 𝒜 is actually an
Sb-language, then the restricted automaton already represents this language, as the extension S̄ of the associated separated language S (Theorem 34). Hence, in such a
case, the restricted separated automaton suffices to describe the language of 𝒜.
Definition 33. Let i : Q ∗ U(Σ) → Q × U(Σ) be the natural inclusion map. A nominal automaton 𝒜 = (Q, δ, o, q0) induces a separated automaton 𝒜′ by setting 𝒜′ = (Q, δ ∘ i, o, q0).
Theorem 34. Suppose Σ, O are both Sb-sets, and suppose dim(Σ) ≤ 1. Let L : (UΣ)∗ → UO be the Pm-nominal language accepted by a nominal automaton 𝒜, and suppose L is Sb-equivariant. Let S be the separated language accepted by 𝒜′. Then L = U(S̄).
Proof. It follows from the one-to-one correspondence in Theorem 32: at the bottom there are two languages (L and U(S̄)), while there is only the restriction of L at the top. We conclude that L = U(S̄).
As we will see in Example 36, separated automata allow us to represent Sb-languages
in a much smaller way than nominal automata. Given a nominal automaton 𝒜, a
smaller separated automaton can be obtained by computing the reachable part of the
restriction 𝒜′. The reachable part is defined similarly (but only where δ is defined) and is denoted by R(𝒜′) as well.
Proposition 35. For any nominal automaton 𝒜, we have R(𝒜′) ⊆ R(𝒜).
The converse inclusion of the above proposition certainly does not hold, as shown by
the following example.
Example 36. Let 𝒜 be the automaton modelling a bounded FIFO queue (for some n), from Example 29. The Pm-nominal language L accepted by 𝒜 is Sb-equivariant: it is closed under application of arbitrary substitutions.
The separated automaton 𝒜′ is given simply by restricting the transition function to Q ∗ Σ, i.e., a Put(a)-transition from a state w ∈ Q exists only if a does not occur in w. The separated language S accepted by this new automaton is the restriction of the nominal language of 𝒜 to separated words. By Theorem 34, we have L = U(S̄). Hence, the separated automaton 𝒜′ represents L, essentially by closing the associated separated language S under all substitutions.
The reachable part of 𝒜′ is given by

R(𝒜′) = 𝔸(≤n) ∪ {⊥}.

Clearly, restricting 𝒜′ to the reachable part does not affect the accepted language. However, while the original state space Q has exponentially many orbits in n, R(𝒜′) has only n + 1 orbits! Thus, taking the reachable part of 𝒜′ yields a separated automaton which represents the FIFO language L in a much smaller way than the original automaton.
3.1 Separated automata: coalgebraic perspective
Nominal automata and separated automata can be presented as coalgebras on the
category of Pm-nominal sets. In this section we revisit the above results from this perspective, and generalise from (equivariant) languages to finitely supported languages.
In particular, we retrieve the extension from separated languages to Sb-languages, by
establishing Sb-languages as a final separated automaton. The latter result follows by
instantiating a well-known technique for lifting adjunctions to categories of coalgebras,
using the results of Section 2. In the remainder of this section we assume familiarity
with the theory of coalgebras, see, e.g., Jacobs (2016) and Rutten (2000).
Definition 37. Let M be a submonoid of Sb, and let Σ, O be nominal M-sets, referred
to as the input and output alphabet respectively. We define the functor BM : M-Nom → M-Nom by BM(X) = O × (Σ →M_fs X). An (M-)nominal (Moore) automaton is a BM-coalgebra.
A BM -coalgebra can be presented as a nominal set Q together with the pairing
⟨o, δ♭⟩ : Q → O × (Σ →M_fs Q)
of an equivariant output function o : Q → O, and (the transpose of) an equivariant
transition function δ : Q × Σ → Q. In case M = Pm, this coincides with the automata
of Definition 28, omitting initial states. The language semantics is generalised accordingly, as follows. Given such a BM -coalgebra (Q, ⟨o, δ♭ ⟩), the language semantics
l : Q × Σ∗ → O is given by
l(x, ε) = o(x) ,
l(x, aw) = l(δ(x, a), w)
for all x ∈ Q, a ∈ Σ and w ∈ Σ∗.
Theorem 38. Let M be a submonoid of Sb, and let Σ, O be nominal M-sets. The nominal M-set Σ∗ →M_fs O extends to a final BM-coalgebra (Σ∗ →M_fs O, ζ), such that the unique homomorphism from a given BM-coalgebra is the transpose l♭ of the language semantics.
A separated automaton (Definition 30, without initial states) corresponds to a coalgebra for the functor B∗ : Pm-Nom → Pm-Nom given by B∗(X) = O × (Σ −∗ X). The separated language semantics arises by finality.
Theorem 39. The set Σ(∗) −∗ O is the carrier of a final B∗-coalgebra, such that the unique coalgebra homomorphism from a given B∗-coalgebra (Q, ⟨o, δ⟩) is the transpose s♭ of the separated language semantics s : Q ∗ Σ(∗) → O (Definition 30).
Next, we provide an alternative final B∗-coalgebra which assigns Sb-nominal languages to states of separated nominal automata. The essence is to obtain a final B∗-coalgebra from the final BSb-coalgebra. In order to prove this, we use a technique to lift adjunctions to categories of coalgebras. This technique occurs regularly in the coalgebraic study of automata (Jacobs, et al., 2015; Kerstan, et al., 2014; Klin & Rot, 2016).
Theorem 40. Let Σ be a Pm-set, and O an Sb-set. Define B∗ and BSb accordingly, as B∗(X) = UO × (Σ −∗ X) and BSb(X) = O × (FΣ →Sb_fs X). There is an adjunction F̄ ⊣ Ū in

F̄ : CoAlg(B∗) ⇄ CoAlg(BSb) : Ū,

where F̄ and Ū coincide with F and U respectively on carriers.
Proof. There is a natural isomorphism λ : B∗U → UBSb given by

λ : UO × (Σ −∗ UX) --id×ϕ--> UO × U(FΣ →Sb_fs X) → U(O × (FΣ →Sb_fs X)),

where ϕ is the isomorphism from Theorem 24 and the isomorphism on the right comes from U being a right adjoint. The result now follows from Theorem 2.14 of Hermida and Jacobs (1998). In particular, Ū(X, γ) = (UX, λ⁻¹ ∘ U(γ)).
Since right adjoints preserve limits, and final objects in particular, we obtain the following. This gives an Sb-semantics of separated automata through finality.
Corollary 41. Let ((FΣ)∗ →Sb_fs O, ζ) be the final BSb-coalgebra (Theorem 38). The B∗-coalgebra Ū((FΣ)∗ →Sb_fs O, ζ) is final and carried by the set (FΣ)∗ →Sb_fs O of Sb-nominal languages.
4 Related and future work
Fiore and Turi (2001) described a similar adjunction between certain presheaf categories. However, Staton (2007) argues in his thesis that the use of presheaves allows for many degenerate models, and that one should instead look at sheaves. The category of sheaves is equivalent to the category of nominal sets, and Staton transfers the adjunction of Fiore and Turi to the sheaf categories. We conjecture that the adjunction presented in this paper is equivalent to theirs, but defined by more elementary means. The monoidal property of F, which is crucial for our application to automata, has not been discussed before.
An interesting line of research is the generalisation to other symmetries by Bojańczyk, et al. (2014). In particular, the total order symmetry is relevant, since it allows one to compare elements with respect to their order, as is often done in data words. In this case the symmetries are given by the group of all monotone bijections. Many results on nominal sets generalise to this symmetry. For monotone substitutions, however, the situation seems more subtle. For example, we note that a substitution which maps two values to the same value actually maps all the values in between to that value as well. Whether the adjunction from Theorem 16 generalises to other symmetries is left as future work.
This research was motivated by the problem of learning nominal automata. If we know that a nominal automaton recognises an Sb-language, then we are better off learning a separated automaton directly. From the Sb-semantics of separated automata it follows that we have a Myhill-Nerode theorem, which means that learning is feasible. We expect this to be useful, since we can achieve an exponential reduction in size this way.
Bojańczyk, et al. (2014) prove that nominal automata are equivalent to register automata in terms of expressiveness. However, when translating from register automata with n states to nominal automata, we may get exponentially many orbits. This happens, for instance, in the FIFO automaton (Example 29). We have shown that the exponential blow-up is avoidable by using separated automata, for this example and, in general, for Sb-equivariant languages. An open problem is whether the latter requirement can be relaxed by adding separated transitions only locally in a nominal automaton.
A possible step in this direction is to consider the monad T = UF on Pm-Nom and to incorporate it in the automaton model. We believe that this is the hypothesised “substitution monad” from Chapter 5. The monad is monoidal (sending separated products to Cartesian products), and if X is an orbit-finite nominal set, then so is T(X).
This means that we can consider nominal T-automata, which we can perhaps determinise using coalgebraic methods (Silva, et al., 2013).
Acknowledgements
We would like to thank Gerco van Heerdt for his useful comments.
Curriculum Vitae
Joshua Moerman was born in 1991 in Utrecht, the Netherlands. After graduating from the gymnasium at the Christiaan Huygens College in Eindhoven in 2009, he followed a double bachelor's programme in mathematics and computer science at the Radboud University in Nijmegen. In 2013, he obtained both bachelor's degrees summa cum laude and continued with a master's in mathematics. He obtained the degree of Master of Science in Mathematics summa cum laude in 2015, with a specialisation in algebra and topology.
In February 2015, he started his Ph.D. research under the supervision of Frits Vaandrager, Sebastiaan Terwijn, and Alexandra Silva. This was a joint project between the computer science institute (iCIS) and the mathematics department (part of IMAPP) of the Radboud University. During the four years of his Ph.D. research, he spent a total of six months at University College London, UK.
As of April 2019, Joshua works as a postdoctoral researcher in the group of Joost-Pieter Katoen at RWTH Aachen University, Germany.