A (co)algebraic theory of succinct automata✩

Gerco van Heerdt^a, Joshua Moerman^{a,b}, Matteo Sammartino^a, Alexandra Silva^a

  ^a University College London
  ^b Radboud University

arXiv:1905.05519v1 [cs.FL] 14 May 2019
Abstract

The classical subset construction for non-deterministic automata can be generalized to other side-effects captured by a monad. The key insight is that both the state space of the determinized automaton and its semantics—languages over an alphabet—have a common algebraic structure: they are Eilenberg-Moore algebras for the powerset monad. In this paper we study the reverse question to determinization. We will present a construction to associate succinct automata to languages based on different algebraic structures. For instance, for classical regular languages the construction will transform a deterministic automaton into a non-deterministic one, where the states represent the join-irreducibles of the language accepted by a (potentially) larger deterministic automaton. Other examples will yield alternating automata, automata with symmetries, CABA-structured automata, and weighted automata.
1. Introduction

Non-deterministic automata are often used to provide compact representations of regular languages. Take, for instance, the language

    L = {w ∈ {a, b}* | |w| > 2 and the 3rd symbol from the right is an a}.

There is a simple non-deterministic automaton accepting it (below, top automaton) and it is not very difficult to see that the smallest deterministic automaton (below, bottom automaton) will have 8 states.
[Figure: the non-deterministic automaton, with states s1, s2, s3, s4; s1 loops on a, b and moves to s2 on a, after which the automaton advances on a, b through s3 to the accepting state s4.]
✩ This work was partially supported by ERC starting grant ProFoundNet (679127) and a Leverhulme Prize (PLP-2016-129).

Preprint submitted to Elsevier, May 15, 2019
[Figure: the smallest deterministic automaton, with 8 states labelled 1, 12, 13, 14, 123, 124, 134, 1234 and transitions on a and b between them.]
The labels we chose for the states of the deterministic automaton are not coincidental—they represent the subsets of states of the non-deterministic automaton that would be obtained when constructing a deterministic one using the classical subset construction.
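The subset construction producing these labels can be run directly; the following Python sketch (function and variable names are ours, and the NFA transitions are as we read them off the figure above) determinizes the four-state non-deterministic automaton and recovers the eight reachable subsets:

```python
from itertools import chain

def determinize(alphabet, delta, init, final):
    """Classical subset construction; delta maps (state, symbol) to a set of states."""
    start = frozenset([init])
    seen, todo, dfa = {start}, [start], {}
    while todo:
        S = todo.pop()
        for a in alphabet:
            # the successors of a subset: the union of the successors of its members
            T = frozenset(chain.from_iterable(delta.get((q, a), ()) for q in S))
            dfa[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    accepting = {S for S in seen if S & final}
    return seen, dfa, start, accepting

# the four-state NFA: s1 loops on a, b and guesses the third-to-last a
delta = {("s1", "a"): {"s1", "s2"}, ("s1", "b"): {"s1"},
         ("s2", "a"): {"s3"}, ("s2", "b"): {"s3"},
         ("s3", "a"): {"s4"}, ("s3", "b"): {"s4"}}
states, dfa, start, acc = determinize("ab", delta, "s1", {"s4"})
assert len(states) == 8  # the 8 subsets labelling the deterministic automaton
```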
The question we want to study in this paper has as its starting point precisely the observation that non-deterministic automata provide compact representations of languages and hence are more amenable to use in algorithms and promote scalability. In fact, the origin of our study goes back to our own work on automata learning [15], where we encountered large nominal automata that, in order for the algorithm to work for more realistic examples, had to be represented non-deterministically. In other recent work [7, 3], different forms of non-determinism are used to learn compact representations of regular languages. This left us wondering whether other side-effects could be used to overcome scalability issues.
Moggi [16] introduced the idea that monads could be used as a general abstraction for side-effects. A monad is a triple (T, η, µ) in which T is an endofunctor over a category whose objects can be thought of as capturing pure computations. The monad is equipped with a unit η : X → T X, a natural transformation that enables embedding any pure computation into an effectful one, and a multiplication µ : T T X → T X that allows flattening nested effectful computations. Examples of monads capturing side-effects include powerset (non-determinism) and distributions (randomness).
Monads have been used extensively in programming language semantics (see e.g. [22] and references therein). More recently, they were used in categorical studies of automata theory [6]. One example of a construction in which they play a key role is a generalization of the classical subset construction to a class of automata [21, 20], which we will describe next.

The classical subset construction, connecting non-deterministic and deterministic automata, can be described concisely by the following diagram.
             {−}                    l
        X ---------> P(X) ---------------------> 2^{A*}
         \             |                            |
        δ \            | δ♯                         | ⟨ε?, ∂⟩
           v           v                            v
        2 × P(X)^A ---------------------------> 2 × (2^{A*})^A
                          id × l^A
We omit initial states and represent a non-deterministic automaton as a pair (X, δ), where X is the state space and δ : X → 2 × P(X)^A is the transition function, which has in the first component the (non-)final state classifier. The language semantics of a non-deterministic automaton (X, δ) is obtained by first constructing a deterministic automaton (P(X), δ♯), which has a larger state space consisting of subsets of the original state space, and then computing the accepted language of the determinized automaton. The language map l associating the accepted language to a state is a universal map: for every deterministic automaton (Q, Q → 2 × Q^A) the map l is the unique map into the automaton of languages

    (2^{A*}, 2^{A*} --⟨ε?, ∂⟩--> 2 × (2^{A*})^A).
The universal property of the automaton of languages inspired the development of a categorical generalization of automata theory, including of the subset construction, which we detail below. In particular, we can consider general automata as pairs (X, X --t--> F X) where the transition dynamics t is parametric in a functor F. Such pairs are usually called coalgebras for the functor F [18]. For a wide class of functors F, the category of coalgebras has a final object (Ω, ω), the so-called final coalgebra, which plays a role analogous to that of languages.
The classical subset construction was generalized in previous work [21] by replacing deterministic automata with coalgebras for a functor F and the powerset monad with a suitable monad T. As above, it can be summarized in a diagram:

             η                 l
        X ---------> T X ---------------> Ω
         \             |                  |
        δ \            | δ♯               | ω
           v           v                  v
        F T X ----------------------> F Ω
                       F l
The monad T will be the structure we will explore to enable succinct representations. The crucial ingredient in generalizing the subset construction was the observation that the target of the transition dynamics—2 × P(−)^A—and the set of languages—2^{A*}—both have a complete join-semilattice structure. This enables one to define the determinized automaton as a unique lattice extension of the non-deterministic one, and, moreover, the language map l preserves the semantics: l({s1, s2}) = l({s1}) ∪ l({s2}).
This latter somewhat trivial observation was also exploited in the work of Bonchi and Pous [8] in defining an efficient algorithm for language equivalence of NFAs by using coinduction-up-to. Join-semilattices are precisely the Eilenberg-Moore algebras of the powerset monad, and one can show that if a functor has a final coalgebra in Set, this can be lifted to the category of Eilenberg-Moore algebras of a monad T (T-algebras). This makes it possible to construct the more general diagram above, where the coalgebra structure is generalized using a functor F and a monad T. The only assumptions for the existence of T-algebra maps δ♯ and l are the existence of a final coalgebra for F in Set and that F T X can be given a T-algebra structure.
In this paper we ask the reverse question—given a deterministic automaton, if we assume the state space has a join-semilattice structure, can we build a corresponding succinct non-deterministic one? More generally, given an F-coalgebra in the category of T-algebras, can we build a succinct F T-coalgebra in the base category that represents the same behavior?
We will provide an abstract framework to understand this construction, based on previous work by Arbib and Manes [4]. Our abstract framework relies on an alternative, more modern presentation of some of their results. Due to our focus on set-based structures, we will conduct our investigation within the category Set, which enables us to provide effective procedures. This does mean that not all of the results due to Arbib and Manes will be given in their original generality. We present a comprehensive set of examples that will illustrate the versatility of the framework. We also discuss more algorithmic aspects that are essential if the present framework is to be used as an optimization, for instance as part of a learning algorithm.
After recalling basic facts about monads and structured automata in Section 2, the rest of this paper is organized as follows:

• In Section 3 we introduce a general notion of generators for a T-algebra, and we show that automata whose state space forms a T-algebra—which we call T-automata—admit an equivalent T-succinct automaton, defined over generators. We also characterize minimal generators and give a condition under which they are globally minimal in size.

• In Section 4 we give an effective procedure to find a minimal set of generators for a T-algebra, and we present an algorithm that uses that procedure to compute the T-succinct version of a given T-automaton. The algorithm works by first minimising the T-automaton: the explicit algebraic structure allows states that correspond to algebraic combinations of other states to be detected, and then discarded when generators are computed.

• In Section 5 we show how the algorithm of Section 4 can be applied to “plain” finite automata—without any algebraic structure—in order to derive an equivalent T-succinct automaton. We conclude with a result about the compression power of our construction: it produces an automaton that is at least as small as the minimal version of the original automaton.

• Finally, in Section 6 we give several examples, and in Section 7 we discuss related and future work.
2. Preliminaries

Side-effects and different notions of non-determinism can be conveniently captured as a monad T on a category C. A monad T = (T, µ, η) is a triple consisting of an endofunctor T on C and two natural transformations: a unit η : Id ⇒ T and a multiplication µ : T² ⇒ T. They satisfy the following laws:

    µ ◦ ηT = id = µ ◦ T η        µ ◦ µT = µ ◦ T µ.

An example is the triple (P, {−}, ⋃), where P denotes the powerset functor on Set that assigns to each set the set of all its subsets, {−} is the function that returns a singleton set, and ⋃ is just union of sets.
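As a quick sanity check of these laws for the powerset monad, the following Python sketch (names are ours, not from the paper) encodes the unit, multiplication, and functor action on finite sets and verifies the unit and associativity laws on sample inputs:

```python
def unit(x):
    """η: embed a pure value as a singleton set."""
    return frozenset([x])

def mult(sets):
    """µ: flatten a set of sets by taking the union."""
    return frozenset().union(*sets)

def fmap(f, s):
    """Functor action of P on a function f."""
    return frozenset(f(v) for v in s)

S = frozenset({1, 2, 3})
# unit laws: µ ◦ ηT = id = µ ◦ Tη
assert mult(unit(S)) == S and mult(fmap(unit, S)) == S
# associativity: µ ◦ µT = µ ◦ Tµ, checked on a doubly nested set
SS = frozenset({frozenset({frozenset({1}), frozenset({2})}),
                frozenset({frozenset({3})})})
assert mult(mult(SS)) == mult(fmap(mult, SS)) == frozenset({1, 2, 3})
```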
Given a monad T, the category C^T of Eilenberg-Moore algebras over T, or simply T-algebras, has as objects pairs (X, h) consisting of an object X, called the carrier, and a morphism h : T X → X such that h ◦ µX = h ◦ T h and h ◦ ηX = idX. A T-homomorphism between two T-algebras (X, h) and (Y, k) is a morphism f : X → Y such that f ◦ h = k ◦ T f.

We will often refer to a T-algebra (X, h) as X if h is understood or if its specific definition is irrelevant. Given an object X, (T X, µX) is a T-algebra called the free T-algebra on X. Given an object U and a T-algebra (V, v), there is a bijective correspondence between T-algebra homomorphisms T U → V and morphisms U → V: for a T-algebra homomorphism f : T U → V, define f† = f ◦ η : U → V; for a morphism g : U → V, define g♯ = v ◦ T g : T U → V. Then g♯ is a T-algebra homomorphism called the free T-extension of g, and we have

    f†♯ = f        g♯† = g.        (1)

Furthermore, for all objects S and morphisms h : S → U,

    g♯ ◦ T h = (g ◦ h)♯.        (2)
Example 2.1. For the monad P the associated Eilenberg-Moore category is the category of (complete) join-semilattices. Given a set X, the free P-algebra on X is the join-semilattice (PX, ⋃) of subsets of X with the union operation as join.
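For T = P, the free extension g♯ of a map g takes unions pointwise. A small Python sketch of the correspondence (−)† / (−)♯ and the identities (1), using an arbitrary map g into a powerset algebra (the names are ours):

```python
def g_sharp(g):
    """Free P-extension g♯ = v ◦ T g, where the algebra v on P(Y) is union."""
    return lambda S: frozenset().union(*(g(u) for u in S))

def dagger(f):
    """f† = f ◦ η: restrict a map out of P(U) along the unit."""
    return lambda u: f(frozenset([u]))

g = lambda u: frozenset(range(u))  # an arbitrary map from numbers into P(N)
f = g_sharp(g)
# the identities (1), checked pointwise on samples: (g♯)† = g and (f†)♯ = f
assert dagger(f)(3) == g(3)
assert g_sharp(dagger(f))(frozenset({2, 4})) == f(frozenset({2, 4}))
```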
Although some results are completely abstract, the central definition of minimal generators in Section 3 is specific to monads T on the category Set. Therefore we restrict ourselves to this setting. More precisely, we consider automata over a finite alphabet A with outputs in a set O. In order to define automata in Set^T as (pointed) coalgebras for the functor O × (−)^A, we need to lift this functor from Set to Set^T. Such a lifting corresponds to a distributive law of T over O × (−)^A [see e.g., 13]. A distributive law of the monad T over a functor F : Set → Set is a natural transformation ρ : T F ⇒ F T satisfying ρ ◦ ηF = F η and F µ ◦ ρT ◦ T ρ = ρ ◦ µF. In most examples we will define a T-algebra structure β : T O → O on O, which is well known to induce a distributive law ρ : T (O × (−)^A) ⇒ O × T (−)^A given by

    ρX = T (O × X^A) --⟨T π1, T π2⟩--> T O × T (X^A) --β × ρ′X--> O × T (X)^A        (3)

for any set X, where ρ′X(U)(a) = T (λf : A → X. f (a))(U). In general, we assume an arbitrary distributive law ρ : T (O × (−)^A) ⇒ O × T (−)^A, which gives us the following notion of automaton.
Definition 2.2 (T-automaton). A T-automaton is a triple (X, i : 1 → X, δ : X → O × X^A), where X is an object of Set^T denoting the state space of the automaton, i is a function designating the initial state, and δ is a T-algebra map assigning an output and transitions to each state.

Notice that the initial state map i : 1 → X in the above definition is not required to be a T-algebra map. However, it corresponds to the T-algebra map i♯ : T 1 → X. Thus, a T-automaton is an automaton in Set^T.

The functor F(X) = O × X^A has a final coalgebra in Set^T [12] that can be used to define the language accepted by a T-automaton.
Definition 2.3 (Language accepted). Given a T-automaton (X, i : 1 → X, δ : X → O × X^A), the language accepted by X is l ◦ i : 1 → O^{A*}, where l is the final coalgebra map. In the diagram below, ω is the final coalgebra.

             i                l
        1 -------> X ----------------> O^{A*}
                   |                      |
                 δ |                      | ω
                   v                      v
              O × X^A -------------> O × (O^{A*})^A
                         id × l^A

    ω(ϕ) = (ϕ(ε), λa w. ϕ(aw))
    l(x)(ε) = π1(δ(x))
    l(x)(aw) = l(π2(δ(x))(a))(w)

We use ε to denote the empty word.
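The two defining equations of l give a direct recursive acceptance procedure. A minimal Python sketch (for the simplest case, a plain deterministic automaton with boolean outputs O; all names are ours):

```python
def language(delta, x, word):
    """Recursive language semantics:
    l(x)(ε) = π1(δ(x)) and l(x)(aw) = l(π2(δ(x))(a))(w).
    delta maps a state to a pair (output, {symbol: next state})."""
    out, step = delta(x)
    if not word:          # the empty word ε: return the output component
        return out
    return language(delta, step[word[0]], word[1:])

# a two-state automaton over {a, b} accepting exactly the words ending in a
delta = {
    "q0": (False, {"a": "q1", "b": "q0"}),
    "q1": (True,  {"a": "q1", "b": "q0"}),
}.__getitem__

assert language(delta, "q0", "ba") is True
assert language(delta, "q0", "ab") is False
```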
If the monad T is finitary, then the category Set^T is locally finitely presentable, and hence it admits (strong epi, mono)-factorizations [2]. As in [4], we use these factorizations to quotient the state space of an automaton under language equivalence. The transition structure, γ, is obtained by diagonalization via the factorization system. Diagrammatically:

             i            e                m
        1 -------> X ---------> M -----------------> O^{A*}
                   |            |                       |
                 δ |          γ |                       | ω            (4)
                   v            v                       v
              O × X^A ---> O × M^A ------------> O × (O^{A*})^A
                   id × e^A          id × m^A

Here the epi e and mono m are obtained by factorizing the final coalgebra map l : X → O^{A*}, and j = e ◦ i. We call the quotient automaton (M, j, γ) the observable quotient of (X, i, δ).
3. T-succinct automata

Given a T-automaton X = (X, i, δ), our aim is to obtain an equivalent automaton in Set with transition function Y → O × T(Y)^A, where Y is smaller than X.¹ The key idea is to find generators for X. Our definition of generators is equivalent to the definition of a scoop due to Arbib and Manes [4, Section 7, Definition 8].

Definition 3.1 (Generators for an algebra). We say that a set G is a set of generators for a T-algebra X whenever there exists a function g : G → X such that g♯ : T G → X is a split epi in Set.

The intuition of requiring a split epi is that every element of X can now be decomposed into a “combination” (defined by T) of elements of G. We show two simple results on generators, which will allow us to find initial sets of generators for a given T-algebra.
Lemma 3.2. The carrier of any T-algebra X is a set of generators for it.

Proof. Let χ : T X → X be the T-algebra structure on X. Then idX satisfies id♯X = χ, and χ is a split epi because it is required to satisfy χ ◦ ηX = idX.

Lemma 3.3. Any set X is a set of generators for the free T-algebra T X.

Proof. Follows directly from the fact that ηX : X → T X satisfies η♯X = idT X.
Once we have a set of generators G for X, we can define an equivalent free representation of X, that is, an automaton whose state space is freely generated from G.

¹ Here, we are abusing notation and using O and A for both the objects in Set^T and in the base category Set. In particular, we use T C to also denote the free T-algebra over C.
Proposition 3.4 (Free representation of an automaton [4, Section 7, Proposition 9]). The free algebra T G forms the state space of an automaton equivalent to X.

Proof. Let g : G → X witness G being a set of generators for X and let s : X → T G be a right inverse of g♯. Recall that X = (X, i, δ) and define

    j = 1 --i--> X --s--> T G
    γ = G --g--> X --δ--> O × X^A --id × s^A--> O × (T G)^A.

Then (T G, j, γ♯) is an automaton. We will show that g♯ : T G → X is an automaton homomorphism. We have g♯ ◦ j = g♯ ◦ s ◦ i = i, and, writing F for the functor O × (−)^A and χ for the T-algebra structure on X, the following diagram commutes:

[Commutative diagram: the top row T G --T g--> T X --T δ--> T F X --T F s--> T F T G --ρ--> F T²G --F µ--> F T G composes to γ♯, and the diagram relates it to δ ◦ g♯ : T G → F X through the squares marked 1, 2, 3 involving χ, ρ, F T g♯, F χ, and F g♯.]

Here square 1 commutes because δ is a T-algebra homomorphism, square 2 commutes by naturality of the distributive law ρ, and square 3 commutes because g♯ is a T-algebra homomorphism. The triangle on the left unfolds the definition of g♯, and the remaining triangle commutes by s being right inverse to g♯. Note that the composition in the top row of the diagram is γ♯. We conclude that g♯ is an automaton homomorphism, which using the finality in Definition 2.3 implies that (T G, j, γ♯) accepts the same language as X.
The state space T G of this free representation can be extremely large. Fortunately, the fact that T G is a free algebra allows for a much more succinct version of this automaton.

Definition 3.5 (T-succinct automaton). Given an automaton of the form (T X, i, δ), where T X is the free T-algebra on X, the corresponding T-succinct automaton is the triple (X, i, δ ◦ η). The language accepted by the T-succinct automaton is the language l ◦ i accepted by (T X, i, δ):

                       η                 l
        1 --i--> X ---------> T X ----------------> O^{A*}
                  \             |                      |
             δ◦η   \            | δ                    | ω
                    v           v                      v
                  O × (T X)^A --------------> O × (O^{A*})^A
                                  id × l^A
The goal of our construction is to build a T-succinct automaton from a set of generators that is minimal in a way that we will define now. In what follows we use the following piece of notation: if U and V are sets such that U ⊆ V, then we write ι^U_V for the inclusion map U → V.
Definition 3.6 (Minimal generators). Given a T-algebra X and a set of generators G for X witnessed by g : G → X, we say that r ∈ G is redundant if there exists a U ∈ T(G \ {r}) satisfying (g ◦ ι^{G\{r}}_G)♯(U) = g(r); all other elements are said to be isolated [4]². We call G a minimal set of generators for X if G contains no redundant elements.

A minimal set of generators is not necessarily minimal in size. However, under certain conditions this is the case. The following result was mentioned but not proved by Arbib and Manes [4], who showed that its conditions are satisfied for any finitely generated P-algebra. We note that these conditions do not apply (in general) to any of the further examples in Section 6.

Proposition 3.7. If a T-algebra X is generated by the isolated elements I of the set of generators X (Lemma 3.2) with their inclusion map ι^I_X, and I is finite, then there is no set of generators for X smaller than I, and every minimal set of generators for X has the same size as I.
Proof. Let G --g--> X be a set of generators for X, and assume towards a contradiction that G is smaller than I. Then there must be an i ∈ I such that there is no v ∈ G satisfying g(v) = i. Let g′ : G → X \ {i} be pointwise equal to g. Because g♯ is a split epi and thus surjective, there is a U ∈ T G such that g♯(U) = i. Note that by (2),

    g♯ = (ι^{X\{i}}_X ◦ g′)♯ = T G --T(g′)--> T(X \ {i}) --(ι^{X\{i}}_X)♯--> X.

Then (ι^{X\{i}}_X)♯(T(g′)(U)) = i, contradicting the fact that i is isolated in the full set of generators X. Thus, G cannot be smaller than I. In fact, we see that for every i ∈ I there is a v ∈ G satisfying g(v) = i. This yields a function h : I → G such that g ◦ h = ι^I_X.

Suppose G is a minimal set of generators, and take any v ∈ G not in the image of h. We will show that v is redundant in G. Since I constitutes a set of generators for X, there exists a U ∈ T I such that (ι^I_X)♯(U) = g(v). Then

    g♯(T(h)(U)) = (g ◦ h)♯(U) = (ι^I_X)♯(U) = g(v).

It follows that v is redundant in G, which contradicts G being minimal. Therefore, h is surjective and G has the same size as I.
4. T-minimization

In this section we describe a construction to compute a “minimal” succinct T-automaton equivalent to a given T-automaton. This crucially relies on a procedure that finds a minimal set of generators by removing redundant elements one by one. All that needs to be done for specific monads is determining whether an element is redundant.

² Arbib and Manes [4] define isolated elements only for the full set X rather than relative to a set of generators for X. Our refinement plays an important role in finding a minimal set of generators.
Proposition 4.1 (Generator reduction). Given a T-algebra X and a set of generators G for X witnessed by g : G → X, if r ∈ G is redundant, then G \ {r} is a set of generators for X.

Proof. Let G′ = G \ {r} and let g′ : G′ → X be the restriction of g : G → X to G′. Since r is redundant, there is a U ∈ T(G′) such that g′♯(U) = g(r). Define e : G → T(G′) by

    e(x) = U       if x = r
    e(x) = η(x)    if x ≠ r.

We will show that g′♯ ◦ e = g. Consider any x ∈ G. If x = r, then

    g′♯(e(x)) = g′♯(e(r)) = g′♯(U) = g(r) = g(x).

If x ≠ r, then, using (1),

    g′♯(e(x)) = g′♯(η(x)) = g′♯†(x) = g′(x) = g(x).

Let χ : T X → X be the algebra structure on X and take any right inverse s : X → T G of g♯. Then

    g′♯ ◦ e♯ ◦ s = g′♯ ◦ µ ◦ T e ◦ s       (definition of e♯)
                 = χ ◦ T(g′♯) ◦ T e ◦ s    (g′♯ is a T-algebra homomorphism)
                 = χ ◦ T(g′♯ ◦ e) ◦ s      (functoriality of T)
                 = χ ◦ T g ◦ s             (g′♯ ◦ e = g as shown above)
                 = g♯ ◦ s                  (definition of g♯)
                 = idX                     (s is right inverse to g♯).

We thus see that e♯ ◦ s is right inverse to g′♯, which means that G′ is a set of generators for X.
If we determine that an element is isolated, there is no need to check this again later when the set of generators has been reduced. This is thanks to the following result.

Proposition 4.2. If G --g--> X and G′ --g′--> X are sets of generators for a T-algebra X such that G′ ⊆ G and g′ is the restriction of g to the domain G′, then whenever an element r ∈ G′ is isolated in G, it is also isolated in G′.

Proof. We will show that redundant elements in G′ are also redundant in G. If r ∈ G′ is redundant in G′, then there exists U ∈ T(G′ \ {r}) such that (g′ ◦ ι^{G′\{r}}_{G′})♯(U) = g′(r). Note that g′ = g ◦ ι^{G′}_G. We have

    (g ◦ ι^{G\{r}}_G)♯(T(ι^{G′\{r}}_{G\{r}})(U)) = (g ◦ ι^{G\{r}}_G ◦ ι^{G′\{r}}_{G\{r}})♯(U)    (by (2))
                                                 = (g ◦ ι^{G′}_G ◦ ι^{G′\{r}}_{G′})♯(U)
                                                 = (g′ ◦ ι^{G′\{r}}_{G′})♯(U)
                                                 = g′(r)
                                                 = g(r),

so r is redundant in G.
Finally, taking the observable quotient M of a T-automaton Q preserves generators, considering that the T-automaton homomorphism m : Q → M is a split epi in Set under the axiom of choice.

Proposition 4.3. If Q and M are T-algebras, m : Q → M is a T-algebra homomorphism that is a split epi in Set, and G --g--> Q is a set of generators for Q, then G --g--> Q --m--> M is a set of generators for M.

Proof. Let a : T Q → Q be the T-algebra structure on Q and b : T M → M the one on M. We have

    (m ◦ g)♯ = b ◦ T(m ◦ g) = b ◦ T(m) ◦ T(g) = m ◦ a ◦ T g = m ◦ g♯

using that m is a T-algebra homomorphism. It is well known that compositions of split epis are split epis themselves, so G is a set of generators for M.
Now we are ready to define the construction that builds a T-succinct automaton accepting the same language as a T-automaton.

Construction 4.4 (T-minimization). Starting from a T-automaton (X, i, δ), where X has a finite set of generators, we execute the following steps.

1. Take the observable quotient (M, i0, δ0) of (X, i, δ).
2. Compute a minimal set of generators G of M by starting from the full set M and applying Proposition 4.1.
3. Compute and return the corresponding T-succinct automaton as defined in Definition 3.5 via Proposition 3.4.

Generic minimization algorithms have been proposed in the literature. For example, Adámek et al. give a general procedure to compute the observable quotient [1], and König and Küpper provide a generic partition refinement algorithm for coalgebras, with a focus on instantiations to weighted automata [14]. None of these works provides any complexity analysis. Recently, Dorsch et al. [11] have presented a coalgebraic Paige–Tarjan algorithm and provided a complexity analysis for a class of functors in categories with image-factorization. These restrictions match well with the ones we make, and therefore their algorithm could be applied in our first step. Given a finite set of generators G, the loop in the second step involves considering each element of G and checking whether it is redundant. If so, we remove the element from G and continue the loop. The redundancy check is the only part for which computability needs to be determined in each specific setting.
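The second step is a simple reduction loop in which only the redundancy check depends on the monad. A Python sketch (all names are ours), instantiated for the powerset monad on a join-semilattice of finite sets, where an element is redundant exactly when it is the union of the generators below it:

```python
def minimize_generators(G, is_redundant):
    """Step 2 of Construction 4.4: drop redundant generators one by one.
    By Propositions 4.1 and 4.2 a single pass in any fixed order suffices."""
    G = set(G)
    for r in sorted(G, key=repr):
        rest = G - {r}
        if is_redundant(r, rest):
            G = rest  # Proposition 4.1: the remaining elements still generate
    return G

def jsl_redundant(r, rest):
    """Powerset-monad instance: r is redundant iff it equals the union (join)
    of the remaining generators below it (the bottom is the empty union)."""
    below = [x for x in rest if x <= r]
    return frozenset().union(*below) == r

# Example: the join-semilattice {⊥, x, y, z} with z = x ∨ y, encoded as sets
bot, x, y, z = frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})
assert minimize_generators({bot, x, y, z}, jsl_redundant) == {x, y}
```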
Example 4.5 (Join-semilattices). We give an example of the construction in the category JSL of complete join-semilattices. We start from a minimal P-automaton (in JSL) that has 4 states and is depicted below on the left. The dashed blue lines indicate the JSL structure.

[Figure: on the left, the minimal P-automaton with states ⊥, x, y, z and transitions on a and b, with dashed blue lines indicating the JSL structure; on the right, the corresponding P-succinct non-deterministic automaton with states x and y.]
Since the automaton is minimal, it is isomorphic to its observable quotient. We start from the full set of generators {⊥, x, y, z}. Note that z is the union of x and y, so we can eliminate it. Additionally, ⊥ is the empty union and can be removed as well. Both x and y are isolated elements and form the unique minimal set of generators G = {x, y} (see the remark above Proposition 3.7). These are exactly the join-irreducibles of M. They induce by Proposition 3.4 an automaton (T G, j, γ), where γ is the same transition structure as the above automaton, but with {x, y} substituted for z; the initial state is the singleton set {x}. The P-succinct automaton corresponding to this minimal set of generators (Definition 3.5) is the non-deterministic automaton shown on the right.

Note that the automaton defined in Proposition 3.4 depends on the right inverse chosen for the extension of the generator map. When the original JSL automaton is reachable (every state is reached by some set of words, where a set of words reaches the join of the states reached by the words it contains), this right inverse may be chosen in such a way as to recover the canonical residual finite state automaton (RFSA), as well as the simplified canonical RFSA, both due to Denis et al. [10]. Details are given in [23]. See [17] for conditions under which the canonical RFSA, referred to as the jiromaton, is a state-minimal NFA.
5. Main construction

In this section we present the main construction of the paper. Given a finite automaton (X, i, δ) in Set, i.e., an automaton where X is finite, this construction builds an equivalent T-succinct automaton.

The first step is taking the reachable part R of X and converting this automaton into a T-automaton recognising the same language.
Proposition 5.1. Let (T R, î, δ̂) be the T-automaton defined as follows, with î = ηR ◦ i and δ̂ = ((id × η^A_R) ◦ δ)♯:

             i              ηR
        1 -------> R ------------> T R
                   |                 |
                 δ |                 | δ̂
                   v                 v
              O × R^A --------> O × T(R)^A
                     id × η^A_R

Then (R, i, δ) and (T R, î, δ̂) accept the same language.

Proof. The diagram above means that ηR is a coalgebra homomorphism, and as such it preserves language. Explicitly: x ∈ R accepts the same language as ηR(x), which in particular holds for i(⋆) and î(⋆).

Now we can T-minimize (T R, î, δ̂) (Construction 4.4), which yields an equivalent T-automaton. Notice that, R being finite, any quotient of T R has a finite set of generators. This is a consequence of R being a set of generators for T R (Lemma 3.3) and of generators being preserved by quotients (Proposition 4.3). It follows that every step of the T-minimization construction terminates.
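The reachable part R taken in the first step can be computed by a standard worklist traversal; a minimal Python sketch for a deterministic automaton in Set (names are ours):

```python
def reachable(init, alphabet, step):
    """Reachable part R of a deterministic automaton (X, i, δ) in Set;
    step(state, symbol) -> state is the transition component of δ."""
    R, todo = {init}, [init]
    while todo:
        q = todo.pop()
        for a in alphabet:
            nxt = step(q, a)
            if nxt not in R:
                R.add(nxt)
                todo.append(nxt)
    return R

# a counter mod 3 that only the symbol 'a' advances: all three states reachable
assert reachable(0, "ab", lambda q, a: (q + (a == "a")) % 3) == {0, 1, 2}
```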
Proposition 5.2. The T-succinct automaton defined above is at least as small as the minimal deterministic automaton equivalent to X.
Proof. The situation is summed up in the following commutative diagram:

        G ↪ R --η--> T R --e--> M --m--> O^{A*}

where the composite e ◦ η ◦ ι^G_R is g : G → M. Here G is the final minimal set of generators for M resulting from the construction. Commutativity follows from G being a subset of the set of generators R.

The minimal deterministic automaton equivalent to X is obtained from R by merging language-equivalent states. Recalling (4) and the proof of Proposition 5.1, we see that e ◦ ηR is a coalgebra homomorphism. Together with commutativity of the above diagram, this means that the language accepted by r ∈ G (seen as a state of R) is given by (m ◦ g)(r). Since G is a subset of R, to show that G is at least as small as the minimal deterministic automaton, we only have to show that different states in G accept different languages. That is, we will show that m ◦ g is injective. We know that m is injective by definition; to see that g is injective, consider r1, r2 ∈ G such that g(r1) = g(r2). Then g(r1) = g(r2) = g♯(η(r2)). Assuming r1 ≠ r2 leads to the contradiction that G is not a minimal set of generators, because in this case η(r2) ∈ T(G \ {r1}).
Computing the determinization T R is an expensive operation that only terminates if T preserves finite sets. One could devise an optimized version of
|
||
Construction 4.4 in which the determinization is not computed completely in
|
||
order to minimize it. Instead, we could choose to work with data structures as
|
||
Böllig et al. [7] did for non-deterministic automata, and which we generalized
|
||
in recent work [23]. In these papers, partial representations of the determinized
|
||
automaton are used in an iterative process to compute the generators of the
|
||
state space of the minimal one.
|
||
6. Examples

6.1. Monads preserving finite sets

If T preserves finite sets, then there is a naive method to find a redundant element: assuming a finite set of generators G for a T-algebra X, the set T(G \ {r}) is also finite for any r ∈ G. Thus, we can loop over all U ∈ T(G \ {r}) and check if the generator map g : G → X satisfies g♯(U) = g(r).
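The naive search can be sketched directly for T the finite powerset monad (the helper names and the join-semilattice instance below are ours, for illustration only):

```python
from itertools import chain, combinations

def powerset(s):
    """T(G) for the finite powerset monad: all subsets of s."""
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, n) for n in range(len(s) + 1))]

def find_redundant(G, g, extend):
    """Return some r in G with g#(U) = g(r) for a U in T(G \\ {r}),
    or None if no generator is redundant; extend(g, U) computes g#(U)."""
    for r in G:
        for U in powerset(G - {r}):
            if extend(g, U) == g(r):
                return r
    return None

# Join-semilattice instance: generators denote languages, g# takes unions.
g = {0: frozenset({"a"}), 1: frozenset({"b"}), 2: frozenset({"a", "b"})}.__getitem__
union = lambda g, U: frozenset().union(*[g(x) for x in U])
print(find_redundant({0, 1, 2}, g, union))
```

Here generator 2 is redundant because its language is the union of the other two.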
6.1.1. Alternating automata.
We now use our construction to get small alternating finite automata (AFAs) over a finite alphabet A. AFAs generalize both non-deterministic and universal automata, where the latter are the dual of non-deterministic automata: a word is accepted when all paths reading it are accepting. In an AFA, reading a symbol leads to a DNF formula (without negation) of next states.

We use the characterization of alternating automata due to Bertrand [5]. Given a partially ordered set (P, ≤), an upset is a subset U of P such that whenever x ∈ U and x ≤ y, then y ∈ U. Given Q ⊆ P, we write ↑Q for the upward closure of Q, that is the smallest upset of P containing Q. We consider
[Figure 1: Automata for the language {a, ba, bb, baa}. (a) Deterministic automaton; (b) Small corresponding AFA.]
the monad TAlt that maps a set X to the set of all upsets of P(X). Its unit is given by ηX(x) = ↑{{x}} and its multiplication by

    µX(U) = {V ⊆ X | ∃W ∈ U. ∀Y ∈ W. ∃Z ∈ Y. Z ⊆ V }.

The sets of sets in TAlt(X) can be seen as DNF formulae over elements of X: the outer powerset is interpreted disjunctively and the inner one conjunctively. Accordingly, we define an algebra structure β : TAlt(2) → 2 on the output set 2 by letting β(U) = 1 if {1} ∈ U and β(U) = 0 otherwise. Recall from (3) in Section 2 that such an algebra structure induces a distributive law.
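On a finite set, the upsets of P(X), the unit, and the algebra structure β can be enumerated directly. A small sketch (function names are ours):

```python
from itertools import chain, combinations

def subsets(X):
    X = list(X)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(X, n) for n in range(len(X) + 1))]

def up_closure(X, family):
    """↑family inside (P(X), ⊆): all subsets of X above some member."""
    return frozenset(V for V in subsets(X) if any(Z <= V for Z in family))

def unit(X, x):
    """η_X(x) = ↑{{x}}"""
    return up_closure(X, [frozenset({x})])

def beta(U):
    """The algebra structure β : TAlt(2) → 2, with 2 = {0, 1}."""
    return 1 if frozenset({1}) in U else 0
```

Read disjunctively/conjunctively, unit(X, x) is the DNF formula consisting of the single literal x (up to upward closure).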
We now explicitly spell out the T-minimization algorithm that turns a DFA (X, i, δ) into a TAlt-succinct AFA.

1. Compute the reachable states R of (X, i, δ) via a standard visit of its graph.
2. Compute the corresponding freely-generated TAlt-automaton (TAlt R, î, δ̂), by generating all DNF formulae TAlt R on R.
3. Compute the observable quotient (M, i0, δ0) of (TAlt R, î, δ̂) via a standard minimization algorithm, such as the coalgebraic Paige–Tarjan algorithm [11].
4. Compute a minimal set of generators for M as follows. Consider the generator map idM : M → M, for which we have that id♯ is the algebra map of M. Pick r ∈ M, and iterate over all DNF formulae ϕ over M \ {r}; if there is a ϕ which is mapped to r by the algebra map of M (i.e., id♯), r is redundant and can be removed from M. Repeat until no more elements are removed from M, which yields a minimal set of generators G.
5. Return the TAlt-succinct automaton (G, i0, i0 ◦ η).

Note that every step of this algorithm terminates, as X is finite and the size of TAlt R is 2^(2^|R|).
Example 6.1. Consider the regular language over A = {a, b} given by the finite set {a, ba, bb, baa}. The minimal DFA accepting this language is given in Figure 1a.

According to our construction, we first construct a TAlt-automaton with state space freely generated from this automaton (which is already reachable). Then we TAlt-minimize it in order to obtain a small AFA. In this case, there is a unique minimal subset of 3 generators: G = {q0, q1, q2}. To see this, consider the languages JqK accepted by states q of the deterministic automaton:

    Jq0K = {a, ba, bb, baa}    Jq1K = {a, b, aa}    Jq2K = {ε}    Jq3K = {ε, a}    Jq4K = ∅.

These languages generate the states of the minimal TAlt-automaton by interpreting joins as unions and meets as intersections. We note that Jq4K is just an empty join and Jq3K = (Jq0K ∩ Jq1K) ∪ Jq2K.3 These are the only redundant generators. Removing them leads to the AFA in Figure 1b. Here the black square represents a conjunction of next states.
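Because the language is finite, the dependencies claimed above can be verified directly on the residual languages:

```python
# Residual languages of the DFA in Figure 1a (ε denoted by "").
q0 = {"a", "ba", "bb", "baa"}
q1 = {"a", "b", "aa"}
q2 = {""}
q3 = {"", "a"}
q4 = set()

assert (q0 & q1) | q2 == q3   # q3 is a DNF combination of q0, q1, q2
assert q4 == set()            # q4 is the empty join
```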
6.1.2. Complete Atomic Boolean Algebras
We now consider the monad C given by the double contravariant powerset functor, namely CX = 2^(2^X). Here the outer powerset is treated disjunctively as in the case of TAlt, and the sets provided by the inner powerset are interpreted as valuations. Thus, elements of C(X) can be seen as full DNF formulae over X: every conjunctive clause contains for each x ∈ X either x or its negation ¬x. The unit assigns to an element x the disjunction of all full conjunctions containing x, and the multiplication turns formulae of formulae into full DNF formulae in the usual way. Algebras for this monad are known as complete atomic boolean algebras (CABAs).

Using the fact that 2 is a free CABA (2 ≅ C(∅)), we obtain the following semantics for C-succinct automata: a set of sets of states is accepting if and only if it contains the exact set F of accepting states. This is different from alternating automata, where a subset of F is sufficient. Reading a symbol in a C-succinct automaton works as follows. Suppose we are in a set of sets of states S ∈ C(Q), where we read a symbol a. The resulting set of sets contains U ⊆ Q if and only if there is a set V ∈ S such that every state in V transitions into a set of sets containing U, and every state not in V does not transition into any set of sets containing U.

Note that every DNF formula can be converted to a full DNF formula. This implies that C-succinct automata can always be as small as the smallest AFAs for a given language. With the following example we show that they can actually be strictly smaller. The T-minimization algorithm for AFAs we have given in the previous section applies to this setting as well (including negation in DNF formulae).
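For a finite state set, the unit of C and the acceptance condition can be enumerated directly (a sketch; function names are ours):

```python
from itertools import chain, combinations

def subsets(X):
    X = list(X)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(X, n) for n in range(len(X) + 1))]

def unit(X, x):
    """η(x): the disjunction of all full conjunctions containing x,
    i.e. all valuations V ⊆ X with x ∈ V."""
    return frozenset(V for V in subsets(X) if x in V)

def accepts(S, F):
    """A configuration S ∈ C(Q) is accepting iff the exact set F of
    accepting states occurs in S (unlike AFAs, where a subset suffices)."""
    return frozenset(F) in S
```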
Example 6.2. Consider the regular language of words over the singleton alphabet A = {a} whose length is non-zero and even. The minimal DFA accepting this language is shown in Figure 2a. We start the algorithm with the C-automaton with state space freely generated from this DFA and merge the language-equivalent states. Initially, the set of generators is the set of states of the original DFA. By noting that the language accepted by q2 is the negation of the one accepted by q1, in full DNF form Jq2K = (Jq0K ∩ ¬Jq1K) ∪ (¬Jq0K ∩ ¬Jq1K) (where for any language U its complement ¬U is defined as A∗ \ U), we see that q2 is

3 Strictly speaking, we should take the upwards-closure of this disjunction (adding any possible set of elements to each conjunction as an additional clause). We choose to use the equivalent succinct formula both here and in the subsequent AFA construction to aid readability.
[Figure 2: Automata for the language of non-zero even words over {a}. (a) Deterministic automaton; (b) C-succinct automaton.]
redundant. The set of generators {q0, q1} is minimal and corresponds to the C-succinct automaton in Figure 2b. We depict C-succinct automata in the same manner as AFAs, but note that their interpretation is different. Here the transition into the black square represents the transition into the conjunction of the negations of q0 and q1.

We now show that there is no AFA with two states accepting the same language. Suppose such an AFA exists, and let the state space be X = {x0, x1}. Since a and aaa are not in the language but aa is, one of these states must be accepting and the other must be rejecting.4 Without loss of generality we assume that x0 is rejecting and x1 is accepting. The empty word is not in the language, so our initial configuration has to be ↑{{x0}}. Since a is also not in the language, x0 will have to transition to ↑{{x0}} as well. However, this implies that aa is not accepted by the AFA, which contradicts the assumption that it accepts the right language.

Unfortunately, the fact that the transition behavior of a set of states depends on states not in that set generally makes it difficult to work with C-succinct automata by hand.
6.1.3. Symmetry
We now consider succinct automata that exploit symmetry present in their accepted language. Given a finite group G, consider the monad G × (−), where the unit pairs any element with the unit of G and the multiplication applies the multiplication of G. The algebras for G × (−) are precisely left group actions. We assume an action on the alphabet A; if no such action is relevant, one may consider the trivial action π2 : G × A → A. We also assume an action on the output set O. Group actions will be denoted by a centered dot. We consider the distributive law ρ : G × (O × (−)^A) ⇒ O × (G × (−))^A given by

    ρX(g, o, f) = (g · o, λa.(g, f(g⁻¹ · a))).

We explain the resulting semantics of (G × (−))-succinct automata in an example.
4 If there were no rejecting states, the only way to reject a word is by ending up in the empty set of sets of states. However, this means that extensions of that word are rejected as well. Similarly, if there are no accepting states one can only accept by ending up in ↑{∅}, which accepts everything.
[Figure 3: Automata outputting the first symbol to appear twice in a row. (a) Deterministic automaton; (b) Corresponding (G × (−))-succinct automaton.]
Example 6.3. Consider the group Perm({a, b}) = {e, (ab)} of permutations over elements a and b. Here e is the identity and (ab) swaps a and b. We consider the alphabet A = {a, b} with an action Perm(A) × A → A given by applying the permutation to the element of A, and the output set O = A ∪ {⊥} with an action given by

    (ab) · a = b        (ab) · b = a        (ab) · ⊥ = ⊥.

Figure 3a shows a deterministic automaton over the alphabet A with outputs in O. States are labeled by pairs (q, o), where q is a state label and o the output of the state. The recognized language is the one assigning to a word over A the first input symbol appearing twice in a row, or ⊥ if no such symbol exists. This deterministic automaton is in fact the minimal (Perm(A) × (−))-automaton. The action on its state space is defined by

    (ab) · q0 = q0      (ab) · q1 = q2      (ab) · q2 = q1      (ab) · q3 = q4      (ab) · q4 = q3.

We note that in the set of generators given by the full state space, q1, q2, q3, and q4 are redundant. After removing q2, only q3 and q4 are redundant. Subsequently removing q4 leaves no redundant elements.

The final (G × (−))-succinct automaton is shown in Figure 3b. Its actual configurations are pairs of a group element and a state. Transition labels are of the form x/g, where x ∈ A and g ∈ Perm(A). If we are in a configuration (g, q) and state q has an associated output o ∈ O, the actual output is g · o. On reading a symbol x ∈ A, we find the outgoing transition of which the label starts with the symbol g⁻¹ · x. Supposing this label contains a group element g′ and leads to a state q′, the resulting configuration is (gg′, q′). For example, consider reading the word bb. We start in the configuration (e, q0). Reading b here simply takes the transition corresponding to b, which brings us to ((ab), q1). Now reading the second b, we actually read (ab)⁻¹ · b = (ab) · b = a. This brings us to ((ab), q3). The output is then given by (ab) · a = b.
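The run on bb described above can be replayed mechanically; the transition table below is our reading of Figure 3b and covers only the transitions these traces use:

```python
SWAP = {"a": "b", "b": "a", "⊥": "⊥"}     # the permutation (ab)
IDENT = {"a": "a", "b": "b", "⊥": "⊥"}    # the identity e

def act(g, x):        # group action on input symbols and outputs
    return g[x]

def inv(g):           # e and (ab) are both self-inverse
    return g

def compose(g, h):    # group multiplication: (g·h)(x) = g(h(x))
    return {x: g[h[x]] for x in h}

# delta[state][symbol] = (group element on the label, next state);
# q3 outputs a, the other listed states output ⊥.
delta = {
    "q0": {"a": (IDENT, "q1"), "b": (SWAP, "q1")},
    "q1": {"a": (IDENT, "q3")},
}
output = {"q3": "a"}

def run(word):
    g, q = IDENT, "q0"                 # initial configuration (e, q0)
    for x in word:
        x_eff = act(inv(g), x)         # actually read g⁻¹ · x
        g2, q = delta[q][x_eff]
        g = compose(g, g2)             # new configuration (g·g′, q′)
    return act(g, output.get(q, "⊥"))  # output g · o
```

Both bb and aa take the same two underlying transitions; the stored group element is what distinguishes the outputs.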
In general, sets of generators in this setting correspond to subsets in which all orbits are represented. The orbits of a set X with a left group action are the equivalence classes of the relation that identifies elements x, y ∈ X whenever there exists g ∈ G such that g · x = y. Minimal sets of generators contain a single representative for each orbit. The algorithm given for AFAs in Section 6.1.1 can be applied to this setting as well: step 4 will remove elements until only orbit representatives are left.
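Computing orbits is a one-pass sweep; for the action of Example 6.3 it recovers the three generators of the succinct automaton:

```python
def orbits(X, G, act):
    """Partition X into orbits of the action act : G × X → X."""
    seen, parts = set(), []
    for x in X:
        if x in seen:
            continue
        orbit = {act(g, x) for g in G}
        seen |= orbit
        parts.append(frozenset(orbit))
    return parts

# The action of (ab) on the state space from Example 6.3.
swap = {"q0": "q0", "q1": "q2", "q2": "q1", "q3": "q4", "q4": "q3"}
act = lambda g, x: x if g == "e" else swap[x]
parts = orbits(["q0", "q1", "q2", "q3", "q4"], ["e", "ab"], act)
```

A minimal set of generators picks one element from each part of `parts`.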
6.2. Vector Spaces
We now exploit vector space structures. Given a field F, consider the free vector space monad V. It maps each set X to the set of functions X → F with finite support (finitely many elements of X are mapped to a non-zero value). A function f : X → Y is mapped to the function V(f) : V(X) → V(Y) given by

    V(f)(g)(y) = Σ_{x ∈ X, f(x) = y} g(x).

The unit η : X → V(X) and multiplication µ : V V(X) → V(X) of the monad are given by

    η(x)(x′) = 1 if x = x′ and 0 if x ≠ x′,        µ(f)(x) = Σ_{g ∈ V(X)} f(g) · g(x) ∈ F.

Here 0 and 1, as well as addition and multiplication, are those of the field F. Elements of V(X) can alternatively be written as formal sums v1·x1 + · · · + vn·xn with vi ∈ F and xi ∈ X for all i. We will use this notation in the example below.

Algebras for the free vector space monad are precisely vector spaces. We use the output set O = F, and the alphabet can be any finite set A. Instantiating (3), this leads to a pointwise distributive law ρ : V(O × (−)^A) ⇒ O × V(−)^A given at a set X by

    ρ(f) = ( Σ_{(o,g) ∈ O×X^A} f(o, g) · o,  λa.λx. Σ_{(o,g) ∈ O×X^A, g(a)=x} f(o, g) ).

With these definitions, the V-succinct automata are weighted automata. We note that if F is infinite, any non-trivial V-automaton will also be infinite. However, we can still start from a given weighted automaton and apply a slight modification of Construction 4.4: minimize from the succinct representation, use the states of the succinct representation as initial set of generators, and finally find a minimal set of generators. Moreover, we may add a reachability analysis, which in this case cannot lead to a larger automaton. Thus, the resulting algorithm essentially comes down to the standard minimization algorithm for weighted automata [19], where the process of removing redundant generators is integrated into the minimization. If F is finite and we do want to start from a deterministic automaton, we can consider this automaton as a weighted one by assigning each transition a weight of 1.
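Formal sums with finite support are just finite dictionaries; the unit and the linear extension g♯ of a map g : X → V(Y) can be sketched as follows (names are ours):

```python
from collections import defaultdict

def unit(x):
    """η(x): the formal sum 1·x."""
    return {x: 1.0}

def extend(g, v):
    """Linear extension g# : V(X) → V(Y) of g : X → V(Y):
    apply g to each basis element and sum with the given coefficients."""
    out = defaultdict(float)
    for x, coeff in v.items():
        for y, c in g(x).items():
            out[y] += coeff * c
    return {y: c for y, c in out.items() if c != 0.0}
```

The monad multiplication µ is the special case of this extension applied to an identity map on formal sums.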
Example 6.4. Consider for F = R the deterministic automaton in Figure 4a. This is a minimal automaton in Set; the freely generated V-automaton is infinite, and so is its minimization. However, that minimization has the states of the automaton in Figure 4a as a set of generators. To gain insight into this minimization, we compute the languages accepted by those generators (apart
[Figure 4: Succinctness via a weighted automaton. (a) Deterministic automaton; (b) Succinct weighted automaton.]
from q0):

    q1: ε ↦ 1, a ↦ 1, b ↦ 1, c ↦ 1        q3: ε ↦ 3, a ↦ 1, b ↦ 1, c ↦ 1
    q2: ε ↦ 1, a ↦ 0, b ↦ 0, c ↦ 0        q4: ε ↦ 0, a ↦ 0, b ↦ 0, c ↦ 0

Words not displayed are mapped to 0 by any state. The language of q0 is the only one assigning non-zero values to certain words of length two, such as aa, and therefore q0 cannot be a redundant generator. The other generators are redundant: writing JqK for the language of a state q, Jq4K is just a zero-ary sum, and we have

    Jq1K = Jq3K − 2Jq2K        Jq2K = (1/2)Jq3K − (1/2)Jq1K        Jq3K = Jq1K + 2Jq2K.
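Restricted to the displayed words (all others map to 0 everywhere), these dependencies are elementary linear algebra and easy to check pointwise:

```python
# Weight vectors of the generators on the displayed words.
q1 = {"ε": 1.0, "a": 1.0, "b": 1.0, "c": 1.0}
q2 = {"ε": 1.0, "a": 0.0, "b": 0.0, "c": 0.0}
q3 = {"ε": 3.0, "a": 1.0, "b": 1.0, "c": 1.0}

for w in q1:
    assert q1[w] == q3[w] - 2 * q2[w]
    assert q2[w] == 0.5 * q3[w] - 0.5 * q1[w]
    assert q3[w] == q1[w] + 2 * q2[w]
```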
Once q4 is removed, all other generators are still redundant. Further removing q3 leaves q1 and q2 no longer redundant. Therefore, V-minimization yields the weighted automaton shown in Figure 4b. Here a transition on an input x ∈ A with weight w ∈ F receives the label x/w, or just x if w = 1. Weights multiply along a path, and different possible paths add up to assign a value to a word. Reading c from q0, for example, we move to q1 + 2q2, which has an output of 1 + 2 · 1 = 3.

In general, the (sub)sets of generators of a vector space are its subsets that span the whole space, and such a set of generators is minimal precisely when it forms a basis. The weighted automaton resulting from our algorithm is the usual minimal weighted automaton for the language. Redundant elements can be found using standard techniques such as Gaussian elimination.
7. Conclusions

We have presented a construction to obtain succinct representations of deterministic finite automata as automata with side-effects. This construction is very general in that it is based on the abstract characterisation of side-effects as monads. Nonetheless, it can be easily implemented. An essential part of our construction is the computation of a minimal set of generators for an algebra. We have provided an algorithm for this that works for any suitable Set monad. We have applied the construction to several non-trivial examples: alternating automata, automata with symmetries, CABA-structured automata, and weighted automata.
Related work. This work revamps and extends results of Arbib and Manes [4], as discussed throughout the paper. We note that most of their results are formulated in a more general category, whereas here we work specifically in Set. The reason for this is that we focus on the procedure for finding minimal sets of generators by removing redundant elements, which are defined using set subtraction (Definition 3.6). This limitation is already present in the work of Arbib and Manes, who spend little time on the subject and only study the non-deterministic case in detail. Our main contribution, the general procedure for finding a minimal set of generators, is not present in their work. It generalizes several techniques to obtain compact automaton representations of languages, some of them presented in the context of learning algorithms [10, 7, 3]. Preliminary results on generalizing succinct automaton constructions within a learning algorithm can be found in [23].

In [17], Myers et al. present a coalgebraic construction of canonical non-deterministic automata. Their specific examples are the átomaton [9], obtained from the atoms of the boolean algebra generated by the residual languages (the languages accepted by the states of the minimal DFA); the canonical RFSA; the minimal xor automaton [24], actually a weighted automaton over the field with two elements rather than a non-deterministic one; and what they call the distromaton, obtained from the atoms of the distributive lattice generated by the residual languages. They further provide specific algorithms for obtaining some of their example succinct automata.

The underlying idea in the work of Myers et al. for finding succinct representations of algebras is similar to ours, and the deterministic structured automata they start from are equivalent: in their paper the deterministic automata live in a locally finite variety, which translates to the category of algebras for a monad that preserves finite sets (such as those in Section 6.1). They also define the succinct automaton using a minimal set of generators for the algebra, but instead of our algorithmic approach of getting to this set by removing redundant generators, they use a dual equivalence between finite algebras and a suitable modification of the category of sets and relations between them. This seems to restrict their work to non-deterministic automata, although there may be an easy generalization: the equivalence would be with a modification of a Kleisli category. A major difference with our work is that they have no general algorithm to construct the succinct automata; as mentioned, specific ones are provided for their examples. In fact, they provide no guidelines on how to find a suitable equivalence for a given variety. On the other hand, their equivalences guarantee uniqueness up to isomorphism of the succinct automata, which is a desirable property for many applications.

The restriction in the work of Myers et al. to locally finite varieties means that our example of weighted automata over an infinite field (Section 6.2) cannot be captured in their work. Conversely, since both the átomaton and the distromaton are NFAs obtained from categories of algebras with more structure than JSLs, these examples are not covered by our work. Their other examples, however, the canonical RFSA and the minimal xor automaton, are obtained using instances of our method as well. The fact that the problem of finding in general a suitable equivalence is open means it is not trivial to determine whether our approach can be seen as a special case of a generalized version of theirs when we restrict to monads that preserve finite sets.
Future work. The main question that remains is under which conditions the notion of a minimal set of generators actually describes a size-minimal set of generators. Proposition 3.7 provides a partial answer to this question, but its conditions fail to apply to the majority of our examples, even though in some of these cases minimal does mean size-minimal. A related question is whether we can find heuristics to increase the state space of a T-automaton in such a way that the number of generators decreases. The reason the canonical RFSAs of Denis et al. [10] are not always state-minimal NFAs is that the states of these NFAs, seen as singletons in the determinized automaton, in general are not reachable. Hence, removing unreachable states from a T-automaton may increase the size of minimal sets of generators, which is why Construction 4.4 does not include a reachability analysis. Although finding state-minimal NFAs is PSPACE-complete, a moderate gain might still be possible.
arXiv:2007.06327v1 [cs.LO] 13 Jul 2020

Generating Functions for Probabilistic Programs⋆

Lutz Klinkenberg1[0000−0002−3812−0572], Kevin Batz1[0000−0001−8705−2564], Benjamin Lucien Kaminski1,2[0000−0001−5185−2324], Joost-Pieter Katoen1[0000−0002−6143−1926], Joshua Moerman1[0000−0001−9819−8374], and Tobias Winkler1[0000−0003−1084−6408]

1 RWTH Aachen University, 52062 Aachen, Germany
2 University College London, United Kingdom
{lutz.klinkenberg, kevin.batz, benjamin.kaminski, katoen, joshua, tobias.winkler}@cs.rwth-aachen.de
Abstract. This paper investigates the usage of generating functions (GFs) encoding measures over the program variables for reasoning about discrete probabilistic programs. To that end, we define a denotational GF-transformer semantics for probabilistic while-programs, and show that it instantiates Kozen’s seminal distribution transformer semantics. We then study the effective usage of GFs for program analysis. We show that finitely expressible GFs enable checking super-invariants by means of computer algebra tools, and that they can be used to determine termination probabilities. The paper concludes by characterizing a class of — possibly infinite-state — programs whose semantics is a rational GF encoding a discrete phase-type distribution.

Keywords: probabilistic programs · quantitative verification · semantics · formal power series.

1 Introduction
Probabilistic programs are sequential programs for which coin flipping is a first-class citizen. They are used e.g. to represent randomized algorithms, probabilistic graphical models such as Bayesian networks, cognitive models, or security protocols. Although probabilistic programs are typically rather small, their analysis is intricate. For instance, approximating expected values of program variables at program termination is as hard as the universal halting problem [15]. Determining higher moments such as variances is even harder. Deductive program verification techniques based on a quantitative version of weakest preconditions [18] enable reasoning about the outcomes of probabilistic programs, such as the probability that a program variable equals a certain value. Dedicated analysis techniques have been developed to e.g., determine tail bounds [6], decide almost-sure termination [19,8], or to compare programs [1].

⋆ This research was funded by the ERC AdG project FRAPPANT (787914) and the DFG RTG 2236 UnRAVeL.
This paper aims at exploiting the well-tried potential of probability generating functions (PGFs [14]) for the analysis of probabilistic programs. In our setting, PGFs are power series representations — generating functions — encoding discrete probability mass functions of joint distributions over program variables. PGF representations — in particular if finite — enable a simple extraction of important information from the encoded distributions, such as expected values, higher moments, termination probabilities, or stochastic independence of program variables.
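As a small numeric illustration of this extraction (ours, not from the paper): for the PGF G(x) = Σ_n (1/2)^(n+1) x^n of a geometric distribution, the total mass G(1) and the mean G′(1) are already recovered from a truncation of the series:

```python
# Coefficients of the PGF of a geometric distribution: P(N = n) = (1/2)^(n+1).
coeffs = [0.5 ** (n + 1) for n in range(200)]

mass = sum(coeffs)                               # G(1): termination probability
mean = sum(n * c for n, c in enumerate(coeffs))  # G'(1): expected value

assert abs(mass - 1.0) < 1e-9  # the program terminates almost surely
assert abs(mean - 1.0) < 1e-9  # expected value of N is 1
```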
To enable the usage of PGFs for program analysis, we define a denotational semantics of a simple probabilistic while-language akin to probabilistic GCL [18]. Our semantics is defined in a forward manner: given an input distribution over program variables as a PGF, it yields a PGF representing the resulting subdistribution. The “missing” probability mass represents the probability of non-termination. More accurately, our denotational semantics transforms formal power series (FPS). Those form a richer class than PGFs, which allows for overapproximations of probability distributions. While-loops are given semantics as least fixed points of FPS transformers. It is shown that our semantics is in fact an instantiation of Kozen’s seminal distribution-transformer semantics [16].

The semantics provides a sound basis for program analysis using PGFs. Using Park’s Lemma, we obtain a simple technique to prove whether a given FPS overapproximates a program’s semantics, i.e., whether an FPS is a so-called superinvariant. Such upper bounds can be quite useful: for almost-surely terminating programs, such bounds can provide exact program semantics, whereas, if the mass of an overapproximation is strictly less than one, the program is provably non-almost-surely terminating. This result is illustrated on a non-trivial random walk and on examples illustrating that checking whether an FPS is a superinvariant can be automated using computer algebra tools.
In addition, we characterize a class of — possibly infinite-state — programs whose PGF semantics is a rational function. These homogeneous bounded programs (HB programs) are characterized by loops in which each unbounded variable has no effect on the loop guard and is in each loop iteration incremented by a quantity independent of its own value. Operationally speaking, HB programs can be considered as finite-state Markov chains with rewards in which rewards can grow unboundedly large. It is shown that the rational PGF of any program that is equivalent to an almost-surely terminating HB program represents a multi-variate discrete phase-type distribution [22]. We illustrate this result by obtaining a closed-form characterization for the well-studied infinite-state dueling cowboys example [18].
Related work. Semantics of probabilistic programs is a well-studied topic. This includes the seminal works by Kozen [16] and McIver and Morgan [18]. Other related semantics of discrete probabilistic while-programs are e.g., given in several other articles like [18,24,10,23,4]. PGFs have received scant attention in the analysis of probabilistic programs. A notable exception is [5], in which generating functions of finite Markov chains are obtained by Padé approximation. Computer algebra systems have been used to transform probabilistic programs [7], and more recently in the automated generation of moment-based loop invariants [2].
Organization of this paper. After recapping FPSs and PGFs in Sections 2–3, we define our FPS transformer semantics in Section 4, discuss some elementary properties and show it instantiates Kozen’s distribution transformer semantics [16]. Section 5 presents our approach for verifying upper bounds to loop invariants and illustrates this by various non-trivial examples. In addition, it characterizes programs that are representable as finite-state Markov chains equipped with rewards and presents the relation to discrete phase-type distributions. Section 6 concludes the paper. All proofs can be found in the appendix.

2 Formal Power Series
Our goal is to make the potential of probability generating functions available to the formal verification of probabilistic programs. The programs we consider will, without loss of generality, operate on a fixed set of k program variables. The valuations of those variables range over N. A program state σ is hence a vector in N^k. We denote the state (0, . . . , 0) by 0.

A prerequisite for understanding probability generating functions are (multivariate) formal power series — a special way of representing a potentially infinite k-dimensional array. For k = 1, this amounts to representing a sequence.
Definition 1 (Formal Power Series). Let X = X1 , . . . , Xk be a fixed sequence of k distinct formal indeterminates. For a state σ = (σ1 , . . . , σk ) ∈ Nk ,
|
||
let Xσ abbreviate the formal multiplication X1σ1 · · · Xkσk . The latter object is
|
||
called a monomial and we denote the set of all monomials over X by Mon (X).
|
||
A (multivariate) formal power series (FPS) is a formal sum
|
||
X
|
||
[σ]F · Xσ ,
|
||
where
|
||
[ · ]F : N k → R ∞
|
||
F =
|
||
≥0 ,
|
||
σ∈Nk
|
||
|
||
R∞
|
||
≥0
|
||
|
||
where
|
||
denotes the extended positive real line. We denote the set of all FPSs
|
||
by FPS. Let F, G ∈ FPS. If [σ]F < ∞ for all σ ∈ Nk , we denote this fact
|
||
by F ≪ ∞. The addition F + G and scaling r · F by a scalar r ∈ R∞
|
||
≥0 is
|
||
defined coefficient-wise by
|
||
X
|
||
X
|
||
|
||
F +G =
|
||
[σ]F + [σ]G · Xσ
|
||
r · [σ]F · Xσ .
|
||
and
|
||
r·F =
|
||
σ∈Nk
|
||
|
||
σ∈Nk
|
||
|
||
For states σ = (σ1 , . . . , σk ) and τ = (τ1 , . . . , τk ), we define σ + τ = (σ1 +
|
||
τ1 , . . . , σk + τk ). The multiplication F · G is given as their Cauchy product (or
|
||
discrete convolution)
|
||
X
|
||
[σ]F · [τ ]G · Xσ+τ .
|
||
F ·G =
|
||
σ,τ ∈Nk
|
||
|
||
Drawing coefficients from the extended reals enables us to define a complete lattice on FPSs in Section 4. Our analyses in Section 5 will, however, only consider
|
||
FPSs with F ≪ ∞.
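The operations above can be tried out concretely on FPSs with finite support, represented as dictionaries from exponent tuples to coefficients (a minimal sketch; the representation and function names are ours, not the paper's):

```python
from collections import defaultdict
from fractions import Fraction

def fps_add(F, G):
    # coefficient-wise sum: [sigma](F + G) = [sigma]F + [sigma]G
    H = defaultdict(Fraction)
    for fps in (F, G):
        for sigma, c in fps.items():
            H[sigma] += c
    return dict(H)

def fps_scale(r, F):
    # coefficient-wise scaling: [sigma](r * F) = r * [sigma]F
    return {sigma: r * c for sigma, c in F.items()}

def fps_mul(F, G):
    # Cauchy product: the coefficient of X^(sigma+tau) accumulates [sigma]F * [tau]G
    H = defaultdict(Fraction)
    for sigma, cf in F.items():
        for tau, cg in G.items():
            H[tuple(s + t for s, t in zip(sigma, tau))] += cf * cg
    return dict(H)

# (1 + X) * (1 + X) = 1 + 2X + X^2, with 1-dimensional exponent tuples
F = {(0,): Fraction(1), (1,): Fraction(1)}
print(fps_mul(F, F))
```

Exact rational coefficients (via `Fraction`) avoid any floating-point drift in the convolution.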
L. Klinkenberg et al.

3  Generating Functions

    A generating function is a device somewhat similar to a bag. Instead of carrying many little objects detachedly, which could be embarrassing, we put them all in a bag, and then we have only one object to carry, the bag.
                                                        — George Pólya [25]
Formal power series pose merely a particular way of encoding an infinite k-dimensional array as yet another infinitary object, but we still carry all objects forming the array (the coefficients of the FPS) detachedly, and there seems to be no advantage in this particular encoding. It even seems more bulky. We will now, however, see that this bulky encoding can be turned into a one-object bag carrying all our objects: the generating function.

Definition 2 (Generating Functions). The generating function of a formal power series F = Σ_{σ∈N^k} [σ]_F · X^σ ∈ FPS with F ≪ ∞ is defined as the partial function

    f : [0, 1]^k ⇀ R_{≥0},    (x_1, ..., x_k) ↦ Σ_{σ=(σ_1,...,σ_k)∈N^k} [σ]_F · x_1^{σ_1} ··· x_k^{σ_k}.

In other words: in order to turn an FPS into its generating function, we merely treat every formal indeterminate X_i as an "actual" indeterminate x_i, and the formal multiplications and the formal sum also as "actual" ones. The generating function f of F is uniquely determined by F as we require all coefficients of F to be non-negative, and so the ordering of the summands is irrelevant: for a given point x ∈ [0, 1]^k, the sum defining f(x) either converges absolutely to some positive real or diverges absolutely to ∞. In the latter case, f is undefined at x and hence f may indeed be partial.

Since generating functions stem from formal power series, they are infinitely often differentiable at 0 = (0, ..., 0). Because of that, we can recover F from f as the (multivariate) Taylor expansion of f at 0.
Definition 3 (Multivariate Derivatives and Taylor Expansions). For σ = (σ_1, ..., σ_k) ∈ N^k, we write f^{(σ)} for the function f differentiated σ_1 times in x_1, σ_2 times in x_2, and so on. If f is infinitely often differentiable at 0, then the Taylor expansion of f at 0 is given by

    Σ_{σ∈N^k} ( f^{(σ)}(0) / (σ_1! ··· σ_k!) ) · x_1^{σ_1} ··· x_k^{σ_k}.

If we replace every indeterminate x_i by the formal indeterminate X_i in the Taylor expansion of the generating function f of F, then we obtain the formal power series F. It is in precisely that sense that f generates F.
Example 1 (Formal Power Series and Generating Functions). Consider the infinite (1-dimensional) sequence 1/2, 1/4, 1/8, 1/16, .... Its (univariate) FPS — the entity carrying all coefficients detachedly — is given as

    1/2 + 1/4 X + 1/8 X^2 + 1/16 X^3 + 1/32 X^4 + 1/64 X^5 + 1/128 X^6 + 1/256 X^7 + ....    (†)

On the other hand, its generating function — the bag — is given concisely by

    1/(2 − x).    (♭)

Figuratively speaking, (†) is itself the infinite sequence a_n := 1/2^n, whereas (♭) is a bag with the label "infinite sequence a_n := 1/2^n". The fact that (♭) generates (†) follows from the Taylor expansion of 1/(2 − x) at 0 being 1/2 + 1/4 x + 1/8 x^2 + ....    △
The potential of generating functions is that manipulations to the functions — i.e., to the concise representations — are in a one-to-one correspondence with the associated manipulations to FPSs [9]. For instance, if f(x) is the generating function of F encoding the sequence a_1, a_2, a_3, ..., then the function f(x) · x is the generating function of F · X, which encodes the sequence 0, a_1, a_2, a_3, .... As another example of the correspondence between operations on FPSs and generating functions, if f(x) and g(x) are the generating functions of F and G, respectively, then f(x) + g(x) is the generating function of F + G.

Example 2 (Manipulating Generating Functions). Revisiting Example 1, if we multiply 1/(2 − x) by x, we change the label on our bag from "infinite sequence a_n := 1/2^n" to "a 0 followed by an infinite sequence a_{n+1} := 1/2^n" and — just by changing the label — the bag will now contain what it says on its label. Indeed, the Taylor expansion of x/(2 − x) at 0 is 0 + 1/2 x + 1/4 x^2 + 1/8 x^3 + 1/16 x^4 + ..., encoding the sequence 0, 1/2, 1/4, 1/8, 1/16, ....    △

Due to the close correspondence of FPSs and generating functions [9], we use both concepts interchangeably, as is common in most mathematical literature. We mostly use FPSs for definitions and semantics, and generating functions in calculations and examples.
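The correspondence can be checked mechanically. Below we recover the Taylor coefficients of 1/(2 − x) from the recurrence obtained by multiplying out (2 − x) · f(x) = 1, namely 2·a_0 = 1 and 2·a_n = a_{n−1} (a small sketch in pure Python; the helper name is ours):

```python
from fractions import Fraction

def coeffs_inv_two_minus_x(n):
    # first n Taylor coefficients of 1/(2 - x) at 0:
    # (2 - x) * f(x) = 1 gives 2*a_0 = 1 and 2*a_k - a_{k-1} = 0
    a = [Fraction(1, 2)]
    while len(a) < n:
        a.append(a[-1] / 2)
    return a

a = coeffs_inv_two_minus_x(5)
print(a)  # 1/2, 1/4, 1/8, 1/16, 1/32, as in (†)

# multiplying the generating function by x shifts the encoded sequence
shifted = [Fraction(0)] + a
print(shifted[:3])  # the sequence now starts 0, 1/2, 1/4
```

Evaluating the truncated series at a sample point matches the closed form 1/(2 − x), as the bag metaphor promises.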
Probability Generating Functions. We now use formal power series to represent probability distributions.

Definition 4 (Probability Subdistribution). A probability subdistribution (or simply subdistribution) over N^k is a function

    μ : N^k → [0, 1],    such that    |μ| = Σ_{σ∈N^k} μ(σ) ≤ 1.

We call |μ| the mass of μ. We say that μ is a (full) distribution if |μ| = 1, and a proper subdistribution if |μ| < 1. The set of all subdistributions on N^k is denoted by D_≤(N^k) and the set of all full distributions by D(N^k).
We need subdistributions for capturing non-termination. The "missing" probability mass 1 − |μ| precisely models the probability of non-termination.

The generating function of a (sub-)distribution is called a probability generating function (PGF). Many properties of a distribution μ can be read off from its generating function G_μ in a simple way. We demonstrate how to extract a few common properties in the following example.

Example 3 (Geometric Distribution PGF). Recall Example 1. The presented formal power series encodes a geometric distribution μ_geo with parameter 1/2 of a single variable X. The fact that μ_geo is a full probability distribution, for instance, can easily be verified by computing G_geo(1) = 1/(2 − 1) = 1. The expected value of X is given by G′_geo(1) = 1/(2 − 1)^2 = 1.    △

Extracting Common Properties. Important information about probability distributions is, for instance, the first and higher moments. In general, the k-th factorial moment of variable X_i can be extracted from a PGF G by computing ∂^k G / ∂X_i^k (1, ..., 1).^3 This includes the mass |G| as the 0th moment. The marginal distribution of variable X_i can simply be extracted from G by G(1, ..., X_i, ..., 1). We also note that PGFs can treat stochastic independence. For instance, for a bivariate PGF H we can check for stochastic independence of the variables X and Y by checking whether H(X, Y) = H(X, 1) · H(1, Y).
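These extraction recipes are easy to try numerically. The sketch below reads the mass and the first factorial moment off the geometric PGF from Example 3 via a finite-difference derivative (the helpers are our own, not the paper's):

```python
def G_geo(x):
    # PGF of the geometric(1/2) distribution from Example 3
    return 1.0 / (2.0 - x)

def derivative_at(f, x, h=1e-6):
    # central finite difference; adequate for these smooth PGFs
    return (f(x + h) - f(x - h)) / (2.0 * h)

mass = G_geo(1.0)                  # 0th moment: total probability mass
mean = derivative_at(G_geo, 1.0)   # 1st factorial moment: expected value of X
print(mass, round(mean, 4))
```

Both come out as 1, matching the closed-form computations G_geo(1) = 1 and G′_geo(1) = 1 in the example.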
4  FPS Semantics for pGCL

In this section, we give denotational semantics to probabilistic programs in terms of FPS transformers and establish some elementary properties useful for program analysis. We begin by endowing FPSs and PGFs with an order structure:

Definition 5 (Order on FPS). For all F, G ∈ FPS, let

    F ⪯ G    iff    ∀ σ ∈ N^k : [σ]_F ≤ [σ]_G.

Lemma 1 (Completeness of ⪯ on FPS). (FPS, ⪯) is a complete lattice.
4.1  FPS Transformer Semantics

Recall that we assume programs to range over exactly k variables with valuations in N^k. Our program syntax is similar to Kozen [16] and McIver & Morgan [18].

Definition 6 (Syntax of pGCL [16,18]). A program P in probabilistic Guarded Command Language (pGCL) adheres to the grammar

    P ::= skip | x_i := E | P ; P | if (B) {P} else {P} | {P} [p] {P} | while (B) {P},

^3 In general, one must take the limit X_i → 1 from below.
where x_i ∈ {x_1, ..., x_k} is a program variable, E is an arithmetic expression over program variables, p ∈ [0, 1] is a probability, and B is a predicate (called guard) over program variables.

The FPS semantics of pGCL will be defined in a forward denotational style, where the program variables x_1, ..., x_k correspond to the formal indeterminates X_1, ..., X_k of FPSs.

For handling assignments, if-conditionals, and while-loops, we need some auxiliary functions on FPSs: for an arithmetic expression E over program variables, we denote by eval_σ(E) the evaluation of E in program state σ. For a predicate B ⊆ N^k and FPS F, we define the restriction of F to B by

    ⟨F⟩_B := Σ_{σ∈B} [σ]_F · X^σ,

i.e., ⟨F⟩_B is the FPS obtained from F by setting all coefficients [σ]_F with σ ∉ B to 0. Using these prerequisites, our FPS transformer semantics is given as follows:
Definition 7 (FPS Semantics of pGCL). The semantics ⟦P⟧ : FPS → FPS of a loop-free pGCL program P is given according to the upper part of Table 1. The unfolding operator Φ_{B,P} for the loop while (B) {P} is defined by

    Φ_{B,P} : (FPS → FPS) → (FPS → FPS),    ψ ↦ λF. ⟨F⟩_¬B + ψ(⟦P⟧(⟨F⟩_B)).

The partial order (FPS, ⪯) extends to a partial order (FPS → FPS, ⊑) on FPS transformers by a point-wise lifting of ⪯. The least element of this partial order is the transformer 0 = λF. 0 mapping any FPS F to the zero series. The semantics of while (B) {P} is then given by the least fixed point (with respect to ⊑) of its unfolding operator, i.e.,

    ⟦while (B) {P}⟧ = lfp Φ_{B,P}.
Table 1. FPS transformer semantics of pGCL programs.

    P                           ⟦P⟧(F)
    skip                        F
    x_i := E                    Σ_{σ∈N^k} [σ]_F · X_1^{σ_1} ··· X_i^{eval_σ(E)} ··· X_k^{σ_k}
    {P_1} [p] {P_2}             p · ⟦P_1⟧(F) + (1 − p) · ⟦P_2⟧(F)
    if (B) {P_1} else {P_2}     ⟦P_1⟧(⟨F⟩_B) + ⟦P_2⟧(⟨F⟩_¬B)
    P_1 ; P_2                   ⟦P_2⟧(⟦P_1⟧(F))
    while (B) {P}               (lfp Φ_{B,P})(F),  for  Φ_{B,P}(ψ) = λF. ⟨F⟩_¬B + ψ(⟦P⟧(⟨F⟩_B))
Example 4. Consider the program P = {x := 0} [1/2] {x := 1} ; c := c + 1 and the input PGF G = 1, which denotes a point mass on state σ = 0. Using the annotation style shown below, where an annotation (( G′ after a program fragment P′ denotes that ⟦P′⟧(G) = G′, we calculate ⟦P⟧(G) as follows:

    (( 1
    {x := 0} [1/2] {x := 1} ;
    (( 1/2 + X/2
    c := c + 1
    (( C/2 + CX/2

As for the semantics of c := c + 1, see Table 2.    △
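The calculation in Example 4 can be replayed on finite-support distributions directly, mirroring the rules of Table 1 on states (x, c) (a sketch; the interpreter helpers are ours):

```python
from fractions import Fraction

def prob_choice(p, f1, f2, dist):
    # {P1} [p] {P2}: run both branches and mix the results with weights p, 1-p
    out = {}
    for w, f in ((p, f1), (1 - p, f2)):
        for s, pr in f(dist).items():
            out[s] = out.get(s, Fraction(0)) + w * pr
    return out

def assign(update):
    # an assignment on a distribution: push each state through the update function
    def run(dist):
        out = {}
        for s, pr in dist.items():
            t = update(s)
            out[t] = out.get(t, Fraction(0)) + pr
        return out
    return run

G = {(0, 0): Fraction(1)}  # the point mass G = 1, i.e. x = c = 0
after_choice = prob_choice(Fraction(1, 2),
                           assign(lambda s: (0, s[1])),   # x := 0
                           assign(lambda s: (1, s[1])),   # x := 1
                           G)
result = assign(lambda s: (s[0], s[1] + 1))(after_choice)  # c := c + 1
print(result)  # mass 1/2 on (x, c) = (0, 1) and on (1, 1), i.e. C/2 + CX/2
```

The resulting distribution is exactly the PGF C/2 + CX/2 computed in the example.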
Before we study how our FPS transformers behave on PGFs in particular, we first argue that our FPS semantics is well-defined. While this is evident for loop-free programs, for loops we appeal to the Kleene Fixed Point Theorem [17], which requires ω-continuous functions.

Theorem 1 (ω-continuity of pGCL Semantics). The semantic functional ⟦ · ⟧ is ω-continuous, i.e., for all programs P ∈ pGCL and all increasing ω-chains F_1 ⪯ F_2 ⪯ ... in FPS,

    ⟦P⟧(sup_{n∈N} F_n) = sup_{n∈N} ⟦P⟧(F_n).

Theorem 2 (Well-definedness of FPS Semantics). The semantic functional ⟦ · ⟧ is well-defined, i.e., the semantics of any loop while (B) {P} exists uniquely and can be written as

    ⟦while (B) {P}⟧ = lfp Φ_{B,P} = sup_{n∈N} Φ^n_{B,P}(0).
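The characterization lfp Φ = sup_n Φ^n(0) suggests a direct way to approximate loop semantics: unfold finitely often and collect the mass that has exited the loop. The sketch below does this for the geometric-distribution loop while (x = 1) { {x := 0} [1/2] {x := 1} ; c := c + 1 } (Program 1.1 in Section 5) on finite-support distributions over (x, c); the helper names are ours:

```python
from fractions import Fraction

def body(dist):
    # loop body: {x := 0} [1/2] {x := 1} ; c := c + 1
    out = {}
    for (x, c), p in dist.items():
        for nx in (0, 1):
            out[(nx, c + 1)] = out.get((nx, c + 1), Fraction(0)) + p / 2
    return out

def kleene_approx(n, dist):
    # n-th Kleene approximation Phi^n(0) of while (x = 1) { body } on dist:
    # states with x != 1 leave the loop, states with x = 1 run the body again
    result = {}
    for _ in range(n):
        for s, p in dist.items():
            if s[0] != 1:
                result[s] = result.get(s, Fraction(0)) + p
        dist = body({s: p for s, p in dist.items() if s[0] == 1})
    return result

approx = kleene_approx(4, {(1, 0): Fraction(1)})
print(approx)  # masses 1/2, 1/4, 1/8 on c = 1, 2, 3: a prefix of C/(2 - C)
```

The collected mass grows monotonically towards 1, illustrating the supremum in Theorem 2.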
4.2  Healthiness Conditions of FPS Transformers

In this section we show basic, yet important, properties which follow from [16]. For instance, for any input FPS F, the semantics of a program cannot yield as output an FPS with a mass larger than |F|, i.e., programs cannot create mass.

Table 2. Common assignments and their effects on the input PGF F(X, Y).

    P              ⟦P⟧(F)
    x := x + k     X^k · F(X, Y)
    x := k · x     F(X^k, Y)
    x := x + y     F(X, XY)
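The rules in Table 2 can be sanity-checked pointwise. For a point mass F(X, Y) = X^2 Y^3, i.e., the state (x, y) = (2, 3), the assignment x := x + y must yield the point mass on (5, 3), and indeed F(X, XY) = X^2 (XY)^3 = X^5 Y^3 (a small check of our own):

```python
def F(X, Y):
    # point-mass PGF for the state (x, y) = (2, 3)
    return X**2 * Y**3

def F_after(X, Y):
    # effect of x := x + y according to Table 2: substitute Y by X*Y
    return F(X, X * Y)

def point_mass_5_3(X, Y):
    # the PGF we expect after the assignment
    return X**5 * Y**3

for X, Y in ((2.0, 3.0), (0.5, 0.25), (1.0, 1.0)):
    assert F_after(X, Y) == point_mass_5_3(X, Y)
print("x := x + y acts as F(X, XY) on this point mass")
```

By linearity (Theorem 4 below), agreement on point masses extends to arbitrary input PGFs.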
Theorem 3 (Mass Conservation). For every P ∈ pGCL and F ∈ FPS, we have |⟦P⟧(F)| ≤ |F|.

A program P is called mass conserving if |⟦P⟧(F)| = |F| for all F ∈ FPS. Mass conservation has important implications for FPS transformers acting on PGFs: given as input a PGF, the semantics of a program yields a PGF.

Corollary 1 (PGF Transformers). For every P ∈ pGCL and G ∈ PGF, we have ⟦P⟧(G) ∈ PGF.

Restricted to PGF, our semantics hence acts as a subdistribution transformer. Output masses may be smaller than input masses. The probability of non-termination of the program is captured by the "missing" probability mass.

As observed in [16], the semantics of probabilistic programs is fully defined by its effect on point masses, thus rendering probabilistic program semantics linear. In our setting, this generalizes to linearity of our FPS transformers.

Definition 8 (Linearity). Let F, G ∈ FPS and r ∈ R^∞_{≥0} be a scalar. The function ψ : FPS → FPS is called a linear transformer (or simply linear) if

    ψ(r · F + G) = r · ψ(F) + ψ(G).

Theorem 4 (Linearity of pGCL Semantics). For every program P and guard B, the functions ⟨ · ⟩_B and ⟦P⟧ are linear. Moreover, the unfolding operator Φ_{B,P} maps linear transformers onto linear transformers.

As a final remark, we can unroll while-loops:

Lemma 2 (Loop Unrolling). For any FPS F,

    ⟦while (B) {P}⟧(F) = ⟨F⟩_¬B + ⟦while (B) {P}⟧(⟦P⟧(⟨F⟩_B)).

4.3  Embedding into Kozen's Semantics Framework
Kozen [16] defines a generic way of giving distribution transformer semantics based on an abstract measurable space (X^n, M^(n)). Our FPS semantics instantiates his generic semantics. The state space we consider is N^k, so that (N^k, P(N^k)) is our measurable space.^4 A measure on that space is a countably-additive function μ : P(N^k) → [0, ∞] with μ(∅) = 0. We denote the set of all measures on our space by M. Although we represent measures by FPSs, the two notions are in bijective correspondence τ : FPS → M, given by

    τ(F) = λS. Σ_{σ∈S} [σ]_F.

This map preserves the linear structure and the order ⪯.

Kozen's syntax [16] is slightly different from pGCL. We compensate for this by a translation function T, which maps pGCL programs to Kozen's. The following theorem shows that our semantics agrees with Kozen's semantics.^5

Theorem 5. The FPS semantics of pGCL is an instance of Kozen's semantics, i.e., for all pGCL programs P, we have

    τ ∘ ⟦P⟧ = T(P) ∘ τ.

Equivalently, the following diagram commutes:

    FPS --⟦P⟧--> FPS
     |τ            |τ
     v             v
     M  --T(P)-->  M

For more details about the connection between FPSs and measures, as well as more information about the actual translation, see Appendix A.3.

^4 We note that we want each point σ to be measurable, which enforces a discrete measurable space.
^5 Note that Kozen regards a program P itself as a function P : M → M.
5  Analysis of Probabilistic Programs

Our PGF semantics enables the representation of the effect of a pGCL program on a given PGF. As a next step, we investigate to what extent a program analysis can exploit such PGF representations. To that end, we consider the overapproximation of loops with invariants (Section 5.1) and provide examples showing that whether an FPS transformer overapproximates a loop can be checked with computer algebra tools. In addition, we determine a subclass of pGCL programs whose effect on an arbitrary input state is ensured to be a rational PGF encoding a phase-type distribution (Section 5.2).
5.1  Invariant-style Overapproximation of Loops

In this section, we seek to overapproximate loop semantics, i.e., for a given loop W = while (B) {P}, we want to find a (preferably simple) FPS transformer ψ such that ⟦W⟧ ⊑ ψ, meaning that for any input G, we have ⟦W⟧(G) ⪯ ψ(G) (cf. Definition 7). Notably, even if G is a PGF, we do not require ψ(G) to be one. Instead, ψ(G) can have a mass larger than one. This is fine, because it still overapproximates the actual semantics coefficient-wise. Such overapproximations immediately carry over to reading off expected values (cf. Section 3), for instance

    (∂/∂X) ⟦W⟧(G)(1) ≤ (∂/∂X) ψ(G)(1).

We use invariant-style reasoning for verifying that a given ψ overapproximates the semantics ⟦W⟧. For that, we introduce the notion of a superinvariant and employ Park's Lemma, a well-known concept from fixed point theory, to obtain a conceptually simple proof rule for verifying overapproximations of while-loops.

Theorem 6 (Superinvariants and Loop Overapproximations). Let Φ_{B,P} be the unfolding operator of while (B) {P} (cf. Definition 7) and ψ : FPS → FPS. Then

    Φ_{B,P}(ψ) ⊑ ψ    implies    ⟦while (B) {P}⟧ ⊑ ψ.
We call a ψ satisfying Φ_{B,P}(ψ) ⊑ ψ a superinvariant. We are interested in linear superinvariants, as our semantics is also linear (cf. Theorem 4). Furthermore, linearity allows us to define ψ solely in terms of its effect on monomials, which makes reasoning considerably simpler:

Corollary 2. Given a function f : Mon(X) → FPS, let the linear extension f̂ of f be defined by

    f̂ : FPS → FPS,    F ↦ Σ_{σ∈N^k} [σ]_F · f(X^σ).

Let Φ_{B,P} be the unfolding operator of while (B) {P}. Then

    ∀ σ ∈ N^k : Φ_{B,P}(f̂)(X^σ) ⪯ f̂(X^σ)    implies    ⟦while (B) {P}⟧ ⊑ f̂.
We call an f satisfying the premise of the above corollary a superinvariantlet. Notice that superinvariantlets and their extensions agree on monomials, i.e., f(X^σ) = f̂(X^σ). Let us examine a few examples of superinvariantlet reasoning.

Example 5 (Verifying Precise Semantics). In Program 1.1, in each iteration, a fair coin flip determines the value of x. Subsequently, c is incremented by 1. Consider the following superinvariantlet:

    f(X^i C^j) = C^j · { C/(2 − C),  if i = 1;
                         X^i,        if i ≠ 1.

To verify that f is indeed a superinvariantlet, we have to show that

    Φ_{B,P}(f̂)(X^i C^j) = ⟨X^i C^j⟩_{x≠1} + f̂(⟦P⟧(⟨X^i C^j⟩_{x=1})) ⪯ f̂(X^i C^j).

For i ≠ 1, we get

    Φ_{B,P}(f̂)(X^i C^j) = ⟨X^i C^j⟩_{x≠1} + f̂(⟦P⟧(0)) = X^i C^j = f(X^i C^j) = f̂(X^i C^j).

For i = 1, we get

    Φ_{B,P}(f̂)(X^1 C^j) = f̂(1/2 X^0 C^{j+1} + 1/2 X^1 C^{j+1})
                        = 1/2 f(X^0 C^{j+1}) + 1/2 f(X^1 C^{j+1})      (by linearity of f̂)
                        = 1/2 C^{j+1} + 1/2 C^{j+1} · C/(2 − C)        (by definition of f)
                        = C^{j+1}/(2 − C) = f(X^1 C^j) = f̂(X^1 C^j).

Hence, Corollary 2 yields ⟦W⟧(X) ⪯ f(X) = C/(2 − C).

For this example, we can state even more. As the program is almost surely terminating and the mass of f(X^i C^j) is 1 for all (i, j) ∈ N^2, we conclude that f̂ is exactly the semantics of W, i.e., f̂ = ⟦W⟧.    △
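The key step in Example 5, Φ(f̂)(X^1 C^j) = f̂(X^1 C^j), amounts to the algebraic identity 1/2 C^{j+1} + 1/2 C^{j+1} · C/(2 − C) = C^{j+1}/(2 − C). The sketch below checks this identity exactly at sampled rational points (our own check, in the spirit of the computer-algebra verification mentioned in Example 6):

```python
from fractions import Fraction

def f_val(i, j, C, X):
    # the superinvariantlet of Example 5, evaluated at numeric C, X
    return C**j * (C / (2 - C)) if i == 1 else C**j * X**i

def phi_val(j, C, X):
    # Phi_{B,P}(f)(X^1 C^j) = 1/2 f(X^0 C^{j+1}) + 1/2 f(X^1 C^{j+1})
    half = Fraction(1, 2)
    return half * f_val(0, j + 1, C, X) + half * f_val(1, j + 1, C, X)

for C in (Fraction(1, 3), Fraction(1, 2), Fraction(9, 10)):
    for j in (0, 1, 5):
        assert phi_val(j, C, Fraction(1, 4)) == f_val(1, j, C, Fraction(1, 4))
print("superinvariantlet equality verified at all sampled points")
```

Using exact rationals makes the pointwise check a genuine equality rather than a floating-point approximation.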
while (x = 1) {
    {x := 0} [1/2] {x := 1} ;
    c := c + 1
}
Program 1.1. Geometric distribution generator.

while (x > 0) {
    {x := x + 1} [1/2] {x := x - 1} ;
    c := c + 1
}
Program 1.2. Left-bounded 1-dimensional random walk.
Example 6 (Verifying Proper Overapproximations). Program 1.2 models a one-dimensional, left-bounded random walk. Given an input (i, j) ∈ N^2, it is evident that this program can only terminate in an even (if i is even) or odd (if i is odd) number of steps. This information can be encoded into the following superinvariantlet:

    f(X^0 C^j) = C^j    and    f(X^{i+1} C^j) = C^j · { C/(1 − C^2),  if i is odd;
                                                        1/(1 − C^2),  if i is even.

It is straightforward to verify that f is a proper superinvariantlet (proper because C/(1 − C^2) = C + C^3 + C^5 + ... is not a PGF) and hence f properly overapproximates the loop semantics. Another superinvariantlet for Program 1.2 is given by

    h(X^i C^j) = C^j · { ((1 − √(1 − C^2))/C)^i,  if i ≥ 1;
                         1,                        if i = 0.

Given that the program terminates almost surely [11] and that h is a superinvariantlet yielding only PGFs, it follows that the extension of h is exactly the semantics of Program 1.2. An alternative derivation of this formula for the case h(X) can be found, e.g., in [12].

For both f and h, we were able to prove automatically that they are indeed superinvariantlets, using the computer algebra library SymPy [20]. The code is included in Appendix B (Program 1.5).    △
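We can also probe the two invariantlets numerically: the mass of h(X) at C = 1 is 1, matching almost-sure termination, whereas f(X) evaluates to C/(1 − C^2), which blows up as C → 1 and thus cannot be a PGF (a quick numeric sketch of ours):

```python
import math

def h_X(C):
    # h(X^1 C^0) from Example 6 as a function of C
    return (1 - math.sqrt(1 - C**2)) / C

def f_X(C):
    # f(X^1 C^0) = C / (1 - C^2), the proper overapproximation
    return C / (1 - C**2)

print(h_X(1.0))        # mass 1: h yields a full PGF at C = 1
print(f_X(0.999) > 1)  # f already exceeds mass 1 near C = 1, so it is not a PGF
```

This mirrors the dichotomy in the example: h is tight, f is a proper overapproximation.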
Example 7 (Proving Non-almost-sure Termination). In Program 1.3, the branching probability of the choice statement depends on the value of a program variable. This notation is just syntactic sugar, as this behavior can be mimicked by loop constructs together with coin flips [3, pp. 115f].

while (x > 0) {
    {x := x - 1} [1/x] {x := x + 1}
}
Program 1.3. A non-almost-surely terminating loop.

while (x < 1 and t < 2) {
    if (t = 0) {
        {x := 1} [a] {t := 1} ; c := c + 1
    } else {
        {x := 1} [b] {t := 0} ; d := d + 1
    }
}
Program 1.4. Dueling cowboys.
To prove that Program 1.3 does not terminate almost surely, we consider the following superinvariantlet:

    f(X^i) = 1 − (1/e) · Σ_{n=0}^{i−2} 1/n!,    where e = 2.71828... is Euler's number.

Again, the superinvariantlet property was verified automatically, here using Mathematica [13]. Now consider, for instance, f(X^3) = 1 − (1/e) · (1/0! + 1/1!) = 1 − 2/e < 1. This proves that the program terminates on X^3 with a probability strictly smaller than 1, witnessing that the program is not almost surely terminating.    △
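The witness value f(X^3) = 1 − 2/e can be reproduced directly from the superinvariantlet's formula (a quick numeric sketch; the helper name is ours):

```python
import math

def f_Xi(i):
    # f(X^i) = 1 - (1/e) * sum_{n=0}^{i-2} 1/n!   (empty sum for i < 2)
    return 1 - sum(1 / math.factorial(n) for n in range(max(i - 1, 0))) / math.e

print(f_Xi(3))          # 1 - 2/e, strictly below 1: non-a.s. termination on X^3
print(f_Xi(50) < 1e-9)  # the partial sums tend to e, so f(X^i) tends to 0
```

The bound gets stronger for larger initial values of x: the partial sums of 1/n! converge to e, so the termination probability bound tends to 0.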
5.2  Rational PGFs

In several of the examples from the previous sections, we considered PGFs which were rational functions, that is, fractions of two polynomials. Since those are a particularly simple class of PGFs, it is natural to ask which programs have rational semantics. In this section, we present a semantic characterization of a class of while-loops whose output distribution is a (multivariate) discrete phase-type distribution [21,22]. This implies that the resulting PGF of such programs is an effectively computable rational function for any given input state. Let us illustrate this by an example.
Example 8 (Dueling Cowboys). Program 1.4 models two dueling cowboys [18]. The hit chance of the first cowboy is a and the hit chance of the second cowboy is b, where a, b ∈ [0, 1].^6 The cowboys shoot at each other in turns, as indicated by the variable t, until one of them gets hit (x is set to 1). The variable c counts the number of shots of the first cowboy and d those of the second cowboy.

We observe that Program 1.4 is somewhat independent of the value of c, in the sense that moving the statement c := c + 1 to either immediately before or after the loop yields an equivalent program. In our notation, this is expressed as ⟦W⟧(C · H) = C · ⟦W⟧(H) for all PGFs H. By symmetry, the same applies to variable d. Unfolding the loop once on input 1 yields

    ⟦W⟧(1) = (1 − a) C · ⟦W⟧(T) + a C X.

A similar equation for ⟦W⟧(T) involving ⟦W⟧(1) on its right-hand side holds. This way we obtain a system of two linear equations, even though the program itself is infinite-state. The linear equation system has a unique solution ⟦W⟧(1) in the field of rational functions over the variables C, D, T, and X, which is the PGF

    G := (a C X + (1 − a) b C D T X) / (1 − (1 − b)(1 − a) C D).

From G we can easily read off the following: the probability that the first cowboy wins (x = 1 and t = 0) equals a / (1 − (1 − a)(1 − b)), and the expected total number of shots of the first cowboy is (∂/∂C) G(1) = 1/(a + b − ab). Notice that this quantity equals ∞ if a and b are both zero, i.e., if both cowboys have zero hit chance.

If we write G_V for the PGF obtained by substituting all but the variables in V with 1, then we moreover see that G_C · G_D ≠ G_{C,D}. This means that C and D (as random variables) are stochastically dependent.    △

^6 These are not program variables.
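The quantities read off from G in Example 8 are easy to confirm numerically for concrete hit chances (a sketch with assumed illustrative values a = 0.3, b = 0.5; the derivative is a finite difference):

```python
a, b = 0.3, 0.5  # assumed hit chances, chosen only for illustration

def G(C, D, T, X):
    # the closed-form PGF from Example 8
    return (a*C*X + (1 - a)*b*C*D*T*X) / (1 - (1 - b)*(1 - a)*C*D)

mass = G(1, 1, 1, 1)        # total probability: 1, the duel ends almost surely
win_first = G(1, 1, 0, 1)   # keep only terms with t = 0: the first cowboy wins
h = 1e-6
exp_shots = (G(1 + h, 1, 1, 1) - G(1 - h, 1, 1, 1)) / (2 * h)  # dG/dC at 1

print(round(mass, 6), round(win_first, 6), round(exp_shots, 4))
# closed forms from the text: a / (1 - (1-a)(1-b)) and 1 / (a + b - a*b)
```

For a = 0.3 and b = 0.5 the win probability is 0.3/0.65 and the expected number of shots 1/0.65, matching the closed forms.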
The distribution encoded in the PGF ⟦W⟧(1) is a discrete phase-type distribution. Such distributions are defined as follows: a Markov reward chain is a Markov chain where each state is augmented with a reward vector in N^k. By definition, a (discrete) distribution on N^k is of phase type iff it is the distribution of the total accumulated reward vector until absorption in a Markov reward chain with a single absorbing state and a finite number of transient states. In fact, Program 1.4 can be described as a Markov reward chain with two states (X^0 T^0 and X^0 T^1) and 2-dimensional reward vectors corresponding to the "counters" (c, d): the reward in state X^0 T^0 is (1, 0) and (0, 1) in the other state.

Each pGCL program describes a Markov reward chain [10]. It is not clear which (non-trivial) syntactic restrictions to impose to guarantee that such chains are finite. In the remainder of this section, we give a characterization of while-loops that are equivalent to finite Markov reward chains. The idea of our criterion is that each variable has to fall into one of the following two categories:

Definition 9 (Homogeneous and Bounded Variables). Let P ∈ pGCL be a program, B be a guard, and x_i be a program variable. Then:

– x_i is called homogeneous for P if ⟦P⟧(X_i · G) = X_i · ⟦P⟧(G) for all G ∈ PGF.
– x_i is called bounded by B if the set {σ_i | σ ∈ B} is finite.
Intuitively, homogeneity of x_i means that it does not matter whether one increments the variable before or after the execution of P. Thus, a homogeneous variable behaves like an increment-only counter, even if this may not be explicit in the syntax. In Example 8, the variables c and d in Program 1.4 are homogeneous (for both the loop body and the loop itself). Moreover, x and t are clearly bounded by the loop guard. We can now state our characterization.

Definition 10 (HB Loops). A loop while (B) {P} is called homogeneous-bounded (HB) if for all program states σ ∈ B, the PGF ⟦P⟧(X^σ) is a polynomial, and for all program variables x it either holds that

– x is homogeneous for P and the guard B is independent of x, or that
– x is bounded by the guard B.

In an HB loop, all the possible valuations of the bounded variables satisfying B span the finite transient state space of a Markov reward chain in which the dimension of the reward vectors equals the number of homogeneous variables. The additional condition that ⟦P⟧(X^σ) is a polynomial ensures that there is only a finite number of terminal (absorbing) states. Thus, we have the following:

Proposition 1. Let W be a while-loop. Then ⟦W⟧(X^σ) is the (rational) PGF of a multivariate discrete phase-type distribution if and only if W is equivalent to an HB loop that almost surely terminates on input σ.

To conclude, we remark that there are various simple syntactic conditions for HB loops: for example, if P is loop-free, then ⟦P⟧(X^σ) is always a polynomial. Similarly, if x only appears in assignments of the form x := x + k, k ≥ 0, then x is homogeneous. Such updates of variables are, e.g., essential in constant probability programs [?]. The crucial point is that such conditions are only sufficient but not necessary. Our semantic conditions thus capture the essence of phase-type distribution semantics more adequately while still being reasonably simple (albeit — being non-trivial semantic properties — undecidable in general).
6  Conclusion

We have presented a denotational distribution transformer semantics for probabilistic while-programs where the denotations are generating functions (GFs). Moreover, we have provided a simple invariant-style technique to prove that a given GF overapproximates the program's semantics, and we have identified a class of (possibly infinite-state) programs whose semantics is a rational GF encoding a phase-type distribution. Directions for future work include the (semi-)automated synthesis of invariants and the development of notions of how precise overapproximations by invariants actually are.
A  Proofs of Section 4

A.1  Proofs of Section 4.1

Lemma 1 (Completeness of ⪯ on FPS). (FPS, ⪯) is a complete lattice.

Proof. We start by showing that (FPS, ⪯) is a partial order. Let F, G, H ∈ FPS and σ ∈ N^k. For reflexivity, consider the following:

    G ⪯ G    iff    ∀ σ ∈ N^k : [σ]_G ≤ [σ]_G    iff    true.
For antisymmetry, consider the following:
|
||
|
||
implies
|
||
implies
|
||
implies
|
||
|
||
G H and H G
|
||
|
||
∀σ ∈ Nk : [σ]G ≤ [σ]H
|
||
|
||
and
|
||
|
||
[σ]H ≤ [σ]G
|
||
|
||
and
|
||
|
||
[σ]H ≤ [σ]F
|
||
|
||
k
|
||
|
||
∀σ ∈ N : [σ]G = [σ]H
|
||
G = H .
|
||
|
||
For transitivity, consider the following:
|
||
|
||
implies
|
||
implies
|
||
implies
|
||
|
||
G H
|
||
k
|
||
|
||
and H F
|
||
|
||
∀σ ∈ N : [σ]G ≤ [σ]H
|
||
k
|
||
|
||
∀σ ∈ N : [σ]G ≤ [σ]F
|
||
G F .
|
||
|
||
Next, we show that every set S ⊆ FPS has a supremum
|
||
X
|
||
sup [σ]F Xσ
|
||
sup S =
|
||
σ∈Nk
|
||
|
||
F ∈S
|
||
|
||
P
|
||
in FPS. In particular, notice that sup ∅ = σ∈Nk 0 · Xσ . The fact that sup S ∈
|
||
k
|
||
FPS is trivial since supF ∈S [σ]F ∈ R∞
|
||
≥0 for every σ ∈ N . Furthermore, the fact
|
||
that sup S is an upper bound on S is immediate since is defined coefficientwise. Finally, sup S is also the least upper bound, since, by definition of , we
|
||
have [σ]sup S = supF ∈S [σ]F .
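The coefficient-wise supremum can be made concrete on finitely supported series. The following is a small sketch of ours (not from the paper), representing an FPS with finitely many nonzero coefficients as a Python dict from multi-indices to coefficients:

```python
from itertools import chain

def leq(F, G):
    """Coefficient-wise order: F below G iff [sigma]F <= [sigma]G for every index."""
    return all(c <= G.get(s, 0) for s, c in F.items())

def sup(S):
    """Coefficient-wise supremum of a set of FPSs; sup of the empty set is the zero series."""
    keys = set(chain.from_iterable(S))
    return {s: max(F.get(s, 0) for F in S) for s in keys}

F = {(0,): 0.5, (1,): 0.25}
G = {(0,): 0.5, (2,): 0.5}
H = sup([F, G])
assert leq(F, H) and leq(G, H)   # sup S is an upper bound on S
assert sup([]) == {}             # sup of the empty set is the zero series
```

The general case allows infinitely many nonzero coefficients and the value ∞, which a dict-based sketch like this cannot capture.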
The following proofs rely on the Monotone Sequence Theorem (MST), which we recall here: if (aₙ)_{n∈ℕ} is a monotonically increasing sequence in ℝ≥0^∞, then supₙ aₙ = limₙ→∞ aₙ. In particular, if (aₙ)_{n∈ℕ} and (bₙ)_{n∈ℕ} are monotonically increasing sequences in ℝ≥0^∞, then

    supₙ aₙ + supₙ bₙ = limₙ→∞ aₙ + limₙ→∞ bₙ = limₙ→∞ (aₙ + bₙ) = supₙ (aₙ + bₙ) .
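The MST identity can be sanity-checked numerically on finite prefixes of monotone sequences (a quick illustration of ours, not part of the proof):

```python
# Finite prefixes of two monotonically increasing, bounded sequences.
# For increasing sequences the maximum of a prefix is its last element,
# so sup a + sup b and sup(a + b) agree on every prefix.
a = [1 - 2 ** -n for n in range(20)]   # increasing, sup -> 1
b = [2 - 3 ** -n for n in range(20)]   # increasing, sup -> 2
lhs = max(a) + max(b)
rhs = max(x + y for x, y in zip(a, b))
assert lhs == rhs
```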
Generating Functions for Probabilistic Programs (L. Klinkenberg et al.)
Theorem 1 (ω-continuity of pGCL Semantics). The semantic functional J · K is ω-continuous, i.e. for all programs P ∈ pGCL and all increasing ω-chains F₁ ⊑ F₂ ⊑ . . . in FPS,

    JP K (sup_{n∈ℕ} Fₙ) = sup_{n∈ℕ} JP K (Fₙ) .

Proof. By induction on the structure of P. Let S = {F₁, F₂, . . .} be an increasing ω-chain in FPS. First, we consider the base cases.

The case P = skip. We have

    JP K (sup S) = sup S = sup_{F∈S} F = sup_{F∈S} JP K (F) .

The case P = xᵢ := E. Let sup S = Ĝ = Σ_{σ∈ℕᵏ} [σ]Ĝ · X^σ, where for each σ ∈ ℕᵏ we have [σ]Ĝ = sup_{F∈S} [σ]F. We calculate

    JP K (sup S)
      = JP K (Ĝ)
      = JP K (Σ_{σ∈ℕᵏ} [σ]Ĝ · X₁^σ₁ · · · Xᵢ^σᵢ · · · Xₖ^σₖ)
      = Σ_{σ∈ℕᵏ} [σ]Ĝ · X₁^σ₁ · · · Xᵢ^eval_σ(E) · · · Xₖ^σₖ
      = Σ_{σ∈ℕᵏ} (sup_{F∈S} [σ]F) · X₁^σ₁ · · · Xᵢ^eval_σ(E) · · · Xₖ^σₖ
      = sup_{F∈S} Σ_{σ∈ℕᵏ} [σ]F · X₁^σ₁ · · · Xᵢ^eval_σ(E) · · · Xₖ^σₖ    (sup on FPS is defined coefficient-wise)
      = sup_{F∈S} JP K (Σ_{σ∈ℕᵏ} [σ]F · X₁^σ₁ · · · Xᵢ^σᵢ · · · Xₖ^σₖ)
      = sup_{F∈S} JP K (F) .

As the induction hypothesis, now assume that for some arbitrary but fixed programs P₁, P₂ and all increasing ω-chains S₁, S₂ in FPS it holds that both

    JP₁K (sup S₁) = sup_{F∈S₁} JP₁K (F)    and    JP₂K (sup S₂) = sup_{F∈S₂} JP₂K (F) .

We continue with the induction step.

The case P = {P₁} [p] {P₂}. We have

    JP K (sup S)
      = p · JP₁K (sup S) + (1 − p) · JP₂K (sup S)
      = p · sup_{F∈S} JP₁K (F) + (1 − p) · sup_{F∈S} JP₂K (F)    (I.H. on P₁ and P₂)
      = sup_{F∈S} p · JP₁K (F) + sup_{F∈S} (1 − p) · JP₂K (F)    (scalar multiplication is defined point-wise)
      = sup_{F∈S} (p · JP₁K (F) + (1 − p) · JP₂K (F))    (apply MST coefficient-wise)
      = sup_{F∈S} J{P₁} [p] {P₂}K (F)
      = sup_{F∈S} JP K (F) .

The case P = if (B) {P₁} else {P₂}. We have

    JP K (sup S)
      = JP₁K (⟨sup S⟩_B) + JP₂K (⟨sup S⟩_¬B)
      = JP₁K (sup_{F∈S} ⟨F⟩_B) + JP₂K (sup_{F∈S} ⟨F⟩_¬B)    (restriction defined coefficient-wise)
      = sup_{F∈S} JP₁K (⟨F⟩_B) + sup_{F∈S} JP₂K (⟨F⟩_¬B)    (I.H. on P₁ and P₂)
      = sup_{F∈S} (JP₁K (⟨F⟩_B) + JP₂K (⟨F⟩_¬B))    (apply MST coefficient-wise)
      = sup_{F∈S} Jif (B) {P₁} else {P₂}K (F)
      = sup_{F∈S} JP K (F) .

The case P = while (B) {P₁}. Recall that for every G ∈ FPS,

    JP K (G) = (lfp Φ_{B,P₁}) (G) = (sup_{n∈ℕ} Φⁿ_{B,P₁}(0)) (G) .

Hence, it suffices to show that

    (sup_{n∈ℕ} Φⁿ_{B,P₁}(0)) (sup S) = sup_{F∈S} (sup_{n∈ℕ} Φⁿ_{B,P₁}(0)) (F) .

Assume for the moment that for every n ∈ ℕ and all increasing ω-chains S in FPS,

    Φⁿ_{B,P₁}(0) (sup S) = sup_{F∈S} Φⁿ_{B,P₁}(0) (F) .    (1)

We then have

    (sup_{n∈ℕ} Φⁿ_{B,P₁}(0)) (sup S)
      = sup_{n∈ℕ} Φⁿ_{B,P₁}(0) (sup S)    (sup for Φ_{B,P₁} is defined point-wise)
      = sup_{n∈ℕ} sup_{F∈S} Φⁿ_{B,P₁}(0) (F)    (Equation 1)
      = sup_{F∈S} sup_{n∈ℕ} Φⁿ_{B,P₁}(0) (F)    (swap suprema)
      = sup_{F∈S} (sup_{n∈ℕ} Φⁿ_{B,P₁}(0)) (F) ,    (sup for Φ_{B,P₁} is defined point-wise)

which is what we have to show. It remains to prove Equation 1 by induction on n.

Base case n = 0. We have

    Φ⁰_{B,P₁}(0) (sup S) = 0 = sup_{F∈S} 0 = sup_{F∈S} Φ⁰_{B,P₁}(0) (F) .

Induction step. We have

    Φⁿ⁺¹_{B,P₁}(0) (sup S)
      = Φ_{B,P₁} (Φⁿ_{B,P₁}(0)) (sup S)
      = ⟨sup S⟩_¬B + Φⁿ_{B,P₁}(0) (JP₁K (⟨sup S⟩_B))    (Def. of Φ_{B,P₁})
      = ⟨sup S⟩_¬B + Φⁿ_{B,P₁}(0) (sup_{F∈S} JP₁K (⟨F⟩_B))    (I.H. on P₁)
      = ⟨sup S⟩_¬B + sup_{F∈S} Φⁿ_{B,P₁}(0) (JP₁K (⟨F⟩_B))    (I.H. on n)
      = sup_{F∈S} (⟨F⟩_¬B + Φⁿ_{B,P₁}(0) (JP₁K (⟨F⟩_B)))    (apply MST)
      = sup_{F∈S} Φⁿ⁺¹_{B,P₁}(0) (F) .    (Def. of Φ_{B,P₁})

This completes the proof.
Theorem 2 (Well-definedness of FPS Semantics). The semantics functional J · K is well-defined, i.e. the semantics of any loop while (B) {P} exists uniquely and can be written as

    Jwhile (B) {P}K = lfp Φ_{B,P} = sup_{n∈ℕ} Φⁿ_{B,P}(0) .

Proof. First, we show that the unfolding operator Φ_{B,P} is ω-continuous. For that, let f₁ ⊑ f₂ ⊑ . . . be an ω-chain in FPS → FPS. Then,

    Φ_{B,P} (sup_{n∈ℕ} {fₙ})
      = λG. ⟨G⟩_¬B + (sup_{n∈ℕ} {fₙ}) (JP K (⟨G⟩_B))    (Def. of Φ_{B,P})
      = λG. ⟨G⟩_¬B + sup_{n∈ℕ} {fₙ (JP K (⟨G⟩_B))}    (sup on FPS → FPS is defined point-wise)
      = sup_{n∈ℕ} {λG. ⟨G⟩_¬B + fₙ (JP K (⟨G⟩_B))}    (apply monotone sequence theorem coefficient-wise)
      = sup_{n∈ℕ} {Φ_{B,P} (fₙ)} .

Since Φ_{B,P} is ω-continuous and (FPS → FPS, ⊑) forms a complete lattice (Lemma 1), we get by the Kleene fixed point theorem [17] that Φ_{B,P} has a unique least fixed point, given by sup_{n∈ℕ} Φⁿ_{B,P}(0).
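The Kleene iterates Φⁿ_{B,P}(0) can be computed concretely for small loops. The following toy sketch (our own example, not from the paper) represents an FPS over a single program variable as a dict and iterates the unfolding operator for the loop while (x = 0) { x := 0 [1/2] x := 1 }; the masses of the iterates increase monotonically towards the fixed point:

```python
def restrict(G, pred):                      # the restriction <G>_B, coefficient-wise
    return {x: p for x, p in G.items() if pred(x)}

def body(G):                                # semantics of  x := 0 [1/2] x := 1
    out = {}
    for x, p in G.items():
        out[0] = out.get(0, 0) + p / 2
        out[1] = out.get(1, 0) + p / 2
    return out

def Phi(f):                                 # Phi_{B,P}(f) = <.>_{not B} + f o [[P]] o <.>_B
    def g(G):
        stop = restrict(G, lambda x: x != 0)
        cont = f(body(restrict(G, lambda x: x == 0)))
        return {x: stop.get(x, 0) + cont.get(x, 0) for x in set(stop) | set(cont)}
    return g

f = lambda G: {}                            # the bottom transformer 0
masses = []
for n in range(10):                         # Kleene iterates Phi^n(0), applied to X^0
    masses.append(sum(f({0: 1.0}).values()))
    f = Phi(f)
print(masses)                               # increasing: 0, 0, 0.5, 0.75, 0.875, ...
```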
Theorem 3 (Mass Conservation). For every P ∈ pGCL and F ∈ FPS, we have |JP K (F)| ≤ |F|.

Proof. By induction on the structure of P. For the loop-free cases, this is straightforward. For the case P = while (B) {P₁}, we proceed as follows. For every r ∈ ℝ≥0^∞, we define the set

    FPS_r = {F ∈ FPS | |F| ≤ r}

of all FPSs whose mass is at most r. First, we define the restricted unfolding operator

    Φ_{B,P₁,r} : (FPS_r → FPS_r) → (FPS_r → FPS_r) ,    ψ ↦ Φ_{B,P₁}(ψ) .

Our induction hypothesis on P₁ implies that Φ_{B,P₁,r} is well-defined.

It is now only left to show that (FPS_r, ⊑) is an ω-complete partial order, because then Φ_{B,P₁,r} has a least fixed point for every r ∈ ℝ≥0^∞. The theorem then follows by letting r = |G|, because

    (lfp Φ_{B,P₁}) (G) = (lfp Φ_{B,P₁,|G|}) (G)    implies    |(lfp Φ_{B,P₁}) (G)| ≤ |G| .

(FPS_r, ⊑) is an ω-complete partial order. The fact that (FPS_r, ⊑) is a partial order is immediate. It remains to show ω-completeness. For that, let F₁ ⊑ F₂ ⊑ . . . be an ω-chain in FPS_r. We have to show that supₙ Fₙ ∈ FPS_r, which is the case if and only if

    |supₙ Fₙ| = Σ_{σ∈ℕᵏ} supₙ [σ]Fₙ ≤ r .

Now let g : ℕ → ℕᵏ be some bijection from ℕ to ℕᵏ. We have

    Σ_{σ∈ℕᵏ} supₙ [σ]Fₙ
      = Σ_{i=0}^∞ supₙ [g(i)]Fₙ    (series converges absolutely)
      = sup_N Σ_{i=0}^N supₙ [g(i)]Fₙ    (rewrite infinite series as supremum of partial sums)
      = sup_N supₙ Σ_{i=0}^N [g(i)]Fₙ    (apply monotone sequence theorem)
      = supₙ sup_N Σ_{i=0}^N [g(i)]Fₙ .    (swap suprema)

Now observe that sup_N Σ_{i=0}^N [g(i)]Fₙ = |Fₙ|, which is a monotonically increasing sequence in n. Moreover, since Fₙ ∈ FPS_r, this sequence is bounded from above by r. Hence, the least upper bound supₙ |Fₙ| of the sequence |Fₙ| is no larger than r, too. This completes the proof.
A.2 Proofs of Section 4.2

Lemma 3 (Representation of JwhileK). Let W = while (B) {P} be a pGCL program. An alternative representation is:

    JW K = λG. Σ_{i=0}^∞ ⟨φⁱ(G)⟩_¬B ,

where φ(G) = JP K (⟨G⟩_B).

Proof. First we show by induction that Φⁿ_{B,P}(0)(G) = Σ_{i=0}^{n−1} ⟨φⁱ(G)⟩_¬B.

Base case. We have

    Φ⁰_{B,P}(0)(G) = 0 = Σ_{i=0}^{−1} ⟨φⁱ(G)⟩_¬B .

Induction step. We have

    Φⁿ⁺¹_{B,P}(0)(G)
      = Φ_{B,P} (Φⁿ_{B,P}(0)) (G)
      = ⟨G⟩_¬B + Φⁿ_{B,P}(0) (JP K (⟨G⟩_B))
      = ⟨G⟩_¬B + Φⁿ_{B,P}(0) (φ(G))
      = ⟨G⟩_¬B + Σ_{i=0}^{n−1} ⟨φⁱ⁺¹(G)⟩_¬B
      = ⟨G⟩_¬B + Σ_{i=1}^{n} ⟨φⁱ(G)⟩_¬B
      = Σ_{i=0}^{n} ⟨φⁱ(G)⟩_¬B .

Overall, we thus get

    JW K (G) = (sup_{n∈ℕ} Φⁿ_{B,P}(0)) (G)
      = sup_{n∈ℕ} Φⁿ_{B,P}(0)(G)    (sup on FPS → FPS is defined point-wise)
      = sup_{n∈ℕ} Σ_{i=0}^{n} ⟨φⁱ(G)⟩_¬B    (see above)
      = Σ_{i=0}^∞ ⟨φⁱ(G)⟩_¬B .
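Lemma 3's series representation can be evaluated directly for concrete loops. Below is a small sketch of ours (not from the paper) that accumulates the partial sums of Σᵢ ⟨φⁱ(G)⟩_¬B for the toy loop while (x = 0) { x := 0 [1/2] x := 1 }, with an FPS over one variable represented as a dict:

```python
def restrict(G, pred):                      # <G>_B, coefficient-wise
    return {x: p for x, p in G.items() if pred(x)}

def body(G):                                # semantics of  x := 0 [1/2] x := 1
    out = {}
    for x, p in G.items():
        out[0] = out.get(0, 0) + p / 2
        out[1] = out.get(1, 0) + p / 2
    return out

def phi(G):                                 # phi(G) = [[P]](<G>_B)
    return body(restrict(G, lambda x: x == 0))

G, result = {0: 1.0}, {}
for _ in range(60):                         # partial sums of sum_i <phi^i(G)>_{not B}
    for x, p in restrict(G, lambda x: x != 0).items():
        result[x] = result.get(x, 0) + p
    G = phi(G)
print(result)                               # all mass accumulates at x = 1
```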
Theorem 4 (Linearity of pGCL Semantics). For every program P and guard B, the functions ⟨·⟩_B and JP K are linear. Moreover, the unfolding operator Φ_{B,P} maps linear transformers onto linear transformers.

Proof. Linearity of ⟨·⟩_B. Let G = Σ_{σ∈ℕᵏ} μ_σ X^σ and F = Σ_{σ∈ℕᵏ} ν_σ X^σ. We have

    ⟨a · G + F⟩_B = ⟨Σ_{σ∈ℕᵏ} (a · μ_σ + ν_σ) X^σ⟩_B
      = Σ_{σ∈B} (a · μ_σ + ν_σ) X^σ
      = Σ_{σ∈B} a · μ_σ X^σ + Σ_{σ∈B} ν_σ X^σ
      = a · Σ_{σ∈B} μ_σ X^σ + Σ_{σ∈B} ν_σ X^σ
      = a · ⟨G⟩_B + ⟨F⟩_B .

Linearity of JP K. By induction on the structure of P. First, we consider the base cases.

The case P = skip. We have

    JskipK (r · F + G) = r · F + G = r · JskipK (F) + JskipK (G) .

The case P = xᵢ := E. We have

    Jxᵢ := EK (r · F + G)
      = Σ_{σ∈ℕᵏ} [σ]_{r·F+G} · X₁^σ₁ · · · Xᵢ^eval_σ(E) · · · Xₖ^σₖ
      = Σ_{σ∈ℕᵏ} (r · [σ]F + [σ]G) · X₁^σ₁ · · · Xᵢ^eval_σ(E) · · · Xₖ^σₖ    (+ and · defined coefficient-wise)
      = r · Σ_{σ∈ℕᵏ} [σ]F · X₁^σ₁ · · · Xᵢ^eval_σ(E) · · · Xₖ^σₖ + Σ_{σ∈ℕᵏ} [σ]G · X₁^σ₁ · · · Xᵢ^eval_σ(E) · · · Xₖ^σₖ    (+ and · defined coefficient-wise)
      = r · JP K (F) + JP K (G) .

Next, we consider the induction step.

The case P = P₁ # P₂. We have

    JP₁ # P₂K (r · F + G)
      = JP₂K (JP₁K (r · F + G))
      = JP₂K (r · JP₁K (F) + JP₁K (G))    (I.H. on P₁)
      = r · JP₂K (JP₁K (F)) + JP₂K (JP₁K (G)) .    (I.H. on P₂)

The case P = if (B) {P₁} else {P₂}. We have

    Jif (B) {P₁} else {P₂}K (r · F + G)
      = JP₁K (⟨r · F + G⟩_B) + JP₂K (⟨r · F + G⟩_¬B)
      = JP₁K (r · ⟨F⟩_B + ⟨G⟩_B) + JP₂K (r · ⟨F⟩_¬B + ⟨G⟩_¬B)    (linearity of ⟨·⟩_B and ⟨·⟩_¬B)
      = r · (JP₁K (⟨F⟩_B) + JP₂K (⟨F⟩_¬B)) + JP₁K (⟨G⟩_B) + JP₂K (⟨G⟩_¬B)    (I.H. on P₁ and P₂)
      = r · Jif (B) {P₁} else {P₂}K (F) + Jif (B) {P₁} else {P₂}K (G) .

The case P = {P₁} [p] {P₂}. We have

    J{P₁} [p] {P₂}K (r · F + G)
      = p · JP₁K (r · F + G) + (1 − p) · JP₂K (r · F + G)
      = p · (r · JP₁K (F) + JP₁K (G)) + (1 − p) · (r · JP₂K (F) + JP₂K (G))    (I.H. on P₁ and P₂)
      = r · (p · JP₁K (F) + (1 − p) · JP₂K (F)) + p · JP₁K (G) + (1 − p) · JP₂K (G)    (reorder terms)
      = r · J{P₁} [p] {P₂}K (F) + J{P₁} [p] {P₂}K (G) .

The case P = while (B) {P₁}. We have

    Jwhile (B) {P₁}K (r · F + G)
      = (sup_{n∈ℕ} Φⁿ_{B,P₁}(0)) (r · F + G)
      = sup_{n∈ℕ} Φⁿ_{B,P₁}(0) (r · F + G)    (sup on FPS → FPS defined point-wise)
      = sup_{n∈ℕ} (r · Φⁿ_{B,P₁}(0) (F) + Φⁿ_{B,P₁}(0) (G))    (by straightforward induction on n using I.H. on P₁)
      = r · sup_{n∈ℕ} Φⁿ_{B,P₁}(0) (F) + sup_{n∈ℕ} Φⁿ_{B,P₁}(0) (G)    (apply monotone sequence theorem coefficient-wise)
      = r · Jwhile (B) {P₁}K (F) + Jwhile (B) {P₁}K (G) .

Linearity of Φ_{B,P}(f) for linear f. We have

    Φ_{B,P}(f) (Σ_{σ∈ℕᵏ} μ_σ X^σ)
      = ⟨Σ_{σ∈ℕᵏ} μ_σ X^σ⟩_¬B + f (JP K (⟨Σ_{σ∈ℕᵏ} μ_σ X^σ⟩_B))
      = Σ_{σ∈ℕᵏ} μ_σ ⟨X^σ⟩_¬B + f (Σ_{σ∈ℕᵏ} μ_σ · JP K (⟨X^σ⟩_B))    (linearity of ⟨·⟩_B and JP K)
      = Σ_{σ∈ℕᵏ} μ_σ ⟨X^σ⟩_¬B + Σ_{σ∈ℕᵏ} μ_σ · f (JP K (⟨X^σ⟩_B))    (f linear)
      = Σ_{σ∈ℕᵏ} μ_σ · (⟨X^σ⟩_¬B + f (JP K (⟨X^σ⟩_B)))
      = Σ_{σ∈ℕᵏ} μ_σ · Φ_{B,P}(f) (X^σ) .
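Because all operations are defined coefficient-wise, linearity claims like the one for ⟨·⟩_B are easy to test on finite examples. A small sketch of ours (dict-represented FPSs, an illustrative guard B on a single variable):

```python
def restrict(F, B):
    """The restriction <F>_B: keep only coefficients whose index satisfies B."""
    return {s: p for s, p in F.items() if B(s)}

def lin(a, F, G):
    """a*F + G, computed coefficient-wise."""
    keys = set(F) | set(G)
    return {s: a * F.get(s, 0) + G.get(s, 0) for s in keys}

B = lambda s: s % 2 == 0                        # an illustrative guard
F, G = {0: 1.0, 1: 2.0}, {1: 3.0, 2: 4.0}
lhs = restrict(lin(0.5, F, G), B)               # <a*F + G>_B
rhs = lin(0.5, restrict(F, B), restrict(G, B))  # a*<F>_B + <G>_B
assert lhs == rhs
```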
Lemma 2 (Loop Unrolling). For any FPS F,

    Jwhile (B) {P}K (F) = ⟨F⟩_¬B + Jwhile (B) {P}K (JP K (⟨F⟩_B)) .

Proof. Let W = while (B) {P} and W′ = if (B) {P # W} else {skip}. For every G ∈ FPS,

    JW K (G) = (lfp Φ_{B,P}) (G)
      = Φ_{B,P} (lfp Φ_{B,P}) (G)
      = ⟨G⟩_¬B + (lfp Φ_{B,P}) (JP K (⟨G⟩_B))
      = Jif (B) {P # W} else {skip}K (G)
      = JW′K (G) .
A.3 Proofs of Section 4.3

Lemma 4. The mapping τ is a bijection. The inverse τ⁻¹ of τ is given by

    τ⁻¹ : M → FPS ,    μ ↦ Σ_{σ∈ℕᵏ} μ({σ}) · X^σ .

Proof. We show this by showing τ⁻¹ ∘ τ = id and τ ∘ τ⁻¹ = id:

    (τ⁻¹ ∘ τ) (Σ_{σ∈ℕᵏ} α_σ X^σ) = τ⁻¹ (λN. Σ_{σ∈N} α_σ) = Σ_{σ∈ℕᵏ} (Σ_{s∈{σ}} α_s) · X^σ = Σ_{σ∈ℕᵏ} α_σ X^σ ,

    (τ ∘ τ⁻¹) (μ) = τ (Σ_{σ∈ℕᵏ} μ({σ}) · X^σ) = λN. Σ_{σ∈N} μ({σ}) = λN. μ(N) = μ .
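On finitely supported objects the correspondence between series and measures is easy to make concrete. A sketch of ours (the function names and the explicit finite support argument are our additions; the paper's τ acts on arbitrary FPSs):

```python
def tau(F):
    """tau: send a (finitely supported) FPS to the measure S |-> sum of [sigma]F over S."""
    return lambda S: sum(p for s, p in F.items() if s in S)

def tau_inv(mu, support):
    """tau^{-1} on a known finite support: recover the coefficient of each point mass."""
    return {s: mu({s}) for s in support}

F = {(0,): 0.25, (1,): 0.75}
mu = tau(F)
assert mu({(0,)}) == 0.25 and mu({(0,), (1,)}) == 1.0
assert tau_inv(mu, F.keys()) == F   # tau^{-1} after tau is the identity on this example
```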
Lemma 5. The mappings τ and τ⁻¹ are monotone linear maps.

Proof. First, we show that τ⁻¹ is linear (and hence τ, due to bijectivity):

    τ⁻¹ (μ + ν) = Σ_{σ∈ℕᵏ} (μ + ν)({σ}) · X^σ
      = Σ_{σ∈ℕᵏ} (μ({σ}) + ν({σ})) · X^σ    (as M forms a vector space with standard +)
      = Σ_{σ∈ℕᵏ} (μ({σ}) · X^σ + ν({σ}) · X^σ)
      = Σ_{σ∈ℕᵏ} μ({σ}) · X^σ + Σ_{σ∈ℕᵏ} ν({σ}) · X^σ
      = τ⁻¹(μ) + τ⁻¹(ν) .

Second, we show that τ is monotone. Assume G_μ ⊑ G_μ′. Then

    τ (G_μ) = τ (Σ_{σ∈ℕᵏ} μ({σ}) · X^σ) = λS. Σ_{σ∈S} μ({σ})
      ≤ λS. Σ_{σ∈S} μ′({σ})    (as μ({σ}) ≤ μ′({σ}) per definition of ⊑)
      = τ (Σ_{σ∈ℕᵏ} μ′({σ}) · X^σ) = τ (G_μ′) .

Third, we show that τ⁻¹ is monotone. Assume μ ⊑ μ′. Then

    τ⁻¹ (μ) = Σ_{σ∈ℕᵏ} μ({σ}) · X^σ
      ⊑ Σ_{σ∈ℕᵏ} μ′({σ}) · X^σ    (as μ({σ}) ≤ μ′({σ}) per definition of ⊑)
      = τ⁻¹ (μ′) .
Lemma 6. Let f : (P, ≤) → (Q, ≤) be a monotone isomorphism for any partially ordered sets P and Q. Then

    f* : Hom(P, P) → Hom(Q, Q) ,    φ ↦ f ∘ φ ∘ f⁻¹

is also a monotone isomorphism.

Proof. Let f be such a monotone isomorphism, and f* the corresponding lifting. First, we note that f* is also bijective; its inverse is given by (f*)⁻¹ = (f⁻¹)*. Second, f* is monotone, as shown in the following calculation:

    φ ≤ ψ    implies    ∀x. φ(x) ≤ ψ(x)
             implies    ∀x. f ∘ φ (f⁻¹ ∘ f (x)) ≤ f ∘ ψ (f⁻¹ ∘ f (x))
             implies    ∀x. f*(φ) (f(x)) ≤ f*(ψ) (f(x))
             implies    ∀y. f*(φ) (y) ≤ f*(ψ) (y)
             implies    f*(φ) ≤ f*(ψ) .
Lemma 7. Let P, Q be complete lattices, and τ a monotone isomorphism. Also let lfp be the least fixed point operator. Then the following diagram commutes:

                    τ*
    Hom(P, P) -------------> Hom(Q, Q)
        |                        |
    lfp |                        | lfp
        v                        v
        P ---------------------> Q
                    τ

Proof. Let φ ∈ Hom(P, P) be arbitrary. Since lfp φ = inf {p | φ(p) = p}, we have

    τ (lfp φ) = τ (inf {p | φ(p) = p})
      = inf {τ(p) | φ(p) = p}    (τ is a monotone isomorphism)
      = inf {τ(p) | φ(τ⁻¹ ∘ τ (p)) = τ⁻¹ ∘ τ (p)}
      = inf {τ(p) | τ ∘ φ (τ⁻¹ ∘ τ (p)) = τ(p)}
      = inf {q | τ ∘ φ (τ⁻¹(q)) = q}
      = inf {q | τ*(φ)(q) = q}
      = lfp τ*(φ) .
Definition 11. Let T be the program translation from pGCL to a modified Kozen syntax, defined inductively:

    T(skip) = skip
    T(xᵢ := E) = xᵢ := f_E(x₁, . . . , xₖ)
    T({P} [p] {Q}) = {T(P)} [p] {T(Q)}
    T(P # Q) = T(P); T(Q)
    T(if (B) {P} else {Q}) = if B then T(P) else T(Q) fi
    T(while (B) {P}) = while B do T(P) od ,

where p is a probability, k = |Var(P)|, B is a Boolean expression, and P, Q are pGCL programs. The extended construct skip as well as {P} [p] {Q} is only syntactic sugar and can be simulated by the original Kozen semantics. The intended semantics of these constructs are

    [skip] = id    and    [{P} [p] {Q}] = p · [T(P)] + (1 − p) · [T(Q)] .
Lemma 8. For all guards B, the following identity holds: e_B ∘ τ = τ ∘ ⟨·⟩_B.

Proof. For all G_μ = Σ_{σ∈ℕᵏ} μ({σ}) · X^σ ∈ FPS:

    e_B ∘ τ (G_μ) = e_B (μ) = λS. μ(S ∩ B) ,

    τ (⟨G_μ⟩_B) = τ (Σ_{σ∈B} μ({σ}) · X^σ + Σ_{σ∉B} 0 · X^σ) = λS. μ(S ∩ B) ,

and hence e_B ∘ τ (G_μ) = τ (⟨G_μ⟩_B) for all G_μ ∈ FPS.
Theorem 5. The FPS semantics of pGCL is an instance of Kozen's semantics, i.e. for all pGCL programs P, we have

    τ ∘ JP K = [T(P)] ∘ τ .

Proof. The proof is done via induction on the program structure. We omit the loop-free cases, as they are straightforward.

By definition, T(while (B) {P}) = while B do T(P) od. Hence, the corresponding Kozen semantics is equal to lfp T_{B,P}, where

    T_{B,P} : (M → M) → (M → M) ,    S ↦ e_B̄ + (S ∘ [T(P)] ∘ e_B) .

First, we show that τ⁻* ∘ T_{B,P} ∘ τ* = Φ_{B,P}, where τ* is the canonical lifting of τ, i.e., τ*(S) = τ ∘ S ∘ τ⁻¹ for all S ∈ (FPS → FPS):

    τ⁻* ∘ T_{B,P} ∘ τ* (S)
      = τ⁻* (T_{B,P} (τ ∘ S ∘ τ⁻¹))    (Definition of τ*)
      = τ⁻* (e_B̄ + τ ∘ S ∘ τ⁻¹ ∘ [T(P)] ∘ e_B)
      = τ⁻¹ ∘ e_B̄ ∘ τ + τ⁻¹ ∘ τ ∘ S ∘ τ⁻¹ ∘ [T(P)] ∘ e_B ∘ τ
      = τ⁻¹ ∘ e_B̄ ∘ τ + S ∘ τ⁻¹ ∘ [T(P)] ∘ e_B ∘ τ
      = τ⁻¹ ∘ τ ∘ ⟨·⟩_B̄ + S ∘ τ⁻¹ ∘ [T(P)] ∘ τ ∘ ⟨·⟩_B    (Lemma 8)
      = ⟨·⟩_B̄ + S ∘ τ⁻¹ ∘ τ ∘ JP K ∘ ⟨·⟩_B    (using I.H. on P)
      = ⟨·⟩_B̄ + S ∘ JP K ∘ ⟨·⟩_B
      = Φ_{B,P}(S) .

Having this equality at hand, we can easily prove the correspondence of our while semantics to the one defined by Kozen in the following manner:

    τ ∘ Jwhile (B) {P}K = [T(while (B) {P})] ∘ τ
      ⟺ τ ∘ lfp Φ_{B,P} = lfp T_{B,P} ∘ τ
      ⟺ lfp Φ_{B,P} = τ⁻¹ ∘ lfp T_{B,P} ∘ τ
      ⟺ lfp Φ_{B,P} = τ⁻* (lfp T_{B,P})
      ⟺ lfp Φ_{B,P} = lfp (τ⁻* ∘ T_{B,P} ∘ τ*)    (cf. Lemma 7)
      ⟺ lfp Φ_{B,P} = lfp Φ_{B,P} .
B Proofs of Section 5

Theorem 6 (Superinvariants and Loop Overapproximations). Let Φ_{B,P} be the unfolding operator of while (B) {P} (cf. Def. 7) and ψ : FPS → FPS. Then

    Φ_{B,P}(ψ) ⊑ ψ    implies    Jwhile (B) {P}K ⊑ ψ .

Proof. Instance of Park's Lemma [?].
Corollary 2. Given a function f : Mon(X) → FPS, let the linear extension f̂ of f be defined by

    f̂ : FPS → FPS ,    F ↦ Σ_{σ∈ℕᵏ} [σ]F · f(X^σ) .

Let Φ_{B,P} be the unfolding operator of while (B) {P}. Then

    ∀σ ∈ ℕᵏ : Φ_{B,P}(f̂)(X^σ) ⊑ f̂(X^σ)    implies    Jwhile (B) {P}K ⊑ f̂ .

Proof. Let G ∈ FPS be arbitrary. Then

    Φ_{B,P}(f̂)(G) = Σ_{σ∈ℕᵏ} [σ]G · Φ_{B,P}(f̂)(X^σ)    (by Theorem 4)
      ⊑ Σ_{σ∈ℕᵏ} [σ]G · f(X^σ)    (by assumption)
      = f̂(G) ,

and hence JW K ⊑ f̂ by Theorem 6.
Proof of Example 6. For the linear extension f̂ of the first invariant we compute

    Φ_{B,P}(f̂)(XⁱCʲ) = ⟨XⁱCʲ⟩_{i=0} + f̂ ((C/2) · ⟨XⁱCʲ⟩_{i>0} · X⁻¹ + (C/2) · ⟨XⁱCʲ⟩_{i>0} · X) .

Case i = 0:    Φ_{B,P}(f̂)(X⁰Cʲ) = Cʲ + f̂(0) = Cʲ = f(X⁰Cʲ) .

Case i > 0:

    Φ_{B,P}(f̂)(XⁱCʲ) = (C/2) · (f̂(Xⁱ⁻¹Cʲ) + f̂(Xⁱ⁺¹Cʲ))
      ⊑ Cʲ · { 1/(1 − C²)  if i is even ;  C/(1 − C²)  if i is odd }
      = f(XⁱCʲ) .

Hence Φ_{B,P}(f̂)(XⁱCʲ) ⊑ f(XⁱCʲ) for all i, j, and thus f̂ is a superinvariant.

For the linear extension ĥ of the second invariant we compute analogously

    Φ_{B,P}(ĥ)(XⁱCʲ) = ⟨XⁱCʲ⟩_{i=0} + ĥ ((C/2) · ⟨XⁱCʲ⟩_{i>0} · X⁻¹ + (C/2) · ⟨XⁱCʲ⟩_{i>0} · X) .

Case i = 0:    Φ_{B,P}(ĥ)(X⁰Cʲ) = Cʲ + ĥ(0) = Cʲ = h(X⁰Cʲ) .

Case i > 0:

    Φ_{B,P}(ĥ)(XⁱCʲ) = (C/2) · (ĥ(Xⁱ⁻¹Cʲ) + ĥ(Xⁱ⁺¹Cʲ))
      = (Cʲ⁺¹/2) · ( ((1 − √(1 − C²))/C)^{i−1} + ((1 − √(1 − C²))/C)^{i+1} )
      = Cʲ · ((1 − √(1 − C²))/C)^{i}
      = h(XⁱCʲ) .

Hence Φ_{B,P}(ĥ)(XⁱCʲ) = h(XⁱCʲ), and thus ĥ is even a fixpoint and in particular a superinvariant.
Verification Python Script

from sympy import *
from sympy.polys.polyerrors import PolificationFailed

init_printing()
x, c = symbols('x, c')
i, j = symbols('i, j', integer=True)

# define the higher-order transformer
def Phi(f):
    return c/2 * (f.subs(i, i - 1) + f.subs(i, i + 1))

def compute_difference(f):
    return (Phi(f) - f).simplify()

# define polynomial verifier
def verify_poly(poly):
    print("Check coefficients for non-positivity:")
    for coeff in poly.coeffs():
        if coeff > 0:
            return False
    return True

# actual verification method
def verify(f):
    print("Check invariant:")
    pprint(f)
    result = compute_difference(f)
    if result.is_zero:
        print("Invariant is a fixpoint!")
        return True
    else:
        print("Invariant is not a fixpoint - check if remainder is Poly")
        try:
            return verify_poly(Poly(result))
        except PolificationFailed:
            print("Invariant is not a Poly!")
            return False
        except:
            print("Unexpected Error")
            raise

# define the loop invariant guess, (i != 0) case
f = c**j * ((c / (1 - c**2)) * (i % 2) + (1 / (1 - c**2)) * ((i + 1) % 2))
# second invariant:
h = c**j * ((1 - sqrt(1 - c**2)) / c)**i

print("Invariant verified" if verify(f) else "Unknown")
print("Invariant verified" if verify(h) else "Unknown")

Program 1.5. Python program checking the invariants
Proof of Example 7. We compute

    Φ_{B,P}(f̂)(Xⁱ) = ⟨Xⁱ⟩_{i=0} + (1/i) · f̂ (⟨Xⁱ⟩_{i>0} · X⁻¹) + (1 − 1/i) · f̂ (⟨Xⁱ⟩_{i>0} · X) .

Case i = 0:

    Φ_{B,P}(f̂)(X⁰) = 1 + ∞ · f̂(0) + (1 − ∞) · f̂(0) = 1 + ∞ · 0 + (−∞) · 0 = 1 = f(X⁰) .

Case i > 0:

    Φ_{B,P}(f̂)(Xⁱ)
      = 0 + (1/i) · f̂(Xⁱ⁻¹) + (1 − 1/i) · f̂(Xⁱ⁺¹)
      = (1/i) · (1 − (1/e) · Σ_{n=0}^{i−3} 1/n!) + (1 − 1/i) · (1 − (1/e) · Σ_{n=0}^{i−1} 1/n!)
      = 1 − (1/e) · Σ_{n=0}^{i−1} 1/n! + (1/(ei)) · (1/(i−2)! + 1/(i−1)!)
      = 1 − (1/e) · Σ_{n=0}^{i−1} 1/n! + 1/(e(i−1)!)
      = 1 − (1/e) · Σ_{n=0}^{i−2} 1/n!
      = f(Xⁱ) .

Hence Φ_{B,P}(f̂)(Xⁱ) = f(Xⁱ), so f̂ is a fixpoint and in particular a superinvariant.
Mathematica input query:

    Input:   (1/k) · (1 − (1/e) · Σ_{n=0}^{k−3} 1/n!) + (1 − 1/k) · (1 − (1/e) · Σ_{n=0}^{k−1} 1/n!) − (1 − (1/e) · Σ_{n=0}^{k−2} 1/n!)
    Output:  0
n-Complete Test Suites for IOCO

Petra van den Bos, Ramon Janssen, and Joshua Moerman
Institute for Computing and Information Sciences,
Radboud University, Nijmegen, The Netherlands
{petra,ramonjanssen,joshua.moerman}@cs.ru.nl

Abstract. An n-complete test suite for automata guarantees to detect all faulty implementations with a bounded number of states. This principle is well-known when testing FSMs for equivalence, but the problem becomes harder for ioco conformance on labeled transition systems. Existing methods restrict the structure of specifications and implementations. We eliminate those restrictions, using only the number of implementation states, and fairness in test execution. We provide a formalization, a construction and a correctness proof for n-complete test suites for ioco.
1 Introduction

The holy grail of model-based testing is a complete test suite: a test suite that can detect any possible faulty implementation. For black-box testing, this is impossible: a tester can only make a finite number of observations, but for an implementation of unknown size, it is unclear when to stop. Often, a so-called n-complete test suite is used to tackle this problem, meaning it is complete for all implementations with at most n states.

For specifications modeled as finite state machines (FSMs) (also called Mealy machines), this has already been investigated extensively. In this paper we will explore how an n-complete test suite can be constructed for suspension automata. We use the ioco relation [11] instead of equivalence of FSMs.

An n-complete test suite for FSM equivalence usually provides some way to reach all states and transitions of the implementation. After reaching some state, it is tested whether this is the correct state, by observing behavior which is unique for that state, and hence distinguishing it from all other states.

Unlike FSM equivalence, ioco is not an equivalence relation, meaning that different implementations may conform to the same specification and, conversely, an implementation may conform to different specifications. In this paper, we focus on the problem of distinguishing states. For ioco, this cannot be done with simple identification. If an implementation state conforms to multiple specification states, those states are defined to be compatible. Incompatible states can be handled in ways comparable to FSM-methods, but distinguishing compatible states requires more effort.

P. van den Bos and R. Janssen: supported by NWO project 13859 (SUMBAT).

© IFIP International Federation for Information Processing 2017
Published by Springer International Publishing AG 2017. All Rights Reserved
N. Yevtushenko et al. (Eds.): ICTSS 2017, LNCS 10533, pp. 91–107, 2017.
DOI: 10.1007/978-3-319-67549-7_6
92
|
||
|
||
P. van den Bos et al.
|
||
|
||
In this paper, we give a structured approach for distinguishing incompatible states. We also propose a strategy to handle compatible states. Obviously, they cannot be distinguished in the sense of incompatible states. We thus change the aim of distinguishing: instead of forcing a non-conformance to either specification state, we may also prove conformance to both. As our only tool in proving this is by further testing, this is a recursive problem: during complete testing, we are required to prove conformance to multiple states by testing. We thus introduce a recursively defined test suite. We give examples where this still gives a finite test suite, together with a completeness proof for this approach. To show an upper bound for the required size of a test suite, we also show that an n-complete test suite with finite size can always be constructed, albeit an inefficient one.
Related Work. Testing methods for Finite State Machines (FSMs) have been analyzed thoroughly, and n-complete test suites have been known for quite a while. A survey is given in [3]. Progress has been made on generalizing these testing methods to nondeterministic FSMs, for example in [6,9]. FSM-based work that more closely resembles ioco is reduction of non-deterministic FSMs [4].

Complete testing in ioco has received less attention than in FSM theory. The original test generation method [11] is an approach in which test cases are generated randomly. The method is complete in the sense that any fault can be found, but there is no upper bound on the required number and length of test cases.

In [8], complete test suites are constructed for Mealy-IOTSes. Mealy-IOTSes are a subclass of suspension automata, but are similar to Mealy machines as (sequences of) outputs are coupled to inputs. This makes the transition from FSM testing more straightforward.

The work most similar to ours [10] works on deterministic labeled transition systems, adding quiescence afterwards, as usual for ioco. Non-deterministic models are thus not considered, and cannot be handled implicitly through determinization, as determinization can only be done after adding quiescence. Some further restrictions are made on the specification domains. In particular, all specification states should be reachable without depending on choices for output transitions of the implementation. Furthermore, all states should be mutually incompatible. In this sense, our test suite construction can be applied to a broader set of systems, but will potentially be much less efficient. Thus, we prioritize exploring the bounds of n-complete test suites for ioco, whereas [10] aims at efficient test suites, by restricting the models which can be handled.
2 Preliminaries
The original ioco theory is defined for labeled transition systems, which may contain internal transitions, be nondeterministic, and may have states without outputs [11]. To every state without outputs, a self-loop with quiescence is added as an artificial output. The resulting labeled transition system is then determinized to create a suspension automaton, which is equivalent to the initial labeled transition system with respect to ioco [13]. In this paper, we will consider a slight generalization of suspension automata, such that our results hold for ioco in general: quiescent transitions usually have some restrictions, but we do not require them and we will treat quiescence as any other output. We will define them in terms of general automata with inputs and outputs.
Definition 1. An I/O-automaton is a tuple (Q, LI , LO , T, q0 ) where

– Q is a finite set of states
– LI is a finite set of input labels
– LO is a finite set of output labels
– T : Q × (LI ∪ LO ) ⇀ Q is the (partial) transition function
– q0 ∈ Q is the initial state
We denote the domain of I/O-automata for LI and LO with A(LI , LO ). For the remainder of this paper we fix LI and LO as disjoint sets of input and output labels respectively, with L = LI ∪ LO , and omit them if clear from the context. Furthermore, we use a, b as input symbols and x, y, z as output symbols.
Definition 2. Let S = (Q, LI , LO , T, q0 ) ∈ A, q ∈ Q, B ⊆ Q, μ ∈ L and σ ∈ L∗ . Then we define:

q after μ = ∅ if T (q, μ) = ⊥, and q after μ = {T (q, μ)} otherwise
B after μ = ⋃_{q′∈B} (q′ after μ)
q after ε = {q}
q after μσ = (q after μ) after σ
B after σ = ⋃_{q′∈B} (q′ after σ)
out(B) = {x ∈ LO | B after x ≠ ∅}
in(B) = {a ∈ LI | B after a ≠ ∅}
init(B) = in(B) ∪ out(B)
Straces(B) = {σ ∈ L∗ | B after σ ≠ ∅}

S is output-enabled if ∀p ∈ Q : out(p) ≠ ∅. SA = {S ∈ A | S is output-enabled}.
S is input-enabled if ∀p ∈ Q : in(p) = LI . SAIE = {S ∈ SA | S is input-enabled}.
We interchange singleton sets with their elements, e.g. we write out(q) instead of out({q}). Definitions on states will sometimes be used for automata as well, acting on their initial states. Similarly, definitions on automata will be used for states, acting on the automaton with that state as its initial state. For example, for S = (Q, LI , LO , T, q0 ) ∈ A and q ∈ Q, we may write S after μ instead of q0 after μ, and we may write that q is input-enabled if S is input-enabled.
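The operators of Definition 2 are straightforward to compute on a finite automaton. The following sketch uses our own encoding (a partial transition map T; the tiny example automaton and all names are illustrative, not from the paper) and restricts Straces to a length bound, since the full set is infinite.

```python
from itertools import product

# Hypothetical encoding of an I/O-automaton (Definition 1):
# a partial transition map T: (state, label) -> state.
LI, LO = {"a"}, {"x", "y"}
T = {(1, "a"): 2, (1, "x"): 1, (2, "y"): 1}

def after(B, sigma):
    """'B after sigma' lifted to sets of states (Definition 2)."""
    for mu in sigma:
        B = {T[(q, mu)] for q in B if (q, mu) in T}
    return B

def out(B):
    return {x for x in LO if after(B, [x])}

def inp(B):  # 'in' is a Python keyword, hence the name
    return {a for a in LI if after(B, [a])}

def straces_upto(q0, n):
    """Straces restricted to traces of length <= n."""
    L = LI | LO
    return [list(s) for k in range(n + 1)
            for s in product(L, repeat=k) if after({q0}, s)]
```

For the toy automaton above, after({1}, ["a", "y"]) returns {1} and out({1}) returns {"x"}.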
In this paper, specifications are suspension automata in SA, and implementations are input-enabled suspension automata in SAIE . The ioco relation formalizes when implementations conform to specifications. We give a definition relating suspension automata, following [11], and the coinductive definition [7] relating states. Both definitions have been proven to coincide.

Definition 3. Let S ∈ SA, and I ∈ SAIE . Then we say that I ioco S if ∀σ ∈ Straces(S) : out(I after σ) ⊆ out(S after σ).
Definition 4. Let S = (Qs , LI , LO , Ts , q0s ) ∈ SA, and I = (Qi , LI , LO , Ti , q0i ) ∈ SAIE . Then for qi ∈ Qi , qs ∈ Qs , we say that qi ioco qs if there exists a coinductive ioco relation R ⊆ Qi × Qs such that (qi , qs ) ∈ R, and ∀(q, p) ∈ R:

– ∀a ∈ in(p) : (q after a, p after a) ∈ R
– ∀x ∈ out(q) : x ∈ out(p) ∧ (q after x, p after x) ∈ R
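For finite deterministic automata, the greatest relation satisfying Definition 4 can be computed by fixpoint refinement: start from all state pairs and repeatedly delete pairs that violate one of the two clauses. The sketch below uses our own encoding, and the two example automata are hypothetical.

```python
LI, LO = {"a"}, {"x", "y"}

def ioco(Ti, qi0, Ts, qs0):
    """Coinductive ioco check (Definition 4); Ti must be input-enabled."""
    Qi = {q for q, _ in Ti} | set(Ti.values())
    Qs = {q for q, _ in Ts} | set(Ts.values())
    def en(T, q, Ls): return {m for m in Ls if (q, m) in T}
    R = {(q, p) for q in Qi for p in Qs}
    changed = True
    while changed:  # refine towards the largest ioco relation
        changed = False
        for (q, p) in list(R):
            ok = all((Ti[(q, a)], Ts[(p, a)]) in R for a in en(Ts, p, LI)) \
                and all(x in en(Ts, p, LO) and (Ti[(q, x)], Ts[(p, x)]) in R
                        for x in en(Ti, q, LO))
            if not ok:
                R.discard((q, p))
                changed = True
    return (qi0, qs0) in R

# Specification: 1 --x--> 1, 1 --a--> 2, 2 --y--> 2.
Ts = {(1, "x"): 1, (1, "a"): 2, (2, "y"): 2}
# An input-enabled implementation that conforms to it.
Ti = {(1, "x"): 1, (1, "a"): 2, (2, "y"): 2, (2, "a"): 2}
```

Here ioco(Ti, 1, Ts, 1) evaluates to True; replacing the implementation's x-loop in state 1 by a y-loop makes it False.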
In order to define complete test suites, we require execution of tests to be fair: if a trace σ is performed often enough, then every output x appearing in the implementation after σ will eventually be observed. Furthermore, the implementation may give an output after σ before the tester can supply an input. We then assume that the tester will eventually succeed in performing this input after σ. This fairness assumption is unavoidable for any notion of completeness in testing suspension automata: a fault can never be detected if an implementation always chooses paths that avoid this fault.
3 Distinguishing Experiments
An important part of n-complete test suites for FSM equivalence is the distinguishing sequence, used to identify an implementation state. As ioco is not an equivalence relation, there does not have to be a one-to-one correspondence between specification and implementation states.

3.1 Equivalence and Compatibility
We first describe equivalence and compatibility relations between states, in order to define distinguishing experiments. We consider two specifications to be equivalent, denoted S1 ≈ S2 , if they have the same implementations conforming to them. Then, for all implementations I, we have I ioco S1 iff I ioco S2 . For two inequivalent specifications, there is thus an implementation which conforms to one, but not the other.

Intuitively, equivalence relates states with the same traces. However, implicit underspecification by absent inputs should be handled equivalently to explicit underspecification with chaos. This is done by using chaotic completion [11]. This definition of equivalence is inspired by the relation wioco [12], which relates specifications based on their sets of traces.
Definition 5. Let (Q, LI , LO , T, q0 ) ∈ SA. Define chaos, a specification to which every implementation conforms, as X = ({χ}, LI , LO , {(χ, x, χ) | x ∈ L}, χ). Let QX = Q ∪ {χ}. The relation ≈ ⊆ QX × QX relates all equivalent states. It is the largest relation for which it holds that q ≈ q′ if:

out(q) = out(q′ ) ∧ (∀μ ∈ init(q) ∩ init(q′ ) : q after μ ≈ q′ after μ)
∧ (∀a ∈ in(q)\in(q′ ) : q after a ≈ χ) ∧ (∀a ∈ in(q′ )\in(q) : q′ after a ≈ χ)
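Definition 5 can likewise be computed by fixpoint refinement after adjoining the chaos state. A sketch under our own encoding; the three-state example automaton is hypothetical.

```python
def equiv_pairs(T, LI, LO, Q):
    """Largest relation satisfying Definition 5, over Q extended with chaos."""
    CHI = "chi"
    T = dict(T)
    for m in LI | LO:
        T[(CHI, m)] = CHI          # chaos loops on every label
    QX = set(Q) | {CHI}
    def en(q, Ls): return {m for m in Ls if (q, m) in T}
    R = {(q, p) for q in QX for p in QX}
    changed = True
    while changed:
        changed = False
        for (q, p) in list(R):
            shared = (en(q, LI) | en(q, LO)) & (en(p, LI) | en(p, LO))
            ok = en(q, LO) == en(p, LO) \
                and all((T[(q, m)], T[(p, m)]) in R for m in shared) \
                and all((T[(q, a)], CHI) in R for a in en(q, LI) - en(p, LI)) \
                and all((T[(p, a)], CHI) in R for a in en(p, LI) - en(q, LI))
            if not ok:
                R.discard((q, p))
                changed = True
    return R

# Two x-looping states are equivalent; a y-looping state is not.
R = equiv_pairs({(1, "x"): 1, (2, "x"): 2, (3, "y"): 3},
                set(), {"x", "y"}, {1, 2, 3})
```

In this run, (1, 2) ends up in R while (1, 3) does not, since states 1 and 3 already disagree on their enabled outputs.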
For two inequivalent specifications, there may still exist an implementation that conforms to the two. In that case, we define the specifications to be compatible, following the terminology introduced in [9,10]. We introduce an explicit relation for compatibility.
Definition 6. Let (Q, LI , LO , T, q0 ) ∈ SA. The relation ♦ ⊆ Q × Q relates all compatible states. It is the largest relation for which it holds that q ♦ q′ if:

(∀a ∈ in(q) ∩ in(q′ ) : q after a ♦ q′ after a)
∧ (∃x ∈ out(q) ∩ out(q′ ) : q after x ♦ q′ after x)
Compatibility is symmetric and reflexive, but not transitive. Conversely, two specifications are incompatible if there exists no implementation conforming to both. When q1 ♦ q2 , we can indeed easily make an implementation which conforms to both q1 and q2 : the set of outputs of the implementation state can simply be out(q1 ) ∩ out(q2 ), which is non-empty by definition of ♦. Upon such an output transition or any input transition, the two successor states are again compatible, thus the implementation can keep picking transitions in this manner. For example, in Fig. 1, compatible states 2 and 3 of the specification are both implemented by state 2 of the implementation.
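The largest relation of Definition 6 can be computed the same way as the equivalence relation: delete pairs until the two clauses hold everywhere. A sketch with a hypothetical three-state example of our own.

```python
def compatible_pairs(T, LI, LO):
    """Largest relation satisfying Definition 6 (the compatibility relation)."""
    Q = {q for q, _ in T} | set(T.values())
    def en(q, Ls): return {m for m in Ls if (q, m) in T}
    R = {(q, p) for q in Q for p in Q}
    changed = True
    while changed:
        changed = False
        for (q, p) in list(R):
            ok = all((T[(q, a)], T[(p, a)]) in R
                     for a in en(q, LI) & en(p, LI)) \
                and any((T[(q, x)], T[(p, x)]) in R
                        for x in en(q, LO) & en(p, LO))
            if not ok:
                R.discard((q, p))
                changed = True
    return R

# States 1 and 2 share output x (compatible); state 3 only offers y.
R = compatible_pairs({(1, "x"): 1, (2, "x"): 2, (3, "y"): 3},
                     set(), {"x", "y"})
```

Here (1, 2) is in R but (1, 3) is not: no implementation state can offer a non-empty subset of both out sets.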
Fig. 1. A specification, an implementation, and a merge of two states. (a) Specification S with 2 ♦ 3. (b) An implementation of S. (c) The merge of specification states 2 and 3.
Beneš et al. [1] describe the construction of merging specifications. For specification states qs and qs′ , their merge is denoted qs ∧ qs′ . For any implementation state qi , it holds that qi ioco qs ∧ qi ioco qs′ ⇐⇒ qi ioco (qs ∧ qs′ ). Intuitively, a merge of two states thus only allows behavior allowed by both states. Figure 1c shows the merge of specification states 2 and 3. The merge of qs and qs′ can be implemented if and only if qs ♦ qs′ : indeed, for incompatible states, the merge has states without any output transitions, which is denoted invalid in [1].
3.2 Distinguishing Trees
When an implementation is in state qi , two incompatible specification states qs and qs′ are distinguished by showing to which of the two qi conforms, assuming that it conforms to one. Conversely, we can say that we have to show a non-conformance of qi to qs or qs′ . Generally, a set of states D is distinguished by showing non-conformance to all its states, possibly except one. As a base case, if |D| ≤ 1, then D is already distinguished. We will construct a distinguishing tree as an input-enabled automaton which distinguishes D after reaching pass.
Definition 7. Let μ be a symbol and D a set of states. Then injective(μ, D) if μ ∈ ⋃{in(q) | q ∈ D} ∪ LO ∧ ∀q, q′ ∈ D : q ≠ q′ ∧ μ ∈ init(q) ∩ init(q′ ) =⇒ q after μ ≠ q′ after μ. This is extended to sets of symbols Σ as injective(Σ, D) if ∀μ ∈ Σ : injective(μ, D).
Definition 8. Let (Q, LI , LO , T, q0 ) ∈ SA(LI , LO ), and D ⊆ Q a set of mutually incompatible states. Then define DT (LI , LO , D) ⊆ A(LO , LI ) inductively as the domain of input-enabled distinguishing trees for D, such that for every Y ∈ DT (LI , LO , D) with initial state t0 :

– if |D| ≤ 1, then t0 is the verdict state pass, and
– if |D| > 1, then t0 has either
  • a transition for a single input a ∈ LI to a Y′ ∈ DT (LI , LO , D after a) such that injective(a, D), and transitions to a verdict state reset for all x ∈ LO , or
  • a transition for every output x ∈ LO to a Y′ ∈ DT (LI , LO , D after x) such that injective(x, D).

Furthermore, pass or reset is always reached after a finite number of steps, and these states are sink states, i.e. contain transitions only to themselves.
A distinguishing tree can synchronize with an implementation to reach a verdict state. As an implementation is output-enabled and the distinguishing tree is input-enabled, this never blocks. If the tree performs an input, the implementation may provide an output first, resulting in reset: another attempt is needed to perform the input. If no input is performed by the tree, it waits for any output, after which it can continue. In this way, the tester is guaranteed to steer the implementation to a pass, where the specification states disagree on the allowed outputs: the implementation has to choose an output, thus has to choose which specifications (not) to implement.
For a set D of mutually incompatible states, such a tree may not exist. For example, consider states 1, 3 and 5 in Fig. 2. States 1 and 3 both lead to the same state after a, and can therefore not be distinguished. Similarly, states 3 and 5 cannot be distinguished after b. Labels a and b are therefore not injective according to Definition 7 and should not be used. This concept is similar in FSM testing [5]. A distinguishing tree always exists when |D| = 2. When |D| > 2, we can thus use multiple experiments to separate all states pairwise.
Lemma 9. Let S ∈ SA. Let q and q′ be two states of S, such that ¬(q ♦ q′ ). Then there exists a distinguishing tree for q and q′ .
Proof. Since ¬(q ♦ q′ ), we know that:

(∃a ∈ in(q) ∩ in(q′ ) : ¬(q after a ♦ q′ after a))
∨ (∀x ∈ out(q) ∩ out(q′ ) : ¬(q after x ♦ q′ after x))

So we have that some input or all outputs, enabled in both q and q′ , lead to incompatible states, for which this holds again. Hence, we can construct a tree with nodes that either have a child for an enabled input of both states, or children for all outputs enabled in the states (children for not enabled outputs are distinguishing trees for ∅), as in the second case of Definition 8. If this tree were infinite, it would describe infinite sequences of labels. Since S is finite, such a sequence would be a cycle in S. This would mean that q ♦ q′ , which is not the case. Hence we have that the tree is finite, as required by Definition 8.
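The construction in the proof of Lemma 9 can be phrased as a backtracking search over the two options of Definition 8: at each node try one injective input, or branch on all outputs when every output is injective. The sketch below is our own; the explicit depth bound only serves to make the search terminate, and the example states are illustrative.

```python
def dtree(T, LI, LO, D, depth):
    """Search for a distinguishing tree for the state set D (Definition 8).
    Returns 'pass', a nested dict, or None if none is found within depth."""
    D = frozenset(D)
    if len(D) <= 1:
        return "pass"
    if depth == 0:
        return None
    def after(D, m):
        return frozenset(T[(s, m)] for s in D if (s, m) in T)
    def injective(m, D):  # Definition 7, restricted to this node
        src = [s for s in D if (s, m) in T]
        return len(src) == len({T[(s, m)] for s in src})
    for a in LI:  # option 1: one injective input (outputs would reset)
        if any((s, a) in T for s in D) and injective(a, D):
            sub = dtree(T, LI, LO, after(D, a), depth - 1)
            if sub is not None:
                return {"input": a, "then": sub}
    if all(injective(x, LO) for x in []) or all(injective(x, D) for x in LO):
        kids = {x: dtree(T, LI, LO, after(D, x), depth - 1) for x in LO}
        if all(v is not None for v in kids.values()):  # option 2: outputs
            return {"output": kids}
    return None

# Two states that disagree on their only output are told apart at once.
tree = dtree({(1, "x"): 1, (2, "y"): 2}, set(), {"x", "y"}, {1, 2}, 3)
```

Here tree is {"output": {"x": "pass", "y": "pass"}}: whichever output the implementation chooses, one of the two candidate states is eliminated.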
Fig. 2. No distinguishing tree exists for {1,3,5}.
3.3 Distinguishing Compatible States
Distinguishing techniques such as described in Sect. 3.2 rely on incompatibility of two specifications, by steering the implementation to a point where the specifications disagree on the allowed outputs. This technique fails for compatible specifications, as an implementation state may conform to both specifications. A tester thus cannot steer the implementation to showing a non-conformance to either.
We thus extend the aim of a distinguishing experiment: instead of showing a non-conformance to any of two compatible states qs and qs′ , we may also prove conformance to both. This can be achieved with an n-complete test suite for qs ∧ qs′ ; this will be explained in Sect. 4.1. Note that even for an implementation which does not conform to one of the specifications, n-complete testing is needed. Such an implementation may be distinguished, but it is unknown how, due to compatibility. See for example the specification and implementation of Fig. 1. State 2 of the implementation can only be distinguished from state 3 by observing ax, which is non-conforming behavior for state 2. Although y would also be non-conforming for state 2, this behavior is not observed.
In case a non-conformance to the merged specification is found with an n-complete test suite, the outcome is similar to that of a distinguishing tree for incompatible states: we have disproven conformance to one of the individual specifications (or to both).
4 Test Suite Definition
The number n of an n-complete test suite T of a specification S tells how many states an implementation I is allowed to have to give the guarantee that I ioco S after passing T (we will define passing a test suite later). To do this, we must only count the states relevant for conformance.
Definition 10. Let S = (Qs , LI , LO , T, q0s ) ∈ SA, and I = (Qi , LI , LO , Ti , q0i ) ∈ SAIE . Then,

– A state qs ∈ Qs is reachable if ∃σ ∈ L∗ : S after σ = qs .
– A state qi ∈ Qi is specified if ∃σ ∈ Straces(S) : I after σ = qi . A transition (qi , μ, qi′ ) ∈ Ti is specified if qi is specified, and if either μ ∈ LO , or μ ∈ LI ∧ ∃σ ∈ L∗ : I after σ = qi ∧ σμ ∈ Straces(S).
– We denote the number of reachable states of S with |S|, and the number of specified, reachable states of I with |I|.
Definition 11. Let S ∈ SA be a specification. Then a test suite T for S is n-complete if for each implementation I: I passes T =⇒ (I ioco S ∨ |I| > n).

In particular, |S|-complete means that if an implementation passes the test suite, then the implementation is correct (w.r.t. ioco) or it has strictly more states than the specification. Some authors use the convention that n denotes the number of extra states (so the above would be called 0-completeness).
To define a full complete test suite, we first define sets of distinguishing experiments.

Definition 12. Let (Q, LI , LO , T, q0 ) ∈ SA. For any state q ∈ Q, we choose a set W (q) of distinguishing experiments, such that for all q′ ∈ Q with q′ ≠ q:

– if ¬(q ♦ q′ ), then W (q) contains a distinguishing tree for D ⊆ Q, s.t. q, q′ ∈ D.
– if q ♦ q′ , then W (q) contains a complete test suite for q ∧ q′ .
Moreover, we need sequences to access all specified, reachable implementation states. After such sequences distinguishing experiments can be executed. We will defer the explicit construction of the set of access sequences. For now we assume some set P of access sequences to exist.

Definition 13. Let S ∈ SA and I ∈ SAIE . Let P be a set of access sequences and let P + = {σ ∈ P ∪ P · L | S after σ ≠ ∅}. Then the distinguishing test suite is defined as T = {στ | σ ∈ P + , τ ∈ W (q0 after σ)}. An element t ∈ T is a test.
4.1 Distinguishing Experiments for Compatible States
The distinguishing test suite relies on executing distinguishing experiments. If a specification contains compatible states, the test suite contains distinguishing experiments which are themselves n-complete test suites. This is thus a recursive construction: we need to show that such a test suite is finite. For particular specifications, recursive repetition of the distinguishing test suite as described above is already finite. For example, specification S in Fig. 1 contains compatible states, but in the merge of every two compatible states, no further compatible states remain. A test suite for S needs to distinguish states 2 and 3. For this purpose, it uses an n-complete test suite for 2 ∧ 3, which contains no compatible states, and thus terminates by only containing distinguishing trees.

However, the merge of two compatible states may in general again contain compatible states. In these cases, recursive repetition of distinguishing test suites may not terminate. An alternative unconditional n-complete test suite may be constructed using state counting methods [4], as shown in the next section. Although inefficient, it shows the possibility of unconditional termination. The recursive strategy thus may serve as a starting point for other, efficient constructions for n-complete test suites.
Unconditional n-complete Test Suites. We introduce Lemma 16 to bound test suite execution. We first give some auxiliary definitions.
Definition 14. Let S ∈ SA, σ ∈ L∗ , and x ∈ LO . Then σx is an ioco-counterexample if S after σ ≠ ∅ and x ∉ out(S after σ).

Naturally, I ioco S if and only if Straces(I) contains no ioco-counterexample.
Definition 15. Let S = (Qs , LI , LO , Ts , qs0 ) ∈ SA and I ∈ SAIE . A trace σ ∈ Straces(S) is short if ∀qs ∈ Qs : |{ρ | ρ is a prefix of σ ∧ qs0 after ρ = qs }| ≤ |I|.

Lemma 16. Let S ∈ SA and I ∈ SAIE . If ¬(I ioco S), then Straces(I) contains a short ioco-counterexample.
Proof. If ¬(I ioco S), then Straces(I) must contain an ioco-counterexample σ. If σ is short, the proof is trivial, so assume it is not. Hence, there exists a state qs with at least |I| + 1 prefixes of σ leading to qs . At least two of those prefixes ρ and ρ′ must lead to the same implementation state, i.e. it holds that qi0 after ρ = qi0 after ρ′ and qs0 after ρ = qs0 after ρ′ . Assuming |ρ| < |ρ′ | without loss of generality, we can thus create an ioco-counterexample σ′ shorter than σ by replacing ρ′ by ρ. If σ′ is still not short, we can repeat this process until it is.
We can use Lemma 16 to bound exhaustive testing to obtain n-completeness. When any specification state is visited |I| + 1 times by a trace, then any extension of this trace will not be short, and we do not need to test it. Fairness allows us to test all short traces which are present in the implementation.

Corollary 17. Given a specification S, the set of all traces of length at most |S| · n is an n-complete test suite.

Example 18. Figure 3 shows an example of a non-conforming implementation with a counterexample yyxyyxyyxyyx, of maximal length 4 · 3 = 12.
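Corollary 17 amounts to a bounded exhaustive exploration: follow every trace specified in both automata up to length |S| · n and report an output the implementation can produce that the specification forbids (Definition 14). A sketch with hypothetical deterministic automata of our own, not the automata of Fig. 3.

```python
def find_counterexample(Ts, qs0, Ti, qi0, LI, LO, n):
    """Search traces of length <= |S|*n for an ioco-counterexample."""
    bound = len({q for q, _ in Ts} | set(Ts.values())) * n
    frontier = [([], qs0, qi0)]
    while frontier:
        sigma, qs, qi = frontier.pop()
        for x in LO:  # an implementation output the spec does not allow?
            if (qi, x) in Ti and (qs, x) not in Ts:
                return sigma + [x]
        if len(sigma) < bound:
            for m in LI | LO:  # follow behaviour specified in both
                if (qs, m) in Ts and (qi, m) in Ti:
                    frontier.append((sigma + [m], Ts[(qs, m)], Ti[(qi, m)]))
    return None

Ts = {(1, "x"): 1}                 # the spec allows only x
Ti = {(1, "x"): 1, (1, "y"): 1}    # a faulty implementation also offers y
```

Here find_counterexample(Ts, 1, Ti, 1, set(), {"x", "y"}, 1) returns ["y"], while running it with a conforming implementation yields None.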
4.2 Execution of Test Suites
Fig. 3. A specification, and a non-conforming implementation. (a) Specification S. (b) Implementation I.

A test στ is executed by following σ, and then executing the distinguishing experiment τ . If the implementation chooses any output deviating from σ, then the test gives a reset and should be reattempted. Finishing τ may take several executions: a distinguishing tree may give a reset, and an n-complete test suite to distinguish compatible states may contain multiple tests. Therefore σ needs to be run multiple times, in order to allow full execution of the distinguishing experiment. By assuming fairness, every distinguishing experiment is guaranteed to terminate, and thus also every test.
The verdict of a test suite T for specification S is concluded simply by checking for observed ioco-counterexamples to S during execution. When executing a distinguishing experiment w as part of T, the verdict of w is ignored when concluding a verdict for T: we only require w to be fully executed, i.e. be reattempted if it gives a reset, until it gives a pass or fail. For example, if σ leads to specification state q, and q needs to be distinguished from compatible state q′ , a test suite T′ for q ∧ q′ is needed to distinguish q and q′ . If T′ finds a non-conformance to either q or q′ , it yields fail. Only in the former case, T will also yield fail; in the latter case, T will continue with other tests: q and q′ have been successfully distinguished, but no non-conformance to q has been found. If all tests have been executed in this manner, T will conclude pass.
4.3 Access Sequences
In FSM-based testing, the set P for reaching all implementation states is taken care of rather efficiently. The set P is constructed by choosing a word σ for each specification state, such that σ leads to that state (note that FSMs are fully deterministic). By passing the tests P · W , where W is a set of distinguishing experiments for every reached state, we know the implementation has at least some number of states (by observing that many different behaviors). By passing tests P · L · W we also verify that every transition has the correct destination state. By extending these tests to P · L≤k+1 · W (where L≤k+1 = ⋃_{m∈{0,...,k+1}} Lm ), we can reach all implementation states if the implementation has at most k more states than the specification. For suspension automata, however, things are more difficult for two reasons: (1) a specification state may be reachable only if an implementation chooses to implement a particular, optional output transition (in which case this state is not certainly reachable [10]), and (2) if the specification has compatible states, the implementation may implement two specification states with a single implementation state.
Consider Fig. 4 for an example. An implementation can omit state 2 of the specification, as shown in Fig. 4b. Now Fig. 4c shows a fault not found by a test suite P · L≤1 · W : if we take y ∈ P , z ∈ L, and observe z ∈ W (3), we do not reach the faulty y transition in the implementation. So by leaving out states, we introduce an opportunity to make a fault without needing more states than the specification. This means that we may need to increase the size of the test suite in order to obtain the desired completeness. In this example, however, a test suite P · L≤2 · W is enough, as the test suite will contain a test with yzz ∈ P · L2 after which the faulty output y ∈ W (3) will be observed.
Fig. 4. A specification with not certainly reachable states 2 and 3. (a) Specification S. (b) Conforming implementation. (c) Non-conforming implementation.
Clearly, we reach all states in an n-state implementation for any specification S, by taking P to be all traces in Straces(S) of length at most n. This set P can be constructed by simple enumeration. We then have that the traces in the set P will reach all specified, reachable states in all implementations I such that |I| ≤ n. In particular this means that P + reaches all specified transitions. Although this generates exponentially many sequences, their length is substantially shorter than that of the sequences obtained by the unconditional n-complete test suite. We conjecture that a much more efficient construction is possible with a careful analysis of the compatible states and the not certainly reachable states.
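The brute-force P described above is a levelwise (breadth-first) enumeration of Straces(S) up to length n. A short sketch under our own encoding, with a hypothetical two-state specification.

```python
def access_sequences(Ts, q0, L, n):
    """All traces of S of length at most n, by levelwise enumeration."""
    P, frontier = [[]], [([], q0)]
    for _ in range(n):
        nxt = [(sigma + [m], Ts[(q, m)])
               for sigma, q in frontier for m in L if (q, m) in Ts]
        P += [sigma for sigma, _ in nxt]
        frontier = nxt
    return P

# Toy spec 1 --x--> 2 --y--> 1: three traces of length at most 2.
P = access_sequences({(1, "x"): 2, (2, "y"): 1}, 1, ["x", "y"], 2)
```

For this two-state loop, P is [[], ["x"], ["x", "y"]]; the exponential blow-up only shows on specifications with real branching.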
4.4 Completeness Proof for Distinguishing Test Suites
We let T be the distinguishing test suite as defined in Definition 13. As discussed before, if q and q′ are compatible, the set W (q) can be defined using another complete test suite. If that test suite is again a distinguishing test suite, its completeness is an induction hypothesis. If, on the other hand, the unconditional n-complete test suite is used, completeness is already guaranteed (Corollary 17).

Theorem 19. Let S = (Qs , LI , LO , Ts , q0s ) ∈ SA be a specification. Let T be a distinguishing test suite for S. Then T is n-complete.

Proof. We will show that for any implementation of the correct size which passes the test suite, we can build a coinductive ioco relation which contains the initial states. As a basis for that relation we take the states which are reached by the set P . This may not be an ioco relation, but by extending it (in two steps) we obtain a full ioco relation. Extending the relation is an instance of a so-called up-to technique; we will use terminology from [2].
More precisely, Let I = (Qi , LI , LO , Ti , q0i ) ∈ SAIE be an implementation
|
||
with |I| ≤ n which passes T. By construction of P , all reachable specified implementation states are reached by P and so all specified transitions are reached
|
||
by P + .
|
||
The set P defines a subset of Qi × Qs , namely R = {(q0i after σ, q0s
|
||
after σ) | σ ∈ P }. We add relations for all equivalent states: R = {(i, s) |
|
||
(i, s ) ∈ R, s ∈ Qs , s ≈ s }. Furthermore, let J = {(i, s, s ) | i ∈ Qi , s, s ∈
|
||
Qs such that i ioco s ∧ i ioco
|
||
s } and Ri,s,s be the ioco relation for i ioco s ∧
|
||
|
||
|
||
i ioco s , now define R = R ∪ (i,s,s )∈J Ri,s,s . We want to show that R defines
|
||
a coinductive ioco relation. We do this by showing that R progresses to R.
|
||
Let (i, s) ∈ R. We assume that we have seen all of out(i) and that
|
||
out(i) ⊆ out(s) (this is taken care of by the test suite and the fairness assumption). Then, because we use P⁺, we also reach the transitions after i. We need to show that the input and output successors are again related.
– Let a ∈ LI. Since I is input-enabled we have a transition for a with i after a = i2. Suppose there is a transition for a from s: s after a = s2 (if not, then we're done). We have to show that (i2, s2) ∈ R.
– Let x ∈ LO. Suppose there is a transition for x: i after x = i2. Then (since out(i) ⊆ out(s)) there is a transition for x from s: s after x = s2. We have to show that (i2, s2) ∈ R.
In both cases we have a successor (i2, s2) which we have to prove to be in R. Now since P reaches all states of I, we know that (i2, s′2) ∈ R′ for some s′2. If s2 ≈ s′2 then (i2, s2) ∈ R′ ⊆ R holds trivially, so suppose that s2 ≉ s′2. Then there exists a distinguishing experiment w ∈ W(s2) ∩ W(s′2) which has been executed in i2, namely in two tests: a test σw for some σ ∈ P⁺ with S after σ = s2, and a test σ′w for some σ′ ∈ P with S after σ′ = s′2. Then there are two cases:
– If s2 and s′2 are incompatible, then w is a distinguishing tree separating s2 and s′2. Then there is a sequence ρ taken in w of the test σw, i.e. w after ρ reaches a pass state of w, and similarly there is a sequence ρ′ that is taken in w of the test σ′w. By construction of distinguishing trees, ρ must be an ioco-counterexample for either s2 or s′2, but because T passed this must be s′2. Similarly, ρ′ disproves s2. One implementation state can implement at most one of {ρ, ρ′}. This contradicts that the two tests passed, so this case cannot happen.
– If s2 ♦ s′2 (but s2 ≉ s′2 as assumed above), then w is a test suite itself for s2 ∧ s′2. If w passed in both tests then i2 ioco s2 and i2 ioco s′2, and hence (i2, s2) ∈ Ri,s2,s′2 ⊆ R. If w failed in one of the tests σw or σ′w, then i2 does not conform to both s2 and s′2, and hence w also fails in the other test. So again, there is a counterexample ρ for s2 and ρ′ for s′2. One implementation state can implement at most one of {ρ, ρ′}. This contradicts that the two tests passed, so this case cannot happen.
We have now seen that R′ progresses to R; the remaining parts of R progress to R as well: since each Ri,s,s′ is an ioco relation, they progress to Ri,s,s′ ⊆ R. And so the union, R, progresses to R, meaning that R is a coinductive ioco relation. Furthermore, we have (i0, s0) ∈ R (because ε ∈ P), concluding the proof.
n-Complete Test Suites for IOCO
We remark that if the specification does not contain any compatible states, the proof can be simplified a lot. In particular, we do not need n-complete test suites for merges of states, and we can use the relation R′ instead of R.
5   Constructing Distinguishing Trees
Lee and Yannakakis proposed an algorithm for constructing adaptive distinguishing sequences for FSMs [5]. With a partition refinement algorithm, a splitting tree is built, from which the actual distinguishing sequence is extracted.
A splitting tree is a tree of which each node is identified with a subset of the states of the specification. The set of states of a child node is a (strict) subset of the states of its parent node. In contrast to splitting trees for FSMs, siblings may overlap: the tree does not describe a partition refinement. We define leaves(Y) as the set of leaves of a tree Y. The algorithm will split the leaf nodes, i.e. assign children to every leaf node. If all leaves are identified with a singleton set of states, we can distinguish all states of the root node.
Additionally, every non-leaf node is associated with a set of labels from L. We denote the labels of node D with labels(D). The distinguishing tree that is going to be constructed from the splitting tree is built up from these labels. As argued in Sect. 3.2, we require injective distinguishing trees, thus our splitting trees only contain injective labels, i.e. injective(labels(D), D) for all non-leaf nodes D.
Below we list three conditions that describe when it is possible to split the states of a leaf D, i.e. by taking some transition, we are able to distinguish some states from the other states of D. We will see later how a split is done. If the first condition is true, at least one state is immediately distinguished from all other states. The other two conditions describe that a leaf D can be split if after an input or all outputs some node D′ is reached that already is split, i.e. D′ is a non-leaf node. Consequently, a split for condition 1 should be done whenever possible, and otherwise a split for condition 2 or 3 can be done. Depending on the implementation one is testing, one may prefer splitting with either condition 2 or 3, when both conditions are true.
We present each condition by first giving an intuitive description in words, and then a more formal definition. With Π(A) we denote the set of all non-trivial partitions of a set of states A.
Definition 20. A leaf D of tree Y can be split if one of the following conditions holds:
1. All outputs are enabled in some but not in all states.
   ∀x ∈ out(D) : injective(x, D) ∧ ∃d ∈ D : d after x = ∅
2. Some states reach different leaves than other states for all outputs.
   ∀x ∈ out(D) : injective(x, D) ∧ ∃P ∈ Π(D), ∀d, d′ ∈ P :
      (d ≠ d′ =⇒ ∀l ∈ leaves(Y) : l ∩ d after x = ∅ ∨ l ∩ d′ after x = ∅)
3. Some states reach different leaves than other states for some input.
   ∃a ∈ in(D) : injective(a, D) ∧ ∃P ∈ Π(D), ∀d, d′ ∈ P :
      (d ≠ d′ =⇒ ∀l ∈ leaves(Y) : l ∩ d after a = ∅ ∨ l ∩ d′ after a = ∅)

P. van den Bos et al.
Algorithm 1 shows how to split a single leaf of the splitting tree (we chose arbitrarily to give condition 2 a preference over condition 3). A splitting tree is constructed in the following manner. Initially, a splitting tree is a single leaf node containing the state set of the specification. Then, the full splitting tree is constructed by splitting leaf nodes with Algorithm 1 until no further splits can be made. If all leaves in the resulting splitting tree are singletons, the splitting tree is complete and a distinguishing tree can be constructed (described in the next section). Otherwise, no distinguishing tree exists. Note that the order of the splits is left unspecified.
Input: A specification S = (Q, LI, LO, T, q0) ∈ SA
Input: The current (unfinished) splitting tree Y
Input: A leaf node D from Y

if Condition 1 holds for D then
    P := {D after x | x ∈ out(D)};
    labels(D) := out(D);
    Add the partition blocks of P as children of D;
else if Condition 2 holds for D then
    labels(D) := out(D);
    foreach x ∈ out(D) do
        P := the finest partition for Condition 2 with D and x;
        Add the partition blocks of P as children of D;
    end
else if Condition 3 holds for D with input a then
    P := the finest partition for Condition 3 with D and a;
    labels(D) := {a};
    Add the partition blocks of P as children of D;
return Y;

Algorithm 1. Algorithm for splitting a leaf node of a splitting tree.
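As a concrete illustration of the first split condition, the sketch below (ours, not part of the paper) partitions a leaf's states by the sets of outputs they enable. The enabled outputs match the specification of Fig. 5a as described in Example 21; the successor states in `trans` are invented for illustration and play no role in this condition.

```python
def out(trans, outputs, state):
    """Set of output labels enabled in a state."""
    return frozenset(x for x in outputs if (state, x) in trans)

def split_condition_1(trans, outputs, leaf):
    """Partition a leaf's states by their sets of enabled outputs.
    Returns the partition blocks (the new children), or None when all
    states enable the same outputs, i.e. condition 1 does not hold."""
    blocks = {}
    for d in leaf:
        blocks.setdefault(out(trans, outputs, d), set()).add(d)
    return list(blocks.values()) if len(blocks) > 1 else None

# States 1 and 5 enable only output y; states 2, 3, 4 enable x and z
# (enabled labels as in Fig. 5a; successor states are made up).
trans = {(1, 'y'): 4, (5, 'y'): 1,
         (2, 'x'): 2, (2, 'z'): 1,
         (3, 'x'): 4, (3, 'z'): 3,
         (4, 'x'): 5, (4, 'z'): 4}
children = split_condition_1(trans, {'x', 'y', 'z'}, {1, 2, 3, 4, 5})
print(sorted(sorted(b) for b in children))  # → [[1, 5], [2, 3, 4]]
```

The root split of Example 21 is exactly this partition; conditions 2 and 3 additionally consult the leaves of the current splitting tree.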
Example 21. Let us apply Algorithm 1 on the suspension automaton in Fig. 5a. Figure 5b shows the resulting splitting tree. We initialize the root node to {1, 2, 3, 4, 5}. Condition 1 applies, since states 1 and 5 only have output y enabled, while states 2, 3 and 4 only have outputs x and z enabled. Thus, we add leaves {1, 5} and {2, 3, 4}.
We can split {1, 5} by taking an output transition for y according to condition 2, as 1 after y = 4 ∈ {2, 3, 4}, while 5 after y = 1 ∈ {1, 5}, i.e. 1 and 5 reach different leaves. Condition 2 also applies for {2, 3, 4}. We have that {2, 3} after x = {2, 4} ⊆ {2, 3, 4} while 4 after x = 5 ∈ {5}. Hence we obtain children {4}
and {2, 3} for output x. For z we have that 2 after z = 1 ∈ {1} while {3, 4} after z = {3, 4} ⊆ {2, 3, 4}, so we obtain children {2} and {3, 4} for z.

[Fig. 5. Specification and its splitting tree: (a) example specification with mutually incompatible states; (b) splitting tree of Fig. 5a. Diagrams omitted.]
We can split {2, 3} by taking input transition a according to condition 3, since 2 after a = 4 and 3 after a = 2, and no leaf of the splitting tree contains both state 2 and state 4. Note that we could also have split on output transitions x and z. Node {3, 4} cannot be split for output transition z, since {3, 4} after z = {3, 4}, which is a leaf, and hence condition 2 does not hold. However, node {3, 4} can be split for input transition a, as 3 after a = 2 and 4 after a = 4. Now all leaves are singletons, so we can distinguish all states with this tree.
A distinguishing tree Y ∈ DT(LI, LO, D) for D can be constructed from a splitting tree with singleton leaf nodes. This follows the structure in Definition 8, and we only need to choose whether to provide an input, or whether to observe outputs. We look at the lowest node D′ in the splitting tree such that D ⊆ D′.
[Fig. 6. Distinguishing tree of Fig. 5a. The states are named by the sets of states which they distinguish. Singleton and empty sets are the pass states. Self-loops in verdict states have been omitted, for brevity. Diagram omitted.]
If labels(D′) has an input, then Y has a transition for this input, and a transition to reset for all outputs. If labels(D′) contains outputs, then Y has a transition for all outputs. In this manner, we recursively construct states of the distinguishing tree until |D| ≤ 1, in which case we have reached a pass state. Figure 6 shows the distinguishing tree obtained from the splitting tree in Fig. 5b.
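The key lookup in this construction — finding the lowest splitting-tree node D′ with D ⊆ D′ — can be sketched as follows. The nested-dict node representation is our own, not the paper's; since siblings of a splitting tree may overlap, the sketch simply returns one lowest candidate.

```python
def lowest_containing(node, D):
    """Return the lowest node D' of a splitting tree with D ⊆ D'['states'],
    or None if even this node does not contain D. A node is a dict with
    keys 'states' (a set), 'labels', and 'children'."""
    if not D <= node['states']:
        return None
    for child in node['children']:
        found = lowest_containing(child, D)
        if found is not None:
            return found
    return node

# The splitting tree of Fig. 5b, written out as nested dicts.
leaf = lambda s: {'states': s, 'labels': set(), 'children': []}
tree = {'states': {1, 2, 3, 4, 5}, 'labels': {'x', 'y', 'z'}, 'children': [
    {'states': {1, 5}, 'labels': {'y'}, 'children': [leaf({1}), leaf({5})]},
    {'states': {2, 3, 4}, 'labels': {'x', 'z'}, 'children': [
        leaf({4}),
        {'states': {2, 3}, 'labels': {'a'}, 'children': [leaf({2}), leaf({3})]},
        leaf({2}),
        {'states': {3, 4}, 'labels': {'a'}, 'children': [leaf({3}), leaf({4})]},
    ]},
]}
print(lowest_containing(tree, {3, 4})['labels'])  # → {'a'}
print(lowest_containing(tree, {1, 4})['states'])  # the root node
```

For {3, 4} the lowest containing node is the input-labelled node {3, 4}: a, so the distinguishing tree would provide input a there; for {1, 4} only the root contains both states, so outputs are observed.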
6   Conclusions
We firmly embedded theory on n-complete test suites into ioco theory, without making any restrictive assumptions. We have identified several problems where classical FSM techniques fail for suspension automata, in particular for compatible states. An extension of the concept of distinguishing states has been introduced such that compatible states can be handled, by testing the merge of such states. This requires that the merge itself does not contain compatible states. Furthermore, upper bounds for several parts of a test suite have been given, such as reaching all states in the implementation.
These upper bounds are exponential in the number of states, and may limit practical applicability. Further investigation is needed to efficiently tackle these parts of the test suite. Alternatively, looser notions of completeness may circumvent these problems. Furthermore, experiments are needed to compare our testing method and random testing as in [11] quantitatively, in terms of efficiency of computation and execution time, and the ability to find bugs, preferably on a real-world case study.
Residual Nominal Automata

Joshua Moerman
RWTH Aachen University, Germany

Matteo Sammartino
Royal Holloway University of London, UK
University College London, UK

Abstract
We are motivated by the following question: which nominal languages admit an active learning algorithm? This question was left open in previous work, and is particularly challenging for languages recognised by nondeterministic automata. To answer it, we develop the theory of residual nominal automata, a subclass of nondeterministic nominal automata. We prove that this class has canonical representatives, which can always be constructed via a finite number of observations. This property enables active learning algorithms, and makes up for the fact that residuality – a semantic property – is undecidable for nominal automata. Our construction for canonical residual automata is based on a machine-independent characterisation of residual languages, for which we develop new results in nominal lattice theory. Studying residuality in the context of nominal languages is a step towards a better understanding of learnability of automata with some sort of nondeterminism.

2012 ACM Subject Classification Theory of computation → Automata over infinite objects; Theory of computation → Automated reasoning
Keywords and phrases nominal automata, residual automata, derivative language, decidability, closure, exact learning, lattice theory
Digital Object Identifier 10.4230/LIPIcs.CONCUR.2020.44
Related Version Full version at https://arxiv.org/abs/1910.11666.
Funding ERC AdG project 787914 FRAPPANT, EPSRC Standard Grant CLeVer (EP/S028641/1).
Acknowledgements We would like to thank Gerco van Heerdt for providing examples similar to that of Lr in the context of probabilistic automata. We thank Borja Balle for references on residual probabilistic languages, and Henning Urbat for discussions on nominal lattice theory. Lastly, we thank the reviewers of a previous version of this paper for their interesting questions and suggestions.
1   Introduction
Formal languages over infinite alphabets have received considerable attention recently. They include data languages for reasoning about XML databases [32], trace languages for analysis of programs with resource allocation [18], and behaviour of programs with data flows [19]. Typically, these languages are accepted by register automata, first introduced in the seminal paper [20]. Another appealing model is that of nominal automata [6]. While nominal automata are as expressive as register automata, they enjoy convenient properties. For example, the deterministic ones admit canonical minimal models, and the theory of formal languages and many textbook algorithms generalise smoothly.
In this paper, we investigate the properties of so-called residual nominal automata. An automaton accepting a language L is residual whenever the language of each state is a derivative of L. In the context of regular languages over finite alphabets, residual finite state automata (RFSAs) are a subclass of nondeterministic finite automata (NFAs) introduced by Denis et al. [14] as a solution to the well-known problem of NFAs not having unique minimal representatives. They show that every regular language L admits a unique canonical RFSA.
© Joshua Moerman and Matteo Sammartino; licensed under Creative Commons License CC-BY
31st International Conference on Concurrency Theory (CONCUR 2020).
Editors: Igor Konnov and Laura Kovács; Article No. 44; pp. 44:1–44:21
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
[Figure 1 (diagram omitted): relationship between the classes of nominal languages Nondeterministic, Residual, Nondeterministic−, Residual−, and Deterministic. Edges are strict inclusions. With ·− we denote classes where automata are not allowed to guess values, i.e., to store symbols in registers without explicitly reading them.]
Residual automata play a key role in the context of exact learning¹, in which one computes an automaton representation of an unknown language via a finite number of observations. The defining property of residual automata allows one to (eventually) observe the semantics of each state independently. In the finite-alphabet setting, residuality underlies the seminal algorithm L* for learning deterministic automata [1] (deterministic automata are always residual), and enables efficient algorithms for learning nondeterministic [8] and alternating automata [2, 3]. Residuality has also been studied for learning probabilistic automata [13]. Existence of canonical residual automata is crucial for the convergence of these algorithms.
Our investigation of residuality in the nominal setting is motivated by the following question: which nominal languages admit an exact learning algorithm? In previous work [28], we have shown that the L* algorithm generalises smoothly to nominal languages, meaning that deterministic nominal automata can be learned. However, the general nondeterministic case proved to be significantly more challenging. In fact, in stark contrast with the finite-alphabet case, nondeterministic nominal automata are strictly more expressive than deterministic ones, thus residual automata are not just succinct representations of deterministic languages. As a consequence, our attempt to generalise the NL* algorithm for nondeterministic finite automata to the nominal setting did not fully succeed: we could only prove that it works for deterministic languages, leaving the nondeterministic case open. By investigating residual languages, and how they relate to deterministic and nondeterministic ones, we are finally able to settle this case.
In summary, our contributions are as follows:
Section 3: We refine nominal languages as depicted in Figure 1, by giving separating languages for each class.
Section 4: We develop new results of nominal lattice theory, and we provide the main characterisation theorem (Theorem 4.10), showing that the class of residual languages allows for canonical automata which: a) are minimal in their respective class and unique (up to isomorphism); b) can be constructed via a finite number of observations of the language. Both properties are crucial for learning. We prove this important result by a machine-independent characterisation of those classes of languages. We also give an analogous result for non-guessing languages (Theorem 4.16).
Section 5: We study decidability and closure properties. Many decision problems, such as equivalence and universality, are known to be undecidable for nondeterministic nominal automata. For residual automata, we show that universality becomes decidable. However, the problem of whether an automaton is residual is undecidable.
¹ Exact learning is also known as query learning or active (automata) learning [1].
Section 6: We settle important open questions about exact learning of nominal languages. We show that residuality does not imply convergence of existing algorithms, and we give a (modified) NL*-style algorithm that works precisely for residual languages.
This research mirrors that of residual probabilistic automata [13]. There, too, one has distinct classes of which the deterministic and residual ones admit canonical automata and have an algebraic characterisation. We believe that our results contribute to a better understanding of learnability of automata with some sort of nondeterminism.
2   Preliminaries
We recall the notions of nominal sets [33] and nominal automata [6]. Let A be a countably infinite set of atoms² and let Perm(A) be the set of permutations on A, i.e., the bijective functions π : A → A. Permutations form a group where the unit is given by the identity function, the inverse by functional inverse, and multiplication by function composition.
A nominal set is a set X equipped with a function · : Perm(A) × X → X, interpreting permutations over X. This function must be a group action of Perm(A), i.e., it must satisfy id · x = x and π · (π′ · x) = (π ◦ π′) · x. We say that a set S ⊆ A supports x ∈ X whenever π · x = x for all π fixing S, i.e., such that π|S = idS. We require for nominal sets that each element x has a finite support. We denote by supp(x) the smallest finite set supporting x.
The orbit orb(x) of x ∈ X is the set of elements in X reachable from x via permutations: orb(x) := {π · x | π ∈ Perm(A)}. X is orbit-finite whenever it is a finite union of orbits. Orbit-finite sets are finitely-representable, hence algorithmically tractable [5].
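For intuition, the action and orbits can be played with concretely. The sketch below (ours, not from the paper) permutes only a finite sample of atoms; over the full infinite atom set an orbit is infinite, but every element of it shares the same "equality pattern", which a large enough finite sample already exhibits.

```python
from itertools import permutations

ATOMS = (0, 1, 2)  # a finite sample of the countably infinite atom set

def act(pi, word):
    """Element-wise action of a permutation (given as a dict) on a word."""
    return tuple(pi[a] for a in word)

def orbit(word):
    """Orbit of a word under all permutations of the sampled atoms."""
    return {act(dict(zip(ATOMS, p)), word) for p in permutations(ATOMS)}

# The orbit of the 'shape' aab: all words xxy with x ≠ y over three atoms.
print(sorted(orbit((0, 0, 1))))
# → [(0, 0, 1), (0, 0, 2), (1, 1, 0), (1, 1, 2), (2, 2, 0), (2, 2, 1)]
```

The word (0, 0, 1) is supported by {0, 1}: any permutation fixing both atoms maps the word to itself, which is why its orbit is determined by the equality pattern alone.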
Given a nominal set X, a subset Y ⊆ X is equivariant if it is preserved by permutations, i.e., π · Y = Y for all π ∈ Perm(A), where π acts element-wise. This definition extends to relations and functions. For instance, a function f : X → Y between nominal sets is equivariant whenever π · f(x) = f(π · x). Given a nominal set X, the nominal power set is defined as Pfs(X) := {U ⊆ X | U is finitely supported}.
We recall the notion of nominal automaton from [6]. The theory of nominal automata seamlessly extends classical automata theory by having orbit-finite nominal sets and equivariant functions in place of finite sets and functions.
▶ Definition 2.1. A (nondeterministic) nominal automaton A consists of: an orbit-finite nominal set Σ, the alphabet; an orbit-finite nominal set of states Q; equivariant subsets I, F ⊆ Q of initial and final states; and an equivariant subset δ ⊆ Q × Σ × Q of transitions.
The usual notions of acceptance and language apply. We denote the language of A by L(A), and the language accepted by a state q ∈ Q by L(q). Note that the language L(A) ∈ Pfs(Σ∗) is equivariant, and that L(q) ∈ Pfs(Σ∗) need not be equivariant, but it is supported by supp(q).
We recall the notion of derivative language [14].³
▶ Definition 2.2. Given a language L and a word u ∈ Σ∗, we define the derivative of L w.r.t. u as u⁻¹L := {w | uw ∈ L} and the set of all derivatives as Der(L) := {u⁻¹L | u ∈ Σ∗}.
These definitions seamlessly extend to the nominal setting. Note that w⁻¹L is finitely supported whenever L is.
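On an explicitly listed finite fragment of a language, the derivative is a one-liner. The sketch below is ours, using as data a fragment (words of length ≤ 3 over two atoms) of the language Ld that appears in Section 3.

```python
def derivative(language, u):
    """u⁻¹L = {w | uw ∈ L}, for an explicitly given finite language."""
    return frozenset(w[len(u):] for w in language if w.startswith(u))

# Fragment of Ld = {awa | a an atom, w a word}: length ≤ 3, atoms {a, b}.
L = {'aa', 'aaa', 'aba', 'bb', 'bab', 'bbb'}
print(sorted(derivative(L, 'a')))   # → ['a', 'aa', 'ba']
print(sorted(derivative(L, 'ab')))  # → ['a']
```

Of course this only approximates the true derivatives of an infinite language; it is meant purely to make the definition tangible.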
² Sometimes these are called data values.
³ This is sometimes called a residual language or left quotient. We do not use the term residual language here, because residual language will mean a language accepted by a residual automaton.
Of special interest are the deterministic, residual, and non-guessing nominal automata, which we introduce next.
▶ Definition 2.3. A nominal automaton A is:
Deterministic if I = {q0}, and for each q ∈ Q and a ∈ Σ there is a unique q′ such that (q, a, q′) ∈ δ. In this case, the relation is in fact a function δ : Q × Σ → Q.
Residual if each state q ∈ Q accepts a derivative of L(A), formally: L(q) = w⁻¹L(A) for some word w ∈ Σ∗. The words w such that L(q) = w⁻¹L(A) are called characterising words for the state q.
Non-guessing if supp(q0) = ∅ for each q0 ∈ I, and supp(q′) ⊆ supp(q) ∪ supp(a) for each (q, a, q′) ∈ δ.
Observe that the transition function of a deterministic automaton preserves supports (i.e., if C supports (q, a) then C also supports δ(q, a)). Consequently, all deterministic automata are non-guessing. For the sake of succinctness, in the following we drop the qualifier "nominal" when referring to these classes of nominal automata.
For many examples, it is useful to define the notion of an anchor. Given a state q, a word w is an anchor if δ(I, w) = {q}, that is, the word w leads to q and no other state. Every anchor for q is also a characterising word for q (but not vice versa).
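Residuality quantifies over all words, and Section 5 shows it is undecidable for nominal automata; for an ordinary finite NFA, however, one can at least test a bounded approximation. The following sketch (ours, not the paper's) compares each state's language with every derivative, on words up to a fixed length.

```python
from itertools import product

def words(alphabet, n):
    """All words over `alphabet` of length at most n."""
    for k in range(n + 1):
        yield from map(''.join, product(alphabet, repeat=k))

def post(delta, states, word):
    """States reachable from `states` by reading `word`."""
    for a in word:
        states = {q2 for (q1, b, q2) in delta if q1 in states and b == a}
    return states

def lang(nfa, start, n):
    """Words of length ≤ n accepted from the set of states `start`."""
    _, alphabet, _, F, delta = nfa
    return frozenset(w for w in words(alphabet, n) if post(delta, start, w) & F)

def is_residual_upto(nfa, n):
    """Bounded residuality check: every state's language (up to length n)
    must coincide with some derivative u⁻¹L(A) with |u| ≤ n."""
    Q, alphabet, I, F, delta = nfa
    L = lang(nfa, I, 2 * n)
    ders = {frozenset(v[len(u):] for v in L
                      if v.startswith(u) and len(v) - len(u) <= n)
            for u in words(alphabet, n)}
    return all(lang(nfa, {q}, n) in ders for q in Q)

# L(A) = {a, aa}. With states 0 → 1 → 2 this automaton is residual;
# adding an extra initial state 3 with L(3) = {a} breaks residuality,
# since {a} is not a derivative of {a, aa}.
residual = ({0, 1, 2}, 'a', {0}, {1, 2}, {(0, 'a', 1), (1, 'a', 2)})
broken = ({0, 1, 2, 3}, 'a', {0, 3}, {1, 2},
          {(0, 'a', 1), (1, 'a', 2), (3, 'a', 2)})
print(is_residual_upto(residual, 3), is_residual_upto(broken, 3))  # → True False
```

For finite automata a bound related to the number of states makes the check exact; in the nominal setting no such bound exists, which is what makes residuality semantically delicate.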
Finally, we recall the Myhill-Nerode theorem for nominal automata.
▶ Theorem 2.4 ([6, Theorem 5.2]). Let L be a language. Then L is accepted by a deterministic automaton if and only if Der(L) is orbit-finite.
3   Separating languages
Deterministic, nondeterministic and residual automata have the same expressive power when dealing with finite alphabets. The situation is more nuanced in the nominal setting. We now give one language for each class in Figure 1. For the sake of simplicity, we will use the one-orbit nominal set of atoms A as alphabet. These languages separate the different classes, meaning that they belong to the respective class, but not to the classes below or beside it.
For each example language L, we depict: a nominal automaton recognising L (on the left); the set of derivatives Der(L) (on the right). We make explicit the poset structure of Der(L): grey rectangles represent orbits of derivatives, and lines stand for set inclusions (we grey out irrelevant ones). This poset may not be orbit-finite, in which case we depict a small, indicative part. Observing the poset structure of Der(L) explicitly is important for later, where we show that the existence of residual automata depends on it. We write aa⁻¹L to mean (aa)⁻¹L. Variables a, b, . . . are always atoms and u, w, . . . are always words.
Deterministic: First symbol equals last symbol
Consider the language Ld := {awa | a ∈ A, w ∈ A∗}. This is accepted by the following deterministic nominal automaton. The automaton is actually infinite-state, but we represent it symbolically using a register-like notation, where we annotate each state with the current
value of a register. Note that the derivatives a⁻¹Ld, b⁻¹Ld, . . . are in the same orbit. In total Der(Ld) has three orbits, which correspond to the three orbits of states in the deterministic automaton. The derivative awa⁻¹Ld, for example, equals aa⁻¹Ld.

[Figure 2 A deterministic automaton accepting Ld, and the poset Der(Ld). Diagram omitted.]
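Concretely, membership in Ld depends only on the first and last symbol, so the derivative identity awa⁻¹Ld = aa⁻¹Ld can be spot-checked on a finite set of candidate suffixes. A small sketch of ours:

```python
def in_Ld(word):
    """Ld = {awa | a an atom, w a word}: length ≥ 2, first symbol = last."""
    return len(word) >= 2 and word[0] == word[-1]

def derivative_on(member, u, suffixes):
    """u⁻¹Ld restricted to a finite candidate set of suffixes."""
    return frozenset(w for w in suffixes if member(u + w))

suffixes = {'', 'a', 'b', 'aa', 'ab', 'ba', 'bb'}
d1 = derivative_on(in_Ld, 'aba', suffixes)
d2 = derivative_on(in_Ld, 'aa', suffixes)
print(sorted(d1), d1 == d2)  # → ['', 'a', 'aa', 'ba'] True
```

Both prefixes land in the same state of the automaton (register holding a, currently reading a), which is exactly why their derivatives coincide.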
Non-guessing residual: Some atom occurs twice
The language is Lng,r := {uavaw | u, v, w ∈ A∗, a ∈ A}. The poset Der(Lng,r) is not orbit-finite, so by the nominal Myhill-Nerode theorem there is no deterministic automaton accepting Lng,r. However, derivatives of the form ab⁻¹Lng,r can be written as a union ab⁻¹Lng,r = a⁻¹Lng,r ∪ b⁻¹Lng,r. In fact, we only need an orbit-finite set of derivatives to recover Der(Lng,r). These orbits are highlighted in the diagram on the right. Selecting the "right" derivatives is the key idea behind constructing residual automata in Theorem 4.10.
[Figure 3 A (nonresidual) nondeterministic automaton accepting Lng,r, and the poset Der(Lng,r). Diagram omitted.]
Nondeterministic: Last letter is unique
The language is Ln := {wa | a not in w} ∪ {ε}. Derivatives a⁻¹Ln are again unions of smaller languages: a⁻¹Ln = ⋃b≠a ab⁻¹Ln. (We have omitted languages like aa⁻¹Ln, as they only differ from a⁻¹Ln on the empty word.) However, the poset Der(Ln) has an infinite descending chain of languages (with an increasing support), namely a⁻¹Ln ⊃ ab⁻¹Ln ⊃ abc⁻¹Ln ⊃ . . . The existence of such a chain implies that Ln cannot be accepted by a residual automaton. This is a consequence of Theorem 4.10, as we shall see later.
[Figure 4 A nondeterministic automaton accepting Ln, and the poset Der(Ln). Diagram omitted.]
Residual: Last letter is unique but anchored
Consider the alphabet Σ = A ∪ {Anc(a) | a ∈ A}, where Anc is nothing more than a label. We add the transitions (a, Anc(a), a) to the automaton in the previous example. We obtain the language Lr = L(Ar). Here, we have forced the automaton to be residual, by adding an anchor to the first state. Nevertheless, guessing is still necessary. In the poset, we note that all elements in the descending chain can now be obtained as unions of Anc(a)⁻¹Lr. For instance, a⁻¹Lr = ⋃b≠a Anc(b)⁻¹Lr. Note that Anc(a)Anc(b)⁻¹Lr = ∅ and Anc(a)a⁻¹Lr = {ε}.
[Figure 5 A residual automaton accepting Lr, and the poset Der(Lr). Diagram omitted.]
Non-guessing nondeterministic: Repeated atom with different successor
The language is Lng := {uabvac | u, v ∈ A∗, a, b, c ∈ A, b ≠ c}. (We allow a = b or a = c.) This is a language which can be accepted by a non-guessing automaton. However, there is no residual automaton for this language. The poset structure of Der(Lng) is very complicated. We will return to this example after Theorem 4.10.
[Figure 6 A non-guessing automaton accepting Lng, and the poset Der(Lng). Diagram omitted.]
4   Canonical Residual Nominal Automata
In this section we will give a characterisation of canonical residual automata. We will first introduce notions of nominal lattice theory, then we will state our main result (Theorem 4.10). We conclude the section by providing similar results for non-guessing automata.
4.1   Nominal lattice theory
We abstract away from words and languages and consider the set Pfs(Z) for an arbitrary nominal set Z. This is a Boolean algebra of which the operations ∧, ∨, ¬ are all equivariant maps [17]. Moreover, the finitely supported union
⋁ : Pfs(Pfs(Z)) → Pfs(Z)
is also equivariant. We note that this is more general than a binary union, but it is not a complete join semi-lattice. Hereafter, we shall denote set inclusion by ≤ (< when strict).
▶ Definition 4.1. Given a nominal set Z and X ⊆ Pfs(Z) equivariant⁴, we define the set generated by X as
⟨X⟩ := {⋁U | U ⊆ X finitely supported} ⊆ Pfs(Z).
▶ Remark 4.2. The set ⟨X⟩ is closed under the operation ⋁, and moreover is the smallest equivariant set closed under ⋁ containing X. In other words, ⟨−⟩ defines a closure operator. We will often say "X generates Y", by which we mean Y ⊆ ⟨X⟩.
▶ Definition 4.3. Let X ⊆ Pfs(Z) be equivariant and x ∈ X; we say that x is join-irreducible in X if it is non-empty and x = ⋁U =⇒ x ∈ U, for every finitely supported U ⊆ X. The set of all join-irreducible elements is denoted by
JI(X) := {x ∈ X | x join-irreducible in X}.
This is again an equivariant set.
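In the finite case, Definition 4.3 boils down to: x is join-irreducible iff it is non-empty and differs from the union of the elements strictly below it. A small sketch of ours over a family of finite sets:

```python
def join_irreducibles(X):
    """Join-irreducible elements of a finite family of sets X, ordered by
    inclusion: non-empty x ∈ X that are not the union of the elements of X
    strictly below them (equivalent to Definition 4.3 for finite X)."""
    X = {frozenset(s) for s in X}
    ji = set()
    for x in X:
        union_below = frozenset().union(*(y for y in X if y < x))
        if x and x != union_below:
            ji.add(x)
    return ji

# {1, 2} = {1} ∨ {2} is reducible; the other non-empty elements are not.
X = [set(), {1}, {2}, {1, 2}, {1, 2, 3}]
print(sorted(map(sorted, join_irreducibles(X))))  # → [[1], [1, 2, 3], [2]]
```

The equivalence with Definition 4.3 in the finite case: if x equals the union of the elements strictly below it, that union witnesses reducibility; conversely, any family U with ⋁U = x and x ∉ U consists of elements strictly below x.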
▶ Remark 4.4. In lattice and order theory, join-irreducible elements are usually defined only for a lattice (see, e.g., [11]). However, we define them for arbitrary subsets of a lattice. (Note that a subset of a lattice is merely a poset.) This generalisation will be needed later, when we consider the poset Der(L), which is not a lattice, but is contained in the lattice Pfs(Σ∗).
▶ Remark 4.5. The notion of join-irreducible, as we have defined it here, corresponds to the notion of prime in [8, 14, 28]. Unfortunately, the word prime has a slightly different meaning in lattice theory. We stick to the terminology of lattice theory.
If a set Y is well-behaved, then its join-irreducible elements will actually generate the set Y. This is normally proven with a descending chain condition. We first restrict our attention to orbit-finite sets. The following lemma extends [11, Lemma 2.45] to the nominal setting.
|
||
I Lemma 4.6. Let X ⊆ Pfs (Z) be an orbit-finite and equivariant set.
|
||
1. Let a ∈ X, b ∈ Pfs (Z) and a 6≤ b. Then there is a join-irreducible x ∈ X such that x ≤ a
|
||
and x 6≤ b.
|
||
W
|
||
2. Let a ∈ X, then a = {x ∈ X | x join-irreducible in X and x ≤ a}.
|
||
|
||
⁴ A similar definition could be given for finitely supported X. In fact, all results in this section generalise to finitely supported sets, but we use equivariance for convenience.
CONCUR 2020

Residual Nominal Automata
▶ Corollary 4.7. Let X ⊆ Pfs(Z) be an orbit-finite equivariant subset. The join-irreducibles of X generate X, i.e., X ⊆ ⟨JI(X)⟩.

So far, we have defined join-irreducible elements relative to some fixed set. We will now show that these elements remain join-irreducible when considered in a bigger set, as long as the bigger set is generated by the smaller one. This will later allow us to talk about the join-irreducible elements.

▶ Lemma 4.8. Let Y ⊆ X ⊆ Pfs(Z) be equivariant and suppose that X ⊆ ⟨JI(Y)⟩. Then JI(Y) = JI(X).

In other words, the join-irreducibles of X are the smallest set generating X.

▶ Corollary 4.9. If an orbit-finite set Y generates X, then JI(X) ⊆ Y.
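In the finite case these notions can be computed directly. The following sketch is our own illustration, not from the paper: the function names and the example family are assumptions. It computes the join-irreducibles of a finite family of sets and checks the finite analogue of Corollary 4.7, namely that the join-irreducibles generate the whole family under unions.

```python
# Our own finite illustration (not from the paper): in a finite family of
# sets ordered by inclusion, an element is join-irreducible iff it is
# non-empty and differs from the union of the strictly smaller members.
from itertools import chain, combinations

def join_irreducibles(family):
    """Members of `family` that are not the union of strictly smaller members."""
    family = [frozenset(s) for s in family]
    return [x for x in family
            if x and x != frozenset(chain.from_iterable(y for y in family if y < x))]

def generated(generators):
    """Closure of `generators` under finite unions (including the empty union)."""
    closure = {frozenset()}
    for r in range(1, len(generators) + 1):
        for combo in combinations(generators, r):
            closure.add(frozenset(chain.from_iterable(combo)))
    return closure

family = [set(), {1}, {2}, {1, 2}, {1, 2, 3}]
ji = join_irreducibles(family)
# {1,2} = {1} ∪ {2} is redundant; the other non-empty sets are join-irreducible.
assert set(ji) == {frozenset({1}), frozenset({2}), frozenset({1, 2, 3})}
# Finite analogue of Corollary 4.7: the join-irreducibles generate the family.
assert {frozenset(s) for s in family} <= generated(ji)
```

In the nominal setting the same idea applies to orbit-finite families, with joins ranging over finitely supported subsets.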
4.2 Characterising Residual Languages
We are now ready to state and prove the main theorem of this paper. We fix the alphabet Σ. Recall that the nominal Myhill-Nerode theorem tells us that a language is accepted by a deterministic automaton if and only if Der(L) is orbit-finite. Here, we give a similar characterisation for languages accepted by residual automata. Moreover, the following result gives a canonical construction.
▶ Theorem 4.10. Given a language L ∈ Pfs(Σ∗), the following are equivalent:
1. L is accepted by a residual automaton.
2. There is some orbit-finite set J ⊆ Der(L) which generates Der(L).
3. The set JI(Der(L)) is orbit-finite and generates Der(L).
Proof. We prove three implications:
(1 ⇒ 2) Take the set of languages accepted by the states: J := {L(q) | q ∈ Q}. This is clearly orbit-finite, since Q is. Moreover, each derivative is generated as follows: w⁻¹L = ⋁{L(q) | q ∈ δ(I, w)}.
(2 ⇒ 3) We can apply Lemma 4.8 with Y = J and X = Der(L). Now it follows that JI(Der(L)) is orbit-finite (since it is a subset of J) and generates Der(L).
(3 ⇒ 1) We can construct the following residual automaton, whose language is exactly L:
    Q := JI(Der(L))
    I := {w⁻¹L ∈ Q | w⁻¹L ≤ L}
    F := {w⁻¹L ∈ Q | ε ∈ w⁻¹L}
    δ(w⁻¹L, a) := {v⁻¹L ∈ Q | v⁻¹L ≤ (wa)⁻¹L}
First, note that A := (Σ, Q, I, F, δ) is a well-defined nominal automaton. In fact, all the components are orbit-finite, and equivariance of ≤ implies equivariance of δ. Second, we show by induction on words that each state q = w⁻¹L accepts its corresponding language, namely L(q) = w⁻¹L.
    ε ∈ L(w⁻¹L) ⟺ w⁻¹L ∈ F ⟺ ε ∈ w⁻¹L

    au ∈ L(w⁻¹L) ⟺ u ∈ L(δ(w⁻¹L, a))
                 ⟺ u ∈ L({v⁻¹L ∈ Q | v⁻¹L ≤ (wa)⁻¹L})
             (i) ⟺ u ∈ ⋁{v⁻¹L ∈ Q | v⁻¹L ≤ (wa)⁻¹L}
                 ⟺ ∃ v⁻¹L ∈ Q with v⁻¹L ≤ (wa)⁻¹L and u ∈ v⁻¹L
            (ii) ⟺ u ∈ (wa)⁻¹L ⟺ au ∈ w⁻¹L
J. Moerman and M. Sammartino
At step (i) we have used the induction hypothesis (u is a shorter word than au) and the fact that L(−) preserves unions. At step (ii, right-to-left) we have used that v⁻¹L is join-irreducible. The other steps are unfolding definitions.
Finally, note that L = ⋁{w⁻¹L | w⁻¹L ≤ L}, since the join-irreducible languages generate all languages. In particular, the initial states (together) accept L. ◀
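For a finite language over a finite alphabet, the (3 ⇒ 1) construction can be carried out concretely. The sketch below is our own illustration, not from the paper; the example language L and all helper names are assumptions. States are the join-irreducible derivatives, initial states are the derivatives below L, final states contain the empty word, and the transition relation is saturated.

```python
# Our own finite-alphabet illustration (not from the paper): for a finite
# language L, the derivatives form a finite family of sets, and the
# (3 => 1) construction yields an automaton accepting exactly L.
from itertools import chain

L = {"a", "ab", "abb", "b"}                     # example target language
SIGMA = "ab"

def derivative(lang, w):
    return frozenset(u[len(w):] for u in lang if u.startswith(w))

def derivatives(lang):
    prefixes = {u[:i] for u in lang for i in range(len(u) + 1)}
    return {derivative(lang, w) for w in prefixes}

def join_irreducibles(family):
    return {x for x in family
            if x and x != frozenset(chain.from_iterable(y for y in family if y < x))}

Q = join_irreducibles(derivatives(L))           # states
I = {q for q in Q if q <= frozenset(L)}         # initial: derivatives below L
F = {q for q in Q if "" in q}                   # final: contain the empty word

def delta(q, a):
    """Saturated transitions: all states below the a-derivative of q."""
    return {p for p in Q if p <= derivative(q, a)}

def accepts(word):
    current = I
    for a in word:
        current = set().union(*(delta(q, a) for q in current))
    return bool(current & F)

assert all(accepts(w) == (w in L)
           for w in ("", "a", "b", "ab", "ba", "abb", "bb", "abbb"))
```

In the nominal setting of Theorem 4.10 the same construction works over orbit-finite sets, with joins ranging over finitely supported families.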
▶ Corollary 4.11. The construction above defines a canonical residual automaton with the following uniqueness property: it has the minimal number of orbits of states and the maximal number of orbits of transitions.

For finite alphabets, the classes of languages accepted by DFAs and NFAs are the same (by determinising an NFA). This means that Der(L) is always finite if L is accepted by an NFA, and we can always construct the canonical RFSA. Here, this is not the case; that is why we need to stipulate (in Theorem 4.10) that the set JI(Der(L)) is orbit-finite and actually generates Der(L). Either condition may fail, as we will see in Example 4.13.
▶ Example 4.12. In this example we show that residual automata can also be used to compress deterministic automata. The language L := {abb⋯b | a ≠ b} can be accepted by a deterministic automaton of 4 orbits, and this is minimal. (A zero amount of bs is also accepted in L.) The minimal residual automaton, however, has only 2 orbits, given by the join-irreducible languages:
    ε⁻¹L = {abb⋯b | a ≠ b}    (ab)⁻¹L = {bb⋯b}    (a, b ∈ 𝔸 distinct)
The trick in defining the automaton is that the a-transition from ε⁻¹L to (ab)⁻¹L guesses the value b. In the next section (Section 4.3), we will define the canonical non-guessing residual automaton, which has 3 orbits.
▶ Example 4.13. We return to the examples Ln and Lng from Section 3. We claim that neither language can be accepted by a residual automaton.
For Ln we note that there is an infinite descending chain of derivatives
    Ln > a⁻¹Ln > (ab)⁻¹Ln > (abc)⁻¹Ln > ⋯
Each of these languages can be written as a union of smaller derivatives. For instance, a⁻¹Ln = ⋃_{b≠a} (ab)⁻¹Ln. This means that JI(Der(Ln)) = ∅, hence it does not generate Der(Ln), and by Theorem 4.10 there is no residual automaton.
In the case of Lng, we have an infinite ascending chain
    Lng < a⁻¹Lng < (ba)⁻¹Lng < (cba)⁻¹Lng < ⋯
This in itself is not a problem: the language Lng,r also has an infinite ascending chain. However, for Lng, none of the languages in this chain is a union of smaller derivatives. Put differently: all the languages in this chain are join-irreducible (see appendix for the details). So the set JI(Der(Lng)) is not orbit-finite. By Theorem 4.10, we conclude that there is no residual automaton accepting Lng.
▶ Remark 4.14. For arbitrary (nondeterministic) languages there is also a characterisation in the style of Theorem 4.10. Namely, L is accepted by an automaton iff there is an orbit-finite set Y ⊆ Pfs(Σ∗) which generates the derivatives. However, note that the set Y need not be a subset of the set of derivatives. In these cases, we do not have a canonical construction for the automaton. Different choices for Y define different automata and there is no way to pick Y naturally.
4.3 Automata without guessing
We reconsider the above results for non-guessing automata. Nondeterminism in nominal automata naturally allows for guessing, meaning that the automaton may store symbols in registers without explicitly reading them. However, the original definition of register automata in [20] does not allow for guessing, and non-guessing automata remain actively researched [29]. Register automata with guessing were introduced in [21], because it was realised that non-guessing automata are not closed under reversal.

To adapt to non-guessing automata, we redefine join-irreducible elements. As we would like to remove states which can be written as a "non-guessing" union of other states, we only consider joins of sets of elements where all elements are supported by the same support.
▶ Definition 4.15. Let X ⊆ Pfs(Z) be equivariant and x ∈ X; we say that x is join-irreducible⁻ in X if x = ⋁𝐱 ⟹ x ∈ 𝐱, for every finitely supported 𝐱 ⊆ X such that supp(x′) ⊆ supp(x) for each x′ ∈ 𝐱. The set of all join-irreducible⁻ elements is denoted by
    JI⁻(X) := {x ∈ X | x join-irreducible⁻ in X}.

The only change required is an additional condition on the elements and supports in 𝐱. In particular, the sets 𝐱 are uniformly supported sets. Unions of such sets are called uniformly supported unions.
All the lemmas from the previous section are proven similarly. We state the main result for non-guessing automata.
▶ Theorem 4.16. Given a language L ∈ Pfs(Σ∗), the following are equivalent:
1. L is accepted by a non-guessing residual automaton.
2. There is some orbit-finite set J ⊆ Der(L) which generates Der(L) by uniformly supported unions.
3. The set JI⁻(Der(L)) is orbit-finite and generates Der(L) by uniformly supported unions.
Proof. The proof is similar to that of Theorem 4.10. However, we need a slightly different definition of the canonical automaton. It is defined as follows.
    Q := JI⁻(Der(L))
    I := {w⁻¹L ∈ Q | w⁻¹L ≤ L, supp(w⁻¹L) ⊆ supp(L)}
    F := {w⁻¹L ∈ Q | ε ∈ w⁻¹L}
    δ(w⁻¹L, a) := {v⁻¹L ∈ Q | v⁻¹L ≤ (wa)⁻¹L, supp(v⁻¹L) ⊆ supp((wa)⁻¹L)}
Note that, in particular, the initial states have empty support since L is equivariant. This means that the automaton cannot guess any values at the start. Similarly, the transition relation does not allow for guessing. ◀
To better understand the structure of the canonical non-guessing residual automaton, we recall the following fact (see [33] for details) and its consequence for non-guessing automata.

▶ Lemma 4.17. Let X be an orbit-finite nominal set and A ⊂ 𝔸 a finite set of atoms. The set {x ∈ X | A supports x} is finite.

▶ Corollary 4.18. The transition relation δ of non-guessing automata can equivalently be described as a function δ : Q × Σ → Pfin(Q), where Pfin(Q) is the set of finite subsets of Q.

In particular, this shows that the canonical non-guessing residual automaton has finite nondeterminism. It also shows that it is sufficient to consider finite unions in Theorem 4.16, instead of uniformly supported unions.
5 Decidability and Closure Results
In this section we investigate decidability and closure properties. First, a positive result: universality is decidable for residual automata. This is in contrast to the nondeterministic case, where universality is undecidable, even for non-guessing automata [4].
▶ Proposition 5.1. Universality for residual nominal automata is decidable. Formally: given a residual automaton A, it is decidable whether L(A) = Σ∗.
Second, a negative result: determining whether an automaton is residual is undecidable. In other words, residuality cannot be characterised as a syntactic property. This adds value to learning techniques, as they are able to provide automata that are residual by construction, thus "getting around" this undecidability issue.
▶ Proposition 5.2. The problem of determining whether a given nondeterministic nominal automaton is residual is undecidable.
The above result is obtained by reducing the universality problem for general nondeterministic nominal automata to the residuality problem. Given an automaton A, we construct another automaton A′ which is residual if and only if A is universal (see appendix for details). This result also holds for the subclass of non-guessing automata, as the construction of A′ does not introduce any guessing, and universality for non-guessing nondeterministic nominal automata is undecidable.
▶ Remark 5.3. Equivalence between residual nominal automata is still an open problem. The usual proof of undecidability of equivalence is via a reduction from universality. This proof does not work anymore, because universality for residual automata is decidable (Proposition 5.1). We conjecture that equivalence remains undecidable for residual automata.
Closure properties

We will now show that several closure properties fail for residual languages. Interestingly, this parallels the situation for probabilistic languages: residual ones are not even closed under convex sums. We emphasise that residual automata were devised for learning purposes, where closure properties play no significant role. In fact, one typically exploits closure properties of the wider class of nondeterministic models, e.g., for automata-based verification. The following results show that in our setting this is indeed unavoidable.
Consider the alphabet Σ = 𝔸 ∪ {Anc(a) | a ∈ 𝔸} and the residual language Lr from Section 3. We consider a second language L2 = 𝔸∗, which can be accepted by a deterministic (hence residual) automaton. We have the following non-closure results:
Union: The language L = Lr ∪ L2 cannot be accepted by a residual automaton. In fact, although derivatives of the form Anc(a)⁻¹L are still join-irreducible (see Section 3, residual case), they have no summand 𝔸∗, which means that they cannot generate a⁻¹L = 𝔸∗ ∪ ⋃_{b≠a} Anc(b)⁻¹L. By Theorem 4.10(3) it follows that L is not residual.
Intersection: The language L = Lr ∩ L2 = Ln cannot be accepted by a residual automaton, as we have seen in Section 3.
Concatenation: The language L = L2 · Lr cannot be accepted by a residual automaton, for similar reasons as the union.
Reversal: The language {aw | a not in w} is residual (even deterministic), but its reverse language is Ln and cannot be accepted by a residual automaton.
Complement: Consider the language Lng,r of words where some atom occurs twice. Its complement is the language of all fresh atoms, which cannot even be recognised by a nondeterministic nominal automaton [6].
Closure under Kleene star is yet to be settled.
6 Exact learning
In our previous paper on learning nominal automata [28], we provided an exact learning algorithm for nominal deterministic languages. Moreover, we observed through experiments that the algorithm was also able to learn specific nondeterministic languages. However, several questions on nominal languages remained open, most importantly:
Which nominal languages can be characterised via a finite set of observations?
Which nominal languages admit an Angluin-style learning algorithm?
In this section we will answer these questions using the theory developed in the previous sections.
6.1 Angluin-style learning
We briefly review the classical automata learning algorithms L⋆ by Angluin [1] for deterministic automata, and NL⋆ by Bollig et al. [8] for residual automata. Then we discuss convergence in the nominal setting.

Both algorithms can be seen as a game between two players: the learner and the teacher. The learner aims to construct the minimal automaton for an unknown language L over a finite alphabet Σ. In order to do this, it may ask the teacher, who knows about the language, two types of queries:
Membership query: Is a given word w in the target language, i.e., w ∈ L?
Equivalence query: Does a given hypothesis automaton H recognise the target language, i.e., L = L(H)?
If the teacher replies yes to an equivalence query, then the algorithm terminates, as the hypothesis H is correct. Otherwise, the teacher must supply a counterexample, that is, a word in the symmetric difference of L and L(H). Availability of equivalence queries may seem like a strong assumption, and in fact it is often weakened by allowing only random sampling (see [22] or [35] for details).
Observations about the language made by the learner via queries are stored in an observation table T. This is a table where rows and columns range over two finite sets of words S, E ⊆ Σ⋆ respectively, and T(u, v) = 1 if and only if uv ∈ L. Intuitively, each row of T approximates a derivative of L; in fact we have T(u) ⊆ u⁻¹L. However, the information contained in T may be incomplete: some derivatives w⁻¹L are not reached yet, because no membership queries for w have been posed, and some pairs of rows T(u), T(v) may seem equal to the learner, because no word has been seen yet which distinguishes them. The learning algorithm will add new words to S when new derivatives are discovered, and to E when words distinguishing two previously identical derivatives are discovered.
The table T is closed whenever one-letter extensions of derivatives are already in the table, i.e., T has a row for (ua)⁻¹L, for all u ∈ S, a ∈ Σ. If the table is closed,⁵ L⋆ is able to construct an automaton from T, where states are distinct rows (i.e., derivatives). The construction follows the classical one for the canonical automaton of a language from its derivatives [31]. The NL⋆ algorithm uses a modified notion of closedness, where one is allowed to take unions (i.e., a one-letter extension can be written as a union of rows in T), and hence is able to learn an RFSA accepting the target language.

⁵ L⋆ also needs the table to be consistent. We do not need that in our discussion here.
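The observation table and the two closedness notions can be sketched directly in the finite-alphabet setting. The sketch below is our own illustration, not from the paper: the target language, the `member` oracle and the helper names are assumptions.

```python
# Our own finite-alphabet illustration (not from the paper): an observation
# table over row labels S and column labels E, with L*'s closedness (every
# one-letter extension equals an existing row) and NL*'s closedness (every
# one-letter extension is a union of existing rows).
SIGMA = "ab"

def member(w):                                  # example target: words ending in 'a'
    return w.endswith("a")

def row(u, E):
    """The row of u: the suffixes e in E with u+e in the language."""
    return frozenset(e for e in E if member(u + e))

def closed_deterministic(S, E):
    rows = {row(u, E) for u in S}
    return all(row(u + a, E) in rows for u in S for a in SIGMA)

def closed_union(S, E):
    rows = {row(u, E) for u in S}
    def is_union_of_rows(r):
        return r == frozenset().union(*(s for s in rows if s <= r))
    return all(is_union_of_rows(row(u + a, E)) for u in S for a in SIGMA)

S, E = {""}, {""}
# row("") is empty but row("a") is not: a new derivative is discovered,
# so the table is closed in neither sense.
assert not closed_deterministic(S, E) and not closed_union(S, E)
# Adding the discovered row label closes the table.
assert closed_deterministic({"", "a"}, E)
```

The nominal algorithms below replace the finite sets S and E by orbit-finite ones.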
When the table is not closed, then a derivative is missing, and a corresponding row needs to be added. Once an automaton is constructed, it is submitted in an equivalence query. If a counterexample is returned, then the table is again extended⁶, after which the process is repeated iteratively.
6.2 The nominal case
In [28] we have given nominal versions of L⋆ and NL⋆, called νL⋆ and νNL⋆ respectively. They seamlessly extend the original algorithms by operating on orbit-finite sets. The algorithm νL⋆ always terminates for deterministic languages, because distinct derivatives, and hence distinct rows in the observation table, are orbit-finitely many (see Theorem 2.4). However, it will never terminate for languages not accepted by deterministic automata (such as residual or nondeterministic languages).
▶ Theorem 6.1 ([27]). νL⋆ converges if and only if Der(L) is orbit-finite, in which case it outputs the canonical deterministic automaton accepting L. Moreover, at most O(nk) equivalence queries are needed, where n is the number of orbits of the minimal deterministic automaton, and k is the maximum support size of its states.
The nondeterministic case is more interesting. Using Theorem 4.10, we can finally establish which nondeterministic languages can be characterised via orbit-finitely-many observations.
▶ Corollary 6.2 (of Theorem 4.10). Let L be a nondeterministic nominal language. Then L can be represented via an observation table with orbit-finitely-many rows and columns if and only if L is residual. Rows of this table correspond to join-irreducible derivatives.
This explains why in [28] νNL⋆ was able to learn some residual nondeterministic automata: an orbit-finite observation table exists, which allows νNL⋆ to construct the canonical residual automaton. Unfortunately, the current νNL⋆ algorithm does not guarantee that it finds this orbit-finite observation table. We only have that guarantee for deterministic languages. The following example shows that νNL⋆ may indeed diverge when trying to close the table.
▶ Example 6.3. Suppose νNL⋆ tries to learn the residual language L accepted by the automaton below over the alphabet Σ = 𝔸 ∪ {Anc(a) | a ∈ 𝔸}. This is a slight modification of the residual language of Section 3.

[automaton diagram: transitions are labelled a, ≠ a, Anc(a) and Anc(≠ a); one transition guesses the value a]
The algorithm starts by considering the row for the empty word ε, and its one-letter extensions ε · a = a and ε · Anc(a) = Anc(a). These rows correspond to the derivatives ε⁻¹L = L, a⁻¹L and Anc(a)⁻¹L. Column labels are initialised to the empty word ε. At this point a⁻¹L and Anc(a)⁻¹L appear identical, as the only column does not distinguish them. However, they appear different from ε⁻¹L, so the algorithm will add the row for either a or Anc(a) in order
⁶ L⋆ and NL⋆ adopt different counterexample-handling strategies: the former adds a new row, the latter a new column. Both result in a new derivative being detected.
to close the table. Suppose the algorithm decides to add a. Then it will consider the one-letter extensions ab, abc, abcd, etc. Since these correspond to different derivatives – each strictly smaller than the previous one – the algorithm will get stuck in an attempt to close the table. At no point will it try to close the table with the word Anc(a), since it stays equivalent to a. So in this case νNL⋆ will not terminate. However, if the algorithm instead adds Anc(a) to the row labels, it will then also add Anc(a)Anc(b), which is a characterising word for the initial state. In that case, νNL⋆ will terminate.
While there is no hope of convergence in the non-residual case, as no orbit-finite observation table characterising the derivatives exists, we now propose a modification of νNL⋆ which guarantees termination for residual languages.
▶ Theorem 6.4. There is an algorithm which query learns residual nominal languages.
Proof (Sketch). When the algorithm adds a word w to the set of rows, then it also adds all other words of length |w|.⁷ Since all words of bounded length are added, the algorithm will eventually find all words that are characterising for states of the canonical residual automaton, and it will therefore be able to reconstruct this automaton. See appendix for details. ◀

Unfortunately, considering all words bounded by a certain length requires many membership queries. In fact, characterising words can be exponential in length [14], meaning that this algorithm may need doubly exponentially many membership queries.
▶ Remark 6.5. We note that nondeterministic automata can be enumerated, and hence can be learned via equivalence queries only. This would result in a highly inefficient algorithm. This parallels the current understanding of learning probabilistic languages. Although efficient (learning in the limit) learning algorithms for deterministic and residual languages exist [12], the general case is still open.
7 Conclusions, related and future work
In this paper we have investigated a subclass of nondeterministic automata over infinite alphabets. This class naturally arises in the context of query learning, where automata have to be constructed from finitely many observations. Although there are many classes of data languages, we have shown that our class of residual languages admits canonical automata. The states of these automata correspond to join-irreducible elements.
In the context of learning, we show that convergence of standard Angluin-style algorithms is not guaranteed, even for residual languages. We propose a modified algorithm which guarantees convergence at the expense of an increase in the number of observations.
We emphasise that, unlike other algorithms based on residuality such as NL⋆ [8] and AL⋆ [2], our algorithm does not depend on the size, or even the existence, of the minimal deterministic automaton for the target language. This is a crucial difference, since dependence on the minimal deterministic automaton hinders generalisation to nondeterministic nominal automata, which are strictly more expressive. Ideally, in the residual case, one would like an algorithm for which the complexity depends only on the length of characterising words, which is an intrinsic feature of residual automata. To the best of our knowledge, no such algorithm exists in the finite setting.
⁷ The set {w ∈ Σ∗ | |w| = k} is orbit-finite, for any fixed k ∈ ℕ.
We also show that universality is decidable for residual automata, in contrast to undecidability in the general nondeterministic case. As future work, we plan to attack the language inclusion/equivalence problem for residual automata. This is a well-known and challenging problem for data languages, which has been answered for specific subclasses [9, 10, 29, 34].

Of special interest is the subclass of unambiguous automata [10, 29]. We note that residual languages are orthogonal to unambiguous languages. For instance, the language Ln is unambiguous but not residual, whereas Lng,r is residual but ambiguous. Moreover, their intersection has neither property, and every deterministic language has both properties. One interesting fact is that if a canonical residual automaton is unambiguous, then the join-irreducibles form an anti-chain.
Other related work concerns nominal languages/expressions with an explicit notion of binding [15, 25, 26, 34]. Although these are sub-classes of nominal languages, binding is an important construct, e.g., to represent resource allocation. Availability of a notion of derivatives [25] suggests that residuality may prove beneficial for learning these languages.
Residual automata over finite alphabets also have a categorical characterisation [30]. We see no obstructions in generalising those results to nominal sets. This would amount to finding the right notion of nominal (complete) join-semilattice, with either finitely or uniformly supported joins.
Finally, in [16, 17] aspects of nominal lattices and Boolean algebras are investigated. To the best of our knowledge, our results on nominal lattice theory, especially those on join-irreducibles, are new.
Omitted proofs
▶ Remark 4.2. The set ⟨X⟩ is closed under the operation ⋁, and moreover is the smallest equivariant set closed under ⋁ containing X. In other words, ⟨−⟩ defines a closure operator. We will often say "X generates Y", by which we mean Y ⊆ ⟨X⟩.

Proof. Take any 𝐱 ⊆ ⟨X⟩ finitely supported. All x ∈ 𝐱 are of the form ⋁𝐲ₓ, for some finitely supported 𝐲ₓ ⊆ X. Consider the finitely supported set T = {y | y ∈ 𝐲ₓ, x ∈ 𝐱} ⊆ X. Then we see that ⋁𝐱 = ⋁T ∈ ⟨X⟩, meaning that ⟨X⟩ is closed under ⋁. The second part of the claim is easy: any set closed under ⋁ and containing X must also contain ⟨X⟩. ◀
▶ Lemma 4.6. Let X ⊆ Pfs(Z) be an orbit-finite and equivariant set.
1. Let a ∈ X, b ∈ Pfs(Z) and a ≰ b. Then there is a join-irreducible x ∈ X such that x ≤ a and x ≰ b.
2. Let a ∈ X; then a = ⋁{x ∈ X | x join-irreducible in X and x ≤ a}.

Proof. In this proof we need a technicality. Let P be a finitely supported, non-empty poset (i.e., both P and ≤ are supported by a finite A ⊂ 𝔸). If P is A-orbit-finite, then P has a minimal element, as we can consider the finite poset of A-orbits and find a minimal A-orbit. Here we use the notion of an A-orbit, i.e., an orbit defined over permutations that fix A. (See [33, Chapter 5] for details.)

Ad 1. Consider the set S = {x ∈ X | x ≤ a, x ≰ b}. This is a finitely supported and supp(S)-orbit-finite set, hence it has some minimal element m ∈ S. We shall prove that m is join-irreducible in X. Let 𝐱 ⊆ X be finitely supported and assume that x′ < m for each x′ ∈ 𝐱. Note that x′ < m ≤ a, and so x′ ∉ S (otherwise m would not be minimal). Hence x′ ≤ b (by definition of S). So ⋁𝐱 ≤ b, and so ⋁𝐱 ∉ S, which concludes that ⋁𝐱 ≠ m, and so ⋁𝐱 < m as required.

Ad 2. Consider the set T = {x ∈ JI(X) | x ≤ a}. This set is finitely supported, so we may define the element b = ⋁T ∈ Pfs(Z). It is clear that b ≤ a; we shall prove equality by contradiction. Suppose a ≰ b; then by (1.) there is a join-irreducible x such that x ≤ a and x ≰ b. By the first property of x we have x ∈ T, so that x ≰ b = ⋁T is a contradiction. We conclude that a = b, i.e. a = ⋁T as required. ◀
▶ Lemma 4.8. Let Y ⊆ X ⊆ Pfs(Z) be equivariant and suppose that X ⊆ ⟨JI(Y)⟩. Then JI(Y) = JI(X).

Proof. (⊇) Let x ∈ X be join-irreducible in X. Suppose that x = ⋁𝐲 for some finitely supported 𝐲 ⊆ Y. Note that also 𝐲 ⊆ X. Then x = y₀ for some y₀ ∈ 𝐲, and so x is join-irreducible in Y.

(⊆) Let y ∈ Y be join-irreducible in Y. Suppose that y = ⋁𝐱 for some finitely supported 𝐱 ⊆ X. Note that every element x ∈ 𝐱 is a union of elements in JI(Y) (by the assumption X ⊆ ⟨JI(Y)⟩). Take 𝐲ₓ = {y′ ∈ JI(Y) | y′ ≤ x}; then we have x = ⋁𝐲ₓ and
    y = ⋁𝐱 = ⋁{⋁𝐲ₓ | x ∈ 𝐱} = ⋁{y₀ | y₀ ∈ 𝐲ₓ, x ∈ 𝐱}.
The last set is a finitely supported subset of Y, and so there is a y₀ in it such that y = y₀. Moreover, this y₀ is below some x₀ ∈ 𝐱, which gives y₀ ≤ x₀ ≤ y. We conclude that y = x₀ for some x₀ ∈ 𝐱. ◀
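A small finite sanity check of the statement can be run directly. The example below is our own illustration, not from the paper: when Y ⊆ X and every member of X is a union of join-irreducibles of Y, the join-irreducibles computed within Y and within X coincide.

```python
# Our own finite sanity check of Lemma 4.8 (example data is ours): Y is
# contained in X, every member of X is a union of join-irreducibles of Y,
# and the join-irreducibles computed within Y and within X coincide.
from itertools import chain

def join_irreducibles(family):
    family = [frozenset(s) for s in family]
    return {x for x in family
            if x and x != frozenset(chain.from_iterable(y for y in family if y < x))}

Y = [{1}, {2}]
X = [{1}, {2}, {1, 2}]         # {1, 2} = {1} ∪ {2} is generated by JI(Y)
assert join_irreducibles(Y) == {frozenset({1}), frozenset({2})}
assert join_irreducibles(X) == join_irreducibles(Y)
```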
▶ Corollary 4.11. The construction above defines a canonical residual automaton with the following uniqueness property: it has the minimal number of orbits of states and the maximal number of orbits of transitions.

Proof. State minimality follows from Corollary 4.9, where we note that the states of any residual automaton accepting L form a generating subset of Der(L). Maximality of transitions follows from the fact that the automaton is saturated, meaning that no transitions can be added without changing the language. ◀
▶ Example 4.13. All the languages in the following ascending chain are join-irreducible:
    Lng < a⁻¹Lng < (ba)⁻¹Lng < (cba)⁻¹Lng < ⋯
Proof. Consider the word w = aₖ ⋯ a₁a₀ with k ≥ 1 and all aᵢ distinct atoms. We will prove that w⁻¹Lng is join-irreducible in Der(Lng), by considering all u⁻¹Lng ⊆ w⁻¹Lng.

Observe that if u is a suffix of w, then u⁻¹Lng ⊆ w⁻¹Lng. This is easily seen from the given automaton, since it may skip any prefix. We now show that u being a suffix of w is also a necessary condition.

First, suppose that u contains an atom a different from all aᵢ. If it is the last symbol of u, then aaa₀ ∈ u⁻¹Lng, but aaa₀ ∉ w⁻¹Lng. If a is succeeded by b (not necessarily distinct), then either aa or aa₀ is in u⁻¹Lng. But neither aa nor aa₀ is in w⁻¹Lng. This shows that for u⁻¹Lng ⊆ w⁻¹Lng, we necessarily have that u ∈ {a₀, …, aₖ}∗. (This also means that automatically supp(u⁻¹Lng) ⊆ supp(w⁻¹Lng).)

Second, when u = ε, we have u⁻¹Lng ⊆ w⁻¹Lng. And for |u| = 1: if u = a₀, then u⁻¹Lng ⊆ w⁻¹Lng; if u = aᵢ with i > 0, then aᵢaᵢaᵢ₋₁ ∈ u⁻¹Lng, but that word is not in w⁻¹Lng. This shows that for u⁻¹Lng ⊆ w⁻¹Lng with |u| ≤ 1, we necessarily have that u is a suffix of w.

Third, we prove the same for |u| ≥ 2. We first consider which bigrams may occur in u. Suppose that u contains a bigram aᵢaⱼ with i > 0 and j ≠ i − 1. Then aᵢaᵢ₋₁ is in u⁻¹Lng, but not in w⁻¹Lng. Suppose that u contains a₀aᵢ (i > 0) or a₀a₀; then u⁻¹Lng contains either a₀a₀ or a₀a₁, respectively. Neither of these words is in w⁻¹Lng. This shows that u⁻¹Lng ⊆ w⁻¹Lng implies that u may only contain the bigrams aᵢaᵢ₋₁. In particular, these bigrams compose in a unique way, so u is a (contiguous) subword of w whenever u⁻¹Lng ⊆ w⁻¹Lng.

Continuing, suppose that u ends in the bigram aᵢ₊₁aᵢ with i > 0. Then we have aᵢaᵢaᵢ₋₁ in u⁻¹Lng, but not in w⁻¹Lng. This shows that u has to end in a₁a₀. That is, for u⁻¹Lng ⊆ w⁻¹Lng with |u| ≥ 2, we necessarily have that u is a suffix of w.

So far, we have shown that
    {u | u⁻¹Lng ⊆ w⁻¹Lng} = {u | u is a suffix of w}.
To see that w⁻¹Lng is indeed join-irreducible, we consider the join X = ⋁{u⁻¹Lng | u is a strict suffix of w}. Note that aₖaₖ ∉ X, but aₖaₖ ∈ w⁻¹Lng. We conclude that w⁻¹Lng ≠ ⋁{u⁻¹Lng | u⁻¹Lng ⊊ w⁻¹Lng} as required. ◀
▶ Proposition 5.1. Universality for residual nominal automata is decidable. Formally: given
a residual automaton A, it is decidable whether L(A) = Σ∗.

Proof. In the constructions below, we use computation with atoms. This is a computation
paradigm which allows algorithmic manipulation of infinite – but orbit-finite – nominal
sets. For instance, it allows looping over such a set in finite time. Important here is that
this paradigm is equivalent to regular computability (see [7]) and implementations exist to
compute with atoms [23, 24].

J. Moerman and M. Sammartino

We will sketch an algorithm that, given a residual automaton A, answers whether
L(A) = Σ∗. The algorithm decides negatively in the following cases:

• I = ∅. In this case the language accepted by A is empty.

• Suppose there is a q ∈ Q with q ∉ F. By residuality we have L(q) = w−1 L(A) for some
w. Note that q is not accepting, so that ε ∉ w−1 L(A). Put differently: w ∉ L(A). (We
note that w is not used by the algorithm. It is only needed for the correctness.)

• Suppose there is a q ∈ Q and a ∈ Σ such that δ(q, a) = ∅. Again L(q) = w−1 L(A) for
some w. Note that a is not in L(q). This means that wa is not in the language.

When none of these three cases holds, the algorithm decides positively. We shall prove that
this is indeed the correct decision. If none of the above conditions hold, then I ≠ ∅, Q = F,
and for all q ∈ Q, a ∈ Σ we have δ(q, a) ≠ ∅. Here we can prove that the language of each
state is L(q) = Σ∗. Given that there is an initial state, the automaton accepts Σ∗.

Note that the operations on sets performed in the above cases all terminate, because they
all involve orbit-finite sets. ◀
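For ordinary finite NFAs the same three checks can be written out directly. The sketch below is a finite analogue of the orbit-finite algorithm; the function name and the dictionary-based automaton encoding are illustrative, not from the paper.

```python
# Sketch of the universality check for *residual* automata, restricted to
# ordinary finite NFAs (the paper's algorithm runs on orbit-finite sets).
# The three negative cases mirror the proof: empty initial set, a
# non-accepting state, or a state missing an outgoing transition.

def residual_universal(states, alphabet, initial, final, delta):
    """delta maps (state, letter) to a set of successor states."""
    if not initial:                                  # case 1: I is empty
        return False
    if any(q not in final for q in states):          # case 2: some q not in F
        return False
    for q in states:                                 # case 3: delta(q, a) empty
        for a in alphabet:
            if not delta.get((q, a), set()):
                return False
    return True  # otherwise every state accepts Sigma*

# A residual automaton accepting all of {a, b}*: one accepting, total state.
print(residual_universal({0}, {"a", "b"}, {0}, {0},
                         {(0, "a"): {0}, (0, "b"): {0}}))  # True
```

Note that the check never needs the anchor words w; they appear only in the correctness argument, exactly as in the proof.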
▶ Proposition 5.2. The problem of determining whether a given nondeterministic nominal
automaton is residual is undecidable.

Proof. The construction is inspired by [14, Proposition 8.4].⁸ We show undecidability by
reducing the universality problem for nominal automata to the residuality problem.

Let A = (Σ, Q, I, F, δ) be a nominal (nondeterministic) automaton on the alphabet Σ.
We first extend the alphabet:

    Σ′ = Σ ∪ {q | q ∈ Q} ∪ {q̄ | q ∈ Q} ∪ {$, #},

where we assume the new symbols to be disjoint from Σ. We define A′ = (Σ′, Q′, I′, F′, δ′) by

    Q′ = {q | q ∈ Q} ∪ {q̄ | q ∈ Q} ∪ {⊤, x, y}
    I′ = {q̄ | q ∈ Q} ∪ {x, y}
    F′ = {q | q ∈ F} ∪ {⊤}
    δ′ = {(q, a, q′) | (q, a, q′) ∈ δ} ∪ {(q̄, q, q) | q ∈ Q} ∪ {(q̄, q̄, q̄) | q ∈ Q}
       ∪ {(x, $, ⊤), (x, #, x), (y, #, y)} ∪ {(⊤, a, ⊤) | a ∈ Σ} ∪ {(y, $, i) | i ∈ I}

See Figure 7 for a sketch of the automaton A′. The blue part is a copy of the original
automaton. The red part forces the original states to be residual, by providing anchors to
each state. Finally, the orange part is the interesting part. The key players are states x and y
with their languages L(y) ⊆ L(x). Note that their languages are equal if and only if A is
universal.

Before we assume anything about A, let us analyse A′. In particular, let us consider
whether the residuality property holds for each state. For the original states of A the property
holds, as we can provide anchors: all the states q and q̄ are anchored by the words q and q̄
respectively. Then we consider the states x and ⊤: their languages are L(⊤) = Σ∗ = $−1 L(A′)
and L(x) = #−1 L(A′) (see Figure 7). The only remaining state for which we do not yet
know whether the residuality property holds is state y.

If L(A) = Σ∗ (i.e. the original automaton is universal), then we note that L(y) = L(x).
In this case, L(y) = #−1 L(A′). So, in this case, A′ is residual.

⁸ They prove that checking residuality for NFAs is PSpace-complete via a reduction from universality.
  Instead of using NFAs, they use a union of n DFAs. This would not work in the nominal setting.

CONCUR 2020

Residual Nominal Automata

Figure 7: Sketch of the automaton A′ constructed in the proof of Proposition 5.2.

Suppose that A′ is residual. Then L(y) = w−1 L(A′) for some word w. Provided that L(A)
is not empty, there is some u ∈ L(A). So we know that $u ∈ L(y). This means that the word
w cannot start with a ∈ Σ, with q or q̄ for q ∈ Q, or with $, as their derivatives do not
contain $u. The only possibility is that w = #^k for some k > 0. This implies L(y) = L(x),
meaning that the language of A is universal.

This proves that A is universal iff A′ is residual. Moreover, the construction A ↦ A′ is
effective, as it performs computations with orbit-finite sets. ◀
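The reduction can be replayed for ordinary finite NFAs. The sketch below is illustrative (the tagged-tuple encoding of the fresh letters and the state names "TOP", "x", "y" are my own); it builds A′ and checks the key property of state y, namely that from y the word $u is accepted exactly when u ∈ L(A).

```python
# Finite (non-nominal) sketch of the A -> A' construction from the proof of
# Proposition 5.2. New letters are encoded as tagged tuples; "TOP" plays the
# role of the sink state accepting all of Sigma*.

def build_A_prime(sigma, states, initial, final, delta):
    d = {}
    def add(p, a, q):
        d.setdefault((p, a), set()).add(q)
    for (p, a), qs in delta.items():                  # blue part: copy of A
        for q in qs:
            add(p, a, q)
    for q in states:                                  # red part: anchors
        add(("bar", q), ("st", q), q)
        add(("bar", q), ("barst", q), ("bar", q))
    add("x", "$", "TOP"); add("x", "#", "x"); add("y", "#", "y")
    for a in sigma:                                   # orange part
        add("TOP", a, "TOP")
    for i in initial:
        add("y", "$", i)
    init = {("bar", q) for q in states} | {"x", "y"}
    fin = set(final) | {"TOP"}
    return init, fin, d

def accepts_from(current, word, d, fin):
    for a in word:
        current = {q for p in current for q in d.get((p, a), set())}
    return bool(current & fin)

# A accepts the words over {a, b} that end in 'a'.
delta = {("s", "a"): {"s", "t"}, ("s", "b"): {"s"}}
init, fin, d = build_A_prime({"a", "b"}, {"s", "t"}, {"s"}, {"t"}, delta)
print(accepts_from({"y"}, ["$", "a", "b", "a"], d, fin))  # True:  aba in L(A)
print(accepts_from({"y"}, ["$", "a", "b"], d, fin))       # False: ab not in L(A)
```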
▶ Theorem 6.4. There is an algorithm which query learns residual nominal languages.

Proof. As explained in the text, we modify the νNL* algorithm from [28]: when the table is
not closed, we not only add the missing words, but all the words of the same length. This
guarantees that the algorithm finds rows for all join-irreducible derivatives, i.e., all states of
the canonical residual automaton.

The pseudocode is given in Algorithm 1, where the modifications to νNL* are highlighted
in red. We briefly explain the notation. An observation table T is defined by a set of row
(resp. column) indices S (resp. E). The value T(s, e) is given by L(se) (we may compute
this via membership queries). We denote the set of rows by Rows(S, E) := {row(s) | s ∈
SΣ ∪ S}, where row(s)(e) = T(s, e). Note that Rows(S, E) also includes rows for one-letter
extensions. The set of rows labelled by S is denoted by Rows⊤(S, E) := {row(s) | s ∈ S}.
The set Rows(S, E) is a poset, ordered by r1 ⊑ r2 iff r1(e) ≤ r2(e) for all e ∈ E. To
construct a hypothesis N(S, E), we use the construction from Theorem 4.10, where
Rows(S, E) plays the role of Der(L).

Algorithm 1 Modified nominal NL* algorithm for Theorem 6.4.

Modified νNL* learner
 1: S, E = {ε}
 2: repeat
 3:   while (S, E) is not residually-closed or not residually-consistent
 4:     if (S, E) is not residually-closed
 5:       find s ∈ S, a ∈ A such that row(sa) ∈ JI(Rows(S, E)) \ Rows⊤(S, E)
 6:       k = length of the word sa
 7:       S = S ∪ Σ≤k
 8:     if (S, E) is not residually-consistent
 9:       find s1, s2 ∈ S, a ∈ A, and e ∈ E such that row(s1) ⊑ row(s2) and
          L(s1 ae) = 1, L(s2 ae) = 0
10:       E = E ∪ orb(ae)
11:   Make the conjecture N(S, E)
12:   if the Teacher replies no, with a counter-example t
13:     E = E ∪ {orb(t′) | t′ is a suffix of t}
14: until the Teacher replies yes to the conjecture N(S, E).
15: return N(S, E)

We can give a bound on the number of equivalence queries. Given an orbit-finite nominal
set X, let |X| be the number of its orbits. Then equivalence queries are bounded by O(m +
|Σ≤m+1| × k), where m is the length of the longest characterising word and k is the maximum
support size of the canonical residual automaton. Intuitively, each of the rows in the table
could be a separate state, and for each state there is some work to be done, concerning
learning the right support and local symmetries (see [27] for details on this). ◀
Proceedings of Machine Learning Research 93:54–66, 2019
International Conference on Grammatical Inference

Learning Product Automata

Joshua Moerman                                    joshua.moerman@cs.ru.nl
Institute for Computing and Information Sciences
Radboud University
Nijmegen, the Netherlands

Editors: Olgierd Unold, Witold Dyrka, and Wojciech Wieczorek

Abstract
We give an optimisation for active learning algorithms, applicable to learning Moore
machines with decomposable outputs. These machines can be decomposed themselves by
projecting on each output. This results in smaller components that can then be learnt with
fewer queries. We give experimental evidence that this is a useful technique which can
reduce the number of queries substantially; only in some cases is the performance worsened
by the slight overhead. Compositional methods are widely used throughout engineering,
and the decomposition presented in this article promises to be particularly interesting for
learning hardware systems.
Keywords: query learning, product automata, composition
1. Introduction

Query learning (or active learning) is becoming a valuable tool in the engineering of both
hardware and software systems (Vaandrager, 2017). Indeed, applications can be found in
a broad range of areas: finding bugs in network protocols as shown by Fiterău-Broştean
et al. (2016, 2017), assisting with refactoring legacy software as shown by Schuts et al.
(2016), and reverse engineering bank cards by Chalupar et al. (2014).

These learning techniques originate from the field of grammatical inference. One of the
crucial steps in applying them to black-box systems was to move from deterministic finite
automata to deterministic Moore or Mealy machines, capturing reactive systems with any
kind of output. With little adaptation, the algorithms work well, as shown by the many
applications. This is remarkable, since little specific knowledge is used besides the input
alphabet of actions.

Realising that composition techniques are ubiquitous in engineering, we aim to use more
structure of the system during learning. In the present paper we use the simplest type of
composition: we learn product automata, where the outputs of several components are
simply paired. Other types of composition, such as sequential composition of Mealy
machines, are discussed in Section 5.

To the best of the author's knowledge, this has not been done before explicitly.
Furthermore, libraries such as LearnLib (see Isberner et al., 2015) and libalf (see Bollig
et al., 2010b) do not include such functionality out of the box. Implicitly, however, it has
been done before. Rivest and Schapire (1994) use two tricks to reduce the size of some
automata in their paper "Diversity-based inference of finite automata". The first trick is to
look at the reversed automaton (in their terminology, the diversity-based automaton). The
second trick (which is not explicitly mentioned, unfortunately) is to have a different
automaton for each observable (i.e., output). In one of their examples the two tricks
combined give a reduction from ±10^19 states to just 54 states.

In this paper, we isolate this trick and use it in query learning. We give an extension
of L* which handles products directly, and we give a second algorithm which simply runs
two learners simultaneously. Furthermore, we argue that this is particularly interesting
in the context of model learning of hardware, as systems are commonly engineered in a
compositional way. We give preliminary experimental evidence that the technique works
and improves the learning process. As benchmarks, we learn (simulated) circuits which
provide several output bits.

© 2019 J. Moerman.

Figure 1: A Moore machine with two outputs (left) can be equivalently seen as two (poten-
tially smaller) Moore machines with a single output each (right).
2. Preliminaries

We use the formalism of Moore machines to describe our algorithms. Nonetheless, the
results can also be phrased in terms of Mealy machines.

Definition 1 A Moore machine is a tuple M = (Q, I, O, δ, o, q0) where Q, I and O are
finite sets of states, inputs and outputs respectively, δ : Q × I → Q is the transition function,
o : Q → O is the output function, and q0 ∈ Q is the initial state. We define the size of the
machine, |M|, to be the cardinality of Q.

We extend the definition of the transition function to words as δ : Q × I∗ → Q. The
behaviour of a state q is the map ⟦q⟧ : I∗ → O defined by ⟦q⟧(w) = o(δ(q, w)). Two states
q, q′ are equivalent if ⟦q⟧ = ⟦q′⟧. A machine is minimal if all states have different behaviour
and all states are reachable. We will often write ⟦M⟧ to mean ⟦q0⟧ and say that machines
are equivalent if their initial states are equivalent.

Definition 2 Given two Moore machines with equal input sets, M1 = (Q1, I, O1, δ1, o1, q01)
and M2 = (Q2, I, O2, δ2, o2, q02), we define their product M1 × M2 by:

    M1 × M2 = (Q1 × Q2, I, O1 × O2, δ, o, (q01, q02)),

where δ((q1, q2), a) = (δ1(q1, a), δ2(q2, a)) and o((q1, q2)) = (o1(q1), o2(q2)).

Figure 2: A state of the 8-bit register machine.

The product is formed by running both machines in parallel and letting I act on both
machines synchronously. The output of both machines is observed. Note that the product
Moore machine might have unreachable states, even if the components are reachable. The
product of more than two machines is defined by induction.

Let M be a machine with outputs in O1 × O2. By post-composing the output function
with the projection functions we get two machines, called components, M1 and M2, with
outputs in O1 and O2 respectively. This is depicted in Figure 1. Note that M is equivalent
to M1 × M2. If M and its components Mi are taken to be minimal, then we have
|M| ≤ |M1| · |M2| and |Mi| ≤ |M|. In the best case we have |Mi| = √|M|, and so the
behaviour of M can be described using only 2√|M| states, which is less than |M| (if
|M| > 4). With iterated products the reduction can be more substantial, as shown in the
following example. This reduction in state space is beneficial for learning algorithms.

We introduce basic notation: πi : A1 × A2 → Ai are the usual projection functions. For
a function f : X → A1 × A2 we use the shorthand πi f to denote πi ◦ f. As usual, uv denotes
the concatenation of strings u and v, and this is lifted to sets of strings: UV = {uv | u ∈
U, v ∈ V}. We define the set [n] = {1, . . . , n} and the set of Boolean values B = {0, 1}.
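Definitions 1 and 2 can be written out directly. A minimal sketch (the class and function names are illustrative, not from the paper):

```python
# Moore machines (Definition 1) and their synchronous product (Definition 2),
# with behaviour [[M]](w) = o(delta*(q0, w)).

class Moore:
    def __init__(self, delta, out, q0):
        self.delta, self.out, self.q0 = delta, out, q0

    def behaviour(self, word):
        q = self.q0
        for a in word:
            q = self.delta[(q, a)]
        return self.out[q]

def product(m1, m2, states1, states2, inputs):
    """Pair up states and outputs; I acts on both machines synchronously."""
    delta = {((q1, q2), a): (m1.delta[(q1, a)], m2.delta[(q2, a)])
             for q1 in states1 for q2 in states2 for a in inputs}
    out = {(q1, q2): (m1.out[q1], m2.out[q2])
           for q1 in states1 for q2 in states2}
    return Moore(delta, out, (m1.q0, m2.q0))

# m1 outputs the parity of the word length, m2 the parity of the number of a's.
m1 = Moore({(q, a): 1 - q for q in (0, 1) for a in "ab"}, {0: 0, 1: 1}, 0)
m2 = Moore({(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1},
           {0: 0, 1: 1}, 0)
p = product(m1, m2, (0, 1), (0, 1), "ab")
print(p.behaviour("aab"))  # (1, 0): odd length, even number of a's
```

Post-composing `p.behaviour` with the projections recovers the behaviours of m1 and m2, mirroring the decomposition in Figure 1.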
2.1. Example

We take the n-bit register machine example from Rivest and Schapire (1994). The state
space of the n-bit register machine Mn is given by n bits and a position of the reading/
writing head, see Figure 2. The inputs are commands to control the position of the head
and to flip the current bit. The output is the current bit vector. Formally, it is defined as
Mn = (B^n × [n], {L, R, F}, B^n, δ, o, i), where the initial state is i = ((0, . . . , 0), 1) and the
output is o(((b1, . . . , bn), k)) = (b1, . . . , bn). The transition function is defined such that L
moves the head to the left, R moves the head to the right (and wraps around on either
end), and F flips the current bit. Formally,

    δ(((b1, . . . , bn), k), L) = ((b1, . . . , bn), k − 1)   if k > 1,
                                ((b1, . . . , bn), n)       if k = 1,

    δ(((b1, . . . , bn), k), R) = ((b1, . . . , bn), k + 1)  if k < n,
                                ((b1, . . . , bn), 1)       if k = n,

    δ(((b1, . . . , bn), k), F) = ((b1, . . . , ¬bk, . . . , bn), k).

The machine Mn is minimal and has n · 2^n states. So although this machine has very
simple behaviour, learning it will require a lot of queries because of its size. Luckily, the
machine can be decomposed into smaller components. For each bit l, we define a component
Mn^l = (B × [n], {L, R, F}, B, δ^l, π1, (0, 1)) which only stores one bit and the head position.
The transition function δ^l is defined similarly as before on L and R, but only flips the bit
on F if the head is at position l (i.e., δ^l((b, l), F) = (¬b, l) and δ^l((b, k), F) = (b, k) if
k ≠ l).

The product Mn^1 × · · · × Mn^n is equivalent to Mn. Each of the components Mn^l is
minimal and has only 2n states. So by this decomposition, we only need 2n² states to
describe the whole behaviour of Mn. Note, however, that the product Mn^1 × · · · × Mn^n is
not minimal; many states are unreachable.
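The decomposition can be checked concretely. A small sketch encoding the definitions above (the function names are illustrative):

```python
# The n-bit register machine M_n and its one-bit components M_n^l, encoded
# directly from the definitions above; we check that the components jointly
# reproduce the output of the full machine.

def step_full(state, cmd, n):
    bits, k = state                     # k is the 1-based head position
    if cmd == "L":
        return (bits, k - 1 if k > 1 else n)
    if cmd == "R":
        return (bits, k + 1 if k < n else 1)
    return (tuple(b ^ (i == k - 1) for i, b in enumerate(bits)), k)  # F

def step_comp(state, cmd, n, l):
    b, k = state                        # component l stores one bit + head
    if cmd == "L":
        return (b, k - 1 if k > 1 else n)
    if cmd == "R":
        return (b, k + 1 if k < n else 1)
    return (b ^ (k == l), k)            # F flips only when the head is at l

def run(word, state, step):
    for c in word:
        state = step(state, c)
    return state

n = 3
for word in ["FRFRLF", "RRRF", "FLF"]:
    full_bits = run(word, ((0,) * n, 1), lambda s, c: step_full(s, c, n))[0]
    comp_bits = tuple(run(word, (0, 1),
                          lambda s, c, l=l: step_comp(s, c, n, l))[0]
                      for l in range(1, n + 1))
    assert full_bits == comp_bits
print("components agree with the full machine")
```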
3. Learning

We describe two approaches for active learning of product machines. One is a direct
extension of the well-known L* algorithm. The other reduces the problem to any active
learning algorithm, so that one can use more optimised algorithms.

We fix an unknown target machine M with a known input alphabet I and output
alphabet O = O1 × O2. The goal of the learning algorithm is to infer a machine equivalent
to M, given access to a minimally adequate teacher as introduced by Angluin (1987). The
teacher will answer the following two types of queries.

• Membership queries (MQs): The query consists of a word w ∈ I∗ and the teacher will
answer with the output ⟦M⟧(w) ∈ O.

• Equivalence queries (EQs): The query consists of a Moore machine H, the hypothesis,
and the teacher will answer with YES if M and H are equivalent; otherwise she will
answer with a word w such that ⟦M⟧(w) ≠ ⟦H⟧(w).

3.1. Learning product automata with an L* extension

We can use the general framework for automata learning as set up by van Heerdt et al.
(2017). The general account does not directly give concrete algorithms, but it does give
generalised definitions of closedness and consistency. The main data structure for the
algorithm is an observation table.

Definition 3 An observation table is a triple (S, E, T) where S, E ⊆ I∗ are finite sets of
words and T : S ∪ SI → O^E is defined by T(s)(e) = ⟦M⟧(se).

During the L* algorithm the sets S, E grow and T encodes the knowledge of ⟦M⟧ so far.
Definition 4 Let (S, E, T) be an observation table.

• The table is product-closed if for all t ∈ SI there exist s1, s2 ∈ S such that
πi T(t) = πi T(si) for i = 1, 2.

• The table is product-consistent if for i = 1, 2 and for all s, s′ ∈ S we have that
πi T(s) = πi T(s′) implies πi T(sa) = πi T(s′a) for all a ∈ I.

These definitions are related to the classical definitions of closedness and consistency as
shown in the following lemma. The converses of the first two points do not necessarily hold.
We also prove that if an observation table is product-closed and product-consistent, then a
well-defined product machine can be constructed which is consistent with the table.
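A minimal sketch of the two checks of Definition 4 on a concrete table; the target behaviour f and all names below are made up for illustration, not taken from the paper's benchmarks.

```python
# Product-closedness and product-consistency (Definition 4) checked on a
# small observation table. T maps (row word, column word) to an output pair.

def is_product_closed(S, I, E, T):
    return all(
        any(all(T[(s, e)][i] == T[(t, e)][i] for e in E) for s in S)
        for t in {s + a for s in S for a in I} for i in (0, 1))

def is_product_consistent(S, I, E, T):
    for i in (0, 1):
        for s1 in S:
            for s2 in S:
                if all(T[(s1, e)][i] == T[(s2, e)][i] for e in E):
                    if any(T[(s1 + a, e)][i] != T[(s2 + a, e)][i]
                           for a in I for e in E):
                        return False
    return True

f = lambda w: (len(w) % 2, w.count("a") % 2)   # made-up target behaviour
I, E = {"a", "b"}, {""}
def table(S):
    rows = set(S) | {s + a for s in S for a in I}
    return {(r, e): f(r + e) for r in rows for e in E}

print(is_product_closed({""}, I, E, table({""})))   # False: row "a" unmatched
S = {"", "a", "b"}
print(is_product_closed(S, I, E, table(S)),
      is_product_consistent(S, I, E, table(S)))     # True True
```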
Algorithm 1 The product-L* algorithm.
 1: Initialise S and E to {ε}
 2: Initialise T with MQs
 3: repeat
 4:   while (S, E, T) is not product-closed or -consistent do
 5:     if (S, E, T) not product-closed then
 6:       find t ∈ SI such that there is no s ∈ S with πi T(t) = πi T(s) for some i
 7:       add t to S and fill the new row using MQs
 8:     if (S, E, T) not product-consistent then
 9:       find s, s′ ∈ S, a ∈ I and e ∈ E such that πi T(s) = πi T(s′) but
          πi T(sa)(e) ≠ πi T(s′a)(e) for some i
10:       add ae to E and fill the new column using MQs
11:   Construct H (by Lemma 6)
12:   if EQ(H) gives a counterexample w then
13:     add w and all its prefixes to S
14:     fill the new rows with MQs
15: until EQ(H) = YES
16: return H
Lemma 5 Let OT = (S, E, T) be an observation table and let πi OT = (S, E, πi T) be a
component. The following implications hold.

1. OT is closed =⇒ OT is product-closed.
2. OT is consistent ⇐= OT is product-consistent.
3. OT is product-closed ⇐⇒ πi OT is closed for each i.
4. OT is product-consistent ⇐⇒ πi OT is consistent for each i.

Proof (1) If OT is closed, then each t ∈ SI has an s ∈ S such that T(t) = T(s). This
implies in particular that πi T(t) = πi T(s), as required. (In terms of the definition, this
means we can take s1 = s2 = s.)

(2) Let OT be product-consistent and s, s′ ∈ S such that T(s) = T(s′). We then know
that πi T(s) = πi T(s′) for each i and hence πi T(sa) = πi T(s′a) for each i and a. This
means that T(sa) = T(s′a) as required.

Statements (3) and (4) just rephrase the definitions.
Lemma 6 Given a product-closed and -consistent table we can define a product Moore
machine consistent with the table, where each component is minimal.

Proof If the table OT is product-closed and -consistent, then by the previous lemma, the
tables πi OT are closed and consistent in the usual way. For these tables we can use the
construction of Angluin (1987). As a result we get a minimal machine Hi which is
consistent with the table πi OT. Taking the product of these gives a machine which is
consistent with OT. (Beware that this product is not necessarily the minimal machine
consistent with OT.)
Algorithm 2 Learning product machines with other learners.
 1: Initialise two learners L1 and L2
 2: repeat
 3:   while Li queries MQ(w) do
 4:     forward MQ(w) to the teacher and get output o
 5:     return πi o to Li
      {at this point both learners have constructed a hypothesis}
 6:   Let Hi be the hypothesis of Li
 7:   Construct H = H1 × H2
 8:   if EQ(H) returns a counterexample w then
 9:     if ⟦H1⟧(w) ≠ π1 ⟦M⟧(w) then
10:       return w to L1
11:     if ⟦H2⟧(w) ≠ π2 ⟦M⟧(w) then
12:       return w to L2
13: until EQ(H) = YES
14: return YES to both learners
15: return H
The product-L* algorithm (Algorithm 1) resembles the original L* algorithm, but uses
the new notions of closed and consistent. Its termination follows from the fact that L*
terminates on both components.

By Lemma 5 (1) we note that the algorithm does not need more rows than we would
need by running L* on M. By point (4) of the same lemma, we find that it does not need
more columns than L* would need on each component combined. This means that in the
worst case, the table is twice as big as it would be for the original L*. However, in good
cases (such as the running example), the table is much smaller, as the number of rows is
smaller for each component and the columns needed for each component may be similar.

3.2. Learning product automata via a reduction

The previous algorithm constructs two machines from a single table. This suggests that we
can also run two learning algorithms to construct two machines. We lose the fact that the
data structure is shared between the learners, but we gain that we can use more efficient
algorithms than L* without any effort.

Algorithm 2 is the algorithm for learning product automata via this reduction. It runs
two learning algorithms at the same time. All membership queries are passed directly to
the teacher and only the relevant output is passed back to the learner. (In the
implementation, the query is cached, so that if the other learner poses the same query, it
can be answered immediately.) If both learners are done posing membership queries, they
will pose an equivalence query, at which point the algorithm constructs the product
automaton. If the equivalence query returns a counterexample, the algorithm forwards it
to the learners.

The crucial observation is that a counterexample is necessarily a counterexample for at
least one of the two learners. (If at a certain stage only one learner makes an error, we keep
the other learner suspended, as we may obtain a counterexample for that one later on.)
This observation means that at least one of the learners makes progress and will eventually
converge. Hence, the whole algorithm will converge.

In the worst case, twice as many queries will be posed, compared to learning the whole
machine at once. (This is because learning the full machine also learns its components.)
In good cases, such as the running example, it requires far fewer queries. Typical learning
algorithms require roughly O(n²) membership queries, where n is the number of states in
the minimal machine. For the example Mn this bound gives O((n · 2^n)²) = O(n² · 2^2n)
queries. When learning the components Mn^l with the above algorithm, the bound gives
just O((2n)² + · · · + (2n)²) = O(n³) queries.
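These bounds are easy to instantiate. A quick illustration of the gap for a few values of n (constants are ignored, so the numbers are only indicative):

```python
# Indicative MQ bounds: learning M_n monolithically scales as (n * 2^n)^2,
# while learning its n components of 2n states each scales as n * (2n)^2.
for n in (4, 6, 8):
    whole = (n * 2 ** n) ** 2
    parts = n * (2 * n) ** 2
    print(n, whole, parts)   # the gap grows exponentially in n
```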
4. Experiments

We have implemented the algorithm via reduction in LearnLib.¹ As we expect the
reduction algorithm to be the more efficient and simpler of the two, we leave an
implementation of the direct extension of L* as future work. The implementation handles
products of any size (as opposed to only products of two machines). Additionally, the
implementation also works on Mealy machines, and this is used for some of the
benchmarks.

In this section, we compare the product learner with a regular learning algorithm. We
use the TTT algorithm by Isberner et al. (2014) for the comparison and also as the
learners used in Algorithm 2. We measure the number of membership and equivalence
queries. The results can be found in Table 1.

The equivalence queries are implemented by random sampling so as to imitate the
intended application of learning black-box systems. This way, an exact learning algorithm
turns into a PAC (probably approximately correct) algorithm. Efficiency is typically
measured by the total number of input actions, which also accounts for the length of the
membership queries (including the resets). This is a natural measure in the context of
learning black-box systems, as each action requires some amount of time to perform.

We evaluated the product learning algorithm on the following two classes of machines.

n-bit register machine The machines Mn are as described before. We note that the
product learner is much more efficient, as expected.

Circuits In addition to the (somewhat artificial) examples Mn, we use circuits which
appeared in the logic synthesis workshops (LGSynth89/91/93), part of the ACM/SIGDA
benchmarks.² These models have been used as benchmarks before for FSM-based testing
methods by Hierons and Türker (2015) and describe the behaviour of real-world circuits.
The circuits have bit vectors as outputs, and can hence be naturally decomposed by
taking each bit individually. As an example, Figure 3 depicts one of the circuits (bbara).
The behaviour of this particular circuit can be modelled with seven states, but when
restricting to each individual output bit, we obtain two machines of just four states. For
the circuits bbsse and mark1, we additionally regrouped bits together in order to see how
the performance changes when we decompose differently.

1. The implementation and models can be found on-line at https://gitlab.science.ru.nl/moerman/
   learning-product-automata.
2. The original files describing these circuits can be found at https://people.engr.ncsu.edu/brglez/
   CBL/benchmarks/.
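The random-sampling equivalence oracle described above can be sketched as follows; the sample count, word-length bound, and all names are illustrative parameters, not those of the LearnLib implementation.

```python
import random

# PAC-style equivalence check: sample random input words and compare the
# hypothesis against the system under learning (both given as functions from
# words to outputs); return a counterexample word if one is found.

def random_eq_oracle(hypothesis, system, alphabet,
                     samples=1000, max_len=20, seed=42):
    rng = random.Random(seed)
    for _ in range(samples):
        w = [rng.choice(alphabet) for _ in range(rng.randint(0, max_len))]
        if hypothesis(w) != system(w):
            return w
    return None  # probably approximately equivalent

# Toy check: a hypothesis that is wrong on odd-length words of length >= 3.
system = lambda w: len(w) % 2
hypothesis = lambda w: len(w) % 2 if len(w) < 3 else 0
cex = random_eq_oracle(hypothesis, system, ["a", "b"])
print(cex is not None and system(cex) != hypothesis(cex))  # True
```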
Figure 3: The bbara circuit (left) has two output bits. This can be decomposed into two
smaller circuits with a single output bit (middle and right).
For some circuits the number of membership queries is reduced compared to a regular
learner. Unfortunately, the results are not as impressive as for the n-bit register machine.
An interesting case is ex3, where the number of queries is slightly increased, but the total
number of actions performed is substantially reduced. The number of actions needed in
total is actually reduced in all cases except for bbsse. This exception can be explained
by the fact that the biggest component of bbsse still has 25 states, which is close to the
original 31 states. We also note that the choice of decomposition matters: for both mark1
and bbsse it was beneficial to regroup components.

In Figure 4, we look at the size of each hypothesis generated during the learning process.
We note that, although each component grows monotonically, the number of reachable
states in the product does not grow monotonically. In this particular instance, where we
learn mark1, there was a hypothesis of 58 128 states, much bigger than the target machine
of 202 states. This is not an issue, as the teacher will allow it and answer the query
regardless. Even in the PAC model with membership queries, this poses no problem, as we
can still efficiently determine membership. However, in some applications the equivalence
queries are implemented with a model checker (e.g., in the work by Fiterău-Broştean et al.,
2016) or a sophisticated test generation tool. In these cases, the increased size of
intermediate hypotheses may be undesirable.

5. Discussion

We have shown two query learning algorithms which exploit a decomposable output. If
the output can be split, then the machine itself can also be decomposed into components.
As the preliminary experiments show, this can be a very effective optimisation for learning
black-box reactive systems. It should be stressed that the improvement of the optimisation
depends on the independence of the components. For example, the n-bit register machine
has nearly independent components and the reduction in the number of queries is big. The
more realistic circuits did not show such drastic improvements in terms of queries. When
Learning Product Automata

Machine   States   Components   Product learner             TTT learner
                                EQs     MQs   Actions       EQs     MQs   Actions
M2             8            2     3     100       621         5     115       869
M3            24            3     3     252     1 855         5     347     2 946
M4            64            4     8     456     3 025         6   1 058    13 824
M5           160            5     6     869     7 665        17   2 723    34 657
M6           384            6    11   1 383    12 870        25   6 250    90 370
M7           896            7    11   2 087    24 156        52  14 627   226 114
M8          2048            8    13   3 289    41 732       160  34 024   651 678
bbara          7            2     3     167     1 049         3     216     1 535
keyb          41            2    25  12 464   153 809        24   6 024   265 805
ex3           28            2    24   1 133     9 042        18     878    91 494
bbsse         31            7    20  14 239   111 791         8   4 872    35 469
mark1        202           16    30  16 712   145 656        67  15 192   252 874
bbsse*        31            4    19  11 648    89 935         8   4 872    35 469
mark1*       202            8    22  13 027   117 735        67  15 192   252 874

Table 1: Comparison of the product learner with an ordinary learner.
[Figure 4: log-scale plot of the number of states (10^0 to 10^4) against the hypothesis number (2 to 22); plot omitted.]

Figure 4: The number of states for each hypothesis while learning mark1.
taking the length of the queries into account as well (i.e., counting all actions performed on the system), we see an improvement for most of the test cases.

In the remainder of this section we discuss related ideas and future work.
5.1. Measuring independence

As the results show, the proposed technique is often beneficial, but not always. It would be useful to know in advance when decomposition pays off, which raises the question of how to measure the independence of components quantitatively. Such a measure could then be used by the learning algorithm to decide whether or not to decompose.
5.2. Generalisation to subsets of products

In some cases, we might know even more about our output alphabet. The output set O may be a proper subset of O1 × O2, indicating that some outputs can only occur “synchronised”. For example, we might have O = {(0, 0)} ∪ {(a, b) | a, b ∈ [3]}, that is, the output 0 for either component can only occur if the other component is also 0.

In such cases we can still use the above algorithm, but we may insist that the teacher only accepts machines with output in O for the equivalence queries (as opposed to outputs in {0, 1, 2, 3}²). When constructing H = H1 × H2 in line 7 of Algorithm 2, we can do a reachability analysis on H to check for non-allowed outputs. If such a trace exists, we know it is a counterexample for at least one of the two learners. With such traces we can fix the defect ourselves, without having to rely on the teacher.
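This reachability check is straightforward to implement. The following is only an illustrative sketch (not the implementation behind the experiments): it explores the product of two Moore-machine hypotheses breadth-first and returns a shortest access word to a state whose combined output falls outside the allowed set O. The dictionary-based machine encoding is an assumption made for the example.

```python
from collections import deque

def find_disallowed_trace(h1, h2, alphabet, allowed):
    """Search the product of two Moore-machine hypotheses for a reachable
    state whose combined output lies outside the allowed set O.
    Each hypothesis is a triple (initial state, delta, output) with
    delta[state][letter] -> state and output[state] -> output value.
    Returns an access word to such a state, or None if none is reachable."""
    (q1, d1, o1), (q2, d2, o2) = h1, h2
    seen = {(q1, q2)}
    queue = deque([(q1, q2, ())])          # breadth-first, so shortest trace first
    while queue:
        s1, s2, word = queue.popleft()
        if (o1[s1], o2[s2]) not in allowed:
            return word                    # counterexample for at least one learner
        for a in alphabet:
            t = (d1[s1][a], d2[s2][a])
            if t not in seen:
                seen.add(t)
                queue.append((t[0], t[1], word + (a,)))
    return None
```

Any trace returned here can be handed to the component learners directly, without consulting the teacher.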
5.3. Product DFAs

For two DFAs (Q1, δ1, F1, q0,1) and (Q2, δ2, F2, q0,2), a state in the product automaton is accepting if both components are accepting. In the formalism of Moore machines, the final states are determined by their characteristic function, and this means that the output is given by o(q1, q2) = o1(q1) ∧ o2(q2). Again, the components may be much smaller than the product, and this motivated Heinz and Rogers (2013) to learn (a subclass of) product DFAs. This type of product is more difficult to learn, as the two components are not directly observable. Such automata are also relevant in model checking, and some of the (open) problems are discussed by Kupferman and Mosheiff (2015).
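Because the output of the product is the conjunction of the component outputs, acceptance in the product can be computed by simply running the two components side by side, without ever building the product state space. A minimal sketch (the representation is illustrative, not taken from the cited work):

```python
def accepts(dfa, word):
    """Run a DFA, given as (initial state, transition map, accepting states),
    on a word and report acceptance."""
    q, delta, final = dfa
    for a in word:
        q = delta[q][a]
    return q in final

def product_accepts(dfa1, dfa2, word):
    """Acceptance in the product DFA: a word is accepted precisely when both
    components accept, i.e. o(q1, q2) = o1(q1) and o2(q2)."""
    return accepts(dfa1, word) and accepts(dfa2, word)
```

The learning difficulty mentioned above is visible here: a membership query only reveals the conjunction, never the two component answers separately.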
5.4. Learning automata in reverse

The main result of Rivest and Schapire (1994) was to exploit the structure of the so-called “diversity-based” automaton. This automaton may also be called the reversed Moore machine. Reversing provides a duality between reachability and equivalence. This duality is theoretically explored by Rot (2016) and Bonchi et al. (2014) in the context of Brzozowski’s minimization algorithm.

Let M R denote the reverse of M; then we have JM R K(w) = JM K(wR ). This allows us to give an L* algorithm which learns M R by posing membership queries with the words reversed. We computed M R for the circuit models and all but one of them was much larger than the original. This suggests that it might not be useful as an optimisation in learning hardware or software systems. However, a more thorough investigation is desired.
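Concretely, since JM R K(w) = JM K(wR ), a membership oracle for the reversed machine is obtained by reversing every queried word before passing it to the oracle for M. A minimal sketch of such a (hypothetical) wrapper:

```python
def reversed_membership_oracle(mq):
    """Wrap a membership oracle for M to obtain one for the reversed machine
    M^R, using the identity [[M^R]](w) = [[M]](w reversed). A learner driven
    by the wrapped oracle learns M^R."""
    return lambda word: mq(word[::-1])
```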
[Figure 5: block diagram of A; B, where input i enters A, the output of A is fed into B, and B produces output o; diagram omitted.]

Figure 5: The sequential composition A; B of two Mealy machines A and B.
5.5. Other types of composition

The case of learning a sequential composition is investigated by Abel and Reineke (2016). In their work, there are two Mealy machines, A and B, and the output of A is fed into B, see Figure 5. The goal is to learn a machine for B, assuming that A is known (i.e., white box). The oracle only answers queries for the sequential composition, which is defined formally as JA; BK(w) = JBK(JAK(w)). Since B can only be interacted with through A, we cannot use L* directly. The authors show how to learn B using a combination of L* and SAT solvers. Moreover, they give evidence that this is more efficient than learning A; B as a whole.
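The defining equation JA; BK(w) = JBK(JAK(w)) can be made concrete for an explicit Mealy-machine representation. The following sketch (with an assumed dictionary-based encoding, not Abel and Reineke's implementation) runs A on the input word and feeds the resulting output word into B:

```python
def mealy_run(machine, word):
    """Run a Mealy machine (initial state, transition map, output map) on an
    input word and return the output word it produces."""
    q, delta, out = machine
    outputs = []
    for a in word:
        outputs.append(out[q][a])
        q = delta[q][a]
    return outputs

def sequential_composition(a, b, word):
    """Semantics of the sequential composition A;B:
    [[A;B]](w) = [[B]]([[A]](w)), i.e. the output word of A is the input of B."""
    return mealy_run(b, mealy_run(a, word))
```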
An interesting generalisation of the above is to consider A as an unknown as well. The goal is to learn A and B simultaneously, while observing the outputs of B and the communication between the components. The authors conjecture that this would indeed be possible and result in a learning algorithm which is more efficient than learning A; B (private communication).
Another type of composition is used by Bollig et al. (2010a). Here, several automata are put in parallel and communicate with each other. The goal is not to learn a black box system, but to use learning when designing such a system. Instead of words, the teacher (i.e., designer in this case) receives message sequence charts which encode the processes and actions. Furthermore, they exploit partial order reduction in the learning algorithm.

We believe that a combination of our and the above compositional techniques can improve the scalability of learning black box systems. Especially in the domain of software and hardware we expect such techniques to be important, since the systems themselves are often designed in a modular way.
Acknowledgments

We would like to thank Nathanaël Fijalkow, Ramon Janssen, Gerco van Heerdt, Harco Kuppens, Alexis Linard, Alexandra Silva, Rick Smetsers, and Frits Vaandrager for proofreading this paper and providing useful feedback. Thanks to Andreas Abel for discussing the case of learning a sequential composition of two black box systems. Also thanks to anonymous reviewers for interesting references and comments.
Nominal Techniques and
Black Box Testing for
Automata Learning

Joshua Moerman
Work in the thesis has been carried out under the auspices of the research school IPA (Institute for Programming research and Algorithmics)

Printed by Gildeprint, Enschede
Typeset using ConTEXt MKIV
ISBN: 978–94–632–3696–6
IPA Dissertation series: 2019-06

Copyright © Joshua Moerman, 2019
www.joshuamoerman.nl
Nominal Techniques and Black Box
Testing for Automata Learning

Proefschrift

to obtain the degree of doctor
from Radboud University Nijmegen
on the authority of the rector magnificus prof. dr. J.H.J.M. van Krieken,
according to the decision of the council of deans,
to be defended in public on
Monday, 1 July 2019
at 16:30 precisely

by

Joshua Samuel Moerman
born on 1 October 1991
in Utrecht
Supervisors:
– prof. dr. F.W. Vaandrager
– prof. dr. A. Silva (University College London, United Kingdom)

Co-supervisor:
– dr. S.A. Terwijn

Members of the manuscript committee:
– prof. dr. B.P.F. Jacobs
– prof. dr. A.R. Cavalli (Télécom SudParis, France)
– prof. dr. F. Howar (Technische Universität Dortmund, Germany)
– prof. dr. S. Lasota (Uniwersytet Warszawski, Poland)
– dr. D. Petrișan (Université Paris Diderot, France)

Paranymphs:
– Alexis Linard
– Tim Steenvoorden
Samenvatting

Automata learning plays an ever greater role in the verification of software. During learning, a learning algorithm explores the behaviour of software. In principle this happens fully automatically, and the algorithm picks up interesting properties of the software by itself. This makes it possible to build a fairly precise model of the workings of the piece of software under scrutiny. Errors and unexpected behaviour of software can thereby be exposed.

In this thesis we first look at techniques for test generation. These are needed to give the learning algorithm a helping hand. After automatically exploring behaviour, the learning algorithm formulates a hypothesis that does not yet model the software well enough. To refine the hypothesis and continue learning, we need tests. Efficiency is central here: we want to test as little as possible, because testing costs time. On the other hand, we do have to test completely: if there is a discrepancy between the learned model and the software, we want to be able to pinpoint it with a test.

The first few chapters show how testing of automata works. We give a theoretical framework for comparing various existing n-complete test generation methods. On this basis we describe a new, efficient algorithm. This new algorithm is central to an industrial case study in which we learn a model of complex printer software from Océ. We also show how one of the subproblems – distinguishing states with the shortest possible input – can be solved efficiently.

The second theme in this thesis is the theory of formal languages and automata with infinite alphabets. This, too, is useful for automata learning. Software, and internet communication protocols in particular, often use “identifiers”, for instance to distinguish different users. Preferably we assume infinitely many such identifiers, since we do not know how many are needed for learning the automaton.

We show how the learning algorithms can easily be generalised to infinite alphabets by using nominal sets. In particular, this allows us to learn register automata. We then develop the theory of nominal automata further. We show how these structures can be implemented efficiently. And we give a special class of nominal automata that have a much smaller representation. This could be used to learn such automata faster.
Summary

Automata learning plays a more and more prominent role in the field of software verification. Learning algorithms are able to automatically explore the behaviour of software. By revealing interesting properties of the software, these algorithms can create models of the, otherwise unknown, software. These learned models can, in turn, be inspected and analysed, which often leads to finding bugs and inconsistencies in the software.

An important tool which we need when learning software is test generation. This is the topic of the first part of this thesis. After the learning algorithm has learned a model and constructed a hypothesis, test generation methods are used to validate this hypothesis. Efficiency is key: we want to test as little as possible, as testing may take valuable time. However, our tests have to be complete: if the hypothesis fails to model the software well, we better have a test which shows this discrepancy.

The first few chapters explain black box testing of automata. We present a theoretical framework in which we can compare existing n-complete test generation methods. From this comparison, we are able to define a new, efficient algorithm. In an industrial case study on embedded printer software, we show that this new algorithm works well for finding counterexamples for the hypothesis. Besides the test generation, we show that one of the subproblems – finding the shortest sequences to separate states – can be solved very efficiently.

The second part of this thesis is on the theory of formal languages and automata with infinite alphabets. This, too, is discussed in the context of automata learning. Many pieces of software make use of identifiers or sequence numbers. These are used, for example, in order to distinguish different users or messages. Ideally, we would like to model such systems with infinitely many identifiers, as we do not know beforehand how many of them will be used.

Using the theory of nominal sets, we show that learning algorithms can easily be generalised to automata with infinite alphabets. In particular, this shows that we can learn register automata. Furthermore, we deepen the theory of nominal sets. First, we show that, in a special case, these sets can be implemented in an efficient way. Second, we give a subclass of nominal automata which allow for a much smaller representation. This could be useful for learning such automata more quickly.
Acknowledgements

Foremost, I would like to thank my supervisors. Having three of them ensured that there were always enough ideas to work on, theory to understand, papers to review, seminars to attend, and chats to have. Frits, thank you for being a very motivating supervisor, pushing creativity, and being only a few meters away. It started with a small puzzle (trying a certain test algorithm to help with a case study), which was a great, hands-on start of my Ph.D. You introduced me to the field of model learning in a way that showcases both the theoretical and practical aspects.

Alexandra, thanks for introducing me to abstract reasoning about state machines, the coalgebraic way. Although not directly shown in this thesis, this way of thinking has helped, and you pushed me to pursue clear reasoning. Besides the theoretical things I’ve learned, you have also taught me many personal lessons inside and outside of academia; thanks for inviting me to London, Caribbean islands, hidden cocktail clubs, and the best food. And thanks for leaving me with Daniela and Matteo, who introduced me to nominal techniques, while you were on sabbatical.

Bas, thanks for broadening my understanding of the topics touched upon in this thesis. Unfortunately, we have no papers together, but the connections you showed to logic, computational learning, and computability theory have influenced the thesis nevertheless. I am grateful for the many nice chats we had.

I would like to thank the members of the manuscript committee, Bart, Ana, Falk, Sławek, and Daniela. Reading a thesis is undoubtedly a lot of work, so thank you for the effort and feedback you have given me. Thanks, also, to the additional members coming to Nijmegen to oppose during the defence, Jan Friso, Jorge, and Paul.

On the first floor of the Mercator building, I had the pleasure of spending four years with fun office mates. Michele, thanks for introducing me to the Ph.D. life, by always joking around. Hopefully, we can play a game of Briscola again. Alexis, many thanks for all the tasty proeverijen, whether it was beers, wines, poffertjes, kroketten, or anything else. Your French influences will be missed. Niels, thanks for the abstract nonsense and bashing on politics.

Next to our office was the office of Tim, with whom I had the pleasure of working from various coffee houses in Nijmegen. Further down the corridor, there was the office of Paul and Rick. Paul, thanks for being the kindest colleague I’ve had and for inviting us to your musical endeavours. Rick, thanks for the algorithmic sparring, we had a great collaboration. Was there a more iconic duo on our floor? A good contender would be Petra and Ramon. Thanks for the fun we had with ioco, together with Jan and Mariëlle. Nils, thanks for steering me towards probabilistic things and opening a door to Aachen. I am also very grateful to Jurriaan for bringing back some coalgebra and category theory to our floor, and hosting me in London. My other co-authors, Wouter, David, Bartek, Michał, and David, also deserve many credits for all the interesting discussions we had. Harco, thanks for the technical support. Special thanks go to Ingrid, for helping with the often-overlooked, but important, administrative matters.

Doing a Ph.D. would not be complete without a good amount of playing kicker, having borrels, and eating cakes at the iCIS institute. Thanks to all of you, Markus, Bram, Marc, Sam, Bas, Joost, Dan, Giso, Baris, Simone, Aleks, Manxia, Leon, Jacopo, Gabriel, Michael, Paulus, Marcos, Bas, and Henning.1

Thanks to the people I have met across the channel (which hopefully will remain part of the EU): Benni, Nath, Kareem, Rueben, Louis, Borja, Fred, Tobias, Paul, Gerco, and Carsten, for the theoretical adventure, but also for joining me to Phonox and other parties in London. I am especially thankful to Matteo and Emanuela for hosting me many times and to Hillary and Justin for accommodating me for three months each.

I had a lot of fun at the IPA events. I’m very thankful to Tim and Loek for organising these events. Special thanks to Nico and Priyanka for organising a Halloween social event with me. Also thanks to all the participants in the IPA events, you made it a lot of fun! My gratitude extends to all the people I have met at summer schools and conferences. I had a lot of fun learning about different cultures, languages, and different ways of doing research. Hope we meet again!

Besides all the fun research, I had a great time with my friends and family. We went to nice parties, had excellent dinners, and much more; thanks, Nick, Edo, Gabe, Saskia, Stijn, Sandra, Geert, Marco, Carmen, and Wesley. Thanks to Marlon, Hannah, Wouter, Dennis, Christiaan, and others from #RU for borrels, bouldering, and jams. Thanks to Ragnar, Josse, Julian, Jeroen, Vincent, and others from the BAPC for algorithmic fun.

Thanks to my parents, Kees and Irene, and my brother, David, and his wife, Germa, for their love and support. My gratitude extends to my family in law, Ine, Wim, Jolien and Jesse. My final words of praise go to Tessa, my wife; I am very happy to have you on my side. You inspire me in many ways, and I enjoy doing all the fun stuff we do. Thank you a lot.

1 In no particular order. These lists are randomised.
Contents

Samenvatting
Summary
Acknowledgements

1 Introduction
    Model Learning
    Applications of Model Learning
    Research challenges
    Black Box Testing
    Nominal Techniques
    Contributions
    Conclusion and Outlook

Part 1: Testing Techniques

2 FSM-based Test Methods
    Mealy machines and sequences
    Test generation methods
    Hybrid ADS method
    Overview
    Proof of completeness
    Related Work and Discussion

3 Applying Automata Learning to Embedded Control Software
    Engine Status Manager
    Learning the ESM
    Verification
    Conclusions and Future Work

4 Minimal Separating Sequences for All Pairs of States
    Preliminaries
    Minimal Separating Sequences
    Optimising the Algorithm
    Application in Conformance Testing
    Experimental Results
    Conclusion

Part 2: Nominal Techniques

5 Learning Nominal Automata
    Overview of the Approach
    Preliminaries
    Angluin’s Algorithm for Nominal DFAs
    Learning Non-Deterministic Nominal Automata
    Implementation and Preliminary Experiments
    Related Work
    Discussion and Future Work

6 Fast Computations on Ordered Nominal Sets
    Nominal sets
    Representation in the total order symmetry
    Implementation and Complexity of ONS
    Results and evaluation in automata theory
    Related work
    Conclusion and Future Work

7 Separation and Renaming in Nominal Sets
    Monoid actions and nominal sets
    A monoidal construction from Pm-sets to Sb-sets
    Nominal and separated automata
    Related and future work

Bibliography

Curriculum Vitae
Chapter 1

Introduction

When I was younger, I often learned how to play with new toys by messing about with them, by pressing buttons at random, observing their behaviour, pressing more buttons, and so on. Only resorting to the manual – or asking “experts” – to confirm my beliefs on how the toys work. Now that I am older, I do mostly the same with new devices, new tools, and new software. However, now I know that this is an established computer science technique, called model learning.

Model learning2 is an automated technique to construct a state-based model – often a type of automaton – from a black box system. The goal of this technique can be manifold: it can be used to reverse-engineer a system, to find bugs in it, to verify properties of the system, or to understand the system in one way or another. It is not just random testing: the information learned during the interaction with the system is actively used to guide following interactions. Additionally, the information learned can be inspected and analysed.

This thesis is about model learning and related techniques. In the first part, I present results concerning black box testing of automata. Testing is a crucial part in learning software behaviour and often remains a bottleneck in applications of model learning. In the second part, I show how nominal techniques can be used to learn automata over structured infinite alphabets. The study on nominal automata was directly motivated by work on learning network protocols which rely on identifiers or sequence numbers.

But before we get ahead of ourselves, we should first understand what we mean by learning, as learning means very different things to different people. In educational science, learning may involve concepts such as teaching, blended learning, and interdisciplinarity. Data scientists may think of data compression, feature extraction, and neural networks. In this thesis we are mostly concerned with software verification. But even in the field of verification several types of learning are relevant.

1 Model Learning

In the context of software verification, we often look at stateful computations with inputs and outputs. For this reason, it makes sense to look at words, or traces. For an alphabet Σ, we denote the set of words by Σ∗.

2 There are many names for the type of learning, such as active automata learning. The generic name “model learning” is chosen as a counterpoint to model checking.
The learning problem is defined as follows. There is some fixed, but unknown,
|
||
language ℒ ⊆ Σ∗ . This language may define the behaviour of a software component,
|
||
a property in model checking, a set of traces from a protocol, etc. We wish to infer a description of ℒ after only having observed a small part of this language. For example, we may have seen a hundred words belonging to the language and a few which do not belong to the language. Then concluding with a good description of ℒ is difficult, as we are missing information about the infinitely many words we have not observed.

Such a learning problem can be stated and solved in a variety of ways. In the applications we pursue in our research group, we often try to infer a model of a software component. (Chapter 3 describes such an application.) In these cases, a learning algorithm can interact with the software. So it makes sense to study a learning paradigm which allows for queries, and not just a data set of samples.

A typical query learning framework was established by Angluin (1987). In her framework, the learning algorithm may pose two types of queries to a teacher, or oracle:

Membership queries (MQ) The learner poses such a query by providing a word w ∈ Σ∗ to the teacher. The teacher will then reply whether w ∈ ℒ or not. This type of query is often generalised to richer outputs; in these cases, we consider ℒ : Σ∗ → O and the teacher replies with ℒ(w). In some papers, such a query is then called an output query.

Equivalence queries (EQ) The learner can provide a hypothesised description H of ℒ to the teacher. If the hypothesis is correct, the teacher replies with yes. If, however, the hypothesis is incorrect, the teacher replies with no together with a counterexample, i.e., a word which is in ℒ but not in the hypothesis or vice versa.

By posing many such queries, the learning algorithm is supposed to converge to a correct model. This type of learning is hence called exact learning. Angluin (1987) showed that one can do this efficiently for deterministic finite automata (DFAs), when ℒ is in the class of regular languages.
It should be clear why this is called query learning or active learning. The learning algorithm initiates interaction with the teacher by posing queries: it may construct its own data points and ask for their corresponding label. Active learning is in contrast to passive learning, where all observations are given to the algorithm up front.

Another paradigm which is relevant for our type of applications is PAC-learning with membership queries. Here, the algorithm can again use MQs as before, but the EQs are replaced by random sampling. So the allowed query is:

Random sample queries (EX) If the learner poses this query (there are no parameters), the teacher responds with a random word w together with its label, i.e., whether w ∈ ℒ or not. (Here, random means that the words are sampled by some probability distribution known to the teacher.)
Instead of requiring that the learner exactly learns the model, we only require the following. The learner should probably return a model which is approximate to the target. This gives the name probably approximately correct (PAC). Note that there are two uncertainties: the probable and the approximate part. Both parts are bounded by parameters, so one can determine the confidence.

As with many problems in computer science, we are also interested in the efficiency of learning algorithms. Instead of measuring time or space, we analyse the number of queries posed by an algorithm. Efficiency often means that we require a polynomial number of queries. But polynomial in what? The learner has no input, other than access to a teacher. We ask the algorithms to be polynomial in the size of the target (i.e., the size of the description which has yet to be learned). In the case of PAC learning, we also require them to be polynomial in the two confidence parameters.

Deterministic automata can be efficiently learned in the PAC model. In fact, any efficient exact learning algorithm with MQs and EQs can be transformed into an efficient PAC algorithm with MQs (see Kearns & Vazirani, 1994, exercise 8.1). For this reason, we mostly focus on the former type of learning in this thesis. The transformation from exact learning to PAC learning is implemented by simply testing the hypothesis with random samples. This can be postponed until we actually implement a learning algorithm and apply it.
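This transformation — replacing the equivalence query by random testing against membership queries — can be sketched in a few lines. The sampling distribution below (uniform length, then uniform letters) is our own arbitrary choice for illustration.

```python
import random

def random_word(alphabet, max_length, rng):
    """Sample a word: a uniformly random length, then uniform letters."""
    n = rng.randint(0, max_length)
    return "".join(rng.choice(alphabet) for _ in range(n))

def pac_equivalence(hypothesis, mq, alphabet=("a", "b"),
                    num_tests=1000, max_length=10, seed=0):
    """Approximate an equivalence query by comparing the hypothesis with
    membership queries on randomly sampled words."""
    rng = random.Random(seed)
    for _ in range(num_tests):
        w = random_word(alphabet, max_length, rng)
        if hypothesis(w) != mq(w):
            return False, w  # counterexample found
    return True, None  # probably approximately correct

target = lambda w: w.count("a") % 2 == 0
ok, cex = pac_equivalence(lambda w: True, target)
print(ok, cex)
```

If no mismatch is found among the samples, the hypothesis is accepted; the PAC guarantee then bounds the probability that it is still far from the target.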
When using only EQs, only MQs, or only EXs, there are hardness results for exact learning of DFAs. So the combinations MQs + EQs (for exact learning) and MQs + EXs (for PAC learning) have been carefully picked: they provide a minimal basis for efficient learning. See the book of Kearns and Vazirani (1994) for such hardness results and more information on PAC learning.

So far, all the queries are assumed to be just there. Somehow, these are existing procedures which we can invoke with MQ(w), EQ(H), or EX(). This is a useful abstraction when designing a learning algorithm. One can analyse the complexity (in terms of the number of queries) independently of how these queries are resolved. Nevertheless, at some point in time one has to implement them. In our case of learning software behaviour, membership queries are easily implemented: simply provide the word w to a running instance of the software and observe the output.3 Equivalence queries, however, are in general not doable. Even if we have the (machine) code, it is often way too complicated to check equivalence. That is why we resort to testing with EX queries.

The EX query from PAC learning normally assumes a fixed, unknown probability distribution on words. In our case, we choose and implement a distribution to test against. This cuts both ways: on the one hand, it allows us to only test behaviour we really care about; on the other hand, the results are only as good as our choice of distribution. We deviate even further from the PAC model as we sometimes change our distribution while learning. Yet, as applications show, this is a useful way of learning software behaviour.

3 In reality, it is a bit harder than this. There are plenty of challenges to solve, such as timing, choosing your alphabet, choosing the kind of observations to make, and being able to reliably reset the software.
2 Applications of Model Learning

Since this thesis contains only one real-world application of learning in Chapter 3, it is good to mention a few others. Although we remain in the context of learning software behaviour, the applications are quite different from each other. This is by no means a complete list.

Bug finding in protocols. A prominent example is by Fiterău-Broștean, et al. (2016). They learn models of TCP implementations – both client and server sides. Interestingly, they found bugs in the (closed source) Windows implementation. Later, Fiterău-Broștean and Howar (2017) also found a bug in the sliding window of the Linux implementation of TCP. Other protocols have been learned as well, such as the MQTT protocol by Tappler, et al. (2017), TLS by de Ruiter and Poll (2015), and SSH by Fiterău-Broștean, et al. (2017). Many of these applications reveal bugs by learning a model and subsequently applying model checking. The combination of learning and model checking was first described by Peled, et al. (2002).

Bug finding in smart cards. Aarts, et al. (2013) learn the software on smart cards of several Dutch and German banks. These cards use the EMV protocol, which is run on the card itself. So this is an example of a real black box system, where no other monitoring is possible and no code is available. No vulnerabilities were found, although each card had a slightly different state machine. The e.dentifier, a card reader implementing a challenge-response protocol, has been learned by Chalupar, et al. (2014). They built a Lego machine which could automatically press buttons, and the researchers found a security flaw in this card reader.

Regression testing. Hungar, et al. (2003) describe the potential of automata learning in regression testing. The aim is not to find bugs, but to monitor the development process of a system. By considering the differences between models at different stages, one can generate regression tests.

Refactoring legacy software. Model learning can also be used to verify refactored software. Schuts, et al. (2016) have applied this at a project within Philips. They learn both an old version and a new version of the same component. By comparing the learned models, some differences could be seen. This gave developers opportunities to solve problems before replacing the old component by the new one.
3 Research challenges

In this thesis, we will mostly see learning of deterministic automata or Mealy machines. Although this is limited, as many pieces of software require richer models, it has been successfully applied in the above examples. The limitations include the following.

– The system behaves deterministically.
– One can reliably reset the system.
– The system can be modelled with a finite state space. This also means that the model does not incorporate time or data.
– The input alphabet is finite.
– One knows when the target is reached.

Research challenge 1: Approximating equivalence queries. Having confidence in a learned model is difficult. We have PAC guarantees (as discussed before), but sometimes we may want to draw other conclusions. For example, we may require the hypothesis to be correct, provided that the real system is implemented with a certain number of states. Efficiency is important here: we want to obtain those guarantees fast and we want to quickly find counterexamples when the hypothesis is wrong. Test generation methods are the topic of the first part of this thesis. We will review existing algorithms and discuss new algorithms for test generation.

Research challenge 2: Generalisation to infinite alphabets. Automata over infinite alphabets are very useful for modelling protocols which involve identifiers or timestamps. Not only is the alphabet infinite in these cases; the state space is as well, since the values have to be remembered. In the second part of this thesis, we will see how nominal techniques can be used to tackle this challenge.

Being able to learn automata over an infinite alphabet is not new. It has been tackled, for instance, by Howar, et al. (2012), Bollig, et al. (2013) and in the theses of Aarts (2014), Cassel (2015), and Fiterău-Broștean (2018). In the first thesis, the problem is solved by considering abstractions, which reduce the alphabet to a finite one. These abstractions are automatically refined when a counterexample is presented to the algorithm. Fiterău-Broștean (2018) extends this approach to cope with “fresh values”, crucial for protocols such as TCP. In the thesis by Cassel (2015), another approach is taken: the queries are changed to tree queries. The approach in my thesis will be based on symmetries, which gives yet another perspective on the problem of learning such automata.
4 Black Box Testing

An important step in automata learning is equivalence checking. Normally, this is abstracted away and done by an oracle, but we intend to implement such an oracle ourselves for our applications. Concretely, the problem we need to solve is that of conformance checking4 as it was first described by Moore (1956).

The problem is as follows: given the description of a finite state machine and a black box system, does the system behave exactly as the description? We wish to determine this by running experiments on the system (as it is black box). It should be clear that this is a hopelessly difficult task, as an error can be hidden arbitrarily deep in the system. That is why we often assume some knowledge of the system. In this thesis we often assume a bound on the number of states of the system. Under these conditions, Moore (1956) already solved the problem. Unfortunately, his experiment is exponential in size, or in his own words: “fantastically large.”

Years later, Chow (1978) and Vasilevskii (1973) independently designed efficient experiments. In particular, the set of experiments is polynomial in the number of states. These techniques will be discussed in detail in Chapter 2. More background and other related problems, as well as their complexity results, are well exposed in a survey of Lee and Yannakakis (1994).
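The shape of these experiments (often called the W-method) can be sketched concretely: every test word is a concatenation p · m · w of a word p from a state cover, a middle part m ranging over all short input words, and a word w from a characterisation set. The sketch below uses our own names and a made-up toy instance; conventions differ in the literature on the exact length bound for the middle part, so take the bound here as one possible choice.

```python
from itertools import product

def w_method(state_cover, alphabet, charact_set, num_extra_states):
    """Generate a W-method style test suite: every test is a word
    p + m + w with p from the state cover, m any input word of length
    at most num_extra_states + 1, and w from the characterisation set."""
    middles = [
        "".join(m)
        for n in range(num_extra_states + 2)
        for m in product(alphabet, repeat=n)
    ]
    return {p + m + w
            for p in state_cover
            for m in middles
            for w in charact_set}

# Toy instance (ours): a two-state machine over {a, b} with state cover
# {"", "a"} and characterisation set {"a"}, allowing 0 extra states.
suite = w_method(["", "a"], ("a", "b"), ["a"], 0)
print(sorted(suite))
```

The size of the suite grows exponentially in the number of extra states allowed, but only polynomially in the number of states of the specification.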
[Figure 1.1 here: a four-state machine with states labelled “slow spinning”, “fast spinning”, and two “no spinning” states; the button symbols on the transitions were lost in extraction.]

Figure 1.1 Behaviour of a record player modelled as a finite state machine.

To give an example of conformance checking, we model a record player as a finite state machine. We will not model the audible output – that would depend not only on the device, but also on the record one chooses to play5. Instead, the only observation we can make is looking at how fast the turntable spins. The device has two buttons: a start-stop button and a speed button, which toggles between 33⅓ rpm and 45 rpm. When turned on, the system starts playing immediately at 33⅓ rpm – this is useful for DJing. The intended behaviour of the record player has four states as depicted in Figure 1.1.

4 Also known as machine verification or fault detection.
5 In particular, we have to add time to the model as one side of a record only lasts for roughly 25 minutes. Unless we take a record with sound on the locked groove, such as the Sgt. Pepper’s Lonely Hearts Club Band album by The Beatles.
Let us consider some faults which could be present in an implementation with four states. In Figure 1.2, two flawed record players are given. In the first (Figure 1.2a), the sequence start-stop, speed, speed leads us to the wrong state. However, this is not immediately observable: the turntable is in a non-spinning state, as it should be. The fault is only visible when we press start-stop once more: now the turntable is spinning fast instead of slow. The sequence start-stop, speed, speed, start-stop is a counterexample. In the second example (Figure 1.2b), the fault is again not immediately obvious: after pressing start-stop and then speed we are in the wrong state, as observed by pressing start-stop. Here, the counterexample is start-stop, speed, start-stop.

When a model of the implementation is given, it is not hard to find counterexamples. However, in a black box setting we do not have such a model. In order to test whether a black box system is equivalent to a model, we somehow need to test all possible counterexamples. In this example, a test suite should include sequences such as start-stop, speed, speed, start-stop and start-stop, speed, start-stop.
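The record player and one faulty variant can be encoded as small Mealy-style machines to make this concrete. The encoding below is our own sketch: the state representation and the particular faulty transition (the speed button failing to toggle in one stopped state) are our reading of the figures, not a transcription of them.

```python
def observe(state):
    """The only observation we can make: how fast the turntable spins."""
    playing, speed = state
    return f"{speed} rpm" if playing else "no spinning"

def spec_step(state, button):
    """Intended behaviour: start-stop toggles playing, speed toggles rpm."""
    playing, speed = state
    if button == "start-stop":
        return (not playing, speed)
    return (playing, "45" if speed == "33⅓" else "33⅓")

def faulty_step(state, button):
    # Hypothetical fault: in the stopped 45 rpm state, the speed button
    # fails to toggle back to 33⅓ rpm.
    if state == (False, "45") and button == "speed":
        return state
    return spec_step(state, button)

def run(step, inputs, state=(True, "33⅓")):
    """Apply a sequence of button presses; return the observations.
    The initial state plays at 33⅓ rpm, as in the text."""
    outputs = []
    for button in inputs:
        state = step(state, button)
        outputs.append(observe(state))
    return outputs

counterexample = ["start-stop", "speed", "speed", "start-stop"]
print(run(spec_step, counterexample))
print(run(faulty_step, counterexample))
```

The first three observations agree (both machines show “no spinning”), and only the final press reveals the fault: fast spinning instead of slow.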
[Figure 1.2 here: two four-state machines, (a) and (b), with states labelled “slow spinning”, “fast spinning”, and “no spinning”; the button symbols on the transitions were lost in extraction.]

Figure 1.2 Two faulty record players.
5 Nominal Techniques

In the second part of this thesis, I will present results related to nominal automata. Usually, nominal techniques are introduced in order to solve problems involving name binding in topics like lambda calculus. However, we use them in automata theory, specifically to model register automata. These are automata which have an infinite alphabet, often thought of as input actions with data. The control flow of the automaton may actually depend on the data. However, the data cannot be used in an arbitrary way, as this would lead to many decision problems, such as emptiness and equivalence, being undecidable.6 A principal concept in nominal techniques is that of symmetries.

To motivate the use of symmetries, we will look at an example of a register automaton. In the following automaton we model a (not-so-realistic) login system for a single person. The alphabet consists of the following actions:

sign-up(p)    login(p)    logout()    view()

The sign-up action allows one to set a password p. This can only be done when the system is initialised. The login and logout actions speak for themselves, and the view action allows one to see the secret data (we abstract away from what the user actually gets to see here). A simple automaton with roughly this behaviour is given in Figure 1.3. We will only informally discuss its semantics for now.
[Figure 1.3 here: states q0, q1, and q2, the latter two holding a register r, with transitions labelled sign-up(p) (set r ≔ p), login(p) (if r = p), logout(), view(), and ∗ self-loops; the output symbols on the transitions were lost in extraction.]

Figure 1.3 A simple register automaton. The symbol ∗ denotes any input otherwise not specified. The r in states q1 and q2 is a register.
To model the behaviour, we want the domain of passwords to be infinite. After all, one should allow arbitrarily long passwords to be secure. This means that a register automaton is actually an automaton over an infinite alphabet.

Common algorithms for automata, such as learning, will not work with an infinite alphabet. Any loop which iterates over the alphabet will diverge. In order to cope with this, we will use the symmetries present in the alphabet.

Let us continue with the example and look at its symmetries. If a person signs up with a password “hello” and subsequently logs in with “hello”, then this is not distinguishable from a person signing up and logging in with “bye”. This is an example of symmetry: the values “hello” and “bye” can be permuted, or interchanged. Note, however, that the trace sign-up(hello) login(bye) is different from the two before: no permutation of “hello” and “bye” will bring us to a logged-in state with that trace. So we see that, despite the symmetry, we cannot simply identify the values “hello” and “bye”. For this reason, we keep the alphabet infinite and explicitly mention its symmetries.

6 The class of automata with arbitrary data operations is sometimes called extended finite state machines.
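This permutation reasoning can be made concrete. The sketch below is ours: a permutation of atoms (here, passwords) acts on traces by renaming the data values, and a simplified semantics of the login system checks whether a trace ends logged in.

```python
def apply_permutation(perm, trace):
    """Apply a permutation of atoms to a trace, renaming every data
    value occurring in it (atoms not mentioned are left fixed)."""
    return tuple((action, perm.get(value, value)) for action, value in trace)

def logged_in(trace):
    """Simplified semantics of the login system: does the trace end in
    the logged-in state? (Our reading of the automaton in Figure 1.3.)"""
    pwd, state = None, "init"
    for action, value in trace:
        if state == "init" and action == "sign-up":
            pwd, state = value, "out"
        elif state == "out" and action == "login" and value == pwd:
            state = "in"
        elif state == "in" and action == "logout":
            state = "out"
    return state == "in"

t1 = (("sign-up", "hello"), ("login", "hello"))
t2 = (("sign-up", "bye"), ("login", "bye"))
t3 = (("sign-up", "hello"), ("login", "bye"))

swap = {"hello": "bye", "bye": "hello"}
print(apply_permutation(swap, t1) == t2)  # the two traces are symmetric
print(logged_in(t1), logged_in(t3))
```

Swapping “hello” and “bye” carries t1 exactly onto t2, while t3 is not logged in under any permutation of the two values: the semantics is invariant under the action, but the mismatched trace stays mismatched.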
Using symmetries in automata theory is not a new idea. In the context of model checking, the first to use symmetries were Emerson and Sistla (1996) and Ip and Dill (1996). But only Ip and Dill (1996) used it to deal with infinite data domains. For automata learning with infinite domains, symmetries were used by Sakamoto (1997). He devised an L∗ learning algorithm for register automata, much like the one presented in Chapter 5. The symmetries are crucial to reduce the problem to a finite alphabet and use the regular L∗ algorithm. (Chapter 5 shows how to do it with more general symmetries.) Around the same time, Ferrari, et al. (2005) worked on automata-theoretic algorithms for the π-calculus. Their approach was based on the same symmetries, and they developed a theory of named sets to implement their algorithms. Named sets are equivalent to nominal sets. However, nominal sets are defined in a more elementary way. The nominal sets we will soon see were introduced by Gabbay and Pitts (2002) to solve certain problems of name binding in abstract syntax. Although this is not really related to automata theory, it was picked up by Bojańczyk, et al. (2014), who provide an equivalence between register automata and nominal automata. (This equivalence is exposed in more detail in the book of Bojańczyk, 2018.) Additionally, they generalise the work on nominal sets to other symmetries.

The symmetries we encounter in this thesis are listed below, but other symmetries can be found in the literature. The symmetry directly corresponds to the data values (and operations) used in an automaton. The data values are often called atoms.

– The equality symmetry. Here the domain can be any countably infinite set. We can take, for example, the set of strings we used before as the domain from which we take passwords. No further structure is used on this domain, meaning that any value is just as good as any other. The symmetries therefore consist of all bijections on this domain.

– The total order symmetry. In this case, we take a countably infinite set with a dense total order. Typically, this means we use the rational numbers, ℚ, as data values and symmetries which respect the ordering.
5.1 What is a nominal set?

So what exactly is a nominal set? I will not define it here and leave the formalities to the corresponding chapters. It suffices, for now, to think of nominal sets as abstract sets (often infinite) on which a group of symmetries acts. This action makes it possible to interpret the symmetries of the data values in the abstract set. For automata, this allows us to talk about symmetries on the state space, the set of transitions, and the alphabet.

In order to implement these sets algorithmically, we impose two finiteness requirements. Both properties can be expressed using only the group action.

– Each element is finitely supported. A way to think of this requirement is that each element is “constructed” out of finitely many data values.

– The set is orbit-finite. This means that we can choose finitely many elements such that any other element is a permuted version of one of those elements.

If we wish to model the automaton from Figure 1.3 as a nominal automaton, then we can simply define the state space as

Q = {q0} ∪ {q1,a | a ∈ 𝔸} ∪ {q2,a | a ∈ 𝔸},

where 𝔸 is the set of atoms. In this example, 𝔸 is the set of all possible passwords. The set Q is infinite, but satisfies the two finiteness requirements.
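One way to picture such an orbit-finite set in code is to store only the finitely many orbits, each named and carrying the number of atom slots its elements hold. This representation is our own sketch for illustration, not the data structure of any of the libraries mentioned later.

```python
# The state space Q of the login automaton, given by its three orbits:
# q0 holds no atoms; q1 and q2 each hold one atom (the password).
ORBITS = {"q0": 0, "q1": 1, "q2": 1}

def is_element(x):
    """Is x = (orbit_name, atoms) an element of the represented set?"""
    name, atoms = x
    return name in ORBITS and len(atoms) == ORBITS[name]

def support(x):
    """The finite support of an element: the atoms it is built from."""
    _, atoms = x
    return set(atoms)

def act(perm, x):
    """The group action: apply a permutation of atoms to an element."""
    name, atoms = x
    return (name, tuple(perm.get(a, a) for a in atoms))

q1_hello = ("q1", ("hello",))
print(is_element(q1_hello), support(q1_hello))
print(act({"hello": "bye"}, q1_hello))
```

Although Q is infinite, three orbit entries describe it completely: every element is the image of a representative under some permutation, and every element is supported by finitely many atoms.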
The upshot of doing this is that the set Q (and the transition structure) corresponds directly to the semantics of the automaton. We do not have to encode how values relate or how they interact. Instead, the set (and the transition structure) defines all we need to know. Algorithms, such as reachability, minimisation, and learning, can be run on such automata, despite the sets being infinite. These algorithms can be implemented rather easily by using a library such as Nλ, Lois, or Ons from Chapter 6. These libraries implement a data structure for nominal sets, and provide ways to iterate over such (infinite) sets.

One has to be careful, as not all results from automata theory transfer to nominal automata. A notable example is the powerset construction, which converts a non-deterministic automaton into a deterministic one. The problem here is that the powerset of a set is generally not orbit-finite, and so the finiteness requirement is not met. Consequently, languages accepted by nominal DFAs are not closed under Kleene star, or even concatenation.
6 Contributions

This thesis is split into two parts. Part 1 contains material about black box testing, while Part 2 is about nominal techniques. The chapters can be read in isolation. However, the chapters do get more technical and mathematical – especially in Part 2. Detailed discussions of related work and future directions of research are presented in each chapter.

Chapter 2: FSM-based test methods. This chapter introduces test generation methods which can be used for learning or conformance testing. The methods are presented in a uniform way, which allows us to give a single proof of completeness for all these methods. Moreover, the uniform presentation gives room to develop new test generation methods. The main contributions are:

– Uniform description of known methods: Theorem 26 (p. 35)
– A new proof of completeness: Section 5 (p. 36)
– New algorithm (hybrid ADS) and its implementation: Section 3.2 (p. 34)

Chapter 3: Applying automata learning to embedded control software. In this chapter we will apply model learning to an industrial case study. It is a unique benchmark as it is much bigger than any of the applications seen before (3410 states and 77 inputs). This makes it challenging to learn a model, and the main obstacle is finding counterexamples. The main contributions are:

– Application of the hybrid ADS algorithm: Section 2.2 (p. 49)
– Successfully learning a large-scale system: Section 2.3 (p. 51)

This is based on the following publication:

Smeenk, W., Moerman, J., Vaandrager, F. W., & Jansen, D. N. (2015). Applying Automata Learning to Embedded Control Software. In Formal Methods and Software Engineering - 17th International Conference on Formal Engineering Methods, ICFEM, Proceedings. Springer. doi:10.1007/978-3-319-25423-4_5.

Chapter 4: Minimal separating sequences for all pairs of states. Continuing on test generation methods, this chapter presents an efficient algorithm to construct separating sequences. Not only is the algorithm efficient – it runs in 𝒪(n log n) time – it also constructs minimal-length sequences. The algorithm is inspired by a minimisation algorithm by Hopcroft (1971), but extending it to construct witnesses is non-trivial. The main contributions are:

– Efficient algorithm for separating sequences: Algorithms 4.2 & 4.4 (p. 66 & 68)
– Applications to black box testing: Section 4 (p. 70)
– Implementation: Section 5 (p. 71)

This is based on the following publication:

Smetsers, R., Moerman, J., & Jansen, D. N. (2016). Minimal Separating Sequences for All Pairs of States. In Language and Automata Theory and Applications - 10th International Conference, LATA, Proceedings. Springer. doi:10.1007/978-3-319-30000-9_14.

Chapter 5: Learning nominal automata. In this chapter, we show how to learn automata over infinite alphabets. We do this by translating the L∗ algorithm directly to a nominal version, νL∗. The correctness proofs mimic the original proofs by Angluin (1987). Since our new algorithm is close to the original, we are able to translate variants of the L∗ algorithm as well. In particular, we provide a learning algorithm for nominal non-deterministic automata. The main contributions are:

– L∗ algorithm for nominal automata: Section 3 (p. 86)
– Its correctness and complexity: Theorem 7 & Corollary 11 (p. 89 & 93)
– Generalisation to non-deterministic automata: Section 4.2 (p. 96)
– Implementation in Nλ: Section 5.2 (p. 103)

This is based on the following publication:

Moerman, J., Sammartino, M., Silva, A., Klin, B., & Szynwelski, M. (2017). Learning nominal automata. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL. ACM. doi:10.1145/3009837.3009879.
Chapter 6: Fast computations on ordered nominal sets. In this chapter, we provide a library to compute with nominal sets. We restrict our attention to nominal sets over the total order symmetry. This symmetry allows for a rather easy characterisation of orbits, and hence an easy implementation. We experimentally show that it is competitive with existing tools, which are based on SMT solvers. The main contributions are:

– Characterisation theorem of orbits: Table 6.1 (p. 118)
– Complexity results: Theorems 18 & 21 (p. 119 and 123)
– Implementation: Section 3 (p. 118)

This is based on the following publication:

Venhoek, D., Moerman, J., & Rot, J. (2018). Fast Computations on Ordered Nominal Sets. In Theoretical Aspects of Computing - ICTAC - 15th International Colloquium, Proceedings. Springer. doi:10.1007/978-3-030-02508-3_26.

Chapter 7: Separation and Renaming in Nominal Sets. We investigate how to reduce the size of certain nominal automata. This is based on the observation that some languages (with outputs) are not just invariant under symmetries, but invariant under arbitrary transformations, or renamings. We define a new type of automaton, the separated nominal automaton, and show that they exactly accept those languages which are closed under renamings. All of this is shown by using a theoretical framework: we establish a strong relationship between nominal sets on the one hand, and nominal renaming sets on the other. The main contributions are:

– Adjunction between nominal sets and renaming sets: Theorem 16 (p. 138)
– This adjunction is monoidal: Theorem 17 (p. 139)
– Separated automata have a reduced state space: Example 36 (p. 147)

This is based on a paper under submission:

Moerman, J. & Rot, J. (2019). Separation and Renaming in Nominal Sets. (Under submission).

Besides these chapters in this thesis, I have published the following papers. These are not included in this thesis, but a short summary of each is presented below.

Complementing Model Learning with Mutation-Based Fuzzing. Our group at the Radboud University participated in the RERS challenge 2016. This is a challenge where reactive software is provided and researchers have to assess the validity of certain properties (given as LTL specifications). We approached this with model learning: instead of analysing the source code, we simply learned the external behaviour, and then used model checking on the learned model. This has worked remarkably well, as the models of the external behaviour are not too big. Our results were presented at the RERS workshop (ISOLA 2016). The report can be found on arXiv:

Smetsers, R., Moerman, J., Janssen, M., & Verwer, S. (2016). Complementing Model Learning with Mutation-Based Fuzzing. CoRR, abs/1611.02429. Retrieved from http://arxiv.org/abs/1611.02429.

n-Complete test suites for IOCO. In this paper, we investigate complete test suites for labelled transition systems (LTSs), instead of deterministic Mealy machines. This is a much harder problem than conformance testing of deterministic systems: the system may adversarially avoid certain states the tester wishes to test. We provide a test suite which is n-complete (provided the implementation is a suspension automaton). My main personal contribution here is the proof of completeness, which closely resembles the proof presented in Chapter 2. The conference paper was presented at ICTSS:

van den Bos, P., Janssen, R., & Moerman, J. (2017). n-Complete Test Suites for IOCO. In ICTSS 2017 Proceedings. Springer. doi:10.1007/978-3-319-67549-7_6.

An extended version has appeared in:

van den Bos, P., Janssen, R., & Moerman, J. (2018). n-Complete Test Suites for IOCO. Software Quality Journal. Advanced online publication. doi:10.1007/s11219-018-9422-x.

Learning Product Automata. In this article, we consider Moore machines with multiple outputs. These machines can be decomposed by projecting on each output, resulting in smaller components that can be learned with fewer queries. We give experimental evidence that this is a useful technique which can reduce the number of queries substantially. This is all motivated by the idea that compositional methods are widely used throughout engineering and that we should use this in model learning. This work was presented at ICGI 2018:

Moerman, J. (2019). Learning Product Automata. In International Conference on Grammatical Inference, ICGI, Proceedings. Proceedings of Machine Learning Research. (To appear).
|
||
7 Conclusion and Outlook

With the current tools for model learning, it is possible to learn big state machines of black box systems. It involves using clever algorithms for learning (such as the TTT algorithm by Isberner, 2015) and efficient testing methods (see Chapter 2). However, as the industrial case study from Chapter 3 shows, the bottleneck is often in conformance testing.

In order to improve on this bottleneck, one possible direction is to consider 'grey box testing.' The methods discussed in this thesis are all black box methods; this could be considered 'too pessimistic.' Often, we do have (parts of the) source code and we do know relationships between different inputs. A question for future research is how this additional information can be integrated in a principled manner in the learning and testing of systems.

Black box testing still has theoretical challenges. Current generalisations to non-deterministic systems or language inclusion (such as black box testing for IOCO) often need exponentially big test suites. Whether this is necessary is unknown (to me): we only have upper bounds but no lower bounds. An interesting approach could be to see if there exists a notion of reduction between test suites. This is analogous to the reductions used in complexity theory to prove hardness of problems, or the reductions used in PAC theory to prove learning problems to be inherently unpredictable.

Another path taken in this thesis is the research on nominal automata. This was motivated by the problem of learning automata over infinite alphabets. So far, the results on nominal automata are mostly theoretical in nature. Nevertheless, we show that the nominal algorithms can be implemented and that they can be run concretely on black box systems (Chapter 5). The advantage of using the foundations of nominal sets is that the algorithms are closely related to the original L∗ algorithm. Consequently, variations of L∗ can easily be implemented. For instance, we show that the NL∗ algorithm for non-deterministic automata works in the nominal case too. (We have not attempted to implement more recent algorithms such as TTT.) The nominal learning algorithms can be implemented in just a few hundred lines of code, much less than the approach taken by, e.g., Fiterău-Broștean (2018).

In this thesis, we tackle some efficiency issues when computing with nominal sets. In Chapter 6 we characterise orbits in order to give an efficient representation (for the total-order symmetry). Another result is the fact that some nominal automata can be 'compressed' to separated automata, which can be exponentially smaller (Chapter 7). However, the nominal tools still leave much to be desired in terms of efficiency.

Last, it would be interesting to marry the two paths taken in this thesis. I am not aware of n-complete test suites for register automata or nominal automata. The results on learning nominal automata in Chapter 5 show that this should be possible, as an observation table gives a test suite.7 However, there is an interesting twist to this problem. The test methods from Chapter 2 can all account for extra states. For nominal automata, we should be able to cope with extra states and extra registers. It would be interesting to see how the test suite grows as these two dimensions increase.

7 The rows of a table are access sequences, and the columns provide a characterisation set.

Part 1:
Testing Techniques
Chapter 2
FSM-based Test Methods

In this chapter, we will discuss some of the theory of test generation methods for black box conformance checking. Since the systems we consider are black box, we cannot simply determine equivalence with a specification. The only way to gain confidence is to perform experiments on the system. A key aspect of test generation methods is the size and completeness of the test suites. On one hand, we want to cover as much of the specification as possible, hopefully ensuring that we find mistakes in any faulty implementation. On the other hand, testing takes time, so we want to minimise the size of a test suite.

The test methods described here are well known in the literature on FSM-based testing. They all share similar concepts, such as access sequences and state identifiers. In this chapter we will define these concepts, relate them to one another and show how to build test suites from them. This theoretical discussion is new and enables us to compare the different methods uniformly. For instance, we can prove all these methods to be n-complete with a single proof.

The discussion also inspired a new algorithm: the hybrid ADS method. This method is applied to an industrial case study in Chapter 3. It combines the strength of the ADS method (which is not always applicable) with the generality of the HSI method.

This chapter starts with the basics: Mealy machines, sequences and what it means to test a black box system. Then, starting from Section 1.3, we define several concepts, such as state identifiers, in order to distinguish one state from another. These concepts are then combined in Section 2 to derive test suites. In a similar vein, we define a novel test method in Section 3 and we discuss some of the implementation details of the hybrid-ads tool. We summarise the various test methods in Section 4. All methods are proven to be n-complete in Section 5. Finally, in Section 6, we discuss related work.
1 Mealy machines and sequences

We will focus on Mealy machines, as those capture many protocol specifications and reactive systems.

We fix finite alphabets I and O of inputs and outputs, respectively. We use the usual notation for operations on sequences (also called words): uv for the concatenation of two sequences u, v ∈ I∗ and |u| for the length of u. For a sequence w = uv we say that u and v are a prefix and suffix respectively.

Definition 1. A (deterministic and complete) Mealy machine M consists of a finite set of states S, an initial state s0 ∈ S and two functions:
– a transition function δ : S × I → S, and
– an output function λ : S × I → O.
Both the transition function and output function are extended inductively to sequences as δ : S × I∗ → S and λ : S × I∗ → O∗:

δ(s, ϵ) = s                    λ(s, ϵ) = ϵ
δ(s, aw) = δ(δ(s, a), w)       λ(s, aw) = λ(s, a) λ(δ(s, a), w)

The behaviour of a state s is given by the output function λ(s, −) : I∗ → O∗. Two states s and t are equivalent if they have equal behaviours, written s ∼ t, and two Mealy machines are equivalent if their initial states are equivalent.
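Definition 1 translates almost directly into code. The sketch below is a minimal Python rendering of a Mealy machine with δ and λ extended to input sequences; the two-state machine at the end is a made-up example for illustration, not the machine of Figure 2.1.

```python
# A minimal sketch of Definition 1: a deterministic, complete Mealy machine,
# with the transition and output functions extended to input sequences.

class Mealy:
    def __init__(self, states, s0, delta, lam):
        self.states, self.s0 = states, s0
        self._delta, self._lam = delta, lam  # dicts: (state, input) -> state/output

    def delta(self, s, w):
        # δ(s, ϵ) = s;  δ(s, aw) = δ(δ(s, a), w)
        for a in w:
            s = self._delta[(s, a)]
        return s

    def lam(self, s, w):
        # λ(s, ϵ) = ϵ;  λ(s, aw) = λ(s, a) λ(δ(s, a), w)
        out = []
        for a in w:
            out.append(self._lam[(s, a)])
            s = self._delta[(s, a)]
        return "".join(out)

    def behaviour(self, w):
        # the behaviour of the initial state, λ(s0, −)
        return self.lam(self.s0, w)

# Hypothetical two-state example: output '1' exactly when an 'a' is read in q1.
M = Mealy(
    states={"q0", "q1"},
    s0="q0",
    delta={("q0", "a"): "q1", ("q0", "b"): "q0",
           ("q1", "a"): "q0", ("q1", "b"): "q1"},
    lam={("q0", "a"): "0", ("q0", "b"): "0",
         ("q1", "a"): "1", ("q1", "b"): "0"},
)
assert M.behaviour("aa") == "01"
```

Two states (possibly of different machines) can then be compared by comparing `lam(s, −)` on a set of words, which is exactly how the partial equivalences of Section 1.5 will be phrased.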
Remark 2. We will use the following conventions and notation. We often write s ∈ M instead of s ∈ S, and for a second Mealy machine M′ its constituents are denoted S′, s′0, δ′ and λ′. Moreover, if we have a state s ∈ M, we silently assume that s is not a member of any other Mealy machine M′. (In other words, the behaviour of s is determined by the state itself.) This eases the notation, since we can write s ∼ t without needing to introduce a context.

An example Mealy machine is given in Figure 2.1.

Figure 2.1 An example specification with input I = {a, b, c} and output O = {0, 1}.
1.1 Testing

In conformance testing we have a specification modelled as a Mealy machine and an implementation (the system under test, or SUT) which we assume to behave as a Mealy machine. Tests, or experiments, are generated from the specification and applied to the implementation. We assume that we can reset the implementation before every test. If the output is different from the specified output, then we know the implementation is flawed. The goal is to test as little as possible, while covering as much as possible.

A test suite is nothing more than a set of sequences. We do not have to encode outputs in the test suite, as those follow from the deterministic specification.
Definition 3. A test suite is a finite subset T ⊆ I∗.

A test t ∈ T is called maximal if it is not a proper prefix of another test s ∈ T. We denote the set of maximal tests of T by max(T). The maximal tests are the only tests in T we actually have to apply to our SUT, as we can record the intermediate outputs. In the examples of this chapter we will show max(T) instead of T.
We define the size of a test suite as usual (Dorofeeva, et al., 2010 and Petrenko, et al., 2014). The size of a test suite is measured as the sum of the lengths of all its maximal tests plus one reset per test.

Definition 4. The size of a test suite T is defined to be ‖T‖ = ∑_{t ∈ max(T)} (|t| + 1).
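Definitions 3 and 4 are easy to operationalise. The sketch below computes max(T) and ‖T‖ for a small made-up suite; the `+ 1` per maximal test accounts for the reset.

```python
# Maximal tests and the size of a test suite (Definitions 3 and 4).
# The suite T below is an illustrative example, not one from the text.

def maximal_tests(T):
    # t is maximal if it is not a proper prefix of another test in T
    return {t for t in T if not any(s != t and s.startswith(t) for s in T)}

def size(T):
    # ‖T‖ = sum over max(T) of (|t| + 1), the +1 accounting for one reset
    return sum(len(t) + 1 for t in maximal_tests(T))

T = {"a", "ab", "abb", "b", "ba"}       # prefix-closed; "abb" subsumes "a", "ab"
assert maximal_tests(T) == {"abb", "ba"}
assert size(T) == (3 + 1) + (2 + 1)     # = 7
```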
1.2 Completeness of test suites

Example 5. No test suite is complete. Consider the specification in Figure 2.2a. This machine will always output a cup of coffee when given money. For any test suite we can make a faulty implementation which passes the test suite. A faulty implementation might look like Figure 2.2b, where the machine starts to output beers after n steps (signalling that it's the end of the day), where n is larger than the length of the longest sequence in the suite. This shows that no test suite can be complete, and it justifies the following definition.

Figure 2.2 A basic example showing that finite test suites are incomplete. The implementation on the right will pass any test suite if we choose n big enough.
Definition 6. Let M be a Mealy machine and T be a test suite. We say that T is m-complete (for M) if for all inequivalent machines M′ with at most m states there exists a t ∈ T such that λ(s0, t) ≠ λ′(s′0, t).

We are often interested in the case of m-completeness where m = n + k for some k ∈ ℕ and n is the number of states in the specification. Here k will stand for the number of extra states we can test.

Note the order of the quantifiers in the above definition. We ask for a single test suite which works for all implementations of bounded size. This is crucial for black box testing: since we do not know the implementation, the test suite has to work for all of them.
1.3 Separating sequences

Before we construct test suites, we discuss several types of useful sequences. All the following notions are standard in the literature, and the corresponding references will be given in Section 2, where we discuss the test generation methods using these notions. We fix a Mealy machine M for the remainder of this chapter.

Definition 7. We define the following kinds of sequences.
– Given two states s, t in M, we say that w is a separating sequence if λ(s, w) ≠ λ(t, w).
– For a single state s in M, a sequence w is a unique input output sequence (UIO) if for every inequivalent state t in M we have λ(s, w) ≠ λ(t, w).
– Finally, a (preset) distinguishing sequence (DS) is a single sequence w which separates all states of M, i.e., for every pair of inequivalent states s, t in M we have λ(s, w) ≠ λ(t, w).

The above list is ordered from weaker to stronger notions, i.e., every distinguishing sequence is a UIO sequence for every state. Similarly, a UIO for a state s is a separating sequence for s and any inequivalent t. Separating sequences always exist for inequivalent states, and finding them efficiently is the topic of Chapter 4. On the other hand, UIOs and DSs do not always exist for a machine.

A machine M is minimal if every distinct pair of states is inequivalent (i.e., s ∼ t ⟹ s = t). We will not require M to be minimal, although this is often done in the literature. Minimality is sometimes convenient, as one can write 'every other state t' instead of 'every inequivalent state t'.
Example 8. For the machine in Figure 2.1, we note that states s0 and s2 are separated by the sequence aa (but not by any shorter sequence). In fact, the sequence aa is a UIO for state s0, since it is the only state outputting 10 on that input. However, state s2 has no UIO: if the sequence were to start with b or c, states s3 and s4 respectively have equal transitions, which makes it impossible to separate those states after the first symbol. If it starts with an a, states s3 and s4 are swapped and we make no progress in distinguishing these states from s2. Since s2 has no UIO, the machine as a whole does not admit a DS.

In this example, all other states actually have UIOs. For the states s0, s1, s3 and s4, we can pick the sequences aa, a, c and ac respectively. In order to separate s2 from the other states, we have to pick multiple sequences. For instance, the set {aa, ac, c} will separate s2 from all other states.
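On small machines, the notions of Definition 7 can simply be checked by brute force. The sketch below uses a made-up three-state machine (not the machine of Figure 2.1) and tests separating sequences, UIOs and a preset DS.

```python
# A sketch of Definition 7 on a small hypothetical machine: checking
# separating sequences, UIOs and a preset DS by direct computation.

STATES = ("q0", "q1", "q2")
delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q2", ("q1", "b"): "q0",
         ("q2", "a"): "q2", ("q2", "b"): "q2"}
lam   = {("q0", "a"): "0", ("q0", "b"): "0",
         ("q1", "a"): "0", ("q1", "b"): "0",
         ("q2", "a"): "1", ("q2", "b"): "0"}

def out(s, w):
    # λ extended to words, as in Definition 1
    res = ""
    for a in w:
        res += lam[(s, a)]
        s = delta[(s, a)]
    return res

def separates(w, s, t):
    return out(s, w) != out(t, w)

def is_uio(w, s):
    # here all three states are pairwise inequivalent
    return all(separates(w, s, t) for t in STATES if t != s)

assert separates("a", "q1", "q2")          # outputs 0 vs 1
assert not separates("a", "q0", "q1")      # both output 0
assert separates("aa", "q0", "q1")         # outputs 00 vs 01
assert is_uio("a", "q2") and is_uio("aa", "q1") and is_uio("aaa", "q0")
# aaa is even a preset DS for this machine: it separates all three states.
assert len({out(s, "aaa") for s in STATES}) == 3
```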
1.4 Sets of separating sequences

As the example shows, we need sets of sequences and sometimes even sets of sets of sequences – called families.8

Definition 9. We define the following kinds of sets of sequences. We require that all sets are prefix-closed; however, we only show the maximal sequences in examples.9
– A set of sequences W is called a characterisation set if it contains a separating sequence for each pair of inequivalent states in M.
– A state identifier for a state s ∈ M is a set Ws such that for every inequivalent t ∈ M a separating sequence for s and t exists in Ws.
– A set of state identifiers {Ws}s is harmonised if Ws ∩ Wt contains a separating sequence for inequivalent states s and t. This is also called a separating family.

A state identifier Ws will be used to test against a single state. In contrast to a characterisation set, it only includes sequences which are relevant for s. The property of being harmonised might seem a bit strange. This property ensures that the same tests are used for different states. This extra consistency within a test suite is necessary for some test methods. We return to this notion in more detail in Example 22.

We may obtain a characterisation set by simply considering every pair of states and looking for a difference. However, it turns out a harmonised set of state identifiers exists for every machine and can be constructed very efficiently (Chapter 4). From a set of state identifiers we may obtain a characterisation set by taking the union of all those sets.

Example 10. As mentioned before, state s2 from Figure 2.1 has a state identifier {aa, ac, b}. In fact, this set is a characterisation set for the whole machine. Since the other states have UIOs, we can pick singleton sets as state identifiers. For example, state s0 has the UIO aa, so a state identifier for s0 is W0 = {aa}. Similarly, we can take W1 = {a} and W3 = {c}. But note that such a family will not be harmonised, since the sets {a} and {c} have no common separating sequence.

One more type of state identifier is of interest to us: the adaptive distinguishing sequence. It is the strongest type of state identifier, and as a result not many machines have one. Like DSs, adaptive distinguishing sequences can identify a state using a single word. We give a slightly different (but equivalent) definition than the one of Lee and Yannakakis (1994).

Definition 11. A separating family ℋ is an adaptive distinguishing sequence (ADS) if each set max(Hs) is a singleton.

8 A family is often written as {Xs}s∈M or simply {Xs}s, meaning that for each state s ∈ M we have a set Xs.
9 Taking these sets to be prefix-closed makes many proofs easier.
It is called an adaptive sequence, since it has a tree structure which depends on the output of the machine. To see this tree structure, consider the first symbols of each of the sequences in the family. Since the family is harmonised and each set is essentially given by a single word, there is only one first symbol. Depending on the output after the first symbol, the sequence continues.

Example 12. In Figure 2.3 we see a machine with an ADS. The ADS is given as follows:

H0 = {aba}    H1 = {aaba}    H2 = {aba}    H3 = {aaba}

Note that all sequences start with a. This already separates s0, s2 from s1, s3. To further separate the states, the sequence continues with either a b or another a. And so on.

Figure 2.3 (a): A Mealy machine with an ADS and (b): the tree structure of this ADS.
Given an ADS, there exists a UIO for every state. The converse – if every state has a UIO, then the machine admits an ADS – does not hold. The machine in Figure 2.1 admits no ADS, since s2 has no UIO.
1.5 Partial equivalence

Definition 13. We define the following notation.
– Let W be a set of sequences. Two states x, y are W-equivalent, written x ∼W y, if λ(x, w) = λ(y, w) for all w ∈ W.
– Let 𝒲 be a family. Two states x, y are 𝒲-equivalent, written x ∼𝒲 y, if λ(x, w) = λ(y, w) for all w ∈ Wx ∩ Wy.

The relation ∼W is an equivalence relation, and W ⊆ V implies that V separates more states than W, i.e., x ∼V y ⟹ x ∼W y. Clearly, if two states are equivalent (i.e., s ∼ t), then for any set W we have s ∼W t.
Lemma 14. The relations ∼W and ∼𝒲 can be used to define characterisation sets and separating families. Concretely:
– W is a characterisation set if and only if for all s, t in M, s ∼W t implies s ∼ t.
– 𝒲 is a separating family if and only if for all s, t in M, s ∼𝒲 t implies s ∼ t.

Proof.
– W is a characterisation set by definition means s ̸∼ t ⟹ s ̸∼W t, as W contains a separating sequence (if it exists at all). This is equivalent to s ∼W t ⟹ s ∼ t.
– Let 𝒲 be a separating family and s ̸∼ t. Then there is a sequence w ∈ Ws ∩ Wt such that λ(s, w) ≠ λ(t, w), i.e., s ̸∼𝒲 t. We have shown s ̸∼ t ⟹ s ̸∼𝒲 t, which is equivalent to s ∼𝒲 t ⟹ s ∼ t. The converse is proven similarly. □
1.6 Access sequences

Besides sequences which separate states, we also need sequences which bring a machine to specified states.

Definition 15. An access sequence for s is a word w such that δ(s0, w) = s. A set P consisting of an access sequence for each state is called a state cover. If P is a state cover, then the set {pa | p ∈ P, a ∈ I} is called a transition cover.
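A state cover of shortest access sequences can be computed by a breadth-first search over the transition function. The sketch below does this for a made-up three-state machine (not the one of Figure 2.1) and derives the transition cover from it.

```python
# A sketch of Definition 15 on a hypothetical machine: BFS yields a shortest
# access sequence for every reachable state, giving a state cover P; the
# transition cover is then {pa | p in P, a in I}.
from collections import deque

I = ["a", "b"]
delta = {("q0", "a"): "q1", ("q0", "b"): "q0",
         ("q1", "a"): "q2", ("q1", "b"): "q0",
         ("q2", "a"): "q2", ("q2", "b"): "q1"}

def state_cover(s0):
    access = {s0: ""}            # the initial state is reached by ϵ
    queue = deque([s0])
    while queue:
        s = queue.popleft()
        for a in I:
            t = delta[(s, a)]
            if t not in access:  # first visit = shortest access sequence
                access[t] = access[s] + a
                queue.append(t)
    return access

P = state_cover("q0")
assert P == {"q0": "", "q1": "a", "q2": "aa"}
transition_cover = {p + a for p in P.values() for a in I}
assert transition_cover == {"a", "b", "aa", "ab", "aaa", "aab"}
```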
1.7 Constructions on sets of sequences

In order to define a test suite modularly, we introduce notation for combining sets of words. For sets of words X and Y, we define
– their concatenation X ⋅ Y = {xy | x ∈ X, y ∈ Y},
– iterated concatenation X0 = {ϵ} and Xn+1 = X ⋅ Xn, and
– bounded concatenation X≤n = ⋃i≤n Xi.
On families we define
– flattening: ⋃ 𝒳 = {x | x ∈ Xs, s ∈ S},
– union: 𝒳 ∪ 𝒴, defined point-wise: (𝒳 ∪ 𝒴)s = Xs ∪ Ys,
– concatenation10: X ⊙ 𝒴 = {xy | x ∈ X, y ∈ Yδ(s0,x)}, and
– refinement: 𝒳; 𝒴, defined by11

(𝒳; 𝒴)s = Xs ∪ Ys ∩ ⋃ { Yt | s ∼𝒳 t and s ̸∼𝒴 t }.

The latter construction is new and will be used to define a hybrid test generation method in Section 3. It refines a family 𝒳, which need not be separating, by including sequences from a second family 𝒴. It only adds those sequences to states if 𝒳 does not distinguish those states. This is also the reason behind the ;-notation: first the tests from 𝒳 are used to distinguish states, and then for the remaining states 𝒴 is used.

Lemma 16. For all families 𝒳 and 𝒴:
– 𝒳; 𝒳 = 𝒳,
– 𝒳; 𝒴 = 𝒳, whenever 𝒳 is a separating family, and
– 𝒳; 𝒴 is a separating family whenever 𝒴 is a separating family.

Proof. For the first item, note that there are no states t such that s ∼𝒳 t and s ̸∼𝒳 t. Consequently, the union is empty, and the expression simplifies to

(𝒳; 𝒳)s = Xs ∪ (Xs ∩ ∅) = Xs.

If 𝒳 is a separating family, then the only t for which s ∼𝒳 t holds are t such that s ∼ t (Lemma 14). But s ∼ t is ruled out by s ̸∼𝒴 t, and so again

(𝒳; 𝒴)s = Xs ∪ (Ys ∩ ∅) = Xs.

For the last item, suppose that s ∼𝒳;𝒴 t. Then s and t agree on every sequence in (𝒳; 𝒴)s ∩ (𝒳; 𝒴)t. We distinguish two cases:
– Suppose s ∼𝒳 t. Then Ys ∩ Yt ⊆ (𝒳; 𝒴)s ∩ (𝒳; 𝒴)t, and so s and t agree on Ys ∩ Yt, meaning s ∼𝒴 t. Since 𝒴 is a separating family, we have s ∼ t.
– Suppose s ̸∼𝒳 t. This contradicts s ∼𝒳;𝒴 t, since Xs ∩ Xt ⊆ (𝒳; 𝒴)s ∩ (𝒳; 𝒴)t.
We conclude that s ∼ t. This proves that 𝒳; 𝒴 is a separating family. □

10 We will often see the combination P ⋅ I ⊙ 𝒳; this should be read as (P ⋅ I) ⊙ 𝒳.
11 We use the convention that ∩ binds stronger than ∪. In fact, all the operators here bind stronger than ∪.
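These constructions are straightforward to implement on finite sets of words. In the sketch below (all data made up for illustration), the refinement 𝒳;𝒴 takes the equivalences ∼𝒳 and ∼𝒴 as predicates rather than computing them from a machine; the last assertion checks the first item of Lemma 16 on a tiny family.

```python
# Concatenation, bounded concatenation, and refinement from Section 1.7.

def concat(X, Y):
    # X · Y = {xy | x in X, y in Y}
    return {x + y for x in X for y in Y}

def bounded(X, n):
    # X^{<=n} = union of X^i for i <= n
    result, power = {""}, {""}
    for _ in range(n):
        power = concat(power, X)
        result |= power
    return result

def refine(Xfam, Yfam, eqX, eqY, states):
    # (X;Y)_s = X_s ∪ (Y_s ∩ ⋃ { Y_t | s ~X t and s !~Y t })
    result = {}
    for s in states:
        extra = set()
        for t in states:
            if eqX(s, t) and not eqY(s, t):
                extra |= Yfam[t]
        result[s] = Xfam[s] | (Yfam[s] & extra)
    return result

assert bounded({"a", "b"}, 2) == {"", "a", "b", "aa", "ab", "ba", "bb"}

# Lemma 16, first item: X;X = X, since s ~X t and s !~X t never both hold.
X = {"s": {"a"}, "t": {"b"}}
eq = lambda p, q: p == q
assert refine(X, X, eq, eq, ["s", "t"]) == X
```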
2 Test generation methods

In this section, we review the classical conformance testing methods: the W, Wp, UIO, UIOv, HSI and ADS methods. At the end of this section, we construct the test suites for the running example. Our hybrid ADS method uses a similar construction.

There are many more test generation methods. Literature shows, however, that not all of them are complete. For example, the method by Bernhard (1994) is falsified by Petrenko (1997), and the UIO-method from Sabnani and Dahbura (1988) is shown to be incomplete by Chan, et al. (1989). For that reason, completeness of the correct methods is shown in Theorem 26. The proof is general enough to capture all the methods at once. We fix a state cover P throughout this section and take the transition cover Q = P ⋅ I.
2.1 W-method (Chow, 1978 and Vasilevskii, 1973)

After the work of Moore (1956), it was unclear whether a test suite of polynomial size could exist. He presented a finite test suite which was complete; however, it was exponential in size. Both Chow (1978) and Vasilevskii (1973) independently prove that test suites of polynomial size exist.12 The W-method is a very structured test suite construction. It is called the W-method as the characterisation set is often called W.

Definition 17. Given a characterisation set W, we define the W test suite as

TW = (P ∪ Q) ⋅ I≤k ⋅ W.

This – and all following methods – tests the machine in two phases. For simplicity, we explain these phases when k = 0. The first phase consists of the tests P ⋅ W and tests whether all states of the specification are (roughly) present in the implementation. The second phase is Q ⋅ W and tests whether the successor states are correct. Together, these two phases put enough constraints on the implementation to know that the implementation and specification coincide (provided that the implementation has no more states than the specification).
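Definition 17 can be turned into a small generator: given a state cover P, the input alphabet I and a characterisation set W, build Q = P ⋅ I and I≤k and concatenate. The sketch below does exactly that; the data in the assertion is illustrative only.

```python
# Building the W test suite T_W = (P ∪ Q) · I^{<=k} · W, with Q = P · I.

def w_method(P, I, W, k=0):
    Q = {p + a for p in P for a in I}
    Ik, level = {""}, {""}
    for _ in range(k):                       # build I^{<=k} level by level
        level = {w + a for w in level for a in I}
        Ik |= level
    return {u + m + w for u in P | Q for m in Ik for w in W}

# Tiny example: one-letter alphabet, trivial state cover, one extra state (k = 1).
assert w_method({""}, {"a"}, {"a"}, k=1) == {"a", "aa", "aaa"}
```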
2.2 The Wp-method (Fujiwara, et al., 1991)

Fujiwara, et al. (1991) realised that one needs fewer tests in the second phase of the W-method. Since we already know the right states are present after phase one, we only need to check if the state after a transition is consistent with the expected state. This justifies the use of state identifiers for each state.

Definition 18. Let 𝒲 be a family of state identifiers. The Wp test suite is defined as

TWp = P ⋅ I≤k ⋅ ⋃ 𝒲 ∪ Q ⋅ I≤k ⊙ 𝒲.

Note that ⋃ 𝒲 is a characterisation set as defined for the W-method. It is needed for completeness to test states with the whole set ⋃ 𝒲. Once states are tested as such, we can use the smaller sets Ws for testing transitions.

2.3 The HSI-method (Luo, et al., 1995 and Petrenko, et al., 1993)

The Wp-method was in turn refined by Luo, et al. (1995) and Petrenko, et al. (1993). They make use of harmonised state identifiers, which allows state identifiers to be used in the initial phase of the test suite as well.

Definition 19. Let ℋ be a separating family. We define the HSI test suite by

THSI = (P ∪ Q) ⋅ I≤k ⊙ ℋ.

Our hybrid ADS method is an instance of the HSI-method as we define it here. However, Luo, et al. (1995) and Petrenko, et al. (1993) describe the HSI-method together with a specific way of generating the separating families, namely the sets obtained by a splitting tree with shortest witnesses. The hybrid ADS method does not refine the HSI-method defined in this more restricted sense.

12 More precisely: the size of TW is polynomial in the size of the specification for each fixed k.
2.4 The ADS-method (Lee & Yannakakis, 1994)

As discussed before, when a Mealy machine admits an adaptive distinguishing sequence, only a single test has to be performed to identify a state. This is exploited in the ADS-method.

Definition 20. Let 𝒵 be an adaptive distinguishing sequence. The ADS test suite is defined as

TADS = (P ∪ Q) ⋅ I≤k ⊙ 𝒵.
2.5 The UIOv-method (Chan, et al., 1989)

Some Mealy machines which do not admit an adaptive distinguishing sequence may still admit state identifiers which are singletons. These are exactly the UIO sequences, and they give rise to the UIOv-method. In a way this is a generalisation of the ADS-method, since the requirement that state identifiers are harmonised is dropped.

Definition 21. Let 𝒰 = {a single UIO for s}s∈S be a family of UIO sequences. The UIOv test suite is defined as

TUIOv = P ⋅ I≤k ⋅ ⋃ 𝒰 ∪ Q ⋅ I≤k ⊙ 𝒰.

One might think that using a single UIO sequence instead of the set ⋃ 𝒰 to verify the state is enough. In fact, this idea was used for the UIO-method, which defines the test suite (P ∪ Q) ⋅ I≤k ⊙ 𝒰. The following counterexample, due to Chan, et al. (1989), refutes this conjecture.

Example 22. The Mealy machines in Figure 2.4 show that the UIO-method does not define a 3-complete test suite. Take for example the UIOs u0 = aa, u1 = a, u2 = ba for the states s0, s1, s2 respectively. The test suite then becomes {aaaa, abba, baaa, bba}, and the faulty implementation passes this suite. This happens because the sequence u2 is not a UIO in the implementation, and the state s′2 simulates both UIOs u1 and u2. Hence we also want to check that a state does not behave as one of the other states, and therefore we use ⋃ 𝒰. With the same UIOs as above, the resulting UIOv test suite for the specification in Figure 2.4 is {aaaa, aba, abba, baaa, bba}, of size 23. (Recall that we also count resets when measuring the size.)
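The size claimed at the end of Example 22 is easy to recompute with Definition 4: each of the five maximal tests contributes its length plus one reset.

```python
# The UIOv suite from Example 22 and its size per Definition 4.
suite = {"aaaa", "aba", "abba", "baaa", "bba"}
assert sum(len(t) + 1 for t in suite) == 23   # (4+1)+(3+1)+(4+1)+(4+1)+(3+1)
```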
Figure 2.4 An example where the UIO-method is not complete. Left: the specification; right: a faulty implementation.
2.6 All test suites for Figure 2.1

Let us compute all the previous test suites for the specification in Figure 2.1. We will be testing without extra states, i.e., we construct 5-complete test suites. We start by defining the state and transition cover. For this, we take all shortest sequences from the initial state to the other states. This state cover is depicted in Figure 2.5. The transition cover is simply constructed by extending each access sequence with another symbol.

P = {ϵ, a, aa, b, ba}
Q = P ⋅ I = {a, b, c, aa, ab, ac, aaa, aab, aac, ba, bb, bc, baa, bab, bac}
Figure 2.5 A state cover for the specification from Figure 2.1.
As shown earlier, the set W = {aa, ac, c} is a characterisation set. The W-method, which simply combines P ∪ Q with W, gives the following test suite of size 169:

TW = { aaaaa, aaaac, aaac, aabaa, aabac, aabc, aacaa,
       aacac, aacc, abaa, abac, abc, acaa, acac, acc, baaaa,
       baaac, baac, babaa, babac, babc, bacaa, bacac, bacc,
       bbaa, bbac, bbc, bcaa, bcac, bcc, caa, cac, cc }
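This suite can be reproduced mechanically: P and W are taken verbatim from the text, Q = P ⋅ I, and with k = 0 we have TW = (P ∪ Q) ⋅ W. Taking the maximal tests of the suite gives the 33 tests listed above, of total size 169.

```python
# Recomputing T_W for the running example (k = 0).
P = {"", "a", "aa", "b", "ba"}
I = ["a", "b", "c"]
W = {"aa", "ac", "c"}

Q = {p + a for p in P for a in I}
T = {u + w for u in P | Q for w in W}
maximal = {t for t in T if not any(s != t and s.startswith(t) for s in T)}

assert len(maximal) == 33
assert sum(len(t) + 1 for t in maximal) == 169   # the size ‖T_W‖
```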
With the Wp-method we get to choose a different state identifier per state. Since many states have a UIO, we can use them as state identifiers. This defines the following family 𝒲:

W0 = {aa}    W1 = {a}    W2 = {aa, ac, c}    W3 = {c}    W4 = {ac}

For the first part of the Wp test suite we need ⋃ 𝒲 = {aa, ac, c}. For the second part, we only combine the sequences in the transition cover with the corresponding suffixes. All in all we get a test suite of size 75:

TWp = { aaaaa, aaaac, aaac, aabaa, aacaa, abaa,
        acaa, baaac, baac, babaa, bacc, bbac, bcaa, caa }
For the HSI-method we need a separating family ℋ. We pick the following sets:

H0 = {aa, c}    H1 = {a}    H2 = {aa, ac, c}    H3 = {a, c}    H4 = {aa, ac, c}

(We repeat that these sets are prefix-closed, but we only show the maximal sequences.) Note that these sets are harmonised, unlike the family 𝒲. For example, the separating sequence a is contained in both H1 and H3. This ensures that we do not have to consider ⋃ ℋ in the first part of the test suite. When combining this with the corresponding prefixes, we obtain the HSI test suite of size 125:

THSI = { aaaaa, aaaac, aaac, aabaa, aabc, aacaa, aacc,
         abaa, abc, acaa, acc, baaaa, baaac, baac, babaa,
         babc, baca, bacc, bbaa, bbac, bbc, bcaa, bcc, caa, cc }

On this particular example the Wp-method outperforms the HSI-method. The reason is that many states have UIOs and we picked those to be the state identifiers. In general, however, UIOs may not exist (and finding them is hard).

The UIO-method and ADS-method are not applicable in this example, because state s2 does not have a UIO.
Figure 2.6 A faulty implementation for the specification in Figure 2.1.
FSM-based Test Methods
|
||
|
||
31
|
||
|
||
We can run these test suites on the faulty implementation shown in Figure 2.6. Here, the a-transition from state s′2 transitions to the wrong target state. It is not an obvious mistake, since the faulty target s′0 has transitions very similar to those of s2. Yet, all the test suites detect this error. When choosing the prefix aaa (included in the transition cover) and the suffix aa (included in the characterisation set and in the state identifiers for s2), we see that the specification outputs 10111 and the implementation outputs 10110. The sequence aaaaa is the only sequence (in any of the test suites here) which detects this fault.
Alternatively, if the a-transition from s′2 were to transition to s′4, we would need the suffix ac, as aa will not detect the fault. Since the sequence ac is included in the state identifier for s2, this fault would also be detected. This shows that it is sometimes necessary to include multiple sequences in the state identifier.
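Checking whether a given test detects a fault boils down to running the word on both machines and comparing the output words. A minimal sketch, with hypothetical toy machines encoded as transition/output dictionaries (not the machines of Figures 2.1 and 2.6):

```python
def run(machine, word):
    # machine = (delta, lam, s0) with dict-encoded transitions and outputs
    delta, lam, s0 = machine
    state, outputs = s0, []
    for a in word:
        outputs.append(str(lam[(state, a)]))
        state = delta[(state, a)]
    return "".join(outputs)

def detects_fault(spec, impl, word):
    # a test word detects a fault iff the output words differ
    return run(spec, word) != run(impl, word)
```

Even on two-state toy machines that differ in a single transition, a longer word may be needed before the outputs diverge, mirroring the role of aaaaa above.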
Another approach to testing would be to enumerate all sequences up to a certain length. In this example, we need sequences of at least length 5. Consequently, the test suite contains 243 sequences, which boils down to a size of 1458. Such a brute-force approach is not scalable.
|
||
3 Hybrid ADS method

In this section, we describe a new test generation method for Mealy machines. Its completeness will be proven in Theorem 26, together with completeness for all methods defined in the previous section. From a high-level perspective, the method uses the algorithm by Lee and Yannakakis (1994) to obtain an ADS. If no ADS exists, their algorithm still provides some sequences which separate some inequivalent states. Our extension refines this set of sequences by using pairwise separating sequences. Hence, this method is a hybrid between the ADS-method and the HSI-method.

The reason we do this is that the ADS-method generally constructs small test suites, as experiments by Dorofeeva, et al. (2010) suggest. The test suites are small since an ADS can identify a state with a single word, instead of the set of words which is generally needed. Even if an ADS does not exist, using the partial result of Lee and Yannakakis' algorithm can reduce the size of test suites.

We will now see the construction of this hybrid method. Instead of manipulating separating families directly, we use a splitting tree. This is a data structure which is used to construct separating families or adaptive distinguishing sequences.
Definition 23. A splitting tree (for M) is a rooted tree where each node u has
– a non-empty set of states l(u) ⊆ M, and
– if u is not a leaf, a sequence σ(u) ∈ I∗.
We require that if a node u has children C(u) then
– the sets of states of the children of u partition l(u), i.e., the set P(u) = {l(v) | v ∈ C(u)} is a non-trivial partition of l(u), and
– the sequence σ(u) witnesses the partition P(u), meaning that for all p, q ∈ P(u) we have p = q iff λ(s, σ(u)) = λ(t, σ(u)) for all s ∈ p, t ∈ q.
A splitting tree is called complete if all inequivalent states belong to different leaves.
Efficient construction of a splitting tree is described in more detail in Chapter 4. Briefly, the splitting tree records the execution of a partition refinement algorithm (such as Moore's or Hopcroft's algorithm). Each non-leaf node encodes a split together with a witness, which is a separating sequence for its children. From such a tree we can construct a state identifier for a state by locating the leaf containing that state and collecting all the sequences encountered while traversing to the root.
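That extraction can be sketched as follows (a hypothetical encoding with parent pointers; the actual implementation is described in Chapter 4):

```python
class Node:
    # a splitting tree node: l(u) is `states`, σ(u) is `witness`
    def __init__(self, states, witness=None, parent=None):
        self.states = states
        self.witness = witness
        self.parent = parent

def state_identifier(leaf):
    # collect the witnesses on the path from a leaf up to the root
    seqs, node = set(), leaf.parent
    while node is not None:
        seqs.add(node.witness)
        node = node.parent
    return seqs
```

On the tree of Figure 2.7 this yields, for the leaf containing s2, the set {a, c, aa, ac}, whose maximal sequences are exactly H2 = {aa, ac, c}.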
For adaptive distinguishing sequences an additional requirement is put on the splitting tree: for each non-leaf node u, the sequence σ(u) defines an injective map x ↦ (δ(x, σ(u)), λ(x, σ(u))) on the set l(u). Lee and Yannakakis (1994) call such splits valid. Figure 2.7 shows both valid and invalid splits. Validity precisely ensures that after performing a split, the states are still distinguishable. Hence, sequences of such splits can be concatenated.
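Validity can be checked directly from this definition; a small sketch with hypothetical dict encodings of δ and λ:

```python
def run_word(delta, lam, state, word):
    # return (δ(state, word), λ(state, word)) for dict-encoded δ and λ
    outputs = []
    for a in word:
        outputs.append(lam[(state, a)])
        state = delta[(state, a)]
    return state, tuple(outputs)

def is_valid_split(states, word, delta, lam):
    # valid iff x ↦ (δ(x, word), λ(x, word)) is injective on `states`
    images = {run_word(delta, lam, x, word) for x in states}
    return len(images) == len(states)
```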
Figure 2.7 A complete splitting tree with shortest witnesses for the specification of Figure 2.1. Only the splits a, aa, and ac are valid.
The following lemma is a result of Lee and Yannakakis (1994).

Lemma 24. A complete splitting tree with only valid splits exists if and only if there exists an adaptive distinguishing sequence.
Our method uses the exact same algorithm as the one by Lee and Yannakakis. However, we also apply it in the case when the splitting tree with valid splits is not complete (and hence no adaptive distinguishing sequence exists). Their algorithm still produces a family of sets, but it is not necessarily a separating family.
In order to recover separability, we refine that family. Let 𝒵′ be the result of Lee and Yannakakis' algorithm (to distinguish it from their notation, we add a prime) and let ℋ be a separating family extracted from an ordinary splitting tree. The hybrid ADS family is defined as 𝒵′ ; ℋ, and can be computed as sketched in Algorithm 2.1 (the algorithm works on splitting trees instead of separating families). By Lemma 16 we note the following: in the best case this family is an adaptive distinguishing sequence; in the worst case it is equal to ℋ; and in general it is a combination of the two families. In all cases, the result is a separating family because ℋ is.
Require: A Mealy machine M
Ensure: A separating family Z
1: T1 ← splitting tree for Moore's minimisation algorithm
2: T2 ← splitting tree with valid splits (see Lee & Yannakakis, 1994)
3: 𝒵′ ← (incomplete) family constructed from T2
4: for all inequivalent states s, t in the same leaf of T2 do
5:     u ← lca(T1, s, t)
6:     Zs ← Z′s ∪ {σ(u)}
7:     Zt ← Z′t ∪ {σ(u)}
8: end for
9: return Z

Algorithm 2.1 Obtaining the hybrid separating family 𝒵′ ; ℋ
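The refinement loop of Algorithm 2.1 can be sketched in Python (hypothetical encodings: Z′ as a dict, the leaves of T2 as sets of states, and the lca witnesses of T1 as a lookup table; this is not the optimised implementation mentioned in Section 3.2):

```python
from itertools import combinations

def hybrid_family(Z_prime, leaves_T2, lca_witness):
    # add the T1 lca-witness for every pair of states that the
    # valid-splits tree T2 failed to separate (M is assumed minimal,
    # so all such pairs are inequivalent)
    Z = {s: set(seqs) for s, seqs in Z_prime.items()}
    for leaf in leaves_T2:
        for s, t in combinations(sorted(leaf), 2):
            w = lca_witness[(s, t)]
            Z[s].add(w)
            Z[t].add(w)
    return Z
```

On the example of Section 3.1 (with witnesses c for {s2, s3}, ac for {s2, s4}, and c for {s3, s4}) this reproduces the refined family 𝒵′ ; ℋ computed there.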
With the hybrid family we can define the test suite as follows. Its m-completeness is proven in Section 5.

Definition 25. Let P be a state cover, 𝒵′ be a family of sets constructed with the Lee and Yannakakis algorithm, and ℋ be a separating family. The hybrid ADS test suite is

Th-ADS = (P ∪ Q) ⋅ I≤k ⊙ (𝒵′ ; ℋ).
3.1 Example

In Figure 2.8 we see the (unique) result of Lee and Yannakakis' algorithm. We note that the states s2, s3, s4 are not split, so we need to refine the family for those states. We take the separating family ℋ from before. From the incomplete ADS in Figure 2.8b we obtain the family 𝒵′. These families and the refinement 𝒵′ ; ℋ are given below.
Figure 2.8 (a): Largest splitting tree with only valid splits for Figure 2.1. (b): Its incomplete adaptive distinguishing tree.
H0 = {aa, c}        Z′0 = {aa}    (Z′ ; H)0 = {aa}
H1 = {a}            Z′1 = {a}     (Z′ ; H)1 = {a}
H2 = {aa, ac, c}    Z′2 = {aa}    (Z′ ; H)2 = {aa, ac, c}
H3 = {a, c}         Z′3 = {aa}    (Z′ ; H)3 = {aa, c}
H4 = {aa, ac, c}    Z′4 = {aa}    (Z′ ; H)4 = {aa, ac, c}
With the separating family 𝒵′ ; ℋ we obtain the following test suite of size 96:

Th-ADS = {
    aaaaa, aaaac, aaac, aabaa, aacaa, abaa, acaa,
    baaaa, baaac, baac, babaa, bacaa, bacc, bbaa, bbac,
    bbc, bcaa, caa
}
We note that this is indeed smaller than the HSI test suite. In particular, we have a smaller state identifier for s0: {aa} instead of {aa, c}. As a consequence, there are fewer combinations of prefixes and suffixes. We also observe that one of the state identifiers grew in length: {aa, c} instead of {a, c} for state s3.
3.2 Implementation

All the algorithms concerning the hybrid ADS-method have been implemented and can be found at https://github.com/Jaxan/hybrid-ads. We note that Algorithm 2.1 is implemented a bit more efficiently, as we can walk the splitting trees in a particular order. For constructing the splitting trees in the first place, we use Moore's minimisation algorithm and the algorithms by Lee and Yannakakis (1994). We keep all relevant sets prefix-closed by maintaining a trie data structure. A trie also allows us to immediately obtain the set of maximal tests only.
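A minimal sketch of that idea (hypothetical, and far simpler than the tool's trie): storing words in a trie of nested dicts keeps the set prefix-closed by construction, and the maximal tests are exactly the words ending in childless nodes.

```python
def make_trie(words):
    # nested-dict trie; every prefix of an inserted word is represented
    root = {}
    for w in words:
        node = root
        for a in w:
            node = node.setdefault(a, {})
    return root

def maximal_words(node, prefix=""):
    # words ending in a leaf are not a proper prefix of any other word
    if not node:
        yield prefix
    for a, child in node.items():
        yield from maximal_words(child, prefix + a)
```

For example, inserting {a, aa, ac, c} yields the maximal tests {aa, ac, c}.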
3.3 Randomisation

Many constructions in the test suite generation can be randomised. There may exist many shortest access sequences to a state and we can randomly pick any of them. Also in the construction of state identifiers many steps in the algorithm are non-deterministic: the algorithm may ask to find any input symbol which separates a set of states. The tool randomises many such choices. We have noticed that this can have a huge influence on the size of the test suite. However, a decent statistical investigation is still lacking at the moment.
In many applications, such as learning, no bound on the number of states of the SUT is known. In such cases it is possible to randomly select test cases from an infinite test suite. Unfortunately, we lose the theoretical guarantees of completeness with random generation. Still, as we will see in Chapter 3, this can work really well.

We can generate random test cases as follows. In the above definition of the hybrid ADS test suite we replace I≤k by I∗ to obtain an infinite test suite. Then we sample tests as follows:
1. sample an element p from P uniformly,
2. sample a word w from I∗ with a geometric distribution, and
3. sample uniformly from (𝒵′ ; ℋ)s for the state s = δ(s0, pw).
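These three steps could look as follows (a hypothetical sketch; the geometric length in step 2 is drawn by repeated coin flips with stopping probability p_stop):

```python
import random

def sample_test(P, inputs, Z, delta, s0, p_stop=0.25):
    # 1. a uniformly random access sequence from the state cover
    prefix = random.choice(sorted(P))
    # 2. a random middle part with geometrically distributed length
    middle = ""
    while random.random() > p_stop:
        middle += random.choice(inputs)
    # 3. a uniformly random identifier sequence for the state reached
    s = delta(s0, prefix + middle)
    suffix = random.choice(sorted(Z[s]))
    return prefix + middle + suffix
```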
|
||
4 Overview

We give an overview of the aforementioned test methods. We classify them along two directions:
– whether they use harmonised state identifiers or not, and
– whether they use singleton state identifiers or not.
Theorem 26. Assume M to be minimal, reachable, and of size n. The following test suites are all n + k-complete:

                    Arbitrary                               Harmonised
Many / pairwise     Wp: P ⋅ I≤k ⋅ ⋃ 𝒲 ∪ Q ⋅ I≤k ⊙ 𝒲       HSI: (P ∪ Q) ⋅ I≤k ⊙ ℋ
Hybrid                                                      Hybrid ADS: (P ∪ Q) ⋅ I≤k ⊙ (𝒵′ ; ℋ)
Single / global     UIOv: P ⋅ I≤k ⋅ ⋃ 𝒰 ∪ Q ⋅ I≤k ⊙ 𝒰     ADS: (P ∪ Q) ⋅ I≤k ⊙ 𝒵

Proof. See Corollary 33 and 35. □
Each of the methods in the right column can be written more simply as P ⋅ I≤k+1 ⊙ ℋ, since Q = P ⋅ I. This makes them very easy to implement.

It should be noted that the ADS-method is a specific instance of the HSI-method and, similarly, the UIOv-method is an instance of the Wp-method. What is generally meant by the Wp-method and HSI-method is the above formula together with a particular way to obtain the (harmonised) state identifiers.
We are often interested in the size of the test suite. In the worst case, all methods generate a test suite with a size in 𝒪(pn³), and this bound is tight (Vasilevskii, 1973). Nevertheless, we intuitively expect the right column to perform better, as we are using a more structured set (given a separating family for the HSI-method, we can always forget about the common prefixes and apply the Wp-method, which will never be smaller if constructed in this way). We also expect the bottom row to perform better, as there is a single test for each state. Small experimental results confirm this intuition (Dorofeeva, et al., 2010).
On the example in Figure 2.1, we computed all applicable test suites in Sections 2.6 and 3.1. The UIO and ADS methods are not applicable. For the W, Wp, HSI, and hybrid ADS methods we obtained test suites of size 169, 75, 125, and 96 respectively.
5 Proof of completeness

In this section, we will prove n-completeness of the discussed test methods. Before we dive into the proof, we give some background on the proof principle of bisimulation. The original proofs of completeness often involve an inductive argument (on the length of words) inlined with arguments about characterisation sets. This can be hard to follow, and so we prefer a proof based on bisimulations, which defers the inductive argument to a general statement about bisimulation. Many notions of bisimulation exist in the theory of labelled transition systems, but for Mealy machines there is just one simple definition. We give the definition and the main proof principle, both of which can be found in a paper by Rutten (1998).
Definition 27. Let M be a Mealy machine. A relation R ⊆ S × S is called a bisimulation if for every (s, t) ∈ R we have
– equal outputs: λ(s, a) = λ(t, a) for all a ∈ I, and
– related successor states: (δ(s, a), δ(t, a)) ∈ R for all a ∈ I.

Lemma 28. If two states s, t are related by a bisimulation, then s ∼ t.13
We use a slight generalisation of the bisimulation proof technique, called bisimulation up-to. This allows one to give a smaller relation R which extends to a bisimulation. A good introduction to these up-to techniques is given by Bonchi and Pous (2015) or the thesis of Rot (2015). In our case we use bisimulation up-to ∼-union. The following lemma can be found in the given references.
13 The converse – which we do not need here – also holds, as ∼ is a bisimulation.
Definition 29. Let M be a Mealy machine. A relation R ⊆ S × S is called a bisimulation up-to ∼-union if for every (s, t) ∈ R we have
– equal outputs: λ(s, a) = λ(t, a) for all a ∈ I, and
– related successor states: (δ(s, a), δ(t, a)) ∈ R or δ(s, a) ∼ δ(t, a) for all a ∈ I.

Lemma 30. Any bisimulation up-to ∼-union is contained in a bisimulation.
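On a finite relation, the two conditions of Definition 29 are straightforward to check; a sketch (hypothetical dict encodings, with `equiv` deciding ∼, or any sound under-approximation of it):

```python
def is_bisim_up_to(R, inputs, delta, lam, equiv):
    # outputs must agree; successors must be in R or already equivalent
    for (s, t) in R:
        for a in inputs:
            if lam[(s, a)] != lam[(t, a)]:
                return False
            succ = (delta[(s, a)], delta[(t, a)])
            if succ not in R and not equiv(*succ):
                return False
    return True
```

Passing an `equiv` that always returns False makes this a check for an ordinary bisimulation (Definition 27).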
We fix a specification M which has a minimal representative with n states and an implementation M′ with at most n + k states. We assume that all states are reachable from the initial state in both machines (i.e., both are connected).

The next proposition gives sufficient conditions for a test suite of a certain shape to be complete. We then prove that these conditions hold for the test suites in this chapter.
Proposition 31. Let 𝒲 and 𝒲′ be two families of words and P a state cover for M. Let T = P ⋅ I≤k ⊙ 𝒲 ∪ P ⋅ I≤k+1 ⊙ 𝒲′ be a test suite. If
1. for all x, y ∈ M: x ∼Wx∩Wy y implies x ∼ y,
2. for all x, y ∈ M and z ∈ M′: x ∼Wx z and z ∼W′y y implies x ∼ y, and
3. the machines M and M′ agree on T,
then M and M′ are equivalent.
Proof. First, we prove that P ⋅ I≤k reaches all states in M′. For p, q ∈ P and x = δ(s0, p), y = δ(s0, q) such that x ̸∼Wx∩Wy y, we also have δ′(s′0, p) ̸∼Wx∩Wy δ′(s′0, q) in the implementation M′. By (1) this means that there are at least n different behaviours in M′, hence at least n states.

Now n states are reached by the previous argument (using the set P). By assumption M′ has at most k extra states. If those extra states are reachable, they are reachable from an already visited state in at most k steps. So we can reach all states of M′ by using I≤k after P.

Second, we show that the reached states are bisimilar. Define the relation R = {(δ(s0, p), δ′(s′0, p)) | p ∈ P ⋅ I≤k}. Note that for each (s, i) ∈ R we have s ∼Ws i. For each state i ∈ M′ there is a state s ∈ M such that (s, i) ∈ R, since we reach all states in both machines by P ⋅ I≤k. We will prove that this relation is in fact a bisimulation up-to ∼-union.

For outputs, we note that (s, i) ∈ R implies λ(s, a) = λ′(i, a) for all a, since the machines agree on P ⋅ I≤k+1. For the successors, let (s, i) ∈ R and a ∈ I and consider the successors s2 = δ(s, a) and i2 = δ′(i, a). We know that there is some t ∈ M with (t, i2) ∈ R. We also know that we tested i2 with the set Wt. So we have:

s2 ∼W′s2 i2 ∼Wt t.

By the second assumption, we conclude that s2 ∼ t. So s2 ∼ t and (t, i2) ∈ R, which means that R is a bisimulation up-to ∼-union. Moreover, R contains the pair (s0, s′0). By using Lemmas 30 and 28, we conclude that the initial states s0 and s′0 are equivalent. □
Before we show that the conditions hold for the test methods, we first reflect on the above proof. This proof is very similar to the completeness proof by Chow (1978).14 In the first part we argue that all states are visited by using some sort of counting and reachability argument. Then in the second part we show the actual equivalence. To the best of the authors' knowledge, this is the first m-completeness proof which explicitly uses the concept of a bisimulation. Using a bisimulation allows us to slightly generalise and use bisimulation up-to ∼-union, dropping the often-assumed requirement that M is minimal.
Lemma 32. Let 𝒲′ be a family of state identifiers for M. Define the family 𝒲 by Ws = ⋃ 𝒲′. Then conditions (1) and (2) in Proposition 31 are satisfied.

Proof. For the first condition, we note that Wx ∩ Wy = Wx = Wy, and so x ∼Wx∩Wy y implies x ∼Wx y; now by the definition of state identifiers we get x ∼ y.
For the second condition, let x ∼⋃𝒲′ z ∼W′y y. Then we note that W′y ⊆ ⋃ 𝒲′ and so we get x ∼W′y z ∼W′y y. By transitivity we get x ∼W′y y, and so by the definition of state identifiers we get x ∼ y. □
Corollary 33. The W, Wp, and UIOv test suites are n + k-complete.
Lemma 34. Let ℋ be a separating family and take 𝒲 = 𝒲′ = ℋ. Then conditions (1) and (2) in Proposition 31 are satisfied.

Proof. Let x ∼Hx∩Hy y; then by the definition of a separating family, x ∼ y. For the second condition, let x ∼Hx z ∼Hy y. Then we get x ∼Hx∩Hy z ∼Hx∩Hy y and so by transitivity x ∼Hx∩Hy y, hence again x ∼ y. □
Corollary 35. The HSI, ADS, and hybrid ADS test suites are n + k-complete.
14 In fact, it is also similar to Lemma 4 by Angluin (1987), which proves termination in the L* learning algorithm. This correspondence was noted by Berg, et al. (2005).

6 Related Work and Discussion

In this chapter, we have mostly considered classical test methods, which are all based on prefixes and state identifiers. There are more recent methods which almost fit in the same framework. We mention the P (Simão & Petrenko, 2010), H (Dorofeeva, et al., 2005), and SPY (Simão, et al., 2009) methods. The P method constructs a test suite by carefully considering sufficient conditions for a p-complete test suite (here p ≤ n, where n is the number of states). It does not generalise to extra states, but it seems to construct very small test suites. The H method is a refinement of the HSI-method where the state identifiers for testing transitions are reconsidered. (Note that Proposition 31 allows for a different family when testing transitions.) Last, the SPY method builds upon the HSI-method and changes the prefixes in order to minimise the size of a test suite, exploiting overlap in test sequences. We believe that this technique is independent of the HSI-method and can in fact be applied to all methods presented in this chapter. As such, the SPY method should be considered an optimisation technique, orthogonal to the work in this chapter.
Recently, Hierons and Türker (2015) devised a novel test method which is based on incomplete distinguishing sequences and is similar to the hybrid ADS method. They use sequences which can be considered to be adaptive distinguishing sequences on a subset of the state space. With several of those one can cover the whole state space, obtaining an m-complete test suite. This is somewhat dual to our approach, as our "incomplete" adaptive distinguishing sequences define a coarse partition of the complete state space. Our method becomes complete by refining the tests with pairwise separating sequences.
Some work has been put into minimising the adaptive distinguishing sequences themselves. Türker and Yenigün (2014) describe greedy algorithms which construct small adaptive distinguishing sequences. Moreover, they show that finding the minimal adaptive distinguishing sequence is NP-complete in general; even approximating it is NP-complete. We expect that similar heuristics also exist for the other test methods and that they will improve the performance. Note that minimal separating sequences do not guarantee a minimal test suite. In fact, we see that the hybrid ADS method outperforms the HSI-method on the example in Figure 2.1 since it prefers longer, but fewer, sequences.
Some of the assumptions made at the start of this chapter have also been challenged. For non-deterministic Mealy machines, we mention the work of Petrenko and Yevtushenko (2014). We also mention the work of van den Bos, et al. (2017) and Simão and Petrenko (2014) for input/output transition systems with the ioco relation. In both cases, the test suites are still defined in the same way as in this chapter: prefixes followed by state identifiers. However, for non-deterministic systems, guiding an implementation into a state is harder, as the implementation may choose its own path. For that reason, sequences are often replaced by automata, so that the testing can be adaptive. This adaptive testing is game-theoretic, and the automaton provides a strategy. This game-theoretic point of view is further investigated by van den Bos and Stoelinga (2018). The test suites are generally of exponential size, depending on how non-deterministic the systems are.
The assumption that the implementation is resettable was also challenged early on. If the machine has no reliable reset (or the reset is too expensive), one tests the system with a single checking sequence. Lee and Yannakakis (1994) give a randomised algorithm for constructing such a checking sequence using adaptive distinguishing sequences. There is a similarity with the randomised algorithm by Rivest and Schapire (1993) for learning non-resettable automata. Recently, Groz, et al. (2018) gave a deterministic learning algorithm for non-resettable machines based on adaptive distinguishing sequences.
Many of the methods described here are benchmarked on small or random Mealy machines by Dorofeeva, et al. (2010) and Endo and Simão (2013). The benchmarks are of limited scope; the machine from Chapter 3, for instance, is neither small nor random. For this reason, we started to collect more realistic benchmarks at http://automata.cs.ru.nl/.
Chapter 3

Applying Automata Learning to Embedded Control Software

Wouter Smeenk (Océ Technologies B.V.)
Joshua Moerman (Radboud University)
Frits Vaandrager (Radboud University)
David N. Jansen (Radboud University)
Abstract

Using an adaptation of state-of-the-art algorithms for black-box automata learning, as implemented in the LearnLib tool, we succeeded in learning a model of the Engine Status Manager (ESM), a software component that is used in printers and copiers of Océ. The main challenge that we encountered was that LearnLib, although effective in constructing hypothesis models, was unable to find counterexamples for some hypotheses. In fact, none of the existing FSM-based conformance testing methods that we tried worked for this case study. We therefore implemented an extension of the algorithm of Lee & Yannakakis for computing an adaptive distinguishing sequence. Even when an adaptive distinguishing sequence does not exist, Lee & Yannakakis' algorithm produces an adaptive sequence that "almost" identifies states. In combination with a standard algorithm for computing separating sequences for pairs of states, we managed to verify states with on average 3 test queries. Altogether, we needed around 60 million queries to learn a model of the ESM with 77 inputs and 3,410 states. We also constructed a model directly from the ESM software and established equivalence with the learned model. To the best of our knowledge, this is the first paper in which active automata learning has been applied to industrial control software.
This chapter is based on the following publication:
Smeenk, W., Moerman, J., Vaandrager, F. W., & Jansen, D. N. (2015). Applying Automata Learning to Embedded Control Software. In Formal Methods and Software Engineering - 17th International Conference on Formal Engineering Methods, ICFEM, Proceedings. Springer. doi:10.1007/978-3-319-25423-4_5
Once they have high-level models of the behaviour of software components, software engineers can construct better software in less time. A key problem in practice, however, is the construction of models for existing software components, for which no or only limited documentation is available.

The construction of models from observations of component behaviour can be performed using regular inference – also known as automata learning (see Angluin, 1987; de la Higuera, 2010; Steffen, et al., 2011). The most efficient such techniques use the set-up of active learning, illustrated in Figure 3.1, in which a "learner" has the task of learning a model of a system by actively asking questions to a "teacher".
Figure 3.1 Active learning of reactive systems.
The core of the teacher is a System Under Test (SUT), a reactive system to which one can apply inputs and whose outputs one may observe. The learner interacts with the SUT to infer a model by sending inputs and observing the resulting outputs ("membership queries"). In order to find out whether an inferred model is correct, the learner may pose an "equivalence query". The teacher uses a model-based testing (MBT) tool to try and answer such queries: given a hypothesised model, an MBT tool generates a long test sequence using some conformance testing method. If the SUT passes this test, then the teacher informs the learner that the model is deemed correct. If the outputs of the SUT and the model differ, this constitutes a counterexample, which is returned to the learner. Based on such a counterexample, the learner may then construct an improved hypothesis. Hence, the task of the learner is to collect data by interacting with the teacher and to formulate hypotheses, and the task of the MBT tool is to establish the validity of these hypotheses. It is important to note that it may occur that an SUT passes the test for a hypothesis, even though this hypothesis is not valid.
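Schematically, this interaction is the following loop (a sketch only; `learner` and `mbt_tool` stand for arbitrary components with hypothetical method names, not LearnLib's actual API):

```python
def learn_model(learner, mbt_tool, sut):
    while True:
        hypothesis = learner.build_hypothesis(sut)           # membership queries
        cex = mbt_tool.find_counterexample(hypothesis, sut)  # equivalence query
        if cex is None:
            # the SUT passed all generated tests; the hypothesis is
            # *deemed* correct, but may still differ from the SUT
            return hypothesis
        learner.process_counterexample(cex)                  # refine and retry
```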
Triggered by various theoretical and practical results (see for instance the work by Aarts, 2014; Berg, et al., 2005; Cassel, et al., 2015; Howar, et al., 2012; Leucker, 2006; Merten, et al., 2012; Raffelt, et al., 2009), there is a fast-growing interest in automata learning technology. In recent years, automata learning has been applied successfully, e.g., to regression testing of telecommunication systems (Hungar, et al., 2003), checking conformance of communication protocols to a reference implementation (Aarts, et al., 2014), finding bugs in Windows and Linux implementations of TCP (Fiterău-Broștean, et al., 2014), analysis of botnet command and control protocols (Cho, et al., 2010), and integration testing (Groz, et al., 2008 and Li, et al., 2006).
In this chapter, we explore whether LearnLib by Raffelt, et al. (2009), a state-of-the-art automata learning tool, is able to learn a model of the Engine Status Manager (ESM), a piece of control software that is used in many printers and copiers of Océ. Software components like the ESM can be found in many embedded systems in one form or another. Being able to retrieve models of such components automatically is potentially very useful. For instance, if the software is fixed or enriched with new functionality, one may use a learned model for regression testing. Also, if the source code of software is hard to read and poorly documented, one may use a model of the software for model-based testing of a new implementation, or even for generating an implementation on a new platform automatically. Using a model checker one may also study the interaction of the software with other components for which models are available.
The ESM software is actually well documented, and an extensive test suite exists. The ESM, which has been implemented using Rational Rose Real-Time (RRRT), is stable and has been in use for 10 years. Due to these characteristics, the ESM is an excellent benchmark for assessing the performance of automata learning tools in this area. The ESM has also been studied in other research projects: Ploeger (2005) modelled the ESM and other related managers and verified properties based on the official specifications of the ESM, and Graaf and van Deursen (2007) have checked the consistency of the behavioural specifications defined in the ESM against the RRRT definitions.
Learning a model of the ESM turned out to be more complicated than expected. The top-level UML/RRRT statechart from which the software is generated only has 16 states. However, each of these states contains nested states, and in total there are 70 states that do not have further nested states. Moreover, the C++ code contained in the actions of the transitions also creates some complexity, and this explains why the minimal Mealy machine that models the ESM has 3,410 states. LearnLib has been used to learn models with tens of thousands of states by Raffelt, et al. (2009), and therefore we expected that it would be easy to learn a model for the ESM. However, finding counterexamples for incorrect hypotheses turned out to be challenging due to the large number of 77 inputs. The test algorithms implemented in LearnLib, such as random testing, the W-method by Chow (1978) and Vasilevskii (1973), and the Wp-method by Fujiwara, et al. (1991), failed to deliver counterexamples within an acceptable time. Automata learning techniques have been successfully applied to case studies in which the total number of input symbols is much larger, but in these cases it was possible to reduce the number of inputs to a small number (fewer than 10) using abstraction techniques (Aarts, et al., 2015 and Howar, et al., 2011). In the case of the ESM, the use of abstraction techniques only allowed us to reduce the original 156 concrete actions to 77 abstract actions.
|
||
44
|
||
|
||
Chapter 3

We therefore implemented an extension of an algorithm of Lee and Yannakakis (1994) for computing adaptive distinguishing sequences. Even when an adaptive distinguishing sequence does not exist, Lee & Yannakakis' algorithm produces an adaptive sequence that "almost" identifies states. In combination with a standard algorithm for computing separating sequences for pairs of states, we managed to verify states with on average 3 test queries and to learn a model of the ESM with 77 inputs and 3,410 states. We also constructed a model directly from the ESM software and established equivalence with the learned model. To the best of our knowledge, this is the first paper in which active automata learning has been applied to industrial control software. Preliminary evidence suggests that our adaptation of Lee & Yannakakis' algorithm outperforms existing FSM-based conformance algorithms.

During recent years most researchers working on active automata learning have focused their efforts on efficient algorithms and tools for the construction of hypothesis models. Our work shows that if we want to further scale automata learning to industrial applications, we also need better algorithms for finding counterexamples for incorrect hypotheses. Following Berg, et al. (2005), our work shows that the context of automata learning provides both new challenges and new opportunities for the application of testing algorithms. All the models for the ESM case study together with the learning and testing statistics are available at http://www.mbsd.cs.ru.nl/publications/papers/fvaan/ESM/, as a benchmark for both the automata learning and testing communities. It is now also included in the automata wiki at http://automata.cs.ru.nl/.
1 Engine Status Manager

The focus of this article is the Engine Status Manager (ESM), a software component that is used to manage the status of the engine of Océ printers and copiers. In this section, the overall structure and context of the ESM will be explained.
1.1 ESRA

The requirements and behaviour of the ESM are defined in a software architecture called Embedded Software Reference Architecture (ESRA). The components defined in this architecture are reused in many of the products developed by Océ and form an important part of these products. This architecture is developed for cut-sheet printers or copiers. The term cut-sheet refers to the use of separate sheets of paper as opposed to a continuous feed of paper.

An engine refers to the printing or scanning part of a printer or copier. Other products can be connected to an engine that pre- or post-process the paper, for example a cutter, folder, stacker or stapler.
Applying Automata Learning to Embedded Control Software
Figure 3.2 Global overview of the engine software.
Figure 3.2 gives an overview of the software in a printer or copier. The controller communicates the required actions to the engine software. This includes transport of digital images, status control, print or scan actions and error handling. The controller is responsible for queuing and processing the actions received from the network and operators, and for delegating the appropriate actions to the engine software. The managers communicate with the controller using the external interface adapters. These adapters translate the external protocols to internal protocols. The managers manage the different functions of the engine. They are divided by the different functionalities they implement, such as status control, print or scan actions, or error handling. In order to do this, a manager may communicate with other managers and functions. A function is responsible for a specific set of hardware components. It translates commands from the managers to the function hardware and reports the status and other information of the function hardware to the managers. This hardware can for example be the printing hardware, or hardware that is not part of the engine, such as a stapler. Other functionalities such as logging and debugging are orthogonal to the functions and managers.
1.2 ESM and connected components

The ESM is responsible for the transition from one status of the printer or copier to another. It coordinates the functions to bring them into the correct status. Moreover, it informs all its connected clients (managers or the controller) of status changes. Finally, it handles status transitions when an error occurs.

Figure 3.3 shows the different components to which the ESM is connected. The Error Handling Manager (EHM), Action Control Manager (ACM) and other clients request engine statuses. The ESM decides whether a request can be honored immediately, has to be postponed, or has to be ignored. If the requested action is processed, the ESM requests the functions to go to the appropriate status. The EHM has the highest priority and its requests are processed first. The EHM can request the engine to go into the defect status. The ACM has the next highest priority. The ACM requests the engine to switch between the running and standby statuses. The other clients request transitions between the other statuses, such as idle, sleep, standby and low power. All the other clients have the same lowest priority. The Top Capsule instantiates the ESM and communicates with it during the initialisation of the ESM. The Information Manager provides some parameters during the initialisation.

Figure 3.3 Overview of the managers and clients connected to the ESM.

There are more managers connected to the ESM, but they are of less importance and are thus not mentioned here.
1.3 Rational Rose RealTime

The ESM has been implemented using Rational Rose RealTime (RRRT). In this tool so-called capsules can be created. Each of these capsules defines a hierarchical statechart diagram. Capsules can be connected with each other using structure diagrams. Each capsule contains a number of ports that can be connected to ports of other capsules by adding connections in the associated structure diagram. Each of these ports specifies which protocol should be used. This protocol defines which messages may be sent to and from the port. Transitions in the statechart diagram of the capsule can be triggered by messages arriving on a port of the capsule. Messages can be sent to these ports using the action code of the transition. The transitions between the states, actions and guards are defined in C++ code. From the state diagram, C++ source files are generated.

The RRRT language and semantics are based on UML (Object Management Group (OMG), 2004) and ROOM (Selic, et al., 1994). One important concept used in RRRT is the run-to-completion execution model (Eshuis, et al., 2002). This means that when a received message is being processed, the execution cannot be interrupted by other arriving messages. These messages are placed in a queue to be processed later.
1.4 The ESM state diagram

Figure 3.4 Top states and transitions of the ESM.
Figure 3.4 shows the top states of the ESM statechart. The statuses that can be requested by the clients and managers correspond to the gray states. The other states are so-called transitory states. In transitory states the ESM is waiting for the functions to report that they have moved to the corresponding status. Once all functions have reported, the ESM moves to the corresponding status.

The idle status indicates that the engine has started up but that it is still cold (uncontrolled temperature). The standby status indicates that the engine is warm and ready for printing or scanning. The running status indicates that the engine is printing or scanning. The transitions from the overarching state to the goingToSleep and goingToDefect states indicate that it is possible to move to the sleep or defect status from any state. In some cases it is possible to awake from the sleep status; in other cases the main power is turned off. The medium status is designed for diagnostics. In this status the functions can each be in a different status, for example one function in standby status while another function is in idle status.

The statechart diagram in Figure 3.4 may seem simple, but it hides many details. Each of the states has up to 5 nested states. In total there are 70 states that do not have further nested states. The C++ code contained in the actions of the transitions is in some cases non-trivial. The possibility to transition from any state to the sleep or defect state also complicates the learning.
2 Learning the ESM

In order to learn a model of the ESM, we connected it to LearnLib by Merten, et al. (2011), a state-of-the-art tool for learning Mealy machines developed at the University of Dortmund. A Mealy machine is a tuple M = (I, O, Q, q0, δ, λ), where
– I is a finite set of input symbols,
– O is a finite set of output symbols,
– Q is a finite set of states,
– q0 ∈ Q is an initial state,
– δ : Q × I → Q is a transition function, and
– λ : Q × I → O is an output function.
The behaviour of a Mealy machine is deterministic, in the sense that the outputs are fully determined by the inputs. Functions δ and λ are extended to accept sequences in the standard way. We say that Mealy machines M = (I, O, Q, q0, δ, λ) and M′ = (I′, O′, Q′, q′0, δ′, λ′) are equivalent if they generate an identical sequence of outputs for every sequence of inputs, that is, if I = I′ and, for all w ∈ I∗, λ(q0, w) = λ′(q′0, w). If the behaviour of an SUT is described by a Mealy machine M, then the task of LearnLib is to learn a Mealy machine M′ that is equivalent to M.
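The definition above can be captured in a few lines of code. Below is a minimal sketch in Python; the machine shown is an illustrative toy, not related to the ESM, and checking equivalence on a finite sample of words is of course only an approximation of true equivalence.

```python
# Minimal sketch of a Mealy machine (I, O, Q, q0, delta, lambda) as defined
# above; delta and lam are extended to input sequences by run().

class Mealy:
    def __init__(self, q0, delta, lam):
        self.q0, self.delta, self.lam = q0, delta, lam

    def run(self, word):
        """Return the output sequence produced from the initial state."""
        q, outputs = self.q0, []
        for symbol in word:
            outputs.append(self.lam[(q, symbol)])
            q = self.delta[(q, symbol)]
        return outputs

def equivalent_on(m1, m2, words):
    """Check agreement on a finite sample of input sequences; true
    equivalence requires agreement on *every* sequence."""
    return all(m1.run(w) == m2.run(w) for w in words)

# Toy machine over I = {a, b}: the output for b depends on the state.
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1}
lam = {(0, 'a'): 'ack', (0, 'b'): 'nack', (1, 'a'): 'ack', (1, 'b'): 'busy'}
m = Mealy(0, delta, lam)
```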
2.1 Experimental set-up

A clear interface to the ESM has been defined in RRRT. The ESM defines ports from which it receives a predefined set of inputs and to which it can send a predefined set of outputs. However, this interface can only be used within RRRT. In order to communicate with the LearnLib software, a TCP connection was set up: an extra capsule was created in RRRT which connects to the ports defined by the ESM, and this capsule opens a TCP connection to LearnLib. Inputs and outputs are translated to and from a string format and sent over the connection. Before each membership query, the learner needs to bring the SUT back to its initial state; in other words, LearnLib needs a way to reset the SUT.
Some inputs and outputs sent to and from the ESM carry parameters. These parameters are enumerations of statuses, or integers bounded by the number of functions connected to the ESM. Currently, LearnLib cannot handle inputs with parameters; therefore, we introduced a separate input action for every parameter value. Based on domain knowledge and discussions with the Océ engineers, we could group some of these inputs together and reduce the total number of inputs. When learning the ESM using one function, 83 concrete inputs are grouped into four abstract inputs. When using two functions, 126 concrete inputs can be grouped. When an abstract input needs to be sent to the ESM, one concrete input of the represented group is randomly selected, as in the approach of Aarts, et al. (2015). This is a valid abstraction because all the inputs in the group have exactly the same behaviour in any state of the ESM.
This has been verified by code inspection. No other abstractions were found during the research. After the inputs are grouped, a total of 77 inputs remain when learning the ESM using 1 function, and 105 inputs remain when using 2 functions.

It was not immediately obvious how to model the ESM by a Mealy machine, since some inputs trigger no output, whereas other inputs trigger several outputs. In order to resolve this, we benefited from the run-to-completion execution model used in RRRT. Whenever an input is sent, all the outputs are collected until quiescence is detected. Next, all the outputs are concatenated and sent to LearnLib as a single aggregated output. In model-based testing, quiescence is usually detected by waiting for a fixed time-out period. However, this causes the system to be mostly idle while waiting for the time-out, which is inefficient. In order to detect quiescence faster, we exploited the run-to-completion execution model used by RRRT: we modified the ESM to respond to a new low-priority test input with a (single) special output. This test input is sent after each normal input. Only after the normal input has been processed and all the generated outputs have been sent is the test input processed and the special output generated; upon its reception, quiescence can be detected immediately and reliably.
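The aggregation scheme can be sketched as follows. This is an illustrative Python sketch: `send`, `receive`, the `"TEST"` input and the `QUIESCENT` output stand in for the actual TCP connection and the special output of the modified ESM.

```python
QUIESCENT = "Q"  # hypothetical special output triggered by the test input

def aggregated_output(send, receive, normal_input, test_input="TEST"):
    """Send one normal input followed by the low-priority test input, then
    collect outputs until the special output signals quiescence. Outputs are
    concatenated into the single aggregated output reported to LearnLib."""
    send(normal_input)
    send(test_input)
    outputs = []
    while (out := receive()) != QUIESCENT:
        outputs.append(out)
    return "+".join(outputs) if outputs else "quiescence"

# Illustration with a fake connection that produces two outputs, and one
# that produces none before the special output arrives.
replies = iter(["O3.14", "O9", QUIESCENT])
result = aggregated_output(lambda msg: None, lambda: next(replies), "I46")

replies2 = iter([QUIESCENT])
silent = aggregated_output(lambda msg: None, lambda: next(replies2), "I0")
```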
2.2 Test selection strategies

In the ESM case study the most challenging problem was finding counterexamples for the hypotheses constructed during learning.

LearnLib implements several algorithms for conformance testing, one of which is a random walk algorithm. The random walk algorithm works by first selecting the length of the test query according to a geometric distribution, cut off at a fixed upper bound. Each of the input symbols in the test query is then selected from the input alphabet I according to a uniform distribution. In order to find counterexamples, a specific sequence of input symbols is needed to arrive at the state in the SUT that differentiates it from the hypothesis. The upper bound on the size of this search space is |I|^n, where |I| is the size of the input alphabet used and n the length of the counterexample that needs to be found. If this sequence is long, the chance of finding it is small. Because the ESM has many different input symbols to choose from, finding the correct one is hard. When learning the ESM with 1 function there are 77 possible input symbols. If, for example, the length of the counterexample needs to be at least 6 inputs to identify a certain state, then the upper bound on the number of test queries would be around 2 × 10^11. An average test query takes around 1 ms, so it would take about 7 years to execute these test queries.
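This estimate is easy to reproduce. The sketch below (Python) uses the figures quoted above: 77 inputs, a counterexample of length 6, and roughly 1 ms per test query.

```python
# Upper bound |I|^n on the number of test queries needed to hit one specific
# counterexample, and the resulting wall-clock time at ~1 ms per query.

alphabet_size = 77          # inputs when learning the ESM with 1 function
counterexample_length = 6   # n

num_queries = alphabet_size ** counterexample_length  # |I|^n
seconds = num_queries * 0.001                         # ~1 ms per query
years = seconds / (365 * 24 * 3600)                   # roughly 6.6 years
```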
Augmented DS-method¹⁵. In order to reduce the number of tests, Chow (1978) and Vasilevskii (1973) pioneered the so-called W-method. In their framework a test query consists of a prefix p bringing the SUT to a specific state, a (random) middle part m and a suffix s assuring that the SUT is in the appropriate state. This results in a test suite of the form PI≤k W, where P is a set of (shortest) access sequences, I≤k the set of all sequences of length at most k, and W is a characterisation set. Classically, this characterisation set is constructed by taking the set of all (pairwise) separating sequences. For k = 1 this test suite is complete in the sense that if the SUT passes all tests, then either the SUT is equivalent to the specification or the SUT has strictly more states than the specification. By increasing k we can check additional states.

¹⁵ This was later called the hybrid ADS-method.
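The shape of this test suite is easy to express in code. Below is a minimal sketch in Python with a toy instance; in practice P and W are derived from the hypothesis.

```python
from itertools import product

def w_method_suite(P, I, W, k):
    """The test suite P . I^{<=k} . W: every access sequence, followed by
    every middle part of length at most k, followed by every suffix in W."""
    middles = [()]
    for length in range(1, k + 1):
        middles.extend(product(I, repeat=length))
    return {p + m + w for p in P for m in middles for w in W}

# Toy instance: two access sequences, two inputs, one separating sequence.
P = {('a',), ('b',)}
I = ['a', 'b']
W = {('a', 'a')}
suite = w_method_suite(P, I, W, k=1)
```

For this toy instance the suite has |P| × (1 + |I|) × |W| = 6 queries.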
We tried using the W-method as implemented by LearnLib to find counterexamples. The generated test suite, however, was still too big in our learning context. Fujiwara, et al. (1991) observed that it is possible to let the set W depend on the state the SUT is supposed to be in. This allows us to take only a subset of W which is relevant for a specific state. This slightly reduces the test suite without losing the power of the full test suite. This method is known as the Wp-method. More importantly, this observation allows for generalisations where we can carefully pick the suffixes.

In the presence of an (adaptive) distinguishing sequence one can take W to be a single suffix, greatly reducing the test suite. Lee and Yannakakis (1994) describe an algorithm (which we will refer to as the LY algorithm) to efficiently construct this sequence, if it exists. In our case, unfortunately, most hypotheses did not admit an adaptive distinguishing sequence. In these cases the incomplete result of the LY algorithm still contained a lot of information, which we augmented with pairwise separating sequences.
Figure 3.5 A small part of an incomplete adaptive distinguishing sequence as produced by the LY algorithm. Leaves contain a set of possible initial states, inner nodes have input sequences, and edges correspond to different output symbols (of which we only drew some), where Q stands for quiescence.
As an example we show an incomplete adaptive distinguishing sequence for one of the hypotheses in Figure 3.5. When we apply the input sequence I46 I6.0 I10 I19 I31.0 I37.3 I9.2 and observe the outputs O9 O3.3 Q … O28.0, we know for sure that the SUT was in state 788. Unfortunately, not all paths lead to a singleton set. When for instance we apply the sequence I46 I6.0 I10 and observe the outputs O9 O3.14 Q, we know for sure that the SUT was in one of the states 18, 133, 1287 or 1295. In these cases we have to perform more experiments, and we resort to pairwise separating sequences.

We note that this augmented DS-method is in the worst case no better than the classical Wp-method. In our case, however, it greatly reduced the test suites.

Once we have our set of suffixes, which we now call Z, our test algorithm works as follows. The algorithm first exhausts the set PI≤1 Z. If this does not provide a counterexample, we randomly pick test queries from PI2 I∗ Z, where the algorithm samples uniformly from P, I2 and Z (if Z contains more than one sequence for the supposed state) and with a geometric distribution on I∗.
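The randomised phase can be sketched as follows. This is an illustrative Python sketch; the stopping probability of the geometric distribution is an arbitrary choice, and in practice Z would be indexed by the supposed state.

```python
import random

def random_test_query(P, I, Z, p_stop=0.2, rng=random):
    """Sample a query from P . I^2 . I* . Z: a uniform access sequence from
    P, two uniform inputs, a geometrically distributed middle part over I,
    and a uniform suffix from Z."""
    query = list(rng.choice(sorted(P)))
    query += [rng.choice(I), rng.choice(I)]   # the fixed I^2 part
    while rng.random() > p_stop:              # geometric distribution on I*
        query.append(rng.choice(I))
    query += list(rng.choice(sorted(Z)))
    return query

q = random_test_query({('a',)}, ['a', 'b'], {('a', 'a')})
```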
Sub-alphabet selection. Using the above method the algorithm still failed to learn the ESM. By looking at the RRRT-based model we were able to see why: in the initialisation phase, the controller exhibits exceptional behaviour when a certain input is provided eight times consecutively. Of course such a sequence is hard to find with the above testing method. With this knowledge we could construct a single counterexample by hand, by means of which the algorithm was able to learn the ESM.

In order to automate this process, we defined a sub-alphabet of actions that are important during the initialisation phase of the controller. This sub-alphabet is used somewhat more often than the full alphabet, as follows. We start testing with the alphabet which provided a counterexample for the previous hypothesis (for the first hypothesis we take the sub-alphabet). If no counterexample can be found within a specified query bound, we repeat with the other alphabet. If neither alphabet produces a counterexample within the bound, the bound is increased by some factor and we repeat the whole procedure. This method only marginally increases the number of tests, but it did find the counterexample that we first had to construct by hand.
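The resulting search strategy can be sketched as follows (Python; `test_with` is a hypothetical hook that runs up to `bound` test queries with the given alphabet and returns a counterexample or None, and the starting bound and factor are illustrative):

```python
def find_counterexample(test_with, sub_alphabet, full_alphabet,
                        start_bound=1000, factor=10, last_successful=None):
    """Alternate between the initialisation sub-alphabet and the full
    alphabet, starting with whichever produced the previous counterexample
    (the sub-alphabet for the first hypothesis). If neither yields a
    counterexample within the bound, increase the bound and repeat."""
    alphabets = [sub_alphabet, full_alphabet]
    if last_successful == full_alphabet:
        alphabets.reverse()
    bound = start_bound
    while True:
        for alphabet in alphabets:
            ce = test_with(alphabet, bound)
            if ce is not None:
                return ce
        bound *= factor

# Illustration: a fake tester that only finds the counterexample with the
# full alphabet once the bound is large enough.
def fake_test(alphabet, bound):
    return ['i'] * 8 if alphabet == 'FULL' and bound >= 10000 else None

ce = find_counterexample(fake_test, 'SUB', 'FULL')
```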
2.3 Results

Using the learning set-up discussed in Section 2.1 and the test selection strategies discussed in Section 2.2, a model of the ESM using 1 function could be learned. After an additional eight hours of testing no counterexample was found and the experiment was stopped. The following list gives the most important statistics gathered during the learning:
– The learned model has 3,410 states.
– Altogether, 114 hypotheses were generated.
– The time needed for learning the final hypothesis was 8 h, 26 min, and 19 s.
– 29,933,643 membership queries were posed (on average 35.77 inputs per query).
– 30,629,711 test queries were required (on average 29.06 inputs per query).
3 Verification

To verify the correctness of the model that was learned using LearnLib, we checked its equivalence with a model that was generated directly from the code.
3.1 Approach

As mentioned already, the ESM has been implemented using Rational Rose RealTime (RRRT). Thus a statechart representation of the ESM is available. However, we have not been able to find a tool that translates RRRT models to Mealy machines, which would allow us to compare the RRRT-based model of the ESM with the learned model. We considered several formalisms and tools that were proposed in the literature to flatten statecharts to state machines. The first one was a tool for hierarchical timed automata (HTA) by David, et al. (2002). However, we found it hard to translate the output of this tool, a network of Uppaal timed automata, to a Mealy machine that could be compared to the learned model. The second tool that we considered has been developed by Hansen, et al. (2010). This tool misses some essential features, for example the ability to assign new values to state variables on transitions. Finally, we considered a formalism called object-oriented action systems (OOAS) by Krenn, et al. (2009), but no tools to use this formalism could be found.

In the end we decided to implement the required model transformations ourselves. Figure 3.6 displays the different formats for representing models that we used and the transformations between those formats.
Figure 3.6 Formats for representing models and transformations between formats.
We used the bisimulation checker of CADP by Garavel, et al. (2011) to check the equivalence of labelled transition system models in .aut format. The Mealy machine models learned by LearnLib are represented as .dot files. A small script converts these Mealy machines to labelled transition systems in .aut format. We used the Uppaal tool by Behrmann, et al. (2006) as an editor for defining extended finite state machines (EFSM), represented as .xml files. A script developed in the ITALIA project (http://www.italia.cs.ru.nl/) converts these EFSM models to LOTOS, and then CADP takes care of the conversion from LOTOS to the .aut format.

The Uppaal syntax is not sufficiently expressive to directly encode the RRRT definition of the ESM, since this definition makes heavy use of UML (Object Management Group (OMG), 2004) concepts such as state hierarchy and transitions from composite states, concepts which are not present in Uppaal. Using Uppaal would force us to duplicate many transitions and states.

We decided to manually create an intermediate hierarchical EFSM (HEFSM) model using the UML drawing tool PapyrusUML (Lanusse, et al., 2009). The HEFSM model closely resembles the RRRT UML model, but many elements used in UML state machines are left out because they are not needed for modelling the ESM and would complicate the transformation process.
3.2 Model transformations

We explain the transformation from the HEFSM model to the EFSM model using examples. The transformation is divided into five steps, which are executed in order:
1. combine transitions without input or output signal,
2. transform supertransitions,
3. transform internal transitions,
4. add input signals that do not generate an output, and
5. replace invocations of the next function.
1. Empty transitions. In order to make the model more readable, and to make it easy to model if and switch statements in the C++ code, the HEFSM model allows transitions without a signal. These transitions are called empty transitions. An empty transition can still contain a guard and an assignment. However, such transitions are only allowed on states whose outgoing transitions are all empty. This restriction keeps the transformation simple and the model easy to read.

In order to transform a state with empty transitions, all the incoming and outgoing transitions are collected. For each combination of incoming transition a and outgoing transition b, a new transition c is created with the source of a as source and the target of b as target. The guard of transition c evaluates to true if and only if the guards of a and b both evaluate to true. The assignment of c is the concatenation of the assignments of a and b. The signal of c is the signal of a, because b cannot have a signal. Once all the new transitions have been created, all the states with empty transitions are removed together with all their incoming and outgoing transitions.
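This step can be sketched as follows (Python; the transition record and the guard syntax are illustrative, not the actual HEFSM data model):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Transition:
    source: str
    target: str
    signal: Optional[str]  # None marks an empty transition
    guard: str
    assignment: str

def remove_empty_state(state, transitions):
    """Combine each incoming transition a with each outgoing (empty)
    transition b of `state` into a new transition c, then drop the state
    together with all its transitions."""
    incoming = [t for t in transitions if t.target == state]
    outgoing = [t for t in transitions if t.source == state]
    assert all(t.signal is None for t in outgoing), "only empty outgoing"
    combined = [
        Transition(a.source, b.target, a.signal,
                   f"({a.guard}) && ({b.guard})",   # both guards must hold
                   a.assignment + b.assignment)     # concatenated assignments
        for a in incoming for b in outgoing
    ]
    rest = [t for t in transitions if state not in (t.source, t.target)]
    return rest + combined

# A --OP()--> B, where B only has empty transitions to C and D.
ts = [Transition('A', 'B', 'OP()', 'true', ''),
      Transition('B', 'C', None, 'a==0', ''),
      Transition('B', 'D', None, 'a==1', '')]
new = remove_empty_state('B', ts)
```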
Figure 3.7 shows an example model with empty transitions and its transformed version. Each of the incoming transitions of the state B is combined with each of the outgoing transitions. This results in two new transitions. The old transitions and state B are removed.

Figure 3.7 Example of empty transition transformation. On the left the original version. On the right the transformed version.
2. Supertransitions. The RRRT model of the ESM contains many transitions originating from a composite state. Informally, these supertransitions can be taken in each of the substates of the composite state if the guard evaluates to true. In order to model the ESM as closely as possible, supertransitions are also supported in the HEFSM model.

In RRRT, transitions are evaluated from bottom to top. This means that first the transitions from the leaf state are considered, then the transitions from its parent state, then those from its parent's parent state, and so on. Once a transition with the correct signal and a guard that evaluates to true has been found, it is taken. When flattening the statechart, we modified the guards of supertransitions to ensure the correct priorities.

Figure 3.8 Example of supertransition transformation. On the left the original version. On the right the transformed version.
Figure 3.8 shows an example model with supertransitions and its transformed version. The supertransition from state A can be taken in each of A's leaf states B and C. The transformation removes the original supertransition and creates a new transition at states B and C with the same target state. For leaf state C this is easy, because C does not contain a transition with the input signal IP. In state B the transition to state C would be taken if a signal IP was processed and the state variable a equals 1. The supertransition can only be taken if this other transition cannot be taken; this is why the negation of the other guard is added to the new transition. If the original supertransition is an internal transition, the model needs further transformation after this one, as described in the next paragraph. If the original supertransition is not an internal transition, the new transitions will have the initial state of A as target.
3. Internal transitions. The ESM model also makes use of internal transitions in RRRT. When such a transition is taken, the current state does not change. If such a transition is defined on a composite state, it can be taken from all of the substates and returns to the same leaf state it originated from; if defined on a composite state, it is thus also a supertransition. Internal transitions are also possible in the HEFSM model. In order to transform an internal transition, it is first treated as a supertransition and the above transformation is applied. Then the target of each resulting transition is simply set to the leaf state it originates from. An example can be seen in Figure 3.8: if the supertransition from state A were also defined to be an internal transition, the transformed version on the right would need another transformation, in which the new transitions that now have target state A are changed to have the same target state as their current source state.
4. Quiescent transitions. In order to reduce the number of transitions in the HEFSM model, quiescent transitions are added automatically. For every state, all the transitions for each signal are collected in a set T. A new self-transition a is added for each signal. The guard for transition a evaluates to true if and only if none of the guards of the transitions in T evaluates to true. This makes the HEFSM input-enabled without having to specify all the transitions.
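The guard construction for these automatic self-transitions can be sketched as follows. This is an illustrative sketch, not the actual HEFSM implementation; the dictionary-based state and the guard signatures are assumptions.

```python
# Sketch (not the actual HEFSM implementation) of the automatically
# added quiescent self-transition: its guard is the negation of the
# disjunction of all existing guards for the same input signal.

def make_quiescent_guard(guards):
    """Given the guards of all transitions for one signal in one state,
    return a guard that holds exactly when none of them holds."""
    return lambda state: not any(g(state) for g in guards)

# Hypothetical state with one variable 'a' and two guarded transitions
# for a signal IP.
guards_for_ip = [lambda s: s["a"] == 1, lambda s: s["a"] == 2]
quiescent = make_quiescent_guard(guards_for_ip)

print(quiescent({"a": 0}))  # True: no other guard holds
print(quiescent({"a": 1}))  # False: another transition is enabled
```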
5. The next function. In RRRT it is possible to write the guard and assignment in C++ code. It is thus possible that the value of a variable changes while an input signal is processed. In the HEFSM, however, all the assignments only take effect after the input signal is processed. In order to simulate this behaviour, the next function is used. This function takes a variable name and evaluates to the value of this variable after the transition.
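The deferred-assignment semantics that next simulates can be sketched like this; the dictionary-based state and the encoding of assignments are illustrative assumptions, not RRRT or HEFSM code.

```python
# Sketch of deferred assignments: every assignment is computed from the
# pre-transition values and applied only after the input is processed,
# so next(x) corresponds to looking up x in the computed post-state.

def take_transition(state, assignments):
    """assignments maps a variable name to a function of the pre-state."""
    post = dict(state)
    for var, f in assignments.items():
        post[var] = f(state)  # evaluated against the pre-state only
    return post               # 'next(x)' is post[x]

pre = {"x": 1, "y": 10}
post = take_transition(pre, {"x": lambda s: s["y"], "y": lambda s: s["x"]})
print(post)  # {'x': 10, 'y': 1}: the swap works because both read the pre-state
```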
3.3 Results

Figure 3.9 shows a visualisation of the learned model that was generated using Gephi (Bastian, et al., 2009). States are coloured according to the strongly connected components. The number of transitions between two states is represented by the thickness of the edge. The large number of states (3,410) and transitions (262,570) makes it hard to visualise this model. Nevertheless, the visualisation does provide insight into the behaviour of the ESM. The three protrusions at the bottom of Figure 3.9 correspond to deadlocks in the model. These deadlocks are “error” states that are present in the ESM by design. According to the Océ engineers, the sequences of inputs that are needed to drive the ESM into these deadlock states will always be followed by a system power reset. The protrusion at the top right of the figure corresponds to the initialisation phase of the ESM. This phase is performed only once, and thus only transitions from the initialisation cluster to the main body of states are present.
Figure 3.9    Final model of the ESM.
During the construction of the RRRT-based model, the ESM code was thoroughly inspected. This resulted in the discovery of missing behaviour in one transition of the ESM code. An Océ software engineer confirmed that this behaviour is a (minor) bug, which will be fixed. We have verified the equivalence of the learned model and the RRRT-based model by using CADP (Garavel, et al., 2011).
4 Conclusions and Future Work

Using an extension of the algorithm by Lee and Yannakakis (1994) for adaptive distinguishing sequences, we succeeded in learning a Mealy machine model of a piece of widely used industrial control software. Our extension of Lee & Yannakakis' algorithm is rather obvious, but nevertheless appears to be new. Preliminary evidence suggests that it outperforms existing conformance testing algorithms. We are currently performing experiments in which we compare the new algorithm with other test algorithms on a number of realistic benchmarks.

There are several possibilities for extending the ESM case study. To begin with, one could try to learn a model of the ESM with more than one function. Another interesting possibility would be to learn models of the EHM, ACM, and other managers connected to the ESM. Using these models, some of the properties discussed by Ploeger (2005) could be verified at a more detailed level. We expect that the combination of LearnLib with the extended Lee & Yannakakis algorithm can be applied to learn models of many other software components.
In the specific case study described in this article, we know that our learning algorithm has succeeded in learning the correct model, since we established equivalence with a reference model that was constructed independently from the RRRT model of the ESM software. In the absence of a reference model, we can never guarantee that the actual system behaviour conforms to a learned model. In order to deal with this problem, it is important to define metrics that quantify the difference (or distance) between a hypothesis and a correct model of the SUT, and to develop test generation algorithms that guarantee an upper bound on this difference. Preliminary work in this area is reported by Smetsers, et al. (2014).
Acknowledgements

We thank Lou Somers for suggesting the ESM case study and for his support of our research. Fides Aarts and Harco Kuppens helped us with the use of LearnLib and CADP, and Jan Tretmans gave useful feedback.
Chapter 4

Minimal Separating Sequences for All Pairs of States

Rick Smetsers
Radboud University

Joshua Moerman
Radboud University

David N. Jansen
Radboud University
Abstract

Finding minimal separating sequences for all pairs of inequivalent states in a finite state machine is a classic problem in automata theory. Sets of minimal separating sequences, for instance, play a central role in many conformance testing methods. Moore has already outlined a partition refinement algorithm that constructs such a set of sequences in 𝒪(mn) time, where m is the number of transitions and n is the number of states. In this chapter, we present an improved algorithm based on the minimisation algorithm of Hopcroft that runs in 𝒪(m log n) time. The efficiency of our algorithm is empirically verified and compared to the traditional algorithm.
This chapter is based on the following publication:

Smetsers, R., Moerman, J., & Jansen, D. N. (2016). Minimal Separating Sequences for All Pairs of States. In Language and Automata Theory and Applications - 10th International Conference, LATA, Proceedings. Springer. doi:10.1007/978-3-319-30000-9_14
In diverse areas of computer science and engineering, systems can be modelled by finite state machines (FSMs). One of the cornerstones of automata theory is minimisation of such machines – and the many variations thereof. In this process one obtains an equivalent minimal FSM, where states are different if and only if they have different behaviour. The first to develop an algorithm for minimisation was Moore (1956). His algorithm has a time complexity of 𝒪(mn), where m is the number of transitions, and n is the number of states of the FSM. Later, Hopcroft (1971) improved this bound to 𝒪(m log n).

Minimisation algorithms can be used as a framework for deriving a set of separating sequences that show why states are inequivalent. The separating sequences in Moore's framework are of minimal length (Gill, 1962). Obtaining minimal separating sequences in Hopcroft's framework, however, is a non-trivial task. In this chapter, we present an algorithm for finding such minimal separating sequences for all pairs of inequivalent states of an FSM in 𝒪(m log n) time.

Coincidentally, Bonchi and Pous (2013) recently introduced a new algorithm for the equally fundamental problem of proving equivalence of states in non-deterministic automata. As both their and our work demonstrate, even classical problems in automata theory can still offer surprising research opportunities. Moreover, new ideas for well-studied problems may lead to algorithmic improvements that are of practical importance in a variety of applications.

One such application for our work is in conformance testing. Here, the goal is to test if a black-box implementation of a system is functioning as described by a given FSM. It consists of applying sequences of inputs to the implementation, and comparing the output of the system to the output prescribed by the FSM. Minimal separating sequences are used in many test generation methods (Dorofeeva, et al., 2010). Therefore, our algorithm can be used to improve these methods.
1 Preliminaries

We define an FSM as a Mealy machine M = (I, O, S, δ, λ), where I, O and S are finite sets of inputs, outputs and states respectively, δ : S × I → S is a transition function and λ : S × I → O is an output function. The functions δ and λ are naturally extended to δ : S × I∗ → S and λ : S × I∗ → O∗. Moreover, given a set of states S′ ⊆ S and a sequence x ∈ I∗, we define δ(S′, x) = {δ(s, x) | s ∈ S′} and λ(S′, x) = {λ(s, x) | s ∈ S′}. The inverse transition function δ−1 : S × I → 𝒫(S) is defined as δ−1(s, a) = {t ∈ S | δ(t, a) = s}.

Observe that Mealy machines are deterministic and input-enabled (i.e., complete) by definition. The initial state is not specified because it is of no importance in what follows. For the remainder of this chapter we fix a machine M = (I, O, S, δ, λ). We use n to denote its number of states, that is n = |S|, and m to denote its number of transitions, that is m = |S| × |I|.
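As a minimal sketch of these definitions, a Mealy machine with δ and λ stored as dictionaries, and the extension of λ to input words:

```python
# Sketch of a Mealy machine M = (I, O, S, delta, lam), with the natural
# extension of the output function lam to input words.

class Mealy:
    def __init__(self, delta, lam):
        self.delta = delta  # maps (state, input) -> state
        self.lam = lam      # maps (state, input) -> output

    def run(self, s, word):
        """Extended output function: lam(s, word) in O*."""
        out = []
        for a in word:
            out.append(self.lam[s, a])
            s = self.delta[s, a]
        return out

# A two-state machine over I = {'a'} and O = {0, 1}.
m = Mealy(delta={(0, 'a'): 1, (1, 'a'): 0},
          lam={(0, 'a'): 0, (1, 'a'): 1})
print(m.run(0, 'aaa'))  # [0, 1, 0]
```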
Definition 1.    States s and t are equivalent if λ(s, x) = λ(t, x) for all x in I∗.
We are interested in the case where s and t are not equivalent, i.e., inequivalent. If all pairs of distinct states of a machine M are inequivalent, then M is minimal. An example of a minimal FSM is given in Figure 4.1.

Definition 2. A separating sequence for states s and t in S is a sequence x ∈ I∗ such that λ(s, x) ≠ λ(t, x). We say x is minimal if |y| ≥ |x| for all separating sequences y for s and t.

A separating sequence always exists if two states are inequivalent, and there might be multiple minimal separating sequences. Our goal is to obtain minimal separating sequences for all pairs of inequivalent states of M.
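For a single pair, a minimal separating sequence can be found by breadth-first search over pairs of states: a pair is separated by an input as soon as the two outputs differ, and otherwise the input moves the pair to the successor pair. This is only a reference sketch on a hypothetical machine; the chapter develops much more efficient all-pairs algorithms.

```python
# Sketch: a minimal separating sequence for one pair of states via
# breadth-first search over pairs. A pair is separated by input a as
# soon as the outputs differ; otherwise a leads to the successor pair.

from collections import deque

def minimal_separating_sequence(delta, lam, inputs, s, t):
    seen = {(s, t)}
    queue = deque([((s, t), [])])
    while queue:
        (p, q), word = queue.popleft()
        for a in inputs:
            if lam[p, a] != lam[q, a]:
                return word + [a]        # outputs differ: separated
            nxt = (delta[p, a], delta[q, a])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + [a]))
    return None                          # s and t are equivalent

# Hypothetical 3-state machine over inputs {'a', 'b'}.
delta = {(0, 'a'): 1, (1, 'a'): 2, (2, 'a'): 0,
         (0, 'b'): 0, (1, 'b'): 1, (2, 'b'): 2}
lam = {(0, 'a'): 0, (1, 'a'): 0, (2, 'a'): 0,
       (0, 'b'): 0, (1, 'b'): 0, (2, 'b'): 1}
print(minimal_separating_sequence(delta, lam, ['a', 'b'], 0, 1))  # ['a', 'b']
```

Because the search explores pairs in order of word length, the first separation found is of minimal length.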
1.1 Partition Refinement

In this section we will discuss the basics of minimisation. Both Moore's algorithm and Hopcroft's algorithm work by means of partition refinement. A similar treatment (for DFAs) is given by Gries (1973).

A partition P of S is a set of pairwise disjoint non-empty subsets of S whose union is exactly S. Elements in P are called blocks. If P and P′ are partitions of S, then P′ is a refinement of P if every block of P′ is contained in a block of P. A partition refinement algorithm constructs the finest partition under some constraint. In our context the constraint is that equivalent states belong to the same block.
Definition 3.    A partition is valid if equivalent states are in the same block.
Partition refinement algorithms for FSMs start with the trivial partition P = {S}, and iteratively refine P until it is the finest valid partition (where all states in a block are equivalent). The blocks of such a complete partition form the states of the minimised FSM, whose transition and output functions are well-defined because states in the same block are equivalent.

Let B be a block and a be an input. There are two possible reasons to split B (and hence refine the partition). First, we can split B with respect to output after a if the set λ(B, a) contains more than one output. Second, we can split B with respect to the state after a if there is no single block B′ containing the set δ(B, a). In both cases it is obvious what the new blocks are: in the first case each output in λ(B, a) defines a new block, in the second case each block containing a state in δ(B, a) defines a new block. Both types of refinement preserve validity.
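The two kinds of split can be written down directly; the encoding of blocks as lists and of the current partition as a state-to-block map is an illustrative assumption.

```python
# Sketch of the two ways to split a block B on input a: w.r.t. the
# output lam(., a) and w.r.t. the block of the successor delta(., a).

def split_by_output(B, a, lam):
    groups = {}
    for s in B:
        groups.setdefault(lam[s, a], []).append(s)
    return list(groups.values())  # one new block per output

def split_by_state(B, a, delta, block_of):
    groups = {}
    for s in B:
        groups.setdefault(block_of[delta[s, a]], []).append(s)
    return list(groups.values())  # one new block per successor block

# Hypothetical outputs: splitting {0, 1, 2} w.r.t. output after 'a'.
lam = {(0, 'a'): 0, (1, 'a'): 1, (2, 'a'): 0}
print(split_by_output([0, 1, 2], 'a', lam))  # [[0, 2], [1]]
```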
Partition refinement algorithms for FSMs first perform splits w.r.t. output, until there are no such splits to be performed. This is precisely the case when the partition is acceptable.
Definition 4. A partition is acceptable if for all pairs s, t of states contained in the same block and for all inputs a in I, λ(s, a) = λ(t, a).
Any refinement of an acceptable partition is again acceptable. The algorithm continues performing splits w.r.t. state, until no such splits can be performed. This is exactly the case when the partition is stable.

Definition 5. A partition is stable if it is acceptable and for any input a in I and states s and t that are in the same block, states δ(s, a) and δ(t, a) are also in the same block.
Since an FSM has only finitely many states, partition refinement will terminate. The output is the finest valid partition which is acceptable and stable. For a more formal treatment of partition refinement we refer to Gries (1973).
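Putting the pieces together, Moore-style partition refinement can be sketched as a fixed-point computation. For brevity this sketch refines by the full signature (outputs and successor blocks) in one step rather than performing output splits strictly first; the final partition is the same.

```python
# Sketch of partition refinement as a fixed-point computation. Each
# state's signature records its outputs and the blocks of its
# successors under the current partition; refinement stops when the
# partition no longer changes, i.e., when it is acceptable and stable.

def refine(states, inputs, delta, lam):
    partition = {s: 0 for s in states}  # trivial partition {S}
    while True:
        sig = {s: (tuple(lam[s, a] for a in inputs),
                   tuple(partition[delta[s, a]] for a in inputs))
               for s in states}
        ids = {v: i for i, v in enumerate(sorted(set(sig.values())))}
        new = {s: ids[sig[s]] for s in states}
        if new == partition:
            return partition            # stable: blocks are the equivalence classes
        partition = new

delta = {(0, 'a'): 1, (1, 'a'): 0, (2, 'a'): 2}
lam = {(0, 'a'): 0, (1, 'a'): 0, (2, 'a'): 1}
print(refine([0, 1, 2], ['a'], delta, lam))  # 0 and 1 share a block
```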
1.2 Splitting Trees and Refinable Partitions

Both types of splits described above can be used to construct a separating sequence for the states that are split. In a split w.r.t. the output after a, this sequence is simply a. In a split w.r.t. the state after a, the sequence starts with an a and continues with the separating sequence for states in δ(B, a). In order to systematically keep track of this information, we maintain a splitting tree. The splitting tree was introduced by Lee and Yannakakis (1994) as a data structure for maintaining the operational history of a partition refinement algorithm.
Definition 6. A splitting tree for M is a rooted tree T with a finite set of nodes with the following properties:
– Each node u in T is labelled by a subset of S, denoted l(u).
– The root is labelled by S.
– For each inner node u, l(u) is partitioned by the labels of its children.
– Each inner node u is associated with a sequence σ(u) that separates states contained in different children of u.

We use C(u) to denote the set of children of a node u. The lowest common ancestor (lca) for a set S′ ⊆ S is the node u such that S′ ⊆ l(u) and S′ ̸⊆ l(v) for all v ∈ C(u) and is denoted by lca(S′). For a pair of states s and t we use the shorthand lca(s, t) for lca({s, t}).
The labels l(u) can be stored as a refinable partition data structure (Valmari & Lehtinen, 2008). This is an array containing a permutation of the states, ordered so that states in the same block are adjacent. The label l(u) of a node can then be indicated by a slice of this array. If node u is split, some states in the slice l(u) may be moved to create the labels of its children, but this will not change the set l(u).
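A sketch of such a refinable partition, with the permutation array, the position of each state, and a (begin, end) slice per block; it assumes the marked states form a subset of the block being split.

```python
# Sketch of a refinable partition: a permutation of the states in which
# every block is a contiguous slice. Splitting only swaps elements
# inside the block's slice, so the parent's set of states (the whole
# slice) never changes.

class RefinablePartition:
    def __init__(self, states):
        self.elems = list(states)                  # permutation of the states
        self.pos = {s: i for i, s in enumerate(self.elems)}
        self.slice_of = {0: (0, len(self.elems))}  # block id -> (begin, end)
        self.block_of = {s: 0 for s in states}
        self.next_id = 1

    def split(self, b, marked):
        """Make the `marked` states (a subset of block b) a new block
        by swapping them to the end of b's slice."""
        begin, end = self.slice_of[b]
        border = end
        for s in marked:
            border -= 1
            other = self.elems[border]
            i = self.pos[s]
            self.elems[i], self.elems[border] = other, s
            self.pos[other], self.pos[s] = i, border
        new = self.next_id
        self.next_id += 1
        self.slice_of[b] = (begin, border)
        self.slice_of[new] = (border, end)
        for s in marked:
            self.block_of[s] = new
        return new

p = RefinablePartition([0, 1, 2, 3])
p.split(0, [1, 3])
print(sorted(p.elems[slice(*p.slice_of[0])]))  # [0, 2]
```

Both looking up a state's block and moving a state out of its block are constant-time per state, which the complexity analysis later in the chapter relies on.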
A splitting tree T can be used to record the history of a partition refinement algorithm, because at any time the leaves of T define a partition on S, denoted P(T). We say a splitting tree T is valid (resp. acceptable, stable, complete) if P(T) is as such. A leaf can be expanded in one of two ways, corresponding to the two ways a block can be split. Given a leaf u and its block B = l(u) we define the following two splits:
(split-output) Suppose there is an input a such that B can be split w.r.t. output after a. Then we set σ(u) = a, and we create a node for each subset of B that produces the same output x on a. These nodes are set to be children of u.

(split-state) Suppose there is an input a such that B can be split w.r.t. the state after a. Then instead of splitting B as described before, we proceed as follows. First, we locate the node v = lca(δ(B, a)). Since v cannot be a leaf, it has at least two children whose labels contain elements of δ(B, a). We can use this information to expand the tree as follows. For each node w in C(v) we create a child of u labelled {s ∈ B | δ(s, a) ∈ l(w)} if the label contains at least one state. Finally, we set σ(u) = aσ(v).
A straightforward adaptation of partition refinement for constructing a stable splitting tree for M is shown in Algorithm 4.1. The termination and the correctness of the algorithm outlined in Section 1.1 are preserved. It follows directly that states are equivalent if and only if they are in the same label of a leaf node.
Require: An FSM M
Ensure: A valid and stable splitting tree T
  initialise T to be a tree with a single node labelled S
  repeat
    find a ∈ I, B ∈ P(T) such that we can split B w.r.t. output λ(⋅, a)
    expand the u ∈ T with l(u) = B as described in (split-output)
  until P(T) is acceptable
  repeat
    find a ∈ I, B ∈ P(T) such that we can split B w.r.t. state δ(⋅, a)
    expand the u ∈ T with l(u) = B as described in (split-state)
  until P(T) is stable

Algorithm 4.1    Constructing a stable splitting tree.
Example 7. Figure 4.1 shows an FSM and a complete splitting tree for it. This tree is constructed by Algorithm 4.1 as follows. First, the root node is labelled by {s0, …, s5}. The even and odd states produce different outputs after a, hence the root node is split. Then we note that s4 produces a different output after b than s0 and s2, so {s0, s2, s4} is split as well. At this point T is acceptable: no more leaves can be split w.r.t. output. Now, the states δ({s1, s3, s5}, a) are contained in different leaves of T. Therefore, {s1, s3, s5} is split into {s1, s5} and {s3} and associated with sequence ab. At this point, δ({s0, s2}, a) contains states that are in both children of {s1, s3, s5}, so {s0, s2} is split and the associated sequence is aab. We continue until T is complete.
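Algorithm 4.1 can be prototyped directly from the two expansion rules. The sketch below favours clarity over the complexity bounds discussed later, and uses a small hypothetical machine rather than the one of Figure 4.1.

```python
# Prototype of Algorithm 4.1: build a splitting tree by first splitting
# leaves w.r.t. outputs, then w.r.t. the tree nodes of their successors.

class Node:
    def __init__(self, label):
        self.label, self.children, self.seq = set(label), [], None

def lca(root, xs):
    """Deepest node whose label contains all of xs."""
    node = root
    while True:
        for c in node.children:
            if xs <= c.label:
                node = c
                break
        else:
            return node

def leaves(node):
    if not node.children:
        return [node]
    return [l for c in node.children for l in leaves(c)]

def build_splitting_tree(states, inputs, delta, lam):
    root = Node(states)
    changed = True
    while changed:                       # phase 1: splits w.r.t. output
        changed = False
        for u in leaves(root):
            for a in inputs:
                groups = {}
                for s in u.label:
                    groups.setdefault(lam[s, a], set()).add(s)
                if len(groups) > 1:
                    u.seq = [a]
                    u.children = [Node(g) for g in groups.values()]
                    changed = True
                    break
    changed = True
    while changed:                       # phase 2: splits w.r.t. state
        changed = False
        for u in leaves(root):
            for a in inputs:
                v = lca(root, {delta[s, a] for s in u.label})
                if v.children:           # successors spread over v's children
                    u.seq = [a] + v.seq
                    u.children = [Node({s for s in u.label
                                        if delta[s, a] in w.label})
                                  for w in v.children
                                  if any(delta[s, a] in w.label
                                         for s in u.label)]
                    changed = True
                    break
    return root

# Hypothetical 4-state machine over a single input 'a'.
delta = {(0, 'a'): 1, (1, 'a'): 2, (2, 'a'): 3, (3, 'a'): 3}
lam = {(s, 'a'): (1 if s == 3 else 0) for s in range(4)}
root = build_splitting_tree({0, 1, 2, 3}, ['a'], delta, lam)
print(sorted(sorted(l.label) for l in leaves(root)))  # [[0], [1], [2], [3]]
```

All four states are inequivalent here, so the tree refines down to singleton leaves; the root carries the separating sequence ['a'], and deeper nodes carry longer sequences of the form aσ(v).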
Figure 4.1    An FSM (a) and a complete splitting tree for it (b).
2 Minimal Separating Sequences

In Section 1.2 we have described an algorithm for constructing a complete splitting tree. This algorithm is non-deterministic, as there is no prescribed order on the splits. In this section we order them to obtain minimal separating sequences.
Let u be a non-root inner node in a splitting tree; then the sequence σ(u) can also be used to split the parent of u. This allows us to construct splitting trees where children will never have shorter sequences than their parents, as we can always split with those sequences first. Trees obtained in this way are guaranteed to be layered, which means that for all nodes u and all u′ ∈ C(u), |σ(u)| ≤ |σ(u′)|. Each layer consists of nodes for which the associated separating sequences have the same length.
Our approach for constructing minimal sequences is to ensure that each layer is as large as possible before continuing to the next one. This idea is expressed formally by the following definitions.

Definition 8. A splitting tree T is k-stable if for all states s and t in the same leaf we have λ(s, x) = λ(t, x) for all x ∈ I≤k.

Definition 9. A splitting tree T is minimal if for all states s and t in different leaves λ(s, x) ≠ λ(t, x) implies |x| ≥ |σ(lca(s, t))| for all x ∈ I∗.
Minimality of a splitting tree can be used to obtain minimal separating sequences for pairs of states. If the tree is in addition stable, we obtain minimal separating sequences for all inequivalent pairs of states. Note that if a minimal splitting tree is (n − 1)-stable (n is the number of states of M), then it is stable (Definition 5). This follows from the well-known fact that n − 1 is an upper bound for the length of a minimal separating sequence (Moore, 1956).

Algorithm 4.2 ensures a stable and minimal splitting tree. The first repeat-loop is the same as before (in Algorithm 4.1). Clearly, we obtain a 1-stable and minimal splitting tree here. It remains to show that we can extend this to a stable and minimal splitting tree. Algorithm 4.3 will perform precisely one such step towards stability, while maintaining minimality. Termination follows from the same reason as for Algorithm 4.1. Correctness for this algorithm is shown by the following key lemma. We will denote the input tree by T and the tree after performing Algorithm 4.3 by T′. Observe that T is an initial segment of T′.
Lemma 10.    Algorithm 4.3 ensures a (k + 1)-stable minimal splitting tree.

Proof. Let us prove stability. Let s and t be in the same leaf of T′ and let x ∈ I∗ be such that λ(s, x) ≠ λ(t, x). We show that |x| > k + 1.

Suppose for the sake of contradiction that |x| ≤ k + 1. Let u be the leaf containing s and t and write x = ax′. We see that δ(s, a) and δ(t, a) are separated by k-stability of T. So the node v = lca(δ(l(u), a)) has children and an associated sequence σ(v). There are two cases:
– |σ(v)| < k, then aσ(v) separates s and t and is of length ≤ k. This case contradicts the k-stability of T.
– |σ(v)| = k, then the loop in Algorithm 4.3 will consider this case and split. Note that this may not split s and t (it may occur that aσ(v) splits different elements in l(u)). We can repeat the above argument inductively for the newly created leaf containing s and t. By finiteness of l(u), the induction will stop and, in the end, s and t are split.
Both cases end in contradiction, so we conclude that |x| > k + 1.

Let us now prove minimality. It suffices to consider only newly split states in T′. Let s and t be two states with |σ(lca(s, t))| = k + 1. Let x ∈ I∗ be a sequence such that λ(s, x) ≠ λ(t, x). We need to show that |x| ≥ k + 1. Since x ≠ ϵ we can write x = ax′ and consider the states s′ = δ(s, a) and t′ = δ(t, a), which are separated by x′. Two things can happen:
– The states s′ and t′ are in the same leaf in T. Then by k-stability of T we get λ(s′, y) = λ(t′, y) for all y ∈ I≤k. So |x′| > k.
– The states s′ and t′ are in different leaves in T; let u = lca(s′, t′). Then aσ(u) separates s and t. Since s and t are in the same leaf in T we get |aσ(u)| ≥ k + 1 by k-stability. This means that |σ(u)| ≥ k and by minimality of T we get |x′| ≥ k.
In both cases we have shown that |x| ≥ k + 1 as required. □
Example 11. Figure 4.2a shows a stable and minimal splitting tree T for the machine in Figure 4.1. This tree is constructed by Algorithm 4.2 as follows. It executes the same as Algorithm 4.1 until we consider the node labelled {s0, s2}. At this point k = 1. We observe that the sequence of lca(δ({s0, s2}, a)) has length 2, which is too long, so we continue with the next input. We find that we can indeed split w.r.t. the state after b, so the associated sequence is ba. Continuing, we obtain the same partition as before, but with smaller witnesses.

The internal data structure (a refinable partition) is shown in Figure 4.2(b): the array with the permutation of the states is at the bottom, and every block includes an indication of the slice containing its label and a pointer to its parent (as our final algorithm needs to find the parent block, but never the child blocks).

Require: An FSM M with n states
Ensure: A stable, minimal splitting tree T
  initialise T to be a tree with a single node labelled S
  repeat
    find a ∈ I, B ∈ P(T) such that we can split B w.r.t. output λ(⋅, a)
    expand the u ∈ T with l(u) = B as described in (split-output)
  until P(T) is acceptable
  for k = 1 to n − 1 do
    invoke Algorithm 4.3 or Algorithm 4.4 on T for k
  end for

Algorithm 4.2    Constructing a stable and minimal splitting tree.

Require: A k-stable and minimal splitting tree T
Ensure: T is a (k + 1)-stable, minimal splitting tree
  for all leaves u ∈ T and all inputs a ∈ I do
    v ← lca(δ(l(u), a))
    if v is an inner node and |σ(v)| = k then
      expand u as described in (split-state) (this generates new leaves)
    end if
  end for

Algorithm 4.3    A step towards the stability of a splitting tree.
Figure 4.2    (a) A complete and minimal splitting tree for the FSM in Figure 4.1 and (b) its internal refinable partition data structure.
3 Optimising the Algorithm

In this section, we present an improvement on Algorithm 4.3 that uses two ideas described by Hopcroft (1971) in his seminal paper on minimising finite automata: using the inverse transition set, and processing the smaller half. The algorithm that we present is a drop-in replacement, so that Algorithm 4.2 stays the same except for some bookkeeping. This way, we can establish correctness of the new algorithms more easily. The variant presented in this section reduces the amount of redundant computations that were made in Algorithm 4.3.
Using Hopcroft's first idea, we turn our algorithm upside down: instead of searching for the lca for each leaf, we search for the leaves u for which l(u) ⊆ δ−1(l(v), a), for each potential lca v and input a. To keep the order of splits as before, we define k-candidates.

Definition 12.    A k-candidate is a node v with |σ(v)| = k.
A k-candidate v and an input a can be used to split a leaf u if v = lca(δ(l(u), a)), because in this case there are at least two states s, t in l(u) such that δ(s, a) and δ(t, a) are in labels of different nodes in C(v). Refining u this way is called splitting u with respect to (v, a). The set C(u) is constructed according to (split-state), where each child w ∈ C(v) defines a child uw of u with states

l(uw) = {s ∈ l(u) | δ(s, a) ∈ l(w)} = l(u) ∩ δ−1(l(w), a).    (4.1)
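Equation (4.1) is what makes the inverted search cheap: the children of u are obtained from precomputed inverse transition sets rather than by scanning all leaves. A small sketch, with hypothetical data:

```python
# Sketch of equation (4.1): when splitting leaf u w.r.t. (v, a), each
# child label is l(u) ∩ delta^{-1}(l(w), a), computed from precomputed
# inverse transition sets.

def inverse(delta):
    inv = {}
    for (s, a), t in delta.items():
        inv.setdefault((t, a), set()).add(s)
    return inv

def split_wrt(l_u, child_labels, a, inv):
    kids = []
    for l_w in child_labels:
        pre = set().union(*(inv.get((t, a), set()) for t in l_w))
        part = l_u & pre                 # l(u) ∩ delta^{-1}(l(w), a)
        if part:                         # only non-empty labels become children
            kids.append(part)
    return kids

# Hypothetical transitions: 0 and 2 map to 2, 1 and 3 map to 3 on 'a'.
delta = {(0, 'a'): 2, (1, 'a'): 3, (2, 'a'): 2, (3, 'a'): 3}
inv = inverse(delta)
print(split_wrt({0, 1}, [{2}, {3}], 'a', inv))  # [{0}, {1}]
```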
In order to perform the same splits in each layer as before, we maintain a list Lk of k-candidates. We keep the list in order of the construction of nodes, because when we split w.r.t. a child of a node u before we split w.r.t. u, the result is not well-defined. Indeed, the order on Lk is the same as the order used by Algorithm 4.2. So far, the improved algorithm would still have time complexity 𝒪(mn).
To reduce the complexity we have to use Hopcroft's second idea of processing the smaller half. The key idea is that, when we fix a k-candidate v, all leaves are split with respect to (v, a) simultaneously. Instead of iterating over all leaves to refine them, we iterate over s ∈ δ−1(l(w), a) for all w in C(v) and look up in which leaf it is contained, in order to move s out of it. From Lemma 8 by Knuutila (2001) it follows that we can skip one of the children of v. This lowers the time complexity to 𝒪(m log n). In order to move s out of its leaf, each leaf u is associated with a set of temporary children C′(u) that is initially empty, and will be finalised after iterating over all s and w.
In Algorithm 4.4 we use the ideas described above. For each k-candidate v and input a, we consider all children w of v, except for the largest one (in case of multiple largest children, we skip one of these arbitrarily). For each state s ∈ δ−1(l(w), a) we consider the leaf u containing it. If this leaf does not have an associated temporary child for w, we create such a child (line 9); if this child exists, we move s into that child (line 11).

Require: A k-stable and minimal splitting tree T, and a list Lk
Ensure: T is a (k + 1)-stable and minimal splitting tree, and a list Lk+1
 1  Lk+1 ← ∅
 2  for all k-candidates v in Lk in order do
 3    let w′ be a node in C(v) with |l(w′)| ≥ |l(w)| for all nodes w ∈ C(v)
 4    for all inputs a in I do
 5      for all nodes w in C(v) ∖ {w′} do
 6        for all states s in δ−1(l(w), a) do
 7          locate leaf u such that s ∈ l(u)
 8          if C′(u) does not contain node uw then
 9            add a new node uw to C′(u)
10          end if
11          move s from l(u) to l(uw)
12        end for
13      end for
14      for all leaves u with C′(u) ≠ ∅ do
15        if |l(u)| = 0 then
16          if |C′(u)| = 1 then
17            recover u by moving its elements back and clear C′(u)
18            continue with the next leaf
19          end if
20          set p = u and C(u) = C′(u)
21        else
22          construct a new node p and set C(p) = C′(u) ∪ {u}
23          insert p in the tree in the place where u was
24        end if
25        set σ(p) = aσ(v)
26        append p to Lk+1 and clear C′(u)
27      end for
28    end for
29  end for

Algorithm 4.4    A better step towards the stability of a splitting tree.
Once we have done the simultaneous splitting for the candidate v and input a, we finalise the temporary children. This is done at lines 14–26. If there is only one temporary child with all the states, no split has been made and we recover this node (line 17). In the other case we make the temporary children permanent.
The states remaining in u are those for which δ(s, a) is in the child of v that we have skipped; therefore we will call it the implicit child. We should not touch these states, to keep the theoretical time bound. Therefore, we construct a new parent node p that will “adopt” the children in C′(u) together with u (line 22).
We will now explain why considering all but the largest children of a node lowers the algorithm's time complexity. Let T be a splitting tree in which we colour all children of each node blue, except for the largest one. Then:
Lemma 13.    A state s is in at most (log2 n) − 1 labels of blue nodes.

Proof. Observe that every blue node u has a sibling u′ such that |l(u′)| ≥ |l(u)|. So the parent p(u) has at least 2|l(u)| states in its label, and the largest blue node has at most n/2 states.

Suppose a state s is contained in m blue nodes. When we walk up the tree starting at the leaf containing s, we will visit these m blue nodes. With each visit we can double the lower bound of the number of states. Hence n/2 ≥ 2^m and m ≤ (log2 n) − 1. □
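The counting argument behind the lemma is the standard "smaller half" amortisation: every occurrence of a state in a blue label at least doubles the size of the enclosing node's label, so the number of occurrences is logarithmic in n. A quick numeric illustration (not the proof itself):

```python
# Worst-case number of times an element can lie in a "smaller half":
# each occurrence at least doubles the size of the enclosing block,
# so a chain of blue memberships halves n at every step.

import math

def max_smaller_half_touches(n):
    touches, size = 0, n
    while size > 1:
        size //= 2          # worst case: always land in the smaller half
        touches += 1
    return touches

for n in [2, 8, 1024]:
    print(n, max_smaller_half_touches(n), int(math.log2(n)))
```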
Corollary 14. A state s is in at most log2 n sets δ−1(l(u), a), where u is a blue node and a is an input in I.

If we now quantify over all transitions, we immediately get the following result. We note that the number of blue nodes is at most n − 1, but since this fact is not used, we leave this to the reader.
Corollary 15.    Let ℬ denote the set of blue nodes and define

𝒳 = {(b, a, s) | b ∈ ℬ, a ∈ I, s ∈ δ−1(l(b), a)}.

Then 𝒳 has at most m log2 n elements.

The important observation is that when using Algorithm 4.4 we iterate in total over every element in 𝒳 at most once.
Theorem 16. Algorithm 4.2 using Algorithm 4.4 runs in 𝒪(m log n) time.

Proof. We prove that bookkeeping does not increase the time complexity by discussing the implementation.

Inverse transition. δ⁻¹ can be constructed as a preprocessing step in 𝒪(m).

State sorting. As described in Section 1.2, we maintain a refinable partition data structure. Each time a new pair of a k-candidate v and an input a is considered, leaves are split by performing a bucket sort.
First, buckets are created for each node w ∈ C(v) ∖ {w′} and each leaf u that contains one or more elements from δ⁻¹(l(w), a), where w′ is a largest child of v. The buckets are filled by iterating over the states in δ⁻¹(l(w), a) for all w. Then, a pivot is set for each leaf u such that exactly the states that have been placed in a bucket can be moved right of the pivot (and untouched states in δ⁻¹(l(w′), a) end up left of the pivot). For each leaf u, we iterate over the states in its buckets and the corresponding indices right of its pivot, and we swap the current state with the one that is at the current index. For each bucket a new leaf node is created. The refinable partition is updated such that the current state points to the most recently created leaf.
This way, we assure constant-time lookup of the leaf for a state, and we can update the array in constant time when we move elements out of a leaf.

Largest child. For finding the largest child, we maintain counts for the temporary children and a current biggest one. On finalising the temporary children we store (a reference to) the biggest child in the node, so that we can skip this node later in the algorithm.

Storing sequences. The operation on line 25 is done in constant time by using a linked list. □
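As an aside, the constant-time bookkeeping above can be sketched compactly. The following Python model of a refinable partition is our own illustration (the class and method names are hypothetical, and it moves marked states to the front of their block rather than to the right of a pivot, which is symmetric); it is not the actual implementation:

```python
class Partition:
    def __init__(self, n):
        self.elems = list(range(n))   # states, stored block by block
        self.index = list(range(n))   # index[s] = position of s in elems
        self.block = [0] * n          # block[s] = id of the leaf containing s
        self.bounds = {0: (0, n)}     # block id -> (begin, end) slice of elems
        self.next_id = 1

    def split(self, marked):
        """Split every block into its marked and unmarked part, in O(|marked|)."""
        pivots = {}
        for s in marked:              # swap each marked state to the front of its block
            b = self.block[s]
            pivot = pivots.get(b, self.bounds[b][0])
            i, t = self.index[s], self.elems[pivot]
            self.elems[i], self.elems[pivot] = t, s
            self.index[s], self.index[t] = pivot, i
            pivots[b] = pivot + 1
        for b, pivot in pivots.items():   # cut the marked prefix off as a new leaf
            begin, end = self.bounds[b]
            if pivot == end:              # the whole block was marked: no split
                continue
            new_id, self.next_id = self.next_id, self.next_id + 1
            self.bounds[new_id] = (begin, pivot)
            self.bounds[b] = (pivot, end)
            for k in range(begin, pivot):
                self.block[self.elems[k]] = new_id

p = Partition(5)
p.split([1, 3])                       # separate states 1 and 3 from the rest
assert p.block[1] == p.block[3] != p.block[0] == p.block[2] == p.block[4]
```

The array/index pair is what gives the constant-time lookups and moves mentioned in the proof.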
4 Application in Conformance Testing

A splitting tree can be used to extract relevant information for two classical test generation methods: a characterisation set for the W-method and a separating family for the HSI-method. For an introduction and comparison of FSM-based test generation methods we refer to Dorofeeva, et al. (2010) or Chapter 2.

Definition 17. A set W ⊂ I∗ is called a characterisation set if for every pair of inequivalent states s, t there is a sequence w ∈ W such that λ(s, w) ≠ λ(t, w).

Lemma 18. Let T be a complete splitting tree; then the set {σ(u) | u ∈ T} is a characterisation set.

Proof. Let W = {σ(u) | u ∈ T} and let s, t ∈ S be inequivalent states. By completeness, s and t are contained in different leaves of T. Hence u = lca(s, t) exists and σ(u) ∈ W separates s and t. This shows that W is a characterisation set. □

Lemma 19. A characterisation set with minimal-length sequences can be constructed in time 𝒪(m log n).

Proof. By Lemma 18 the sequences associated with the inner nodes of a splitting tree form a characterisation set. By Theorem 16, such a tree can be constructed in time 𝒪(m log n). Traversing the tree to obtain the characterisation set is linear in the number of nodes (and hence linear in the number of states). □
Definition 20. A collection of sets {Hₛ}ₛ∈S is called a separating family if for every pair of inequivalent states s, t there is a sequence ℎ such that λ(s, ℎ) ≠ λ(t, ℎ) and ℎ is a prefix of some ℎₛ ∈ Hₛ and some ℎₜ ∈ Hₜ.

Lemma 21. Let T be a complete splitting tree; then the sets {σ(u) | s ∈ l(u), u ∈ T}ₛ∈S form a separating family.

Proof. Let Hₛ = {σ(u) | s ∈ l(u)} and let s, t ∈ S be inequivalent states. By completeness, s and t are contained in different leaves of T. Hence u = lca(s, t) exists. Since both s and t are contained in l(u), the separating sequence σ(u) is contained in both sets Hₛ and Hₜ. Therefore, it is a (trivial) prefix of some word ℎₛ ∈ Hₛ and some ℎₜ ∈ Hₜ. Hence {Hₛ}ₛ∈S is a separating family. □

Lemma 22. A separating family with minimal-length sequences can be constructed in time 𝒪(m log n + n²).

Proof. The separating family can be constructed from the splitting tree by collecting all sequences of all parents of a state (by Lemma 21). Since we have to do this for every state, this takes 𝒪(n²) time. □
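To make Lemmas 18 and 21 concrete, here is a small Python sketch (entirely ours; the `(label, sequence, children)` tuple encoding of a splitting tree is a hypothetical convenience, not the data structure used in this chapter):

```python
def inner_nodes(tree):
    """Yield the inner nodes of a splitting tree given as (label, sequence, children)."""
    label, seq, children = tree
    if children:
        yield tree
        for child in children:
            yield from inner_nodes(child)

def characterisation_set(tree):
    # W = { sigma(u) | u an inner node of T }, as in Lemma 18
    return {seq for (_, seq, _) in inner_nodes(tree)}

def separating_family(tree, states):
    # H_s = { sigma(u) | s in l(u), u an inner node }, as in Lemma 21
    return {s: {seq for (label, seq, _) in inner_nodes(tree) if s in label}
            for s in states}

# A toy complete splitting tree over states {0, 1, 2}: the root separates {0, 1}
# from {2} with sequence 'a'; an inner node separates 0 from 1 with 'ba'.
tree = ({0, 1, 2}, 'a',
        [({0, 1}, 'ba', [({0}, None, []), ({1}, None, [])]),
         ({2}, None, [])])
assert characterisation_set(tree) == {'a', 'ba'}
assert separating_family(tree, {0, 1, 2})[2] == {'a'}
```

Collecting Hₛ walks all ancestors of every state, which is where the extra 𝒪(n²) term of Lemma 22 comes from.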
For test generation one also needs a transition cover. This can be constructed in linear time with a breadth-first search. We conclude that we can construct all necessary information for the W-method in time 𝒪(m log n), as opposed to the 𝒪(mn) algorithm used by Dorofeeva, et al. (2010). Furthermore, we conclude that we can construct all the necessary information for the HSI-method in time 𝒪(m log n + n²), improving on the reported bound 𝒪(mn³) by Hierons and Türker (2015). The original HSI-method was formulated differently and might generate smaller sets. We conjecture that our separating family has the same size if we furthermore remove redundant prefixes. This can be done in 𝒪(n²) time using a trie data structure.
5 Experimental Results

We have implemented Algorithms 4.3 and 4.4 in Go, and we have compared their running time on two sets of FSMs.¹⁶ The first set is from Smeenk, et al. (2015a), where FSMs for embedded control software were automatically constructed. These FSMs are of increasing size, varying from 546 to 3 410 states, with 78 inputs and up to 151 outputs. The second set is inferred from Hopcroft (1971), where two classes of finite automata, A and B, are described that serve as worst cases for Algorithms 4.3 and 4.4 respectively. The FSMs that we have constructed for these automata have 1 input, 2 outputs, and 2² – 2¹⁵ states. The running times in seconds on an Intel Core i5-2500 are plotted in Figure 4.3. We note that different slopes imply different complexity classes, since both axes have a logarithmic scale.

¹⁶ Available at https://github.com/Jaxan/partition.
[Figure 4.3: Running time in seconds of Algorithm 4.3 in grey and Algorithm 4.4 in black. (a) Embedded control software. (b) Class A (dashed) and class B (solid).]
6 Conclusion

In this chapter we have described an efficient algorithm for constructing a set of minimal-length sequences that pairwise distinguish all states of a finite state machine. By extending Hopcroft’s minimisation algorithm, we are able to construct such sequences in 𝒪(m log n) for a machine with m transitions and n states. This improves on the traditional 𝒪(mn) method that is based on the classic algorithm by Moore. As an upshot, the sequences obtained form a characterisation set and a separating family, which play a crucial role in conformance testing.

Two key observations were required for a correct adaptation of Hopcroft’s algorithm. First, it is required to perform splits in order of the length of their associated sequences. This guarantees minimality of the obtained separating sequences. Second, it is required to consider a node as a candidate before any of its children are considered as candidates. This order follows naturally from the construction of a splitting tree.

Experimental results show that our algorithm outperforms the classic approach for both worst-case finite state machines and models of embedded control software. Applications of minimal separating sequences such as the ones described by Dorofeeva, et al. (2010) and Smeenk, et al. (2015a) therefore show that our algorithm is useful in practice.
Part 2: Nominal Techniques

Chapter 5
Learning Nominal Automata

Joshua Moerman (Radboud University)
Matteo Sammartino (University College London)
Bartek Klin (University of Warsaw)
Alexandra Silva (University College London)
Michał Szynwelski (University of Warsaw)

Abstract
We present an Angluin-style algorithm to learn nominal automata, which are acceptors of languages over infinite (structured) alphabets. The abstract approach we take allows us to seamlessly extend known variations of the algorithm to this new setting. In particular, we can learn a subclass of nominal non-deterministic automata. An implementation using a recently developed Haskell library for nominal computation is provided for preliminary experiments.

This chapter is based on the following publication:
Moerman, J., Sammartino, M., Silva, A., Klin, B., & Szynwelski, M. (2017). Learning nominal automata. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL. ACM. doi:10.1145/3009837.3009879
Automata are a well established computational abstraction with a wide range of applications, including modelling and verification of (security) protocols, hardware, and software systems. In an ideal world, a model would be available before a system or protocol is deployed, in order to provide ample opportunity for checking important properties that must hold, and only then would the actual system be synthesised from the verified model. Unfortunately, this is not at all the reality: Systems and protocols are developed and coded in short spans of time, and if mistakes occur they are most likely found after deployment. In this context, it has become popular to infer or learn a model from a given system just by observing its behaviour or response to certain queries. The learned model can then be used to ensure the system is complying with desired properties or to detect bugs and design possible fixes.

Automata learning, or regular inference, is a widely used technique for creating an automaton model from observations. The original algorithm by Angluin (1987) works for deterministic finite automata, but it has since been extended to other types of automata, including Mealy machines and I/O automata (see Niese, 2003, §8.5, and Aarts & Vaandrager, 2010), and even a special class of context-free grammars (see Isberner, 2015, §6). Angluin’s algorithm is sometimes referred to as active learning, because it is based on direct interaction of the learner with an oracle (“the Teacher”) that can answer different types of queries. This is in contrast with passive learning, where a fixed set of positive and negative examples is given and no interaction with the system is possible.
In this chapter, staying in the realm of active learning, we will extend Angluin’s algorithm to a richer class of automata. We are motivated by situations in which a program model, besides control flow, needs to represent basic data flow, where data items are compared for equality (or for other theories such as total ordering). In these situations, values for individual symbols are typically drawn from an infinite domain and automata over infinite alphabets become natural models, as witnessed by a recent trend (Aarts, et al., 2015; Bojańczyk, et al., 2014; Bollig, et al., 2013; Cassel, et al., 2016; D’Antoni & Veanes, 2014).

One of the foundational approaches to formal language theory for infinite alphabets uses the notion of nominal sets (Bojańczyk, et al., 2014). The theory of nominal sets originates from the work of Fraenkel in 1922, where they were originally used to prove the independence of the axiom of choice and other axioms. They have been rediscovered in Computer Science by Gabbay and Pitts (see Pitts, 2013 for historical notes) as an elegant formalism for modelling name binding, and since then they have formed the basis of many research projects in the semantics and concurrency community. In a nutshell, nominal sets are infinite sets equipped with symmetries which make them finitely representable and tractable for algorithms. We make crucial use of this feature in the development of a learning algorithm.
Our main contributions are the following.
– A generalisation of Angluin’s original algorithm to nominal automata. The generalisation follows a generic pattern for transporting computation models from finite sets to nominal sets, which leads to simple correctness proofs and opens the door to further generalisations. The use of nominal sets with different symmetries also creates potential for generalisation, e.g., to languages with time features (Bojańczyk & Lasota, 2012) or data dependencies represented as graphs (Montanari & Sammartino, 2014).
– An extension of the algorithm to nominal non-deterministic automata (nominal NFAs). To the best of our knowledge, this is the first learning algorithm for non-deterministic automata over infinite alphabets. It is important to note that, in the nominal setting, NFAs are strictly more expressive than DFAs. We learn a subclass of the languages accepted by nominal NFAs, which includes all the languages accepted by nominal DFAs. The main advantage of learning NFAs directly is that they can provide exponentially smaller automata when compared to their deterministic counterpart. This can be seen both as a generalisation and as an optimisation of the algorithm.
– An implementation using a recently developed Haskell library tailored to nominal computation – NLambda, or Nλ, by Klin and Szynwelski (2016). Our implementation is the first non-trivial application of a novel programming paradigm of functional programming over infinite structures, which allows the programmer to rely on convenient intuitions of searching through infinite sets in finite time.
This chapter is organised as follows. In Section 1, we present an overview of our contributions (and the original algorithm), highlighting the challenges we faced in the various steps. In Section 2, we revise some basic concepts of nominal sets and automata. Section 3 contains the core technical contributions: The new algorithm and proof of correctness. In Section 4, we describe an algorithm to learn nominal non-deterministic automata. Section 5 contains a description of NLambda, details of the implementation, and results of preliminary experiments. Section 6 contains a discussion of related work. We conclude this chapter with a discussion section where future directions are also presented.
1 Overview of the Approach

In this section, we give an overview through examples. We will start by explaining the original algorithm for regular languages over finite alphabets, and then explain the challenges in extending it to nominal languages.

Angluin’s algorithm L∗ provides a procedure to learn the minimal DFA accepting a certain (unknown) language ℒ. The algorithm has access to a teacher which answers two types of queries:
– membership queries, consisting of a single word w ∈ A∗, to which the teacher will reply whether w ∈ ℒ or not;
– equivalence queries, consisting of a hypothesis DFA H, to which the teacher replies yes if ℒ(H) = ℒ, and no otherwise, providing a counterexample w ∈ ℒ(H) △ ℒ (where △ denotes the symmetric difference of two languages).

The learning algorithm works by incrementally building an observation table, which at each stage contains partial information about the language ℒ. The algorithm is able to fill the table with membership queries. As an example, and to set notation, consider the following table (over the alphabet A = {a, b}).
                E
                ϵ   a   aa
            ϵ   0   0   1
  S ∪ S⋅A   a   0   1   0
            b   0   0   0

row : S ∪ S⋅A → 2^E,    row(u)(v) = 1 ⟺ uv ∈ ℒ
This table indicates that ℒ contains at least aa and definitely does not contain the words ϵ, a, b, ba, baa, aaa. Since row is fully determined by the language ℒ, we will from now on refer to an observation table as a pair (S, E), leaving the language ℒ implicit.

Given an observation table (S, E) one can construct a deterministic automaton M(S, E) = (Q, q0, δ, F) where
– Q = {row(s) | s ∈ S} is a finite set of states;
– F = {row(s) | s ∈ S, row(s)(ϵ) = 1} ⊆ Q is the set of final states;
– q0 = row(ϵ) is the initial state;
– δ : Q × A → Q is the transition function given by δ(row(s), a) = row(sa).

For this to be well-defined, we need to have ϵ ∈ S (for the initial state) and ϵ ∈ E (for final states), and for the transition function there are two crucial properties of the table that need to hold: Closedness and consistency. An observation table (S, E) is closed if for all t ∈ S⋅A there exists an s ∈ S such that row(t) = row(s). An observation table (S, E) is consistent if, whenever s1 and s2 are elements of S such that row(s1) = row(s2), for all a ∈ A, row(s1 a) = row(s2 a). Each time the algorithm constructs an automaton, it poses an equivalence query to the teacher. It terminates when the answer is yes; otherwise it extends the table with the counterexample provided.
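The notions above can be made concrete in a few lines. The sketch below is our own Python illustration for a membership oracle of a fixed toy language (here ℒ = {aa, bb} over A = {a, b}); the function names are ours, not part of this chapter:

```python
A = ['a', 'b']
member = lambda w: w in ('aa', 'bb')   # membership oracle for the toy language

def row(u, E):
    return tuple(member(u + e) for e in E)

def closed(S, E):
    upper = {row(s, E) for s in S}
    return all(row(s + a, E) in upper for s in S for a in A)

def consistent(S, E):
    return all(row(s1 + a, E) == row(s2 + a, E)
               for s1 in S for s2 in S
               if row(s1, E) == row(s2, E)
               for a in A)

def hypothesis(S, E):
    """M(S, E); assumes (S, E) is closed and consistent, with the empty word in S and E."""
    q0 = row('', E)
    F = {row(s, E) for s in S if member(s)}           # row(s)(epsilon) = 1
    delta = {(row(s, E), a): row(s + a, E) for s in S for a in A}
    return q0, F, delta

assert closed({''}, ['']) and consistent({''}, [''])  # the initial table
assert not consistent({'', 'a', 'aa'}, [''])          # after adding the rows a, aa
```

The final assertion anticipates the inconsistency that the execution example below repairs by adding a column.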
1.1 Simple Example of Execution

Angluin’s algorithm is displayed in Algorithm 5.1. Throughout this section, we will consider the language(s)
ℒn = {ww | w ∈ A∗, |w| = n} .
If the alphabet A is finite then ℒn is regular for any n ∈ ℕ, and there is a finite DFA accepting it.
 1  S, E ← {ϵ}
 2  repeat
 3      while (S, E) is not closed or not consistent do
 4          if (S, E) is not closed then
 5              find s1 ∈ S, a ∈ A such that row(s1 a) ≠ row(s) for all s ∈ S
 6              S ← S ∪ {s1 a}
 7          end if
 8          if (S, E) is not consistent then
 9              find s1, s2 ∈ S, a ∈ A, and e ∈ E such that
10                  row(s1) = row(s2) and ℒ(s1 ae) ≠ ℒ(s2 ae)
11              E ← E ∪ {ae}
12          end if
13      end while
14      Make the conjecture M(S, E)
15      if the Teacher replies no, with a counter-example t then
16          S ← S ∪ pref(t)
17      end if
18  until the Teacher replies yes to the conjecture M(S, E)
19  return M(S, E)

Algorithm 5.1  The L∗ learning algorithm from Angluin (1987).
The language ℒ1 = {aa, bb} looks trivial, but the minimal DFA recognising it has as many as 5 states. Angluin’s algorithm will terminate in (at most) 5 steps. We illustrate some relevant ones.

Step 1 We start from S, E = {ϵ}, and we fill the entries of the table below by asking membership queries for ϵ, a and b. The table is closed and consistent, so we construct the hypothesis 𝒜1, where q0 = row(ϵ) = {ϵ ↦ 0}:
    ϵ
ϵ   0
a   0
b   0

𝒜1 = [a single non-accepting state q0 with a self-loop on a, b]
The Teacher replies no and gives the counterexample aa, which is in ℒ1 but it is not accepted by 𝒜1. Therefore, line 16 of the algorithm is triggered and we set S = {ϵ, a, aa}.

Step 2 The table becomes the one on the left below. It is closed, but not consistent: Rows ϵ and a are identical, but appending a leads to different rows, as depicted. Therefore, line 10 is triggered and an extra column a, highlighted in red, is added. The new table is closed and consistent and a new hypothesis 𝒜2 is constructed.
        ϵ                 ϵ   a
  ϵ     0           ϵ     0   0
  a     0           a     0   1
  aa    1           aa    1   0
  b     0           b     0   0
  ab    0           ab    0   0
  aaa   0           aaa   0   0
  aab   0           aab   0   0

𝒜2 = [states q0 = row(ϵ), q1 = row(a) and q2 = row(aa), where q2 is accepting; transitions q0 −a→ q1, q0 −b→ q0, q1 −a→ q2, q1 −b→ q0, q2 −a,b→ q0]
The Teacher again replies no and gives the counterexample bb, which should be accepted by 𝒜2 but it is not. Therefore we put S ← S ∪ {b, bb}.

Step 3 The new table is the one on the left. It is closed, but ϵ and b violate consistency when b is appended. Therefore we add the column b and we get the table on the right, which is closed and consistent. The new hypothesis is 𝒜3.
        ϵ   a               ϵ   a   b
  ϵ     0   0         ϵ     0   0   0
  a     0   1         a     0   1   0
  aa    1   0         aa    1   0   0
  b     0   0         b     0   0   1
  bb    1   0         bb    1   0   0
  ab    0   0         ab    0   0   0
  aaa   0   0         aaa   0   0   0
  aab   0   0         aab   0   0   0
  ba    0   0         ba    0   0   0
  bba   0   0         bba   0   0   0
  bbb   0   0         bbb   0   0   0

𝒜3 = [states q0 = row(ϵ), q1 = row(a), q2 = row(b) and q3 = row(aa) = row(bb), where q3 is accepting; transitions q0 −a→ q1, q0 −b→ q2, q1 −a→ q3, q1 −b→ q0, q2 −b→ q3, q2 −a→ q0, q3 −a,b→ q0]
The Teacher replies no and provides the counterexample babb, so S ← S ∪ {ba, bab}.

Step 4 One more step brings us to the correct hypothesis 𝒜4 (details are omitted).
𝒜4 = [the minimal DFA for ℒ1: from q0, a leads to q1 and b leads to q2; q1 −a→ q3 and q2 −b→ q3, where q3 is accepting; all remaining transitions lead to the non-accepting sink q4]
1.2 Learning Nominal Languages

Consider now an infinite alphabet A = {a, b, c, d, …}. The language ℒ1 becomes {aa, bb, cc, dd, …}. Classical theory of finite automata does not apply to this kind of language, but one may draw an infinite deterministic automaton that recognises ℒ1 in the standard sense:
𝒜5 = [an infinite DFA: from the initial state q0, each letter a ∈ A leads to a state qa; from qa, reading a again leads to the accepting state q3, while any letter ≠ a leads to q4; from q3 every letter leads to q4, and q4 is a non-accepting sink with an A-labelled self-loop]

where the arrows labelled A and ≠a stand for the infinitely many transitions labelled by elements of A and A ∖ {a}, respectively. This automaton is infinite, but it can be finitely presented in a variety of ways, for example:
𝒜6 = [for all x ∈ A: q0 −x→ qx, qx −x→ q3 (accepting), qx −≠x→ q4; q3 −A→ q4 and q4 −A→ q4]
One can formalise the quantifier notation above (or indeed the “dots” notation above that) in several ways. A popular solution is to consider finite register automata (Demri & Lazic, 2009 and Kaminski & Francez, 1994), i.e., finite automata equipped with a finite number of registers where alphabet letters can be stored and later compared for equality. Our language ℒ1 is recognised by a simple automaton with four states and one register. The problem of learning register automata has been successfully attacked before by, for instance, Howar, et al. (2012).
In this chapter, however, we consider nominal automata by Bojańczyk, et al. (2014) instead. These automata ostensibly have infinitely many states, but the set of states can be finitely presented in a way open to effective manipulation. More specifically, in a nominal automaton the set of states is subject to an action of permutations of a set 𝔸 of atoms, and it is finite up to that action. For example, the set of states of 𝒜5 is
{q0, q3, q4} ∪ {qa | a ∈ A}
and it is equipped with a canonical action of permutations π : 𝔸 → 𝔸 that maps every qa to qπ(a) and leaves q0, q3 and q4 fixed. Technically speaking, the set of states has four orbits (one infinite orbit and three fixed points) of the action of the group of permutations of 𝔸. Moreover, it is required that in a nominal automaton the transition relation is equivariant, i.e., closed under the action of permutations. The automaton 𝒜5 has this property: For example, it has a transition qa ⟶ q3 labelled a, and for any π : 𝔸 → 𝔸 there is also a transition π(qa) = qπ(a) ⟶ q3 = π(q3), labelled π(a).

Nominal automata with finitely many orbits of states are equi-expressive with finite register automata (Bojańczyk, et al., 2014), but they have an important theoretical advantage: They are a direct reformulation of the classical notion of finite automaton, where one replaces finite sets with orbit-finite sets and functions (or relations) with equivariant ones. A research programme advocated by Bojańczyk, et al. is to transport various computation models, algorithms and theorems along this correspondence. This can often be done with remarkable accuracy, and our results are a witness to this. Indeed, as we shall see, nominal automata can be learned with an algorithm that is almost a verbatim copy of the classical Angluin’s one.
Indeed, consider applying Angluin’s algorithm to our new language ℒ1. The key idea is to change the basic data structure: Our observation table (S, E) will be such that S and E are equivariant subsets of A∗, i.e., they are closed under the canonical action of atom permutations. In general, such a table has infinitely many rows and columns, so the following aspects of Algorithm 5.1 seem problematic:
– line 4 and line 8: finding witnesses for closedness or consistency violations potentially requires checking all infinitely many rows;
– line 16: every counterexample t has infinitely many prefixes, so it is not clear how one constructs an infinite set S in finite time. However, an infinite S is necessary for the algorithm to ever succeed, because no finite automaton recognises ℒ1.

At this stage, we need to observe that, due to equivariance of S, E and ℒ1, the following crucial properties hold:
(P1) the sets S, S⋅A and E admit a finite representation up to permutations;
(P2) the function row is such that row(π(s))(π(e)) = row(s)(e), for all s ∈ S and e ∈ E, so the observation table admits a finite symbolic representation.

Intuitively, checking closedness and consistency, and finding a witness for their violations, can be done effectively on the representations up to permutations (P1). This is sound, as row is invariant w.r.t. permutations (P2).
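Property (P1) can be made tangible for the equality symmetry on words: the orbit of a word is determined by its pattern of repeated letters. The following Python sketch is our own illustration (the function names are hypothetical) of such a canonical representative:

```python
def pattern(word):
    """Canonical form of a word over atoms: replace each atom by the
    index of its first occurrence, e.g. 'abcab' -> (0, 1, 2, 0, 1)."""
    first = {}
    return tuple(first.setdefault(c, len(first)) for c in word)

def same_orbit(u, v):
    """Two words lie in the same orbit iff some atom permutation maps
    one to the other, i.e. iff they have the same pattern."""
    return pattern(u) == pattern(v)

assert same_orbit('b', 'a')        # a single b is represented by a
assert same_orbit('bb', 'aa')      # ... and bb by aa
assert not same_orbit('ab', 'aa')  # ab and aa lie in different orbits
```

An equivariant set of words is then a union of orbits, so finitely many patterns represent it, which is exactly what the symbolic tables below exploit.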
We now illustrate these points through a few steps of the algorithm for ℒ1.

Step 1’ We start from S, E = {ϵ}. We have S⋅A = A, which is infinite but admits a finite representation. In fact, for any a ∈ A, we have A = {π(a) | π is a permutation}. Then, by (P2), row(π(a))(ϵ) = row(a)(ϵ) = 0, for all π, so the first table can be written as:

    ϵ
ϵ   0
a   0

𝒜′1 = [a single non-accepting state q0 with an A-labelled self-loop]
It is closed and consistent. Our hypothesis is 𝒜′1, where δ𝒜′1(row(ϵ), x) = row(x) = q0, for all x ∈ A. As in Step 1, the Teacher replies with the counterexample aa.

Step 2’ By equivariance of ℒ1, the counterexample tells us that all words of length 2 with two repeated letters are accepted. Therefore we extend S with the (infinite!) set of such words. The new symbolic table is depicted below.

      ϵ
ϵ     0
a     0
aa    1
ab    0
aaa   0
aab   0

The lower part stands for elements of S⋅A. For instance, ab stands for words obtained by appending a fresh letter to words of length 1 (row a). It can be easily verified that all cases are covered. Notice that the table is different from that of Step 2: A single b is not in the lower part, because it can be obtained from a via a permutation. The table is closed.
Now, for consistency we need to check row(ϵx) = row(ax), for all a, x ∈ A. Again, by (P2), it is enough to consider rows of the table above. Consistency is violated, because row(a) ≠ row(aa). We found a “symbolic” witness a for this violation. In order to fix consistency, while keeping E equivariant, we need to add columns for all π(a). The resulting table is
      ϵ   a   b   c   …
ϵ     0   0   0   0   …
a     0   1   0   0   …
aa    1   0   0   0   …
ab    0   0   0   0   …
aaa   0   0   0   0   …
aab   0   0   0   0   …
where non-specified entries are 0. Only finitely many entries of the table are relevant: row(s) is fully determined by its values on letters in s and on just one letter not in s. For instance, we have row(a)(a) = 1 and row(a)(a′) = 0, for all a′ ∈ A ∖ {a}. The table is trivially consistent.

Notice that this step encompasses both Steps 2 and 3, because the rows b and bb added by Step 2 are already represented by a and aa. The hypothesis automaton is
𝒜′2 = [for all x ∈ A: q0 −x→ qx, qx −x→ q2 (accepting), qx −≠x→ q0, and q2 −A→ q0]

This is again incorrect, but one additional step will give the correct hypothesis automaton 𝒜6.
1.3 Generalisation to Non-Deterministic Automata

Since our extension of Angluin’s L∗ algorithm stays close to her original development, exploring extensions of other variations of L∗ to the nominal setting can be done in a systematic way. We will show how to extend the algorithm NL∗ for learning NFAs by Bollig, et al. (2009). This has practical implications: It is well known that NFAs are exponentially more succinct than DFAs. This is true also in the nominal setting. However, there are challenges in the extension that require particular care.
– Nominal NFAs are strictly more expressive than nominal DFAs. We will show that the nominal version of NL∗ terminates for all nominal NFAs that have a corresponding nominal DFA and, more surprisingly, that it is capable of learning some languages that are not accepted by nominal DFAs.
– Language equivalence of nominal NFAs is undecidable. This does not affect the correctness proof, as it assumes a teacher which is able to answer equivalence queries accurately. For our implementation, we will describe heuristics that produce correct results in many cases.

For the learning algorithm the power of non-determinism means that we can take some shortcuts during learning: If we want to make the table closed, we were previously required to find an equivalent row in the upper part; now we may find a sum of rows which, together, are equivalent to an existing row. This means that in some cases fewer rows will be added for closedness.
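The closedness shortcut just described can be sketched on bit-vector rows. This Python fragment is our own illustration, not code from the NL∗ paper: a lower row need not equal an upper row, it suffices that it is a union (componentwise join) of upper rows:

```python
from itertools import combinations

def join(rows):
    """Componentwise or of a collection of equal-length 0/1 rows."""
    return tuple(any(bits) for bits in zip(*rows))

def covered(target, upper_rows):
    """Is target the join of some non-empty subset of upper_rows?
    (Brute force over subsets; purely illustrative.)"""
    return any(join(sub) == target
               for r in range(1, len(upper_rows) + 1)
               for sub in combinations(upper_rows, r))

upper = [(1, 0, 0), (0, 1, 0)]
assert covered((1, 1, 0), upper)      # the sum of the two upper rows
assert not covered((0, 0, 1), upper)  # no subset joins to this row
```

In the deterministic setting only singleton subsets would count, so this check succeeds strictly more often, which is why fewer rows need to be added.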
2 Preliminaries

We recall the notions of nominal sets, nominal automata and nominal regular languages. We refer to Bojańczyk, et al. (2014) for a detailed account.
Let 𝔸 be a countable set and let Perm(𝔸) be the set of permutations on 𝔸, i.e., the bijective functions π : 𝔸 → 𝔸. Permutations form a group where the identity permutation id is the unit element, inverse is functional inverse and multiplication is function composition.
A nominal set (Pitts, 2013) is a set X together with a function ⋅ : Perm(𝔸) × X → X, interpreting permutations over X. This function must be a group action of Perm(𝔸), i.e., it must satisfy id ⋅ x = x and π ⋅ (π′ ⋅ x) = (π ∘ π′) ⋅ x. We say that a finite A ⊂ 𝔸 supports x ∈ X whenever, for all π acting as the identity on A, we have π ⋅ x = x. In other words, permutations that only move elements outside A do not affect x. The support of x ∈ X, denoted supp(x), is the smallest finite set supporting x. We require nominal sets to have finite support, meaning that supp(x) exists for all x ∈ X.
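The two group-action laws can be checked on a concrete representation. The sketch below (ours, purely illustrative) models atoms as integers and a finitely supported permutation as a dict that fixes every atom outside its domain; the action on nested tuples of atoms is pointwise.

```python
def act(pi, x):
    """Apply a finitely supported permutation pi to a nested tuple of atoms."""
    if isinstance(x, tuple):
        return tuple(act(pi, y) for y in x)
    return pi.get(x, x)  # atoms outside pi's domain are fixed

def compose(pi, rho):
    """Function composition pi ∘ rho on the union of their domains."""
    dom = set(pi) | set(rho)
    return {a: pi.get(rho.get(a, a), rho.get(a, a)) for a in dom}

x = (1, (2, 1))
pi = {1: 2, 2: 1}   # the transposition (1 2)
rho = {2: 3, 3: 2}  # the transposition (2 3)

assert act({}, x) == x                                   # id ⋅ x = x
assert act(pi, act(rho, x)) == act(compose(pi, rho), x)  # π ⋅ (ρ ⋅ x) = (π ∘ ρ) ⋅ x
```

For a tuple of atoms acted on pointwise, the support is exactly the set of atoms occurring in it, matching the definition above.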
The orbit of x, denoted orb(x), is the set of elements in X reachable from x via permutations, explicitly

orb(x) = {π ⋅ x | π ∈ Perm(𝔸)}.

We say that X is orbit-finite whenever it is a union of finitely many orbits.
Given a nominal set X, a subset Y ⊆ X is equivariant if it is preserved by permutations, i.e., π ⋅ y ∈ Y, for all y ∈ Y. In other words, Y is a union of some orbits of X. This definition extends to the notion of an equivariant relation R ⊆ X × Y, by setting π ⋅ (x, y) = (π ⋅ x, π ⋅ y), for (x, y) ∈ R; similarly for relations of greater arity. The dimension of a nominal set X is the maximal size of supp(x), for any x ∈ X. Every orbit-finite set has finite dimension.
We define 𝔸(k) = {(a1, …, ak) | ai ≠ aj for i ≠ j}. For every single-orbit nominal set X with dimension k, there is a surjective equivariant map

fX : 𝔸(k) → X.

This map can be used to get an upper bound for the number of orbits of X1 × ⋯ × Xn, for Xi a nominal set with li orbits and dimension ki. Suppose Oi is an orbit of Xi. Then we have a surjection

fO1 × ⋯ × fOn : 𝔸(k1) × ⋯ × 𝔸(kn) → O1 × ⋯ × On,

showing that the codomain cannot have more orbits than the domain. Let f𝔸(k1, …, kn) denote the number of orbits of 𝔸(k1) × ⋯ × 𝔸(kn), for any finite sequence of natural numbers k1, …, kn. We can form at most l = l1 l2 ⋯ ln tuples of the form O1 × ⋯ × On, so X1 × ⋯ × Xn has at most l f𝔸(k1, …, kn) orbits.
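As a sanity check on f𝔸, these orbits can be counted directly: over the equality atoms, an orbit of 𝔸(k1) × ⋯ × 𝔸(kn) is determined by the equality pattern on all coordinate positions, and distinctness within each component means no two positions of the same component may be identified. The brute-force enumeration below (our own illustration, not the thesis’ implementation) counts exactly those set partitions.

```python
from itertools import combinations

def orbit_count(ks):
    """f_A(k1, ..., kn): orbits of A^(k1) x ... x A^(kn) over equality atoms,
    counted as set partitions of the coordinate positions in which no block
    contains two positions of the same component."""
    positions = [(i, j) for i, k in enumerate(ks) for j in range(k)]

    def count(rest):
        if not rest:
            return 1
        first, tail = rest[0], rest[1:]
        total = 0
        # choose the other members of the block containing `first`
        for m in range(len(tail) + 1):
            for others in combinations(tail, m):
                comps = [first[0]] + [p[0] for p in others]
                if len(set(comps)) < len(comps):
                    continue  # a block may not repeat a component
                total += count([p for p in tail if p not in others])
        return total

    return count(positions)

assert orbit_count([2]) == 1     # A^(2) is a single orbit
assert orbit_count([1, 1]) == 2  # A x A: the two atoms are equal or distinct
assert orbit_count([2, 1]) == 3  # extra atom equals a1, equals a2, or is fresh
```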
For X single-orbit, the local symmetries are defined by the group

{g ∈ Sk | fX(x1, …, xk) = fX(xg(1), …, xg(k)) for all xi ∈ 𝔸},

where k is the dimension of X and Sk is the symmetric group of permutations over k distinct elements.
NFAs on sets have a finite state space. We can define nominal NFAs, with the requirement that the state space is orbit-finite and the transition relation is equivariant. A nominal NFA is a tuple (Q, A, Q0, F, δ), where:
– Q is an orbit-finite nominal set of states;
– A is an orbit-finite nominal alphabet;
– Q0, F ⊆ Q are equivariant subsets of initial and final states;
– δ ⊆ Q × A × Q is an equivariant transition relation.
A nominal DFA is a special case of nominal NFA where Q0 = {q0} and the transition relation is an equivariant function δ : Q × A → Q. Equivariance here can be rephrased as requiring δ(π ⋅ q, π ⋅ a) = π ⋅ δ(q, a). In most examples we take the alphabet to be A = 𝔸, but it can be any orbit-finite nominal set. For instance, A = Act × 𝔸, where Act is a finite set of actions, represents actions act(x) with one parameter x ∈ 𝔸 (actions with arity n can be represented via n-fold products of 𝔸).
A language ℒ is nominal regular if it is recognised by a nominal DFA. The theory of nominal regular languages recasts the classical one using nominal concepts. A nominal Myhill-Nerode-style syntactic congruence is defined: w, w′ ∈ A∗ are equivalent w.r.t. ℒ, written w ≡ℒ w′, whenever

wv ∈ ℒ ⟺ w′v ∈ ℒ

for all v ∈ A∗. This relation is equivariant and the set of equivalence classes [w]ℒ is a nominal set.
Theorem 1 (Myhill-Nerode theorem for nominal sets, Bojańczyk, et al., 2014). Let ℒ be a nominal language. The following conditions are equivalent:
1. the set of equivalence classes of ≡ℒ is orbit-finite;
2. ℒ is recognised by a nominal DFA.
Unlike what happens for ordinary regular languages, nominal NFAs and nominal DFAs are not equi-expressive. Here is an example of a language accepted by a nominal NFA, but not by a nominal DFA:

ℒeq = {a1 ⋯ an | ai = aj, for some i < j ∈ {1, …, n}}.
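Membership in ℒeq is easy to decide for a concrete word; a nominal NFA accepts it by nondeterministically guessing the position i, storing ai, and accepting when that atom reappears. A small sketch (ours) of both phrasings:

```python
def in_L_eq(word):
    """Some atom occurs twice in the word (a_i = a_j with i < j)."""
    return len(set(word)) < len(word)

def nfa_style(word):
    """The same check, phrased as the NFA's guess: pick i, wait for a_i again."""
    return any(a in word[i + 1:] for i, a in enumerate(word))

assert in_L_eq("aba") and nfa_style("aba")
assert not in_L_eq("abc") and not nfa_style("abc")
```

No nominal DFA accepts ℒeq: a deterministic acceptor would have to remember the unbounded set of atoms seen so far, which no orbit-finite state space of finite dimension can do.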
In the theory of nominal regular languages, several problems are decidable: language inclusion and minimality tests for nominal DFAs. Moreover, orbit-finite nominal sets can be finitely represented, and so can be manipulated by algorithms. This is the key idea underpinning our implementation.
2.1 Different Atom Symmetries

An important advantage of nominal set theory as considered by Bojańczyk, et al. (2014) is that it retains most of its properties when the structure of atoms 𝔸 is replaced with an arbitrary infinite relational structure subject to a few model-theoretic assumptions. An example alternative structure of atoms is the total order of rational numbers (ℚ, <), with the group of monotone bijections of ℚ taking the role of the group of all permutations. The theory of nominal automata remains similar, and an example nominal language over the atoms (ℚ, <) is:

{a1 ⋯ an | ai ≤ aj, for some i < j ∈ {1, …, n}}

which is recognised by a nominal DFA over those atoms.
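A concrete membership check for this ordered-atoms language (sketch, ours; rational atoms modelled as Python numbers): a word is in the language exactly when it is not strictly decreasing.

```python
def in_L_ord(word):
    """a_i <= a_j for some i < j; equivalently, not strictly decreasing."""
    return any(word[i] <= word[j]
               for i in range(len(word)) for j in range(i + 1, len(word)))

assert in_L_ord([1, 3, 2])      # 1 <= 3
assert not in_L_ord([3, 2, 1])  # strictly decreasing
```

A nominal DFA over (ℚ, <) only needs to remember the minimum atom seen so far, a single-register (dimension-one) state.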
To simplify the presentation, in this chapter we concentrate on the “equality atoms” only. However, both the theory and the implementation can be generalised to other atom structures, with the “ordered atoms” (ℚ, <) as the simplest other example. We investigate the total order symmetry (ℚ, <) in Chapter 6.
3 Angluin’s Algorithm for Nominal DFAs

In our algorithm, we will assume a teacher as described at the start of Section 1. In particular, the teacher is able to answer membership queries and equivalence queries, now in the setting of nominal languages. We fix a target language ℒ, which is assumed to be a nominal regular language.
The learning algorithm for nominal automata, νL∗, will be very similar to L∗ in Algorithm 5.1. In fact, we only change the following lines:

6′  S ← S ∪ orb(sa)
11′ E ← E ∪ orb(ae)
16′ S ← S ∪ pref(orb(t))        (5.1)
The basic data structure is an observation table (S, E, T) where S and E are orbit-finite subsets of A∗ and T : (S ∪ S⋅A) × E → 2 is an equivariant function defined by T(s, e) = ℒ(se) for each s ∈ S ∪ S⋅A and e ∈ E. Since T is determined by ℒ we omit it from the notation. Let row : S ∪ S⋅A → 2^E denote the curried counterpart of T. Let u ∼ v denote the relation row(u) = row(v).

Definition 2. The table is called closed if for each t ∈ S⋅A there is an s ∈ S with t ∼ s. The table is called consistent if for each pair s1, s2 ∈ S with s1 ∼ s2 we have s1a ∼ s2a for all a ∈ A.
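For ordinary observation tables over a finite alphabet, the two conditions of Definition 2 are direct to implement. A minimal sketch (ours), with the language given as a membership predicate and the example language (words ending in a) purely illustrative:

```python
def row(L, E, u):
    """The row of u: its membership value after each suffix in E."""
    return tuple(L(u + e) for e in E)

def is_closed(L, S, E, A):
    """Every row of S·A already occurs as a row of S."""
    upper = {row(L, E, s) for s in S}
    return all(row(L, E, s + a) in upper for s in S for a in A)

def is_consistent(L, S, E, A):
    """Equal rows stay equal after appending any letter."""
    return all(row(L, E, s1 + a) == row(L, E, s2 + a)
               for s1 in S for s2 in S
               if row(L, E, s1) == row(L, E, s2)
               for a in A)

L = lambda w: w.endswith("a")         # words over {a, b} ending in a
S, E, A = ["", "a"], [""], ["a", "b"]
assert is_closed(L, S, E, A) and is_consistent(L, S, E, A)
```

In the nominal setting these loops range over orbit representatives rather than over all (infinitely many) words, as explained below.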
The above definitions agree with the abstract definitions given by Jacobs and Silva (2014) and we may use some of their results implicitly. The intuition behind the definitions is as follows. Closedness assures us that for each state we have a successor state for each input. Consistency assures us that each state has at most one successor for each input. Together they allow us to construct a well-defined minimal automaton from the observations in the table.
The algorithm starts with a trivial observation table and tries to make it closed and consistent by adding orbits of rows and columns, filling the table via membership queries. When the table is closed and consistent it constructs a hypothesis automaton and poses an equivalence query.
The pseudocode for the nominal version is the same as listed in Algorithm 5.1, modulo the changes displayed in (5.1). However, we have to take care to ensure that all manipulations and tests on the (possibly) infinite sets S, E and A terminate in finite time. We refer to Bojańczyk, et al. (2014) and Pitts (2013) for the full details on how to represent these structures and provide a brief sketch here. The sets S, E, A and S⋅A can be represented by choosing a representative for each orbit. The function T in turn can be represented by cells Ti,j : orb(si) × orb(ej) → 2 for each representative si and ej. Note, however, that the product of two orbits may consist of several orbits, so that Ti,j is not a single boolean value. Each cell is still orbit-finite and can be filled with only finitely many membership queries. Similarly the curried function row can be represented by a finite structure.
To check whether the table is closed, we observe that if we have a corresponding row s ∈ S for some t ∈ S⋅A, this holds for any permutation of t. Hence it is enough to check the following: for all representatives t ∈ S⋅A there is a representative s ∈ S with row(t) = π ⋅ row(s) for some permutation π. Note that we only have to consider finitely many permutations, since the support is finite, and so we can decide this property. Furthermore, if the property does not hold, we immediately find a witness represented by t.
Consistency is a bit more complicated, but it is enough to consider the set of inconsistencies, {(s1, s2, a, e) | row(s1) = row(s2) ∧ row(s1a)(e) ≠ row(s2a)(e)}. It is an equivariant subset of S × S × A × E and so it is orbit-finite. Hence we can decide emptiness and obtain representatives if it is non-empty.
Constructing the hypothesis happens in the same way as before (Section 1), where we note the state space is orbit-finite since it is a quotient of S. Moreover, the function row is equivariant, so all structure (Q0, F and δ) is equivariant as well.
The representation given above is not the only way to represent nominal sets. For example, first-order definable sets can be used as well (Klin & Szynwelski, 2016). From now on we assume to have set-theoretic primitives so that each line in Algorithm 5.1 is well defined.
3.1 Correctness

To prove correctness we only have to prove that the algorithm terminates, that is, that only finitely many hypotheses will be produced. Correctness follows trivially from termination, since the last step of the algorithm is an equivalence query to the teacher inquiring whether the hypothesis automaton accepts the target language. We start out by listing some facts about observation tables.
Lemma 3. The relation ∼ is an equivariant equivalence relation. Furthermore, for all u, v ∈ S we have that u ≡ℒ v implies u ∼ v.
This lemma implies that at any stage of the algorithm the number of orbits of S/∼ does not exceed the number of orbits of the minimal acceptor with state space A∗/≡ℒ (recall that ≡ℒ is the nominal Myhill-Nerode equivalence relation). Moreover, the following lemma shows that the dimension of the state space never exceeds the dimension of the minimal acceptor. Recall that the dimension is the maximal size of the support of any state, which is different from the number of orbits.
Lemma 4. We have supp([u]∼) ⊆ supp([u]≡ℒ) ⊆ supp(u) for all u ∈ S.
Lemma 5. The automaton constructed from a closed and consistent table is minimal.

Proof. Follows from the categorical perspective by Jacobs and Silva (2014). □
We note that the constructed automaton is consistent with the table (we use that the set S is prefix-closed and E is suffix-closed (Angluin, 1987)). The following lemma shows that there are no strictly “smaller” automata consistent with the table. So the automaton is not just minimal, it is minimal w.r.t. the table.
Lemma 6. Let H be the automaton associated with a closed and consistent table (S, E). If M′ is an automaton consistent with (S, E) (meaning that se ∈ ℒ(M′) ⟺ se ∈ ℒ(H) for all s ∈ S ∪ S⋅A and e ∈ E) and M′ has at most as many orbits as H, then there is a surjective map f : QM′ → QH. If moreover
– M′’s dimension is bounded by the dimension of H, i.e., supp(m) ⊆ supp(f(m)) for all m ∈ QM′, and
– M′ has no fewer local symmetries than H, i.e., π ⋅ f(m) = f(m) implies π ⋅ m = m for all m ∈ QM′,
then f defines an isomorphism M′ ≅ H of nominal DFAs.
Proof. (All maps in this proof are equivariant.) Define a map row′ : QM′ → 2^E by restricting the language map QM′ → 2^(A∗) to E. First, observe that row′(δ′(q′0, s)) = row(s) for all s ∈ S ∪ S⋅A, since ϵ ∈ E and M′ is consistent with the table. Second, we have {row′(δ′(q′0, s)) | s ∈ S} ⊆ {row′(q) | q ∈ M′}.
Let n be the number of orbits of H. The former set has n orbits by the first observation, the latter set has at most n orbits by assumption. We conclude that the two sets (both being equivariant) must be equal. That means that for each q ∈ M′ there is an s ∈ S such that row′(q) = row(s). We see that row′ : M′ → {row′(δ′(q′0, s)) | s ∈ S} = H is a surjective map. Since a surjective map cannot increase the dimensions of orbits and the dimensions of M′ are bounded, we note that the dimensions of the orbits in H and M′ have to agree. Similarly, surjective maps preserve local symmetries. This map must hence be an isomorphism of nominal sets. Note that row′(q) = row′(δ′(q′0, s)) implies q = δ′(q′0, s).
It remains to prove that it respects the automaton structures. It preserves the initial state: row′(q′0) = row′(δ′(q′0, ϵ)) = row(ϵ). Now let q ∈ M′ be a state and s ∈ S such that row′(q) = row(s). It preserves final states: q ∈ F′ ⟺ row′(q)(ϵ) = 1 ⟺ row(s)(ϵ) = 1. Finally, it preserves the transition structure:

row′(δ′(q, a)) = row′(δ′(δ′(q′0, s), a)) = row′(δ′(q′0, sa)) = row(sa) = δ(row(s), a).  □
The above proof is an adaptation of Angluin’s proof for automata over sets. We will now prove termination of the algorithm by proving that all steps are productive.
Theorem 7. The algorithm terminates and is hence correct.
Proof. Provided that the if-statements and set operations terminate, we are left proving that the algorithm adds (orbits of) rows and columns only finitely often. We start by proving that a table can be made closed and consistent in finite time.
If the table is not closed, we find a row s1 ∈ S⋅A such that row(s1) ≠ row(s) for all s ∈ S. The algorithm then adds the orbit containing s1 to S. Since s1 was nonequivalent to all rows, we find that (S ∪ orb(s1))/∼ has strictly more orbits than S/∼. Since the orbits of S/∼ cannot be more than those of A∗/≡ℒ, this happens finitely often.
Columns are added in case of an inconsistency. Here the algorithm finds two elements s1, s2 ∈ S with row(s1) = row(s2) but row(s1a)(e) ≠ row(s2a)(e) for some a ∈ A and e ∈ E. Adding ae to E will ensure that row′(s1) ≠ row′(s2) (row′ is the function belonging to the updated observation table). If the two elements row′(s1), row′(s2) are in different orbits, the number of orbits is increased. If they are in the same orbit, we have row′(s2) = π ⋅ row′(s1) for some permutation π. Using row(s1) = row(s2) and row′(s1) ≠ row′(s2) we have:

row(s1) = π ⋅ row(s1)    and    row′(s1) ≠ π ⋅ row′(s1).

Consider all such π and suppose there is a π and x ∈ supp(row(s1)) such that π ⋅ x ∉ supp(row(s1)). Then we find that π ⋅ x ∈ supp(row′(s1)), and so the support of the row has grown. By Lemma 4 this happens finitely often. Suppose such π and x do not exist; then we consider the finite group R = {ρ|supp([s1]∼) | row(s1) = ρ ⋅ row(s1)}. We see that {ρ|supp([s1]∼) | row′(s1) = ρ ⋅ row′(s1)} is a proper subgroup of R. So, adding a column in this case decreases the size of the group R, which can happen only finitely often. In this case a local symmetry is removed.
In short, the algorithm will succeed in producing a hypothesis in each round. It remains to prove that it needs only finitely many equivalence queries.
Let (S, E) be the closed and consistent table and H its corresponding hypothesis. If it is incorrect, then a second hypothesis H′ will be constructed which is consistent with the old table (S, E). The two hypotheses are nonequivalent, as H′ will handle the counterexample correctly and H does not. Therefore, H′ will have at least one orbit more, one local symmetry less, or one orbit with strictly bigger dimension (Lemma 6), all of which can only happen finitely often. □
We remark that all the lemmas and proofs above are close to the original ones of Angluin. However, two things are crucially different. First, adding a column does not always increase the number of (orbits of) states: it can happen that by adding a column a bigger support is found or that a local symmetry is broken. Second, the new hypothesis does not necessarily have more states; again, it might have bigger dimensions or fewer local symmetries.
From the proof of Theorem 7 we observe moreover that the way we handle counterexamples is not crucial. Any other method which ensures a nonequivalent hypothesis will work. In particular our algorithm is easily adapted to include optimisations such as the ones by Maler and Pnueli (1995) and Rivest and Schapire (1993), where counterexamples are added as columns.17
17 The additional optimisation of omitting the consistency check (Rivest & Schapire, 1993) cannot be done: we always add a whole orbit to S (to keep the set equivariant) and inconsistencies can arise within an orbit.
[Figure 5.1: the target automaton, with states q0, q1,x and q2,x,y and transitions labelled by atoms x, y, z, shown together with the observation tables T1, T2 and T3. Only the first table survives extraction cleanly; the cells of T2 and T3 are formulas with case distinctions on the atoms a′ and b′.]

T1    ϵ
ϵ     0
a     0
ab    1
aa    0
aba   0
abb   0
abc   1

Figure 5.1 Example automaton to be learnt and three subsequent tables computed by νL∗. In the automaton, x, y, z denote distinct atoms.
3.2 Example

Consider the target automaton in Figure 5.1 and an observation table T1 at some stage during the algorithm. We remind the reader that the table is represented in a symbolic way: the sequences in the rows and columns stand for whole orbits and the cells denote functions from the product of the orbits to 2. Since the cells can consist of multiple orbits, where each orbit is allowed to have a different value, we use a formula to specify which orbits have a 1.
The table T1 has to be checked for closedness and consistency. We note that it is definitely closed. For consistency we check the rows row(ϵ) and row(a), which are equal. Observe, however, that row(ϵb)(ϵ) = 0 and row(ab)(ϵ) = 1, so we have an inconsistency. The algorithm adds the orbit orb(b) as a column and extends the table, obtaining T2. We note that, in this process, the number of orbits did grow, as the two rows are split. Furthermore, we see that both row(a) and row(ab) have empty support in T1, but not in T2, because row(a)(a′) depends on a′ being equal or different from a, and similarly for row(ab)(a′).
The table T2 is still not consistent, as we see that row(ab) = row(ba) but row(abb)(c) = 1 and row(bab)(c) = 0. Hence the algorithm adds the columns orb(bc), obtaining table T3. We note that in this case no new orbits are obtained and no support has grown. In fact, the only change here is that the local symmetry between row(ab) and row(ba) is removed. This last table, T3, is closed and consistent and will produce the correct hypothesis.
3.3 Query Complexity

In this section, we will analyse the number of queries made by the algorithm in the worst case. Let M be the minimal target automaton with n orbits and of dimension k. We will use log in base two.
Lemma 8. The number of equivalence queries En,k is 𝒪(nk log k).
Proof. By Lemma 6 each hypothesis will be either 1) bigger in the number of orbits, which is bounded by n, or 2) bigger in the dimension of an orbit, which is bounded by k, or 3) smaller in the local symmetries of an orbit. For the last part we want to know how long a subgroup series of the permutation group Sk can be. This is bounded by the length of a chain of divisors of k!, as the order of each subgroup divides the order of the group. The length of any chain of divisors of m is easily bounded by log m, and since log k! ≤ k log k, one can take a proper subgroup at most k log k times when starting with Sk.18
18 After publication we found a better bound by Cameron, et al. (1989): the length of the longest chain of subgroups of Sk is ⌈(3/2)k⌉ − b(k) − 1, where b(k) is the number of ones in the binary representation of k. This gives a linear bound in k, instead of the ‘linearithmic’ bound.
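The two bounds on subgroup chains in Sk can be compared numerically; the small sketch below (ours) computes the Cameron, et al. (1989) length ⌈(3/2)k⌉ − b(k) − 1 and the divisor-chain bound ⌊log2(k!)⌋ used in the proof.

```python
import math

def cameron_bound(k):
    """⌈3k/2⌉ − b(k) − 1: length of the longest subgroup chain of S_k."""
    return math.ceil(3 * k / 2) - bin(k).count("1") - 1

def divisor_chain_bound(k):
    """⌊log2(k!)⌋ bounds the length of any chain of divisors of k!."""
    return int(math.log2(math.factorial(k)))

assert cameron_bound(4) == 4  # e.g. S4 ⊃ D4 ⊃ C4 ⊃ C2 ⊃ 1
assert all(cameron_bound(k) <= divisor_chain_bound(k) for k in range(2, 12))
```

Already at k = 5 the linear bound (5) beats the divisor bound (6), and the gap widens quickly.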
Since the hypothesis will grow monotonically in the number of orbits, and for each orbit will grow monotonically w.r.t. the remaining two dimensions, the number of equivalence queries is bounded by n + n(k + k log k). □
Next we will give a bound for the size of the table.
Lemma 9. The table has at most n + mEn,k orbits in S, with sequences of length at most n + m, where m is the length of the longest counterexample given by the teacher. The table has at most n(k + k log k + 1) orbits in E, of length at most n(k + k log k + 1).
Proof. In the termination proof we noted that rows are added at most n times. In addition (all prefixes of) counterexamples are added as rows, which adds another mEn,k rows. Obviously counterexamples are of length at most m and are extended at most n times, making the length at most m + n in the worst case.
For columns we note that one of three dimensions approaches a bound, similarly to the proof of Lemma 8. So at most n(k + k log k + 1) columns are added. Since they are suffix-closed, the length is at most n(k + k log k + 1). □
Let p and l denote respectively the dimension and the number of orbits of A.
Lemma 10. The number of orbits in the lower part of the table, S⋅A, is bounded by (n + mEn,k) l f𝔸(p(n + m), p).
Proof. Any sequence in S is of length at most n + m, so it contains at most p(n + m) distinct atoms. When we consider S⋅A, the extension can either reuse atoms from those p(n + m), or none at all. Since the extra letter has at most p distinct atoms, the set 𝔸(p(n+m)) × 𝔸(p) gives a bound f𝔸(p(n + m), p) for the number of orbits of OS × OA, with OX an orbit of X. Multiplying by the number of such ordered pairs, namely (n + mEn,k)l, gives a bound for S⋅A. □
Let Cn,k,m = (n + mEn,k)(l f𝔸(p(n + m), p) + 1) n(k + k log k + 1) be the maximal number of cells in the table. We note that this number is polynomial in k, l, m and n, but it is not polynomial in p.
Corollary 11. The number of membership queries is bounded by Cn,k,m f𝔸(p(n + m), pn(k + k log k + 1)).
4 Learning Non-Deterministic Nominal Automata

In this section, we introduce a variant of νL∗, which we call νNL∗, where the learnt automaton is non-deterministic. It will be based on the NL∗ algorithm by Bollig, et al. (2009), an Angluin-style algorithm for learning NFAs. The algorithm is shown in Algorithm 5.2. We first illustrate NL∗, then we discuss its extension to nominal automata.
NL∗ crucially relies on the use of residual finite-state automata (RFSAs) (Denis, et al., 2002), which are NFAs admitting unique minimal canonical representatives. The states of this automaton correspond to Myhill-Nerode right-congruence classes, but it can be exponentially smaller than the corresponding minimal DFA: composed states, language-equivalent to sets of other states, can be dropped.
1   S, E ← {ϵ}
2   repeat
3       while (S, E) is not RFSA-closed or not RFSA-consistent do
4           if (S, E) is not RFSA-closed then
5               find s ∈ S, a ∈ A such that row(sa) ∈ PR(S, E) ∖ PR⊤(S, E)
6               S ← S ∪ {sa}
7           end if
8           if (S, E) is not RFSA-consistent then
9               find s1, s2 ∈ S, a ∈ A, and e ∈ E such that
10                  row(s1) ⊑ row(s2) and ℒ(s1ae) = 1, ℒ(s2ae) = 0
11              E ← E ∪ {ae}
12          end if
13      end while
14      Make the conjecture N(S, E)
15      if the Teacher replies no, with a counterexample t then
16          E ← E ∪ suff(t)
17      end if
18  until the Teacher replies yes to the conjecture N(S, E)
19  return N(S, E)

Algorithm 5.2 Algorithm for learning NFAs by Bollig, et al. (2009).
The algorithm NL∗ equips the observation table (S, E) with a union operation, allowing for the detection of composed and prime rows.

Definition 12. Let (row(s1) ⊔ row(s2))(e) = row(s1)(e) ∨ row(s2)(e) (regarding cells as booleans). This operation induces an ordering between rows: row(s1) ⊑ row(s2) whenever row(s1)(e) = 1 implies row(s2)(e) = 1, for all e ∈ E.
A row row(s) is composed if row(s) = row(s1) ⊔ ⋯ ⊔ row(sn) for rows row(si) ≠ row(s). Otherwise it is prime. We denote by PR⊤(S, E) the rows in the top part of the table (ranging over S) which are prime w.r.t. the whole table (not only w.r.t. the top part). We write PR(S, E) for all the prime rows of (S, E).
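On concrete boolean rows, the join and the composedness test of Definition 12 are a few lines; a minimal sketch (ours), with rows as 0/1 tuples:

```python
def join(r1, r2):
    """Pointwise disjunction of two boolean rows."""
    return tuple(a | b for a, b in zip(r1, r2))

def is_composed(r, rows):
    """r equals the join of the other rows contained in it."""
    acc = tuple(0 for _ in r)
    for s in rows:
        if s != r and all(x <= y for x, y in zip(s, r)):
            acc = join(acc, s)
    return acc == r

rows = {(1, 0), (0, 1), (1, 1)}
assert is_composed((1, 1), rows)      # (1,1) = (1,0) ⊔ (0,1)
assert not is_composed((1, 0), rows)  # prime: no smaller rows cover it
```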
As in L∗, states of hypothesis automata will be rows of (S, E) but, as the aim is to construct a minimal RFSA, only prime rows are picked. New notions of closedness and consistency are introduced, to reflect features of RFSAs.
Definition 13. A table (S, E) is:
– RFSA-closed if, for all t ∈ S⋅A, row(t) = ⨆{row(s) ∈ PR⊤(S, E) | row(s) ⊑ row(t)};
– RFSA-consistent if, for all s1, s2 ∈ S and a ∈ A, row(s1) ⊑ row(s2) implies row(s1a) ⊑ row(s2a).
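On a concrete row, RFSA-closedness amounts to joining all upper-part prime rows contained in row(t) and comparing the result with row(t); a sketch (ours):

```python
def rfsa_closed_at(row_t, upper_primes):
    """row(t) must be the join of the prime upper rows contained in it."""
    acc = tuple(0 for _ in row_t)
    for r in upper_primes:
        if all(x <= y for x, y in zip(r, row_t)):  # r ⊑ row(t)
            acc = tuple(a | b for a, b in zip(acc, r))
    return acc == row_t

assert rfsa_closed_at((1, 1), [(1, 0), (0, 1)])
assert not rfsa_closed_at((1, 1), [(1, 0)])  # the second 1 is not covered
```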
If (S, E) is not RFSA-closed, then there is a row in the bottom part of the table which is prime, but not contained in the top part. This row is then added to S (line 5). If (S, E) is not RFSA-consistent, then there is a suffix which does not preserve the containment of two existing rows, so those rows are actually incomparable. A new column is added to distinguish those rows (line 10). Notice that counterexamples supplied by the teacher are added to columns (line 16). Indeed, it is shown by Bollig, et al. (2009) that treating the counterexamples as in the original L∗, namely adding them to rows, does not lead to a terminating algorithm.
Definition 14. Given an RFSA-closed and RFSA-consistent table (S, E), the conjecture automaton is N(S, E) = (Q, Q0, F, δ), where:
– Q = PR⊤(S, E);
– Q0 = {r ∈ Q | r ⊑ row(ϵ)};
– F = {r ∈ Q | r(ϵ) = 1};
– the transition relation is given by δ(row(s), a) = {r ∈ Q | r ⊑ row(sa)}.
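For a finite alphabet, Definition 14 can be played out end to end. The sketch below (ours; the table contents and the example language, words ending in a, are purely illustrative) builds N(S, E) from a table and runs it as an NFA.

```python
def row(L, E, u):
    return tuple(int(L(u + e)) for e in E)

def leq(r, s):
    return all(x <= y for x, y in zip(r, s))

def join(rs, width):
    acc = tuple(0 for _ in range(width))
    for r in rs:
        acc = tuple(a | b for a, b in zip(acc, r))
    return acc

def conjecture(L, S, E, A):
    """Build (Q0, F, delta) of N(S, E) per Definition 14."""
    all_rows = {row(L, E, u) for u in S} | {row(L, E, s + a) for s in S for a in A}
    prime = {r for r in all_rows
             if r != join([s for s in all_rows if s != r and leq(s, r)], len(E))}
    Q = {row(L, E, s) for s in S} & prime  # PR_top(S, E)
    Q0 = {r for r in Q if leq(r, row(L, E, ""))}
    F = {r for r in Q if r[E.index("")] == 1}
    delta = {(row(L, E, s), a): {r for r in Q if leq(r, row(L, E, s + a))}
             for s in S for a in A}
    return Q0, F, delta

def accepts(n, word):
    Q0, F, delta = n
    current = set(Q0)
    for a in word:
        current = set().union(*(delta[(q, a)] for q in current)) if current else set()
    return bool(current & F)

L = lambda w: w.endswith("a")  # the language of words ending in a
N = conjecture(L, ["", "a"], ["", "a"], ["a", "b"])
assert accepts(N, "ba") and not accepts(N, "ab") and not accepts(N, "")
```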
As observed by Bollig, et al. (2009), N(S, E) is not necessarily an RFSA, but it is a canonical RFSA if it is consistent with (S, E). If the algorithm terminates, then N(S, E) must be consistent with (S, E), which ensures correctness. The termination argument is more involved than that of L∗, but it still relies on the minimal DFA.
Developing an algorithm to learn nominal NFAs is not an obvious extension of NL∗: non-deterministic nominal languages strictly contain nominal regular languages, so it is not clear what the developed algorithm should be able to learn. To deal with this, we introduce a nominal notion of RFSAs. They are a proper subclass of nominal NFAs, because they recognise nominal regular languages. Nonetheless, they are more succinct than nominal DFAs.
4.1 Nominal Residual Finite-State Automata

Let ℒ be a nominal language and u be a finite string. The derivative of ℒ w.r.t. u is

u−1ℒ = {v ∈ A∗ | uv ∈ ℒ}.
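Residuals are Brzozowski-style derivatives of the language; with a language represented as a membership predicate, the derivative is a one-liner (sketch, ours):

```python
def derivative(L, u):
    """u^{-1}L as a predicate: v is in u^{-1}L iff uv is in L."""
    return lambda v: L(u + v)

L = lambda w: len(set(w)) < len(w)  # L_eq: some letter repeats
d = derivative(L, "a")
assert d("ba")       # "aba" has a repeat
assert not d("bc")   # "abc" does not
```

Note that d is supported by the single atom a: it is invariant under every permutation fixing a, but not equivariant.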
A language ℒ′ ⊆ 𝔸∗ is a residual of ℒ if there is a u with ℒ′ = u−1ℒ. Note that a residual might not be equivariant, but it does have a finite support. We write R(ℒ) for the set of residuals of ℒ. Residuals form an orbit-finite nominal set: they are in bijection with the state space of the minimal nominal DFA for ℒ.
A nominal residual finite-state automaton for ℒ is a nominal NFA whose states are subsets of such a minimal automaton. Given a state q of an automaton, we write ℒ(q) for the set of words leading from q to a set of states containing a final one.
Definition 15. A nominal residual finite-state automaton (nominal RFSA) is a nominal NFA 𝒜 such that ℒ(q) ∈ R(ℒ(𝒜)), for all q ∈ Q𝒜.
Intuitively, all states of a nominal RFSA recognise residuals, but not all residuals are recognised by a single state: there may be a residual ℒ′ and a set of states Q′ such that ℒ′ = ⋃q∈Q′ ℒ(q), but no state q′ is such that ℒ(q′) = ℒ′. A residual ℒ′ is called composed if it is equal to the union of the components it strictly contains, explicitly

ℒ′ = ⋃{ℒ″ ∈ R(ℒ) | ℒ″ ⊊ ℒ′};

otherwise it is called prime. In an ordinary RFSA, composed residuals have finitely many components. This is not the case in a nominal RFSA. However, the set of components of ℒ′ always has a finite support, namely supp(ℒ′).
The set of prime residuals PR(ℒ) is an orbit-finite nominal set, and can be used to define a canonical nominal RFSA for ℒ, which has the minimal number of states and the maximal number of transitions. This can be regarded as obtained from the minimal nominal DFA by removing composed states and adding all initial states and transitions that do not change the recognised language. This automaton is necessarily unique.
Lemma 16. Let the canonical nominal RSFA of ℒ be (Q, Q0 , F, δ) such that:
|
||
– Q = PR(ℒ);
|
||
– Q0 = {ℒ′ ∈ Q | ℒ′ ⊆ ℒ};
|
||
– F = {ℒ′ ∈ Q | ϵ ∈ ℒ′ };
|
||
– δ(ℒ1 , a) = {ℒ2 ∈ Q | ℒ2 ⊆ a−1 ℒ1 }.
|
||
It is a well-defined nominal NFA accepting ℒ.
|
||
|
||
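The construction of Lemma 16 has a direct finite analogue, which may help to see its structure. The sketch below (plain Python, not the nominal construction: a finite language over a finite alphabet stands in for ℒ, and names such as canonical_rfsa are ours) computes residuals as left quotients, keeps the prime ones, and builds the automaton exactly as in the lemma.

```python
from itertools import product

def words_upto(alphabet, maxlen):
    # All words over the alphabet of length <= maxlen.
    return [""] + ["".join(p) for n in range(1, maxlen + 1)
                   for p in product(alphabet, repeat=n)]

def residuals(L, alphabet, maxlen):
    # Left quotients u^{-1}L, each represented as a frozenset of words.
    return {frozenset(w[len(u):] for w in L if w.startswith(u))
            for u in words_upto(alphabet, maxlen)}

def primes(R):
    # A residual is composed if it equals the union of the residuals
    # it strictly contains; otherwise it is prime.
    return {r for r in R
            if frozenset().union(*(r2 for r2 in R if r2 < r)) != r}

def canonical_rfsa(L, alphabet, maxlen):
    # As in Lemma 16: Q = prime residuals, Q0 = primes below L,
    # F = primes containing the empty word, and
    # delta(L1, a) = all primes below the quotient a^{-1}L1.
    L = frozenset(L)
    Q = primes(residuals(L, alphabet, maxlen))
    Q0 = {q for q in Q if q <= L}
    F = {q for q in Q if "" in q}
    def delta(r, a):
        quot = frozenset(w[1:] for w in r if w.startswith(a))
        return {q for q in Q if q <= quot}
    return Q, Q0, F, delta

def accepts(rfsa, word):
    # Standard NFA acceptance by subset simulation.
    Q, Q0, F, delta = rfsa
    current = set(Q0)
    for a in word:
        current = set().union(*(delta(q, a) for q in current))
    return any(q in F for q in current)
```

For instance, on L = {a, aa, ba} over {a, b} the five residuals collapse to three primes, and the resulting NFA accepts exactly L.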
4.2 νNL∗

Our nominal version of NL∗ again makes use of an observation table (S, E) where S and E are equivariant subsets of A∗ and row is an equivariant function. As in the basic algorithm, we equip (S, E) with a union operation ⊔ and a row containment relation ⊑, defined as in Definition 12. It is immediate to verify that ⊔ and ⊑ are equivariant.

Our algorithm is a simple modification of Algorithm 5.2, where a few lines are replaced:

6’   S ← S ∪ orb(sa)
11’  E ← E ∪ orb(ae)
16’  E ← E ∪ suff(orb(t))
Switching to nominal sets, several decidability issues arise. The most critical one is that rows may be the union of infinitely many component rows, as happens for residuals of nominal languages, so finding all such components can be challenging. We adapt the notion of composed to rows: row(t) is composed whenever

row(t) = ⨆{row(s) | row(s) ⊏ row(t)},

where ⊏ is strict row inclusion; otherwise row(t) is prime.
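In a concrete, finite observation table the composedness test is just a pointwise join. A minimal sketch (plain Python over finite rows represented as dicts from suffixes to bits; is_composed is our own helper name):

```python
def is_composed(row_t, rows):
    # A row is composed if it equals the pointwise join (union) of the
    # rows strictly below it in the containment order.
    def strictly_below(r1, r2):
        return all(r1[e] <= r2[e] for e in r1) and r1 != r2
    below = [r for r in rows if strictly_below(r, row_t)]
    join = {e: max([r[e] for r in below], default=0) for e in row_t}
    return join == row_t
```

For example, with columns ("", "a"), the row (1, 1) is composed out of (1, 0) and (0, 1), whereas (1, 0) is prime.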
We now check that three relevant parts of our algorithm terminate.

1. Row containment check. The basic containment check row(s) ⊑ row(t) is decidable, as row(s) and row(t) are supported by the finite supports of s and t, respectively.

2. RFSA-Closedness and RFSA-Consistency Checks. (Line 3) We first show that prime rows form orbit-finite nominal sets.

Lemma 17. PR(S, E), PR⊤(S, E) and PR(S, E) ∖ PR⊤(S, E) are orbit-finite nominal sets.

Consider now RFSA-closedness. It requires computing the set C(row(t)) of components of row(t) contained in PR⊤(S, E) (possibly including row(t)). This set may not be equivariant under the permutation group Perm(𝔸), but it is if we pick a suitable subgroup.

Lemma 18. The set C(row(t)) has the following properties:
– supp(C(row(t))) ⊆ supp(row(t));
– it is equivariant and orbit-finite under the action of the group

Gt = {π ∈ Perm(𝔸) | π|supp(row(t)) = id}

of permutations fixing supp(row(t)).

We established that C(row(t)) can be effectively computed, and the same holds for ⨆ C(row(t)). In fact, ⨆ is equivariant w.r.t. the whole of Perm(𝔸) and thus, in particular, w.r.t. Gt, so it preserves orbit-finiteness. Now, to check row(t) = ⨆ C(row(t)), we can just pick one representative of every orbit of S⋅A, because we have C(π ⋅ row(t)) = π ⋅ C(row(t)) and permutations distribute over ⊔, so permuting both sides of the equation again gives a valid equation.

For RFSA-consistency, consider the two sets

N = {(s1, s2) ∈ S × S | row(s1) ⊑ row(s2)}, and
M = {(s1, s2) ∈ S × S | ∀a ∈ A : row(s1a) ⊑ row(s2a)}.

They are both orbit-finite nominal sets, by equivariance of row, ⊑ and A. We can check RFSA-consistency in finite time by picking orbit representatives from N and M. For each representative n ∈ N, we look for a representative m ∈ M and a permutation π such that n = π ⋅ m. If no such m and π exist, then n does not belong to any orbit of M, so it violates RFSA-consistency.

3. Finding Witnesses for Violations. (Lines 5 and 10) We can find witnesses by comparing orbit representatives of orbit-finite sets, as we did for RFSA-consistency. Specifically, we can pick representatives in S × A and S × S × A × E and check them against the following orbit-finite nominal sets:
– {(s, a) ∈ S × A | row(sa) ∈ PR(S, E) ∖ PR⊤(S, E)};
– {(s1, s2, a, e) ∈ S × S × A × E | row(s1a)(e) = 1, row(s2a)(e) = 0, row(s1) ⊑ row(s2)}.
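For ordinary finite tables, the RFSA-closedness witness search can be phrased directly on rows. A small sketch (plain Python; closedness_defect is a hypothetical helper name, and rows are tuples of bits) returns a lower-part word whose row is prime but not an upper prime, i.e., a witness of the first kind above:

```python
def closedness_defect(table, S, SA):
    # table maps words to rows (tuples of bits). RFSA-closedness requires
    # every prime row occurring in the lower part S.A to also be the row
    # of some word in S whose row is prime ("upper prime").
    def leq(r1, r2):
        return all(x <= y for x, y in zip(r1, r2))
    def is_prime(r, universe):
        below = [u for u in universe if leq(u, r) and u != r]
        join = tuple(max(xs) for xs in zip(*below)) if below else (0,) * len(r)
        return join != r
    universe = {table[w] for w in S + SA}
    upper_primes = {table[s] for s in S if is_prime(table[s], universe)}
    for t in SA:
        if is_prime(table[t], universe) and table[t] not in upper_primes:
            return t  # witness: row(t) in PR(S, E) but not in PR_top(S, E)
    return None
```

After adding the witness to S, rerunning the check on the extended table finds no further defect.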
4.3 Correctness

Now we prove correctness and termination of the algorithm. First, we prove that hypothesis automata are nominal NFAs.

Lemma 19. The hypothesis automaton N(S, E) (see Definition 14) is a nominal NFA.

N(S, E), as in ordinary NL∗, is not always a nominal RFSA. However, we have the following.

Theorem 20. If the table (S, E) is RFSA-closed, RFSA-consistent and N(S, E) is consistent with (S, E), then N(S, E) is a canonical nominal RFSA.

This is proved by Bollig, et al. (2009) for ordinary RFSAs, using the standard theory of regular languages. The nominal proof is exactly the same, using derivatives of nominal regular languages and nominal RFSAs as defined in Section 4.1.

Lemma 21. The table (S, E) cannot have more than n orbits of distinct rows, where n is the number of orbits of the minimal nominal DFA for the target language.

Proof. Rows are residuals of ℒ, which are states of the minimal nominal DFA for ℒ, so there cannot be more than n orbits. □

Theorem 22. The algorithm νNL∗ terminates and returns the canonical nominal RFSA for ℒ.
Proof. If the algorithm terminates, then it must return the canonical nominal RFSA for ℒ by Theorem 20. We prove that a table can be made RFSA-closed and RFSA-consistent in finite time. This is similar to the proof of Theorem 7 and is inspired by the proof of Theorem 2 of Bollig, et al. (2009).

If the table is not RFSA-closed, we find a row s ∈ S⋅A such that row(s) ∈ PR(S, E) ∖ PR⊤(S, E). The algorithm then adds orb(s) to S. Since s was non-equivalent to all upper prime rows, and thus to all the rows indexed by S, we find that (S ∪ orb(s))/∼ has strictly more orbits than S/∼ (recall that s ∼ t ⟺ row(s) = row(t)). This addition can only be done finitely many times, because the number of orbits of S/∼ is bounded, by Lemma 21.

Now, the case of RFSA-consistency needs some additional notions. Let R be the (orbit-finite) nominal set of all rows, and let I = {(r, r′) ∈ R × R | r ⊏ r′} be the set of all inclusion relations among rows. The set I is orbit-finite. In fact, consider

J = {(s, t) ∈ (S ∪ S⋅A) × (S ∪ S⋅A) | row(s) ⊏ row(t)}.

This set is an equivariant, thus orbit-finite, subset of (S ∪ S⋅A) × (S ∪ S⋅A). The set I is the image of J via row × row, which is equivariant, so it preserves orbit-finiteness.

Now, suppose the algorithm finds two elements s1, s2 ∈ S with row(s1) ⊑ row(s2) but row(s1a)(e) = 1 and row(s2a)(e) = 0 for some a ∈ A and e ∈ E. Adding a column to fix RFSA-consistency may: C1) increase the orbits of (S ∪ S⋅A)/∼; or C2) decrease the orbits of I; or C3) decrease the local symmetries/increase the dimension of one orbit of rows. In fact, if no new rows are added (C1), we have two cases.

– If row(s1) ⊏ row(s2), i.e., (row(s1), row(s2)) ∈ I, then row′(s1) ̸⊏ row′(s2), where row′ is the new table. Therefore the orbit of (row′(s1), row′(s2)) is not in I. Moreover, row′(s) ⊏ row′(t) implies row(s) ⊏ row(t) (as no new rows are added), so no new pairs are added to I. Overall, I has fewer orbits (C2).
– If row(s1) = row(s2), then we must have row(s1) = π ⋅ row(s1), for some π, because lines 4–7 forbid equal rows in different orbits. In this case row′(s1) ≠ π ⋅ row′(s1), and we can use part of the proof of Theorem 7 to see that the orbit of row′(s1) has bigger dimension or fewer local symmetries than that of row(s1) (C3).

Orbits of (S ∪ S⋅A)/∼ and of I are finitely many, by Lemma 21 and what we proved above. Moreover, local symmetries can decrease finitely many times, and the dimension of each orbit of rows is bounded by the dimension of the minimal DFA state-space. Therefore all the above changes can happen finitely many times.

We have proved that the table eventually becomes RFSA-closed and RFSA-consistent. Now we prove that a finite number of equivalence queries is needed to reach the final hypothesis automaton. To do this, we cannot use a suitable version of Lemma 6, because that relies on N(S, E) being consistent with (S, E), which in general is not true (see (Bollig, et al., 2008) for an example of this). We can, however, use an argument similar to that for RFSA-consistency, because the algorithm adds columns in response to counterexamples. Let w be the counterexample provided by the teacher. When line 16′ is executed, the table must change. In fact, by Lemma 2 of Bollig, et al. (2009), if it does not, then w is already correctly classified by N(S, E), which is absurd. We have the following cases. E1) The orbits of (S ∪ S⋅A)/∼ increase (C1). Or, E2) either the orbits in PR(S, E) increase, or any of the following happens: orbits in I decrease (C2), or local symmetries/dimension of an orbit of rows change (C3). In fact, if E1 does not happen and PR(S, E), I and the local symmetries/dimension of orbits of rows do not change, then the automaton 𝒜 for the new table coincides with N(S, E). But N(S, E) = 𝒜 is a contradiction, because 𝒜 correctly classifies w (by Lemma 2 of Bollig, et al. (2009), as w now belongs to the columns), whereas N(S, E) does not. Both E1 and E2 can only happen finitely many times. □
4.4 Query Complexity

We now give bounds for the number of equivalence and membership queries needed by νNL∗. Let n be the number of orbits of the minimal DFA M for the target language and let k be the dimension (i.e., the size of the maximum support) of its nominal set of states.

Lemma 23. The number of equivalence queries E′n,k is O(n²f𝔸(k, k) + nk log k).

Proof. In the proof of Theorem 22, we saw that equivalence queries lead to more orbits in (S ∪ S⋅A)/∼ or in PR(S, E), fewer orbits in I, or fewer local symmetries/bigger dimension for an orbit. Clearly the first two can happen at most n times. We now estimate how many times I can decrease. Suppose (S ∪ S⋅A)/∼ has d orbits and ℎ orbits are added to it. Recall that, given an orbit O of rows of dimension at most m, f𝔸(m, m) is an upper bound for the number of orbits in the product O × O. Since the support of rows is bounded by k, we can give a bound for the number of orbits added to I: dℎf𝔸(k, k), for new pairs r ⊏ r′ with r in a new orbit of rows and r′ in an old one (or vice versa); plus (ℎ(ℎ − 1)/2)f𝔸(k, k), for r and r′ both in (distinct) new orbits; plus ℎf𝔸(k, k), for r and r′ in the same new orbit. Notice that, if PR(S, E) grows but (S ∪ S⋅A)/∼ does not, I does not increase. By Lemma 21, ℎ, d ≤ n, so I cannot decrease more than (n² + n(n − 1)/2 + n)f𝔸(k, k) times.

Local symmetries of an orbit of rows can decrease at most k log k times (see the proof of Lemma 8), and its dimension can increase at most k times. Therefore n(k log k + k) is a bound for all the orbits of rows, which are at most n, by Lemma 21. Summing up, we get the main result. □
Lemma 24. Let m be the length of the longest counterexample given by the teacher. Then the table has:
– at most n orbits in S, with words of length at most n;
– at most mE′n,k orbits in E, with words of length at most mE′n,k.

Proof. By Lemma 21, the number of orbits of rows indexed by S is at most n. Now, notice that line 5 does not add orb(sa) to S if sa ∈ S, and lines 16 and 11 cannot identify rows, so S has at most n orbits. The length of the longest word in S must be at most n, as S = {ϵ} when the algorithm starts, and line 6′ adds words with one more symbol than those in S.

For columns, we note that both fixing RFSA-consistency and adding counterexamples increase the number of columns, but this can happen at most E′n,k times (see the proof of Lemma 23). Each time, at most m suffixes are added to E. □

We compute the maximum number of cells as in Section 3.3.

Lemma 25. The number of orbits in the lower part of the table, S⋅A, is bounded by nlf𝔸(pn, p).

Then C′n,k,m = n(lf𝔸(pn, p) + 1)mE′n,k is the maximal number of cells in the table. This bound is polynomial in n, m and l, but not in k and p.

Corollary 26. The number of membership queries is bounded by C′n,k,m f𝔸(pn, pmE′n,k).
5 Implementation and Preliminary Experiments

Our algorithms for learning nominal automata operate on infinite sets of rows and columns, and hence it is not immediately clear how to actually implement them on a computer. We have used NLambda, a recently developed Haskell library by Klin and Szynwelski (2016) designed to allow direct manipulation of infinite (but orbit-finite) nominal sets, within the functional programming paradigm. The semantics of NLambda is based on the work of Bojańczyk, et al. (2012), and the library itself is inspired by Fresh O’Caml by Shinwell (2006), a language for functional programming over nominal data structures with binding.
5.1 NLambda

NLambda extends Haskell with a new type Atoms. Values of this type are atomic values that can be compared for equality and have no other discernible structure. They correspond to the elements of the infinite alphabet 𝔸 described in Section 2.

Furthermore, NLambda provides a unary type constructor Set. This appears similar to the Data.Set type constructor from the standard Haskell library, but its semantics is markedly different: Whereas the latter is used to construct finite sets, the former has orbit-finite sets as values. The new constructor Set can be applied to a range of equality types that include Atoms, but also the tuple type (Atoms, Atoms), the list type [Atoms], the set type Set Atoms, and other types that provide the basic infrastructure necessary to speak of supports and orbits. All these are instances of a type class NominalType specified in NLambda for this purpose.

NLambda, in addition to all the standard machinery of Haskell, offers primitives to manipulate values of any nominal types τ, σ:
– empty : Set τ, returns the empty set of any type;
– atoms : Set Atoms, returns the (infinite but single-orbit) set of all atoms;
– insert : τ → Set τ → Set τ, adds an element to a set;
– map : (τ → σ) → (Set τ → Set σ), applies a function to every element of a set;
– sum : Set (Set τ) → Set τ, computes the union of a family of sets;
– isEmpty : Set τ → Formula, checks whether a set is empty.

The type Formula has the role of a Boolean type. For technical reasons, it is distinct from the standard Haskell type Bool, but it provides standard logical operations, e.g.,

not : Formula → Formula,
or : Formula → Formula → Formula,

as well as a conditional operator ite : Formula → τ → τ → τ that mimics the standard if-then-else construction. It is also the result type of a built-in equality test on atoms:

eq : Atoms → Atoms → Formula.

Using these primitives, one builds more functions to operate on orbit-finite sets, such as a function to build singleton sets:

singleton : τ → Set τ
singleton x = insert x empty

or a filtering function to select elements that satisfy a given predicate:

filter : (τ → Formula) → Set τ → Set τ
filter p s = sum (map (λx. ite (p x) (singleton x) empty) s)

or functions to quantify a predicate over a set:

exists, forall : (τ → Formula) → Set τ → Formula
exists p s = not (isEmpty (filter p s))
forall p s = isEmpty (filter (λx. not (p x)) s)

and so on. Note that these functions are written in exactly the same way as they would be for finite sets and the standard Data.Set type. This is not an accident, and indeed the programmer can use the convenient set-theoretic intuition of NLambda primitives. For example, one can conveniently construct various orbit-finite sets, such as the set of all pairs of atoms:

atomPairs = sum (map (λx. map (λy. (x, y)) atoms) atoms),

the set of all pairs of distinct atoms:

distPairs = filter (λ(x, y). not (eq x y)) atomPairs

and so on.
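The combinator definitions above can be mirrored one-to-one over ordinary finite sets, which gives a useful mental model for them. A sketch in plain Python (filter_set avoids shadowing the builtin filter, and the three-element atoms set is a finite stand-in for the infinite 𝔸):

```python
def filter_set(p, s):
    # filter as sum . map: each element becomes a singleton or the
    # empty set, and the results are unioned, as in the NLambda definition.
    return set().union(*({x} if p(x) else set() for x in s))

def exists(p, s):
    # exists p s = not (isEmpty (filter p s))
    return bool(filter_set(p, s))

def forall(p, s):
    # forall p s = isEmpty (filter (not . p) s)
    return not filter_set(lambda x: not p(x), s)

atoms = {1, 2, 3}  # finite stand-in for the single-orbit set of all atoms
atom_pairs = set().union(*({(x, y) for y in atoms} for x in atoms))
dist_pairs = filter_set(lambda xy: xy[0] != xy[1], atom_pairs)
```

The point of NLambda is that the same code shape also works when the sets involved are infinite but orbit-finite.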
It should be stressed that all these constructions terminate in finite time, even though they formally involve infinite sets. To achieve this, values of orbit-finite set types Set τ are internally not represented as lists or trees of elements of type τ. Instead, they are stored and manipulated symbolically, using first-order formulas over variables that range over atom values. For example, the value of distPairs above is stored as the formal expression:

{(a, b) | a, b ∈ 𝔸, a ≠ b}

or, more specifically, as a triple:
– a pair (a, b) of “atom variables”,
– a list [a, b] of those atom variables that are bound in the expression (in this case, the expression contains no free variables),
– a formula a ≠ b over atom variables.

All the primitives listed above, such as isEmpty, map and sum, are implemented on this internal representation. In some cases, this involves checking the satisfiability of certain formulas over atoms. In the current implementation of NLambda, the external SMT solver Z3 (de Moura & Bjørner, 2008) is used for that purpose. For example, to evaluate the expression isEmpty distPairs, NLambda makes a system call to the SMT solver to check whether the formula a ≠ b is satisfiable in the first-order theory of equality and, after receiving the affirmative answer, returns the value False.

For more details about the semantics and implementation of NLambda, see Klin and Szynwelski (2016). The library itself can be downloaded from https://www.mimuw.edu.pl/~szynwelski/nlambda/.
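For the equality symmetry, the satisfiability questions that arise are conjunctions of equalities and inequalities between atom variables, and over an infinite atom domain these are decidable without a full SMT solver. A toy decision procedure illustrating this (plain Python; a deliberate simplification on our part, since NLambda delegates to Z3 and handles richer formulas):

```python
def satisfiable(num_vars, constraints):
    # constraints: list of ("eq" | "neq", i, j) over variables 0..num_vars-1.
    # Over an infinite atom domain, a conjunction is satisfiable iff the
    # equalities do not force any of the inequalities to fail: merge the
    # eq-classes with union-find, then check every neq pair.
    parent = list(range(num_vars))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for kind, i, j in constraints:
        if kind == "eq":
            parent[find(i)] = find(j)
    return all(find(i) != find(j)
               for kind, i, j in constraints if kind == "neq")

def is_empty(num_vars, formula):
    # isEmpty of a symbolically represented set: no satisfying valuation.
    return not satisfiable(num_vars, formula)
```

For distPairs, stored with the formula a ≠ b over two variables, the emptiness check amounts to is_empty(2, [("neq", 0, 1)]), which is False.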
5.2 Implementation of νL∗ and νNL∗

Using NLambda we implemented the algorithms from Sections 3 and 4. We note that the internal representation is slightly different from the one discussed in Section 3. Instead of representing the table (S, E) with actual representatives of orbits, the sets are represented logically, as described above. Furthermore, the control flow of the algorithm is adapted to fit the functional programming paradigm. In particular, recursion is used instead of a while loop. In addition to the nominal adaptation of Angluin’s algorithm, νL∗, we implemented a variant, νL∗col, which adds counterexamples to the columns instead of the rows.

Target automata are defined using NLambda as well, using the automaton data type provided by the library. Membership queries are already implemented by the library. Equivalence queries are implemented by constructing a bisimulation (recall that bisimulation implies language equivalence), where a counterexample is obtained when two DFAs are not bisimilar. For nominal NFAs, however, we cannot implement a complete equivalence query, as their language equivalence is undecidable. We approximated the equivalence by bounding the depth of the bisimulation for nominal NFAs. As an optimisation, we use bisimulation up to congruence as described by Bonchi and Pous (2015). Having an approximate teacher is a minor issue, since in many applications no complete teacher can be implemented and one relies on testing (Aarts, et al., 2015 and Bollig, et al., 2013). For the experiments listed here the bound was chosen large enough for the learner to terminate with the correct automaton.

The code can be found at https://github.com/Jaxan/nominal-lstar.
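A bounded equivalence query of the kind described above can be sketched as a breadth-first comparison of determinised states up to a given word length. The sketch below is plain Python over finite NFAs (the nominal version additionally works up to permutation of atoms, and bounded_equiv is our own name):

```python
from collections import deque

def bounded_equiv(nfa1, nfa2, alphabet, depth):
    # An nfa is (initials, finals, delta) with delta(q, a) -> set of states.
    # Returns a distinguishing word of length <= depth, or None.
    def step(states, a, delta):
        return frozenset(q2 for q in states for q2 in delta(q, a))
    (i1, f1, d1), (i2, f2, d2) = nfa1, nfa2
    queue = deque([(frozenset(i1), frozenset(i2), "")])
    seen = set()
    while queue:
        s1, s2, w = queue.popleft()
        if bool(s1 & frozenset(f1)) != bool(s2 & frozenset(f2)):
            return w  # counterexample handed back to the learner
        if len(w) >= depth or (s1, s2) in seen:
            continue
        seen.add((s1, s2))
        for a in alphabet:
            queue.append((step(s1, a, d1), step(s2, a, d2), w + a))
    return None
```

Returning None only means no difference was found within the bound, which is exactly the sense in which the teacher is approximate.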
5.3 Test Cases

To provide a benchmark for future improvements, we tested our algorithms on simple automata described below. We report results in Table 5.1. The experiments were performed on a machine with an Intel Core i5 (Skylake, 2.4 GHz) and 8 GB RAM.

Model    DFA       νL∗ (s)   νL∗col (s)   RFSA      νNL∗ (s)
FIFO0     2, 0       1.9       1.9         2, 0       2.4
FIFO1     3, 1      12.9       7.4         3, 1      17.3
FIFO2     5, 2      45.6      22.6         5, 2      70.3
FIFO3    10, 3     189       107          10, 3     476
FIFO4    25, 4     370       267          25, 4    1230
FIFO5    77, 5    1337       697          ∞          ∞
ℒ0        2, 0       1.3       1.4         2, 0       1.4
ℒ1        4, 1      29.6       4.7         4, 1       8.9
ℒ2        7, 2     229        23.1         7, 2      84.7
ℒ′0       3, 1       4.4       4.9         3, 1      11.3
ℒ′1       5, 1      15.4      15.4         4, 1      66.4
ℒ′2       9, 1      46.3      40.5         5, 1     210
ℒ′3      17, 1      89.0      66.8         6, 1     566
ℒeq      n/a        n/a       n/a          3, 1      16.3

Table 5.1 Results of experiments. The column DFA (resp. RFSA) shows the number of orbits (left sub-column) and dimension (right sub-column) of the learnt minimal DFA (resp. canonical RFSA). We use ∞ when the running time is too high.
Queue Data Structure. A queue is a data structure to store elements which can later be retrieved in a first-in, first-out order. It has two operations: push and pop. We define the alphabet ΣFIFO = {push(a), pop(a) | a ∈ 𝔸}. The language FIFOn contains all valid traces of push and pop using a bounded queue of size n. The minimal nominal DFA for FIFO2 is given in Figure 5.2. The state reached from q1,x via push(x) is omitted: Its outgoing transitions are those of q2,x,y, where y is replaced by x. Similar benchmarks appear in (Aarts, et al., 2015 and Isberner, et al., 2014).
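Membership in FIFOn is easy to decide directly by replaying the trace against a bounded queue. A sketch (plain Python; is_fifo_trace is our name, and we assume, as we read Figure 5.2, that popping a wrong or absent value, or pushing into a full queue, invalidates the trace by moving the automaton to the sink ⊥):

```python
from collections import deque

def is_fifo_trace(trace, capacity):
    # trace: sequence of ("push", a) / ("pop", a) pairs.
    queue = deque()
    for op, a in trace:
        if op == "push":
            if len(queue) == capacity:
                return False  # overflow: move to the sink
            queue.append(a)
        else:
            if not queue or queue[0] != a:
                return False  # pop of a wrong or absent value
            queue.popleft()
    return True
```

Such a direct oracle is handy for sanity-checking a learnt automaton on sampled traces.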
Double Word. ℒn = {ww | w ∈ 𝔸n} from Section 1.

Figure 5.2 A nominal automaton accepting FIFO2. [Diagram: states q0, q1,x, q2,x,y and a sink state ⊥; transitions labelled push(x), push(y), pop(x), pop(x) to q1,y, pop(≠ x), pop(𝔸) and push(𝔸).]
NFA. Consider the language ℒeq = ⋃a∈𝔸 𝔸∗a𝔸∗a𝔸∗ of words where some letter appears twice. This language is accepted by an NFA which guesses the position of the first occurrence of a repeated letter a and then waits for the second a to appear. The language is not accepted by a DFA (Bojańczyk, et al., 2014). Despite this, νNL∗ is able to learn the automaton shown in Figure 5.3.
Figure 5.3 A nominal NFA accepting ℒeq. Here, the transition from q′2 to q′1,x is defined as δ(q′2, a) = {q′1,b | b ∈ 𝔸}. [Diagram: states q′0, q′1,x and q′2, with 𝔸-labelled self-loops and transitions labelled x, “𝔸 to any q′2,x” and “y to q′2,y”.]
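The guessing strategy for ℒeq can be replayed concretely with a subset simulation. The sketch below is plain Python over finite words; it implements the textbook guessing NFA rather than the equivariant automaton of Figure 5.3, and the function names are ours:

```python
def in_leq(word):
    # Direct definition: some letter occurs at least twice.
    return any(word.count(a) >= 2 for a in set(word))

def nfa_accepts_leq(word):
    # States: ("q0",) before guessing, ("q1", a) after guessing that this
    # occurrence of a will repeat, ("q2",) once the repeat has been seen.
    states = {("q0",)}
    for a in word:
        nxt = set()
        for st in states:
            if st[0] == "q0":
                nxt.add(("q0",))    # read on without guessing
                nxt.add(("q1", a))  # guess: this letter will occur again
            elif st[0] == "q1":
                nxt.add(("q2",) if st[1] == a else st)
            else:
                nxt.add(("q2",))    # accepting sink
        states = nxt
    return ("q2",) in states
```

The two functions agree on every word, which is exactly the correctness statement for the guessing automaton.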
𝐧-last Position. A prototypical example of regular languages which are accepted by very small NFAs is the set of words where a distinguished symbol a appears in the n-last position (Bollig, et al., 2009). We define a similar nominal language ℒ′n = ⋃a∈𝔸 a𝔸∗a𝔸n. To accept such words non-deterministically, one simply guesses the n-last position. This language is also accepted by a much larger deterministic automaton.
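Membership in ℒ′n reduces to a single comparison, which also shows why a nondeterministic acceptor is small: only the first letter and the letter n + 1 positions from the end matter (plain Python; in_lprime is our name):

```python
def in_lprime(word, n):
    # w is in L'_n iff w = a u a v with |v| = n and both marked letters
    # equal, i.e., the first letter reappears n + 1 positions from the end.
    return len(word) >= n + 2 and word[0] == word[-(n + 1)]
```

A DFA, by contrast, must track far more of the word while reading it, which is the source of the deterministic blow-up mentioned above.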
6 Related Work

This section compares νL∗ with other algorithms from the literature. We stress that no comparison is possible for νNL∗, as it is the first learning algorithm for non-deterministic automata over infinite alphabets.

The first one to consider learning automata over infinite alphabets was Sakamoto (1997). In his work the problem is reduced to L∗ with some finite sub-alphabet. The sub-alphabet grows in stages and L∗ is rerun at every stage, until the alphabet is big enough to capture the whole language. In Sakamoto’s approach, any learning algorithm can be used as a back-end. This, however, comes at a cost: It has to be rerun at every stage, and each symbol is treated in isolation, which might require more queries. Our algorithm νL∗, instead, works with the whole alphabet from the very start, and it exploits its symmetry. An example is in Sections 1.1 and 1.2: The ordinary learner uses four equivalence queries, whereas the nominal one, using the symmetry, only needs three. Moreover, our algorithm is easier to generalise to other alphabets and computational models, such as non-determinism.

More recently, papers have appeared on learning register automata by Cassel, et al. (2016) and Howar, et al. (2012). Their register automata are as expressive as our deterministic nominal automata. The state space is similar to our orbit-wise representation: It is formed by finitely many locations with registers. Transitions are defined symbolically using propositional logic. We remark that the most recent paper by Cassel, et al. (2016) generalises the algorithm to alphabets with different structures (which correspond to different atom symmetries in our work), but at the cost of changing Angluin’s framework. Instead of membership queries, the algorithm requires more sophisticated tree queries. In our approach, using a different symmetry affects neither the algorithm nor its correctness proof. Tree queries can be reduced to membership queries by enumerating all n-types for some n (n-types in logic correspond to orbits in the set of n-tuples). Keeping that in mind, their complexity results are roughly the same as ours, although this is hard to verify, as they do not give bounds on the length of individual tree queries. Finally, our approach lends itself better to being extended to other variations on L∗ (of which many exist), as it is closer to Angluin’s original work.

Another class of learning algorithms for systems with large alphabets is based on abstraction and refinement, which is orthogonal to the approach in this thesis, but connections and possible transference of techniques are worth exploring in the future. Aarts, et al. (2015) reduce the alphabet to a finite alphabet of abstractions, and L∗ for ordinary DFAs over such a finite alphabet is used. Abstractions are refined by counterexamples. Other similar approaches are by Howar, et al. (2011) and Isberner, et al. (2013), where global and local per-state abstractions of the alphabet are used, and by Mens (2017), where the alphabet can also have additional structure (e.g., an ordering relation). We also mention that Botincan and Babic (2013) give a framework for learning symbolic models of software behaviour.

Berg, et al. (2006 and 2008) cope with an infinite alphabet by running L∗ (adapted to Mealy machines) using a finite approximation of the alphabet, which may be augmented when equivalence queries are answered. A smaller symbolic model is derived subsequently. Their approach, unlike ours, does not exploit the symmetry over the full alphabet. The symmetry allows our algorithm to reduce queries and to produce the smallest possible automaton at every step.

Finally, we compare with results on session automata (Bollig, et al., 2013). Session automata are defined over finite alphabets, just like in the work by Sakamoto. However, session automata are more restrictive than deterministic nominal automata. For example, the model cannot capture an acceptor for the language of words where consecutive data values are distinct. This language can be accepted by a three-orbit nominal DFA, which can be learned by our algorithm.

We implemented our algorithms in the nominal library NLambda as sketched before. Other implementation options include Fresh OCaml (Shinwell, 2006), a functional programming language designed for programming over nominal data structures with binding, and Lois by Kopczyński and Toruńczyk (2016 and 2017), a C++ library for imperative nominal programming. We chose NLambda for its convenient set-theoretic primitives, but the other options remain to be explored; in particular, the low-level Lois could be expected to provide more efficient implementations.
7 Discussion and Future Work

In this chapter we defined and implemented extensions of several versions of L∗ and of NL∗ for nominal automata. We highlight two features of our approach:
– It has strong theoretical foundations: The theory of nominal languages, covering different alphabets and symmetries (see Section 2.1), and category theory, where nominal automata have been characterised as coalgebras (Ciancia & Montanari, 2010 and Kozen, et al., 2015) and many properties and algorithms (e.g., minimisation) have been studied at this abstract level.
– It follows a generic pattern for transporting computation models and algorithms from finite sets to nominal sets, which leads to simple correctness proofs.

These features pave the way to several extensions and improvements.

Future work includes a general version of νNL∗, parametric in the notion of side-effect (an example is non-determinism). Different notions will yield models with different degrees of succinctness w.r.t. deterministic automata. The key observation here is that many forms of non-determinism and other side effects can be captured via the categorical notion of monad, i.e., an algebraic structure on the state-space. Monads allow generalising the notions of composed and prime state: A state is composed whenever it is obtained from other states via an algebraic operation. Our algorithm νNL∗ is based on the powerset monad, representing classical non-determinism. We are currently investigating a substitution monad, where the operation is “applying a (possibly non-injective) substitution of atoms in the support”. A minimal automaton over this monad, akin to an RFSA, will have states that can generate all the states of the associated minimal DFA via a substitution, but cannot be generated by other states (they are prime). For instance, we can give an automaton over the substitution monad that recognises ℒ2 from Section 1:
[Diagram: an automaton over the substitution monad with states q0, qx, qxy, qy, q1 and q2; its transitions are labelled x, y, ≠x, ≠y and A, and one transition carries the substitution label x, [y ↦ x].]
Here [y ↦ x] means that, if that transition is taken, qxy (hence its language) is subject to y ↦ x. In general, the size of the minimal DFA for ℒn grows more than exponentially with n, but an automaton with substitutions on transitions, like the one above, only needs 𝒪(n) states. This direction is investigated in Chapter 7.
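For comparison, the languages ℒn themselves are trivial to decide once the whole word is available; the blow-up discussed above concerns only what a deterministic acceptor must remember after reading the first half (plain Python; in_Ln is our name):

```python
def in_Ln(word, n):
    # L_n = {ww | w of length n}: a length-n block repeated twice.
    # A DFA must remember the equality pattern of the first half,
    # which is what makes its minimal state-space grow so quickly in n.
    return len(word) == 2 * n and word[:n] == word[n:]
```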
In principle, thanks to the generic approach we have taken, all our algorithms should work for various kinds of atoms with more structure than just equality, as advocated by Bojańczyk, et al. (2014). Details, such as the precise assumptions on the underlying structure of atoms necessary for the proofs to go through, remain to be checked. In the next chapter (Chapter 6), we investigate learning with the total order symmetry. We implement this in NLambda, as well as in a new tool for computing with nominal sets over the total order symmetry.

The efficiency of our current implementation, as measured in Section 5.3, leaves much to be desired. There is plenty of potential for running-time optimisation, ranging from improvements in the learning algorithms themselves, to optimisations in the NLambda library (such as replacing the external and general-purpose SMT solver with a purpose-built, internal one, or a tighter integration of nominal mechanisms with the underlying Haskell language as was done by Shinwell, 2006), to giving up the functional programming paradigm for an imperative language such as LOIS (Kopczyński & Toruńczyk, 2016 and 2017).
Acknowledgements

We thank Frits Vaandrager and Gerco van Heerdt for useful comments and discussions. We also thank the anonymous reviewers.
Chapter 6
Fast Computations on Ordered Nominal Sets

David Venhoek
Radboud University

Joshua Moerman
Radboud University

Jurriaan Rot
Radboud University

Abstract

We show how to compute efficiently with nominal sets over the total order symmetry by developing a direct representation of such nominal sets and basic constructions thereon. In contrast to previous approaches, we work directly at the level of orbits, which allows for an accurate complexity analysis. The approach is implemented as the library Ons (Ordered Nominal Sets).

Our main motivation is nominal automata, which are models for recognising languages over infinite alphabets. We evaluate Ons in two applications: minimisation of automata and active automata learning. In both cases, Ons is competitive compared to existing implementations and outperforms them for certain classes of inputs.

This chapter is based on the following publication:

Venhoek, D., Moerman, J., & Rot, J. (2018). Fast Computations on Ordered Nominal Sets. In Theoretical Aspects of Computing - ICTAC - 15th International Colloquium, Proceedings. Springer. doi:10.1007/978-3-030-02508-3_26
Automata over infinite alphabets are natural models for programs with unbounded data domains. Such automata, often formalised as register automata, are applied in modelling and analysis of communication protocols, hardware, and software systems (see Bojańczyk, et al., 2014; D’Antoni & Veanes, 2017; Grigore & Tzevelekos, 2016; Kaminski & Francez, 1994; Montanari & Pistore, 1997; Segoufin, 2006 and references therein). Typical infinite alphabets include sequence numbers, timestamps, and identifiers. This means one can model data flow in such automata besides the basic control flow provided by ordinary automata. Recently, it has been shown in a series of papers that such models are amenable to learning (Aarts, et al., 2015; Bollig, et al., 2013; Cassel, et al., 2016; Drews & D’Antoni, 2017; Moerman, et al., 2017; Vaandrager, 2017), with the verification of (closed source) TCP implementations by Fiterău-Broștean, et al. (2016) as a prominent example.

A foundational approach to infinite alphabets is provided by the notion of nominal set, originally introduced in computer science as an elegant formalism for name binding (Gabbay & Pitts, 2002 and Pitts, 2016). Nominal sets have been used in a variety of applications in semantics, computation, and concurrency theory (see Pitts, 2013 for an overview). Bojańczyk, et al. (2014) introduce nominal automata, which allow one to model languages over infinite alphabets with different symmetries. Their results are parametric in the structure of the data values. Important examples of data domains are ordered data values (e.g., timestamps) and data values that can only be compared for equality (e.g., identifiers). In both data domains, nominal automata and register automata are equally expressive.

Implementations are important for applications of nominal sets and automata. A couple of tools exist to compute with nominal sets. Notably, Nλ (Klin & Szynwelski, 2016) and Lois (Kopczyński & Toruńczyk, 2016 and 2017) provide a general-purpose programming language to manipulate infinite sets.19 Both tools are based on SMT solvers and use logical formulas to represent the infinite sets. These implementations are very flexible, and the SMT solver does most of the heavy lifting, which makes the implementations themselves relatively straightforward. Unfortunately, this comes at a cost, as SMT solving is in general Pspace-hard. Since the formulas used to describe sets tend to grow as more calculations are done, running times can become unpredictable.

In this chapter, we use a direct representation based on symmetries and orbits to represent nominal sets. We focus on the total order symmetry, where data values are rational numbers and can be compared for their order. Nominal automata over the total order symmetry are more expressive than automata over the equality symmetry (i.e., traditional register automata of Kaminski & Francez, 1994). A key insight is that the representation of nominal sets from Bojańczyk, et al. (2014) becomes rather simple in the total order symmetry; each orbit is represented solely by a natural number, intuitively representing the number of variables or registers.
19 Other implementations of nominal techniques that are less directly related to our setting (Mihda, Fresh OCaml, and Nominal Isabelle) are discussed in Section 5.
Our main contributions include the following.
– We develop the representation theory of nominal sets over the total order symmetry. We give concrete representations of nominal sets, their products, and equivariant maps.
– We provide time complexity bounds for operations on nominal sets, such as intersections and membership. Using those results, we give the time complexity of Moore’s minimisation algorithm (generalised to nominal automata) and prove that it is polynomial in the number of orbits.
– Using the representation theory, we implement nominal sets in a C++ library Ons. The library includes all the results from the representation theory (sets, products, and maps).
– We evaluate the performance of Ons and compare it to Nλ and Lois, using two algorithms on nominal automata: minimisation (Bojańczyk & Lasota, 2012) and automata learning (Moerman, et al., 2017). We use randomly generated automata as well as concrete, logically structured models such as FIFO queues. For random automata, our methods are drastically faster than the other tools. On the other hand, Lois and Nλ are faster in minimising the structured automata, as they exploit their logical structure. In automata learning, the logical structure is not available a priori, and Ons is faster in most cases.

The structure of this chapter is as follows. Section 1 contains background on nominal sets and their representation. Section 2 describes the concrete representation of nominal sets, equivariant maps, and products in the total order symmetry. Section 3 describes the implementation Ons with complexity results, and Section 4 the evaluation of Ons on algorithms for nominal automata. Related work is discussed in Section 5, and future work in Section 6.
1 Nominal sets

Nominal sets are infinite sets that carry certain symmetries, allowing a finite representation in many interesting cases. We recall their formalisation in terms of group actions, following Bojańczyk, et al. (2014) and Pitts (2013), to which we refer for an extensive introduction.
1.1 Group actions

Let G be a group and X be a set. A (left) G-action is a function ⋅ : G × X → X satisfying 1 ⋅ x = x and (ℎg) ⋅ x = ℎ ⋅ (g ⋅ x) for all x ∈ X and g, ℎ ∈ G. A set X with a G-action is called a G-set and we often write gx instead of g ⋅ x. The orbit of an element x ∈ X is the set {gx | g ∈ G}. A G-set is always a disjoint union of its orbits (in other words, the orbits partition the set). We say that X is orbit-finite if it has finitely many orbits, and we denote the number of orbits by N(X).
A map f : X → Y between G-sets is called equivariant if it preserves the group action, i.e., for all x ∈ X and g ∈ G we have g ⋅ f(x) = f(g ⋅ x). If an equivariant map f is bijective, then f is an isomorphism and we write X ≅ Y. A subset Y ⊆ X is equivariant if the corresponding inclusion map is equivariant. The product of two G-sets X and Y is given by the Cartesian product X × Y with the point-wise group action on it, i.e., g(x, y) = (gx, gy). Union and intersection of X and Y are well-defined if the two actions agree on their common elements.
1.2 Nominal sets

A data symmetry is a pair (𝒟, G) where 𝒟 is a set and G is a subgroup of Sym(𝒟), the group of bijections on 𝒟. Note that the group G naturally acts on 𝒟 by defining gx = g(x). In the most studied instance, called the equality symmetry, 𝒟 is a countably infinite set and G = Sym(𝒟). In this chapter, we focus on the total order symmetry, given by 𝒟 = ℚ and G = {π | π ∈ Sym(ℚ), π is monotone}.

Let (𝒟, G) be a data symmetry and X be a G-set. A set of data values S ⊆ 𝒟 is called a support of an element x ∈ X if for all g ∈ G with ∀s ∈ S : gs = s we have gx = x. A G-set X is called nominal if every element x ∈ X has a finite support.
Example 1. We list several examples for the total order symmetry. The set ℚ2 is nominal, as each element (q1, q2) ∈ ℚ2 has the finite set {q1, q2} as its support. The set has the following three orbits:

{(q1, q2) | q1 < q2}    {(q1, q2) | q1 = q2}    {(q1, q2) | q1 > q2}.

For a set X, the set of all subsets of size n ∈ ℕ is denoted by 𝒫n(X) = {Y ⊆ X | #Y = n}. The set 𝒫n(ℚ) is a single-orbit nominal set for each n, with the action defined by direct image: gY = {gy | y ∈ Y}. The group of monotone bijections also acts by direct image on the full power set 𝒫(ℚ), but this is not a nominal set. For instance, the set ℤ ∈ 𝒫(ℚ) of integers has no finite support.
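In the total order symmetry, the orbit of a tuple in ℚⁿ is determined by the pattern of < and = comparisons among its entries. The following sketch (our own illustration; the names order_pattern and same_orbit are not part of any library discussed here) canonicalises a tuple so that two tuples lie in the same orbit exactly when their canonical forms agree:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Canonicalise a tuple in Q^n: replace each entry by its rank among the
// distinct values occurring in the tuple. Monotone bijections preserve
// exactly this pattern, so two tuples are in the same orbit under the
// total order symmetry iff their patterns agree.
std::vector<int> order_pattern(const std::vector<double>& q) {
    std::vector<double> distinct(q);
    std::sort(distinct.begin(), distinct.end());
    distinct.erase(std::unique(distinct.begin(), distinct.end()), distinct.end());
    std::vector<int> pattern;
    for (double x : q)
        pattern.push_back(static_cast<int>(
            std::lower_bound(distinct.begin(), distinct.end(), x) - distinct.begin()));
    return pattern;
}

// Two tuples are in the same orbit iff their patterns coincide.
bool same_orbit(const std::vector<double>& a, const std::vector<double>& b) {
    return order_pattern(a) == order_pattern(b);
}
```

The three orbits of ℚ2 listed above correspond to the patterns (0, 1), (0, 0), and (1, 0); the number of distinct entries is the size of the least support of the tuple.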
If S ⊆ 𝒟 is a support of an element x ∈ X, then any set S′ ⊆ 𝒟 such that S ⊆ S′ is also a support of x. A set S ⊂ 𝒟 is a least finite support of x ∈ X if it is a finite support of x and S ⊆ S′ for any finite support S′ of x. The existence of least finite supports is crucial for representing orbits. Unfortunately, even when elements have a finite support, they do not always have a least finite support. A data symmetry (𝒟, G) is said to admit least supports if every element of every nominal set has a least finite support. Both the equality and the total order symmetry admit least supports. (Bojańczyk, et al., 2014 give additional (counter)examples of data symmetries admitting least supports.) Having least finite supports is useful for a finite representation. Henceforth, we will write least support to mean least finite support.
Given a nominal set X, the size of the least support of an element x ∈ X is denoted by dim(x), the dimension of x. We note that all elements in the orbit of x have the same dimension. For an orbit-finite nominal set X, we define dim(X) = max{dim(x) | x ∈ X}. For a single-orbit set O, observe that dim(O) = dim(x), where x is any element of O.
1.3 Representing nominal orbits

We represent nominal sets as collections of single orbits. The finite representation of single orbits is based on the theory of Bojańczyk, et al. (2014), which uses the technical notions of restriction and extension. We only briefly report their definitions here. However, the reader can safely move to the concrete representation theory in Section 2 with only a superficial understanding of Theorem 2 below.

The restriction of an element π ∈ G to a subset C ⊆ 𝒟, written as π|C, is the restriction of the function π : 𝒟 → 𝒟 to the domain C. The restriction of a group G to a subset C ⊆ 𝒟 is defined as G|C = {π|C | π ∈ G, πC = C}. The extension of a subgroup S ≤ G|C is defined as extG(S) = {π ∈ G | π|C ∈ S}. For C ⊆ 𝒟 and S ≤ G|C, define [C, S]ec = {{gs | s ∈ extG(S)} | g ∈ G}, i.e., the set of right cosets of extG(S) in G. Then [C, S]ec is a single-orbit nominal set.

Using the above, we can formulate the representation theory from Bojańczyk, et al. (2014). This gives a finite description for all single-orbit nominal sets X, namely a finite set C together with some of its symmetries.

Theorem 2. Let X be a single-orbit nominal set for a data symmetry (𝒟, G) that admits least supports and let C ⊆ 𝒟 be the least support of some element x ∈ X. Then there exists a subgroup S ≤ G|C such that X ≅ [C, S]ec.

The proof by Bojańczyk, et al. (2014) uses a bit of category theory: it establishes an equivalence of categories between single-orbit sets and the pairs (C, S). We will not use the language of category theory much, in order to keep the chapter self-contained.
2 Representation in the total order symmetry

This section develops a concrete representation of nominal sets over the total order symmetry, as well as their equivariant maps and products. It is based on the abstract representation theory from Section 1.3. From now on, by nominal set we always refer to a nominal set over the total order symmetry. Hence, our data domain is ℚ and we take G to be the group of monotone bijections.
2.1 Orbits and nominal sets

From the representation in Section 1.3, we find that any single-orbit set X can be represented as a tuple (C, S). Our first observation is that the finite group S of ‘local symmetries’ in this representation is always trivial, i.e., S = I, where I = {1} is the trivial group. This follows from the following lemma and S ≤ G|C.

Lemma 3. For every finite subset C ⊂ ℚ, we have G|C = I.

Immediately, we see that (C, S) = (C, I), and hence that the orbit is fully represented by the set C. A further consequence of Lemma 3 is that each element of an orbit can be uniquely identified by its least support. This leads us to the following characterisation of [C, I]ec.

Lemma 4. Given a finite subset C ⊂ ℚ, we have [C, I]ec ≅ 𝒫#C(ℚ).
By Theorem 2 and the above lemmas, we can represent an orbit by a single integer n, the size of the least support of its elements. This naturally extends to (orbit-finite) nominal sets with multiple orbits by using a multiset of natural numbers, representing the size of the least support of each of the orbits. These multisets are formalised here as functions f : ℕ → ℕ.

Definition 5. Given a function f : ℕ → ℕ, we define a nominal set [f]o by

[f]o = ⋃_{n∈ℕ, 1≤i≤f(n)} {i} × 𝒫n(ℚ).

Proposition 6. For every orbit-finite nominal set X, there is a function f : ℕ → ℕ such that X ≅ [f]o and the set {n | f(n) ≠ 0} is finite. Furthermore, the mapping between X and f is one-to-one (up to isomorphism of nominal sets) when restricting to those f : ℕ → ℕ for which the set {n | f(n) ≠ 0} is finite.

The presentation in terms of a function f : ℕ → ℕ enforces that there are only finitely many orbits of any given dimension. The first part of the above proposition generalises to arbitrary nominal sets by replacing the codomain of f by the class of all sets and adapting Definition 5 accordingly. However, the resulting correspondence will no longer be one-to-one.

As a brief example, let us consider the set ℚ × ℚ. The elements (a, b) split into three orbits: one for a < b, one for a = b, and one for a > b. These have dimensions 2, 1, and 2 respectively, so the set ℚ × ℚ is represented by the multiset {1, 2, 2}.
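Proposition 6 says that, up to isomorphism, an orbit-finite nominal set is nothing more than a multiset of dimensions. A minimal sketch of this representation (our own illustration, not the Ons API; the names NomRepr, num_orbits, and dimension are ours):

```cpp
#include <cassert>
#include <map>

// Multiset of natural numbers representing an orbit-finite nominal set:
// repr[n] is the number of orbits of dimension n (Proposition 6).
using NomRepr = std::map<int, int>;

// N(X): the total number of orbits.
int num_orbits(const NomRepr& repr) {
    int total = 0;
    for (const auto& entry : repr) total += entry.second;
    return total;
}

// dim(X): the maximal dimension over all orbits.
int dimension(const NomRepr& repr) {
    int d = 0;
    for (const auto& entry : repr)
        if (entry.second > 0 && entry.first > d) d = entry.first;
    return d;
}
```

For ℚ × ℚ, represented by the multiset {1, 2, 2} (one orbit of dimension 1, two of dimension 2), this gives N(ℚ × ℚ) = 3 and dim(ℚ × ℚ) = 2.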
2.2 Equivariant maps

We show how to represent equivariant maps, using two basic properties. Let f : X → Y be an equivariant map. The first property is that the direct image of an orbit (in X) is again an orbit (in Y); that is to say, f is defined ‘orbit-wise’. Second, equivariant maps cannot introduce new elements in the support (but they can drop them). More precisely:

Lemma 7. Let f : X → Y be an equivariant map, and O ⊆ X a single orbit. The direct image f(O) = {f(x) | x ∈ O} is a single-orbit nominal set.

Lemma 8. Let f : X → Y be an equivariant map between two nominal sets X and Y. Let x ∈ X and let C be a support of x. Then C supports f(x).

Hence, equivariant maps are fully determined by associating two pieces of information with each orbit in the domain: the orbit onto which it is mapped, and a string denoting which elements of the least support of the input are preserved. These ingredients are formalised in the first part of the following definition. The second part describes how these ingredients define an equivariant function. Proposition 10 then states that every equivariant function can be described in this way.

Definition 9. Let H = {(I1, F1, O1), …, (In, Fn, On)} be a finite set of tuples where the Ii’s are disjoint single-orbit nominal sets, the Oi’s are single-orbit nominal sets with dim(Oi) ≤ dim(Ii), and the Fi’s are bit strings of length dim(Ii) with exactly dim(Oi) ones.

Given a set H as above, we define fH : ⋃ Ii → ⋃ Oi as the unique equivariant function such that, given x ∈ Ii with least support C, fH(x) is the unique element of Oi with support {C(j) | Fi(j) = 1}, where Fi(j) is the j-th bit of Fi and C(j) is the j-th smallest element of C.

Proposition 10. For every equivariant map f : X → Y between orbit-finite nominal sets X and Y there is a set H as in Definition 9 such that f = fH.

Consider the example function min : 𝒫3(ℚ) → ℚ, which returns the smallest element of a 3-element set. Note that both 𝒫3(ℚ) and ℚ are single orbits. Since for the orbit 𝒫3(ℚ) we only keep the smallest element of the support, we can represent the function min with H = {(𝒫3(ℚ), 100, ℚ)}.
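Definition 9 makes an equivariant map very concrete: on each orbit, it only selects which elements of the least support survive. The core step can be sketched as follows (our illustration, not the Ons interface; image_support is a hypothetical name):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Given the least support C of an input element (sorted ascending) and the
// bit string F from Definition 9, compute the least support of the image:
// keep exactly those C(j) with F(j) = 1, where F uses characters '0'/'1'.
std::vector<double> image_support(const std::vector<double>& C,
                                  const std::string& F) {
    std::vector<double> result;
    for (std::size_t j = 0; j < C.size() && j < F.size(); ++j)
        if (F[j] == '1') result.push_back(C[j]);
    return result;
}
```

For the map min : 𝒫3(ℚ) → ℚ with bit string 100, the support {1, 2, 3} is mapped to {1}, i.e., exactly the minimum is kept.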
2.3 Products

The product X × Y of two nominal sets is again a nominal set and hence it can itself be represented in terms of the dimension of each of its orbits, as shown in Section 2.1. However, this approach has some disadvantages.

Example 11. We start by showing that the orbit structure of products can be non-trivial. Consider the product of X = ℚ and the set Y = {(a, b) ∈ ℚ2 | a < b}. This product consists of five orbits, more than one might naively expect from the fact that both sets are single-orbit:

{(a, (b, c)) | a, b, c ∈ ℚ, a < b < c},
{(a, (a, b)) | a, b ∈ ℚ, a < b},
{(b, (a, c)) | a, b, c ∈ ℚ, a < b < c},
{(b, (a, b)) | a, b ∈ ℚ, a < b},
{(c, (a, b)) | a, b, c ∈ ℚ, a < b < c}.

We find that this product is represented by the multiset {2, 2, 3, 3, 3}. Unfortunately, this is not sufficient to accurately describe the product, as it abstracts away from the relation between its elements and those in X and Y. In particular, it is not possible to reconstruct the projection maps from such a representation.

The essence of our representation of products is that each orbit O in the product X × Y is described entirely by the dimension of O together with the two (equivariant) projections π1 : O → X and π2 : O → Y. This combination of the orbit and the two projection maps can already be represented using Propositions 6 and 10. However, as we will see, a combined representation has several advantages. To discuss such a representation, let us first introduce what it means for tuples of a set and two functions to be isomorphic:

Definition 12. Given nominal sets X, Y, Z1 and Z2, and equivariant functions l1 : Z1 → X, r1 : Z1 → Y, l2 : Z2 → X and r2 : Z2 → Y, we define (Z1, l1, r1) ≅ (Z2, l2, r2) if there exists an isomorphism ℎ : Z1 → Z2 such that l1 = l2 ∘ ℎ and r1 = r2 ∘ ℎ.

Our goal is to have a representation that, for each orbit O, produces a tuple (A, f1, f2) isomorphic to the tuple (O, π1, π2). The next lemma gives a characterisation that can be used to simplify such a representation.

Lemma 13. Let X and Y be nominal sets and (x, y) ∈ X × Y. If C, Cx, and Cy are the least supports of (x, y), x, and y respectively, then C = Cx ∪ Cy.

With Proposition 10 we represent the maps π1 and π2 by tuples (O, F1, O1) and (O, F2, O2) respectively. Using Lemma 13 and the definitions of F1 and F2, we see that at least one of F1(i) and F2(i) equals 1 for each i. We can thus combine the strings F1 and F2 into a single string P ∈ {L, R, B}∗ as follows. We set P(i) = L when only F1(i) is 1, P(i) = R when only F2(i) is 1, and P(i) = B when both are 1. The string P fully describes the strings F1 and F2. This process for constructing the string P gives it two useful properties. The number of Ls and Bs in the string gives the dimension of O1. Similarly, the number of Rs and Bs in the string gives the dimension of O2. We will call strings with this property valid. In conclusion, to describe a single orbit of the product X × Y, a valid string P together with the images of π1 and π2 is sufficient.
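The merge of F1 and F2 into a product string can be sketched directly (our own illustration; by Lemma 13, at least one bit is set at every position):

```cpp
#include <cassert>
#include <string>

// Combine the projection bit strings F1 and F2 of a product orbit into a
// single string over {L, R, B}: 'L' if only F1 has a 1 at position i,
// 'R' if only F2 does, 'B' if both do. F1 and F2 have equal length, and
// at every position at least one of them is '1' (Lemma 13).
std::string combine(const std::string& F1, const std::string& F2) {
    std::string P;
    for (std::size_t i = 0; i < F1.size(); ++i) {
        if (F1[i] == '1' && F2[i] == '1') P += 'B';
        else if (F1[i] == '1')            P += 'L';
        else                              P += 'R';
    }
    return P;
}
```

For instance, F1 = 10 and F2 = 11 combine to BR, and F1 = 100 with F2 = 011 combines to LRR; both strings appear as product orbits of ℚ and {(a, b) | a < b}.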
Definition 14. Let P ∈ {L, R, B}∗, and O1 ⊆ X, O2 ⊆ Y be single-orbit sets. Given a tuple (P, O1, O2), where the string P is valid, define

[(P, O1, O2)]t = (𝒫|P|(ℚ), fH1, fH2),

where Hi = {(𝒫|P|(ℚ), Fi, Oi)} and the string F1 is defined as the string P with Ls and Bs replaced by 1s and Rs by 0s. The string F2 is defined similarly, with the roles of L and R swapped.

Proposition 15. There exists a one-to-one correspondence between the orbits O ⊆ X × Y and tuples (P, O1, O2) satisfying O1 ⊆ X, O2 ⊆ Y, and where P is a valid string, such that [(P, O1, O2)]t ≅ (O, π1|O, π2|O).
From the above proposition it follows that we can generate the product X × Y simply by enumerating all valid strings P for all pairs of orbits (O1, O2) of X and Y. Given this, we can calculate the multiset representation of a product from the multiset representations of both factors.

Theorem 16. For X ≅ [f]o and Y ≅ [g]o we have X × Y ≅ [ℎ]o, where

ℎ(n) = ∑_{0≤i,j≤n, i+j≥n} f(i) g(j) C(n, j) C(j, n−i),

and C(a, b) denotes the binomial coefficient.
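The formula of Theorem 16 can be checked by direct computation; a sketch (our own code, with multisets represented as maps from dimension to orbit count, as in Proposition 6):

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Binomial coefficient C(n, k), computed multiplicatively.
long long binom(int n, int k) {
    if (k < 0 || k > n) return 0;
    long long r = 1;
    for (int t = 1; t <= k; ++t) r = r * (n - k + t) / t;
    return r;
}

// Theorem 16: the multiset h of the product from multisets f and g.
// h(n) sums f(i) g(j) C(n, j) C(j, n - i) over i, j <= n with i + j >= n;
// C(n, j) C(j, n - i) counts the valid strings of length n whose L/B
// count is i and whose R/B count is j.
std::map<int, int> product_repr(const std::map<int, int>& f,
                                const std::map<int, int>& g) {
    std::map<int, int> h;
    for (const auto& [i, fi] : f)
        for (const auto& [j, gj] : g)
            for (int n = std::max(i, j); n <= i + j; ++n)
                h[n] += fi * gj * static_cast<int>(binom(n, j) * binom(j, n - i));
    return h;
}
```

For ℚ (multiset {1}) and S = {(a, b) | a < b} (multiset {2}) this yields two orbits of dimension 2 and three of dimension 3, matching the multiset {2, 2, 3, 3, 3} of Example 11.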
Example 17. To illustrate some aspects of the above representation, let us use it to calculate the product of Example 11. First, we observe that both ℚ and S = {(a, b) ∈ ℚ2 | a < b} consist of a single orbit. Hence any orbit of the product corresponds to a triple (P, ℚ, S), where the string P satisfies |P|L + |P|B = dim(ℚ) = 1 and |P|R + |P|B = dim(S) = 2. We can now find the orbits of the product ℚ × S by enumerating all strings satisfying these equations. This yields
– LRR, corresponding to the orbit {(a, (b, c)) | a, b, c ∈ ℚ, a < b < c},
– RLR, corresponding to the orbit {(b, (a, c)) | a, b, c ∈ ℚ, a < b < c},
– RRL, corresponding to the orbit {(c, (a, b)) | a, b, c ∈ ℚ, a < b < c},
– RB, corresponding to the orbit {(b, (a, b)) | a, b ∈ ℚ, a < b}, and
– BR, corresponding to the orbit {(a, (a, b)) | a, b ∈ ℚ, a < b}.

Each product string fully describes the corresponding orbit. To illustrate, consider the string BR. The corresponding bit strings for the projection functions are F1 = 10 and F2 = 11. From the length of the string we conclude that the dimension of the orbit is 2. The string F1 further tells us that the left element of the tuple consists only of the smallest element of the support. The string F2 indicates that the right element of the tuple is constructed from both elements of the support. Combining this, we find that the orbit is {(a, (a, b)) | a, b ∈ ℚ, a < b}.
2.4 Summary

We summarise our concrete representation in the following table. Propositions 6, 10 and 15 correspond to the three groups of rows in the table. Notice that in the case of maps and products, the orbits are inductively represented using the concrete representation. As a base case, we can represent single orbits by their dimension.

Object                             Representation
Single orbit O                     Natural number n = dim(O)
Nominal set X = ⋃i Oi              Multiset of these numbers
Map from single orbit f : O → Y    The orbit f(O) and a bit string F
Equivariant map f : X → Y          Set of tuples (O, F, f(O)), one for each orbit
Orbit in a product O ⊆ X × Y       The corresponding orbits of X and Y, and a
                                   string P relating their supports
Product X × Y                      Set of tuples (P, OX, OY), one for each orbit

Table 6.1 Overview of representation.
3 Implementation and complexity of Ons

The ideas outlined above have been implemented in a C++ library, Ons, and a Haskell library, Ons-hs.20 We focus here on the C++ library only, as the Haskell one is very similar. The library can represent orbit-finite nominal sets and their products, (disjoint) unions, and maps. A full description of the possibilities is given in the documentation included with Ons.

As an example, the following program computes the product from Example 11. Initially, the program creates the nominal set A, containing the entirety of ℚ. Then it creates a nominal set B, consisting of the orbit containing the element (1, 2) ∈ ℚ × ℚ. For this, the library determines to which orbit of the product ℚ × ℚ the element (1, 2) belongs, and then stores a description of the orbit as described in Section 2. Note that this means that it internally never needs to store the element used to create the orbit. The function nomset_product then uses the enumeration of product strings mentioned in Section 2.3 to calculate the product of A and B. Finally, it prints a representative element for each of the orbits in the product. These elements are constructed based on the stored description of the orbits, filled in to make their support equal to sets of the form {1, 2, …, n}.
    nomset<rational> A = nomset_rationals();
    nomset<pair<rational, rational>> B({rational(1), rational(2)});
    auto AtimesB = nomset_product(A, B);  // compute the product
    for (auto orbit : AtimesB)
        cout << orbit.getElement() << " ";

Running this gives the following output (where /1 signifies the denominator):

    (1/1,(2/1,3/1)) (1/1,(1/1,2/1)) (2/1,(1/1,3/1)) (2/1,(1/1,2/1)) (3/1,(1/1,2/1))

20 Ons can be found at https://github.com/davidv1992/ONS and Ons-hs can be found at https://github.com/Jaxan/ons-hs/.
Internally, orbit is implemented following the theory presented in Section 2, storing the dimension of the orbit it represents. It also contains sufficient information to reconstruct elements given their least support, such as the product string for orbits resulting from a product. The class nomset then uses a standard set data structure to store the collection of orbits contained in the nominal set it represents.

In a similar way, eqimap stores equivariant maps by associating each orbit in the domain with the image orbit and the string representing which elements of the least support to keep. This is stored using a map data structure. For both nominal sets and equivariant maps, the underlying data structure is currently implemented using trees.
3.1 Complexity of operations

Using the concrete representation of nominal sets, we can determine the complexity of common operations. To simplify such an analysis, we will make the following assumptions:
– The comparison of two orbits takes O(1).
– Constructing an orbit from an element takes O(1).
– Checking whether an element is in an orbit takes O(1).
These assumptions are justified as each of these operations takes time proportional to the size of the representation of an individual orbit, which in practice is small and approximately constant. For instance, the orbit 𝒫n(ℚ) is represented by just the integer n and its type.
Theorem 18. If nominal sets are implemented with a tree-based set structure (as in Ons), the complexity of the following set operations is as follows. Recall that N(X) denotes the number of orbits of X. We use p and f to denote functions implemented in whatever way the user wants, which we assume to take O(1) time. The software assumes these are equivariant, but this is not verified.

Operation                    Complexity
Test x ∈ X                   O(log N(X))
Test X ⊆ Y                   O(min(N(X) + N(Y), N(X) log N(Y)))
Calculate X ∪ Y              O(N(X) + N(Y))
Calculate X ∩ Y              O(N(X) + N(Y))
Calculate {x ∈ X | p(x)}     O(N(X))
Calculate {f(x) | x ∈ X}     O(N(X) log N(X))
Calculate X × Y              O(N(X × Y)) ⊆ O(3^(dim(X)+dim(Y)) N(X)N(Y))

Table 6.2 Time complexity of operations on nominal sets.

Proof. Since most parts are proven similarly, we only include proofs for the first and last item.

Membership. To decide x ∈ X, we first construct the orbit containing x, which is done in constant time. Then we use a logarithmic lookup to decide whether this orbit is in our set data structure. Hence, membership checking is O(log(N(X))).

Products. Calculating the product of two nominal sets is the most complicated construction. For each pair of orbits in the original sets X and Y, all product strings need to be generated. Each product orbit itself is constructed in constant time. By generating these orbits in order, the resulting set takes O(N(X × Y)) time to construct.

We can also give an explicit upper bound for the number of orbits in terms of the input. Recall that orbits in a product are represented by strings of length at most dim(X) + dim(Y). (If the string is shorter, we pad it with one of the symbols.) Since there are three symbols (L, R and B), the product of X and Y will have at most 3^(dim(X)+dim(Y)) N(X)N(Y) orbits. It follows that taking products has time complexity of O(3^(dim(X)+dim(Y)) N(X)N(Y)). □
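The set operations of Theorem 18 reduce to ordinary operations on an ordered collection of orbit descriptors; a sketch with orbits abstracted as integers (our own illustration, not the Ons data structures):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

// A nominal set stored as an ordered set of orbit descriptors. Membership
// is a logarithmic lookup, O(log N(X)); union and intersection are linear
// merges of the sorted collections, O(N(X) + N(Y)), as in Theorem 18.
using Orbit = int;
using NomSet = std::set<Orbit>;

bool member(const NomSet& X, Orbit o) { return X.count(o) > 0; }

NomSet nom_union(const NomSet& X, const NomSet& Y) {
    NomSet Z;
    std::set_union(X.begin(), X.end(), Y.begin(), Y.end(),
                   std::inserter(Z, Z.end()));
    return Z;
}

NomSet nom_intersection(const NomSet& X, const NomSet& Y) {
    NomSet Z;
    std::set_intersection(X.begin(), X.end(), Y.begin(), Y.end(),
                          std::inserter(Z, Z.end()));
    return Z;
}
```

In Ons itself an orbit descriptor is of course richer than an integer (it stores the dimension and, for products, the product string), but the shape of the operations is the same.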
4 Results and evaluation in automata theory
|
||
In this section we consider applications of nominal sets to automata theory. As mentioned in the introduction, nominal sets are used to formalise languages over infinite
|
||
alphabets. These languages naturally arise as the semantics of register automata. The
|
||
definition of register automata is not as simple as that of ordinary finite automata.
|
||
Consequently, transferring results from automata theory to this setting often requires
|
||
non-trivial proofs. Nominal automata, instead, are defined as ordinary automata by
|
||
replacing finite sets with orbit-finite nominal sets. The theory of nominal automata
|
||
is developed by Bojańczyk, et al. (2014) and it is shown that many algorithms, such
|
||
as minimisation (based on the Myhill-Nerode equivalence), from automata theory
|
||
transfer to nominal automata. Not all algorithms work: e.g., the subset construction
|
||
fails for nominal automata.
|
||
As an example we consider the following language on rational numbers:
|
||
ℒint = {a1 b1 ⋯an bn | ai , bi ∈ ℚ, ai < ai+1 < bi+1 < bi for all i}.
|
||
|
||
We call this language the interval language as a word w ∈ ℚ∗ is in the language when it
|
||
denotes a sequence of nested intervals. This language contains arbitrarily long words.
|
||
For this language it is crucial to work with an infinite alphabet as for each finite set
|
||
C ⊂ ℚ, the restriction ℒint ∩ C∗ is just a finite language. Note that the language is
|
||
equivariant: w ∈ ℒint ⟺ wg ∈ ℒint for any monotone bijection g, because nested
|
||
intervals are preserved by monotone maps.21 Indeed, ℒint is a nominal set, although it
|
||
is not orbit-finite.
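Membership in ℒint is easy to decide directly. A Python sketch (our own reading of the definition; we additionally require ai < bi, so that each pair denotes a proper interval):

```python
def in_L_int(w):
    """Membership in the interval language: w = a1 b1 ... an bn must denote
    properly nested intervals (a1, b1) ⊃ (a2, b2) ⊃ ...; we read each pair
    as a proper interval, i.e. ai < bi."""
    if len(w) % 2 != 0:
        return False
    pairs = list(zip(w[0::2], w[1::2]))
    return all(a < b for a, b in pairs) and all(
        a1 < a2 and b2 < b1
        for (a1, b1), (a2, b2) in zip(pairs, pairs[1:]))

assert in_L_int((1.0, 10.0, 2.5, 7.0))
# the image under the monotone map x ↦ 2x is again in the language
assert in_L_int((2.0, 20.0, 5.0, 14.0))
assert not in_L_int((1.0, 10.0, 0.5, 7.0))   # second interval not nested
```

The second assertion illustrates the equivariance noted above: applying a monotone bijection to the letters does not change membership.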
[21] The G-action on words is defined point-wise: g(w1 … wn) = (gw1) … (gwn).

Informally, the language ℒint can be accepted by the automaton depicted in Figure 6.1. Here we allow the automaton to store rational numbers and compare them to new symbols. For example, the transition from q2 to q3 is taken if any value c between a and b is read, and then the currently stored value a is replaced by c. For any other value read at state q2, the automaton transitions to the sink state q4. Such a transition structure is made precise by the notion of nominal automata.
Figure 6.1: Example automaton that accepts the language ℒint. (States: q0, q1(a), q2(a, b), q3(a, b) and a sink state q4. From q0 any a leads to q1(a); from q1(a), reading b > a leads to q2(a, b), and b ≤ a to the sink. The transitions carry guards and register updates, e.g. q2 → q3 on a < c < b with a ← c, and q3 → q2 on a < c < b with b ← c; reading c ≤ a or c ≥ b leads to the sink q4.)
Definition 19. A nominal language is an equivariant subset L ⊆ A∗ where A is an orbit-finite nominal set.

Definition 20. A nominal deterministic finite automaton is a tuple (S, A, F, δ), where S is an orbit-finite nominal set of states, A is an orbit-finite nominal set of symbols, F ⊆ S is an equivariant subset of final states, and δ : S × A → S is the equivariant transition function.

Given a state s ∈ S, we define the usual acceptance condition: a word w ∈ A∗ is accepted if w denotes a path from s to a final state.

The automaton in Figure 6.1 can be formalised as a nominal deterministic finite automaton as follows. Let S = {q0, q4} ∪ {q1(a) | a ∈ ℚ} ∪ {q2(a, b) | a < b ∈ ℚ} ∪ {q3(a, b) | a < b ∈ ℚ} be the set of states, where the group action is defined as one would expect. The transition we described earlier can now be formally defined as δ(q2(a, b), c) = q3(c, b) for all a < c < b ∈ ℚ. By defining δ on all states accordingly and defining the final states as F = {q2(a, b) | a < b ∈ ℚ}, we obtain a nominal deterministic automaton (S, ℚ, F, δ). The state q0 accepts the language ℒint.
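The formal definition can be animated directly. A Python sketch (our own encoding, not from the text: states are tagged tuples such as ('q2', a, b)):

```python
def delta(state, c):
    """Transition function of the Figure 6.1 automaton, written concretely."""
    tag = state[0]
    if tag == 'q0':
        return ('q1', c)
    if tag == 'q1':
        return ('q2', state[1], c) if c > state[1] else ('q4',)
    if tag == 'q2':
        _, a, b = state
        return ('q3', c, b) if a < c < b else ('q4',)
    if tag == 'q3':
        _, a, b = state
        return ('q2', a, c) if a < c < b else ('q4',)
    return ('q4',)  # sink

def accepts(word):
    """Run the automaton from q0; final states are those tagged q2."""
    state = ('q0',)
    for c in word:
        state = delta(state, c)
    return state[0] == 'q2'

assert accepts((1.0, 10.0, 2.5, 7.0))       # nested intervals
assert not accepts((1.0, 10.0, 0.5, 7.0))   # 0.5 falls outside (1, 10)
```

Note how the state stores concrete rationals while the transition logic only ever compares them, which is why the automaton is equivariant.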
We implement two algorithms on nominal automata, minimisation and learning, to benchmark Ons. The performance of Ons is compared to two existing libraries for computing with nominal sets, Nλ and Lois. The following automata will be used.

Random automata

As a primary test suite, we generate random automata as follows. The input alphabet is always ℚ and the number of orbits and the dimension k of the state space S are fixed. For each orbit in the set of states, its dimension is chosen uniformly at random between 0 and k, inclusive. Each orbit has a probability 1/2 of consisting of accepting states. To generate the transition function δ, we enumerate the orbits of S × ℚ and choose a target state uniformly from the orbits of S with small enough dimension. The bit string indicating which part of the support is preserved is then sampled uniformly from all valid strings. We will denote these automata as rand_{N(S),k}. The choices made here are arbitrary and only provide basic automata. We note that the automata are generated orbit-wise and this may favour our tool.
Structured automata

Besides random automata we wish to test the algorithms on more structured automata. We define the following automata.

FIFO(n): Automata accepting valid traces of a finite FIFO data structure of size n. The alphabet is defined by two orbits: {Put(a) | a ∈ ℚ} and {Get(a) | a ∈ ℚ}.

ww(n): Automata accepting the language of words of the form ww, where w ∈ ℚ^n.

ℒmax: The language ℒmax = {wa ∈ ℚ∗ | a = max(w1, …, wn)}, where the last symbol is the maximum of the previous symbols.

ℒint: The language of nested intervals, as defined before.

In Table 6.3 we report the number of orbits for each automaton. The first two classes of automata are described in Chapter 5. These two classes are also equivariant w.r.t. the equality symmetry. Extra structure allows the automata to be encoded more efficiently, as we do not need to encode a transition for each orbit in S × A. Instead, a more symbolic encoding is possible. Both Lois and Nλ allow the use of this more symbolic representation. Our tool, Ons, only works with nominal sets and the input data needs to be provided orbit-wise. Where applicable, the automata listed above were generated using the code from Moerman, et al. (2017), ported to the other libraries as needed.
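The FIFO(n) traces can be characterised operationally. A Python sketch under one plausible reading of "valid traces" (function name is ours): a Get must return the oldest element, and the queue never under- or overflows.

```python
from collections import deque

def fifo_trace_valid(trace, n):
    """Check that a trace of ('Put', a) / ('Get', a) operations is a valid
    behaviour of a FIFO queue with capacity n."""
    q = deque()
    for op, a in trace:
        if op == 'Put':
            if len(q) == n:
                return False          # overflow
            q.append(a)
        else:  # 'Get'
            if not q or q.popleft() != a:
                return False          # underflow, or wrong value returned
    return True

assert fifo_trace_valid([('Put', 1.5), ('Put', 2.0), ('Get', 1.5)], n=2)
assert not fifo_trace_valid([('Put', 1.5), ('Get', 2.0)], n=2)
```

Note that validity only depends on equalities between the data values, which is why this language is also equivariant w.r.t. the equality symmetry, as remarked above.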
4.1 Minimising nominal automata

For languages recognised by nominal DFAs, a Myhill-Nerode theorem holds which relates states to right congruence classes. This guarantees the existence of unique minimal automata. We say an automaton is minimal if its set of states has the least number of orbits and each orbit has the smallest dimension possible.[22] We generalise Moore's minimisation algorithm to nominal DFAs (Algorithm 6.1) and analyse its time complexity using the bounds from Section 3.

[22] Abstractly, an automaton is minimal if it has no proper quotients. Minimal deterministic automata are unique up to isomorphism.
Require: Nominal automaton M = (S, A, F, δ)
Ensure: Minimal nominal automaton equivalent to M
 1  i ← 0
 2  ≡−1 ← S × S
 3  ≡0 ← F × F ∪ (S\F) × (S\F)
 4  while ≡i ≠ ≡i−1 do
 5      ≡i+1 ← {(q1, q2) | (q1, q2) ∈ ≡i ∧ ∀a ∈ A, (δ(q1, a), δ(q2, a)) ∈ ≡i}
 6      i ← i + 1
 7  end while
 8  E ← S/≡i
 9  FE ← {e ∈ E | ∀s ∈ e, s ∈ F}
10  Let δE be the map such that, if s ∈ e and δ(s, a) ∈ e′, then δE(e, a) = e′
11  return (E, A, FE, δE)

Algorithm 6.1: Moore's minimisation algorithm for nominal DFAs.
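For ordinary finite DFAs, Algorithm 6.1 specialises to the familiar partition-refinement procedure. A Python sketch of that finite case (names are ours):

```python
def moore_minimise(states, alphabet, final, delta):
    """Moore's partition refinement on an ordinary DFA, the finite analogue
    of Algorithm 6.1. `delta` maps (state, letter) pairs to states; returns
    the equivalence classes of states."""
    # Line 3 of Algorithm 6.1: initial split on acceptance.
    part = {s: (s in final) for s in states}
    while True:
        # Line 5: refine by current class and the classes of all successors.
        sig = {s: (part[s], tuple(part[delta[s, a]] for a in alphabet))
               for s in states}
        if len(set(sig.values())) == len(set(part.values())):
            break  # fixed point reached: the partition did not get finer
        part = sig
    classes = {}
    for s in states:
        classes.setdefault(part[s], []).append(s)
    return sorted(classes.values())

# Two disjoint two-cycles with symmetric acceptance collapse pairwise:
delta = {(0, 'a'): 1, (1, 'a'): 0, (2, 'a'): 3, (3, 'a'): 2}
assert moore_minimise([0, 1, 2, 3], ['a'], {0, 2}, delta) == [[0, 2], [1, 3]]
```

The nominal version follows the same steps, but every set and relation above becomes an orbit-finite nominal set, which is where the bounds of Section 3 enter.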
Theorem 21. The runtime complexity of Moore's algorithm on nominal deterministic automata is O(3^(5k) k N(S)^3 N(A)), where k = dim(S ∪ A).

Proof. This is shown by counting operations, using the complexity results of set operations stated in Theorem 18. We first focus on the while loop on lines 4–7. The runtime of an iteration of the loop is determined by line 5, as this is the most expensive step. Since the dimensions of S and A are at most k, computing S × S × A takes O(N(S)^2 N(A) 3^(5k)). Filtering S × S using that then takes O(N(S)^2 3^(2k)). The time to compute S × S × A dominates, hence each iteration of the loop takes O(N(S)^2 N(A) 3^(5k)).

Next, we need to count the number of iterations of the loop. Each iteration of the loop gives rise to a new partition, refining the previous partition. Furthermore, every partition generated is equivariant. Note that this implies that each refinement of the partition does at least one of two things: distinguish between two orbits of S previously in the same element(s) of the partition, or distinguish between two members of the same orbit previously in the same element of the partition. The former can happen only N(S) − 1 times, as after that there are no more orbits lumped together. The latter can only happen dim(S) times per orbit, because each such distinction between elements is based on splitting on the value of one of the elements of the support. Hence, after dim(S) times on a single orbit, all elements of the support are used up. Combining this, the longest chain of partitions of S has length at most O(kN(S)).

Since each partition generated in the loop is unique, the loop cannot run for more iterations than the length of the longest chain of partitions on S. It follows that there are at most O(kN(S)) iterations of the loop, giving the loop a complexity of O(k N(S)^3 N(A) 3^(5k)).

The remaining operations outside the loop have a lower complexity than that of the loop, hence the complexity of Moore's minimisation algorithm for a nominal automaton is O(k N(S)^3 N(A) 3^(5k)). □
The above theorem shows in particular that minimisation of nominal automata is fixed-parameter tractable (FPT) with the dimension as fixed parameter. The complexity of Algorithm 6.1 for nominal automata is very similar to the O((#S)^3 #A) bound given by a naive implementation of Moore's algorithm for ordinary DFAs. This suggests that it is possible to further optimise an implementation with techniques similar to those used for ordinary automata.
Implementations

We implemented the minimisation algorithm in Ons. For Nλ and Lois we used their implementations of Moore's minimisation algorithm (Klin & Szynwelski, 2016 and Kopczyński & Toruńczyk, 2016 and 2017). For each of the libraries, we wrote routines to read in an automaton from a file and, for the structured test cases, to generate the requested automaton. For Ons, all automata were read from file. The output of these programs was manually checked to see if the minimisation was performed correctly.
Results

The results (shown in Table 6.3) for random automata show a clear advantage for Ons, which is capable of running all supplied test cases in less than one second. This is in contrast to both Lois and Nλ, which take more than 2 hours on the largest random automata.

The results for structured automata show a clear effect of the extra structure. Both Nλ and Lois remain capable of minimising the automata in reasonable amounts of time for larger sizes. In contrast, Ons benefits little from the extra structure. Despite this, it remains viable: even for the larger cases it falls behind significantly only for the largest FIFO automaton and the two largest ww automata.
4.2 Learning nominal automata

Another application that we implemented in Ons is automata learning. The aim of automata learning is to infer an unknown regular language ℒ. We use the framework of active learning as set up by Angluin (1987), where a learning algorithm can query an oracle to gather information about ℒ. Formally, the oracle can answer two types of queries:
Table 6.3: Running times for Algorithm 6.1 implemented in the three libraries. N(S) is the size of the input and N(Smin) the size of the minimal automaton. For Ons, the time used to generate the automaton is reported separately (column Gen). Timeouts are indicated by ∞.

Type             N(S)   N(Smin)   Ons (s)   Gen (s)   Nλ (s)   Lois (s)
rand5,1  (x10)      5       n/a      0.02       n/a     0.82       3.14
rand10,1 (x10)     10       n/a      0.03       n/a    17.03         92
rand10,2 (x10)     10       n/a      0.09       n/a     2114          ∞
rand15,1 (x10)     15       n/a      0.04       n/a       87        620
rand15,2 (x10)     15       n/a      0.11       n/a     3346          ∞
rand15,3 (x10)     15       n/a      0.46       n/a        ∞          ∞
FIFO(2)            13         6      0.01      0.01     1.37       0.24
FIFO(3)            65        19      0.38      0.09    11.59        2.4
FIFO(4)           440        94     39.11      1.60       76      14.95
FIFO(5)          3686       635         ∞     39.78      402         71
ww(2)               8         8      0.00      0.00     0.14       0.03
ww(3)              24        24      0.19      0.02     0.88       0.16
ww(4)             112       112     26.44      0.25     3.41       0.61
ww(5)             728       728         ∞      6.37    10.54       1.80
ℒmax                5         3      0.00      0.00     2.06       0.03
ℒint                5         5      0.00      0.00     1.55       0.03
– membership queries, where a query consists of a word w ∈ A∗ and the oracle replies whether w ∈ ℒ, and
– equivalence queries, where a query consists of an automaton ℋ and the oracle replies positively if ℒ(ℋ) = ℒ or provides a counterexample if ℒ(ℋ) ≠ ℒ.

With these queries, the L∗ algorithm can learn regular languages efficiently (Angluin, 1987). In particular, it learns the unique minimal automaton for ℒ using only finitely many queries. The L∗ algorithm has been generalised to νL∗ in order to learn nominal regular languages. In particular, it learns a nominal DFA (over an infinite alphabet) using only finitely many queries. We implement νL∗ in the presented library and compare it to its previous implementation in Nλ. The algorithm is not polynomial, unlike the minimisation algorithm described above. However, the authors conjecture that there is a polynomial algorithm.[23] For the correctness, termination, and comparison with other learning algorithms see Chapter 5.

[23] See https://joshuamoerman.nl/papers/2017/17popl-learning-nominal-automata.html for a sketch of the polynomial algorithm.
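The two query types can be phrased as a small oracle interface. A Python sketch (names are ours, not the Ons API); since the alphabet is infinite, a real equivalence oracle cannot enumerate all words, so here equivalence is approximated by a finite test set, in the spirit of the hand-constructed counterexamples used in the experiments:

```python
class Teacher:
    """Minimal oracle for active learning: wraps a target language (given as
    a membership predicate) and answers the two query types."""
    def __init__(self, language, test_words):
        self.language = language
        self.test_words = test_words   # finite approximation for equivalence

    def member(self, w):
        """Membership query: is w in the target language?"""
        return self.language(w)

    def equivalent(self, hypothesis):
        """Equivalence query: search the test set for a counterexample;
        None means the hypothesis is accepted."""
        for w in self.test_words:
            if hypothesis(w) != self.language(w):
                return w
        return None

t = Teacher(lambda w: len(w) % 2 == 0, [(1,), (1, 2), (1, 2, 3)])
assert t.member((1, 2))
assert t.equivalent(lambda w: len(w) % 2 == 0) is None
assert t.equivalent(lambda w: True) == (1,)   # shortest counterexample found
```

A learner interacts only through `member` and `equivalent`, which is what makes the setting applicable to black-box systems.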
Implementations

Both implementations in Nλ and Ons are direct implementations of the pseudocode for νL∗ with no further optimisations. The authors of Lois implemented νL∗ in their library as well.[24] They reported similar performance as the implementation in Nλ (private communication). Hence we focus our comparison on Nλ and Ons. We use the variant of νL∗ where counterexamples are added as columns instead of prefixes.

The implementation in Nλ has the benefit that it can work with different symmetries. Indeed, the structured examples, FIFO and ww, are equivariant w.r.t. the equality symmetry as well as the total order symmetry. For that reason, we run the Nλ implementation using both the equality symmetry and the total order symmetry on those languages. For the languages ℒmax, ℒint and the random automata, we can only use the total order symmetry.

To run the νL∗ algorithm, we implement an external oracle for the membership queries. This is akin to the application of learning black box systems (see Vaandrager, 2017). For equivalence queries, we constructed counterexamples by hand. All implementations receive the same counterexamples. We measure CPU time instead of real time, so that we do not account for the external oracle.
Results

The results (Table 6.4) for random automata show an advantage for Ons. Additionally, we report the number of membership queries, which can vary for each implementation as some steps in the algorithm depend on the internal ordering of set data structures.

In contrast to the case of minimisation, the results suggest that Nλ cannot exploit the logical structure of FIFO(n), ℒmax and ℒint as it is not provided a priori. For ww(2) we inspected the output of Nλ and saw that it learned some logical structure (e.g., it outputs {(a, b) | a ≠ b} as a single object instead of two orbits {(a, b) | a < b} and {(a, b) | b < a}). This may explain why Nλ is still competitive. For languages which are equivariant for the equality symmetry, the Nλ implementation using the equality symmetry can learn with far fewer queries. This is expected as the automata themselves have fewer orbits. It is interesting to see that these languages can be learned more efficiently by choosing the right symmetry.
5 Related work

As stated in the introduction, Nλ by Klin and Szynwelski (2016) and Lois by Kopczyński and Toruńczyk (2016) use first-order formulas to represent nominal sets and use SMT solvers to manipulate them. This makes both libraries very flexible, and they indeed

[24] Can be found at https://github.com/eryxcc/lois/blob/master/tests/learning.cpp.
Table 6.4: Running times and number of membership queries (MQs) for the νL∗ algorithm. For Nλ we used two versions: Nλord uses the total order symmetry, Nλeq uses the equality symmetry. Timeouts are indicated by ∞. (The Nλeq columns are only filled for the languages that are equivariant w.r.t. the equality symmetry.)

                          Ons                 Nλord               Nλeq
Model    N(S)  dim(S)  time (s)     MQs    time (s)     MQs   time (s)    MQs
rand5,1     4       1       127    2321        2391    1243
rand5,1     5       1      0.12     404        2434     435
rand5,1     3       0      0.86     499        1819     422
rand5,1     5       1         ∞     n/a           ∞     n/a
rand5,1     4       1      0.08     387        2097     387
FIFO(1)     3       1      0.04     119        3.17     119      1.76     51
FIFO(2)     6       2      1.73    2655         392    3818     40.00    434
FIFO(3)    19       3      2794  298400           ∞     n/a      2047   8151
ww(1)       4       1      0.42     134        2.49      77      1.47     30
ww(2)       8       2       266    3671         228    2140     30.58    237
ww(3)      24       3         ∞     n/a           ∞     n/a         ∞    n/a
ℒmax        3       1      0.01      54        3.58      54
ℒint        5       2      0.59     478          83     478
implement the equality symmetry as well as the total order symmetry. As their representation is not unique, the efficiency depends on how the logical formulas are constructed. As such, they do not provide complexity results. In contrast, our direct representation allows for complexity results (Section 3) and leads to different performance characteristics (Section 4).

A second big difference is that both Nλ and Lois implement a "programming paradigm" instead of just a library. This means that they overload natural programming constructs in their host languages (Haskell and C++, respectively). For programmers this means they can think of infinite sets without having to know about nominal sets.

It is worth mentioning that an older (unreleased) version of Nλ implemented nominal sets with orbits instead of SMT solvers (Bojańczyk, et al., 2012). However, instead of characterising orbits (e.g., by their dimension), they represent orbits by a representative element. Klin and Szynwelski (2016) reported that the current version is faster.

The theoretical foundation of our work is the main representation theorem by Bojańczyk, et al. (2014). We improve on that by instantiating it to the total order symmetry and distilling a concrete representation of nominal sets. As far as we know, we provide the first implementation of their representation theory.
Another tool using nominal sets is Mihda by Ferrari, et al. (2005). Here, only the equality symmetry is implemented. This tool implements a translation from π-calculus to history-dependent automata (HD-automata) with the aim of minimisation and checking bisimilarity. The implementation in OCaml is based on named sets, which are finite representations for nominal sets. The theory of named sets is well-studied and has been used to model various behavioural models with local names. For those results, the categorical equivalences between named sets, nominal sets and a certain (pre)sheaf category have been exploited (Ciancia, et al., 2010 and Ciancia & Montanari, 2010). The total order symmetry is not mentioned in their work. We do, however, believe that similar equivalences between categories can be stated. Interestingly, the product of named sets is similar to our representation of products of nominal sets: pairs of elements together with data which denotes the relation between data values.

Fresh OCaml by Shinwell and Pitts (2005) and Nominal Isabelle by Urban and Tasson (2005) are both specialised in name-binding and α-conversion used in proof systems. They only use the equality symmetry and do not provide a library for manipulating nominal sets. Hence they are not suited for our applications.
On the theoretical side, there are many complexity results for register automata (Grigore & Tzevelekos, 2016 and Murawski, et al., 2015). In particular, we note that problems such as emptiness and equivalence are NP-hard depending on the type of register automaton. Recently, Murawski, et al. (2018) showed that equivalence of unique-valued deterministic register automata can be decided in polynomial time. These results do not easily compare to our complexity results for minimisation. One difference is that we use the total order symmetry, where the local symmetries are always trivial (Lemma 3). As a consequence, all the complexity required to deal with groups vanishes. Rather, the complexity is transferred to the input of our algorithms, because automata over the equality symmetry require more orbits when expressed over the total order symmetry. Another difference is that register automata allow for duplicate values in the registers. In nominal automata, such configurations will be encoded in different orbits.

Orthogonal to nominal automata, there is the notion of symbolic automata (D'Antoni & Veanes, 2017 and Maler & Mens, 2017). These automata are also defined over infinite alphabets, but they use predicates on transitions instead of relying on symmetries. Symbolic automata are finite state (as opposed to infinite-state nominal automata) and do not allow for storing values. However, they do allow for general predicates over an infinite alphabet, including comparison to constants.
6 Conclusion and Future Work

We presented a concrete finite representation for nominal sets over the total order symmetry. This allowed us to implement a library, Ons, and provide complexity bounds for common operations. The experimental comparison of Ons against existing solutions for automata minimisation and learning shows that our implementation is much faster in many instances. As such, we believe Ons is a promising implementation of nominal techniques.

A natural direction for future work is to consider other symmetries, such as the equality symmetry. Here, we may take inspiration from existing tools such as Mihda (see Section 5). Another interesting question is whether it is possible to translate a nominal automaton over the total order symmetry which accepts an equality language to an automaton over the equality symmetry. This would allow one to efficiently move between symmetries. Finally, our techniques can potentially be applied to timed automata by exploiting the intriguing connection between the nominal automata that we consider and timed automata (Bojańczyk & Lasota, 2012).

Acknowledgement

We would like to thank Szymon Toruńczyk and Eryk Kopczyński for their prompt help when using the Lois library. For general comments and suggestions we would like to thank Ugo Montanari and Niels van der Weide. Finally, we want to thank the anonymous reviewers for their comments.
Chapter 7
Separation and Renaming in Nominal Sets

Joshua Moerman
Radboud University

Jurriaan Rot
Radboud University

Abstract
Nominal sets provide a foundation for reasoning about names. They are used primarily in syntax with binders, but also, e.g., to model automata over infinite alphabets. In this chapter, nominal sets are related to nominal renaming sets, which involve arbitrary substitutions rather than permutations, through a categorical adjunction. In particular, the separated product of nominal sets is related to the Cartesian product of nominal renaming sets. Based on these results, we define the new notion of separated nominal automata. These efficiently recognise nominal languages, provided these languages are renaming sets. In such cases, moving from the existing notion of nominal automata to separated automata can lead to an exponential reduction of the state space.

This chapter is based on the following submission: Moerman, J. & Rot, J. (2019). Separation and Renaming in Nominal Sets. (Under submission)
Nominal sets are abstract sets which allow one to reason over sets with names, in terms of permutations and symmetries. Since their introduction in computer science by Gabbay and Pitts (1999), they have been widely used for implementing and reasoning over syntax with binders (see the book of Pitts, 2013). Further, nominal techniques have been related to computability theory (Bojańczyk, et al., 2013) and automata theory (Bojańczyk, et al., 2014), where they provide an elegant means of studying languages over infinite alphabets. This embeds nominal techniques in a broader setting of symmetry-aware computation (Pitts, 2016).

Gabbay, one of the pioneers of nominal techniques, described a variation on the theme: nominal renaming sets (Gabbay, 2007 and Gabbay & Hofmann, 2008). Nominal renaming sets are equipped with a monoid action of arbitrary (possibly non-injective) substitutions of names, in contrast to nominal sets, which only involve a group action of permutations.

In this paper, the motivation for using nominal renaming sets comes from automata theory over infinite alphabets. Certain languages form nominal renaming sets, which means that they are closed under all possible substitutions on atoms. In order to obtain efficient automata-theoretic representations of such languages, we systematically relate nominal renaming sets to nominal sets.

We start by establishing a categorical adjunction in Section 2:
    F : Pm-Nom ⇄ Sb-Nom : U    (F ⊣ U)

where Pm-Nom is the usual category of nominal sets and Sb-Nom the category of nominal renaming sets. The right adjoint U simply forgets the action of non-injective substitutions. The left adjoint F freely extends a nominal set with elements representing the application of such substitutions. For instance, F maps the nominal set 𝔸(∗) of all words consisting of distinct atoms to the nominal renaming set 𝔸∗ consisting of all words over the atoms.
In fact, the latter follows from one of the main results of this paper: F maps the separated product X ∗ Y of nominal sets to the Cartesian product of nominal renaming sets. Additionally, under certain conditions, U maps the exponent to the magic wand X −−∗ Y, which is the right adjoint of the separated product. The separated product consists of those pairs whose elements have disjoint supports. This is relevant for name abstraction (Pitts, 2013), and has also been studied in the setting of presheaf categories, aimed towards separation logic (O'Hearn, 2003).
We apply these connections between nominal sets and renaming sets in the context of automata theory. Nominal automata are an elegant model for recognising languages over infinite alphabets. They are expressively equivalent to the more classical register automata (Bojańczyk, 2018, Theorem 6.5), and have appealing properties that register automata lack, such as unique minimal automata. However, moving from register automata to nominal automata can lead to an exponential blow-up in the number of states.[25]
As a motivating example, we consider a language modelling an n-bounded FIFO queue. The input alphabet is given by Σ = {Put(a) | a ∈ 𝔸} ∪ {Pop}, and the output alphabet by O = 𝔸 ∪ {⊥} (here ⊥ is a null value). The language Ln : Σ∗ → O maps a sequence of queue operations to the resulting top element when starting from the empty queue, or to ⊥ if this is undefined. The language Ln can be recognised by a nominal automaton, but this requires an exponential number of states in n, as the automaton distinguishes internally between all possible equalities among elements in the queue.

Based on the observation that Ln is a nominal renaming set, we can come up with a linear automata-theoretic representation. To this end, we define the new notion of separated nominal automaton, where the transition function is only defined for pairs of states and letters with a disjoint support (Section 3). Using the aforementioned categorical framework, we find that such separated automata recognise languages which are nominal renaming sets. Although separated nominal automata are not as expressive as classical nominal automata, they can be much smaller. In particular, in the FIFO example, the reachable part of the separated automaton obtained from the original nominal automaton has n + 1 states, thus dramatically reducing the number of states. We expect that such a reduction is useful in many applications, such as automata learning (Chapter 5).
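The language Ln can be computed by simulating the queue. A Python sketch (our own rendering; we read an overflowing Put, like a Pop on the empty queue, as undefined, returned here as None for ⊥):

```python
def L_n(word, n):
    """The FIFO language L_n: map a sequence of ('Put', a) / 'Pop' operations
    to the top (front) of the resulting n-bounded queue, or None (⊥) when
    the result is undefined."""
    queue = []
    for op in word:
        if op == 'Pop':
            if not queue:
                return None        # popping the empty queue: undefined
            queue.pop(0)
        else:                      # op is ('Put', a)
            if len(queue) == n:
                return None        # exceeding the bound: undefined
            queue.append(op[1])
    return queue[0] if queue else None

assert L_n([('Put', 'a'), ('Put', 'b'), 'Pop'], n=2) == 'b'
# closed under arbitrary (non-injective) substitutions of atoms:
assert L_n([('Put', 'c'), ('Put', 'c'), 'Pop'], n=2) == 'c'
```

The second assertion renames both atoms to the same atom 'c'; the output is renamed accordingly, which is the renaming-set property of Ln exploited above.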
In summary, the main contributions of this paper are the adjunction between nominal sets and nominal renaming sets, the relation between the separated product and the Cartesian product of renaming sets, and the application to automata theory. We conclude with a coalgebraic account of separated automata in Section 3.1. In particular, we justify the semantics of separated automata by showing how it arises through a final coalgebra, obtained by lifting the adjunction to categories of coalgebras. The last section is orthogonal to the other results, and background knowledge of coalgebra is needed only there.

[25] Here 'number of states' refers to the number of orbits in the state space.

1 Monoid actions and nominal sets

In order to capture both the standard notion of nominal sets by Pitts (2013) and sets with more general renaming actions by Gabbay and Hofmann (2008), we start by defining monoid actions.

Definition 1. Let (M, ⋅, 1) be a monoid. An M-set is a set X together with a function ⋅ : M × X → X such that 1 ⋅ x = x and m ⋅ (n ⋅ x) = (m ⋅ n) ⋅ x for all m, n ∈ M and x ∈ X. The function ⋅ is called an M-action and m ⋅ x is often written by juxtaposition mx. A function f : X → Y between two M-sets is M-equivariant if m ⋅ f(x) = f(m ⋅ x) for all m ∈ M and x ∈ X. The class of M-sets together with equivariant maps forms a category M-Set.
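Definition 1 can be checked concretely for the substitution action on words. A Python sketch (encoding a finite substitution as a dict that is the identity elsewhere is our own choice):

```python
def act(m, x):
    """The Sb-action on words over atoms: apply a finite substitution m
    (a dict on atoms, identity elsewhere) pointwise."""
    return tuple(m.get(a, a) for a in x)

def compose(m1, m2):
    """Monoid multiplication of substitutions: (m1 · m2)(a) = m1(m2(a))."""
    keys = set(m1) | set(m2)
    return {a: m1.get(m2.get(a, a), m2.get(a, a)) for a in keys}

w = ('a', 'b', 'a')
assert act({}, w) == w                                   # 1 · x = x
m1, m2 = {'a': 'b'}, {'b': 'c'}
assert act(m1, act(m2, w)) == act(compose(m1, m2), w)    # m·(n·x) = (m·n)·x
assert act(m1, w) == ('b', 'b', 'b')                     # non-injective merge
```

The last line shows what distinguishes Sb from Pm: a non-injective substitution can merge atoms, collapsing 'a' and 'b' into a single atom.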
Let 𝔸 = {a, b, c, …} be a countably infinite set of atoms. The two main instances of M considered in this paper are the monoid

Sb = {m : 𝔸 → 𝔸 | m(a) ≠ a for finitely many a}

of all (finite) substitutions (with composition as multiplication), and the monoid

Pm = {g ∈ Sb | g is a bijection}

of all (finite) permutations. Since Pm is a submonoid of Sb, any Sb-set is also a Pm-set, and any Sb-equivariant map is also Pm-equivariant. This gives rise to a forgetful functor

U : Sb-Set → Pm-Set.
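Since every element of Sb moves only finitely many atoms, these monoids are easy to experiment with concretely. The following Python sketch is our own illustration (not part of the development): atoms are modelled as integers, and a substitution as a dict recording only the atoms it moves.

```python
# A finite substitution m is stored as a dict: m[a] = m(a) for the finitely
# many atoms with m(a) != a; on every other atom m acts as the identity.

def apply(m, a):
    return m.get(a, a)

def compose(m, n):
    """Composition m . n (apply n first, then m): again a finite substitution."""
    moved = set(m) | set(n)
    comp = {a: apply(m, apply(n, a)) for a in moved}
    return {a: b for a, b in comp.items() if a != b}  # normalise away fixpoints

def in_pm(m):
    """m lies in Pm iff it is a bijection, i.e. it permutes the atoms it moves."""
    vals = list(m.values())
    return len(vals) == len(set(vals)) and set(vals) == set(m)
```

For instance, the swap (1 2) composed with itself is the identity, `compose({1: 2, 2: 1}, {1: 2, 2: 1}) == {}`, and a composite of permutations is again a permutation, witnessing that Pm is a submonoid of Sb.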
The set 𝔸 is an Sb-set by defining m ⋅ a = m(a). Given an M-set X, the set 𝒫(X) of subsets of X is an M-set, with the action defined by direct image.

For a Pm-set X, the orbit of an element x is the set orb(x) = {g ⋅ x | g ∈ Pm}. We say X is orbit-finite if the set {orb(x) | x ∈ X} is finite.
For any monoid M, the category M-Set is symmetric monoidal closed. The product of two M-sets is given by the Cartesian product, with the action defined pointwise: m ⋅ (x, y) = (m ⋅ x, m ⋅ y). In M-Set, the exponent X →M Y is given by the set {f : M × X → Y | f is equivariant}.26 The action on such an f : M × X → Y is defined by (m ⋅ f)(n, x) = f(mn, x). A good introduction to the construction of the exponent is given by Simmons (n.d.). If M is a group, a simpler description of the exponent may be given, carried by the set of all functions f : X → Y, with the action given by (g ⋅ f)(x) = g ⋅ f(g−1 ⋅ x).
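For a group, this conjugation action can also be sketched concretely in the same dict representation (again our own encoding, for illustration only; finitely-supported functions f : 𝔸 → 𝔸 are dicts with identity default):

```python
# The action (g . f)(x) = g(f(g^{-1}(x))) of a permutation g on a function f,
# with atoms as ints and finitely-supported maps as dicts (identity elsewhere).

def apply(m, a):
    return m.get(a, a)

def invert(g):
    return {b: a for a, b in g.items()}

def act(g, f):
    g_inv = invert(g)
    # (g . f)(x) can only differ from x when g^{-1}(x) is moved by f,
    # i.e. when x lies in g(dom f); dom f and the values of g cover that.
    relevant = set(f) | set(g.values())
    res = {x: apply(g, apply(f, apply(g_inv, x))) for x in relevant}
    return {x: y for x, y in res.items() if x != y}
```

For example, acting with the swap (1 2) on the function 1 ↦ 3 yields the function 2 ↦ 3, i.e. `act({1: 2, 2: 1}, {1: 3}) == {2: 3}`, and acting twice with the same swap gives the original function back.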
1.1 Nominal M-sets

The notion of nominal set is usually defined w.r.t. a Pm-action. Here, we use the generalisation to Sb-actions from Gabbay and Hofmann (2008). Throughout this section, let M denote a submonoid of Sb.

Definition 2. Let X be an M-set, and x ∈ X an element. A set C ⊂ 𝔸 is an (M-)support of x if for all m1, m2 ∈ M such that m1|C = m2|C we have m1x = m2x. An M-set X is called nominal if every element x has a finite M-support.

Nominal M-sets and equivariant maps form a full subcategory of M-Set, denoted by M-Nom. The M-set 𝔸 of atoms is nominal. The powerset 𝒫(X) of a nominal set is not nominal in general; the restriction to finitely supported elements is.
26 If we write a regular arrow →, then we mean a map in the category. Exponent objects will always be denoted by annotated arrows.

Separation and Renaming in Nominal Sets
If M is a group, then the notion of support can be simplified by using inverses. To see this, first note that, given elements g1, g2 ∈ M, the condition g1|C = g2|C can equivalently be written as g2−1g1|C = id|C. Second, the statement g1x = g2x can be expressed as g2−1g1x = x. Hence, C is a support iff g|C = id|C implies gx = x for all g, which is the standard definition for nominal sets over a group (Pitts, 2013). Surprisingly, Gabbay and Hofmann (2008) show that a similar characterisation also holds for Sb-sets. Moreover, recall that every Sb-set is also a Pm-set; the associated notions of support coincide on nominal Sb-sets, as shown by the following result. In particular, this means that the forgetful functor restricts to U : Sb-Nom → Pm-Nom.
Lemma 3. (Theorem 4.8 from Gabbay, 2007) Let X be a nominal Sb-set, x ∈ X, and C ⊂ 𝔸. Then C is an Sb-support of x iff it is a Pm-support of x.
Remark 4. It is not true that any Pm-support is an Sb-support. The condition that X is nominal, in the above lemma, is crucial. Let X = 𝔸 + 1 and define the following Sb-action: m ⋅ a = m(a) if m is injective, m ⋅ a = ∗ if m is non-injective, and m ⋅ ∗ = ∗. This is a well-defined Sb-set, but it is not nominal. Now consider U(X): this is the Pm-set 𝔸 + 1 with the natural action, which is a nominal Pm-set! In particular, as a Pm-set each element has a finite support, but as an Sb-set the supports are infinite.

This counterexample is similar to the “exploding nominal sets” of Gabbay (2007), but even worse behaved. We like to call them nuclear sets, since an element will collapse when hit by a non-injective map, no matter how far away the non-injectivity occurs.
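This collapse can be observed concretely: for the nuclear set 𝔸 + 1, any candidate finite support C is refuted by a non-injective substitution that is the identity on C. A small Python sketch (our own illustration; atoms as integers, ∗ as a string):

```python
STAR = '*'  # the extra point of X = A + 1

def apply(m, a):
    return m.get(a, a)

def injective(m):
    # a finite substitution is injective iff it permutes the atoms it moves
    vals = list(m.values())
    return len(vals) == len(set(vals)) and set(vals) == set(m)

def act(m, x):
    """The Sb-action of Remark 4 on A + 1."""
    if x == STAR:
        return STAR
    return apply(m, x) if injective(m) else STAR

def refute_support(C):
    """Two substitutions agreeing on the finite set C that act differently
    on an atom a in C, showing C is not an Sb-support of a."""
    fresh = max(C) + 1                 # an atom far away from C
    m1 = {}                            # the identity
    m2 = {fresh: fresh + 1}            # non-injective: fresh and fresh+1 collide
    a = min(C)
    assert all(apply(m1, c) == apply(m2, c) for c in C)  # m1|C = m2|C
    return act(m1, a), act(m2, a)
```

Here `refute_support({1, 2, 3})` returns `(1, '*')`: the two actions disagree although the substitutions agree on C, so no finite set supports an atom in this Sb-set.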
For M ∈ {Sb, Pm}, any element x ∈ X of a nominal M-set X has a least finite support (w.r.t. set inclusion). We denote the least finite support of an element x ∈ X by supp(x). Note that by Lemma 3, the set supp(x) is independent of whether a nominal Sb-set X is viewed as an Sb-set or a Pm-set. The dimension of X is given by dim(X) = max{|supp(x)| | x ∈ X}, where |supp(x)| is the cardinality of supp(x).

We list some basic properties of nominal M-sets, which have known counterparts for the case that M is a group (Bojańczyk, et al., 2014), and when M = Sb (Gabbay & Hofmann, 2008).
Lemma 5. Let X be an M-nominal set. If C supports an element x ∈ X, then m ⋅ C supports m ⋅ x for all m ∈ M. Moreover, any g ∈ Pm preserves least supports: g ⋅ supp(x) = supp(gx).

The latter equality does not hold in general for a monoid M. For instance, the exploding nominal renaming sets by Gabbay and Hofmann (2008) give counterexamples for M = Sb.

Lemma 6. Given M-nominal sets X, Y and a map f : X → Y, if f is M-equivariant and C supports an element x ∈ X, then C supports f(x).
The category M-Nom is symmetric monoidal closed, with the product inherited from M-Set, thus simply given by the Cartesian product. The exponent is given by the restriction of the exponent X →M Y in M-Set to the set of finitely supported functions, denoted by X →M_fs Y. This is similar to the exponents of nominal sets with 01-substitutions from Pitts (2014).

Remark 7. Gabbay and Hofmann (2008) give a different presentation of the exponent in M-Nom, based on a certain extension of partial functions. We prefer the previous characterisation, as it is derived in a straightforward way from the exponent in M-Set.
1.2 Separated product

Definition 8. Two elements x, y ∈ X of a Pm-nominal set are called separated, denoted by x # y, if there are disjoint sets C1, C2 ⊂ 𝔸 such that C1 supports x and C2 supports y. The separated product of Pm-nominal sets X and Y is defined as

X ∗ Y = {(x, y) | x # y}.

We extend the separated product to the separated power, defined by X(0) = 1 and X(n+1) = X(n) ∗ X, and the set of separated words X(∗) = ⋃i X(i). The separated product is an equivariant subset X ∗ Y ⊆ X × Y. Consequently, we have equivariant projection maps X ∗ Y → X and X ∗ Y → Y.

Example 9. Two finite sets C, D ⊂ 𝔸 are separated precisely when they are disjoint. An important example is the set 𝔸(∗) of separated words over the atoms: it consists of those words where all letters are distinct.
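Both observations are easy to check mechanically; a small sketch (our own illustration) over a finite sample of atoms:

```python
from itertools import product

def separated_sets(C, D):
    """Finite sets of atoms are separated iff they are disjoint (Example 9)."""
    return not (set(C) & set(D))

def separated_word(w):
    """A word of atoms lies in A^(*) iff all its letters are distinct."""
    return len(w) == len(set(w))

# Over the sample atoms {1,...,4}: 4^3 = 64 words of length 3,
# of which 4 * 3 * 2 = 24 are separated.
atoms = [1, 2, 3, 4]
words = list(product(atoms, repeat=3))
sep_words = [w for w in words if separated_word(w)]
```

The count 24 = 4 · 3 · 2 reflects that a separated word is exactly an injective enumeration of atoms.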
The separated product gives rise to another symmetric closed monoidal structure on Pm-Nom, with 1 as unit, and the exponential object given by the magic wand X −−∗ Y. An explicit characterisation of X −−∗ Y is not needed in the remainder of this chapter, but for a complete presentation we briefly recall the description from Schöpp (2006) (see also the book of Pitts, 2013 and the paper of Clouston, 2013). First, define a Pm-action on the set of partial functions f : X ⇀ Y by (g ⋅ f)(x) = g ⋅ f(g−1 ⋅ x) if f(g−1 ⋅ x) is defined. Now, such a partial function f : X ⇀ Y is called separating if f is finitely supported, f(x) is defined iff f # x, and supp(f) = ⋃x∈dom(f) supp(f(x)) ∖ supp(x). Finally, X −−∗ Y = {f : X ⇀ Y | f is separating}. We refer to the thesis of Schöpp (2006) (Section 3.3.1) for a proof and explanation.

Remark 10. The special case 𝔸 −−∗ Y coincides with [𝔸]Y, the set of name abstractions (Pitts, 2013). The latter is generalised to [X]Y by Clouston (2013), but it is shown there that the coincidence [X]Y ≅ (X −−∗ Y) only holds under strong assumptions (including that X is single-orbit).
Remark 11. An analogue of the separated product does not seem to exist for nominal Sb-sets. For instance, consider the set 𝔸 × 𝔸. As a Pm-set, it has four equivariant subsets: ∅, Δ(𝔸) = {(a, a) | a ∈ 𝔸}, 𝔸 ∗ 𝔸, and 𝔸 × 𝔸. However, the set 𝔸 ∗ 𝔸 is not an equivariant subset when considering 𝔸 × 𝔸 as an Sb-set.
2 A monoidal construction from Pm-sets to Sb-sets

In this section, we provide a free construction, extending nominal Pm-sets to nominal Sb-sets. We use this as a basis to relate the separated product and exponent (in Pm-Nom) to the product and exponent in Sb-Nom. The main results are:

– the forgetful functor U : Sb-Nom → Pm-Nom has a left adjoint F (Theorem 16);
– this F is monoidal: it maps separated products to products (Theorem 17);
– U maps the exponent object in Sb-Nom to the right adjoint −−∗ of the separated product, if the domain has dimension ≤ 1 (Theorem 24, Corollary 25).

Together, these results form the categorical infrastructure to relate nominal languages to separated languages and automata in Section 3.
Definition 12. Given a Pm-nominal set X, we define a nominal Sb-set F(X) as follows. Define the set

F(X) = {(m, x) | m ∈ Sb, x ∈ X}/∼,

where ∼ is the least equivalence relation containing

(m, gx) ∼ (mg, x),
(m, x) ∼ (m′, x)    if m|C = m′|C for a Pm-support C of x,

for all x ∈ X, m, m′ ∈ Sb and g ∈ Pm. The equivalence class of a pair (m, x) is denoted by [m, x]. We define an Sb-action on F(X) as n ⋅ [m, x] = [nm, x].

Well-definedness is proved as part of Proposition 15 below. Informally, an equivalence class [m, x] ∈ F(X) behaves “as if m acted on x”. The first equation of ∼ ensures compatibility with the Pm-action on x, and the second equation ensures that [m, x] only depends on the relevant part of m. The following characterisation of ∼ is useful in proofs.
Lemma 13. We have (m1, x1) ∼ (m2, x2) iff there is a permutation g ∈ Pm such that gx1 = x2 and m1|C = m2g|C, for some Pm-support C of x1.

Remark 14. The first relation of ∼ in Definition 12 comes from the construction of “extension of scalars” in commutative algebra (see Atiyah & MacDonald, 1969). In that context, one has a ring homomorphism f : A → B and an A-module M and wishes
to obtain a B-module. This is constructed by the tensor product B ⊗A M, and it is here that the relation (b, am) ∼ (ba, m) is used (B is a right A-module via f).
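For concrete X, Lemma 13 makes equality in F(X) decidable as soon as least supports are computable. The sketch below (our own illustration) takes X to be the set of words of distinct atoms, where the least support of a word is its set of letters; there the matching permutation g of Lemma 13 is forced letter-by-letter, so [m, w] is represented canonically by the word m(w):

```python
def apply(m, a):
    return m.get(a, a)

def compose(m, n):
    """m . n (apply n first, then m), as dicts recording only moved atoms."""
    moved = set(m) | set(n)
    comp = {a: apply(m, apply(n, a)) for a in moved}
    return {a: b for a, b in comp.items() if a != b}

def eq_class(m, w):
    """Canonical representative of [m, w] in F(X), X = words of distinct atoms.
    By Lemma 13, [m1, w1] = [m2, w2] iff m1 and m2 agree letter-wise with the
    unique g matching w1 to w2, i.e. iff m1(w1) = m2(w2) as words."""
    assert len(w) == len(set(w)), "elements of X have pairwise distinct letters"
    return tuple(apply(m, a) for a in w)

# The generating relation (m, g.w) ~ (mg, w) of Definition 12 holds:
g, m, w = {1: 2, 2: 1}, {1: 5}, (1, 3)
gw = tuple(apply(g, a) for a in w)
assert eq_class(m, gw) == eq_class(compose(m, g), w)
```

This representation also foreshadows Corollary 22: equivalence classes over length-n separated words are just arbitrary n-tuples of atoms.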
Proposition 15. The construction F in Definition 12 extends to a functor

F : Pm-Nom → Sb-Nom,

defined on an equivariant map f : X → Y by F(f)([m, x]) = [m, f(x)] ∈ F(Y).
Proof. We first prove well-definedness and then functoriality.

F(X) is an Sb-set. To this end we check that the Sb-action is well-defined. Let [m1, x1] = [m2, x2] ∈ F(X) and let m ∈ Sb. By Lemma 13, there is some permutation g such that gx1 = x2 and m1|C = m2g|C for some support C of x1. By post-composition with m we get mm1|C = mm2g|C, which means (again by the lemma) that [mm1, x1] = [mm2, x2]. Thus m[m1, x1] = m[m2, x2], which concludes well-definedness.

For associativity and unitality of the Sb-action, we simply note that it is defined directly by left multiplication in Sb, which is associative and unital. This concludes that F(X) is an Sb-set.

F(X) is a nominal Sb-set. Given an element [m, x] ∈ F(X) and a Pm-support C of x, we will prove that m ⋅ C is an Sb-support for [m, x]. Suppose that we have m1, m2 ∈ Sb such that m1|m⋅C = m2|m⋅C. By pre-composition with m we get m1m|C = m2m|C, and this leads us to conclude [m1m, x] = [m2m, x]. So m1[m, x] = m2[m, x] as required.

Functoriality. Let f : X → Y be a Pm-equivariant map. To see that F(f) is well-defined, consider [m1, x1] = [m2, x2]. By Lemma 13, there is a permutation g such that gx1 = x2 and m1|C = m2g|C for some support C of x1. Applying F(f) gives on one hand [m1, f(x1)] and on the other hand [m2, f(x2)] = [m2, f(gx1)] = [m2, gf(x1)] = [m2g, f(x1)] (we used equivariance in the second step). Since m1|C = m2g|C and f preserves supports, we have [m2g, f(x1)] = [m1, f(x1)].

For Sb-equivariance we consider both n ⋅ F(f)([m, x]) = n[m, f(x)] = [nm, f(x)] and F(f)(n ⋅ [m, x]) = F(f)([nm, x]) = [nm, f(x)]. This shows that nF(f)([m, x]) = F(f)(n[m, x]) and concludes that we have a map F(f) : F(X) → F(Y).

The fact that F preserves the identity function and composition follows directly from the definition. □
Theorem 16. The functor F : Pm-Nom → Sb-Nom is left adjoint to the forgetful functor U : Sb-Nom → Pm-Nom, i.e., F ⊣ U.
Proof. We show that, for every nominal set X, there is a map ηX : X → UF(X) with the necessary universal property: for every Pm-equivariant f : X → U(Y) there is a unique Sb-equivariant map f♯ : FX → Y such that U(f♯) ∘ ηX = f. Define ηX by ηX(x) = [id, x]. This is equivariant: g ⋅ ηX(x) = g[id, x] = [g, x] = [id, gx] = ηX(gx). Now, for f : X → U(Y), define f♯([m, x]) = m ⋅ f(x) for x ∈ X and m ∈ Sb. Then U(f♯) ∘ ηX(x) = f♯([id, x]) = id ⋅ f(x) = f(x).

To show that f♯ is well-defined, consider [m1, x1] = [m2, x2] (we have to prove that m1 ⋅ f(x1) = m2 ⋅ f(x2)). By Lemma 13, there is a g ∈ Pm such that gx1 = x2 and m2g|C = m1|C for a Pm-support C of x1. Now C is also a Pm-support for f(x1), and hence it is an Sb-support of f(x1) (Lemma 3). We conclude that m2 ⋅ f(x2) = m2 ⋅ f(gx1) = m2g ⋅ f(x1) = m1 ⋅ f(x1) (we use Pm-equivariance in the second-to-last step and the Sb-support in the last step). Finally, Sb-equivariance of f♯ and uniqueness are straightforward calculations. □

The counit ϵ : FU(Y) → Y is defined by ϵ([m, x]) = m ⋅ x. For the inverse of −♯, let g : F(X) → Y be an Sb-equivariant map; then g♭ : X → U(Y) is given by g♭(x) = g([id, x]). Note that the unit η is a Pm-equivariant map, hence it preserves supports (i.e., any support of x also supports [id, x]). This also means that if C is a support of x, then m ⋅ C is a support of [m, x] (by Lemma 5).
2.1 On (separated) products

The functor F not only preserves coproducts, being a left adjoint, but it also maps the separated product to products:

Theorem 17. The functor F is strong monoidal, from the monoidal category (Pm-Nom, ∗, 1) to (Sb-Nom, ×, 1). In particular, the map p given by

p = ⟨F(π1), F(π2)⟩ : F(X ∗ Y) → F(X) × F(Y)

is an isomorphism, natural in X and Y.
Proof. We prove that p is an isomorphism. It suffices to show that p is injective and surjective. Note that p([m, (x, y)]) = ([m, x], [m, y]).

Surjectivity. Let ([m1, x], [m2, y]) be an element of F(X) × F(Y). We take an element y′ ∈ Y such that y′ # supp(x) and y′ = gy for some g ∈ Pm. Now we have an element (x, y′) ∈ X ∗ Y. By Lemma 5, we have supp(y′) = g ⋅ supp(y). Define the map

m(a) = m1(a) if a ∈ supp(x),    m(a) = m2(g−1(a)) if a ∈ supp(y′),    m(a) = a otherwise.

(Observe that supp(x) # supp(y′), so the cases are not overlapping.) The map m is an element of Sb. Now consider the element z = [m, (x, y′)] ∈ F(X ∗ Y). Applying p to z
gives the element ([m, x], [m, y′]). First, we note that [m, x] = [m1, x] by the definition of m. Second, we show that [m, y′] = [m2, y]. Observe that mg|supp(y) = m2|supp(y) by definition of m. Since supp(y) is a support of y, we have [mg, y] = [m2, y], and since [mg, y] = [m, gy] = [m, y′] we are done. Hence p([m, (x, y′)]) = ([m, x], [m, y′]) = ([m1, x], [m2, y]), so p is surjective.
Injectivity. Let [m1, (x1, y1)] and [m2, (x2, y2)] be two elements. Suppose that they are mapped to the same element, i.e., [m1, x1] = [m2, x2] and [m1, y1] = [m2, y2]. Then there are permutations gx, gy such that x2 = gx x1 and y2 = gy y1. Moreover, let C = supp(x1) and D = supp(y1); then we have m1|C = m2gx|C and m1|D = m2gy|D. In order to show that the two original elements are equal, we have to provide a single permutation g. Define, for z ∈ C ∪ D,

g0(z) = gx(z) if z ∈ C,    g0(z) = gy(z) if z ∈ D.

(Again, C and D are disjoint.) The function g0 is injective since the least supports of x2 and y2 are disjoint. Hence g0 defines a local isomorphism from C ∪ D to g0(C ∪ D). By homogeneity (Pitts, 2013), the map g0 extends to a permutation g ∈ Pm with g(z) = gx(z) for z ∈ C and g(z) = gy(z) for z ∈ D. In particular we get (x2, y2) = g(x1, y1). We also obtain m1|C∪D = m2g|C∪D. This proves that [m1, (x1, y1)] = [m2, (x2, y2)], and so the map p is injective.
Unit and coherence. To show that F preserves the unit, we note that [m, 1] = [m′, 1] for every m, m′ ∈ Sb, as the empty set supports 1 and so m|∅ = m′|∅ vacuously holds. We conclude that F(1) is a singleton. From the definition p([m, (x, y)]) = ([m, x], [m, y]), one can check the coherence axioms by elementary calculations. □
Since F also preserves coproducts (being a left adjoint), we obtain that F maps the set of separated words to the set of all words.

Corollary 18. For any Pm-nominal set X, we have F(X(∗)) ≅ (FX)∗.

As we will show below, the functor F preserves the set 𝔸 of atoms. This is an instance of a more general result about preservation of one-dimensional objects.
Lemma 19. The functors F and U are equivalences on ≤ 1-dimensional objects. Concretely, for X ∈ Pm-Nom and Y ∈ Sb-Nom:

– If dim(X) ≤ 1, then the unit η : X → UF(X) is an isomorphism.
– If dim(Y) ≤ 1, then the co-unit ϵ : FU(Y) → Y is an isomorphism.

Before we prove this lemma, we need the following technical property of ≤ 1-dimensional Sb-sets.
Lemma 20. Let Y be a nominal Sb-set. If an element y ∈ Y is supported by a singleton set (or even the empty set), then

{my | m ∈ Sb} = {gy | g ∈ Pm}.

Proof. Let y ∈ Y be supported by {a} and let m ∈ Sb. Consider b = m(a) and the bijection g = (a b). Now m|{a} = g|{a}, meaning that my = gy. So the set {my | m ∈ Sb} is contained in {gy | g ∈ Pm}. The reverse inclusion is trivial, which means {my | m ∈ Sb} = {gy | g ∈ Pm}. □
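The proof is directly executable on the nominal Sb-set 𝔸 itself, where every atom is supported by a singleton (a sketch under our dict encoding of substitutions):

```python
def apply(m, a):
    return m.get(a, a)

def matching_swap(m, a):
    """For y = a supported by {a}, Lemma 20's proof picks b = m(a) and the
    swap g = (a b); then g agrees with m on {a}, so g.a = m.a."""
    b = apply(m, a)
    return {} if a == b else {a: b, b: a}

m = {1: 7, 2: 3}             # an arbitrary finite substitution
g = matching_swap(m, 1)      # the swap (1 7), a permutation
assert apply(g, 1) == apply(m, 1) == 7
```

The swap realises the effect of an arbitrary substitution on a singleton-supported element, which is exactly the inclusion {my | m ∈ Sb} ⊆ {gy | g ∈ Pm}.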
Proof of Lemma 19. It is easy to see that η : x ↦ [id, x] is injective. To see that η is surjective, let [m, x] ∈ UF(X) and consider a support {a} of x (this is a singleton or empty since dim(X) ≤ 1). Let b = m(a) and consider the swap g = (a b). Now [m, x] = [mg−1, gx], and note that {b} supports gx and mg−1|{b} = id|{b}. We continue with [mg−1, gx] = [id, gx], which shows that gx is the preimage of [m, x]. Hence η is an isomorphism.

To see that ϵ : [m, y] ↦ my is surjective, just consider m = id. To see that ϵ is injective, let [m, y], [m′, y′] ∈ FU(Y) be two elements such that my = m′y′. Then by using Lemma 20 we find g, g′ ∈ Pm such that gy = my = m′y′ = g′y′. This means that y and y′ are in the same orbit (of U(Y)) and have the same dimension. Case 1: supp(y) = supp(y′) = ∅; then [m, y] = [id, y] = [id, y′] = [m′, y′]. Case 2: supp(y) = {a} and supp(y′) = {b}; then supp(gy) = {g(a)} (Lemma 5). In particular we now know that m and g map a to c = g(a); likewise m′ and g′ map b to c. Now [m, y] = [m, g−1g′y′] = [mg−1g′, y′] = [m′, y′], where we used mg−1g′(b) = c = m′(b) in the last step. This means that ϵ is injective and hence an isomorphism. □
By Lemma 19, we may consider the set 𝔸 as both an Sb-set and a Pm-set (abusing notation), and we get an isomorphism F(𝔸) ≅ 𝔸 of nominal Sb-sets. To appreciate the above results, we give a concrete characterisation of one-dimensional nominal sets:

Lemma 21. Let X be a nominal M-set, for M ∈ {Sb, Pm}. Then dim(X) ≤ 1 iff there exist (discrete) sets Y and I such that X ≅ Y + ∐I 𝔸.

In particular, the one-dimensional objects include the alphabets used for data words, consisting of a product S × 𝔸 of a discrete set S of action labels and the set of atoms. These alphabets are very common in the study of register automata (see, e.g., Isberner, et al., 2014).
By the above and Theorem 17, F maps separated powers of 𝔸 to powers, and the set of separated words over 𝔸 to the Sb-set of words over 𝔸.

Corollary 22. We have F(𝔸(n)) ≅ 𝔸n and F(𝔸(∗)) ≅ 𝔸∗.
2.2 On exponents

We have described how F and U interact with (separated) products. In this section, we establish a relationship between the magic wand (−−∗) and the exponent of nominal Sb-sets (→Sb_fs).
Definition 23. Let X ∈ Pm-Nom and Y ∈ Sb-Nom. We define a Pm-equivariant map

ϕ : (X −−∗ U(Y)) → U(F(X) →Sb_fs Y)

by using the composition

F(X −−∗ U(Y)) × F(X) --p−1--> F((X −−∗ U(Y)) ∗ X) --F(ev)--> FU(Y) --ϵ--> Y,

where p−1 is from Theorem 17 and ev is the evaluation map of the exponent −−∗. By Currying and the adjunction we arrive at ϕ:

F(X −−∗ U(Y)) × F(X) → Y
F(X −−∗ U(Y)) → (F(X) →Sb_fs Y)        by Currying
ϕ : (X −−∗ U(Y)) → U(F(X) →Sb_fs Y)    by Theorem 16
With this map we can prove a generalisation of Theorem 16. In particular, the following theorem generalises the one-to-one correspondence between maps X → U(Y) and maps F(X) → Y. First, it shows that this correspondence is Pm-equivariant. Second, it extends the correspondence to all finitely supported maps and not just the equivariant ones.

Theorem 24. The sets X −−∗ U(Y) and U(F(X) →Sb_fs Y) are naturally isomorphic via ϕ as nominal Pm-sets.
Proof. We define some additional maps in order to construct the inverse of ϕ. First, from Theorem 16 we get the following isomorphism:

q : U(X × Y) --≅--> U(X) × U(Y)

Second, with this map and Currying, we obtain the following two natural maps:

U(F(X) →Sb_fs Y) × UF(X) --q−1--> U((F(X) →Sb_fs Y) × F(X)) --U(ev)--> U(Y)
α : U(F(X) →Sb_fs Y) → (UF(X) →Pm_fs U(Y))        by Currying

(UF(X) →Pm_fs U(Y)) × X --id×η--> (UF(X) →Pm_fs U(Y)) × UF(X) --ev--> U(Y)
β : (UF(X) →Pm_fs U(Y)) → (X →Pm_fs U(Y))         by Currying

Last, we note that the inclusion A ∗ B ⊆ A × B induces a restriction map r : (B →Pm_fs C) → (B −−∗ C) (again by Currying). A calculation shows that r ∘ β ∘ α is the inverse of ϕ. □
Note that this theorem gives an alternative characterisation of the magic wand in terms of the exponent in Sb-Nom, if the codomain is U(Y). Moreover, for a 1-dimensional object X in Sb-Nom, we obtain the following special case of the theorem (using the co-unit isomorphism from Lemma 19):

Corollary 25. Let X, Y be nominal Sb-sets. For 1-dimensional X, the nominal Pm-set U(X) −−∗ U(Y) is naturally isomorphic to U(X →Sb_fs Y).
Remark 26. The set 𝔸 −−∗ U(X) coincides with the atom abstraction [𝔸]UX (Remark 10). Hence, as a special case of Corollary 25, we recover Theorem 34 of Gabbay and Hofmann (2008), which states a bijective correspondence between [𝔸]UX and U(𝔸 →Sb_fs X).
3 Nominal and separated automata

In this section, we study nominal automata, which recognise languages over infinite alphabets. After recalling the basic definitions, we introduce a new variant of automata based on the separated product, which we call separated nominal automata. These automata represent nominal languages which are Sb-equivariant, essentially meaning that they are closed under substitution. Our main result is that, if a ‘classical’ nominal automaton (over Pm) represents a language L which is Sb-equivariant, then L can also be represented by a separated nominal automaton. The latter can be exponentially smaller (in number of orbits) than the original automaton, as we show in a concrete example.
Remark 27. We will work with a general output set O instead of just acceptance. The reason for this is that Sb-equivariant functions L : 𝔸∗ → 2 are not very interesting: they are defined purely by the length of the input. By using a more general output O, we may still capture interesting behaviour, e.g., the automaton in Example 29.
Definition 28. Let Σ, O be Pm-sets, called the input and output alphabet, respectively.

– A (Pm-)nominal language is an equivariant map of the form L : Σ∗ → O.
– A nominal (Moore) automaton 𝒜 = (Q, δ, o, q0) consists of a nominal set of states Q, an equivariant transition function δ : Q × Σ → Q, an equivariant output function o : Q → O, and an initial state q0 ∈ Q with an empty support.
– The language semantics is the map l : Q × Σ∗ → O, defined inductively by

l(x, ε) = o(x),    l(x, aw) = l(δ(x, a), w)

for all x ∈ Q, a ∈ Σ and w ∈ Σ∗.
– For l♭ : Q → (Σ∗ →Pm_fs O) the transpose of l, we have that l♭(q0) : Σ∗ → O is equivariant; this is called the language accepted by 𝒜.
Note that the language accepted by an automaton can equivalently be characterised by considering paths through the automaton from the initial state.

If the state space Q and the alphabets Σ, O are orbit-finite, this allows us to run algorithms (reachability, minimization, etc.) on such automata, but there is no need to assume this for now. For an automaton 𝒜 = (Q, δ, o, q0), we define the set of reachable states as the least set R(𝒜) ⊆ Q such that q0 ∈ R(𝒜) and for all x ∈ R(𝒜) and a ∈ Σ, δ(x, a) ∈ R(𝒜).
Example 29. We model a bounded FIFO queue of size n as a nominal Moore automaton, explicitly handling the data in the automaton structure.27 The input alphabet Σ and output alphabet O are as follows:

Σ = {Put(a) | a ∈ 𝔸} ∪ {Pop},    O = 𝔸 ∪ {⊥}.

The input alphabet encodes two actions: putting a new value on the queue and popping a value. The output is either a value (the front of the queue) or ⊥ if the queue is empty. A queue of size n is modelled by the automaton (Q, δ, o, q0) defined as follows.
Q = 𝔸≤n ∪ {⊥},    q0 = ϵ,

δ(a1…ak, Put(b)) = a1…ak b if k < n, and ⊥ otherwise;
δ(a1…ak, Pop) = a2…ak if k > 0, and ⊥ otherwise;
δ(⊥, x) = ⊥;
o(a1…ak) = a1 if k ≥ 1, and ⊥ otherwise;
o(⊥) = ⊥.

The automaton is depicted in Figure 7.1 for the case n = 3. The language accepted by this automaton assigns to a word w the first element of the queue after executing the instructions in w from left to right, and ⊥ if the input is ill-behaved, i.e., Pop is applied to an empty queue or Put(a) to a full queue.
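The automaton and its language semantics l are directly executable. A Python sketch (our own encoding: atoms as strings, ⊥ as the constant BOT, and input letters as ('Put', a) and ('Pop', None)):

```python
BOT = 'bot'  # plays the role of the sink state and the undefined output (⊥)

def make_fifo(n):
    """Transition and output functions of the bounded FIFO automaton of size n.
    States are tuples of atoms of length <= n, plus the sink BOT."""
    def delta(q, letter):
        if q == BOT:
            return BOT
        tag, a = letter
        if tag == 'Put':
            return q + (a,) if len(q) < n else BOT
        if tag == 'Pop':
            return q[1:] if len(q) > 0 else BOT
        raise ValueError(letter)

    def out(q):
        return q[0] if q != BOT and len(q) > 0 else BOT

    return delta, out

def language(n, word):
    """l(q0, word): run from the initial (empty) queue and observe the output."""
    delta, out = make_fifo(n)
    q = ()
    for letter in word:
        q = delta(q, letter)
    return out(q)
```

For instance, `language(3, [('Put', 'a'), ('Put', 'b'), ('Pop', None)])` yields `'b'`, while popping the empty queue or overfilling it yields `BOT`.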
Definition 30. Let Σ, O be Pm-sets. A separated language is an equivariant map of the form Σ(∗) → O. A separated automaton 𝒜 = (Q, δ, o, q0) consists of Q, o and q0 defined as in a nominal automaton, and an equivariant transition function δ : Q ∗ Σ → Q. The separated language semantics of such an automaton is given by the map s : Q ∗ Σ(∗) → O, defined by

s(x, ϵ) = o(x),    s(x, aw) = s(δ(x, a), w)

27 We use a reactive version of the queue data structure which is slightly different from the versions of Isberner, et al. (2014) and Moerman, et al. (2017).
[Figure 7.1 appears here: a state diagram with states ϵ, a, ab, abc (outputs ⊥, a, a, a) connected by Put-transitions, Pop-transitions going back (ab to b, abc to bc), and a sink state ⊥ (output ⊥) with a Σ-loop.]
Figure 7.1 The FIFO automaton from Example 29 with n = 3. The right-most state consists of five orbits, as we can take a, b, c distinct, all the same, or two of them equal in three different ways. Consequently, the complete state space has ten orbits. The output of each state is denoted in the lower part.

for all x ∈ Q, a ∈ Σ and w ∈ Σ(∗) such that x # aw and a # w.

Let s♭ : Q → (Σ(∗) −−∗ O) be the transpose of s. Then s♭(q0) : Σ(∗) → O corresponds to a separated language; this is called the separated language accepted by 𝒜.
By definition of the separated product, the transition function is only defined on a state x and letter a ∈ Σ if x # a. In Example 36 below, we describe the bounded FIFO as a separated automaton, and describe its accepted language.

First, we show how the language semantics of separated nominal automata extends to a language over all words, provided that both the input alphabet Σ and the output alphabet O are Sb-sets.
Definition 31. Let Σ and O be nominal Sb-sets. An Sb-equivariant function L : Σ∗ → O is called an Sb-language.

Notice the difference between an Sb-language L : Σ∗ → O and a Pm-language L′ : (UΣ)∗ → U(O). They are both functions from Σ∗ to O, but the latter is only Pm-equivariant, while the former satisfies the stronger property of Sb-equivariance. Languages over separated words and Sb-languages are connected as follows.
Theorem 32. Suppose Σ, O are both nominal Sb-sets, and suppose dim(Σ) ≤ 1. There is a one-to-one correspondence

S : (UΣ)(∗) → UO    Pm-equivariant
S̄ : Σ∗ → O    Sb-equivariant

between separated languages and Sb-nominal languages. From S̄ to S, this is given by application of the forgetful functor and restricting to the subset of separated words.
For the converse direction, given w = a1…an ∈ Σ∗, let b1, …, bn ∈ Σ be such that w # bi for all i, and bi # bj for all i, j with i ≠ j. Define m ∈ Sb by

m(a) = ai if a = bi for some i, and a otherwise.

Then S̄(a1a2a3⋯an) = m ⋅ S(b1b2b3⋯bn).
Proof. There is the following chain of one-to-one correspondences, from the results of the previous section:

(UΣ)(∗) → UO
F((UΣ)(∗)) → O    by Theorem 16
(FUΣ)∗ → O    by Corollary 18
Σ∗ → O    by Lemma 19    □
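The converse construction is effective: to evaluate the extension S̄ on an arbitrary word, query S on a fresh separated word and rename the answer. A Python sketch (our own illustration, using the simple Pm-equivariant separated language "output the first letter" over O = 𝔸 + 1; atoms as integers, None as the extra point):

```python
def first_letter(w):
    """A Pm-equivariant language on separated words: the first letter, if any."""
    assert len(w) == len(set(w)), "defined on separated words only"
    return w[0] if w else None

def extend(S, w):
    """S-bar(w) = m . S(b1 ... bn), where b1, ..., bn are fresh and pairwise
    separated and m in Sb sends each bi back to ai (Theorem 32)."""
    fresh = max(w, default=0)
    bs = tuple(fresh + i + 1 for i in range(len(w)))  # bi # w and bi # bj
    m = dict(zip(bs, w))                              # m(bi) = ai
    out = S(bs)
    return m.get(out, out)  # the Sb-action of m on the output in A + 1

# The extension agrees with "first letter" on all words, repetitions included:
assert extend(first_letter, (3, 3, 5)) == 3
```

Note that `first_letter` itself rejects words with repeated letters; it is `extend` that produces a total, substitution-closed language from it.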
Thus, every separated automaton over U(Σ), U(O) gives rise to an Sb-language S̄, corresponding to the separated language S accepted by the automaton.
Any nominal automaton 𝒜 restricts to a separated automaton, formally described in Definition 33. It turns out that if the (Pm-)language accepted by 𝒜 is actually an Sb-language, then the restricted automaton already represents this language, as the extension S̄ of the associated separated language S (Theorem 34). Hence, in such a case, the restricted separated automaton suffices to describe the language of 𝒜.
Definition 33. Let i : Q ∗ U(Σ) → Q × U(Σ) be the natural inclusion map. A nominal automaton 𝒜 = (Q, δ, o, q0) induces a separated automaton 𝒜∗, by setting

𝒜∗ = (Q, δ ∘ i, o, q0).
Theorem 34. Suppose Σ, O are both Sb-sets, and suppose dim(Σ) ≤ 1. Let L : (UΣ)∗ →
|
||
UO be the Pm-nominal language accepted by a nominal automaton 𝒜, and suppose L
|
||
is Sb-equivariant. Let S be the separated language accepted by 𝒜∗ . Then L = U(S).
|
||
Proof. If follows from the one-to-one correspondence in Theorem 32: on the bottom
|
||
there are two languages (L and U(S)), while there is only the restriction of L on the
|
||
top. We conclude that L = U(S).
|
||
□
As we will see in Example 36, separated automata allow us to represent Sb-languages in a much smaller way than nominal automata. Given a nominal automaton 𝒜, a smaller separated automaton can be obtained by computing the reachable part of the restriction 𝒜∗. The reachable part of a separated automaton is defined similarly to that of a nominal automaton (but using only the transitions on which δ is defined), and is denoted by R(𝒜∗) as well.
Separation and Renaming in Nominal Sets

Proposition 35. For any nominal automaton 𝒜, we have R(𝒜∗) ⊆ R(𝒜).
The converse inclusion of the above proposition certainly does not hold, as shown by the following example.

Example 36. Let 𝒜 be the automaton modelling a bounded FIFO queue (for some n), from Example 29. The Pm-nominal language L accepted by 𝒜 is Sb-equivariant: it is closed under application of arbitrary substitutions.

The separated automaton 𝒜∗ is given simply by restricting the transition function to Q ∗ Σ, i.e., a Put(a)-transition from a state w ∈ Q exists only if a does not occur in w. The separated language S accepted by this new automaton is the restriction of the nominal language of 𝒜 to separated words. By Theorem 34, we have L = U(S̄). Hence, the separated automaton 𝒜∗ represents L, essentially by closing the associated separated language S under all substitutions.

The reachable part of 𝒜∗ is given by

    R(𝒜∗) = 𝔸^(≤n) ∪ {⊥}.

Clearly, restricting 𝒜∗ to the reachable part does not affect the accepted language. However, while the original state space Q has exponentially many orbits in n, R(𝒜∗) has only n + 1 orbits! Thus, taking the reachable part of 𝒜∗ yields a separated automaton which represents the FIFO language L in a much smaller way than the original automaton.
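The reduction in Example 36 can be mimicked with a finite sample of atoms. The sketch below is a hypothetical rendering, not the thesis's formal construction: states are queue contents over four sample atoms, the error state ⊥ is modelled as None and not recorded, and the separated restriction only allows transitions whose input atom is fresh for the current state.

```python
from itertools import product

n = 3                           # queue capacity
ATOMS = ["a", "b", "c", "d"]    # finite sample of the infinite atom set

def delta(q, act):
    """FIFO transition: states are tuples (queue contents); None is the sink."""
    kind, x = act
    if q is None:
        return None
    if kind == "Put":
        return q + (x,) if len(q) < n else None
    return q[1:] if q and q[0] == x else None   # Get(x)

def separated(q, act):
    """Restriction to Q * Sigma: the input atom must not occur in the state."""
    return act[1] not in q

def reachable(restrict):
    """States reachable from the empty queue (the sink None is not recorded)."""
    seen, frontier = {()}, [()]
    while frontier:
        q = frontier.pop()
        for act in product(("Put", "Get"), ATOMS):
            if restrict and not separated(q, act):
                continue
            q2 = delta(q, act)
            if q2 is not None and q2 not in seen:
                seen.add(q2)
                frontier.append(q2)
    return seen

full, sep = reachable(False), reachable(True)
print(all(len(set(q)) == len(q) for q in sep))  # True: only distinct-letter queues
print(len(sep), len(full))                      # 41 85: far fewer reachable states
```

With the full infinite atom set, the distinct-letter queues of each length k form a single orbit, which is the source of the small orbit count claimed above; with a finite sample we can only observe the smaller raw state count.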
3.1 Separated automata: coalgebraic perspective

Nominal automata and separated automata can be presented as coalgebras on the category of Pm-nominal sets. In this section we revisit the above results from this perspective, and generalise from (equivariant) languages to finitely supported languages. In particular, we retrieve the extension from separated languages to Sb-languages by establishing the set of Sb-languages as a final separated automaton. The latter result follows by instantiating a well-known technique for lifting adjunctions to categories of coalgebras, using the results of Section 2. In the remainder of this section we assume familiarity with the theory of coalgebras; see, e.g., Jacobs (2016) and Rutten (2000).
Definition 37. Let M be a submonoid of Sb, and let Σ, O be nominal M-sets, referred to as the input and output alphabet respectively. We define the functor BM : M-Nom → M-Nom by BM(X) = O × (Σ →fs X), where Σ →fs X denotes the nominal M-set of finitely supported functions. An M-nominal (Moore) automaton is a BM-coalgebra.

A BM-coalgebra can be presented as a nominal set Q together with the pairing

    ⟨o, δ♭⟩ : Q → O × (Σ →fs Q)

of an equivariant output function o : Q → O, and (the transpose of) an equivariant transition function δ : Q × Σ → Q. In case M = Pm, this coincides with the automata of Definition 28, omitting initial states. The language semantics is generalised accordingly, as follows. Given such a BM-coalgebra (Q, ⟨o, δ♭⟩), the language semantics l : Q × Σ∗ → O is given by

    l(x, ε) = o(x),    l(x, aw) = l(δ(x, a), w)

for all x ∈ Q, a ∈ Σ and w ∈ Σ∗.
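The two defining equations of l unfold to a simple loop: consume the word through δ and read off the output at the final state. The sketch below (plain Python, ignoring all nominal structure; the parity automaton is a made-up example) makes this explicit; the transpose l♭ is then the curried form x ↦ (w ↦ l(x, w)).

```python
def semantics(delta, o, x, w):
    """l(x, ε) = o(x) and l(x, a·w) = l(δ(x, a), w), unfolded as a loop."""
    for a in w:
        x = delta(x, a)
    return o(x)

# A hypothetical two-state Moore automaton over Σ = {0, 1}, O = bool,
# whose language l(0, w) is "w contains an odd number of 1s".
delta = lambda q, a: q ^ a
o = lambda q: bool(q)

print(semantics(delta, o, 0, [1, 0, 1, 1]))  # True
```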
Theorem 38. Let M be a submonoid of Sb, and let Σ, O be nominal M-sets. The nominal M-set Σ∗ →fs O extends to a final BM-coalgebra (Σ∗ →fs O, ζ), such that the unique homomorphism from a given BM-coalgebra is the transpose l♭ of the language semantics.
A separated automaton (Definition 30, without initial states) corresponds to a coalgebra for the functor B∗ : Pm-Nom → Pm-Nom given by B∗(X) = O × (Σ −∗ X). The separated language semantics arises by finality.

Theorem 39. The set Σ^(∗) −∗ O is the carrier of a final B∗-coalgebra, such that the unique coalgebra homomorphism from a given B∗-coalgebra (Q, ⟨o, δ⟩) is the transpose s♭ of the separated language semantics s : Q ∗ Σ^(∗) → O (Definition 30).
Next, we provide an alternative final B∗-coalgebra which assigns Sb-nominal languages to states of separated nominal automata. The essence is to obtain a final B∗-coalgebra from the final BSb-coalgebra. In order to prove this, we use a technique to lift adjunctions to categories of coalgebras. This technique occurs regularly in the coalgebraic study of automata (Jacobs et al., 2015; Kerstan et al., 2014; Klin & Rot, 2016).
Theorem 40. Let Σ be a Pm-set, and O an Sb-set. Define B∗ and BSb accordingly, as

    B∗(X) = UO × (Σ −∗ X)    and    BSb(X) = O × (FΣ →fs X).

There is an adjunction F̄ ⊣ Ū in:

    F̄ : CoAlg(B∗) ⇄ CoAlg(BSb) : Ū

where F̄ and Ū coincide with F and U respectively on carriers.

Proof. There is a natural isomorphism λ : B∗U → UBSb given by the composite

    λ : UO × (Σ −∗ UX) → UO × U(FΣ →fs X) → U(O × (FΣ →fs X)),

where the first map is id × ϕ, with ϕ the isomorphism from Theorem 24, and the second is the isomorphism coming from U being a right adjoint. The result now follows from Theorem 2.14 of Hermida and Jacobs (1998). In particular, Ū(X, γ) = (UX, λ⁻¹ ∘ U(γ)). □
Since right adjoints preserve limits, and final objects in particular, we obtain the following. This gives an Sb-semantics of separated automata through finality.
Corollary 41. Let ((FΣ)∗ →fs O, ζ) be the final BSb-coalgebra (Theorem 38). The B∗-coalgebra Ū((FΣ)∗ →fs O, ζ) is final, and is carried by the set (FΣ)∗ →fs O of Sb-nominal languages.
4 Related and future work

Fiore and Turi (2001) described a similar adjunction between certain presheaf categories. However, Staton (2007) argues in his thesis that the use of presheaves allows for many degenerate models, and that one should look at sheaves instead. The category of sheaves is equivalent to the category of nominal sets. Staton transfers the adjunction of Fiore and Turi to the sheaf categories. We conjecture that the adjunction presented in this paper is equivalent, but defined by more elementary means. The monoidal property of F, which is crucial for our application to automata, has not been discussed before.
An interesting line of research is the generalisation to other symmetries by Bojańczyk et al. (2014). In particular, the total order symmetry is relevant, since it allows one to compare elements by their order, as is often done in data words. In this case the symmetries are given by the group of all monotone bijections. Many results of nominal sets generalise to this symmetry. For monotone substitutions, however, the situation seems more subtle. For example, we note that a substitution which maps two values to the same value actually maps all the values in between to that value. Whether the adjunction from Theorem 16 generalises to other symmetries is left as future work.
This research was motivated by learning nominal automata. If we know that a nominal automaton recognises an Sb-language, then we are better off learning a separated automaton directly. From the Sb-semantics of separated automata, it follows that we have a Myhill–Nerode theorem, which means that learning is feasible. We expect that this can be useful, since we can achieve an exponential reduction this way.
Bojańczyk et al. (2014) prove that nominal automata are equivalent to register automata in terms of expressiveness. However, when translating a register automaton with n registers to a nominal automaton, we may get exponentially many orbits. This happens for instance in the FIFO automaton (Example 29). We have shown that the exponential blow-up is avoidable by using separated automata, for this example and in general for Sb-equivariant languages. An open problem is whether the latter requirement can be relaxed, by adding separated transitions only locally in a nominal automaton.
A possible step in this direction is to consider the monad T = UF on Pm-Nom and incorporate it in the automaton model. We believe that this is the hypothesised “substitution monad” from Chapter 5. The monad is monoidal (sending separated products to Cartesian products), and if X is an orbit-finite nominal set, then so is T(X). This means that we can consider nominal T-automata, and we can perhaps determinise them using coalgebraic methods (Silva et al., 2013).
Acknowledgements

We would like to thank Gerco van Heerdt for his useful comments.
Curriculum Vitae

Joshua Moerman was born in 1991 in Utrecht, the Netherlands. After graduating from gymnasium at the Christiaan Huygens College in Eindhoven in 2009, he followed a double bachelor's programme in mathematics and computer science at the Radboud University in Nijmegen. In 2013, he obtained both bachelor's degrees summa cum laude and continued with a master's in mathematics. He obtained the degree of Master of Science in Mathematics summa cum laude in 2015, with a specialisation in algebra and topology.

In February 2015, he started his Ph.D. research under the supervision of Frits Vaandrager, Sebastiaan Terwijn, and Alexandra Silva. This was a joint project between the computer science institute (iCIS) and the mathematics department (part of IMAPP) of the Radboud University. During the four years of his Ph.D. research, he spent a total of six months at University College London, UK.

As of April 2019, Joshua works as a postdoctoral researcher in the group of Joost-Pieter Katoen at the RWTH Aachen, Germany.