\project thesis \startcomponent introduction \startchapter [title={Nominal Techniques and Testing Techniques for Automata Learning}, reference=chap:introduction] \startsection [title={Learning and Testing}] First off, let me sketch the problem. There is some fixed, but unkown to us, language $\lang \subseteq \Sigma^{*}$. This can be a language of natural speech, a property in model checking, a set of traces from a protocol, etc. We wish to infer a description of $\lang$ by only using finitely many words. Note that this is not an easy task, as $\lang$ is often infinite. \todo{Waarom sequentiele data?} Such a learning problem can be stated and solved in a variety of ways. In the applications we do in our research group, we often try to infer a model of a software component. (\in{Chapter}[chap:applying-automata-learning] describes such an application.) In such cases, a learning algorithm can interact with the software. So it makes sense to study a learning paradigm which allows for \emph{queries}, and not just a data set of samples. \footnote{Instead of query learning, people also use the term \emph{active learning}.} A typical query learning framework was established by \citet[DBLP:journals/iandc/Angluin87]. In her framework, the learning algorithm may pose two types of queries to a \emph{teacher} (or \emph{oracle}): \description{Membership queries (MQs)} The learner poses such a query by providing a word $w \in \Sigma^{*}$ to the teacher. The teacher will then reply whether $w \in \lang$ or not. This type of query is often generalised to more general output, so the teacher replies with $\lang(w)$. In some papers, such a query is then called an \emph{output query}. \description{Equivalence queries (EQs)} The learner can provide a hypothesised description of $\lang$. If the hypothesis is correct, the teacher replies with \kw{yes}. If, however, the hypothesis is incorrect, the teacher replies with \kw{no}, together with a counterexample (i.e., a word which is in $\lang$ but not in the hypothesis or vice versa). With these queries, the learner algorithm is supposed to converge to a correct model. This type of learning is hence called \emph{exact learning}. \citet[DBLP:journals/iandc/Angluin87] showed that one can do this efficiently for deterministic finite automata DFAs (when $\lang$ is in the class of regular languages). Another paradigm which is relevant for our type of applications is PAC-learning with membership queries. Here, the algorithm can again use MQs as before, but the EQs are replace by random sampling (by a fixed probability distribution). So the allowed query is: \description{Random sample query (EX)} If the learner poses this query (there are no parameters), the teacher will respond with a random word $w$, together with its label $w \in \lang$. Instead of requiring that the learner exactly learns the model, we only require the following. The learner should \emph{probably} return a model which is \emph{approximate} to the target. This gives the name probably approximately correct (PAC). Note that there are two uncertainties: the probable and the approximate part. Both part are bounded by parameters, so one can determine the confidence. \startremark When using only EQs, only MQs, or only EXs, then there are hardness results for learning DFAs. See the book of \citet[KearnsV94] for such results. \stopremark DFAs can be efficiently learned in the PAC model. In fact, any exact learning algorithm with MQs and EQs can be transformed into a PAC algorithm with MQs (see exercise X in \citenp[KearnsV94]). For this reason, we mostly focus on the former type of learning in this thesis. The transformation from exact learning to PAC learning is implemented by simply \emph{testing} the hypothesis with random samples. The number of samples depend on the required precision (i.e., both parameters in the PAC model). The type of testing we will see in later chapter is slightly different, however. We will often choose the probability distribution ourself and even change it between hypotheses. Although the PAC model does not allow this and we do not acquire the promised bounds, we are able to give other guarantees with those testing methods. \todo{Gekopieerd van test methods paper.} Finite state machine conformance testing is a core topic in testing literature that is relevant for communication protocols and other reactive systems. The combination of conformance testing and automata learning is applied in practice and it proves to be an effective way of finding bugs \cite[Joeri], replacing legacy software \cite[Philips], or inferring models to be used in model checking \cite[MeinkeSindhu]. Several model learning applications: learning embedded controller software \cite[DBLP:conf/icfem/SmeenkMVJ15], learning the TCP protocol \cite[DBLP:conf/cav/Fiterau-Brostean16] and learning the MQTT protocol \cite[DBLP:conf/icst/TapplerAB17]. \stopsection \startsection [title={Nominal Techniques}] In the second part of this thesis, I will present results related to nominal automata. Usually, nominal techniques are introduced in order to solve problems involving name binding in topics like lambda calculus. However, for our applications it makes more sense to focus more on the symmetries (of names) first and show how this is useful in automata theory. To motivate the use of symmetries, we will look at an example of a register auttomaton. In the following automaton we model a (not-so-realistic) login system for a single person. The alphabet consists of the following actions: \startformula\startalign \NC \kw{register}(p) \NR \NC \kw{login}(p) \NR \NC \kw{logout}() \NR \NC \kw{view}() \NR \stopalign\stopformula The \kw{register} action allows one to set a password $p$. This can only be done when the system is initialised. The \kw{login} and \kw{logout} actions speak for themselves and the \kw{view} action allows one to see the secret data (we abstract away from what the user actually gets to see here). A simple automaton with roughly this behaviour is given in \in{Figure}[fig:login-system]. \startplacefigure [title={A simple register automaton. The symbol $\ast$ denotes any input otherwise not specified. The $r$ in states $q_1$ and $q_2$ is a register.}, list={A simple register automaton.}, reference=fig:login-system] \hbox{\starttikzpicture[node distance=4cm] \node[state,initial] (0) {$q_0$}; \node[state with output,right of=0] (1) {$q_1$ \nodepart{lower} $r$}; \node[state with output,right of=1] (2) {$q_2$ \nodepart{lower} $r$}; \path[->] (0) edge [loop below] node [below] {$\ast$ / \checknok} (0) edge node [above, align=center] {$\kw{register}(p)$ / \checkok \\ set $r := p$} (1) (1) edge [loop below] node [below] {$\ast$ / \checknok} (1) edge [bend left, align=center] node [above] {$\kw{login}(p)$ / \checkok \\ if $r = p$} (2) (2) edge [loop below] node [right, align=left] {$\kw{view}()$ / \checkok \\ $\ast$ / \checknok} (2) edge [bend left] node [below] {$\kw{logout}()$ / \checkok} (1); \stoptikzpicture} \stopplacefigure To model the behaviour nicely, we want the domain of passwords to be infinite. After all, one should allow arbitrarily long passwords. This means that a register automaton is actually an automaton over an infinite alphabet. Common algorithms for automata, such as learning, will not work with an infinite alphabet. Any loop which iterates over the alphabet will diverge. In order to cope with this, we will use the \emph{symmetries} present in the alphabet. Let us continue with the example and look at its symmetries. If a person registers with a password \quotation{\kw{hello}} and consequently logins with \quotation{\kw{hello}}, then this is not distinguishable from a person registering and logging in with \quotation{\kw{bye}}. However, a trace $\kw{register}(\kw{hello})\, \kw{login}(\kw{bye})$ is different. This is an example of symmetry: the values \quotation{\kw{hello}} and \quotation{\kw{bye}} can be permuted, or interchanged. Before formalising the permutations, let me remark the the use of symmetries in automata theory is not new. One of the first to use symmetries was \citet[DBLP:conf/alt/Sakamoto97]. He devised an \LStar{} learning algorithm for register automata, much like the one presented in \in{Chapter}[chap:learning-nominal-automata]. The symmetries are crucial to reduce the problem to a finite alphabet and use the regular \LStar{} algorithm. (\in{Chapter}[chap:learning-nominal-automata] shows how to do it with more general symmetries.) Around the same time \citet[FerrariMT05] worked on automata theoretic algorithms for the \pi-calculus. Their approach was based on the same symmetries and they developed a theory of \emph{named sets} to implement their algorithms. Named sets are equivalent to nominal sets. However, nominal sets are defined in a more elementary way. The nominal sets we will soon see are introduced by \citet[GabbayP02] to solve certain problems in name binding in abstract syntaxes. Although this is not really related to automata theory, it was picked up by \citet[DBLP:journals/corr/BojanczykKL14]. They provide an equivalence between register automata and nominal automata. Additionally, they generalise the work on nominal sets to other symmetries. \startsubsection [title={What is a nominal set?}] Before we dive into the relation with automata, we will define the notion of nominal sets. \startdefinition Fix a countable, infinite set $\atoms = \{ a, b, \ldots \}$ of \emph{names} (sometimes called \emph{atoms}). The elements of $\atoms$ bare no relationship to natural numbers, or other standard mathematical entities. Define $\Pm = \{ \pi \colon \atoms \to \atoms \mid \pi \text{ is bijective} \}$ to be the set of permutations of names. Together with function composition, $\Pm$ forms a \emph{group}. For two elements $a$ and $b$ we define a particular bijection $\swap{a}{b} \in \Pm$ which swaps $a$ and $b$ and leaves all other elements fixed. \stopdefinition It is good to stress that the set of names has no other structure defined on it. The names are abstract entities which can be compared for equality, but nothing else. \footnote{We can have more structure on the set of atoms, this is discussed in \in{Section}[].} This also means that although $a$ and $b$ are distinct names, they are interchangeable. If we write $a \in \atoms$, then $a$ can stand for any of the names. So if we write $a, b \in \atoms$, then $a$ and $b$ can refer to the same name, i.e., $a = b$. In other words, we do not adapt the permutative convention by \citet[?]. As alluded to before, we want to have permutations act on objects constructed from names, such as words, states in an automaton and languages. The notion of a group action captures exactly this. In most cases we are interested in the group $\Pm$. However, in order to be general enough for the next chapters, we introduce group actions for an arbitrary group $G$. \todo{Notatie $1$ is groepseenheid, ${\cdot}$ is vermenigvuldiging en werking.} \startdefinition Let $X$ be a set. A (left) \footnote{Many authors use left actions. However, we note that \citet[DBLP:journals/corr/BojanczykKL14] use a right action. For them to have a well-defined group action, their group multiplication has to be defined as $g \cdot f = f \circ g$ (i.e., reverse function composition).} \emph{$G$-action} is a function ${\cdot} \colon G \times X \to X$ satisfying: \startformula\startalign[n=3] \NC 1 \cdot x \NC = x \NC \quad \forall x \in X \NR \NC (g \cdot h) \cdot x \NC = g \cdot (h \cdot x) \NC \quad \forall x \in X, \forall g,h \in G \NR \stopalign\stopformula A set together with a $G$-action, $(X, {\cdot})$, is called a \emph{$G$-set}. \stopdefinition It is worth noting that we generally fix $G$ but we consider many sets with a $G$-action. In a way all these sets will have the same symmetries (namely $G$). Instead of writing $g \cdot x$ we will often write the group action by juxtaposition $g x$. We will often write $X$ instead of $(X, {\cdot})$ when the intended action is clear from the context. \footnote{One should be cautious, as a set often allows for many different $G$-actions.} \startexample We list several examples of group actions. Many of them will be used later in this thesis. \startitemize \item The set $\atoms$ itself admits a natural $\Pm$-action, defined by \startformula \pi \cdot a = \pi(a). \stopformula The two requirements are easily verified by a routine calculation. We will also omit this verification for the upcoming examples. \item The set of words $\atoms^{*}$ has a $\Pm$-action which is defined point-wise: \startformula \pi \cdot a_1 a_2 \ldots a_k = \pi(a_1) \pi(a_2) \ldots \pi(a_k) \stopformula \item Similarly, the set of infinite words $\atoms^{\omega}$ has such a $\Pm$-action: \startformula \pi \cdot a_1 a_2 \ldots = \pi(a_1) \pi(a_2) \ldots \stopformula \item The empty set always admits a unique $G$-action for any $G$. (This is unique since the domain $G \times \emptyset = \emptyset$.) \startformula {\cdot} \colon G \times \emptyset \to \emptyset \stopformula \item The singleton set always admits a unique $G$-action for any $G$. (This is unique since the codomain only has just one element.) \startformula {\cdot} \colon G \times \{*\} \to \{*\} \stopformula \item For any set $X$, we can define a $G$-action by defining \startformula g \cdot x = x \stopformula for all the elements $x \in X$. Such an action is called \emph{trivial}. Note that the action on $\emptyset$ and $\{*\}$ are trivial, but the $\Pm$-actions on $\atoms$, $\atoms^{*}$ and $\atoms^{\omega}$ are not trivial. \item Another interesting non-trivial example is the set $\Pm$ itself. There are three different interesting actions one can define: \startformula\startalign \NC \pi \cdot_{l} \sigma \NC = \pi \sigma \NR \NC \pi \cdot_{r} \sigma \NC = \sigma \pi^{-1} \NR \NC \pi \cdot_{c} \sigma \NC = \pi \sigma \pi^{-1} \NR \stopalign\stopformula Here the group multiplication is written by juxtaposition. The first two actions are \emph{left-multiplication} and \emph{right-multiplication} respectively. The latter is called \emph{conjugation}. For each of them, one can verify the requirements. \stopitemize \stopexample In the above examples, the non-trivial $\Pm$-sets are all infinite. Yet, in a sense, the set $\atoms^{*}$ is bigger than the set $\atoms$. To be able to quantify this, we introduce the notion of an orbit. \startdefinition Given a $G$-set $(X, {\cdot})$ and an element $x \in X$, we define the \emph{orbit of $x$} as the set \startformula \orb(x) = \{ g x \mid g \in G \}. \stopformula \stopdefinition If for two elements $x, y \in X$ we have $\orb(x) = \orb(y)$, then we say that $x$ and $y$ are in the same orbit. This precisely happens if there exists a $g$ such that $g x = y$. The relation of \quotation{being in the same orbit} is an equivalence relation (it is reflexive as a group has an identity element, symmetric because of the inverses and transitive because of composition). This relation partitions the set $X$ in a collection of orbits: \startformula X = \bigcup_{x \in X} \orb(x). \stopformula We can picture orbits in the following way. \todo{PLAATJE} As we wish to represent such sets (in order to run algorithms on them), we are especially interested in orbit-finite sets. For such sets, we can represent the whole set by a collection of its orbits. What remains to be represented are the orbits themselves. An easy way to do is, is to choose a representative of the orbit $x \in \orb(x)$. (Any element will do as the other elements can be constructed via the group action.) \todo{PLAATJE} \startexample We will describe the orbits for some $\Pm$-sets. \startitemize \item For a trivial $G$-set $X$, each element defines its own orbit, since $\orb(x) = \{ g x \mid g \in G \}$ is a singleton set. \item The $\Pm$-set $\atoms$ only has \emph{one orbit}. To see this, take two (distinct) elements $a, b \in \atoms$ and consider the bijection $\pi = \swap{a}{b}$. Then we see that $\pi \cdot a = b$, meaning that $a$ and $b$ are in the same orbit. So $\atoms$ is a single-orbit set. \item Before we tackle $\atoms^{*}$, we will analyse $\atoms^{2}$. The set consists of exactly \emph{two orbits}: \startformula\startalign \NC \{ (a, a) \NC \mid a \in \atoms \} \NR \NC \{ (a, b) \NC \mid a, b \in \atoms, a \neq b \} \NR \stopalign\stopformula This is because a bijection $\pi \in \Pm$ can never send an element of the form $(a, b)$ to an element of the form $(a, a)$ or vice versa. It can, however send any element $(a, b)$ to $(c, d)$ and so on. \item The set $\atoms^{*}$ has \emph{countably many orbits}. Since the action preserves the length of a word, we will show that the set has finitely many orbits for each length. So consider the set $\atoms^{k}$ with the point-wise action. An orbit of $\atoms^{k}$ is precisely determined by specifying which of the $k$ elements are equal to each other. This is a partition of $k$ elements, and there exactly $B_k$, the $k$th Bell number, such partitions. (As we have seen for $k = 2$, the second Bell number is $B_2 = 2$. This quantity grows exponential in $k$.) This shows that the set $\atoms^{*} = \bigcup_k \atoms^{k}$ has countably many orbits. \item The set $\atoms^{\omega}$ has \emph{uncountably many orbits}. To see this, fix two distinct elements $a, b \in \atoms$. Now, let $\sigma \in 2^{\omega}$ be an element of the Cantor space. We define the following sequence $x^{\sigma} \in \atoms^{\omega}$: \startformula\startalign \NC x^{\sigma}_0 \NC = a \NR \NC x^{\sigma}_{i+1} \NC = \startmathcases \NC a, \NC if $\sigma(i) = 0$ \NR \NC b, \NC if $\sigma(i) = 1$ \NR \stopmathcases \NR \stopalign\stopformula Now for two distinct elements $\sigma, \tau \in 2^{\omega}$, the elements $x^{\sigma}$ and $x^{\tau}$ are different. More importantly, their orbits $\orb(x^{\sigma})$ and $\orb(x^{\tau})$ are different too. This shows that there is an injective map from $2^{\omega}$ to the orbits of $\atoms^{\omega}$. This concludes that $\atoms^{\omega}$ has uncountably many orbits. \stopitemize \stopexample Having finitely many orbits is not enough for a finite representation which we can use algorithmically. We need an additional finiteness on the elements of a $G$-set, namely the existence of a \emph{finite support}. In order to define this, we need the notion of a data symmetry. \startdefinition A \emph{data symmetry} is a pair $(\mathcal{D}, G)$, where $\mathcal{D}$ is a structure and $G \leq \Sym(\mathcal{D})$ is a subgroup of the automorphism group of $\mathcal{D}$. \stopdefinition \startdefinition Let $X$ be a $G$-set and $x \in X$. A set $C \subset \mathcal{D}$ \emph{supports} $x$ if for all $g \in G$ with $g|_C = \id|_C$ we have $g \cdot x = x$. A $G$-set $X$ is called \emph{nominal} if every element has a finite support. \stopdefinition In a way, if an element is supported by a finite set $C$, it means that the element is somehow constructed from only the elements in $C$. We can see this from the definition, as changing any element outside of $C$ will leave the element $x$ fixed. \startexample \startitemize \item The sets $\atoms$, $\atoms^{k}$, $\atoms^{*}$ are all nominal. For an element $a_1 a_2 \ldots a_k \in \atoms^{*}$, its support is simply given by $\{a_1, a_2, \ldots, a_k\}$. \item The set $\atoms^{\omega}$ is \emph{not} nominal. To see this, let us order the elements of $\atoms$ as $\atoms = \{ a_1, a_2, a_3, \ldots \}$. Now the element $a_1 a_2 a_3 \in \atoms^{\omega}$ is not finitely supported. \todo{fs subset van $\atoms^{\omega}$?} \item The set $\{ X \subset \atoms \mid X \text{ is not finite nor co-finite} \}$ (with the group action given by direct image) is a single orbit set, but it is not a nominal set. \stopitemize \stopexample These examples show that being orbit-finite and nominal are orthogonal properties. There are $G$-sets which are orbit-finite, but non-nominal. Conversely, there are nominal sets which are not orbit-finite. \startremark The last example above needs a bit more clarification. In the book of \citet[Pitts13], the group of permutations is defined to be \startformula G_{< \omega} = \{ \pi \in \Perm \mid \pi(x) \neq x \text{ for finitely many } x \}. \stopformula This is the subgroup of $\Pm$ of \emph{finite} permutation. The set $\{ X \subset \atoms \mid X \text{ is not finite nor co-finite} \}$ has infinitely many orbits when considered as a $G_{< \omega}$-set, but only one orbit as a $\Pm$-set. This poses the question which group we should consider (for example, \citet[DBLP:journals/corr/BojanczykKL14] use the whole group $\Pm$). For nominal sets, there is no difference: nominal $G_{< \omega}$-sets and nominal $\Pm$-sets are equivalent, as shown by \citet[Pitts13]. It is only for non-nominal sets that we can distinguish them. We will mostly work with the set of all permutations $\Pm$. \stopremark \stopsubsection \stopsection \referencesifcomponent \stopchapter \stopcomponent