\project thesis
|
|
\startcomponent introduction
|
|
|
|
\startchapter
|
|
[title={Introduction},
|
|
reference=chap:introduction]
|
|
|
|
When I was younger, I often learned how to play with new toys by messing around with them, i.e., by pressing buttons at random, observing their behaviour, pressing more buttons, and so on.
|
|
I resorted to the manual -- or to asking \quotation{experts} -- only to confirm my beliefs about how the toy worked.
|
|
Now that I am older, I do mostly the same with new devices, new tools, or new software.
|
|
However, now I know that this is an established computer science technique, called \emph{model learning}.
|
|
|
|
Model learning
|
|
\footnote{There are many names for this type of learning, such as \emph{active automata learning}.
|
|
The generic name \quotation{model learning} is chosen as a counterpoint to model checking.}
|
|
is an automated technique to construct a model -- often a type of \emph{automaton} -- from a black box system.
|
|
The goals of this technique are manifold:
|
|
It can be used to reverse-engineer a system, to find bugs in it, to verify properties of the system, or to understand the system in one way or another.
|
|
It is \emph{not} just random testing: the information learned during the interaction with the system is actively used to guide following interactions.
|
|
Additionally, the information learned can be inspected and analysed.
|
|
|
|
This thesis is about model learning and related techniques.
|
|
In the first part, I present results concerning \emph{black box testing} of automata.
|
|
Testing is a crucial part in learning software behaviour and often remains a bottleneck in applications of model learning.
|
|
In the second part, I show how \emph{nominal techniques} can be used to learn automata over structured infinite alphabets.
|
|
This was directly motivated by work on learning network protocols which rely on identifiers or sequence numbers.
|
|
|
|
But before we get ahead of ourselves, we should first understand what we mean by learning, as learning means very different things to different people.
|
|
In educational science, learning may involve concepts such as teaching, blended learning, and interdisciplinarity.
|
|
Data scientists may think of data compression, feature extraction, and neural networks.
|
|
In this thesis we are mostly concerned with software verification.
|
|
But even in the field of verification several types of learning are relevant.
|
|
|
|
|
|
\startsection
|
|
[title={Model Learning}]
|
|
|
|
In the context of software verification, we often look at stateful computation with inputs and outputs.
|
|
For this reason, it makes sense to look at \emph{words}, or \emph{traces}.
|
|
For an alphabet $\Sigma$, we denote the set of words by $\Sigma^{*}$.
|
|
|
|
The learning problem is defined as follows.
|
|
There is some fixed, but unknown to us, language $\lang \subseteq \Sigma^{*}$.
|
|
This language may define the behaviour of a software component, a property in model checking, a set of traces from a protocol, etc.
|
|
We wish to infer a description of $\lang$ by only using finitely many words.
|
|
Note that this is not an easy task, as $\lang$ is often infinite.
|
|
|
|
Such a learning problem can be stated and solved in a variety of ways.
|
|
In the applications carried out in our research group, we often try to infer a model of a software component.
|
|
(\in{Chapter}[chap:applying-automata-learning] describes such an application.)
|
|
In such cases, a learning algorithm can interact with the software.
|
|
So it makes sense to study a learning paradigm which allows for \emph{queries}, and not just a data set of samples.
|
|
\footnote{Instead of query learning, people also use the term \emph{active learning}.}
|
|
|
|
A typical query learning framework was established by \citet[Angluin87].
|
|
In her framework, the learning algorithm may pose two types of queries to a \emph{teacher}, or \emph{oracle}:
|
|
|
|
\description{Membership queries (MQs)}
|
|
The learner poses such a query by providing a word $w \in \Sigma^{*}$ to the teacher.
|
|
The teacher will then reply whether $w \in \lang$ or not.
|
|
This type of query is often generalised to richer outputs, so that the teacher replies with $\lang(w)$.
|
|
In some papers, such a query is then called an \emph{output query}.
|
|
|
|
\description{Equivalence queries (EQs)}
|
|
The learner can provide a hypothesised description of $\lang$ to the teacher.
|
|
If the hypothesis is correct, the teacher replies with \kw{yes}.
|
|
If, however, the hypothesis is incorrect, the teacher replies with \kw{no} together with a counterexample, i.e., a word which is in $\lang$ but not in the hypothesis or vice versa.
|
|
|
|
By posing many such queries, the learner algorithm is supposed to converge to a correct model.
|
|
This type of learning is hence called \emph{exact learning}.
|
|
\citet[Angluin87] showed that one can do this efficiently for deterministic finite automata (DFAs), that is, when $\lang$ is in the class of regular languages.
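To fix intuition, here is a minimal sketch (in Python; the names are mine, no standard API is implied) of a teacher answering both kinds of queries for a known target DFA.
Equivalence is decided by a breadth-first search of the product automaton, which also yields a shortest counterexample.
Of course, in our applications the target is a black box and no such teacher exists; implementing the queries is discussed later in this section.

\starttyping
from collections import deque

class DFA:
    def __init__(self, alphabet, delta, start, accepting):
        self.alphabet, self.delta = alphabet, delta      # delta: dict keyed by (state, symbol)
        self.start, self.accepting = start, accepting

    def accepts(self, word):
        q = self.start
        for a in word:
            q = self.delta[q, a]
        return q in self.accepting

class Teacher:
    """Answers Angluin-style queries for a *known* target DFA (a sketch)."""
    def __init__(self, target):
        self.target = target

    def mq(self, word):
        # Membership query: is the word in the language?
        return self.target.accepts(word)

    def eq(self, hyp):
        # Equivalence query: breadth-first search of the product automaton
        # for a shortest word on which target and hypothesis disagree.
        start = (self.target.start, hyp.start)
        queue, seen = deque([(start, ())]), {start}
        while queue:
            (qt, qh), w = queue.popleft()
            if (qt in self.target.accepting) != (qh in hyp.accepting):
                return w                       # "no", with counterexample
            for a in self.target.alphabet:
                nxt = (self.target.delta[qt, a], hyp.delta[qh, a])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, w + (a,)))
        return None                            # "yes": hypothesis is correct
\stoptyping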
|
|
|
|
It should be clear why this is called \emph{query learning} or \emph{active learning}.
|
|
The learning algorithm initiates the interaction with the teacher by posing queries; it may construct its own data points and ask for their corresponding labels.
|
|
|
|
Another paradigm which is relevant for our type of applications is \emph{PAC-learning with membership queries}.
|
|
Here, the algorithm can again use MQs as before, but the EQs are replaced by random sampling.
|
|
So the allowed query is:
|
|
|
|
\description{Random sample queries (EXs)}
|
|
If the learner poses this query (there are no parameters), the teacher responds with a random word $w$, together with its label, i.e., whether $w \in \lang$ or not.
|
|
|
|
Instead of requiring that the learner exactly learns the model, we only require the following.
|
|
The learner should \emph{probably} return a model which \emph{approximates} the target.
|
|
This gives the name \emph{probably approximately correct} (PAC).
|
|
Note that there are two uncertainties: the probable and the approximate part.
|
|
Both parts are bounded by parameters, so one can determine the confidence.
|
|
|
|
As with many problems in computer science, we are also interested in the \emph{efficiency} of learning algorithms.
|
|
Instead of measuring time or space, we often analyse the number of queries posed by an algorithm.
|
|
Efficiency often means that we require a polynomial number of queries.
|
|
But polynomial in what?
|
|
The learner has no input, other than the access to a teacher.
|
|
We ask the algorithms to be polynomial in the \emph{size of the target} (i.e., the size of the description which has yet to be learned).
|
|
In the case of PAC learning we also require it to be polynomial in the two parameters for confidence.
|
|
|
|
Deterministic automata can be efficiently learned in the PAC model.
|
|
In fact, any efficient exact learning algorithm with MQs and EQs can be transformed into an efficient PAC algorithm with MQs (see \citenp[KearnsV94], exercise 8.1).
|
|
For this reason, we mostly focus on the former type of learning in this thesis.
|
|
The transformation from exact learning to PAC learning is implemented by simply \emph{testing} the hypothesis with random samples.
|
|
This can be postponed until we actually implement a learning algorithm and apply it.
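Sketched in code, the transformation looks as follows.
The sample size is one textbook choice (constants vary across presentations); the essential point is that it grows with the index $i$ of the equivalence query, so that a union bound over all queries keeps the total failure probability below the confidence parameter.
All names here are hypothetical.

\starttyping
import math

def pac_eq(hyp, mq, sample, i, epsilon, delta):
    """Replace the i-th equivalence query by random testing.

    sample() draws a word from the fixed, unknown distribution and
    mq labels it.  If all draws agree with the hypothesis, we accept:
    the hypothesis is then, with probability at least 1 - delta,
    correct up to an epsilon-fraction of the distribution.
    """
    n = math.ceil((math.log(1 / delta) + (i + 1) * math.log(2)) / epsilon)
    for _ in range(n):
        w = sample()
        if mq(w) != hyp.accepts(w):
            return w       # counterexample: resume exact learning
    return None            # accept the hypothesis
\stoptyping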
|
|
|
|
When using only EQs, only MQs, or only EXs, there are hardness results for learning DFAs.
|
|
So the combinations MQs + EQs and MQs + EXs have been carefully picked.
|
|
They provide a minimal basis for efficient learning.
|
|
See the book of \citet[KearnsV94] for such hardness results and more information on PAC learning.
|
|
|
|
So far, all the queries are assumed to be \emph{just there}.
|
|
Somehow, these are existing procedures which we can invoke with \kw{MQ}$(w)$, \kw{EQ}$(H)$, or \kw{EX}$()$.
|
|
This is a useful abstraction when designing a learning algorithm.
|
|
One can analyse the complexity (in terms of number of queries) independently of how these queries are resolved.
|
|
Nevertheless, at some point in time one has to implement them.
|
|
In our case of learning software behaviour, membership queries are easily implemented:
|
|
Simply provide the word $w$ to a running instance of the software and observe the output.
|
|
\footnote{In reality, it is a bit harder than this.
|
|
There are plenty of challenges to solve, such as timing, choosing your alphabet, choosing the kind of observations to make, and being able to faithfully reset the software.}
|
|
Equivalence queries, however, are in general not doable.
|
|
Even if we have the (machine) code, it is often way too complicated to check equivalence.
|
|
That is why we resort to testing with EX queries.
|
|
The EX query from PAC learning normally assumes a fixed, unknown, distribution.
|
|
In our case, we implement a distribution to test against.
|
|
This cuts both ways:
|
|
On the one hand, it allows us to test only the behaviour we really care about; on the other hand, the results are only as good as our choice of distribution.
|
|
We deviate even further from the PAC-model as we sometimes change our distribution while learning.
|
|
Yet, as applications show, this is a useful way of learning software behaviour.
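For concreteness, a membership (or output) query could be implemented along the following lines.
The protocol, the reset command, and the addresses are all assumptions made up for this sketch; as the footnote above indicates, real setups also need timeouts and a carefully chosen alphabet.

\starttyping
import socket

def output_query(word, host="localhost", port=5000):
    """Run one word on the system under test and collect its outputs.

    Hypothetical conventions: newline-separated symbols, a RESET
    command, and one output line per input symbol.
    """
    with socket.create_connection((host, port)) as conn:
        conn.sendall(b"RESET\n")              # assumed: reliable reset
        outputs = []
        for symbol in word:
            conn.sendall(symbol.encode() + b"\n")
            outputs.append(conn.recv(1024).decode().strip())
        return outputs
\stoptyping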
|
|
|
|
|
|
\stopsection
|
|
\startsection
|
|
[title={Research challenges}]
|
|
|
|
Model learning is far from a solved problem.
|
|
In this thesis, we will mostly see learning of DFAs or Mealy machines.
|
|
Although this is restrictive, as many pieces of software require richer models, it has been successfully applied in many different areas.
|
|
The restrictions include the following.
|
|
\startitemize[after]
|
|
\item The system behaves deterministically.
|
|
\item One can reliably reset the system.
|
|
\item The system can be modelled with a finite state space.
|
|
This also means that the model cannot cope with time or data.
|
|
\item The set of input actions is finite.
|
|
\item One knows when the target is reached.
|
|
\stopitemize
|
|
|
|
\startdescription[title={Research challenge 1: Confidence in the hypothesis.}]
|
|
Having confidence in a learned model is difficult.
|
|
We have PAC guarantees (as discussed before), but sometimes we may require other guarantees.
|
|
For example, we may require the hypothesis to be correct, provided that the real system is implemented with a certain number of states.
|
|
Efficiency is important here:
|
|
We want to obtain those guarantees quickly, and when the hypothesis is wrong, we want to find counterexamples quickly.
|
|
Test generation methods are the topic of the first part of this thesis.
|
|
We will review existing algorithms and discuss new algorithms for test generation.
\stopdescription
|
|
|
|
\startdescription[title={Research challenge 2: Generalisation to infinite alphabets.}]
|
|
Automata over infinite alphabets are very useful for modelling protocols which involve identifiers or timestamps.
|
|
Not only is the alphabet infinite in these cases, the state space is as well, since the values have to be remembered.
|
|
In the second part of this thesis, we will see how nominal techniques can be used to tackle this challenge.
|
|
|
|
Being able to learn automata over an infinite alphabet is not new.
|
|
It has been tackled in the theses of \citet[Aarts14], \citet[Cassel15], and \citet[Fiterau-Brostean18].
|
|
In the first, the problem is solved by considering abstractions, which reduce the alphabet to a finite one.
|
|
These abstractions are automatically refined when a counterexample is presented to the algorithms.
|
|
\citet[Fiterau-Brostean18] extends this approach to cope with \quotation{fresh values}, crucial for protocols such as TCP.
|
|
In the thesis by \citet[Cassel15], another approach is taken.
|
|
The queries are changed to tree queries.
|
|
The approach in my thesis will be based on symmetries, which gives yet another insight into the problem of learning such automata.
|
|
\stopdescription
|
|
|
|
|
|
\stopsection
|
|
\startsection
|
|
[title={Applications of Model Learning}]
|
|
|
|
Since this thesis contains only one real-world application of learning, in \in{Chapter}[chap:applying-automata-learning], I think it is good to mention a few others.
|
|
Although we remain in the context of learning software behaviour, the applications are quite different.
|
|
This is by no means a complete list.
|
|
|
|
\description{Bug finding in protocols.}
|
|
A prominent example is by \citet[Fiterau-Brostean16].
|
|
They learn models of TCP implementations -- both client and server sides.
|
|
Interestingly, they found bugs in the (closed source) Windows implementation.
|
|
Later, \citet[Fiterau-BrosteanH17] also found a bug in the sliding window of the Linux implementation.
|
|
Other protocols have been learned as well, such as the MQTT protocol by \citet[TapplerAB17], TLS by \citet[RuiterP15], and SSH by \citet[Fiterau-BrosteanLPRVV17].
|
|
Many of these applications reveal bugs by learning a model and subsequently applying model checking.
|
|
The combination of learning and model checking was first described by \citet[PeledVY02].
|
|
|
|
\description{Bug finding in smart cards.}
|
|
\citet[AartsRP13] learn the software on smart cards of several Dutch and German banks.
|
|
These cards use the EMV protocol, which is run on the card itself.
|
|
So this is an example of a real black box system, where no other monitoring is possible and no code is available.
|
|
No vulnerabilities were found, although each card had a slightly different protocol.
|
|
Card readers have been learned by \citet[ChaluparPPR14].
|
|
They built a Lego machine which could automatically press the buttons, and they found a security flaw.
|
|
|
|
\description{Regression testing.}
|
|
\citet[HungarNS03] describe the potential of automata learning in regression testing.
|
|
The aim is not to find bugs, but to monitor the development process of a system.
|
|
By considering the differences between models at different stages, one can generate regression tests.
|
|
|
|
\description{Refactoring legacy software.}
|
|
Model learning can also be used in order to verify refactored software.
|
|
\citet[SchutsHV16] have applied this at a project within Philips.
|
|
They learn both an old version and a new version of the same component.
|
|
By comparing the learned models, some differences could be seen.
|
|
This gave developers opportunities to solve problems before replacing the old component by the new one.
|
|
|
|
|
|
\stopsection
|
|
\startsection
|
|
[title={Testing Techniques}]
|
|
|
|
An important step in automata learning is equivalence checking.
|
|
Normally, this is abstracted away and done by an oracle, but we intend to implement such an oracle ourselves for our applications.
|
|
Concretely, the problem we need to solve is that of \emph{conformance checking}\footnote{It is also known as \emph{machine verification} or \emph{fault detection}.} and this was first described by \citet[Moore56].
|
|
|
|
The problem is as follows:
|
|
Given the description of a finite state machine and a black box system, does the system behave exactly as described?
|
|
We wish to determine this by running experiments on the system (as it is black box).
|
|
It should be clear that this is a hopelessly difficult task, as an error can be hidden arbitrarily deep in the system.
|
|
That is why we often assume some knowledge of the system.
|
|
In this thesis we often assume a \emph{bound on the number of states} of the system.
|
|
Under these conditions, \citet[Moore56] already solved the problem.
|
|
Unfortunately, his experiment is exponential in size, or in his own words: \quotation{fantastically large.}
|
|
|
|
Years later, \citet[Chow78, Vasilevskii73] independently designed efficient experiments.
|
|
In particular, the set of experiments is polynomial in size.
|
|
These techniques will be discussed in detail in \in{Chapter}[chap:test-methods].
|
|
More background and other related problems, as well as their complexity results, are covered in the survey by \citet[LeeY94].
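As a small preview of \in{Chapter}[chap:test-methods], the W-method can be sketched in a few lines of Python.
It assumes a state cover (words reaching every state of the specification), a characterisation set $W$ (words separating every pair of its states), and a bound on the number of extra states of the implementation; the function and parameter names are mine.

\starttyping
from itertools import product

def w_method(state_cover, alphabet, W, extra_states):
    """Test suite of the W-method (Chow, Vasilevskii), as a set of
    input words.  All words are represented as tuples of symbols.
    """
    # Transition cover: every state, plus every outgoing transition.
    transition_cover = list(state_cover) + \
        [p + (a,) for p in state_cover for a in alphabet]
    # Middle part: all words up to the assumed number of extra states.
    middle = [m for k in range(extra_states + 1)
                for m in product(alphabet, repeat=k)]
    return {p + m + w for p in transition_cover
                      for m in middle for w in W}
\stoptyping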
|
|
|
|
\todo{I want to give an example (by giving a state machine) of why black box testing is hard.}
|
|
\todo{I want to move some related work from the second chapter to here}.
|
|
|
|
|
|
\stopsection
|
|
\startsection
|
|
[title={Nominal Techniques}]
|
|
|
|
In the second part of this thesis, I will present results related to \emph{nominal automata}.
|
|
Usually, nominal techniques are introduced in order to solve problems involving name binding in topics like lambda calculus.
|
|
However, we use them to deal with \emph{register automata}.
|
|
These are automata which have an infinite alphabet, often thought of as input actions with \emph{data}.
|
|
The control flow of the automaton may actually depend on the data.
|
|
However, the data cannot be used in an arbitrary way as this would lead to many problems being undecidable.
|
|
\footnote{The class of automata with arbitrary data operations is sometimes called \emph{extended finite state machines}.}
|
|
A crucial concept in nominal automata is that of \emph{symmetries}.
|
|
|
|
To motivate the use of symmetries, we will look at an example of a register automaton.
|
|
In the following automaton we model a (not-so-realistic) login system for a single person.
|
|
The alphabet consists of the following actions:
|
|
\startformula\startmathmatrix[n=2, distance=1cm]
|
|
\NC \kw{register}(p) \NC \kw{logout}() \NR
|
|
\NC \kw{login}(p) \NC \kw{view}() \NR
|
|
\stopmathmatrix\stopformula
|
|
The \kw{register} action allows one to set a password $p$.
|
|
This can only be done when the system is initialised.
|
|
The \kw{login} and \kw{logout} actions speak for themselves and the \kw{view} action allows one to see the secret data (we abstract away from what the user actually gets to see here).
|
|
A simple automaton with roughly this behaviour is given in \in{Figure}[fig:login-system].
|
|
We will only informally discuss its semantics for now.
|
|
|
|
\startplacefigure
|
|
[title={A simple register automaton. The symbol $\ast$ denotes any input otherwise not specified. The $r$ in states $q_1$ and $q_2$ is a register.},
|
|
list={A simple register automaton.},
|
|
reference=fig:login-system]
|
|
\hbox{\starttikzpicture[node distance=4cm]
|
|
\node[state,initial] (0) {$q_0$};
|
|
\node[state with output,right of=0] (1) {$q_1$ \nodepart{lower} $r$};
|
|
\node[state with output,right of=1] (2) {$q_2$ \nodepart{lower} $r$};
|
|
|
|
\path[->]
|
|
(0) edge [loop below] node [below] {$\ast$ / \checknok}
|
|
(0) edge node [above, align=center] {$\kw{register}(p)$ / \checkok \\ set $r := p$} (1)
|
|
(1) edge [loop below] node [below] {$\ast$ / \checknok}
|
|
(1) edge [bend left, align=center] node [above] {$\kw{login}(p)$ / \checkok \\ if $r = p$} (2)
|
|
(2) edge [loop below] node [right, align=left] {$\kw{view}()$ / \checkok \\ $\ast$ / \checknok}
|
|
(2) edge [bend left] node [below] {$\kw{logout}()$ / \checkok} (1);
|
|
\stoptikzpicture}
|
|
\stopplacefigure
|
|
|
|
To model the behaviour nicely, we want the domain of passwords to be infinite.
|
|
After all, one should allow arbitrarily long passwords to be secure.
|
|
This means that a register automaton is actually an automaton over an infinite alphabet.
|
|
|
|
Common algorithms for automata, such as learning, will not work with an infinite alphabet.
|
|
Any loop which iterates over the alphabet will diverge.
|
|
In order to cope with this, we will use the \emph{symmetries} present in the alphabet.
|
|
|
|
Let us continue with the example and look at its symmetries.
|
|
If a person registers with a password \quotation{\kw{hello}} and subsequently logs in with \quotation{\kw{hello}}, then this is not distinguishable from a person registering and logging in with \quotation{\kw{bye}}.
|
|
This is an example of symmetry:
|
|
the values \quotation{\kw{hello}} and \quotation{\kw{bye}} can be permuted, or interchanged.
|
|
Note, however, that the trace $\kw{register}(\kw{hello})\, \kw{login}(\kw{bye})$ is different from the two before.
|
|
So we see that we cannot simply collapse the whole alphabet into one equivalence class.
|
|
For this reason, we keep the alphabet infinite and explicitly mention its symmetries.
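The intended behaviour of \in{Figure}[fig:login-system] is small enough to write out as a sketch in Python (my own rendering, not the formal semantics); the assertions spell out the symmetry argument above.

\starttyping
def run(trace):
    """The login automaton of the figure, with one register r."""
    state, r, out = 0, None, []
    for act, *args in trace:
        if state == 0 and act == "register":
            r, state = args[0], 1
            out.append("ok")
        elif state == 1 and act == "login" and args and args[0] == r:
            state = 2
            out.append("ok")
        elif state == 2 and act == "view":
            out.append("ok")
        elif state == 2 and act == "logout":
            state = 1
            out.append("ok")
        else:
            out.append("nok")      # the catch-all transitions
    return out

# Permuting hello <-> bye leaves the observable behaviour unchanged ...
assert run([("register", "hello"), ("login", "hello")]) == \
       run([("register", "bye"), ("login", "bye")]) == ["ok", "ok"]
# ... but the two values cannot be collapsed into one:
assert run([("register", "hello"), ("login", "bye")]) == ["ok", "nok"]
\stoptyping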
|
|
|
|
Before formalising symmetries, let me remark that the use of symmetries in automata theory is not new.
|
|
One of the first to use symmetries was \citet[Sakamoto97].
|
|
He devised an \LStar{} learning algorithm for register automata, much like the one presented in \in{Chapter}[chap:learning-nominal-automata].
|
|
The symmetries are crucial to reduce the problem to a finite alphabet and use the regular \LStar{} algorithm.
|
|
(\in{Chapter}[chap:learning-nominal-automata] shows how to do it with more general symmetries.)
|
|
Around the same time \citet[FerrariMT05] worked on automata theoretic algorithms for the $\pi$-calculus.
|
|
Their approach was based on the same symmetries and they developed a theory of \emph{named sets} to implement their algorithms.
|
|
Named sets are equivalent to nominal sets.
|
|
However, nominal sets are defined in a more elementary way.
|
|
The nominal sets we will soon see were introduced by \citet[GabbayP02] to solve certain problems of name binding in abstract syntax.
|
|
Although this is not really related to automata theory, it was picked up by \citet[BojanczykKL14].
|
|
They provide an equivalence between register automata and nominal automata.
|
|
Additionally, they generalise the work on nominal sets to other symmetries.
|
|
|
|
The symmetries we encounter in this thesis are the following two; other symmetries can be found in the literature. (A small code sketch after the list illustrates the difference between them.)
|
|
The symmetry directly corresponds to the data values (and operations) used in an automaton.
|
|
The data values are often called \emph{atoms}.
|
|
\startitemize[after]
|
|
\item
|
|
The \quotation{equality symmetry}.
|
|
Here the domain can be any countably infinite set.
|
|
We can take, for example, the set of strings used before as the domain from which we draw passwords.
|
|
No further structure is used on this domain, meaning that any value is just as good as any other.
|
|
The symmetries therefore consist of all bijections on this domain.
|
|
\item
|
|
The \quotation{total order symmetry}.
|
|
In this case, we take a countably infinite set with a dense total order.
|
|
Typically, this means we use the rational numbers, $\Q$, as data values and symmetries which respect the ordering.
|
|
\stopitemize
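Concretely, a finite fragment of a symmetry can be checked as follows; this sketch models atoms as strings (equality symmetry) or rationals (total order symmetry), and the function names are made up for illustration.

\starttyping
def extends_in_equality(pairs):
    """A finite map of atoms, given as (x, y) pairs, extends to a
    symmetry (a bijection of the domain) iff it is functional and
    injective."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return len(set(xs)) == len(xs) and len(set(ys)) == len(ys)

def extends_in_total_order(pairs):
    """Over the rationals, the map must moreover be monotone; density
    of the order guarantees that any such finite map extends."""
    srt = sorted(pairs)
    return (extends_in_equality(pairs) and
            all(y1 < y2 for (_, y1), (_, y2) in zip(srt, srt[1:])))

# hello and bye may be swapped in the equality symmetry:
assert extends_in_equality([("hello", "bye"), ("bye", "hello")])
# but 1 -> 2, 3 -> 1 is not order-preserving over the rationals:
assert not extends_in_total_order([(1, 2), (3, 1)])
\stoptyping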
|
|
|
|
|
|
\startsubsection
|
|
[title={What is a nominal set?}]
|
|
|
|
So what exactly is a nominal set?
|
|
I will not define it here and leave the formalities to the corresponding chapters.
|
|
It suffices, for now, to think of nominal sets as abstract sets (often infinite) on which a group of symmetries acts.
|
|
This action makes it possible to interpret the symmetries of the data values in the abstract set.
|
|
For automata, this allows us to talk about symmetries on the state space, the set of transitions, and the alphabet.
|
|
|
|
In order to implement these sets algorithmically, we impose two finiteness requirements.
|
|
Both properties can be expressed using only the group action.
|
|
\startitemize[after]
|
|
\item
|
|
Each element is \emph{finitely supported}.
|
|
A way to think of this requirement is that each element is \quotation{constructed} out of finitely many data values.
|
|
\item
|
|
The set is \emph{orbit-finite}.
|
|
This means that we can choose finitely many elements such that any other element is a permuted version of one of those elements.
|
|
\stopitemize
|
|
|
|
If we wish to model the automaton from \in{Figure}[fig:login-system] as a nominal automaton, then we can simply define the state space as
|
|
\startformula
|
|
Q = \{ q_0 \} \cup \{ q_{1, a} \mid a \in \atoms \} \cup \{ q_{2, a} \mid a \in \atoms \},
|
|
\stopformula
|
|
where $\atoms$ is the set of atoms.
|
|
In this example, $\atoms$ is the set of all possible passwords.
|
|
The set $Q$ is infinite, but satisfies the two finiteness requirements.
|
|
|
|
The upshot of doing this is that the set $Q$ (and transition structure) corresponds directly to the semantics of the automaton.
|
|
We do not have to \emph{encode} how values relate or how they interact.
|
|
Instead, the set (and transition structure) defines all we need to know.
|
|
Algorithms, such as reachability, minimisation, and learning, can be run on such automata, despite the sets being infinite.
|
|
These algorithms can be implemented rather easily, by using a library which implements nominal sets.
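Such a library might store an orbit-finite set as finitely many orbits, each given by a representative support.
The following lines sketch that idea; they are not the actual interface of the library discussed in \in{Chapter}[chap:ordered-nominal-sets], and all names are made up.

\starttyping
from dataclasses import dataclass

@dataclass(frozen=True)
class Elem:
    orbit: str        # finitely many orbit names
    support: tuple    # the finitely many atoms the element is built from

def act(perm, x):
    """Apply (a finite part of) a permutation of atoms to the support."""
    return Elem(x.orbit, tuple(perm.get(a, a) for a in x.support))

# The infinite state space Q of the login automaton, in finite form:
# one orbit without atoms and two orbits with a single atom each.
Q = [Elem("q0", ()), Elem("q1", ("a",)), Elem("q2", ("a",))]

# q_{1,hello} and q_{1,bye} lie in the same orbit:
assert act({"hello": "bye"}, Elem("q1", ("hello",))) == Elem("q1", ("bye",))
\stoptyping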
|
|
|
|
One has to be careful as not all results from automata theory transfer to nominal automata.
|
|
A notable example is the powerset construction, which converts a non-deterministic automaton into a deterministic one.
|
|
The problem here is that the powerset of a set is generally \emph{not} orbit-finite and so the finiteness requirement is not met.
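For instance, already the set of all finite subsets of the atoms fails to be orbit-finite: permutations preserve the size of a subset, so the subsets of size $n$ form a separate orbit for each $n$.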
|
|
Consequently, languages accepted by nominal DFAs are \emph{not} closed under Kleene star, or even concatenation.
|
|
|
|
|
|
\stopsubsection
|
|
\stopsection
|
|
\startsection
|
|
[title={Contributions}]
|
|
|
|
The following chapters are split into two parts.
|
|
\in{Part}[part:testing] contains material about testing techniques, while \in{Part}[part:nominal] is about nominal techniques.
|
|
Each chapter could be read in isolation.
|
|
However, the chapters do get more technical and mathematical -- especially in \in{Part}[part:nominal].
|
|
\todo{Should I be more clear about my contribution?}
|
|
|
|
\startdescription[title={\in{Chapter}[chap:test-methods]: FSM-based test methods.}]
|
|
This chapter introduces test generation methods which can be used for learning.
|
|
The methods are presented in a uniform way, which allows us to give a single proof of completeness for all of them.
|
|
Moreover, the uniform presentation gives room to develop new test generation methods.
|
|
One particular contribution (besides the theoretical framework) is the new \emph{hybrid ADS} test generation method.
|
|
\stopdescription
|
|
|
|
\startdescription[title={\in{Chapter}[chap:applying-automata-learning]: Applying automata learning to embedded control software.}]
|
|
In this chapter we will apply model learning to an industrial case study.
|
|
It is a unique benchmark as it is much bigger than any of the applications seen before (3410 states and 77 inputs).
|
|
This makes it very challenging to learn a model and the main obstacle is finding counterexamples.
|
|
Here, my main contribution is the new test generation method from \in{Chapter}[chap:test-methods].
|
|
|
|
This is based on the following publication:
|
|
|
|
\cite[entry][DBLP:conf/icfem/SmeenkMVJ15].
|
|
\stopdescription
|
|
|
|
\startdescription[title={\in{Chapter}[chap:minimal-separating-sequences]: Minimal separating sequences for all pairs of states.}]
|
|
Continuing on test generation methods, this chapter presents an efficient algorithm to construct separating sequences.
|
|
Not only is the algorithm efficient -- it runs in $\bigO(n \log n)$ time -- it also constructs minimal length sequences.
|
|
The algorithm is based on a minimisation algorithm by \citet[Hopcroft71], but extending it to construct witnesses is non-trivial.
|
|
|
|
This is based on the following publication:
|
|
|
|
\cite[entry][DBLP:conf/lata/SmetsersMJ16].
|
|
\stopdescription
|
|
|
|
\startdescription[title={\in{Chapter}[chap:learning-nominal-automata]: Learning Nominal Automata.}]
|
|
In this chapter, we show how to learn automata over infinite alphabets.
|
|
We do this by translating the \LStar{} algorithm directly to a nominal version, \nLStar{}.
|
|
The correctness proofs mimic the original proofs by \citet[Angluin87].
|
|
Since our new algorithm is close to the original, we are able to translate variants of the \LStar{} algorithm as well.
|
|
In particular, we provide a learning algorithm for nominal non-deterministic automata.
|
|
|
|
This is based on the following publication:
|
|
|
|
\cite[entry][MoermanS0KS17].
|
|
\stopdescription
|
|
|
|
\startdescription[title={\in{Chapter}[chap:ordered-nominal-sets]: Fast computations on ordered nominal sets.}]
|
|
In this chapter, we provide a library to compute with nominal sets.
|
|
We restrict our attention to nominal sets over the total order symmetry.
|
|
This symmetry allows for a rather easy characterisation of orbits, and hence an easy implementation.
|
|
We experimentally show that it is competitive with existing tools, which are based on SMT solvers.
|
|
|
|
This is based on the following publication:
|
|
|
|
\cite[entry][VenhoekMR18].
|
|
\stopdescription
|
|
|
|
\startdescription[title={\in{Chapter}[chap:separated-nominal-automata]: Separated Nominal Automata.}]
|
|
We investigate how to reduce the size of certain nominal automata.
|
|
This is based on the observation that some languages (with outputs) are not just invariant under symmetries, but invariant under arbitrary \emph{transformations}, or \emph{renamings}.
|
|
We define a new type of automaton, the \emph{separated nominal automaton}, and show that they exactly accept those languages which are closed under renamings.
|
|
All of this is shown by using a theoretical framework: we establish a strong relationship between nominal sets on one hand, and nominal renaming sets on the other.
|
|
|
|
This is based on a paper under submission:
|
|
|
|
\cite[entry][MoermanR19].
|
|
\stopdescription
|
|
|
|
Besides these chapters, of which many are based on publications, the author has published the following papers.
|
|
These are not included in this thesis, but a short summary of those papers is presented below.
|
|
|
|
\startdescription[title={Complementing Model Learning with Mutation-Based Fuzzing.}]
|
|
Our group at the Radboud University participated in the RERS challenge.
|
|
This is a challenge where reactive software is provided and researchers have to assess the validity of certain properties (given as LTL specifications).
|
|
We approached this with model learning:
|
|
Instead of analysing the source code, we simply learned the external behaviour, and then used model checking on the learned model.
|
|
This has worked remarkably well, as the models of the external behaviour are not too big.
|
|
Our results were presented at the RERS workshop (ISOLA 2016).
|
|
The report can be found on arXiv:
|
|
|
|
\cite[entry][SmetsersMJV16].
|
|
\stopdescription
|
|
|
|
\startdescription[title={$n$-Complete test suites for IOCO.}]
|
|
In this paper, we investigate complete test suites for labelled transition systems (LTSs), instead of deterministic Mealy machines.
|
|
This is a much harder problem than conformance testing of deterministic systems.
|
|
The system may adversarially avoid certain states the tester wishes to test.
|
|
We provide a test suite which is $n$-complete (provided the implementation is a suspension automaton).
|
|
My personal contribution here is the proof of completeness, which resembles the proof presented in \in{Chapter}[chap:test-methods] closely.
|
|
The conference paper was presented at ICTSS:
|
|
|
|
\cite[entry][BosJM17].
|
|
|
|
An extended version will appear soon:
|
|
|
|
\cite[entry][BosJM18].
|
|
\stopdescription
|
|
|
|
\startdescription[title={Learning Product Automata.}]
|
|
In this article we consider Moore machines with multiple outputs.
|
|
These machines can be decomposed by projecting on each output, resulting in smaller components that can be learned with fewer queries.
|
|
We give experimental evidence that this is a useful technique which can reduce the number of queries substantially.
|
|
This is all motivated by the idea that compositional methods are widely used throughout engineering and that we should use this in model learning.
|
|
This was presented by Vaandrager at ICGI:
|
|
|
|
\cite[entry][Moerman18].
|
|
\stopdescription
|
|
|
|
\stopsection
|
|
\referencesifcomponent
|
|
\stopchapter
|
|
\stopcomponent
|