\startchapter
[title={Introduction},
reference=chap:introduction]

When I was younger, I often learned how to play with new toys by messing about with them, i.e., by pressing buttons at random, observing their behaviour, pressing more buttons, and so on.
I resorted to the manual -- or asked \quotation{experts} -- only to confirm my beliefs on how the toy worked.
Now that I am older, I do mostly the same with new devices, new tools, and new software.
However, now I know that this is an established computer science technique, called \emph{model learning}.

Model learning
\footnote{There are many names for this type of learning, such as \emph{active automata learning}.
The generic name \quotation{model learning} is chosen as a counterpoint to model checking.}
is an automated technique to construct a state-based model -- often a type of \emph{automaton} -- from a black box system.
The goal of this technique can be manifold:
it can be used to reverse-engineer a system, to find bugs in it, to verify properties of the system, or to understand the system in one way or another.
It is \emph{not} just random testing: the information learned during the interaction with the system is actively used to guide following interactions.

This thesis is about model learning and related techniques.
In the first part, I present results concerning \emph{black box testing} of automata.
Testing is a crucial part in learning software behaviour and often remains a bottleneck in applications of model learning.
In the second part, I show how \emph{nominal techniques} can be used to learn automata over structured infinite alphabets.
The study on nominal automata was directly motivated by work on learning network protocols which rely on identifiers or sequence numbers.

But before we get ahead of ourselves, we should first understand what we mean by learning, as learning means very different things to different people.
In educational science, learning may involve concepts such as teaching, blended learning, and interdisciplinarity.
Data scientists may think of data compression, feature extraction, and neural networks.
In this thesis we are mostly concerned with software verification.
But even in the field of verification several types of learning are relevant.

\startsection
[title={Model Learning}]

In the context of software verification, we often look at stateful computations with inputs and outputs.
For this reason, it makes sense to look at \emph{words}, or \emph{traces}.
For an alphabet $\Sigma$, we denote the set of words by $\Sigma^{*}$.

There is some fixed, but unknown, language $\lang \subseteq \Sigma^{*}$.
This language may define the behaviour of a software component, a property in model checking, a set of traces from a protocol, etc.
We wish to infer a description of $\lang$ by only having observed a small part of this language.
For example, we may have seen a hundred words belonging to the language and a few which do not belong to it.
Then concluding with a good description of $\lang$ is difficult, as we are missing information about the infinitely many words we have not observed.

Such a learning problem can be stated and solved in a variety of ways.
In the applications we pursue in our research group, we often try to infer a model of a software component.
(\in{Chapter}[chap:applying-automata-learning] describes such an application.)
In these cases, a learning algorithm can interact with the software.
So it makes sense to study a learning paradigm which allows for \emph{queries}, and not just a data set of samples.

A typical query learning framework was established by \citet[Angluin87].
In her framework, the learning algorithm may pose two types of queries to a \emph{teacher}, or \emph{oracle}:

\description{Membership queries (MQs)}
The learner poses such a query by providing a word $w \in \Sigma^{*}$ to the teacher.
The teacher will then reply whether $w \in \lang$ or not.
This type of query is often generalised to arbitrary outputs: in these cases we consider a map $\lang \colon \Sigma^{*} \to O$ and the teacher replies with the output $\lang(w)$.
In some papers, such a query is then called an \emph{output query}.

\description{Equivalence queries (EQs)}
The learner poses such a query by providing a hypothesised description $H$ of $\lang$ to the teacher.
If the hypothesis is correct, the teacher replies with \kw{yes}.
If, however, the hypothesis is incorrect, the teacher replies with \kw{no} together with a counterexample, i.e., a word on which $H$ and $\lang$ disagree.

By posing many such queries, the learning algorithm is supposed to converge to a correct model.
This type of learning is hence called \emph{exact learning}.
\citet[Angluin87] showed that one can do this efficiently for deterministic finite automata (DFAs), when $\lang$ is in the class of regular languages.
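
To make this query interface concrete, here is a minimal sketch in Python. Everything in it -- the class names, the DFA encoding, and the bounded search in the equivalence query -- is an assumption made for illustration; an exact teacher for regular languages would decide equivalence symbolically, for instance on a product automaton.

\starttyping
from itertools import product

class DFA:
    """A deterministic finite automaton over a finite alphabet."""
    def __init__(self, initial, accepting, delta):
        self.initial = initial      # initial state
        self.accepting = accepting  # set of accepting states
        self.delta = delta          # dict: (state, symbol) -> state

    def accepts(self, word):
        state = self.initial
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting

class Teacher:
    """Answers MQs and EQs for a secret regular language."""
    def __init__(self, target, alphabet):
        self.target = target
        self.alphabet = alphabet

    def membership_query(self, word):
        return self.target.accepts(word)

    def equivalence_query(self, hypothesis, max_length=8):
        # Illustration only: search for a counterexample up to a
        # bounded length instead of deciding equivalence exactly.
        for n in range(max_length + 1):
            for word in product(self.alphabet, repeat=n):
                if hypothesis.accepts(word) != self.target.accepts(word):
                    return word  # "no", together with a counterexample
        return None              # "yes"
\stoptyping

A learning algorithm would interleave calls to \type{membership_query}, to build a hypothesis, with calls to \type{equivalence_query}, to validate it.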

It should be clear why this is called \emph{query learning} or \emph{active learning}.
The learning algorithm initiates interaction with the teacher by posing queries; it may construct its own data points and ask for their corresponding labels.
Active learning is in contrast to \emph{passive learning} where all observations are given to the algorithm up front.

Another paradigm which is relevant for our type of applications is \emph{PAC-learning with membership queries}.
Here, the algorithm can again use MQs as before, but the EQs are replaced by random sampling:
with an \emph{EX query}, the learner receives a random word, drawn from a fixed but unknown probability distribution, together with its label according to $\lang$.
Instead of requiring that the learner exactly learns the model, we only require the following.
The learner should \emph{probably} return a model which is \emph{approximate} to the target.
This gives the name \emph{probably approximately correct} (PAC).
Note that there are two uncertainties: the probable and the approximate part.
Both parts are bounded by parameters, so one can determine the confidence.
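
For intuition, the following sketch shows the usual arithmetic behind these parameters when the $i$-th equivalence query is simulated by random samples: the sample sizes grow slowly with $i$, so that \emph{all} tests together fail with probability at most $\delta$. This is the textbook bound, shown for illustration; it is not necessarily the exact bound used later in this thesis.

\starttyping
import math

def samples_for_eq_test(epsilon, delta, i):
    """Number of random words for the i-th simulated equivalence query,
    so that a hypothesis with error greater than epsilon survives some
    test with probability at most delta (failure budget delta * 2^-(i+1)
    per test)."""
    return math.ceil((math.log(1 / delta) + (i + 1) * math.log(2)) / epsilon)

# Accuracy 1%, confidence 99%, third hypothesis:
print(samples_for_eq_test(0.01, 0.01, 2))  # -> 669 random words
\stoptyping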

As with many problems in computer science, we are also interested in the \emph{efficiency} of learning algorithms.
Instead of measuring time or space, we analyse the number of queries posed by an algorithm.

Such a teacher is a useful abstraction when designing a learning algorithm.
One can analyse the complexity (in terms of number of queries) independently of how these queries are resolved.
Nevertheless, at some point in time one has to implement them.
In our case of learning software behaviour, membership queries are easily implemented:
simply provide the word $w$ to a running instance of the software and observe the output.
\footnote{In reality, it is a bit harder than this.
There are plenty of challenges to solve, such as timing, choosing your alphabet, choosing the kind of observations to make, and being able to reliably reset the software.}
Equivalence queries, however, are in general not possible to implement.
Even if we have the (machine) code, it is often far too complicated to check equivalence.
That is why we resort to testing with EX queries.
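
Concretely, such testing can look like the following sketch, where an equivalence query is approximated by sampling random words. The function names and the naive choice of distribution (geometric lengths, uniform symbols) are mine, purely for illustration.

\starttyping
import random

def random_word(alphabet, stop_prob=0.1, rng=random):
    """Sample a word: geometric length, uniformly chosen symbols."""
    word = []
    while rng.random() > stop_prob:
        word.append(rng.choice(alphabet))
    return tuple(word)

def approximate_eq(system, hypothesis, alphabet, num_tests=1000):
    """Approximate an equivalence query by random testing; system and
    hypothesis are functions from words to outputs."""
    for _ in range(num_tests):
        word = random_word(alphabet)
        if system(word) != hypothesis(word):
            return word  # counterexample found
    return None          # passed, but only w.r.t. the chosen distribution
\stoptyping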

The EX query from PAC learning normally assumes a fixed, unknown, distribution.
In our case, we choose and implement a distribution to test against.
This cuts both ways:
on the one hand, it allows us to only test behaviour we really care about; on the other hand, the results are only as good as our choice of distribution.
We deviate even further from the PAC-model as we sometimes change our distribution while learning.
Yet, as applications show, this is a useful way of learning software behaviour.

\stopsection
\startsection
[title={Research challenges}]

In this thesis, we will mostly see learning of deterministic automata or Mealy machines.
Although this is restrictive, as many pieces of software require richer models, it has been successfully applied in many different areas.
The restrictions include the following.
\startitemize[after]
\item The system behaves deterministically.
\item The system can be modelled with finitely many states.
This also means that the model does not incorporate time or data.
\item One knows when the target is reached.
\stopitemize

\description{Research challenge 1: Approximating equivalence queries.}
Having confidence in a learned model is difficult.
We have PAC guarantees (as discussed before), but sometimes we may want to draw other conclusions.
For example, we may require the hypothesis to be correct, provided that the real system is implemented with a certain number of states.
We will review existing algorithms and discuss new algorithms for test generation.

\startdescription[title={Research challenge 2: Generalisation to infinite alphabets.}]
Automata over infinite alphabets are very useful for modelling protocols which involve identifiers or timestamps.
Not only is the alphabet infinite in these cases, the state space is as well, since the values have to be remembered.
In the second part of this thesis, we will see how nominal techniques can be used to tackle this challenge.

Being able to learn automata over an infinite alphabet is not new.
It has been tackled, for instance, by \citet[HowarSJC12], \citet[BolligHLM13], and in the theses of \citet[Aarts14], \citet[Cassel15], and \citet[Fiterau-Brostean18].
In the first thesis, the problem is solved by considering abstractions, which reduce the alphabet to a finite one.
These abstractions are automatically refined when a counterexample is presented to the algorithm.
\citet[Fiterau-Brostean18] extends this approach to cope with \quotation{fresh values}, crucial for protocols such as TCP.
In the thesis by \citet[Cassel15], another approach is taken.
The queries are changed to tree queries.
The approach in my thesis will be based on symmetries, which gives yet another insight into the problem of learning such automata.
\stopdescription

\stopsection
\startsection
[title={Applications of Model Learning}]

Since this thesis only contains one real-world application of learning in \in{Chapter}[chap:applying-automata-learning], it is good to mention a few others.
Although we remain in the context of learning software behaviour, the applications are quite different.
This is by no means a complete list.

This gave developers opportunities to solve problems before replacing the old component.

\stopsection
\startsection
[title={Black Box Testing}]

An important step in automata learning is equivalence checking.
Normally, this is abstracted away and done by an oracle, but we intend to implement such an oracle ourselves for our applications.

In particular, the set of experiments is polynomial in size.
These techniques will be discussed in detail in \in{Chapter}[chap:test-methods].
More background and other related problems, as well as their complexity results, are well explained in a survey by \citet[LeeY94].

To give an example of conformance checking, we model a record player as a finite state machine.
We will not model the audible output -- that would depend not only on the device, but also the record one chooses to play\footnote{In particular, we have to add time to the model as one side of a record only lasts for roughly 25 minutes. Unless we take a record with sound on the locked groove such as the \emph{Sgt. Pepper's Lonely Hearts Club Band} album by \emph{The Beatles}.}.
Instead, the only observation we can make is how fast the turntable spins.
The device has two buttons: a start-stop button (\playstop) and a speed button (\spin) which toggles between $33 \frac{1}{3}$ rpm and $45$ rpm.

(tr) edge [bend right] node [left] {\playstop} (br)
(br) edge [bend right] node [right] {\playstop} (tr)
(br) edge [loop below, color=red] node [above] {\spin} (br);
\stoptikzpicture}}{(a)}
{\hbox{\starttikzpicture[node distance=3cm, bend angle=10]
\node[state, initial above, align=center] (tl) {slow\\spinning};
\node[state, right of=tl, align=center] (tr) {fast\\spinning};

(tr) edge [color=red] node [below right] {\playstop} (bl)
(br) edge [bend right] node [right] {\playstop} (tr)
(br) edge [bend right] node [above] {\spin} (bl);
\stoptikzpicture}}{(b)}
\stopcombination
\stopplacefigure
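
For concreteness, the record player can also be coded as a small Mealy-style machine. The sketch below is one possible formalisation in Python: the state encoding (two booleans) and the convention that we observe the turntable speed after every button press are my own reading of the example, not taken from the figures.

\starttyping
def record_player(word):
    """One possible record player model: state = (playing, fast).
    After each button press we observe the speed of the turntable."""
    playing, fast = False, False  # assumed initial state: stopped, 33 rpm
    outputs = []
    for button in word:
        if button == "playstop":
            playing = not playing
        elif button == "spin":
            fast = not fast
        outputs.append("45" if playing and fast else "33" if playing else "0")
    return outputs

print(record_player(["playstop", "spin", "playstop", "playstop"]))
# -> ['33', '45', '0', '45']
\stoptyping

Running input sequences like these on both a specification and an implementation, and comparing the resulting output sequences, is precisely what a conformance test suite does; the testing methods of \in{Chapter}[chap:test-methods] are about choosing such sequences cleverly.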

\stopsection
\startsection
[title={Nominal Techniques}]

In the second part of this thesis, I will present results related to \emph{nominal automata}.
Usually, nominal techniques are introduced in order to solve problems involving name binding in topics like lambda calculus.
However, we use them in automata theory, specifically to model \emph{register automata}.
These are automata which have an infinite alphabet, often thought of as input actions with \emph{data}.
The control flow of the automaton may actually depend on the data.
However, the data cannot be used in an arbitrary way as this would lead to many problems, such as emptiness, being undecidable.
\footnote{The class of automata with arbitrary data operations is sometimes called \emph{extended finite state machines}.}
A principal concept in nominal techniques is that of \emph{symmetries}.

To motivate the use of symmetries, we will look at an example of a register automaton.
In the following automaton we model a (not-so-realistic) login system for a single person.
The alphabet consists of the following actions:
\startformula\startmathmatrix[n=2, distance=1cm]
\NC \kw{sign-up}(p) \NC \kw{logout}() \NR
\NC \kw{login}(p) \NC \kw{view}() \NR
\stopmathmatrix\stopformula
The \kw{sign-up} action allows one to set a password $p$.
This can only be done when the system is initialised.
The \kw{login} and \kw{logout} actions speak for themselves and the \kw{view} action allows one to see the secret data (we abstract away from what the user actually gets to see here).
A simple automaton with roughly this behaviour is given in \in{Figure}[fig:login-system].
We will only informally discuss its semantics for now.

\path[->]
(0) edge [loop below] node [below] {$\ast$ / \checknok}
(0) edge node [above, align=center] {$\kw{sign-up}(p)$ / \checkok \\ set $r := p$} (1)
(1) edge [loop below] node [below] {$\ast$ / \checknok}
(1) edge [bend left, align=center] node [above] {$\kw{login}(p)$ / \checkok \\ if $r = p$} (2)
(2) edge [loop below] node [right, align=left] {$\kw{view}()$ / \checkok \\ $\ast$ / \checknok}

\stoptikzpicture}
\stopplacefigure
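
The informal semantics of this register automaton can be made precise with a small operational sketch in Python. The state names below, and the assumption that \kw{logout} returns the system to the initialised state while keeping the stored password, are my interpretation of the figure.

\starttyping
def login_system(trace):
    """A register automaton: three control states plus one register r.
    Every action is answered with an 'ok' or 'nok' observation."""
    state, r = "initial", None
    outputs = []
    for action, *args in trace:
        if state == "initial" and action == "sign-up":
            state, r = "initialised", args[0]
            outputs.append("ok")
        elif state == "initialised" and action == "login" and args[0] == r:
            state = "logged-in"
            outputs.append("ok")
        elif state == "logged-in" and action == "view":
            outputs.append("ok")
        elif state == "logged-in" and action == "logout":
            state = "initialised"
            outputs.append("ok")
        else:
            outputs.append("nok")  # the self-loops labelled with an asterisk
    return outputs

print(login_system([("sign-up", "hello"), ("login", "bye"),
                    ("login", "hello"), ("view",)]))
# -> ['ok', 'nok', 'ok', 'ok']
\stoptyping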

To model the behaviour, we want the domain of passwords to be infinite.
After all, one should allow arbitrarily long passwords to be secure.
This means that a register automaton is actually an automaton over an infinite alphabet.

Common algorithms for automata, such as learning, will not work with an infinite alphabet.
Any loop which iterates over the alphabet will diverge.
In order to cope with this, we will use the \emph{symmetries} present in the alphabet.

Let us continue with the example and look at its symmetries.
If a person signs up with a password \quotation{\kw{hello}} and subsequently logs in with \quotation{\kw{hello}}, then this is not distinguishable from a person signing up and logging in with \quotation{\kw{bye}}.
This is an example of symmetry:
the values \quotation{\kw{hello}} and \quotation{\kw{bye}} can be permuted, or interchanged.
Note, however, that the trace $\kw{sign-up}(\kw{hello})\, \kw{login}(\kw{bye})$ is different from the two before:
no permutation of \quotation{\kw{hello}} and \quotation{\kw{bye}} will bring us to a logged-in state with that trace.
So we see that, despite the symmetry, we cannot simply identify the values \quotation{\kw{hello}} and \quotation{\kw{bye}}.
For this reason, we keep the alphabet infinite and explicitly mention its symmetries.
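
This symmetry can be demonstrated on the operational sketch from before, reusing the hypothetical \type{login_system} function: applying a bijection of passwords to a whole trace never changes the observed outputs, whereas changing only part of a trace may.

\starttyping
def permute_trace(trace, perm):
    """Apply a permutation of atoms (a dict on passwords) to a trace."""
    return [(action, *(perm.get(a, a) for a in args))
            for action, *args in trace]

trace = [("sign-up", "hello"), ("login", "hello"), ("view",)]
perm = {"hello": "bye", "bye": "hello"}

# Permuting the whole trace: indistinguishable behaviour.
assert login_system(trace) == login_system(permute_trace(trace, perm))

# Replacing only the login value: observably different behaviour.
assert login_system([("sign-up", "hello"), ("login", "bye")]) \
    != login_system(trace[:2])
\stoptyping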

Using symmetries in automata theory is not a new idea.
One of the first to use symmetries was \citet[Sakamoto97].
He devised an \LStar{} learning algorithm for register automata, much like the one presented in \in{Chapter}[chap:learning-nominal-automata].
The symmetries are crucial to reduce the problem to a finite alphabet and use the regular \LStar{} algorithm.
Another early approach, based on the same symmetries, developed a theory of \emph{named sets}.
Named sets are equivalent to nominal sets.
However, nominal sets are defined in a more elementary way.
The nominal sets we will soon see were introduced by \citet[GabbayP02] to solve certain problems of name binding in abstract syntax.
Although this is not really related to automata theory, it was picked up by \citet[BojanczykKL14],
who provide an equivalence between register automata and nominal automata.
Additionally, they generalise the work on nominal sets to other symmetries.

The symmetries we encounter in this thesis are listed below, but other symmetries can be found in the literature.
The symmetry directly corresponds to the data values (and operations) used in an automaton.
The data values are often called \emph{atoms}.
\startitemize[after]
\item
The \quotation{equality symmetry}.
Here the domain can be any countably infinite set.
We can take, for example, the set of strings we used before as the domain from which we take passwords.
No further structure is used on this domain, meaning that any value is just as good as any other.
The symmetries therefore consist of all bijections on this domain.
\item
The \quotation{total order symmetry}.
Here the domain is a countable, dense, totally ordered set, such as the rational numbers, and the symmetries consist of all monotone bijections on this domain.

The upshot of doing this is that the set $Q$ (and transition structure) corresponds directly to the system we are modelling.
We do not have to \emph{encode} how values relate or how they interact.
Instead, the set (and transition structure) defines all we need to know.
Algorithms, such as reachability, minimisation, and learning, can be run on such automata, despite the sets being infinite.
These algorithms can be implemented rather easily by using libraries such as \NLambda{}, \LOIS{}, or \ONS{} from \in{Chapter}[chap:ordered-nominal-sets].
These libraries implement a data structure for nominal sets, and provide ways to iterate over such (infinite) sets.
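
To give a flavour of how a library can represent an infinite set finitely, here is a toy example under the equality symmetry; this is not the actual interface of \NLambda{}, \LOIS{}, or \ONS{}. An equivariant subset of pairs of atoms can only contain whole orbits, and there are exactly two orbits of pairs -- the equal ones and the distinct ones -- so two booleans describe the entire (infinite) set.

\starttyping
class EquivariantPairSet:
    """Toy orbit-wise representation of a subset of Atoms x Atoms under
    the equality symmetry. Each orbit is wholly in or wholly out."""
    def __init__(self, contains_equal, contains_distinct):
        self.contains_equal = contains_equal        # orbit {(a, a)}
        self.contains_distinct = contains_distinct  # orbit {(a, b), a != b}

    def __contains__(self, pair):
        a, b = pair
        return self.contains_equal if a == b else self.contains_distinct

    def orbits(self):
        # Iterate over orbit representatives, not the infinite set itself.
        if self.contains_equal:
            yield ("a", "a")
        if self.contains_distinct:
            yield ("a", "b")

diagonal = EquivariantPairSet(True, False)
print((3, 3) in diagonal, (3, 5) in diagonal)  # -> True False
\stoptyping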

One has to be careful as not all results from automata theory transfer to nominal automata.
A notable example is the powerset construction, which converts a non-deterministic automaton into a deterministic one.
The problem here is that the powerset of a set is generally \emph{not} orbit-finite and so the finiteness requirement is not met.
Consequently, languages accepted by nominal DFAs are \emph{not} closed under Kleene star, or even concatenation.

\stopsubsection

\stopsection
\startsection
[title={Contributions}]

This thesis is split into two parts.
\in{Part}[part:testing] contains material about black box testing, while \in{Part}[part:nominal] is about nominal techniques.
The chapters can be read in isolation.
However, the chapters do get more technical and mathematical -- especially in \in{Part}[part:nominal].

Detailed discussion on the contributions, relations to other research, and future directions of research are presented in each chapter.

\startdescription[title={\in{Chapter}[chap:test-methods]: FSM-based test methods.}]
This chapter introduces test generation methods which can be used for learning.

These machines can be decomposed by projecting on each output,
resulting in smaller components that can be learned with fewer queries.
We give experimental evidence that this is a useful technique which can reduce the number of queries substantially.
This is all motivated by the idea that compositional methods are widely used throughout engineering and that we should use this in model learning.
This work was presented at ICGI 2018:
\cite[entry][Moerman18].
\stopdescription