
Went through chapters 3 (applying...) and 4 (minimal...).

This commit is contained in:
Joshua Moerman 2018-11-08 16:39:33 +01:00
parent 60d026f7f4
commit deed1efa90
3 changed files with 54 additions and 60 deletions


@@ -34,16 +34,16 @@ To the best of our knowledge, this is the first paper in which active automata l
Once they have high-level models of the behaviour of software components, software engineers can construct better software in less time.
A key problem in practice, however, is the construction of models for existing software components, for which no or only limited documentation is available.
The construction of models from observations of component behaviour can be performed using regular inference -- also known as automata learning (see \citenp[DBLP:journals/iandc/Angluin87, Higuera10, DBLP:conf/sfm/SteffenHM11]).
The most efficient such techniques use the set-up of \emph{active learning,} illustrated in \in{Figure}[fig:learning], in which a \quotation{learner} has the task to learn a model of a system by actively asking questions to a \quotation{teacher}.
\startplacefigure
[title={Active learning of reactive systems.},
reference=fig:learning]
\externalfigure[Learner.pdf][width=0.6\textwidth]
\stopplacefigure
The core of the teacher is a \emph{System Under Test} (SUT), a reactive system to which one can apply inputs and whose outputs one may observe.
The learner interacts with the SUT to infer a model by sending inputs and observing the resulting outputs (\quotation{membership queries}).
In order to find out whether an inferred model is correct, the learner may pose an \quotation{equivalence query}.
The teacher uses a model-based testing (MBT) tool to try and answer such queries:
@@ -54,10 +54,10 @@ Based on such a counterexample, the learner may then construct an improved hypot
Hence, the task of the learner is to collect data by interacting with the teacher and to formulate hypotheses, and the task of the MBT tool is to establish the validity of these hypotheses.
It is important to note that it may occur that an SUT passes the test for a hypothesis, even though this hypothesis is not valid.
Triggered by various theoretical and practical results, see for instance the work by \citet[DBLP:conf/fase/BergGJLRS05, DBLP:conf/fmco/Leucker06, DBLP:journals/sttt/RaffeltSBM09, Aarts14, DBLP:journals/jlp/CasselHJMS15, DBLP:conf/tacas/MertenHSCJ12, DBLP:conf/vmcai/HowarSJC12], there is a fast-growing interest in automata learning technology.
In recent years, automata learning has been applied successfully, e.g., to regression testing of telecommunication systems \citep[DBLP:conf/cav/HungarNS03], checking conformance of communication protocols to a reference implementation \citep[DBLP:journals/ml/AartsKTVV14], finding bugs in Windows and Linux implementations of TCP \citep[DBLP:conf/fmics/Fiterau-BrosteanJV14], analysis of botnet command and control protocols \citep[DBLP:conf/ccs/ChocSS10], and integration testing \citep[DBLP:conf/forte/LiGS06, DBLP:conf/pts/GrozLPS08].
In this chapter, we explore whether LearnLib by \citet[DBLP:journals/sttt/RaffeltSBM09], a state-of-the-art automata learning tool, is able to learn a model of the Engine Status Manager (ESM), a piece of control software that is used in many printers and copiers of Oc\'{e}.
Software components like the ESM can be found in many embedded systems in one form or another.
Being able to retrieve models of such components automatically is potentially very useful.
For instance, if the software is fixed or enriched with new functionality, one may use a learned model for regression testing.
@@ -78,10 +78,10 @@ Moreover, the \cpp{} code contained in the actions of the transitions also creat
LearnLib has been used to learn models with tens of thousands of states by \citet[DBLP:journals/sttt/RaffeltMSM09], and therefore we expected that it would be easy to learn a model for the ESM.
However, finding counterexamples for incorrect hypotheses turned out to be challenging due to the large number of 77 inputs.
The test algorithms implemented in LearnLib, such as random testing, the W-method by \citet[DBLP:journals/tse/Chow78, Vasilevskii73] and the Wp-method by \citet[DBLP:journals/tse/FujiwaraBKAG91], failed to deliver counterexamples within an acceptable time.
Automata learning techniques have been successfully applied to case studies in which the total number of input symbols is much larger, but in these cases it was possible to reduce the number of inputs to a small number (less than 10) using abstraction techniques \citep[DBLP:journals/fmsd/AartsJUV15, DBLP:conf/vmcai/HowarSM11].
In the case of ESM, use of abstraction techniques only allowed us to reduce the original 156 concrete actions to 77 abstract actions.
We therefore implemented an extension of an algorithm of \citet[DBLP:journals/tc/LeeY94] for computing adaptive distinguishing sequences.
Even when an adaptive distinguishing sequence does not exist, Lee \& Yannakakis' algorithm produces an adaptive sequence that \quotation{almost} identifies states.
In combination with a standard algorithm for computing separating sequences for pairs of states, we managed to verify states with on average 3 test queries and to learn a model of the ESM with 77 inputs and 3.410 states.
We also constructed a model directly from the ESM software and established equivalence with the learned model.
@@ -91,8 +91,8 @@ Preliminary evidence suggests that our adaptation of Lee \& Yannakakis' algorith
During recent years most researchers working on active automata learning focused their efforts on efficient algorithms and tools for the construction of hypothesis models.
Our work shows that if we want to further scale automata learning to industrial applications, we also need better algorithms for finding counterexamples for incorrect hypotheses.
Following \citet[DBLP:conf/fase/BergGJLRS05], our work shows that the context of automata learning provides both new challenges and new opportunities for the application of testing algorithms.
All the models for the ESM case study together with the learning and testing statistics are available at \href{http://www.mbsd.cs.ru.nl/publications/papers/fvaan/ESM/}, as a benchmark for both the automata learning and testing communities.
It is now also included in the automata wiki at \href{http://automata.cs.ru.nl/}.

\startsection
@@ -115,7 +115,7 @@ An \emph{engine} refers to the printing or scanning part of a printer or copier.
Other products can be connected to an engine that pre- or postprocess the paper, for example a cutter, folder, stacker or stapler.
\startplacefigure
[title={Global overview of the engine software.},
reference=fig:esra-overview]
\externalfigure[ESRAOverview.pdf][width=0.5\textwidth]
\stopplacefigure
@@ -129,7 +129,7 @@ The \emph{managers} communicate with the controller using the \emph{external int
These adapters translate the external protocols to internal protocols.
The managers manage the different functions of the engine.
They are divided according to the different functionalities they implement, such as status control, print or scan actions, or error handling.
In order to do this, a manager may communicate with other managers and functions.
%
A \emph{function} is responsible for a specific set of hardware components.
It translates commands from the managers to the function hardware and reports the status and other information of the function hardware to the managers.
@@ -148,7 +148,7 @@ Moreover, it informs all its connected clients (managers or the controller) of s
Finally, it handles status transitions when an error occurs.
\startplacefigure
[title={Overview of the managers and clients connected to the ESM.},
reference=fig:esm-protocols]
\externalfigure[ESMProtocols.pdf][width=0.5\textwidth]
\stopplacefigure
@@ -197,7 +197,7 @@ These messages are placed in a queue to be processed later.
[title={The ESM state diagram}]
\startplacefigure
[title={Top states and transitions of the ESM.},
reference=fig:esm-states]
\externalfigure[ESMStates.pdf][width=\textwidth]
\stopplacefigure
@@ -266,7 +266,7 @@ An extra capsule was created in RRRT which connects to the ports defined by the
This capsule opened a TCP connection to LearnLib.
Inputs and outputs are translated to and from a string format and sent over the connection.
Before each membership query, the learner needs to bring the SUT back to its initial state.
In other words, LearnLib needs a way to reset the SUT.
Some inputs and outputs sent to and from the ESM carry parameters.
These parameters are enumerations of statuses, or integers bounded by the number of functions connected to the ESM.
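The set-up described above -- string-encoded inputs over a TCP connection, with a reset before every membership query -- can be sketched as follows. All names here (the \type{SutConnection} class, the \type{RESET} message and its reply) are illustrative, not the actual Oc\'{e}/LearnLib code.

```python
# Sketch of a string-based SUT adapter. The send/receive callables
# wrap the TCP socket, injected so the adapter stays transport-agnostic.
class SutConnection:
    def __init__(self, send, receive):
        self.send = send
        self.receive = receive

    def reset(self):
        # Bring the SUT back to its initial state; required by the
        # learner before every membership query.
        self.send("RESET")
        assert self.receive() == "RESET_OK"

    def step(self, input_symbol):
        # Apply one abstract input and return the observed output string.
        self.send(input_symbol)
        return self.receive()

    def query(self, word):
        # A membership query: reset, then apply the whole input word.
        self.reset()
        return [self.step(symbol) for symbol in word]
```

In practice the learner only sees \type{query}; the reset is hidden inside the adapter.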
@@ -283,8 +283,8 @@ After the inputs are grouped a total of 77 inputs remain when learning the ESM u
It was not immediately obvious how to model the ESM by a Mealy machine, since some inputs trigger no output, whereas other inputs trigger several outputs.
In order to resolve this, we benefited from the run-to-completion execution model used in RRRT.
Whenever an input is sent, all the outputs are collected until quiescence is detected.
Next, all the outputs are concatenated and are sent to LearnLib as a single aggregated output.
%
In model-based testing, quiescence is usually detected by waiting for a fixed time-out period.
However, this causes the system to be mostly idle while waiting for the time-out, which is inefficient.
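The naive time-out-based aggregation just described can be sketched as follows. The function names and the output separator are illustrative; the label \type{Q} for quiescence matches the output alphabet used later in this chapter.

```python
# Minimal sketch of output aggregation: after an input is sent, collect
# all outputs until quiescence, here naively detected by a fixed
# time-out, and concatenate them into one aggregated output symbol.
import queue

QUIESCENCE_TIMEOUT = 0.05  # seconds; illustrative value

def aggregated_output(output_queue, timeout=QUIESCENCE_TIMEOUT):
    outputs = []
    while True:
        try:
            outputs.append(output_queue.get(timeout=timeout))
        except queue.Empty:
            break  # no output within the time-out: quiescence
    # "Q" denotes quiescence when no output was observed at all
    return "-".join(outputs) if outputs else "Q"
```

Note that every input now costs at least one full time-out period, which is exactly the inefficiency mentioned above.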
@@ -315,10 +315,7 @@ If for example the length of the counterexample needs to be at least 6 inputs to
An average test query takes around 1 ms, so it would take about 7 years to execute these test queries.

\startdescription{Augmented DS-method\footnote{This was later called the \emph{hybrid ADS-method}.}.}
In order to reduce the number of tests, \citet[DBLP:journals/tse/Chow78] and
\citet[Vasilevskii73] pioneered the so-called W-method.
In their framework a test query consists of a prefix $p$ bringing the SUT to a specific state, a (random) middle part $m$ and a suffix $s$ assuring that the SUT is in the appropriate state.
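The arithmetic behind this estimate is a back-of-the-envelope check: a middle part of length 6 over 77 inputs gives

```latex
77^{6} \approx 2.1 \cdot 10^{11} \text{ test queries},
\qquad
2.1 \cdot 10^{11} \times 1\,\text{ms} \approx 2.1 \cdot 10^{8}\,\text{s} \approx 6.6\ \text{years},
```

which matches the quoted figure of about 7 years.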
@@ -349,7 +346,7 @@ In these cases the incomplete result from the LY algorithm still contained a lot
As an example we show an incomplete adaptive distinguishing sequence for one of the hypotheses in \in{Figure}[fig:distinguishing-sequence].
When we apply the input sequence I46 I6.0 I10 I19 I31.0 I37.3 I9.2 and observe outputs O9 O3.3 Q \ldots{} O28.0, we know for sure that the SUT was in state 788.
Unfortunately, not all paths lead to a singleton set.
When for instance we apply the sequence I46 I6.0 I10 and observe the outputs O9 O3.14 Q, we know for sure that the SUT was in one of the states 18, 133, 1287 or 1295.
In these cases we have to perform more experiments and we resort to pairwise separating sequences.
@@ -359,13 +356,10 @@ In our case, however, it greatly reduced the test suites.
Once we have our set of suffixes, which we call $Z$ now, our test algorithm works as follows.
The algorithm first exhausts the set $P I^{\leq 1} Z$.
If this does not provide a counterexample, we will randomly pick test queries from $P I^2 I^\ast Z$, where the algorithm samples uniformly from $P$, $I^2$ and $Z$ (if $Z$ contains more than $1$ sequence for the supposed state) and with a geometric distribution on $I^\ast$.
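The randomised phase can be sketched as follows; function and parameter names are illustrative, and the stopping probability for the geometric distribution is an assumed value.

```python
# Sketch of drawing one random test query from P I^2 I^* Z:
# uniform choices from P, I^2 and Z, and a geometrically
# distributed length for the I^* part.
import random

def random_test_query(prefixes, inputs, suffixes, p_stop=0.2, rng=random):
    prefix = rng.choice(prefixes)                     # uniform over P
    middle = [rng.choice(inputs) for _ in range(2)]   # the fixed I^2 part
    while rng.random() > p_stop:                      # geometric length for I^*
        middle.append(rng.choice(inputs))
    suffix = rng.choice(suffixes)                     # uniform over Z
    return prefix + middle + suffix
```

In the actual algorithm the suffix is chosen among the sequences in $Z$ for the state the hypothesis predicts after the prefix and middle; the uniform choice here is a simplification.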
\stopdescription

\startdescription{Sub-alphabet selection.}
Using the above method the algorithm still failed to learn the ESM.
By looking at the RRRT-based model we were able to see why the algorithm failed to learn.
In the initialisation phase, the controller gives exceptional behaviour when providing a certain input eight times consecutively.
@@ -375,15 +369,15 @@ With this knowledge we could construct a single counterexample by hand by which
In order to automate this process, we defined a sub-alphabet of actions that are important during the initialisation phase of the controller.
This sub-alphabet will be used a bit more often than the full alphabet. We do this as follows.
%
We start testing with the alphabet which provided a counterexample for the previous hypothesis (for the first hypothesis we take the sub-alphabet).
If no counterexample can be found within a specified query bound, then we repeat with the next alphabet.
If both alphabets do not produce a counterexample within the bound, the bound is increased by some factor and we repeat all.
%
This method only marginally increases the number of tests.
But it did find the right counterexample we first had to construct by hand.
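The alternation scheme just described can be sketched as follows; the function names and the default bound and growth factor are illustrative.

```python
# Sketch of alphabet alternation: try the preferred alphabet (the one
# that produced the counterexample for the previous hypothesis,
# initially the sub-alphabet) up to a query bound, then the other
# alphabet, and grow the bound when both fail.
def find_counterexample(test_with, alphabets, initial_bound=1000, factor=10):
    # test_with(alphabet, bound) runs at most `bound` test queries and
    # returns a counterexample or None.
    bound = initial_bound
    while True:
        for alphabet in alphabets:
            cex = test_with(alphabet, bound)
            if cex is not None:
                # remember which alphabet succeeded, so the next
                # hypothesis can start with it
                return cex, alphabet
        bound *= factor  # both alphabets failed: increase the bound
```

The loop never gives up on its own; in practice the learner stops it once a hypothesis is accepted.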
\stopdescription

\stopsubsection
\startsubsection
[title={Results},
@@ -396,9 +390,9 @@ The following list gives the most important statistics gathered during the learn
\startitemize
\item The learned model has 3.410 states.
\item Altogether, 114 hypotheses were generated.
\item The time needed for learning the final hypothesis was 8 h, 26 min, and 19 s.
\item 29.933.643 membership queries were posed (on average 35,77 inputs per query).
\item 30.629.711 test queries were required (on average 29,06 inputs per query).
\stopitemize
@@ -429,22 +423,22 @@ In the end we decided to implement the required model transformations ourselves.
\in{Figure}[fig:formats] displays the different formats for representing models that we used and the transformations between those formats.
\startplacefigure
[title={Formats for representing models and transformations between formats.},
reference=fig:formats]
\externalfigure[formats.pdf][width=0.7\textwidth]
\stopplacefigure
We used the bisimulation checker of CADP by \citet[DBLP:conf/tacas/GaravelLMS11] to check the equivalence of labelled transition system models in {\tt .aut} format.
The Mealy machine models learned by LearnLib are represented as {\tt .dot} files.
A small script converts these Mealy machines to labelled transition systems in {\tt .aut} format.
We used the Uppaal tool by \citet[DBLP:conf/qest/BehrmannDLHPYH06] as an editor for defining extended finite state machines (EFSM), represented as {\tt .xml} files.
A script developed in the ITALIA project (\href{http://www.italia.cs.ru.nl/}) converts these EFSM models to LOTOS, and then CADP takes care of the conversion from LOTOS to the {\tt .aut} format.
The Uppaal syntax is not sufficiently expressive to directly encode the RRRT definition of the ESM, since this definition makes heavy use of UML \citep[OMGUML2] concepts such as state hierarchy and transitions from composite states, concepts which are not present in Uppaal.
Using Uppaal would force us to duplicate many transitions and states.
We decided to manually create an intermediate hierarchical EFSM (HEFSM) model using the UML drawing tool PapyrusUML \citep[LanusseTEMGTSDT09].
The HEFSM model closely resembles the RRRT UML model, but many elements used in UML state machines are left out because they are not needed for modelling the ESM and complicate the transformation process.
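The Mealy-to-LTS conversion mentioned above can be sketched as follows. Each Mealy transition $s \xrightarrow{i/o} t$ becomes one labelled transition in CADP's Aldebaran ({\tt .aut}) format; taking a plain list of transitions instead of parsing an actual {\tt .dot} file is a simplification, and the label format is an assumption.

```python
# Sketch: convert a Mealy machine to an Aldebaran (.aut) LTS.
# The .aut header is "des (initial, #transitions, #states)",
# followed by one '(src, "label", dst)' line per transition.
def mealy_to_aut(transitions, initial=0):
    # transitions: list of (source, input, output, target) tuples
    states = {initial}
    for s, _, _, t in transitions:
        states.update((s, t))
    lines = ["des ({}, {}, {})".format(initial, len(transitions), len(states))]
    for s, i, o, t in transitions:
        # encode the Mealy input/output pair as one LTS label "i/o"
        lines.append('({}, "{}/{}", {})'.format(s, i, o, t))
    return "\n".join(lines)
```

After this conversion, strong bisimulation of the two {\tt .aut} files coincides with Mealy machine equivalence.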
\stopsubsection

@@ -508,7 +502,7 @@ When flattening the statechart, we modified the guards of supertransitions to en
\startplacefigure
[title={Example of supertransition transformation. On the left the original version. On the right the transformed version},
list={Supertransition transformation.},
reference=fig:model-example2]
\externalfigure[model-example2.pdf][width=0.4\textwidth]
\stopplacefigure
@@ -594,18 +588,18 @@ We have verified the equivalence of the learned model and the RRRT-based model b
reference=sec:future]
Using an extension of the algorithm by \citet[DBLP:journals/tc/LeeY94] for adaptive distinguishing sequences, we succeeded in learning a Mealy machine model of a piece of widely used industrial control software.
Our extension of Lee \& Yannakakis' algorithm is rather obvious, but nevertheless appears to be new.
Preliminary evidence suggests that it outperforms existing conformance testing algorithms.
We are currently performing experiments in which we compare the new algorithm with other test algorithms on a number of realistic benchmarks.
There are several possibilities for extending the ESM case study.
To begin with, one could try to learn a model of the ESM with more than one function.
Another interesting possibility would be to learn models of the EHM, ACM, and other managers connected to the ESM. Using these models some of the properties discussed by \citet[Ploeger05] could be verified at a more detailed level.
We expect that the combination of LearnLib with the extended Lee \& Yannakakis algorithm can be applied to learn models of many other software components.
In the specific case study described in this chapter, we know that our learning algorithm has succeeded in learning the correct model, since we established equivalence with a reference model that was constructed independently from the RRRT model of the ESM software.
In the absence of a reference model, we can never guarantee that the actual system behaviour conforms to a learned model.
In order to deal with this problem, it is important to define metrics that quantify the difference (or distance) between a hypothesis and a correct model of the SUT and to develop test generation algorithms that guarantee an upper bound on this difference.
Preliminary work in this area is reported by \citet[DBLP:conf/icgi/SmetsersVVV14].


@ -18,7 +18,7 @@ Finding minimal separating sequences for all pairs of inequivalent states in a f
is a classic problem in automata theory. Sets of minimal separating sequences, for instance, play a
central role in many conformance testing methods. Moore has already outlined a partition refinement
algorithm that constructs such a set of sequences in $\bigO(mn)$ time, where $m$ is the number of
transitions and $n$ is the number of states. In this chapter, we present an improved algorithm based
on the minimisation algorithm of Hopcroft that runs in $\bigO(m \log n)$ time. The efficiency of our
algorithm is empirically verified and compared to the traditional algorithm.
\stopabstract
@ -29,7 +29,7 @@ algorithm is empirically verified and compared to the traditional algorithm.
In diverse areas of computer science and engineering, systems can be modelled by \emph{finite state machines} (FSMs).
One of the cornerstones of automata theory is minimisation of such machines -- and many variations thereof.
In this process one obtains an equivalent minimal FSM, where states are different if and only if they have different behaviour.
The first to develop an algorithm for minimisation was \citet[Moore56].
His algorithm has a time complexity of $\bigO(mn)$, where $m$ is the number of transitions, and $n$ is the number of states of the FSM.
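Moore's refinement loop can be sketched in a few lines. The following Python rendering is our own illustration, not taken from this chapter; the function names and the toy machine are hypothetical.

```python
def moore_partition(states, inputs, delta, lam):
    """Moore-style partition refinement: start from the partition by
    outputs and refine until stable; equivalent states end up in the
    same block. Worst case O(mn), as stated above."""
    # initial partition: states grouped by their outputs on single inputs
    block_of = {s: tuple(lam[(s, a)] for a in inputs) for s in states}
    while True:
        # refine: a state's signature also records the successor blocks
        new = {s: tuple((lam[(s, a)], block_of[delta[(s, a)]])
                        for a in inputs)
               for s in states}
        if len(set(new.values())) == len(set(block_of.values())):
            return block_of  # no further refinement possible
        block_of = new

# toy machine: s0 and s1 both output 0 on a, but only s0 reaches s2
delta = {('s0', 'a'): 's2', ('s1', 'a'): 's1', ('s2', 'a'): 's2'}
lam = {('s0', 'a'): 0, ('s1', 'a'): 0, ('s2', 'a'): 1}
blocks = moore_partition(['s0', 's1', 's2'], ['a'], delta, lam)
```

On this machine the initial partition has two blocks; one refinement round separates $s_0$ from $s_1$, since their successors lie in different blocks.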
@ -38,7 +38,7 @@ Later, \citet[Hopcroft71] improved this bound to $\bigO(m \log n)$.
Minimisation algorithms can be used as a framework for deriving a set of \emph{separating sequences} that show \emph{why} states are inequivalent.
The separating sequences in Moore's framework are of minimal length \citep[Gill62].
Obtaining minimal separating sequences in Hopcroft's framework, however, is a non-trivial task.
In this chapter, we present an algorithm for finding such minimal separating sequences for all pairs of inequivalent states of an FSM in $\bigO(m \log n)$ time.
Coincidentally, \citet[DBLP:conf/popl/BonchiP13] recently introduced a new algorithm for the equally fundamental problem of proving equivalence of states in non-deterministic automata.
As both their and our work demonstrate, even classical problems in automata theory can still offer surprising research opportunities.
@ -57,9 +57,10 @@ We define a \emph{FSM} as a Mealy machine $M = (I, O, S, \delta, \lambda)$, wher
The functions $\delta$ and $\lambda$ are naturally extended to $\delta \colon S \times I^{*} \to S$ and $\lambda \colon S \times I^{*} \to O^{*}$.
Moreover, given a set of states $S' \subseteq S$ and a sequence $x \in I^{*}$, we define $\delta(S', x) = \{\delta(s, x) \mid s \in S'\}$ and $\lambda(S', x) = \{\lambda(s, x) \mid s \in S'\}$.
The \emph{inverse transition function} $\delta^{-1} \colon S \times I \to \mathcal{P}(S)$ is defined as $\delta^{-1}(s, a) = \{t \in S \mid \delta(t, a) = s\}$.
Observe that Mealy machines are deterministic and input-enabled (i.e., complete) by definition.
The initial state is not specified because it is of no importance in what follows.
For the remainder of this chapter we fix a machine $M = (I, O, S, \delta, \lambda)$.
We use $n$ to denote its number of states, that is $n = |S|$, and $m$ to denote its number of transitions, that is $m = |S| \times |I|$.
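These definitions transcribe directly into code. The sketch below is our own hypothetical encoding (not part of the chapter): $\delta$ and $\lambda$ are stored as dictionaries keyed by (state, input), from which the extended functions and $\delta^{-1}$ are derived.

```python
from collections import defaultdict

class Mealy:
    """M = (I, O, S, delta, lambda), with delta and lam as dictionaries
    keyed by (state, input) pairs."""
    def __init__(self, delta, lam):
        self.delta, self.lam = delta, lam

    def run(self, s, word):
        """Extended delta and lambda: final state and output sequence."""
        outputs = []
        for a in word:
            outputs.append(self.lam[(s, a)])
            s = self.delta[(s, a)]
        return s, tuple(outputs)

    def inverse(self):
        """delta^{-1}(s, a) = { t | delta(t, a) = s }."""
        inv = defaultdict(set)
        for (t, a), s in self.delta.items():
            inv[(s, a)].add(t)
        return inv

# toy machine over I = {a}, O = {0, 1}
M = Mealy({('s0', 'a'): 's1', ('s1', 'a'): 's0'},
          {('s0', 'a'): 0, ('s1', 'a'): 1})
```

Note that determinism and completeness are implicit in the dictionary representation: every (state, input) pair has exactly one entry.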
\startdefinition[reference=def:equivalent]
@ -162,11 +163,11 @@ We say a splitting tree $T$ is valid (resp. acceptable, stable, complete) if $P(
A leaf can be expanded in one of two ways, corresponding to the two ways a block can be split.
Given a leaf $u$ and its block $B = l(u)$ we define the following two splits:
\description{(split-output)}
Suppose there is an input $a$ such that $B$ can be split w.r.t. output after $a$.
Then we set $\sigma(u) = a$, and we create a node for each subset of $B$ that produces the same output $x$ on $a$. These nodes are set to be children of $u$.
\description{(split-state)}
Suppose there is an input $a$ such that $B$ can be split w.r.t. the state after $a$.
Then instead of splitting $B$ as described before, we proceed as follows.
First, we locate the node $v = \lca(\delta(B, a))$.
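The (split-output) case amounts to grouping the states of a block by the output they produce on $a$. A minimal sketch, with hypothetical names and $\lambda$ again assumed to be stored as a dictionary:

```python
from collections import defaultdict

def split_output(block, a, lam):
    """(split-output): partition a block of states by the output
    produced on input a; each part becomes a child of the leaf."""
    children = defaultdict(list)
    for s in block:
        children[lam[(s, a)]].append(s)
    return list(children.values())

# block in which s0, s2 output 0 on a, while s1, s3 output 1
lam = {('s0', 'a'): 0, ('s1', 'a'): 1, ('s2', 'a'): 0, ('s3', 'a'): 1}
parts = split_output(['s0', 's1', 's2', 's3'], 'a', lam)
```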
@ -200,7 +201,7 @@ It follows directly that states are equivalent if and only if they are in the sa
\stopplacealgorithm
\startexample[reference=ex:tree]
\in{Figure}[fig:automaton-tree] shows an FSM and a complete splitting tree for it.
This tree is constructed by \in{Algorithm}[alg:tree] as follows.
First, the root node is labelled by $\{s_0, \ldots, s_5\}$.
The even and odd states produce different outputs after $a$, hence the root node is split.
@ -368,7 +369,7 @@ In both cases we have shown that $|x| \geq k+1$ as required.
\stopplacealgorithm
\startexample
\in{Figure}{a}[fig:splitting-tree] shows a stable and minimal splitting tree $T$ for the machine in \in{Figure}[fig:automaton-tree].
This tree is constructed by \in{Algorithm}[alg:minimaltree] as follows.
It proceeds in the same way as \in{Algorithm}[alg:tree] until we consider the node labelled $\{s_0, s_2\}$.
At this point $k = 1$.
@ -429,9 +430,9 @@ and every block includes an indication of the slice containing its label and a p
\stopsection
\startsection
[title={Optimising the Algorithm}]
In this section, we present an improvement on \in{Algorithm}[alg:increase-k] that uses two ideas described by \citet[Hopcroft71] in his seminal paper on minimising finite automata:
\emph{using the inverse transition set}, and \emph{processing the smaller half}.
The algorithm that we present is a drop-in replacement, so that \in{Algorithm}[alg:minimaltree] stays the same except for some bookkeeping.
This way, we can establish correctness of the new algorithms more easily.
@ -451,7 +452,7 @@ The set $C(u)$ is constructed according to (split-state), where each child $w \i
\placeformula[eq:split-fw]
\startformula\startalign
\NC l(u_w) \NC = \{ s \in l(u) \,|\, \delta(s, a) \in l(w) \} \NR
\NC \NC = l(u) \cap \delta^{-1}(l(w), a). \NR
\stopalign\stopformula
In order to perform the same splits in each layer as before, we maintain a list $L_{k}$ of $k$-candidates.
We keep the list in order of the construction of nodes, because when we split w.r.t. a child of a node $u$ before we split w.r.t. $u$, the result is not well-defined.
@ -576,7 +577,7 @@ This way, we assure constant time lookup of the leaf for a state, and we can upd
\stopdescription
\startdescription{Largest child.}
For finding the largest child, we maintain counts for the temporary children and a current biggest one.
On finalising the temporary children we store (a reference to) the biggest child in the node, so that we can skip this node later in the algorithm.
\stopdescription
\startdescription{Storing sequences.}
The operation on \inline[line:prepend] is done in constant time by using a linked list.
@ -591,20 +592,19 @@ The operation on \inline[line:prepend] is done in constant time by using a linke
A splitting tree can be used to extract relevant information for two classical test generation methods:
a \emph{characterisation set} for the W-method and a \emph{separating family} for the HSI-method.
For an introduction and comparison of FSM-based test generation methods we refer to \citet[DBLP:journals/infsof/DorofeevaEMCY10] or \in{Chapter}[chap:test-methods].
\startdefinition[reference=def:cset]
A set $W \subset I^{\ast}$ is called a \emph{characterisation set} if for every pair of inequivalent states $s, t$ there is a sequence $w \in W$ such that $\lambda(s, w) \neq \lambda(t, w)$.
\stopdefinition
\startlemma[reference=lem:cset]
Let $T$ be a complete splitting tree, then the set $\{\sigma(u) \mid u \in T\}$ is a characterisation set.
\stoplemma
\startproof
Let $W = \{\sigma(u) \mid u \in T\}$.
Let $s, t \in S$ be inequivalent states; then by completeness $s$ and $t$ are contained in different leaves of $T$.
Hence $u = \lca(s, t)$ exists and $\sigma(u) \in W$ separates $s$ and $t$.
This shows that $W$ is a characterisation set.
\stopproof
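On small machines the defining property of a characterisation set can be checked by brute force. A hypothetical sketch (the two-state machine and all names below are ours, for illustration only; the listed states are assumed pairwise inequivalent):

```python
def output(lam, delta, s, word):
    """lambda extended to input sequences (cf. the preliminaries)."""
    out = []
    for a in word:
        out.append(lam[(s, a)])
        s = delta[(s, a)]
    return tuple(out)

def is_characterisation_set(states, W, lam, delta):
    """Check the definition: every pair of distinct states in `states`
    is separated by some sequence w in W."""
    return all(any(output(lam, delta, s, w) != output(lam, delta, t, w)
                   for w in W)
               for i, s in enumerate(states) for t in states[i + 1:])

# toy machine: s0 and s1 differ immediately on input a
delta = {('s0', 'a'): 's0', ('s1', 'a'): 's1'}
lam = {('s0', 'a'): 0, ('s1', 'a'): 1}
```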
@ -640,7 +640,7 @@ The separating family can be constructed from the splitting tree by collecting a
Since we have to do this for every state, this takes $\bigO(n^2)$ time.
\stopproof
For test generation one also needs a transition cover.
This can be constructed in linear time with a breadth-first search.
We conclude that we can construct all necessary information for the W-method in time $\bigO(m \log n)$ as opposed to the $\bigO(mn)$ algorithm used by \citet[DBLP:journals/infsof/DorofeevaEMCY10].
Furthermore, we conclude that we can construct all the necessary information for the HSI-method in time $\bigO(m \log n + n^2)$, improving on the reported bound $\bigO(mn^3)$ by \citet[DBLP:journals/cj/HieronsT15].
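The breadth-first construction of a transition cover can be sketched as follows; this is a hypothetical helper of our own, computing one access sequence per reachable state and extending each with every input, which yields one word per transition.

```python
from collections import deque

def transition_cover(initial, inputs, delta):
    """BFS: find an access sequence to each reachable state, then
    extend each with every input; one word per transition."""
    access = {initial: ()}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        for a in inputs:
            t = delta[(s, a)]
            if t not in access:
                access[t] = access[s] + (a,)
                queue.append(t)
    return {acc + (a,) for acc in access.values() for a in inputs}

# toy two-state machine over a single input a
delta = {('s0', 'a'): 's1', ('s1', 'a'): 's0'}
cover = transition_cover('s0', ['a'], delta)
```

The cover has size $m$ (here two words), and each state is visited once, so the construction is linear as claimed.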
@ -663,7 +663,7 @@ The running times in seconds on an Intel Core i5-2500 are plotted in \in{Figure}
We note that different slopes imply different complexity classes, since both axes have a logarithmic scale.
\startplacefigure
[title={Running time in seconds of \in{Algorithm}[alg:increase-k] in grey and \in{Algorithm}[alg:increase-k-v3] in black.},
list={Running time of \in{Algorithms}[alg:increase-k] and \in{}[alg:increase-k-v3].},
reference=fig:results]
\startcombination[2*1]
@ -689,7 +689,7 @@ We note that different slopes imply different complexity classes, since both axe
\startsection
[title={Conclusion}]
In this chapter we have described an efficient algorithm for constructing a set of minimal-length sequences that pairwise distinguish all states of a finite state machine.
By extending Hopcroft's minimisation algorithm, we are able to construct such sequences in $\bigO(m \log n)$ for a machine with $m$ transitions and $n$ states.
This improves on the traditional $\bigO(mn)$ method that is based on the classic algorithm by Moore.
As an upshot, the sequences obtained form a characterisation set and a separating family, which play a crucial role in conformance testing.


@ -35,7 +35,7 @@ This chapter is based on the following publication:
\noindent\cite[entry][#1]}
\definedescription[description]
[headstyle=bold, alternative=serried, width=fit, indenting=yes]
\definedescription[step]
[headstyle=bold, alternative=serried, width=fit, text={Step }]