From ad58d6ac07cf60f7dd93ecd756a08915c2f90d20 Mon Sep 17 00:00:00 2001
From: Joshua Moerman
Date: Mon, 30 Mar 2015 11:40:35 +0200
Subject: [PATCH] Adds some documentation in latex

---
 .gitignore           | 19 ++++-
 docs/explanation.tex | 168 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 185 insertions(+), 2 deletions(-)
 create mode 100644 docs/explanation.tex

diff --git a/.gitignore b/.gitignore
index 6ebfe63..99d92c9 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,8 +1,23 @@
+# mac
 .DS_Store
-*.dot
-*.png
+# qt creator/cmake
+*.user
+build
+
+# latex
+*.aux
+*.tdo
+*.log
+
+# output
+*.pdf
+*.png
+
+# data
+*.dot
+
+# our own tools
 *_seq
 *splitting_tree
 *test_suite

diff --git a/docs/explanation.tex b/docs/explanation.tex
new file mode 100644
index 0000000..7b0c047
--- /dev/null
+++ b/docs/explanation.tex
@@ -0,0 +1,168 @@
+\documentclass[envcountsame]{llncs}
+
+\usepackage{amsmath}
+\usepackage[backgroundcolor=white]{todonotes}
+
+\newcommand{\Def}[1]{\emph{#1}}
+\newcommand{\bigO}{\mathcal{O}}
+
+\begin{document}
+\maketitle
+
+\section{Introduction}
+
+Recently, automata learning has gained popularity. Learning algorithms are
+applied to real-world systems (systems under learning, or SUL for short), and
+this reveals some problems.
+
+In classical active learning algorithms such as $L^\ast$, one supposes a
+teacher to which the algorithm can pose \Def{membership queries} and
+\Def{equivalence queries}. In the former case the algorithm asks the teacher
+for the output (sequence) produced by a given input sequence. In the latter
+case the algorithm provides the teacher with a hypothesis, and the teacher
+answers either with an input sequence on which the hypothesis behaves
+differently than the SUL, or affirmatively in case the machines are
+behaviorally equivalent.
+
+In real-world applications we have to implement the teacher ourselves, despite
+the fact that we do not know all the details of the SUL.
+The membership queries
+are easily implemented by resetting the machine and applying the input. The
+equivalence queries, however, are often impossible to implement. Instead, we
+have to resort to some form of random testing. Doing random testing naively is
+of course hopeless, as the state space is often too big. Luckily, once we have
+a hypothesis at hand, we can use it for model-based testing.
+
+One standard framework for model-based testing was pioneered by Chow and
+Vasilevski. Briefly, the framework supposes prefix sequences, which allow us
+to go from the initial state to a given state $s$ (in the model) or even a
+given transition $t \to s$, and suffix sequences, which test whether the
+machine actually is in state $s$. If we have the right suffixes and test every
+transition of the model, we can ensure that the SUL is either equivalent or
+has strictly more states. Such a test suite can be constructed with a size
+polynomial in the number of states of the model. This is in contrast to
+exhaustive testing or (naive) random testing, where there are exponentially
+many sequences.
+
+For the prefixes we can use any single-source shortest-path algorithm. In
+fact, if we restrict ourselves to the above framework, this is the best we
+can do. This gives $n$ sequences of length at most $n-1$ (in fact the total
+sum is at most $\frac{1}{2}n(n-1)$).
+
+For the suffixes, we can use Hopcroft's algorithm to generate separating
+sequences. If we want to test a given state $s$, we take the set of suffixes
+(we allow the set of suffixes to depend on the state we want to test) to be
+all separating sequences for all other states $t$. This set has at most $n-1$
+elements, each of length at most $n$; again the total sum is
+$\frac{1}{2}n(n-1)$. A natural question arises: can we do better?
+
+In the presence of a distinguishing sequence, Lee and Yannakakis prove that
+one can take a set of suffixes consisting of just one element, of length at
+most $\frac{1}{2}n(n-1)$.
+This does not provide an improvement in the worst-case scenario. Even worse,
+such a sequence might not exist.
+
+In this paper we propose a testing algorithm which combines the two methods
+described above. The distinguishing sequence might not exist, but the tree
+constructed during the Lee and Yannakakis algorithm provides a lot of
+information, which we can complement with the classical Hopcroft approach.
+Despite the fact that this is not an improvement in the worst-case scenario,
+this hybrid method enabled us to learn an industrial-grade machine, which was
+infeasible to learn with the standard methods provided by LearnLib.
+
+
+\section{Preliminaries}
+
+We restrict our attention to \Def{Mealy machines}. Let $I$ (resp. $O$) denote
+the finite set of inputs (resp. outputs). Then a Mealy machine $M$ consists
+of a set of states $S$ with an initial state $s_0$, together with a
+transition function $\delta : I \times S \to S$ and an output function
+$\lambda : I \times S \to O$. Note that we assume machines to be
+deterministic and total. We also assume that our system under learning is a
+Mealy machine. Both functions $\delta$ and $\lambda$ are extended to words in
+$I^\ast$.
+
+We are in the context of learning, so we will generally denote the hypothesis
+by $H$ and the system under learning by $SUL$. Note that we can assume $H$ to
+be minimal and reachable.
+
+We assume that the alphabets $I$ and $O$ are fixed in these notes.
+\begin{definition}
+  A \Def{set of words $X$ (over $I$)} is a subset $X \subset I^\ast$.
+
+  Given a set of states $S$, a \Def{family of sets $X$ (over $I$)} is a
+  collection $X = \{X_s\}_{s \in S}$, where each $X_s$ is a set of words.
+\end{definition}
+
+All words we will consider are over $I$, so we will refer to them simply as
+words. The idea of a family of sets was also introduced by Fujiwara. They are
+used to collect sequences which are relevant for a certain state.
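+
+% The following small example is illustrative only; the machine and the
+% concrete sets are made up for this purpose.
+As a purely illustrative example: for a machine with states
+$S = \{s_0, s_1\}$ and inputs $I = \{a, b\}$, the collection $X$ with
+$X_{s_0} = \{a, ab\}$ and $X_{s_1} = \{b\}$ is a family of sets; the words in
+$X_s$ are the ones we intend to apply when the machine is (believed to be) in
+state $s$.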
+We define
+some operations on sets and families:
+\newcommand{\tensor}{\otimes}
+\begin{itemize}
+  \item Let $X$ and $Y$ be two sets of words over $I$, then $X \cdot Y$ is the
+  set of all concatenations: $X \cdot Y = \{ x y \,|\, x \in X, y \in Y \}$.
+  \item Let $X^n = X \cdots X$ denote the $n$-fold concatenation and
+  $X^{\leq k} = \bigcup_{n \leq k} X^n$ all concatenations of length at most
+  $k$. In particular, $I^n$ is the set of all words of length precisely $n$
+  and $I^{\leq k}$ the set of all words of length at most $k$.
+  \item Let $X = \{ X_s \}_{s \in S}$ and $Y = \{ Y_s \}_{s \in S}$ be two
+  families of sets. We define a new family of sets $X \tensor_H Y$ as
+  $(X \tensor_H Y)_s = \{ x y \,|\, x \in X_s, y \in Y_{\delta(s, x)} \}$.
+  Note that this depends on the transitions in the machine $H$.
+  \item Let $X$ be a family and $Y$ just a set of words,
+  then the usual concatenation is defined as $(X \cdot Y)_s = X_s \cdot Y$.
+  \item Let $X$ be a family of sets, then the union
+  $\bigcup X = \bigcup_{s \in S} X_s$ forms a set of words.
+\end{itemize}
+
+Let $H$ be a fixed machine and let $\tensor$ denote $\tensor_H$. We define
+some useful sets (which depend on $H$):
+\begin{itemize}
+  \item The set of prefixes $P_s$, containing for each state $t \in S$ a
+  shortest word $x$ such that $\delta(s, x) = t$. Note that $P_{s_0}$ is
+  particularly interesting. These sets can be constructed by any
+  shortest-path algorithm. Note that $P \cdot I$ is a set covering all
+  transitions in $H$.
+  \item The set $W_s = \{ x \,|\, x \text{ separates } s \text{ and } t,
+  t \in S, t \neq s \}$. This can be constructed using Hopcroft's algorithm,
+  or Gill's algorithm if one wants minimal separating sequences.
+  \item If $x$ is an adaptive distinguishing sequence in the sense of Lee and
+  Yannakakis, let $x_s$ denote the associated UIO for state $s$; we then
+  define $Z_s = \{ x_s \}$.
+\end{itemize}
+
+We obtain different methods (note that all test suites are expressed as
+families of sets $X$; the actual test suite is $X_{s_0}$):
+\begin{itemize}
+  \item The original Chow and Vasilevski test suite (the W-method) is given
+  by:
+  $$ P \cdot I^{\leq k+1} \cdot \bigcup W $$
+  which distinguishes $H$ from any non-equivalent machine with at most
+  $|S| + k$ states.
+  \item The Wp-method as described by Fujiwara:
+  $$ (P \cdot I^{\leq k} \cdot \bigcup W) \cup (P \cdot I^{\leq k+1} \tensor W) $$
+  which is a smaller test suite than the W-method, but just as strong. Note
+  that the original description by Fujiwara is more detailed in order to
+  reduce redundancy.
+  \item The method proposed by Lee and Yannakakis:
+  $$ P \cdot I^{\leq k+1} \tensor Z $$
+  which is as big as the Wp-method in the worst case (if the adaptive
+  distinguishing sequence even exists), and just as strong.
+\end{itemize}
+
+An important observation is that the sizes of $P$, $W$ and $Z$ are polynomial
+in the number of states of $H$, but that the middle part $I^{\leq k+1}$ is
+exponential. If the number of states of the $SUL$ is known, one can perform a
+(big) exhaustive test. In practice this number is not known, or has a very
+large bound. To mitigate this we can exhaust $I^{\leq 1}$ and then resort to
+randomly sampling $I^\ast$. It is in this sampling phase that we want $W$ and
+$Z$ to contain as few elements as possible, as every element contributes to
+the exponential blow-up.
+
+Note that $W$ can also be constructed in different ways. For example, taking
+$W_s = \{ u_s \}$, where $u_s$ is a UIO for state $s$ (assuming they all
+exist), gives valid variants of the first two methods. Also, if an adaptive
+distinguishing sequence exists, all states have UIOs and we can use the first
+two methods. The third method, however, is slightly smaller, as we do not
+need $\bigcup W$ in this case, because the UIOs constructed from an adaptive
+distinguishing sequence share (non-empty) prefixes.
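+
+% The following worked example is illustrative only; the two-state machine is
+% made up for this purpose.
+As a small worked example, consider a two-state machine $H$ over
+$I = \{a, b\}$ with $\delta(s_0, a) = s_1$, $\delta(s_1, a) = s_0$,
+$\delta(s, b) = s$ for both states, and outputs $\lambda(s_0, a) = 0$,
+$\lambda(s_1, a) = 1$, $\lambda(s, b) = 0$. Then $P_{s_0} = \{\epsilon, a\}$,
+and since the input $a$ separates the two states, $\bigcup W = \{a\}$. With
+$k = 0$ the W-method yields the test suite
+$$ P_{s_0} \cdot I^{\leq 1} \cdot \bigcup W = \{a, aa, ba, aaa, aba\}, $$
+which indeed applies the separating sequence $a$ after every transition of
+$H$.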
+
+% fix from http://tex.stackexchange.com/questions/103735/list-of-todos-todonotes-is-empty-with-llncs
+\setcounter{tocdepth}{1}
+\listoftodos
+
+\end{document}