\documentclass[envcountsame]{llncs}
\usepackage{amsmath}
\usepackage[backgroundcolor=white]{todonotes}
\newcommand{\Def}[1]{\emph{#1}}
\newcommand{\bigO}{\mathcal{O}}
\begin{document}
\maketitle

\section{Introduction}

Recently, automata learning has gained popularity. Learning algorithms are being applied to real-world systems (systems under learning, or SULs for short), and this exposes some practical problems. Classical active learning algorithms such as $L^\ast$ assume a teacher to which the algorithm can pose \Def{membership queries} and \Def{equivalence queries}. In the former case the algorithm asks the teacher for the output (sequence) produced by a given input sequence. In the latter case the algorithm provides the teacher with a hypothesis, and the teacher answers either with an input sequence on which the hypothesis behaves differently from the SUL, or affirmatively in case the machines are behaviorally equivalent.

In real-world applications we have to implement the teacher ourselves, even though we do not know all the details of the SUL. Membership queries are easily implemented by resetting the machine and applying the input. Equivalence queries, however, are often impossible to implement, so instead we have to resort to some form of random testing. Doing random testing naively is of course hopeless, as the state space is often too big. Luckily, we have a hypothesis at hand, which we can use for model-based testing.

One standard framework for model-based testing was pioneered by Chow and Vasilevskii. Briefly, the framework requires prefix sequences, which allow us to go from the initial state to a given state $s$ (in the model) or even to a given transition $t \to s$, and suffix sequences, which test whether the machine actually is in state $s$. If we have the right suffixes and test every transition of the model, we can ensure that the SUL is either equivalent to the model or has strictly more states. Such a test suite can be constructed with a size polynomial in the number of states of the model, in contrast to exhaustive testing or (naive) random testing, where exponentially many sequences are needed.

For the prefixes we can use any single-source shortest path algorithm; if we restrict ourselves to the above framework, this is in fact the best we can do. This gives $n$ sequences of length at most $n-1$ (their total length is at most $\frac{1}{2}n(n-1)$). For the suffixes, we can use the standard Hopcroft algorithm to generate separating sequences. If we want to test a given state $s$, we take as the set of suffixes (we allow the set of suffixes to depend on the state we want to test) the separating sequences for all other states $t$. This set has at most $n-1$ elements, each of length at most $n$; again the total length is at most $\frac{1}{2}n(n-1)$.

A natural question arises: can we do better? In the presence of an adaptive distinguishing sequence, Lee and Yannakakis prove that one can take a set of suffixes consisting of just one element of length at most $\frac{1}{2}n(n-1)$. This does not provide an improvement in the worst case. Even worse, such a sequence might not exist. In this paper we propose a testing algorithm which combines the two methods described above: the distinguishing sequence might not exist, but the tree constructed by the Lee and Yannakakis algorithm still provides a lot of information, which we can complement with the classical Hopcroft approach. Although this is not an improvement in the worst case, this hybrid method enabled us to learn an industrial-grade machine which was infeasible to learn with the standard methods provided by LearnLib.

\section{Preliminaries}

We restrict our attention to \Def{Mealy machines}. Let $I$ (resp.\ $O$) denote the finite set of inputs (resp.\ outputs). A Mealy machine $M$ consists of a set of states $S$ with an initial state $s_0$, together with a transition function $\delta : S \times I \to S$ and an output function $\lambda : S \times I \to O$. Note that we assume machines to be deterministic and total, and that our system under learning is itself a Mealy machine. Both functions $\delta$ and $\lambda$ are extended to words in $I^\ast$ in the usual way. Since we work in the context of learning, we generally denote the hypothesis by $H$ and the system under learning by $SUL$. Note that we may assume $H$ to be minimal and reachable. We assume that the alphabets $I$ and $O$ are fixed in these notes.

\begin{definition}
A \Def{set of words $X$ (over $I$)} is a subset $X \subseteq I^\ast$. Given a set of states $S$, a \Def{family of sets $X$ (over $I$)} is a collection $X = \{X_s\}_{s \in S}$ where each $X_s$ is a set of words.
\end{definition}

All words we consider are over $I$, so we simply refer to them as words. The idea of a family of sets was also used by Fujiwara; families collect the sequences which are relevant for a certain state. We define some operations on sets and families (a small code sketch of these operations follows the list):
\newcommand{\tensor}{\otimes}
\begin{itemize}
\item Let $X$ and $Y$ be two sets of words over $I$; then $X \cdot Y$ is the set of all concatenations: $X \cdot Y = \{ xy \,|\, x \in X, y \in Y \}$.
\item Let $X^n = X \cdots X$ denote the $n$-fold concatenation and $X^{\leq k} = \bigcup_{n \leq k} X^n$ the set of all concatenations of at most $k$ words from $X$. In particular, $I^n$ is the set of all words of length precisely $n$, and $I^{\leq k}$ the set of all words of length at most $k$.
\item Let $X = \{ X_s \}_{s \in S}$ and $Y = \{ Y_s \}_{s \in S}$ be two families of sets. We define a new family $X \tensor_H Y$ by $(X \tensor_H Y)_s = \{ xy \,|\, x \in X_s, y \in Y_{\delta(s, x)} \}$. Note that this depends on the transitions of the machine $H$.
\item Let $X$ be a family and $Y$ a set of words; then the usual concatenation is defined pointwise: $(X \cdot Y)_s = X_s \cdot Y$.
\item Let $X$ be a family of sets; then the union $\bigcup X = \bigcup_{s \in S} X_s$ forms a set of words.
\end{itemize}
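To make these operations concrete, here is a minimal Python sketch. It assumes an illustrative representation of our own choosing (not prescribed by anything above): words are tuples over $I$, a set of words is a Python set of such tuples, a family is a dictionary from states to sets, and the transition function of a machine is given as a dictionary \texttt{delta} from \texttt{(state, input)} pairs to states. All function names are hypothetical.

\begin{verbatim}
# Words are tuples over I; sets of words are Python sets of tuples;
# families are dicts mapping each state to such a set.

def concat(X, Y):
    # X . Y : all concatenations of a word in X with a word in Y
    return {x + y for x in X for y in Y}

def upto(X, k):
    # X^{<=k} : all concatenations of at most k words from X
    result, layer = {()}, {()}
    for _ in range(k):
        layer = concat(layer, X)
        result |= layer
    return result

def run(delta, s, x):
    # the transition function delta, extended to the word x
    for a in x:
        s = delta[(s, a)]
    return s

def tensor(delta, X, Y):
    # (X (x)_H Y)_s = { xy | x in X_s, y in Y_{delta(s, x)} }
    return {s: {x + y for x in X[s] for y in Y[run(delta, s, x)]}
            for s in X}
\end{verbatim}

For instance, \texttt{upto(\{(a,) for a in I\}, k)} computes $I^{\leq k}$.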
Let $H$ be a fixed machine and let $\tensor$ denote $\tensor_H$. We define some useful sets (which depend on $H$):
\begin{itemize}
\item The family of prefixes $P$, where $P_s$ contains, for every $t \in S$, a shortest word $x$ with $\delta(s, x) = t$. Note that $P_{s_0}$ is particularly interesting, as it reaches every state. These sets can be constructed with any shortest path algorithm. Note that $P_{s_0} \cdot I$ covers all transitions of $H$.
\item The family $W$, where $W_s$ contains, for every $t \in S$ with $t \neq s$, a sequence separating $s$ and $t$. This can be constructed using Hopcroft's algorithm, or Gill's algorithm if one wants minimal separating sequences.
\item If $H$ admits an adaptive distinguishing sequence in the sense of Lee and Yannakakis, let $x_s$ denote the associated UIO for state $s$; we define $Z_s = \{ x_s \}$.
\end{itemize}

We obtain different testing methods (note that all test suites are expressed as families of sets $X$; the actual test suite is $X_{s_0}$, and a code sketch follows the list):
\begin{itemize}
\item The original test suite of Chow and Vasilevskii (the W-method) is given by
$$ P \cdot I^{\leq k+1} \cdot \bigcup W, $$
which distinguishes $H$ from any non-equivalent machine with at most $|S| + k$ states.
\item The Wp-method as described by Fujiwara:
$$ (P \cdot I^{\leq k} \cdot \bigcup W) \cup (P \cdot I^{\leq k+1} \tensor W), $$
which is a smaller test suite than the W-method, but just as strong. Note that the original description by Fujiwara is more detailed, in order to reduce redundancy.
\item The method proposed by Lee and Yannakakis:
$$ P \cdot I^{\leq k+1} \tensor Z, $$
which, provided the adaptive distinguishing sequence exists at all, is as big as the Wp-method in the worst case and just as strong.
\end{itemize}
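With the operations sketched earlier, these suites are direct transcriptions of the formulas. The following is again a minimal, illustrative Python sketch: \texttt{concat}, \texttt{upto}, \texttt{tensor} and \texttt{run} are the hypothetical helpers from the previous sketch, \texttt{P}, \texttt{W} and \texttt{Z} are assumed to be given as families, and \texttt{s0} is the initial state.

\begin{verbatim}
def family_concat(X, Y):
    # pointwise concatenation (X . Y)_s = X_s . Y of a family and a set
    return {s: concat(X[s], Y) for s in X}

def family_union(X):
    # the union of a family is a plain set of words
    return set().union(*X.values())

def w_method(s0, P, W, I, k):
    # P . I^{<=k+1} . union(W), evaluated at the initial state
    letters = {(a,) for a in I}
    return concat(concat(P[s0], upto(letters, k + 1)), family_union(W))

def wp_method(delta, s0, P, W, I, k):
    # (P . I^{<=k} . union(W))  u  ((P . I^{<=k+1}) (x) W)
    letters = {(a,) for a in I}
    part1 = concat(concat(P[s0], upto(letters, k)), family_union(W))
    part2 = tensor(delta, family_concat(P, upto(letters, k + 1)), W)[s0]
    return part1 | part2

def ly_method(delta, s0, P, Z, I, k):
    # (P . I^{<=k+1}) (x) Z
    letters = {(a,) for a in I}
    return tensor(delta, family_concat(P, upto(letters, k + 1)), Z)[s0]
\end{verbatim}

Each function returns the test suite evaluated at the initial state, i.e.\ the set $X_{s_0}$.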
An important observation is that the sizes of $P$, $W$ and $Z$ are polynomial in the number of states of $H$, but that the middle part $I^{\leq k+1}$ is exponential in $k$. If the number of states of the $SUL$ is known, one can perform a (big) exhaustive test. In practice this number is not known, or only a very large bound on it is. To mitigate this we can exhaust $I^{\leq 1}$ and then resort to randomly sampling $I^\ast$; a sketch of such a sampling loop is given below. It is in this sampling phase that we want $W$ and $Z$ to contain as few elements as possible, as every element contributes to the exponential blow-up.

Note that $W$ can also be constructed in different ways. For example, taking $W_s = \{ u_s \}$, where $u_s$ is a UIO for state $s$ (assuming they all exist), gives valid variants of the first two methods. Moreover, if an adaptive distinguishing sequence exists, then all states have UIOs and we can use the first two methods. The third method, however, is slightly smaller, as we do not need $\bigcup W$ in this case, because the UIOs constructed from an adaptive distinguishing sequence share (non-empty) prefixes.
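The promised sampling loop, once more as a minimal sketch under the same hypothetical representation: \texttt{run} is the helper from the first sketch, and \texttt{Suf} stands for whichever family of suffixes is used (for instance $W$ or $Z$). The geometrically distributed infix length is an arbitrary choice on our part.

\begin{verbatim}
import random

def random_test_word(delta, s0, P, Suf, I, cont=0.9):
    # One random test word of the shape prefix . infix . suffix:
    # an access sequence from P, a random infix over I, and a
    # suffix for the state the model reaches.
    prefix = random.choice(list(P[s0]))
    infix = ()
    while random.random() < cont:   # geometric infix length
        infix += (random.choice(list(I)),)
    state = run(delta, s0, prefix + infix)
    suffix = random.choice(list(Suf[state]))
    return prefix + infix + suffix
\end{verbatim}

Executing such a word on the $SUL$ and comparing the outputs with those predicted by $H$ constitutes a single random test.

\end{document}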