\project thesis
\startcomponent minimal-separating-sequences

\startchapter
[title={Minimal Separating Sequences for All Pairs of States},
reference=chap:minimal-separating-sequences]

\todo{Authors; and should I also mention the support of the other authors? Keywords?}
\startabstract
Finding minimal separating sequences for all pairs of inequivalent states in a finite state machine
is a classic problem in automata theory. Sets of minimal separating sequences, for instance, play a
central role in many conformance testing methods. Moore has already outlined a partition refinement
algorithm that constructs such a set of sequences in $\bigO(mn)$ time, where $m$ is the number of
transitions and $n$ is the number of states. In this paper, we present an improved algorithm based
on the minimization algorithm of Hopcroft that runs in $\bigO(m \log n)$ time. The efficiency of our
algorithm is empirically verified and compared to the traditional algorithm.
\stopabstract
\startsection
[title={Introduction}]

In diverse areas of computer science and engineering, systems can be modelled by \emph{finite state machines} (FSMs).
One of the cornerstones of automata theory is the minimization of such machines (and its many variations).
In this process one obtains an equivalent minimal FSM, where states are different if and only if they have different behaviour.
The first to develop an algorithm for minimization was \citet[Moore56].
His algorithm has a time complexity of $\bigO(mn)$, where $m$ is the number of transitions, and $n$ is the number of states of the FSM.
Later, \citet[Hopcroft71] improved this bound to $\bigO(m \log n)$.

Minimization algorithms can be used as a framework for deriving a set of \emph{separating sequences} that show \emph{why} states are inequivalent.
The separating sequences in Moore's framework are of minimal length \citep[Gill62].
Obtaining minimal separating sequences in Hopcroft's framework, however, is a non-trivial task.
In this paper, we present an algorithm for finding such minimal separating sequences for all pairs of inequivalent states of an FSM in $\bigO(m \log n)$ time.

Coincidentally, \citet[DBLP:conf/popl/BonchiP13] recently introduced a new algorithm for the equally fundamental problem of proving equivalence of states in non-deterministic automata.
As both their work and ours demonstrate, even classical problems in automata theory can still offer surprising research opportunities.
Moreover, new ideas for well-studied problems may lead to algorithmic improvements that are of practical importance in a variety of applications.

One such application for our work is in \emph{conformance testing}.
Here, the goal is to test whether a black box implementation of a system is functioning as described by a given FSM.
It consists of applying sequences of inputs to the implementation, and comparing the output of the system to the output prescribed by the FSM.
Minimal separating sequences are used in many test generation methods \citep[DBLP:journals/infsof/DorofeevaEMCY10].
Therefore, our algorithm can be used to improve these methods.

\stopsection
\startsection
[title={Preliminaries}]

We define an \emph{FSM} as a Mealy machine $M = (I, O, S, \delta, \lambda)$, where $I, O$ and $S$ are finite sets of \emph{inputs}, \emph{outputs} and \emph{states} respectively, $\delta \colon S \times I \to S$ is a \emph{transition function} and $\lambda \colon S \times I \to O$ is an \emph{output function}.
The functions $\delta$ and $\lambda$ are naturally extended to $\delta \colon S \times I^{*} \to S$ and $\lambda \colon S \times I^{*} \to O^{*}$.
Moreover, given a set of states $S' \subseteq S$ and a sequence $x \in I^{*}$, we define $\delta(S', x) = \{\delta(s, x) \mid s \in S'\}$ and $\lambda(S', x) = \{\lambda(s, x) \mid s \in S'\}$.
The \emph{inverse transition function} $\delta^{-1} \colon S \times I \to \mathcal{P}(S)$ is defined as $\delta^{-1}(s, a) = \{t \in S \mid \delta(t, a) = s\}$.
Observe that Mealy machines are deterministic and input-enabled (i.e., complete) by definition.
The initial state is not specified because it is of no importance in what follows.
For the remainder of this paper we fix a machine $M = (I, O, S, \delta, \lambda)$.
We use $n$ to denote its number of states, that is $n = |S|$, and $m$ to denote its number of transitions, that is $m = |S| \cdot |I|$.
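As a concrete illustration, the machine of \in{Figure}[fig:automaton-tree] and the extended functions $\delta$ and $\lambda$ can be sketched in Python; the dictionary encoding and the integer names for the states $s_0, \ldots, s_5$ are our own conventions, not part of the definitions above.

```python
# The machine of the figure, encoded as (state, input) -> (successor, output).
# States s_0 ... s_5 are represented by the integers 0 ... 5.
M = {(0, 'a'): (1, 0), (0, 'b'): (0, 0),
     (1, 'a'): (2, 1), (1, 'b'): (0, 0),
     (2, 'a'): (3, 0), (2, 'b'): (3, 0),
     (3, 'a'): (4, 1), (3, 'b'): (4, 0),
     (4, 'a'): (5, 0), (4, 'b'): (5, 1),
     (5, 'a'): (0, 1), (5, 'b'): (0, 0)}

def delta(s, x):
    """Extended transition function: the state reached from s on the word x."""
    for a in x:
        s = M[(s, a)][0]
    return s

def lam(s, x):
    """Extended output function: the output word produced from s on x."""
    out = []
    for a in x:
        t, o = M[(s, a)]
        out.append(o)
        s = t
    return ''.join(map(str, out))
```

For instance, lam(0, 'aab') yields the output word 010 while lam(2, 'aab') yields 011, so the sequence $aab$ separates $s_0$ and $s_2$.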
\startdefinition[reference=def:equivalent]
States $s$ and $t$ are \emph{equivalent} if $\lambda(s, x) = \lambda(t, x)$ for all $x$ in $I^{*}$.
\stopdefinition

We are interested in the case where $s$ and $t$ are not equivalent, i.e., \emph{inequivalent}.
If all pairs of distinct states of a machine $M$ are inequivalent, then $M$ is \emph{minimal}.
An example of a minimal FSM is given in \in{Figure}[fig:automaton-tree].

\startdefinition[reference=def:separate]
A \emph{separating sequence} for states $s$ and $t$ in $S$ is a sequence $x \in I^{*}$ such that $\lambda(s, x) \neq \lambda(t, x)$.
We say $x$ is \emph{minimal} if $|y| \geq |x|$ for all separating sequences $y$ for $s$ and $t$.
\stopdefinition

A separating sequence always exists if two states are inequivalent, and there may be multiple minimal separating sequences.
Our goal is to obtain minimal separating sequences for all pairs of inequivalent states of $M$.
\startsubsection
[title={Partition Refinement},
reference=sec:partition]

In this section we discuss the basics of minimization.
Both Moore's algorithm and Hopcroft's algorithm work by means of partition refinement.
A similar treatment (for DFAs) is given by \citet[DBLP:journals/acta/Gries73].

A \emph{partition} $P$ of $S$ is a set of pairwise disjoint non-empty subsets of $S$ whose union is exactly $S$.
Elements of $P$ are called \emph{blocks}.
If $P$ and $P'$ are partitions of $S$, then $P'$ is a \emph{refinement} of $P$ if every block of $P'$ is contained in a block of $P$.
A partition refinement algorithm constructs the finest partition under some constraint.
In our context the constraint is that equivalent states belong to the same block.

\startdefinition[reference=def:valid]
A partition is \emph{valid} if equivalent states are in the same block.
\stopdefinition

Partition refinement algorithms for FSMs start with the trivial partition $P = \{S\}$, and iteratively refine $P$ until it is the finest valid partition (where all states in a block are equivalent).
The blocks of such a \emph{complete} partition form the states of the minimized FSM, whose transition and output functions are well-defined because states in the same block are equivalent.

Let $B$ be a block and $a$ be an input.
There are two possible reasons to split $B$ (and hence refine the partition).
First, we can \emph{split $B$ with respect to the output after $a$} if the set $\lambda(B, a)$ contains more than one output.
Second, we can \emph{split $B$ with respect to the state after $a$} if there is no single block $B'$ containing the set $\delta(B, a)$.
In both cases it is obvious what the new blocks are: in the first case each output in $\lambda(B, a)$ defines a new block, while in the second case each block containing a state in $\delta(B, a)$ defines a new block.
Both types of refinement preserve validity.
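The two kinds of splits follow directly from the definitions. A hypothetical Python sketch (the function names and the dictionary encoding of the machine from \in{Figure}[fig:automaton-tree] are ours):

```python
# The machine of the figure: (state, input) -> (successor, output).
M = {(0, 'a'): (1, 0), (0, 'b'): (0, 0), (1, 'a'): (2, 1), (1, 'b'): (0, 0),
     (2, 'a'): (3, 0), (2, 'b'): (3, 0), (3, 'a'): (4, 1), (3, 'b'): (4, 0),
     (4, 'a'): (5, 0), (4, 'b'): (5, 1), (5, 'a'): (0, 1), (5, 'b'): (0, 0)}

def split_output(B, a):
    """Split block B w.r.t. the output after a: one new block per output."""
    groups = {}
    for s in B:
        groups.setdefault(M[(s, a)][1], set()).add(s)
    return sorted(map(frozenset, groups.values()), key=min)

def split_state(B, a, partition):
    """Split block B w.r.t. the state after a: one new block per block of
    the current partition that is hit by delta(B, a)."""
    groups = {}
    for s in B:
        target = next(C for C in partition if M[(s, a)][0] in C)
        groups.setdefault(target, set()).add(s)
    return sorted(map(frozenset, groups.values()), key=min)
```

For the trivial partition, split_output on input $a$ separates the states with output $0$ from those with output $1$; if $\lambda(B, a)$ is a singleton (or $\delta(B, a)$ lies within one block), the functions simply return the block unchanged as a single group.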
Partition refinement algorithms for FSMs first perform splits w.r.t. output, until no such splits can be performed.
This is precisely the case when the partition is \emph{acceptable}.

\startdefinition[reference=def:acceptable]
A partition is \emph{acceptable} if for all pairs $s, t$ of states contained in the same block and for all inputs $a$ in $I$, $\lambda(s, a) = \lambda(t, a)$.
\stopdefinition

Any refinement of an acceptable partition is again acceptable.
The algorithm continues performing splits w.r.t. state, until no such splits can be performed.
This is exactly the case when the partition is \emph{stable}.

\startdefinition[reference=def:stable]
A partition is \emph{stable} if it is acceptable and for any input $a$ in $I$ and states $s$ and $t$ that are in the same block, the states $\delta(s, a)$ and $\delta(t, a)$ are also in the same block.
\stopdefinition

Since an FSM has only finitely many states, partition refinement will terminate.
The output is the finest valid partition, which is acceptable and stable.
For a more formal treatment of partition refinement we refer to \citet[DBLP:journals/acta/Gries73].

\stopsubsection
\startsubsection
[title={Splitting Trees and Refinable Partitions},
reference=sec:tree]

Both types of splits described above can be used to construct a separating sequence for the states that are split.
In a split w.r.t. the output after $a$, this sequence is simply $a$.
In a split w.r.t. the state after $a$, the sequence starts with an $a$ and continues with the separating sequence for the states in $\delta(B, a)$.
In order to systematically keep track of this information, we maintain a \emph{splitting tree}.
The splitting tree was introduced by \citet[DBLP:journals/tc/LeeY94] as a data structure for maintaining the operational history of a partition refinement algorithm.

\startdefinition[reference=def:splitting-tree]
A \emph{splitting tree} for $M$ is a rooted tree $T$ with a finite set of nodes and the following properties:
\startitemize
\item Each node $u$ in $T$ is labelled by a subset of $S$, denoted $l(u)$.
\item The root is labelled by $S$.
\item For each inner node $u$, $l(u)$ is partitioned by the labels of its children.
\item Each inner node $u$ is associated with a sequence $\sigma(u)$ that separates states contained in different children of $u$.
\stopitemize
\stopdefinition

We use $C(u)$ to denote the set of children of a node $u$.
The \emph{lowest common ancestor} (lca) of a set $S' \subseteq S$ is the node $u$ such that $S' \subseteq l(u)$ and $S' \not \subseteq l(v)$ for all $v \in C(u)$; it is denoted by $\lca(S')$.
For a pair of states $s$ and $t$ we use the shorthand $\lca(s, t)$ for $\lca(\{s, t\})$.

The labels $l(u)$ can be stored in a \emph{refinable partition} data structure \citep[ValmariL08].
This is an array containing a permutation of the states, ordered so that states in the same block are adjacent.
The label $l(u)$ of a node can then be indicated by a slice of this array.
If node $u$ is split, some states in the \emph{slice} $l(u)$ may be moved to create the labels of its children, but this will not change the \emph{set} $l(u)$.
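A minimal sketch of this idea (the class and method names are our own): all states live in one array, a block is a slice of it, and a split only reorders states within that slice, so a parent's slice still covers exactly the labels of its children.

```python
class RefinablePartition:
    """One array holding a permutation of the states; every block is a
    contiguous slice (lo, hi) of this array."""

    def __init__(self, states):
        self.arr = list(states)
        self.idx = {s: i for i, s in enumerate(self.arr)}  # state -> position

    def split(self, lo, hi, pred):
        """Reorder arr[lo:hi] so that states satisfying pred come first.
        Returns mid: the child slices are (lo, mid) and (mid, hi), while
        the parent slice (lo, hi) still denotes the same set of states."""
        mid = lo
        for j in range(lo, hi):
            if pred(self.arr[j]):
                self.arr[mid], self.arr[j] = self.arr[j], self.arr[mid]
                self.idx[self.arr[mid]], self.idx[self.arr[j]] = mid, j
                mid += 1
        return mid
```

For example, splitting the root slice of the machine in \in{Figure}[fig:automaton-tree] by the output after $a$ moves the even states to the front of the array while leaving the set of states in the slice unchanged.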
A splitting tree $T$ can be used to record the history of a partition refinement algorithm, because at any time the leaves of $T$ define a partition of $S$, denoted $P(T)$.
We say a splitting tree $T$ is valid (resp. acceptable, stable, complete) if $P(T)$ is.
A leaf can be expanded in one of two ways, corresponding to the two ways a block can be split.
Given a leaf $u$ and its block $B = l(u)$ we define the following two splits:

\description{split-output}
Suppose there is an input $a$ such that $B$ can be split w.r.t. the output after $a$.
Then we set $\sigma(u) = a$, and we create a node for each subset of $B$ that produces the same output on $a$.
These nodes are set to be children of $u$.

\description{split-state}
Suppose there is an input $a$ such that $B$ can be split w.r.t. the state after $a$.
Then, instead of splitting $B$ as described before, we proceed as follows.
First, we locate the node $v = \lca(\delta(B, a))$.
Since $v$ cannot be a leaf, it has at least two children whose labels contain elements of $\delta(B, a)$.
We can use this information to expand the tree as follows.
For each node $w$ in $C(v)$ we create a child of $u$ labelled $\{s \in B \mid \delta(s, a) \in l(w) \}$ if the label contains at least one state.
Finally, we set $\sigma(u) = a \sigma(v)$.

A straightforward adaptation of partition refinement for constructing a stable splitting tree for $M$ is shown in \in{Algorithm}[alg:tree].
The termination and correctness arguments outlined in \in{Section}[sec:partition] are preserved.
It follows directly that states are equivalent if and only if they are in the label of the same leaf.
\startplacealgorithm
[title={Constructing a stable splitting tree.},
reference=alg:tree]
\startalgorithmic
\REQUIRE An FSM $M$
\ENSURE A valid and stable splitting tree $T$
\startlinenumbering
\STATE initialise $T$ to be a tree with a single node labelled $S$
\REPEAT
\STATE find $a \in I, B \in P(T)$ such that we can split $B$ w.r.t. output $\lambda(\cdot, a)$
\STATE expand the $u \in T$ with $l(u)=B$ as described in (split-output)
\UNTIL{$P(T)$ is acceptable}
\REPEAT
\STATE find $a \in I, B \in P(T)$ such that we can split $B$ w.r.t. state $\delta(\cdot, a)$
\STATE expand the $u \in T$ with $l(u)=B$ as described in (split-state)
\UNTIL{$P(T)$ is stable}
\stoplinenumbering
\stopalgorithmic
\stopplacealgorithm
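The steps of \in{Algorithm}[alg:tree] can be rendered as a compact, deliberately unoptimized Python sketch; the Node class and helper names are ours, and the machine is the one of \in{Figure}[fig:automaton-tree] encoded as a dictionary.

```python
# (state, input) -> (successor, output): the machine of the figure.
T = {(0, 'a'): (1, 0), (0, 'b'): (0, 0), (1, 'a'): (2, 1), (1, 'b'): (0, 0),
     (2, 'a'): (3, 0), (2, 'b'): (3, 0), (3, 'a'): (4, 1), (3, 'b'): (4, 0),
     (4, 'a'): (5, 0), (4, 'b'): (5, 1), (5, 'a'): (0, 1), (5, 'b'): (0, 0)}
I = ['a', 'b']

class Node:
    def __init__(self, label):
        self.label = frozenset(label)  # l(u)
        self.children = []             # C(u)
        self.sigma = None              # separating sequence of an inner node

root = Node(range(6))

def leaves(u):
    return [u] if not u.children else [x for c in u.children for x in leaves(c)]

def lca(states):
    u = root
    while any(states <= c.label for c in u.children):
        u = next(c for c in u.children if states <= c.label)
    return u

def lam(s, x):
    out = []
    for a in x:
        s, out = T[(s, a)][0], out + [T[(s, a)][1]]
    return out

def split_output(u, a):
    groups = {}
    for s in u.label:
        groups.setdefault(T[(s, a)][1], []).append(s)
    if len(groups) < 2:
        return False
    u.sigma, u.children = [a], [Node(g) for g in groups.values()]
    return True

def split_state(u, a):
    v = lca({T[(s, a)][0] for s in u.label})
    if not v.children:
        return False                   # the image lies inside a single leaf
    u.sigma = [a] + v.sigma            # sigma(u) = a sigma(v)
    u.children = [Node(g) for g in
                  ([s for s in u.label if T[(s, a)][0] in w.label]
                   for w in v.children) if g]
    return True

# First splits w.r.t. output (until acceptable), then w.r.t. state (until stable).
for split in (split_output, split_state):
    while any(split(u, a) for u in leaves(root) for a in I):
        pass
```

After the two phases, the sequence stored at $\lca(s, t)$ separates every pair of inequivalent states; these sequences are not necessarily minimal yet, which is the topic of the next section.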
\startexample[reference=ex:tree]
\in{Figure}[fig:automaton-tree] shows an FSM and a complete splitting tree for it.
This tree is constructed by \in{Algorithm}[alg:tree] as follows.
First, the root node is labelled by $\{s_0, \ldots, s_5\}$.
The even and odd states produce different outputs after $a$, hence the root node is split.
Then we note that $s_4$ produces a different output after $b$ than $s_0$ and $s_2$, so $\{s_0, s_2, s_4\}$ is split as well.
At this point $T$ is acceptable: no more leaves can be split w.r.t. output.
Now, the states $\delta(\{s_1, s_3, s_5\}, a)$ are contained in different leaves of $T$.
Therefore, $\{s_1, s_3, s_5\}$ is split into $\{s_1, s_5\}$ and $\{s_3\}$ and associated with the sequence $ab$.
At this point, $\delta(\{s_0, s_2\}, a)$ contains states that are in both children of $\{s_1, s_3, s_5\}$, so $\{s_0, s_2\}$ is split and the associated sequence is $aab$.
We continue until $T$ is complete.
\stopexample
\startplacefigure
[title={An FSM (a) and a complete splitting tree for it (b)},
reference=fig:automaton-tree]

\starttikzpicture
\node[state] (0) {$s_0$};
\node[state] (1) [above right of=0] {$s_1$};
\node[state] (2) [right of=1] {$s_2$};
\node[state] (3) [below right of=2] {$s_3$};
\node[state] (4) [below left of=3] {$s_4$};
\node[state] (5) [below right of=0] {$s_5$};
\path[->]
(0) edge [loop left] node [left] {${b}/0$}
(0) edge [bend right] node [right] {${a}/0$} (1)
(1) edge node [above] {${a}/1$} (2)
(1) edge [bend right] node [left] {${b}/0$} (0)
(2) edge node [right] {${b}/0$} (3)
(2) edge node [above right] {${a}/0$} (3)
(3) edge node [below right] {${b}/0$} (4)
(3) edge node [right] {${a}/1$} (4)
(4) edge node [above] {${a}/0$} (5)
(4) edge node [below] {${b}/1$} (5)
(5) edge node [below left] {${b}/0$} (0)
(5) edge node [left] {${a}/1$} (0);
\stoptikzpicture

\todo{monotone-tree}
% \subfloat[\label{fig:monotone-tree}]{
% \includegraphics[width=0.45\textwidth]{monotone-tree.pdf}
% }
\stopplacefigure
\stopsubsection
\stopsection

\startsection
[title={Minimal Separating Sequences}]

In \in{Section}[sec:tree] we have described an algorithm for constructing a complete splitting tree.
This algorithm is non-deterministic, as there is no prescribed order on the splits.
In this section we order them to obtain minimal separating sequences.

Let $u$ be a non-root inner node in a splitting tree; then the sequence $\sigma(u)$ can also be used to split the parent of $u$.
This allows us to construct splitting trees in which children never have shorter sequences than their parents, as we can always split with those sequences first.
Trees obtained in this way are guaranteed to be \emph{layered}, which means that for all nodes $u$ and all $u' \in C(u)$, $|\sigma(u)| \leq |\sigma(u')|$.
Each layer consists of nodes whose associated separating sequences have the same length.

Our approach for constructing minimal sequences is to ensure that each layer is as large as possible before continuing to the next one.
This idea is expressed formally by the following definitions.

\startdefinition
A splitting tree $T$ is \emph{$k$-stable} if for all states $s$ and $t$ in the same leaf we have $\lambda(s, x) = \lambda(t, x)$ for all $x \in I^{\leq k}$.
\stopdefinition

\startdefinition
A splitting tree $T$ is \emph{minimal} if for all states $s$ and $t$ in different leaves $\lambda(s, x) \neq \lambda(t, x)$ implies $|x| \geq |\sigma(\lca(s, t))|$ for all $x \in I^{*}$.
\stopdefinition

Minimality of a splitting tree can be used to obtain minimal separating sequences for pairs of states.
If the tree is in addition stable, we obtain minimal separating sequences for all inequivalent pairs of states.
Note that if a minimal splitting tree is $(n-1)$-stable ($n$ is the number of states of $M$), then it is stable (\in{Definition}[def:stable]).
This follows from the well-known fact that $n-1$ is an upper bound on the length of a minimal separating sequence \citep[Moore56].
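For intuition (and for testing on small machines), minimality can be checked against an exhaustive search: enumerate all input words of increasing length until a separating one is found. This hypothetical helper again uses our dictionary encoding of the machine in \in{Figure}[fig:automaton-tree].

```python
from itertools import product

# (state, input) -> (successor, output): the machine of the figure.
M = {(0, 'a'): (1, 0), (0, 'b'): (0, 0), (1, 'a'): (2, 1), (1, 'b'): (0, 0),
     (2, 'a'): (3, 0), (2, 'b'): (3, 0), (3, 'a'): (4, 1), (3, 'b'): (4, 0),
     (4, 'a'): (5, 0), (4, 'b'): (5, 1), (5, 'a'): (0, 1), (5, 'b'): (0, 0)}
I, n = ['a', 'b'], 6

def lam(s, x):
    out = []
    for a in x:
        s, out = M[(s, a)][0], out + [M[(s, a)][1]]
    return out

def min_sep(s, t):
    """A shortest separating sequence for s and t, found by brute force.
    Length n-1 suffices for inequivalent states (Moore's bound)."""
    for length in range(1, n):
        for x in product(I, repeat=length):
            if lam(s, x) != lam(t, x):
                return ''.join(x)
    return None  # s and t are equivalent
```

On this machine, $b$ already separates $s_0$ and $s_4$, while $s_0, s_2$ need a sequence of length $2$ and $s_1, s_5$ one of length $3$.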
\in{Algorithm}[alg:minimaltree] ensures a stable and minimal splitting tree.
The first repeat-loop is the same as before (in \in{Algorithm}[alg:tree]).
Clearly, we obtain a $1$-stable and minimal splitting tree here.
It remains to show that we can extend this to a stable and minimal splitting tree.
\in{Algorithm}[alg:increase-k] performs precisely one such step towards stability, while maintaining minimality.
Termination follows for the same reason as for \in{Algorithm}[alg:tree].
Correctness of the algorithm is shown by the following key lemma.
We denote the input tree by $T$ and the tree after performing \in{Algorithm}[alg:increase-k] by $T'$.
Observe that $T$ is an initial segment of $T'$.

\startlemma
[reference=lem:stability]
\in{Algorithm}[alg:increase-k] ensures a $(k+1)$-stable minimal splitting tree.
\stoplemma
\startproof
Let us first prove stability.
Let $s$ and $t$ be in the same leaf of $T'$ and let $x \in I^{*}$ be such that $\lambda(s, x) \neq \lambda(t, x)$.
We show that $|x| > k+1$.

Suppose for the sake of contradiction that $|x| \leq k+1$.
Let $u$ be the leaf containing $s$ and $t$ and write $x = a x'$.
We see that $\delta(s, a)$ and $\delta(t, a)$ are separated by $k$-stability of $T$.
So the node $v = \lca(\delta(l(u), a))$ has children and an associated sequence $\sigma(v)$.
There are two cases:
\startitemize
\item
If $|\sigma(v)| < k$, then $a \sigma(v)$ separates $s$ and $t$ and is of length $\leq k$.
This contradicts the $k$-stability of $T$.
\item
If $|\sigma(v)| = k$, then the loop in \in{Algorithm}[alg:increase-k] will consider this case and split.
Note that this may not split $s$ and $t$ (it may occur that $a \sigma(v)$ splits different elements of $l(u)$).
We can repeat the above argument inductively for the newly created leaf containing $s$ and $t$.
By finiteness of $l(u)$, the induction will stop and, in the end, $s$ and $t$ are split.
\stopitemize
Both cases end in a contradiction, so we conclude that $|x| > k+1$.

Let us now prove minimality.
It suffices to consider only newly split states in $T'$.
Let $s$ and $t$ be two states with $|\sigma(\lca(s, t))| = k+1$.
Let $x \in I^{*}$ be a sequence such that $\lambda(s, x) \neq \lambda(t, x)$.
We need to show that $|x| \geq k+1$.
Since $x \neq \epsilon$ we can write $x = a x'$ and consider the states $s' = \delta(s, a)$ and $t' = \delta(t, a)$, which are separated by $x'$.
Two things can happen:
\startitemize
\item
The states $s'$ and $t'$ are in the same leaf of $T$.
Then by $k$-stability of $T$ we get $\lambda(s', y) = \lambda(t', y)$ for all $y \in I^{\leq k}$.
So $|x'| > k$.
\item
The states $s'$ and $t'$ are in different leaves of $T$; let $u = \lca(s', t')$.
Then $a\sigma(u)$ separates $s$ and $t$.
Since $s$ and $t$ are in the same leaf of $T$, we get $|a \sigma(u)| \geq k+1$ by $k$-stability.
This means that $|\sigma(u)| \geq k$, and by minimality of $T$ we get $|x'| \geq k$.
\stopitemize
In both cases we have shown that $|x| \geq k+1$, as required.
\stopproof
\startplacealgorithm
[title={Constructing a stable and minimal splitting tree.},
reference=alg:minimaltree]
\startalgorithmic
\REQUIRE An FSM $M$ with $n$ states
\ENSURE A stable, minimal splitting tree $T$
\startlinenumbering
\STATE initialise $T$ to be a tree with a single node labelled $S$
\REPEAT
\STATE find $a \in I, B \in P(T)$ such that we can split $B$ w.r.t. output $\lambda(\cdot, a)$
\STATE expand the $u \in T$ with $l(u)=B$ as described in (split-output)
\UNTIL{$P(T)$ is acceptable}
\FOR{$k = 1$ {\bf to} $n-1$}
\STATE invoke \in{Algorithm}[alg:increase-k] or \in{Algorithm}[alg:increase-k-v3] on $T$ for $k$
\ENDFOR
\stoplinenumbering
\stopalgorithmic
\stopplacealgorithm
\startplacealgorithm
[title={A step towards the stability of a splitting tree.},
reference=alg:increase-k]
\startalgorithmic
\REQUIRE A $k$-stable and minimal splitting tree $T$
\ENSURE $T$ is a $(k+1)$-stable, minimal splitting tree
\startlinenumbering
\FORALL{leaves $u \in T$ and all inputs $a \in I$}
\STATE $v \gets \lca(\delta(l(u), a))$
\IF{$v$ is an inner node and $|\sigma(v)| = k$}
\STATE expand $u$ as described in (split-state) (this generates new leaves)
\ENDIF
\ENDFOR
\stoplinenumbering
\stopalgorithmic
\stopplacealgorithm
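The layered construction of \in{Algorithm}[alg:minimaltree] with the step of \in{Algorithm}[alg:increase-k] can be sketched as follows: within layer $k$, a leaf is only split via an lca node whose sequence has length exactly $k$, and newly created leaves are reconsidered within the same layer. All names and the dictionary encoding of the machine from \in{Figure}[fig:automaton-tree] are ours, and the sketch is unoptimized.

```python
from itertools import product

T = {(0, 'a'): (1, 0), (0, 'b'): (0, 0), (1, 'a'): (2, 1), (1, 'b'): (0, 0),
     (2, 'a'): (3, 0), (2, 'b'): (3, 0), (3, 'a'): (4, 1), (3, 'b'): (4, 0),
     (4, 'a'): (5, 0), (4, 'b'): (5, 1), (5, 'a'): (0, 1), (5, 'b'): (0, 0)}
I, n = ['a', 'b'], 6

class Node:
    def __init__(self, label):
        self.label, self.children, self.sigma = frozenset(label), [], None

root = Node(range(6))

def leaves(u):
    return [u] if not u.children else [x for c in u.children for x in leaves(c)]

def lca(states):
    u = root
    while any(states <= c.label for c in u.children):
        u = next(c for c in u.children if states <= c.label)
    return u

def lam(s, x):
    out = []
    for a in x:
        s, out = T[(s, a)][0], out + [T[(s, a)][1]]
    return out

def acceptable_step():
    # One split w.r.t. output, as in the first repeat-loop.
    for u in leaves(root):
        for a in I:
            groups = {}
            for s in u.label:
                groups.setdefault(T[(s, a)][1], []).append(s)
            if len(groups) > 1:
                u.sigma, u.children = [a], [Node(g) for g in groups.values()]
                return True
    return False

def stability_step(k):
    # One split of Algorithm increase-k: only k-candidates may act as lca.
    for u in leaves(root):
        for a in I:
            v = lca({T[(s, a)][0] for s in u.label})
            if v.children and len(v.sigma) == k:
                u.sigma = [a] + v.sigma
                u.children = [Node(g) for g in
                              ([s for s in u.label if T[(s, a)][0] in w.label]
                               for w in v.children) if g]
                return True
    return False

def shortest(s, t):
    # Brute-force length of a minimal separating sequence, for checking.
    return next(L for L in range(1, n)
                if any(lam(s, x) != lam(t, x) for x in product(I, repeat=L)))

while acceptable_step():       # first repeat-loop: splits w.r.t. output
    pass
for k in range(1, n):          # layer k produces sequences of length k + 1
    while stability_step(k):
        pass
```

On this machine the layered order indeed yields $|\sigma(\lca(s, t))|$ equal to the brute-force minimal length for every pair, e.g. length $2$ instead of $3$ for $s_0, s_2$.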
\startexample
\in{Figure}[fig:splitting-tree-only] shows a stable and minimal splitting tree $T$ for the machine in \in{Figure}[fig:automaton-tree].
This tree is constructed by \in{Algorithm}[alg:minimaltree] as follows.
It proceeds in the same way as \in{Algorithm}[alg:tree] until we consider the node labelled $\{s_0, s_2\}$.
At this point $k = 1$.
We observe that the sequence associated with $\lca(\delta(\{s_0,s_2\}, a))$ has length $2$, which is too long, so we continue with the next input.
We find that we can indeed split w.r.t. the state after $b$, so the associated sequence is $ba$.
Continuing, we obtain the same partition as before, but with smaller witnesses.

The internal data structure (a refinable partition) is shown in \in{Figure}[fig:internal-data-structure]:
the array with the permutation of the states is at the bottom,
and every block includes an indication of the slice containing its label and a pointer to its parent (as our final algorithm needs to find the parent block, but never the child blocks).
\stopexample
\todo{Figure: splitting tree + internal representation}

%\begin{figure}
% \centering
% \subfloat[\label{fig:splitting-tree-only}]{
% \includegraphics[width=0.43\textwidth]{splitting-tree.pdf}
% }
% \subfloat[\label{fig:internal-data-structure}]{
% \setcounter{maxdepth}{3}
% \begin{tikzpicture}[scale=1.0,>=stealth']
% \gasblock{0}{6}{0}{2}{\makebox[0pt][r]{\tiny\textcolor{gray}{$\sigma(B_2) = \mbox{}$}} a}
% \gasblock{0}{3}{1}{4}{b}
% \gasblock{3}{6}{1}{8}{a \sigma(B_4)}
% \gasblock{0}{2}{2}{6}{b \sigma(B_2)}
% \gasblock{2}{3}{2}{3}{}
% \gasblock{3}{5}{2}{10}{a \sigma(B_6)}
% \gasblock{5}{6}{2}{7}{}
% \gasblock{0}{1}{3}{0}{}
% \gasblock{1}{2}{3}{5}{}
% \gasblock{3}{4}{3}{1}{}
% \gasblock{4}{5}{3}{9}{}
% \draw (0,0) node[shape=rectangle,minimum width=1cm,draw] (s2) {\footnotesize $s_2$};
% \draw (1,0) node[shape=rectangle,minimum width=1cm,draw] (s0) {\footnotesize $s_0$};
% \draw (2,0) node[shape=rectangle,minimum width=1cm,draw] (s4) {\footnotesize $s_4$};
% \draw (3,0) node[shape=rectangle,minimum width=1cm,draw] (s5) {\footnotesize $s_5$};
% \draw (4,0) node[shape=rectangle,minimum width=1cm,draw] (s1) {\footnotesize $s_1$};
% \draw (5,0) node[shape=rectangle,minimum width=1cm,draw] (s3) {\footnotesize $s_3$};
% \draw[->]
% (s2) edge (B0)
% (B0) edge (B6)
% (B6) edge (B4)
% (B4) edge (B2)
% (s0) edge (B5)
% (B5) edge (B6)
% (s4) edge (B3)
% (B3) edge (B4)
% (s5) edge (B1)
% (B1) edge (B10)
% (B10) edge (B8)
% (B8) edge (B2)
% (s1) edge (B9)
% (B9) edge (B10)
% (s3) edge (B7)
% (B7) edge (B8);
% \end{tikzpicture}
% }
% \caption{A complete and minimal splitting tree for the FSM in \in{Figure}[fig:automaton-tree] (a) and its
% internal refinable partition data structure (b)}
% \label{fig:splitting-tree}
%\end{figure}

\stopsection
\startsection
[title={Optimizing the Algorithm}]

In this section, we present an improvement of \in{Algorithm}[alg:increase-k] that uses two ideas described by \citet[Hopcroft71] in his seminal paper on minimizing finite automata:
\emph{using the inverse transition set} and \emph{processing the smaller half}.
The algorithm that we present is a drop-in replacement, so that \in{Algorithm}[alg:minimaltree] stays the same except for some bookkeeping.
This way, we can establish correctness of the new algorithms more easily.
The variant presented in this section reduces the amount of redundant computation performed by \in{Algorithm}[alg:increase-k].

Using Hopcroft's first idea, we turn our algorithm upside down:
instead of searching for the lca for each leaf, we search for the leaves $u$ for which $l(u) \subseteq \delta^{-1}(l(v), a)$, for each potential lca $v$ and input $a$.
To keep the order of splits as before, we define \emph{$k$-candidates}.

\startdefinition[reference=def:k-candidate]
A \emph{$k$-candidate} is a node $v$ with $|\sigma(v)| = k$.
\stopdefinition

A $k$-candidate $v$ and an input $a$ can be used to split a leaf $u$ if $v = \lca(\delta(l(u), a))$, because in this case there are at least two states $s, t$ in $l(u)$ such that $\delta(s, a)$ and $\delta(t, a)$ are in labels of different nodes in $C(v)$.
Refining $u$ this way is called \emph{splitting $u$ with respect to $(v, a)$}.
The set $C(u)$ is constructed according to (split-state), where each child $w \in C(v)$ defines a child $u_w$ of $u$ with states
\placeformula[eq:split-fw]
\startformula\startalign
\NC l(u_w) \NC = \{ s \in l(u) \,|\, \delta(s, a) \in l(w) \} \NR
\NC \NC = l(u) \cap \delta^{-1}(l(w), a). \NR
\stopalign\stopformula
In order to perform the same splits in each layer as before, we maintain a list $L_{k}$ of $k$-candidates.
We keep the list in order of the construction of nodes, because when we split w.r.t. a child of a node $u$ before we split w.r.t. $u$ itself, the result is not well-defined.
Indeed, the order on $L_{k}$ is the same as the order used by \in{Algorithm}[alg:minimaltree].
So far, the improved algorithm would still have time complexity $\bigO(mn)$.
To reduce the complexity we have to use Hopcroft's second idea of \emph{processing the smaller half}.
The key idea is that, when we fix a $k$-candidate $v$, all leaves are split with respect to $(v, a)$ simultaneously.
Instead of iterating over all leaves to refine them, we iterate over $s \in \delta^{-1}(l(w), a)$ for all $w$ in $C(v)$, look up the leaf containing $s$, and move $s$ out of that leaf.
From Lemma 8 by \citet[DBLP:journals/tcs/Knuutila01] it follows that we can skip one of the children of $v$.
This lowers the time complexity to $\bigO(m \log n)$.
In order to move $s$ out of its leaf, each leaf $u$ is associated with a set of temporary children $C'(u)$ that is initially empty and will be finalized after iterating over all $s$ and $w$.
\startplacealgorithm
[title={A better step towards the stability of a splitting tree.},
reference=alg:increase-k-v3]
\startalgorithmic
\REQUIRE A $k$-stable and minimal splitting tree $T$, and a list $L_k$
\ENSURE $T$ is a $(k+1)$-stable and minimal splitting tree, and a list $L_{k+1}$
\startlinenumbering
\STATE $L_{k+1} \gets \emptyset$
\FORALL{$k$-candidates $v$ in $L_{k}$ in order}
\STATE let $w'$ be a node in $C(v)$ with $|l(w')| \geq |l(w)|$ for all nodes $w \in C(v)$
\FORALL{inputs $a$ in $I$}
\FORALL{nodes $w$ in $C(v) \setminus \{ w' \}$}
\FORALL{states $s$ in $\delta^{-1}(l(w), a)$}
\STATE locate leaf $u$ such that $s \in l(u)$
\IF{$C'(u)$ does not contain node $u_{w}$}
\STATE add a new node $u_{w}$ to $C'(u)$ \someline[line:add-temp-child]
\ENDIF
\STATE move $s$ from $l(u)$ to $l(u_{w})$ \someline[line:move-to-temp-child]
\ENDFOR
\ENDFOR
\FORALL{leaves $u$ with $C'(u) \neq \emptyset$\startline[line:finalize]}
\IF{$|l(u)| = 0$}
\IF{$|C'(u)| = 1$}
\STATE recover $u$ by moving its elements back and clear $C'(u)$ \someline[line:recover]
\STATE {\bf continue} with the next leaf
\ENDIF
\STATE set $p=u$ and $C(u)=C'(u)$ \someline[line:setptou]
\ELSE
\STATE construct a new node $p$ and set $C(p) = C'(u) \cup \{ u \}$
\STATE insert $p$ in the tree in the place where $u$ was
\ENDIF
\STATE set $\sigma(p) = a \sigma(v)$ \someline[line:prepend]
\STATE append $p$ to $L_{k+1}$ and clear $C'(u)$ \stopline[line:finalize]
\ENDFOR
\ENDFOR
\ENDFOR
\stoplinenumbering
\stopalgorithmic
\stopplacealgorithm
|
|
|
|
In \in{Algorithm}[alg:increase-k-v3] we use the ideas described above.
For each $k$-candidate $v$ and input $a$, we consider all children $w$ of $v$, except for the largest one (in case of multiple largest children, we skip one of them arbitrarily).
For each state $s \in \delta^{-1}(l(w), a)$ we consider the leaf $u$ containing it.
If this leaf does not yet have an associated temporary child for $w$, we create such a child (\inline[line:add-temp-child]); we then move $s$ into that child (\inline[line:move-to-temp-child]).

Once we have done the simultaneous splitting for the candidate $v$ and input $a$, we finalize the temporary children.
This is done at \inline[line:finalize].
If there is only one temporary child containing all the states, no split has been made and we recover this node (\inline[line:recover]).
Otherwise, we make the temporary children permanent.

The states remaining in $u$ are those $s$ for which $\delta(s, a)$ is in the child of $v$ that we have skipped; we therefore call that child the \emph{implicit child}.
We should not touch these states, in order to keep the theoretical time bound.
Therefore, we construct a new parent node $p$ that will \quotation{adopt} the children in $C'(u)$ together with $u$ (\inline[line:setptou]).

We will now explain why considering all but the largest child of each node lowers the algorithm's time complexity.
Let $T$ be a splitting tree in which we color all children of each node blue, except for the largest one.
Then:
\startlemma[reference=lem:even]
A state $s$ is in at most $(\log_{2} n) - 1$ labels of blue nodes.
\stoplemma

\startproof
Observe that every blue node $u$ has a sibling $u'$ such that $|l(u')| \geq |l(u)|$.
So the parent $p(u)$ has at least $2|l(u)|$ states in its label, and the largest blue node has at most $n/2$ states.

Suppose a state $s$ is contained in $c$ blue nodes.
When we walk up the tree starting at the leaf containing $s$, we will visit these $c$ blue nodes.
With each visit we can double the lower bound on the number of states.
Hence $n/2 \geq 2^{c}$ and $c \leq (\log_{2} n) - 1$.
\stopproof
\startcorollary[reference=cor:even-better]
A state $s$ is in at most $\log_{2} n$ sets $\delta^{-1}(l(u), a)$, where $u$ is a blue node and $a$ is an input in $I$.
\stopcorollary

If we now quantify over all transitions, we immediately obtain the following result.
We note that the number of blue nodes is at most $n-1$, but since this fact is not used, we leave its proof to the reader.

\startcorollary[reference=cor:count-all]
Let $\mathcal{B}$ denote the set of blue nodes and define
\startformula
\mathcal{X} = \{ (b, a, s) \,|\, b \in \mathcal{B}, a \in I, s \in \delta^{-1}(l(b), a) \}.
\stopformula
Then $\mathcal{X}$ has at most $m \log_{2} n$ elements.
\stopcorollary

The important observation is that, when using \in{Algorithm}[alg:increase-k-v3], we iterate over every element of $\mathcal{X}$ at most once in total.

\starttheorem[reference=th:complexity]
\in{Algorithm}[alg:minimaltree] using \in{Algorithm}[alg:increase-k-v3] runs in $\bigO(m \log n)$ time.
\stoptheorem
\startproof
We prove that the bookkeeping does not increase the time complexity by discussing the implementation.

\startdescription{Inverse transition.}
$\delta^{-1}$ can be constructed as a preprocessing step in $\bigO(m)$.
\stopdescription
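For example, with transitions stored as a map, the inverse can be built in a single pass over all $m$ transitions. This is an illustrative Python sketch; the representation of the machine is a simplifying assumption, not the one used in the implementation.

```python
def inverse_transitions(delta):
    """Build delta^{-1} in O(m): map (target state, input) to the set of
    source states that reach that target in one step.

    delta: dict (state, input) -> next state
    """
    inv = {}
    for (s, a), t in delta.items():
        inv.setdefault((t, a), set()).add(s)
    return inv
```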
\startdescription{State sorting.}
As described in \in{Section}[sec:tree], we maintain a refinable partition data structure.
Each time a new pair of a $k$-candidate $v$ and an input $a$ is considered, leaves are split by performing a bucket sort.

First, buckets are created for each node $w \in C(v) \setminus \{w'\}$ and each leaf $u$ that contains one or more elements of $\delta^{-1}(l(w), a)$, where $w'$ is a largest child of $v$.
The buckets are filled by iterating over the states in $\delta^{-1}(l(w), a)$ for all $w$.
Then, a pivot is set for each leaf $u$ such that exactly the states that have been placed in a bucket can be moved to the right of the pivot (and the untouched states in $\delta^{-1}(l(w'), a)$ end up to the left of the pivot).
For each leaf $u$, we iterate over the states in its buckets and the corresponding indices to the right of its pivot, and we swap the current state with the one at the current index.
For each bucket a new leaf node is created.
The refinable partition is updated such that the current state points to the most recently created leaf.

This way, we ensure constant-time lookup of the leaf for a state, and we can update the array in constant time when we move elements out of a leaf.
\stopdescription
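The swapping step can be sketched as follows (an illustrative Python sketch; the array-plus-position-index representation is assumed, and names are hypothetical). Each marked state is placed right of the pivot with a single swap, so the cost is proportional to the number of moved states only, never to the size of the untouched part of the leaf.

```python
def move_right_of_pivot(states, position, pivot, marked):
    """Place the states in `marked` on indices [pivot, pivot + len(marked))
    of the state array, using one swap per marked state.

    states:   array holding all states, one contiguous slice per leaf
    position: dict state -> its current index in `states`
    """
    for k, s in enumerate(marked):
        idx = pivot + k
        other = states[idx]
        i = position[s]
        # swap s with whatever currently occupies its target index
        states[i], states[idx] = other, s
        position[s], position[other] = idx, i
```

Note that a state already sitting at its target index is harmlessly swapped with itself, and the `position` index stays consistent after every swap.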
\startdescription{Largest child.}
For finding the largest child, we maintain counts for the temporary children and keep track of the currently largest one.
On finalizing the temporary children we store (a reference to) the largest child in the node, so that we can skip this child later in the algorithm.
\stopdescription
\startdescription{Storing sequences.}
The operation on \inline[line:prepend] is done in constant time by using a linked list.
\stopdescription
\stopproof

\stopsection

\startsection
[title={Application in Conformance Testing}]

A splitting tree can be used to extract relevant information for two classical test generation methods:
a \emph{characterization set} for the W-method and a \emph{separating family} for the HSI-method.
For an introduction to and comparison of FSM-based test generation methods we refer to \citet[DBLP:journals/infsof/DorofeevaEMCY10].

\startdefinition[reference=def:cset]
A set $W \subset I^{\ast}$ is called a \emph{characterization set} if for every pair of inequivalent states $s, t$ there is a sequence $w \in W$ such that $\lambda(s, w) \neq \lambda(t, w)$.
\stopdefinition

\startlemma[reference=lem:cset]
Let $T$ be a complete splitting tree. Then $\{\sigma(u) \mid u \in T\}$ is a characterization set.
\stoplemma
\startproof
Let $W = \{\sigma(u) \mid u \in T\}$.
Let $s, t \in S$ be inequivalent states; then by completeness $s$ and $t$ are contained in different leaves of $T$.
Hence $u = lca(s, t)$ exists and $\sigma(u)$ separates $s$ and $t$.
Furthermore, $\sigma(u) \in W$.
This shows that $W$ is a characterization set.
\stopproof

\startlemma
A characterization set with minimal-length sequences can be constructed in time $\bigO(m \log n)$.
\stoplemma
\startproof
By \in{Lemma}[lem:cset] the sequences associated with the inner nodes of a splitting tree form a characterization set.
By \in{Theorem}[th:complexity], such a tree can be constructed in time $\bigO(m \log n)$.
Traversing the tree to obtain the characterization set is linear in the number of nodes (and hence linear in the number of states).
\stopproof

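The traversal can be sketched as follows (an illustrative Python sketch with a hypothetical \type{Node} representation of the splitting tree; only inner nodes carry a separating sequence):

```python
class Node:
    """Hypothetical splitting-tree node: inner nodes carry a separating
    sequence `seq`; leaves have no children and no sequence."""
    def __init__(self, label, seq=None, children=()):
        self.label, self.seq, self.children = label, seq, children

def characterization_set(root):
    """Collect W = { sigma(u) | u an inner node of T } by one traversal."""
    W, stack = set(), [root]
    while stack:
        u = stack.pop()
        if u.children:  # inner node: its sequence separates its children
            W.add(u.seq)
            stack.extend(u.children)
    return W
```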
\startdefinition[reference=def:separating_fam]
A collection of sets $\{H_s\}_{s \in S}$ is called a \emph{separating family} if for every pair of inequivalent states $s, t$ there is a sequence $h$ such that $\lambda(s, h) \neq \lambda(t, h)$ and $h$ is a prefix of some $h_s \in H_s$ and some $h_t \in H_t$.
\stopdefinition

\startlemma[reference=lem:testing]
Let $T$ be a complete splitting tree. Then the sets $\{ \sigma(u) \mid s \in l(u), u \in T \}_{s \in S}$ form a separating family.
\stoplemma
\startproof
Let $H_s = \{\sigma(u) \mid s \in l(u)\}$.
Let $s, t \in S$ be inequivalent states; then by completeness $s$ and $t$ are contained in different leaves of $T$.
Hence $u = lca(s, t)$ exists. Since both $s$ and $t$ are contained in $l(u)$, the separating sequence $\sigma(u)$ is contained in both sets $H_s$ and $H_t$.
Therefore, it is a (trivial) prefix of some word $h_s \in H_s$ and some $h_t \in H_t$.
Hence $\{H_s\}_{s \in S}$ is a separating family.
\stopproof

\startlemma
A separating family with minimal-length sequences can be constructed in time $\bigO(m \log n + n^2)$.
\stoplemma
\startproof
The separating family can be constructed from the splitting tree by collecting, for each state, the sequences of all its ancestors (by \in{Lemma}[lem:testing]).
Since we have to do this for every state, this takes $\bigO(n^2)$ time.
\stopproof

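Collecting the family admits the same kind of top-down traversal as the characterization set (again an illustrative Python sketch; the \type{Node} representation is hypothetical):

```python
class Node:
    """Hypothetical splitting-tree node: inner nodes carry a separating
    sequence `seq`; leaves have no children and no sequence."""
    def __init__(self, label, seq=None, children=()):
        self.label, self.seq, self.children = label, seq, children

def separating_family(root):
    """H_s = { sigma(u) | u an inner node with s in l(u) }, built top-down:
    every state receives the sequences of all its ancestors."""
    H = {s: set() for s in root.label}
    stack = [root]
    while stack:
        u = stack.pop()
        if u.children:
            for s in u.label:   # every state below u gets sigma(u)
                H[s].add(u.seq)
            stack.extend(u.children)
    return H
```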
For test generation one moreover needs a transition cover.
This can be constructed in linear time with a breadth-first search.
We conclude that we can construct all necessary information for the W-method in time $\bigO(m \log n)$, as opposed to the $\bigO(mn)$ algorithm used by \citet[DBLP:journals/infsof/DorofeevaEMCY10].
Furthermore, we conclude that we can construct all necessary information for the HSI-method in time $\bigO(m \log n + n^2)$, improving on the bound $\bigO(mn^3)$ reported by \citet[DBLP:journals/cj/HieronsT15].
The original HSI-method was formulated differently and might generate smaller sets.
We conjecture that our separating family has the same size if we furthermore remove redundant prefixes.
This can be done in $\bigO(n^2)$ time using a trie data structure.

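Removing redundant prefixes with a trie can be sketched as follows (an illustrative Python sketch; sequences are represented as tuples of inputs): insert every sequence, then keep exactly those whose final trie node has no children, i.e. those that are not a proper prefix of another sequence.

```python
def remove_redundant_prefixes(seqs):
    """Drop every sequence that is a proper prefix of another sequence,
    in time linear in the total length of the sequences."""
    trie = {}
    for w in seqs:
        node = trie
        for a in w:
            node = node.setdefault(a, {})
    kept = set()
    for w in seqs:
        node = trie
        for a in w:
            node = node[a]
        if not node:  # no extension of w was inserted, so w is maximal
            kept.add(w)
    return kept
```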
\stopsection

\startsection
[title={Experimental Results}]

We have implemented \in{Algorithms}[alg:increase-k, alg:increase-k-v3] in Go, and we have compared their running times on two sets of FSMs.\footnote{Available at \todo{https://gitlab.science.ru.nl/rick/partition/}.}
The first set is from \citet[DBLP:conf/icfem/SmeenkMVJ15], where FSMs for embedded control software were automatically constructed.
These FSMs are of increasing size, varying from 546 to 3~410 states, with 78 inputs and up to 151 outputs.
The second set is inferred from \citet[Hopcroft71], where two classes of finite automata, $A$ and $B$, are described that serve as worst cases for \in{Algorithms}[alg:increase-k, alg:increase-k-v3] respectively.
The FSMs that we have constructed for these automata have 1 input, 2 outputs, and $2^2$~--~$2^{15}$ states.
The running times in seconds on an Intel Core i5-2500 are plotted in \in{Figure}[fig:results].
We note that different slopes imply different complexity classes, since both axes have a logarithmic scale.

\startplacefigure
[title={Running time in seconds of \in{Algorithm}[alg:increase-k] (grey) and \in{Algorithm}[alg:increase-k-v3] (black).},
 list={Running time of \in{Algorithms}[alg:increase-k] and \in{}[alg:increase-k-v3].},
 reference=fig:results]
\startcombination[2*1]
{\starttikzpicture
\startloglogaxis[width=0.5\textwidth, xtick={500, 1000, 2000, 3000}, xticklabels={$500$, $1000$, $2000$, $3000$}]
\addplot[color=darkgray] table[x=states, y=Moore] {esm.table};
\addplot[color=black] table[x=states, y=Hopcroft] {esm.table};
\stoploglogaxis
\stoptikzpicture} {(a) Embedded control software.}
{\starttikzpicture
\startloglogaxis[width=0.5\textwidth, xtick={4, 64, 2048, 32768}, xticklabels={$2^2$, $2^6$, $2^{11}$, $2^{15}$}]
\addplot[color=darkgray, dashed] table[x=states, y=Moore] {hopcroft_b.table};
\addplot[color=black, dashed] table[x=states, y=Hopcroft] {hopcroft_b.table};
\addplot[color=darkgray] table[x=states, y=Moore] {hopcroft_a.table};
\addplot[color=black] table[x=states, y=Hopcroft] {hopcroft_a.table};
\stoploglogaxis
\stoptikzpicture} {(b) Class $A$ (dashed) and class $B$ (solid).}
\stopcombination
\stopplacefigure

\stopsection

\startsection
[title={Conclusion}]

In this paper we have described an efficient algorithm for constructing a set of minimal-length sequences that pairwise distinguish all states of a finite state machine.
By extending Hopcroft's minimisation algorithm, we are able to construct such sequences in $\bigO(m \log n)$ time for a machine with $m$ transitions and $n$ states.
This improves on the traditional $\bigO(mn)$ method that is based on the classic algorithm by Moore.
As a by-product, the sequences obtained form a characterization set and a separating family, which play a crucial role in conformance testing.

Two key observations were required for a correct adaptation of Hopcroft's algorithm.
First, it is required to perform splits in order of the length of their associated sequences.
This guarantees minimality of the obtained separating sequences.
Second, it is required to consider a node as a candidate before any of its children is considered as a candidate.
This order follows naturally from the construction of a splitting tree.

Experimental results show that our algorithm outperforms the classic approach, both for worst-case finite state machines and for models of embedded control software.
Applications of minimal separating sequences, such as the ones described by \citet[DBLP:journals/infsof/DorofeevaEMCY10, DBLP:conf/icfem/SmeenkMVJ15], therefore show that our algorithm is useful in practice.

\stopsection

\referencesifcomponent

\stopchapter

\stopcomponent