Archived

Initial commit: it works

Joshua Moerman 2020-11-16 10:32:56 +01:00
commit 1b7d2a0eca
23 changed files with 31253 additions and 0 deletions

3
.gitignore vendored Normal file

@@ -0,0 +1,3 @@
.stack-work/
*~
*.code-workspace

30
LICENSE Normal file

@@ -0,0 +1,30 @@
Copyright Joshua Moerman (c) 2020

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above
      copyright notice, this list of conditions and the following
      disclaimer in the documentation and/or other materials provided
      with the distribution.

    * Neither the name of Joshua Moerman nor the names of other
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

4
README.md Normal file

@@ -0,0 +1,4 @@
# WordParse
Tool to compute word counts, bigrams, trigrams, etc. for a large body of
text. I have used this to generate a word cloud from my research papers.
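A minimal usage sketch, assuming a working stack setup. The executable name and the single filename argument come from the cabal file and `app/Main.hs`; the corpus filename `papers.txt` and the output path are illustrative only:

```shell
# Build the project (resolver is pinned in stack.yaml).
stack build

# Run on a plain-text corpus. Results are printed as "phrase,count"
# lines, so they can be redirected straight into a CSV file.
stack exec WordParse-exe -- papers.txt > output/all.csv
```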

2
Setup.hs Normal file

@@ -0,0 +1,2 @@
import Distribution.Simple
main = defaultMain

54
WordParse.cabal Normal file

@@ -0,0 +1,54 @@
cabal-version: 1.12

-- This file has been generated from package.yaml by hpack version 0.33.0.
--
-- see: https://github.com/sol/hpack
--
-- hash: da768fcf600dd37ae5fce6e8868b24b8666feb98c7f1da0b3f2cf0b343f1f819

name:           WordParse
version:        0.1.0.0
description:    Please see the README on GitHub at <https://github.com/githubuser/WordParse#readme>
author:         Joshua Moerman
copyright:      2020 Joshua Moerman
license:        BSD3
license-file:   LICENSE
build-type:     Simple
extra-source-files:
    README.md

source-repository head
  type: git
  location: https://github.com/githubuser/WordParse

library
  exposed-modules:
      Parse
      Process
  other-modules:
      Paths_WordParse
  hs-source-dirs:
      src
  build-depends:
      attoparsec
    , base >=4.7 && <5
    , containers
    , parser-combinators
    , text
  default-language: Haskell2010

executable WordParse-exe
  main-is: Main.hs
  other-modules:
      Paths_WordParse
  hs-source-dirs:
      app
  ghc-options: -threaded -rtsopts -with-rtsopts=-N
  build-depends:
      WordParse
    , attoparsec
    , base >=4.7 && <5
    , containers
    , parser-combinators
    , text
  default-language: Haskell2010

30
app/Main.hs Normal file

@@ -0,0 +1,30 @@
{-# language PartialTypeSignatures #-}
{-# language OverloadedStrings #-}
module Main where

import System.Environment
import Data.Attoparsec.Text
import Data.Text.IO
import Data.Text

import Parse
import Process

printBW :: _ => ([Text], a) -> IO ()
printBW (ws, c) = do
  let joined = Data.Text.intercalate " " ws
  Data.Text.IO.putStr joined
  Prelude.putStr ","
  Prelude.print c

main :: IO ()
main = do
  [filename] <- getArgs
  txt <- Data.Text.IO.readFile filename
  let result = parseOnly Parse.words txt
  case result of
    Right ls -> do
      let result = process ls
      mapM_ printBW result
    Left err -> print err

223
output/all.csv Normal file

@@ -0,0 +1,223 @@
nominal sets,210
nominal automata,154
test suite,180
orbit finite,82
splitting tree,68
set generators,53
automata learning,47
complete test,47
state space,45
separating sequences,42
deterministic automata,70
non deterministic,40
learning algorithms,60
black box,34
generating functions,33
membership queries,33
total order,33
join irreducible,32
order symmetry,31
separating family,31
probabilistic programs,29
residual automata,55
equivalence queries,27
mealy machines,55
infinite alphabets,26
nominal languages,26
observation table,26
register automata,26
finite set,48
finitely supported,24
state identifiers,24
test generation,24
characterisation set,23
non guessing,23
transition function,23
data structure,22
initial state,22
single orbit,22
test methods,22
compatible states,21
distinguishing sequence,21
finite state,21
automata theory,20
control software,20
fsm based,20
hybrid ads,20
minimal set,20
product automata,19
succinct automaton,19
model learning,18
can only,17
closed consistent,17
distinguishing tree,17
finite support,17
nominal techniques,17
regular languages,17
renaming sets,17
embedded control,16
equivariant map,16
ordered nominal,16
well defined,16
closed under,15
equality symmetry,15
fast computations,15
generation methods,15
nominal renaming,15
separated product,15
conformance testing,14
inequivalent states,14
local symmetries,14
active learning,13
data values,13
exact learning,13
generating function,13
residual languages,13
state identifier,13
succinct automata,13
upper bound,13
equivariant maps,12
minimal dfa,12
minimisation algorithm,12
partition refinement,12
program variables,12
implementation state,11
ioco relation,11
language semantics,11
guessing automata,10
total order symmetry,31
learning nominal automata,27
complete test suites,45
minimal separating sequences,25
minimal set generators,20
learning product automata,18
embedded control software,16
ordered nominal sets,16
residual nominal automata,16
adaptive distinguishing sequences,30
finite nominal set,15
non deterministic automata,15
test generation methods,15
fsm based test,14
applying automata learning,13
formal power series,13
nominal renaming sets,13
all pairs of states,12
black box testing,12
minimal splitting tree,12
non guessing automata,10
orbit finite sets,10
stable minimal splitting,10
hybrid ads method,9
orbit finite set,9
states are equivalent,9
black box systems,8
join irreducible elements,8
only finitely many,8
canonical residual automaton,7
complete splitting tree,7
moore minimisation algorithm,7
almost surely terminating,6
automata,512
states,476
algorithm,356
learning,301
testing,290
language,215
orbit,306
minimal,172
languages,157
model,145
implementation,142
sequences,138
deterministic,132
residual,132
proof,131
transition,128
output,126
equivariant,122
order,122
product,122
tree,121
function,118
input,118
queries,113
semantics,100
infinite,96
equivalence,94
method,92
software,92
structure,89
support,86
theory,83
alphabet,81
generators,81
time,77
construction,76
data,76
program,76
symmetry,89
ioco,62
representation,62
hypothesis,58
space,58
family,57
algebra,56
consistent,56
teacher,56
partition,53
action,52
relation,52
atoms,51
join,51
system,51
group,50
loop,50
counterexample,48
succinct,47
dimension,46
maps,46
complexity,45
canonical,44
characterisation,44
probabilistic,44
specification,44
fsm,42
nondeterministic,42
behaviour,40
observation,40
components,37
properties,35
formal,34
category,33
distribution,33
irreducible,33
monad,33
completeness,32
conformance,32
permutations,32
reachable,32
regular,31
unique,31
value,31
abstract,30
applications,30
control,30
derivatives,30
coalgebra,29
correct,28
minimisation,28
probability,28
compatible,27
experiments,27
library,27
power,26
research,26
isomorphism,25
rational,25
study,25
chain,24
interesting,24
bisimulation,23
functor,23

222
output/all_swapped.csv Normal file

@@ -0,0 +1,222 @@
210,nominal sets
154,nominal automata
180,test suite
82,orbit finite
68,splitting tree
53,set generators
47,automata learning
47,complete test
45,state space
42,separating sequences
70,deterministic automata
40,non deterministic
60,learning algorithms
34,black box
33,generating functions
33,membership queries
33,total order
32,join irreducible
31,order symmetry
31,separating family
29,probabilistic programs
55,residual automata
27,equivalence queries
55,mealy machines
26,infinite alphabets
26,nominal languages
26,observation table
26,register automata
48,finite set
24,finitely supported
24,state identifiers
24,test generation
23,characterisation set
23,non guessing
23,transition function
22,data structure
22,initial state
22,single orbit
22,test methods
21,compatible states
21,distinguishing sequence
21,finite state
20,automata theory
20,control software
20,fsm based
20,hybrid ads
20,minimal set
19,product automata
19,succinct automaton
18,model learning
17,can only
17,closed consistent
17,distinguishing tree
17,finite support
17,nominal techniques
17,regular languages
17,renaming sets
16,embedded control
16,equivariant map
16,ordered nominal
16,well defined
15,closed under
15,equality symmetry
15,fast computations
15,generation methods
15,nominal renaming
15,separated product
14,conformance testing
14,inequivalent states
14,local symmetries
13,active learning
13,data values
13,exact learning
13,generating function
13,residual languages
13,state identifier
13,succinct automata
13,upper bound
12,equivariant maps
12,minimal dfa
12,minimisation algorithm
12,partition refinement
12,program variables
11,implementation state
11,ioco relation
11,language semantics
10,guessing automata
31,total order symmetry
27,learning nominal automata
45,complete test suites
25,minimal separating sequences
20,minimal set generators
18,learning product automata
16,embedded control software
16,ordered nominal sets
16,residual nominal automata
30,adaptive distinguishing sequences
15,finite nominal set
15,non deterministic automata
15,test generation methods
14,fsm based test
13,applying automata learning
13,formal power series
13,nominal renaming sets
12,all pairs of states
12,black box testing
12,minimal splitting tree
10,non guessing automata
10,orbit finite sets
10,stable minimal splitting
9,hybrid ads method
9,orbit finite set
9,states are equivalent
8,black box systems
8,join irreducible elements
8,only finitely many
7,canonical residual automaton
7,complete splitting tree
7,moore minimisation algorithm
6,almost surely terminating
512,automata
476,states
356,algorithm
301,learning
290,testing
215,language
306,orbit
172,minimal
157,languages
145,model
142,implementation
138,sequences
132,deterministic
132,residual
131,proof
128,transition
126,output
122,equivariant
122,order
122,product
121,tree
118,function
118,input
113,queries
100,semantics
96,infinite
94,equivalence
92,method
92,software
89,structure
86,support
83,theory
81,alphabet
81,generators
77,time
76,construction
76,data
76,program
89,symmetry
62,ioco
62,representation
58,hypothesis
58,space
57,family
56,algebra
56,consistent
56,teacher
53,partition
52,action
52,relation
51,atoms
51,join
51,system
50,group
50,loop
48,counterexample
47,succinct
46,dimension
46,maps
45,complexity
44,canonical
44,characterisation
44,probabilistic
44,specification
42,fsm
42,nondeterministic
40,behaviour
40,observation
37,components
35,properties
34,formal
33,category
33,distribution
33,irreducible
33,monad
32,completeness
32,conformance
32,permutations
32,reachable
31,regular
31,unique
31,value
30,abstract
30,applications
30,control
30,derivatives
29,coalgebra
28,correct
28,minimisation
28,probability
27,compatible
27,experiments
27,library
26,power
26,research
25,isomorphism
25,rational
25,study
24,chain
24,interesting
23,bisimulation
23,functor

88
output/biwords.csv Normal file

@@ -0,0 +1,88 @@
nominal sets,210
nominal automata,154
test suite,180
orbit finite,82
splitting tree,68
set generators,53
automata learning,47
complete test,47
state space,45
separating sequences,42
deterministic automata,70
non deterministic,40
learning algorithms,60
black box,34
generating functions,33
membership queries,33
total order,33
join irreducible,32
order symmetry,31
separating family,31
probabilistic programs,29
residual automata,55
equivalence queries,27
mealy machines,55
infinite alphabets,26
nominal languages,26
observation table,26
register automata,26
finite set,48
finitely supported,24
state identifiers,24
test generation,24
characterisation set,23
non guessing,23
transition function,23
data structure,22
initial state,22
single orbit,22
test methods,22
compatible states,21
distinguishing sequence,21
finite state,21
automata theory,20
control software,20
fsm based,20
hybrid ads,20
minimal set,20
product automata,19
succinct automaton,19
model learning,18
can only,17
closed consistent,17
distinguishing tree,17
finite support,17
nominal techniques,17
regular languages,17
renaming sets,17
embedded control,16
equivariant map,16
ordered nominal,16
well defined,16
closed under,15
equality symmetry,15
fast computations,15
generation methods,15
nominal renaming,15
separated product,15
conformance testing,14
inequivalent states,14
local symmetries,14
active learning,13
data values,13
exact learning,13
generating function,13
residual languages,13
state identifier,13
succinct automata,13
upper bound,13
equivariant maps,12
minimal dfa,12
minimisation algorithm,12
partition refinement,12
program variables,12
implementation state,11
ioco relation,11
language semantics,11
guessing automata,10

33
output/triwords.csv Normal file

@@ -0,0 +1,33 @@
total order symmetry,31
learning nominal automata,27
complete test suites,45
minimal separating sequences,25
minimal set generators,20
learning product automata,18
embedded control software,16
ordered nominal sets,16
residual nominal automata,16
adaptive distinguishing sequences,30
finite nominal set,15
non deterministic automata,15
test generation methods,15
fsm based test,14
applying automata learning,13
formal power series,13
nominal renaming sets,13
all pairs of states,12
black box testing,12
minimal splitting tree,12
non guessing automata,10
orbit finite sets,10
stable minimal splitting,10
hybrid ads method,9
orbit finite set,9
states are equivalent,9
black box systems,8
join irreducible elements,8
only finitely many,8
canonical residual automaton,7
complete splitting tree,7
moore minimisation algorithm,7
almost surely terminating,6

102
output/uniwords.csv Normal file

@@ -0,0 +1,102 @@
automata,512
states,476
algorithm,356
learning,301
testing,290
language,215
orbit,306
minimal,172
languages,157
model,145
implementation,142
sequences,138
deterministic,132
residual,132
proof,131
transition,128
output,126
equivariant,122
order,122
product,122
tree,121
function,118
input,118
queries,113
semantics,100
infinite,96
equivalence,94
method,92
software,92
structure,89
support,86
theory,83
alphabet,81
generators,81
time,77
construction,76
data,76
program,76
symmetry,89
ioco,62
representation,62
hypothesis,58
space,58
family,57
algebra,56
consistent,56
teacher,56
partition,53
action,52
relation,52
atoms,51
join,51
system,51
group,50
loop,50
counterexample,48
succinct,47
dimension,46
maps,46
complexity,45
canonical,44
characterisation,44
probabilistic,44
specification,44
fsm,42
nondeterministic,42
behaviour,40
observation,40
components,37
properties,35
formal,34
category,33
distribution,33
irreducible,33
monad,33
completeness,32
conformance,32
permutations,32
reachable,32
regular,31
unique,31
value,31
abstract,30
applications,30
control,30
derivatives,30
coalgebra,29
correct,28
minimisation,28
probability,28
compatible,27
experiments,27
library,27
power,26
research,26
isomorphism,25
rational,25
study,25
chain,24
interesting,24
bisimulation,23
functor,23

39
package.yaml Normal file

@@ -0,0 +1,39 @@
name:                WordParse
version:             0.1.0.0
license:             BSD3
author:              "Joshua Moerman"
copyright:           "2020 Joshua Moerman"

extra-source-files:
- README.md

# To avoid duplicated efforts in documentation and dealing with the
# complications of embedding Haddock markup inside cabal files, it is
# common to point users to the README.md file.
description:         Please see the README on GitHub at <https://github.com/githubuser/WordParse#readme>

dependencies:
- base >= 4.7 && < 5

library:
  source-dirs: src
  dependencies:
  - containers
  - attoparsec
  - parser-combinators
  - text

executables:
  WordParse-exe:
    main:                Main.hs
    source-dirs:         app
    ghc-options:
    - -threaded
    - -rtsopts
    - -with-rtsopts=-N
    dependencies:
    - WordParse
    - containers
    - attoparsec
    - parser-combinators
    - text

18
src/Parse.hs Normal file

@@ -0,0 +1,18 @@
module Parse where

import Control.Monad (void)
import Data.Char (isLetter)
import Data.Text (Text)

import Data.Attoparsec.Text

word :: Parser Text
word = takeWhile1 isLetter

space :: Parser ()
space = void $ takeWhile1 (not . isLetter)

words :: Parser [Text]
words = sepBy1 word Parse.space
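The grammar above alternates maximal runs of letters with runs of non-letters. A dependency-free sketch of the same splitting behaviour, using only `Data.Char` instead of attoparsec (the function name `splitWords` is hypothetical, for illustration only):

```haskell
import Data.Char (isLetter)

-- Same idea as Parse.words: keep maximal runs of letters,
-- drop everything in between (punctuation, digits, whitespace).
splitWords :: String -> [String]
splitWords s = case dropWhile (not . isLetter) s of
  []   -> []
  rest -> let (w, rest') = span isLetter rest in w : splitWords rest'

main :: IO ()
main = print (splitWords "Hello, world! 42 tests")
-- prints ["Hello","world","tests"]
```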

36
src/Process.hs Normal file

@@ -0,0 +1,36 @@
{-# language PartialTypeSignatures #-}
{-# language OverloadedStrings #-}
module Process where

import Data.List (sortOn)
import Data.Text (Text, toLower, length)
import Data.Map (Map)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

step1 :: [Text] -> [Text]
step1 = fmap Data.Text.toLower

filterShorts :: [Text] -> [Text]
filterShorts = filter (\w -> Data.Text.length w > 2)

filterEnglish :: [Text] -> [Text]
filterEnglish = filter (\w -> not (w `Set.member` ignores))
  where
    ignores = Set.fromList ["and", "the", "for", "with", "that"]

step2 :: _ => [a] -> Map a Int
step2 l = Map.fromListWith (+) [ (w, 1) | w <- l ]

step3 :: _ => Map a Int -> [(a, Int)]
step3 = sortOn (negate . snd) . Map.toList

process = take 1000 . step3 . step2 . fmap (\w -> [w]) . filterEnglish . step1 . filterShorts

-- counts everything twice
biwords :: [a] -> [[a]]
biwords ls = zipWith (\a b -> [a, b]) ls (tail ls)

triwords :: [a] -> [[a]]
triwords ls = zipWith (:) ls (tail (biwords ls))
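To see what `biwords` and `triwords` produce, here is a small self-contained example (the sample word list is made up for illustration):

```haskell
-- biwords pairs each word with its successor; triwords extends each
-- bigram with the word in front of it, yielding consecutive triples.
biwords :: [a] -> [[a]]
biwords ls = zipWith (\a b -> [a, b]) ls (tail ls)

triwords :: [a] -> [[a]]
triwords ls = zipWith (:) ls (tail (biwords ls))

main :: IO ()
main = do
  print (biwords ["nominal", "sets", "are", "fun"])
  -- [["nominal","sets"],["sets","are"],["are","fun"]]
  print (triwords ["nominal", "sets", "are", "fun"])
  -- [["nominal","sets","are"],["sets","are","fun"]]
```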

3
stack.yaml Normal file

@@ -0,0 +1,3 @@
resolver: lts-16.19
packages:
- .

12
stack.yaml.lock Normal file

@@ -0,0 +1,12 @@
# This file was autogenerated by Stack.
# You should not edit this file by hand.
# For more information, please see the documentation at:
#   https://docs.haskellstack.org/en/stable/lock_files
packages: []
snapshots:
- completed:
    size: 532177
    url: https://raw.githubusercontent.com/commercialhaskell/stackage-snapshots/master/lts/16/19.yaml
    sha256: d2b828ecf50386841d0c5700b58d38566992e10d63a062af497ab29ab031faa1
  original: lts-16.19

1370
words/1905.05519.txt Normal file

File diff suppressed because it is too large Load diff

2735
words/2007.06327.txt Normal file

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,991 @@
n-Complete Test Suites for IOCO
Petra van den Bos(B) , Ramon Janssen, and Joshua Moerman
Institute for Computing and Information Sciences,
Radboud University, Nijmegen, The Netherlands
{petra,ramonjanssen,joshua.moerman}@cs.ru.nl
Abstract. An n-complete test suite for automata guarantees to detect
all faulty implementations with a bounded number of states. This
principle is well-known when testing FSMs for equivalence, but the problem becomes harder for ioco conformance on labeled transition systems.
Existing methods restrict the structure of specifications and implementations. We eliminate those restrictions, using only the number of implementation states, and fairness in test execution. We provide a formalization,
a construction and a correctness proof for n-complete test suites for ioco.
1 Introduction
The holy grail of model-based testing is a complete test suite: a test suite that
can detect any possible faulty implementation. For black-box testing, this is
impossible: a tester can only make a finite number of observations, but for an
implementation of unknown size, it is unclear when to stop. Often, a so-called
n-complete test suite is used to tackle this problem, meaning it is complete for
all implementations with at most n states.
For specifications modeled as finite state machines (FSMs) (also called Mealy
machines), this has already been investigated extensively. In this paper we
will explore how an n-complete test suite can be constructed for suspension
automata. We use the ioco relation [11] instead of equivalence of FSMs.
An n-complete test suite for FSM equivalence usually provides some way
to reach all states and transitions of the implementation. After reaching some
state, it is tested whether this is the correct state, by observing behavior which
is unique for that state, and hence distinguishing it from all other states.
Unlike FSM equivalence, ioco is not an equivalence relation, meaning that different implementations may conform to the same specification and, conversely,
an implementation may conform to different specifications. In this paper, we
focus on the problem of distinguishing states. For ioco, this cannot be done with
simple identification. If an implementation state conforms to multiple specification states, those states are defined to be compatible. Incompatible states can
be handled in ways comparable to FSM-methods, but distinguishing compatible
states requires more effort.
P. van den Bos and R. Janssen—Supported by NWO project 13859 (SUMBAT).
© IFIP International Federation for Information Processing 2017

Published by Springer International Publishing AG 2017. All Rights Reserved
N. Yevtushenko et al. (Eds.): ICTSS 2017, LNCS 10533, pp. 91–107, 2017.
DOI: 10.1007/978-3-319-67549-7_6
92
P. van den Bos et al.
In this paper, we give a structured approach for distinguishing incompatible
states. We also propose a strategy to handle compatible states. Obviously, they
cannot be distinguished in the sense of incompatible states. We thus change the
aim of distinguishing: instead of forcing a non-conformance to either specification
state, we may also prove conformance to both. As our only tool in proving this
is by further testing, this is a recursive problem: during complete testing, we are
required to prove conformance to multiple states by testing. We thus introduce a
recursively defined test suite. We give examples where this still gives a finite test
suite, together with a completeness proof for this approach. To show an upper
bound for the required size of a test suite, we also show that an n-complete test
suite with finite size can always be constructed, albeit an inefficient one.
Related Work. Testing methods for Finite State Machines (FSMs) have been
analyzed thoroughly, and n-complete test suites are already known for quite
a while. A survey is given in [3]. Progress has been made on generalizing these
testing methods to nondeterministic FSMs, for example in [6,9]. FSM-based work
that more closely resembles ioco is reduction of non-deterministic FSMs [4].
Complete testing in ioco received less attention than in FSM theory on this
subject. The original test generation method [11] is an approach in which test
cases are generated randomly. The method is complete in the sense that any
fault can be found, but there is no upper bound to the required number and
length of test cases.
In [8], complete test suites are constructed for Mealy-IOTSes. Mealy-IOTSes
are a subclass of suspension automata, but are similar to Mealy machines as
(sequences of) outputs are coupled to inputs. This makes the transition from
FSM testing more straightforward.
The work most similar to ours [10] works on deterministic labeled transition systems, adding quiescence afterwards, as usual for ioco. Non-deterministic
models are thus not considered, and cannot be handled implicitly through determinization, as determinization can only be done after adding quiescence. Some
further restrictions are made on the specification domains. In particular, all
specification states should be reachable without depending on choices for output transitions of the implementation. Furthermore, all states should be mutually incompatible. In this sense, our test suite construction can be applied to
a broader set of systems, but will potentially be much less efficient. Thus, we
prioritize exploring the bounds of n-complete test suites for ioco, whereas [10]
aims at efficient test suites, by restricting the models which can be handled.
2 Preliminaries
The original ioco theory is defined for labeled transition systems, which may
contain internal transitions, be nondeterministic, and may have states without outputs [11]. To every state without outputs, a self-loop with quiescence is
added as an artificial output. The resulting labeled transition system is then
determinized to create a suspension automaton, which is equivalent to the initial
n-Complete Test Suites for IOCO
93
labeled transition system with respect to ioco [13]. In this paper, we will consider a slight generalization of suspension automata, such that our results hold
for ioco in general: quiescent transitions usually have some restrictions, but we
do not require them and we will treat quiescence as any other output. We will
define them in terms of general automata with inputs and outputs.
Definition 1. An I/O-automaton is a tuple (Q, LI , LO , T, q0 ) where
Q is a finite set of states
LI is a finite set of input labels
LO is a finite set of output labels
T : Q × (LI ∪ LO ) ⇀ Q is the (partial) transition function
q0 ∈ Q is the initial state
We denote the domain of I/O-automata for LI and LO with A(LI , LO ).
For the remainder of this paper we fix LI and LO as disjoint sets of input and
output labels respectively, with L = LI ∪ LO , and omit them if clear from the
context. Furthermore, we use a, b as input symbols and x, y, z as output symbols.
Definition 2. Let S = (Q, LI , LO , T, q0 ) ∈ A, q ∈ Q, B ⊆ Q, μ ∈ L and
σ ∈ L*. Then we define:
q after μ = ∅ if T (q, μ) = ⊥, and q after μ = {T (q, μ)} otherwise
B after μ = ⋃q′∈B q′ after μ
q after ε = {q}
q after μσ = (q after μ) after σ
B after σ = ⋃q′∈B q′ after σ
out(B) = {x ∈ LO | B after x ≠ ∅}
in(B) = {a ∈ LI | B after a ≠ ∅}
init(B) = in(B) ∪ out(B)
Straces(B) = {σ′ ∈ L* | B after σ′ ≠ ∅}
S is output-enabled if ∀p ∈ Q : out(p) ≠ ∅. SA = {S ∈ A | S is output-enabled}.
S is input-enabled if ∀p ∈ Q : in(p) = LI . SAIE = {S ∈ SA | S is input-enabled}.
We interchange singleton sets with their elements, e.g. we write out(q) instead
of out({q}). Definitions on states will sometimes be used for automata as well,
acting on their initial states. Similarly, definitions on automata will be used for
states, acting on the automaton with that state as its initial state. For example,
for S = (Q, LI , LO , T, q0 ) ∈ A and q ∈ Q, we may write S after μ instead of q0
after μ, and we may write that q is input-enabled if S is input-enabled.
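The after, out and in operators of Definition 2 can be sketched executably; this is our illustration (not from the paper), with states and labels as type parameters and the partial transition function as a `Map`:

```haskell
import Data.Map (Map)
import qualified Data.Map.Strict as Map
import Data.Set (Set)
import qualified Data.Set as Set

type Trans q l = Map (q, l) q      -- partial transition function T

-- q after μ: ∅ if T(q, μ) = ⊥, {T(q, μ)} otherwise
afterOne :: (Ord q, Ord l) => Trans q l -> q -> l -> Set q
afterOne t q mu = maybe Set.empty Set.singleton (Map.lookup (q, mu) t)

-- B after σ, by induction on the trace σ
after :: (Ord q, Ord l) => Trans q l -> Set q -> [l] -> Set q
after _ b [] = b
after t b (mu : sigma) =
  after t (Set.unions [ afterOne t q mu | q <- Set.toList b ]) sigma

-- out(B) and in(B): restrict the candidate labels to LO or LI respectively
enabled :: (Ord q, Ord l) => Trans q l -> Set q -> [l] -> [l]
enabled t b = filter (\mu -> not (Set.null (after t b [mu])))
```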
In this paper, specifications are suspension automata in SA, and implementations are input-enabled suspension automata in SAIE . The ioco relation formalizes when implementations conform to specifications. We give a definition
relating suspension automata, following [11], and the coinductive definition [7]
relating states. Both definitions have been proven to coincide.
Definition 3. Let S ∈ SA, and I ∈ SAIE . Then we say that I ioco S if ∀σ ∈
Straces(S) : out(I after σ) ⊆ out(S after σ).
Definition 4. Let S = (Qs , LI , LO , Ts , q0s ) ∈ SA, and I = (Qi , LI , LO , Ti , q0i ) ∈
SAIE . Then for qi ∈ Qi , qs ∈ Qs , we say that qi ioco qs if there exists a
coinductive ioco relation R ⊆ Qi × Qs such that (qi , qs ) ∈ R, and ∀(q, p) ∈ R:
∀a ∈ in(p) : (q after a, p after a) ∈ R
∀x ∈ out(q) : x ∈ out(p) ∧ (q after x, p after x) ∈ R
In order to define complete test suites, we require execution of tests to be
fair : if a trace σ is performed often enough, then every output x appearing
in the implementation after σ will eventually be observed. Furthermore, the
implementation may give an output after σ before the tester can supply an
input. We then assume that the tester will eventually succeed in performing
this input after σ. This fairness assumption is unavoidable for any notion of
completeness in testing suspension automata: a fault can never be detected if an
implementation always chooses paths that avoid this fault.
3 Distinguishing Experiments
An important part of n-complete test suites for FSM equivalence is the distinguishing sequence, used to identify an implementation state. As ioco is not
an equivalence relation, there does not have to be a one-to-one correspondence
between specification and implementation states.
3.1 Equivalence and Compatibility
We first describe equivalence and compatibility relations between states, in order
to define distinguishing experiments. We consider two specifications to be equivalent, denoted S1 ≈ S2 , if they have the same implementations conforming to
them. Then, for all implementations I, we have I ioco S1 iff I ioco S2 . For two
inequivalent specifications, there is thus an implementation which conforms to
one, but not the other.
Intuitively, equivalence relates states with the same traces. However, implicit
underspecification by absent inputs should be handled equivalently to explicit
underspecification with chaos. This is done by using chaotic completion [11].
This definition of equivalence is inspired by the relation wioco [12], which relates
specifications based on their sets of traces.
Definition 5. Let (Q, LI , LO , T, q0 ) ∈ SA. Define chaos, a specification to
which every implementation conforms, as X = ({χ}, LI , LO , {(χ, x, χ) | x ∈
L}, χ). Let QX = Q ∪ {χ}. The relation ≈ ⊆ QX × QX relates all equivalent
states. It is the largest relation for which it holds that q ≈ q′ if:
out(q) = out(q′) ∧ (∀μ ∈ init(q) ∩ init(q′) : q after μ ≈ q′ after μ)
∧ (∀a ∈ in(q)\in(q′) : q after a ≈ χ) ∧ (∀a ∈ in(q′)\in(q) : q′ after a ≈ χ)
For two inequivalent specifications, there may still exist an implementation
that conforms to the two. In that case, we define the specifications to be compatible, following the terminology introduced in [9,10]. We introduce an explicit
relation for compatibility.
Definition 6. Let (Q, LI , LO , T, q0 ) ∈ SA. The relation ♦ ⊆ Q × Q relates all
compatible states. It is the largest relation for which it holds that q ♦ q′ if:
(∀a ∈ in(q) ∩ in(q′) : q after a ♦ q′ after a)
∧ (∃x ∈ out(q) ∩ out(q′) : q after x ♦ q′ after x)
Compatibility is symmetric and reflexive, but not transitive. Conversely, two
specifications are incompatible if there exists no implementation conforming to
both. When q1 ♦ q2 , we can indeed easily make an implementation which conforms to both q1 and q2 : the set of outputs of the implementation state can
simply be out(q1 )∩out(q2 ), which is non-empty by definition of ♦. Upon such
an output transition or any input transition, the two successor states are again
compatible, thus the implementation can keep picking transitions in this manner. For example, in Fig. 1, compatible states 2 and 3 of the specification are
both implemented by state 2 of the implementation.
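Since ♦ is defined as a largest relation, it can be computed as a greatest fixpoint. A sketch (ours, not from the paper): start from all state pairs and repeatedly delete pairs violating one of the two clauses of Definition 6, until stable:

```haskell
import Data.Map (Map)
import qualified Data.Map.Strict as Map

type Trans q = Map (q, Char) q     -- partial transition function, labels as Chars

-- compatible ins outs t pairs: the largest subset of pairs satisfying Definition 6
compatible :: Ord q => [Char] -> [Char] -> Trans q -> [(q, q)] -> [(q, q)]
compatible ins outs t = go
  where
    go r
      | r' == r   = r
      | otherwise = go r'
      where
        r' = filter ok r
        ok (q, q') =
             -- ∀a ∈ in(q) ∩ in(q') : q after a ♦ q' after a
             and [ rel a | a <- ins,  both a ]
             -- ∃x ∈ out(q) ∩ out(q') : q after x ♦ q' after x
          && or  [ rel x | x <- outs, both x ]
          where
            both mu = Map.member (q, mu) t && Map.member (q', mu) t
            rel mu  = (t Map.! (q, mu), t Map.! (q', mu)) `elem` r

main :: IO ()
main = do
  -- toy automaton over outputs 'x','y' (no inputs used): 1 loops on x,
  -- 2 loops on x and y, 3 loops on y
  let t = Map.fromList [((1 :: Int,'x'),1), ((2,'x'),2), ((2,'y'),2), ((3,'y'),3)]
  print (compatible "a" "xy" t [ (p, q) | p <- [1,2,3], q <- [1,2,3] ])
  -- (1,3) and (3,1) drop out: those states share no output; 1 ♦ 2 via 'x'
```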
[Figure omitted: (a) Specification S with 2 ♦ 3; (b) an implementation of S, whose state 2 implements both specification states 2 and 3; (c) the merge of specification states 2 and 3, with merged states 2∧3 and 4∧5.]
Fig. 1. A specification, an implementation, and a merge of two states.
Beneš et al. [1] describe the construction of merging specifications. For specification states qs and qs′, their merge is denoted qs ∧ qs′. For any implementation
state qi , it holds that qi ioco qs ∧ qi ioco qs′ ⇐⇒ qi ioco (qs ∧ qs′). Intuitively, a
merge of two states thus only allows behavior allowed by both states. Figure 1c
shows the merge of specification states 2 and 3. The merge of qs and qs′ can be
implemented if and only if qs ♦ qs′: indeed, for incompatible states, the merge
has states without any output transitions, which is denoted invalid in [1].
3.2 Distinguishing Trees
When an implementation is in state qi , two incompatible specification states qs
and qs′ are distinguished by showing to which of the two qi conforms, assuming
that it conforms to one. Conversely, we can say that we have to show a non-conformance of qi to qs or qs′. Generally, a set of states D is distinguished by
showing non-conformance to all its states, possibly except one. As a base case,
if |D| ≤ 1, then D is already distinguished. We will construct a distinguishing
tree as an input-enabled automaton which distinguishes D after reaching pass.
Definition 7. Let μ be a symbol and D a set of states. Then injective(μ, D) if
μ ∈ ⋂{in(q) | q ∈ D} ∪ LO ∧ ∀q, q′ ∈ D : q ≠ q′ ∧ μ ∈ init(q) ∩ init(q′) =⇒ q
after μ ≠ q′ after μ. This is extended to sets of symbols Σ as injective(Σ, D) if
∀μ ∈ Σ : injective(μ, D).
Definition 8. Let (Q, LI , LO , T, q0 ) ∈ SA(LI , LO ), and D ⊆ Q a set of mutually incompatible states. Then define DT (LI , LO , D) ⊆ A(LO , LI ) inductively
as the domain of input-enabled distinguishing trees for D, such that for every
Y ∈ DT (LI , LO , D) with initial state t0 :
if |D| ≤ 1, then t0 is the verdict state pass, and
if |D| > 1, then t0 has either
• a transition for a single input a ∈ LI to a Y′ ∈ DT (LI , LO , D after a)
such that injective(a, D), and transitions to a verdict state reset for all
x ∈ LO , or
• a transition for every output x ∈ LO to a Y′ ∈ DT (LI , LO , D after x)
such that injective(x, D).
Furthermore, pass or reset is always reached after a finite number of steps,
and these states are sink states, i.e. contain transitions only to themselves.
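The case split of Definition 8 suggests a small tree datatype. A sketch (ours), with labels as `Char`s and the reset transitions of the input case left implicit:

```haskell
import Data.Map (Map)

-- The shape of an input-enabled distinguishing tree (Definition 8).
data DTree
  = Pass                        -- |D| ≤ 1: verdict reached
  | Reset                       -- an output pre-empted the chosen input
  | Apply Char DTree            -- one injective input a; any output leads to Reset
  | Observe (Map Char DTree)    -- wait: a subtree for every possible output x
```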
A distinguishing tree can synchronize with an implementation to reach a
verdict state. As an implementation is output-enabled and the distinguishing
tree is input-enabled, this never blocks. If the tree performs an input, the implementation may provide an output first, resulting in reset: another attempt is
needed to perform the input. If no input is performed by the tree, it waits for
any output, after which it can continue. In this way, the tester is guaranteed to
steer the implementation to a pass, where the specification states disagree on
the allowed outputs: the implementation has to choose an output, thus has to
choose which specifications (not) to implement.
For a set D of mutually incompatible states, such a tree may not exist. For
example, consider states 1, 3 and 5 in Fig. 2. States 1 and 3 both lead to the
same state after a, and can therefore not be distinguished. Similarly, states 3
and 5 cannot be distinguished after b. Labels a and b are therefore not injective
according to Definition 7 and should not be used. This concept is similar in FSM
testing [5]. A distinguishing sequence always exists when |D| = 2. When |D| > 2,
we can thus use multiple experiments to separate all states pairwise.
Lemma 9. Let S ∈ SA. Let q and q′ be two states of S, such that ¬(q ♦ q′).
Then there exists a distinguishing tree for q and q′.
Proof. Since ¬(q ♦ q′), we know that:
(∃a ∈ in(q) ∩ in(q′) : ¬(q after a ♦ q′ after a))
∨ (∀x ∈ out(q) ∩ out(q′) : ¬(q after x ♦ q′ after x))
So we have that some input or all outputs, enabled in both q and q′, lead to
incompatible states, for which this holds again. Hence, we can construct a tree
with nodes that either have a child for an enabled input of both states, or
children for all outputs enabled in the states (children for not enabled outputs
are distinguishing trees for ∅), as in the second case of Definition 8. If this tree
were infinite, then it would describe infinite sequences of labels. Since
S is finite, such a sequence would be a cycle in S. This would mean that q ♦ q′,
which is not the case. Hence we have that the tree is finite, as required by
Definition 8. □
[Figure omitted: an automaton over inputs a, b and outputs x, y, z, in which states 1 and 3 reach the same state after a, and states 3 and 5 reach the same state after b.]
Fig. 2. No distinguishing tree exists for {1,3,5}.
3.3 Distinguishing Compatible States
Distinguishing techniques such as described in Sect. 3.2 rely on incompatibility
of two specifications, by steering the implementation to a point where the specifications disagree on the allowed outputs. This technique fails for compatible specifications, as an implementation state may conform to both specifications. Thus,
a tester then cannot steer the implementation to showing a non-conformance to
either.
We thus extend the aim of a distinguishing experiment: instead of showing a
non-conformance to any of two compatible states qs and qs′, we may also prove
conformance to both. This can be achieved with an n-complete test suite for
qs ∧ qs′; this will be explained in Sect. 4.1. Note that even for an implementation
which does not conform to one of the specifications, n-complete testing is needed.
Such an implementation may be distinguished, but it is unknown how, due to
compatibility. See for example the specification and implementation of Fig. 1.
State 2 of the implementation can only be distinguished from state 3 by observing
ax, which is non-conforming behavior for state 2. Although y would also be non-conforming for state 2, this behavior is not observed.
In case that a non-conformance to the merged specification is found with an
n-complete test suite, then the outcome is similar to that of a distinguishing tree
for incompatible states: we have disproven conformance to one of the individual
specifications (or to both).
4 Test Suite Definition
The number n of an n-complete test suite T of a specification S tells how many
states an implementation I is allowed to have to give the guarantee that I ioco
S after passing T (we will define passing a test suite later). To do this, we must
only count the states relevant for conformance.
Definition 10. Let S = (Qs , LI , LO , T, q0s ) ∈ SA, and I = (Qi , LI , LO , Ti , q0i ) ∈
SAIE . Then,
A state qs ∈ Qs is reachable if ∃σ ∈ L* : S after σ = qs .
A state qi ∈ Qi is specified if ∃σ ∈ Straces(S) : I after σ = qi . A transition
(qi , μ, qi′ ) ∈ Ti is specified if qi is specified, and if either μ ∈ LO , or μ ∈
LI ∧ ∃σ ∈ L* : I after σ = qi ∧ σμ ∈ Straces(S).
We denote the number of reachable states of S with |S|, and the number of
specified, reachable states of I with |I|.
Definition 11. Let S ∈ SA be a specification. Then a test suite T for S is
n-complete if for each implementation I: I passes T =⇒ (I ioco S ∨ |I| > n).
In particular, |S|-complete means that if an implementation passes the test
suite, then the implementation is correct (w.r.t. ioco) or it has strictly more
states than the specification. Some authors use the convention that n denotes
the number of extra states (so the above would be called 0-completeness).
To define a full complete test suite, we first define sets of distinguishing
experiments.
Definition 12. Let (Q, LI , LO , T, q0 ) ∈ SA. For any state q ∈ Q, we choose a
set W (q) of distinguishing experiments, such that for all q′ ∈ Q with q ≠ q′:
if ¬(q ♦ q′), then W (q) contains a distinguishing tree for D ⊆ Q, s.t. q, q′ ∈ D.
if q ♦ q′, then W (q) contains a complete test suite for q ∧ q′.
Moreover, we need sequences to access all specified, reachable implementation
states. After such sequences distinguishing experiments can be executed. We will
defer the explicit construction of the set of access sequences. For now we assume
some set P of access sequences to exist.
Definition 13. Let S ∈ SA and I ∈ SAIE . Let P be a set of access sequences
and let P+ = {σ ∈ P ∪ P · L | S after σ ≠ ∅}. Then the distinguishing test suite
is defined as T = {στ | σ ∈ P+ , τ ∈ W (q0 after σ)}. An element t ∈ T is a test.
4.1 Distinguishing Experiments for Compatible States
The distinguishing test suite relies on executing distinguishing experiments. If
a specification contains compatible states, the test suite contains distinguishing
experiments which are themselves n-complete test suites. This is thus a recursive
construction: we need to show that such a test suite is finite. For particular
specifications, recursive repetition of the distinguishing test suite as described
above is already finite. For example, specification S in Fig. 1 contains compatible
states, but in the merge of every two compatible states, no further compatible
states remain. A test suite for S needs to distinguish states 2 and 3. For this
purpose, it uses an n-complete test suite for 2 ∧ 3, which contains no compatible
states, and thus terminates by only containing distinguishing trees.
However, the merge of two compatible states may in general again contain
compatible states. In these cases, recursive repetition of distinguishing test suites
may not terminate. An alternative unconditional n-complete test suite may be
constructed using state counting methods [4], as shown in the next section.
Although inefficient, it shows the possibility of unconditional termination. The
recursive strategy thus may serve as a starting point for other, efficient constructions for n-complete test suites.
Unconditional n-complete Test Suites. We introduce Lemma 16 to bound
test suite execution. We first give some auxiliary definitions.
Definition 14. Let S ∈ SA, σ ∈ L*, and x ∈ LO . Then σx is an ioco-counterexample if S after σ ≠ ∅ and x ∉ out(S after σ).
Naturally, I ioco S if and only if Straces(I) contains no ioco-counterexample.
Definition 15. Let S = (Qs , LI , LO , Ts , qs0 ) ∈ SA and I ∈ SAIE . A trace σ ∈
Straces(S) is short if ∀qs ∈ Qs : |{ρ | ρ is a prefix of σ ∧ qs0 after ρ = qs }| ≤ |I|.
Lemma 16. Let S ∈ SA and I ∈ SAIE . If ¬(I ioco S), then Straces(I) contains
a short ioco-counterexample.
Proof. If ¬(I ioco S), then Straces(I) must contain an ioco-counterexample σ. If
σ is short, the proof is trivial, so assume it is not. Hence, there exists a state
qs , with at least |I| + 1 prefixes of σ leading to qs . At least two of those prefixes
ρ and ρ′ must lead to the same implementation state, i.e. it holds that qi0 after
ρ = qi0 after ρ′ and qs0 after ρ = qs0 after ρ′. Assuming |ρ| < |ρ′| without loss
of generality, we can thus create an ioco-counterexample σ′ shorter than σ by
replacing ρ′ by ρ. If σ′ is still not short, we can repeat this process until it is. □
We can use Lemma 16 to bound exhaustive testing to obtain n-completeness.
When any specification state is visited |I| + 1 times with any trace, then any
extensions of this trace will not be short, and we do not need to test them.
Fairness allows us to test all short traces which are present in the implementation.
Corollary 17. Given a specification S, the set of all traces of length at most
|S| · n is an n-complete test suite.
Example 18. Figure 3 shows an example of a non-conforming implementation
with a counterexample yyxyyxyyxyyx, of maximal length 4 · 3 = 12.
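The suite of Corollary 17 is plain trace enumeration; a sketch (ours, not from the paper), with labels as `Char`s and transitions as a `Map`. For a specification with k reachable states and completeness bound n, one would call `tracesUpTo` with bound k · n:

```haskell
import Data.Map (Map)
import qualified Data.Map.Strict as Map

type Trans q = Map (q, Char) q     -- partial transition function

-- All traces of length at most the given bound, starting in state q.
tracesUpTo :: Ord q => Trans q -> q -> Int -> [[Char]]
tracesUpTo _ _ 0 = [[]]
tracesUpTo t q n =
  [] : [ mu : sigma
       | ((q', mu), q2) <- Map.toList t, q' == q   -- enabled transitions of q
       , sigma <- tracesUpTo t q2 (n - 1) ]

main :: IO ()
main = do
  -- toy two-state specification: 1 --x--> 2 --y--> 1
  let t = Map.fromList [((1 :: Int, 'x'), 2), ((2, 'y'), 1)]
  print (tracesUpTo t 1 2)   -- ["","x","xy"]
```

Scanning `Map.toList` per state is inefficient but keeps the sketch short; an adjacency map per state would be the practical choice.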
4.2 Execution of Test Suites
A test στ is executed by following σ, and then executing the distinguishing
experiment τ . If the implementation chooses any output deviating from σ, then
the test gives a reset and should be reattempted. Finishing τ may take several
[Figure omitted: (a) Specification S, a four-state automaton over outputs x and y; (b) Implementation I, a three-state automaton over x and y.]
Fig. 3. A specification, and a non-conforming implementation.
executions: a distinguishing tree may give a reset, and an n-complete test suite
to distinguish compatible states may contain multiple tests. Therefore σ needs
to be run multiple times, in order to allow full execution of the distinguishing
experiment. By assuming fairness, every distinguishing experiment is guaranteed
to terminate, and thus also every test.
The verdict of a test suite T for specification S is concluded simply by checking for observed ioco-counterexamples to S during execution. When executing
a distinguishing experiment w as part of T, the verdict of w is ignored when
concluding a verdict for T: we only require w to be fully executed, i.e. be reattempted if it gives a reset, until it gives a pass or fail. For example, if σ leads
to specification state q, and q needs to be distinguished from compatible state
q′, a test suite T′ for q ∧ q′ is needed to distinguish q and q′. If T′ finds a
non-conformance to either q or q′, it yields fail. Only in the former case, T will
also yield fail, and in the latter case, T will continue with other tests: q and
q′ have been successfully distinguished, but no non-conformance to q has been
found. If all tests have been executed in this manner, T will conclude pass.
4.3 Access Sequences
In FSM-based testing, the set P for reaching all implementation states is taken
care of rather efficiently. The set P is constructed by choosing a word σ for
each specification state, such that σ leads to that state (note the FSMs are fully
deterministic). By passing the tests P · W , where W is a set of distinguishing
experiments for every reached state, we know the implementation has at least
some number of states (by observing that many different behaviors). By passing
tests P · L · W we also verify that every transition has the correct destination state.
By extending these tests to P · L≤k+1 · W (where L≤k+1 = ⋃m∈{0,...,k+1} Lm ),
we can reach all implementation states if the implementation has at most k
more states than the specification. For suspension automata, however, things
are more difficult for two reasons: (1) A specification state may be reachable
only if an implementation chooses to implement a particular, optional output
transition (in which case this state is not certainly reachable [10]), and (2) if
the specification has compatible states, the implementation may implement two
specification states with a single implementation state.
Consider Fig. 4 for an example. An implementation can omit state 2 of the
specification, as shown in Fig. 4b. Now Fig. 4c shows a fault not found by a test
suite P · L≤1 · W : if we take y ∈ P , z ∈ L, and observe z ∈ W (3), we do not
reach the faulty y transition in the implementation. So by leaving out states, we
introduce an opportunity to make a fault without needing more states than the
specification. This means that we may need to increase the size of the test suite
in order to obtain the desired completeness. In this example, however, a test
suite P · L≤2 · W is enough, as the test suite will contain a test with yzz ∈ P · L2
after which the faulty output y ∈ W (3) will be observed.
[Figure omitted: (a) Specification S with states 1, 2, 3 and labels x, y, z; (b) a conforming implementation; (c) a non-conforming implementation with a faulty y transition.]
Fig. 4. A specification with not certainly reachable states 2 and 3.
Clearly, we reach all states in an n-state implementation for any specification
S, by taking P to be all traces in Straces(S) of at most length n. This set P
can be constructed by simple enumeration. We then have that the traces in the
set P will reach all specified, reachable states in all implementations I such that
|I| ≤ n. In particular this will mean that P + reaches all specified transitions.
Although this generates exponentially many sequences, the length is substantially shorter than the sequences obtained by the unconditional n-complete test
suite. We conjecture that a much more efficient construction is possible with a
careful analysis of compatible states and the not certainly reachable states.
4.4 Completeness Proof for Distinguishing Test Suites
We let T be the distinguishing test suite as defined in Definition 13. As discussed
before, if q and q  are compatible, the set W (q) can be defined using another
complete test suite. If the test suite is again a distinguishing test suite, completeness of it is an induction hypothesis. If, on the other hand, the unconditional
n-complete test suite is used, completeness is already guaranteed (Corollary 17).
Theorem 19. Let S = (Qs , LI , LO , Ts , q0s ) ∈ SA be a specification. Let T be a
distinguishing test suite for S. Then T is n-complete.
Proof. We will show that for any implementation of the correct size which
passes the test suite, we can build a coinductive ioco relation which contains the
initial states. As a basis for that relation we take the states which are reached by
the set P . This may not be an ioco relation, but by extending it (in two steps)
we obtain a full ioco relation. Extending the relation is an instance of a so-called
up-to technique; we will use terminology from [2].
More precisely, let I = (Qi , LI , LO , Ti , q0i ) ∈ SAIE be an implementation
with |I| ≤ n which passes T. By construction of P , all reachable specified implementation states are reached by P , and so all specified transitions are reached
by P+ .
The set P defines a subset of Qi × Qs , namely R = {(q0i after σ, q0s
after σ) | σ ∈ P }. We add relations for all equivalent states: R′ = {(i, s) |
(i, s′) ∈ R, s ∈ Qs , s ≈ s′}. Furthermore, let J = {(i, s, s′) | i ∈ Qi , s, s′ ∈
Qs such that i ioco s ∧ i ioco s′}, let Ri,s,s′ be the coinductive ioco relation for
i ioco s ∧ i ioco s′, and define R̄ = R′ ∪ ⋃(i,s,s′)∈J Ri,s,s′ . We want to show that
R̄ is a coinductive ioco relation. We do this by showing that R′ progresses to R̄.
Let (i, s) ∈ R′. We assume that we have seen all of out(i) and that
out(i) ⊆ out(s) (this is taken care of by the test suite and the fairness assumption). Then, because we use P+ , we also reach the transitions after i. We need
to show that the input and output successors are again related.
Let a ∈ LI . Since I is input-enabled we have a transition for a with i after
a = i2 . Suppose there is a transition for a from s: s after a = s2 (if not, then
we are done). We have to show that (i2 , s2 ) ∈ R̄.
Let x ∈ LO . Suppose there is a transition for x: i after x = i2 . Then (since
out(i) ⊆ out(s)) there is a transition for x from s: s after x = s2 . We have to
show that (i2 , s2 ) ∈ R̄.
In both cases we have a successor (i2 , s2 ) which we have to prove to be in R̄. Now
since P reaches all states of I, we know that (i2 , s′2 ) ∈ R for some s′2 . If s2 ≈ s′2
then (i2 , s2 ) ∈ R′ ⊆ R̄ holds trivially, so suppose that ¬(s2 ≈ s′2 ). Then there exists
a distinguishing experiment w ∈ W (s2 ) ∩ W (s′2 ) which has been executed in i2 ,
namely in two tests: a test σw for some σ ∈ P+ with S after σ = s2 , and a test
σ′w for some σ′ ∈ P with S after σ′ = s′2 . Then there are two cases:
If s2 ̸♦ s2′ then w is a distinguishing tree separating s2 and s2′ . Then there is
a sequence ρ taken in w of the test σw, i.e. w after ρ reaches a pass state
of w, and similarly there is a sequence ρ′ that is taken in w of the test σ′w.
By construction of distinguishing trees, ρ must be an ioco-counterexample for
either s2 or s2′ , but because T passed this must be s2′ . Similarly, ρ′ disproves
s2 . One implementation state can implement at most one of {ρ, ρ′}. This
contradicts that the two tests passed, so this case cannot happen.
If s2 ♦ s2′ (but s2 ≉ s2′ as assumed above), then w is a test suite itself for
s2 ∧ s2′ . If w passed in both tests then i2 ioco s2 and i2 ioco s2′ , and hence
(i2 , s2 ) ∈ Ri2 ,s2 ,s2′ ⊆ R″. If w failed in one of the tests σw or σ′w, then i2 does
not conform to both s2 and s2′ , and hence w also fails in the other test. So
again, there is a counterexample ρ for s2 and ρ′ for s2′ . One implementation
state can implement at most one of {ρ, ρ′}. This contradicts that the two
tests passed, so this case cannot happen.
We have now seen that R progresses to R″. It is clear that R′ progresses to R″
too. Then, since each Ri,s,s′ is an ioco relation, it progresses to Ri,s,s′ ⊆ R″. And
so the union, R″, progresses to R″, meaning that R″ is a coinductive ioco relation.
Furthermore, we have (i0 , s0 ) ∈ R″ (because ε ∈ P ), concluding the proof.
n-Complete Test Suites for IOCO
103
We remark that if the specification does not contain any compatible states,
then the proof can be simplified considerably. In particular, we do not need n-complete
test suites for merges of states, and we can use the relation R′ instead of R″.
5 Constructing Distinguishing Trees
Lee and Yannakakis proposed an algorithm for constructing adaptive distinguishing sequences for FSMs [5]. With a partition refinement algorithm, a splitting
tree is built, from which the actual distinguishing sequence is extracted.
A splitting tree is a tree of which each node is identified with a subset of the
states of the specification. The set of states of a child node is a (strict) subset of
the states of its parent node. In contrast to splitting trees for FSMs, siblings may
overlap: the tree does not describe a partition refinement. We define leaves(Y )
as the set of leaves of a tree Y . The algorithm will split the leaf nodes, i.e. assign
children to every leaf node. If all leaves are identified with a singleton set of
states, we can distinguish all states of the root node.
Additionally, every non-leaf node is associated with a set of labels from L. We
denote the labels of node D with labels(D). The distinguishing tree that is going
to be constructed from the splitting tree is built up from these labels. As argued
in Sect. 3.2, we require injective distinguishing trees, thus our splitting trees only
contain injective labels, i.e. injective(labels(D), D) for all non-leaf nodes D.
Below we list three conditions that describe when it is possible to split the
states of a leaf D, i.e. by taking some transition, we are able to distinguish some
states from the other states of D. We will see later how a split is done. If the
first condition is true, at least one state is immediately distinguished from all
other states. The other two conditions describe that a leaf D can be split if after
an input or all outputs some node D is reached that already is split, i.e. D is
a non-leaf node. Consequently, a split for condition 1 should be done whenever
possible, and otherwise a split for condition 2 or 3 can be done. Depending on
the implementation one is testing, one may prefer splitting with either condition
2 or 3, when both conditions are true.
We present each condition by first giving an intuitive description in words,
and then a more formal definition. With Π(A) we denote the set of all non-trivial
partitions of a set of states A.
Definition 20. A leaf D of tree Y can be split if one of the following conditions
holds:
1. All outputs are enabled in some but not in all states.
∀x ∈ out(D) : injective(x, D) ∧ ∃d ∈ D : d after x = ∅
2. Some states reach different leaves than other states for all outputs.
∀x ∈ out(D) : injective(x, D) ∧ ∃P ∈ Π(D), ∀d, d′ ∈ P :
(d ≠ d′ =⇒ ∀l ∈ leaves(Y ) : l ∩ (d after x) = ∅ ∨ l ∩ (d′ after x) = ∅)
104
P. van den Bos et al.
3. Some states reach different leaves than other states for some input.
∃a ∈ in(D) : injective(a, D) ∧ ∃P ∈ Π(D), ∀d, d′ ∈ P :
(d ≠ d′ =⇒ ∀l ∈ leaves(Y ) : l ∩ (d after a) = ∅ ∨ l ∩ (d′ after a) = ∅)
Algorithm 1 shows how to split a single leaf of the splitting tree (we chose
arbitrarily to give condition 2 a preference over condition 3). A splitting tree is
constructed in the following manner. Initially, the splitting tree consists of a single
leaf node containing the state set of the specification. Then, the full splitting tree is constructed by
splitting leaf nodes with Algorithm 1 until no further splits can be made. If all
leaves in the resulting splitting tree are singletons, the splitting tree is complete
and a distinguishing tree can be constructed (described in the next section).
Otherwise, no distinguishing tree exists. Note that the order of the splits is left
unspecified.
Input: A specification S = (Q, LI , LO , T, q0 ) ∈ SA
Input: The current (unfinished) splitting tree Y
Input: A leaf node D from Y
1:  if Condition 1 holds for D then
2:      P := {D after x | x ∈ out(D)};
3:      labels(D) := out(D);
4:      Add the partition blocks of P as children of D;
5:  else if Condition 2 holds for D then
6:      labels(D) := out(D);
7:      foreach x ∈ out(D) do
8:          P := the finest partition for Condition 2 with D and x;
9:          Add the partition blocks of P as children of D;
10:     end
11: else if Condition 3 holds for D with input a then
12:     P := the finest partition for Condition 3 with D and a;
13:     labels(D) := {a};
14:     Add the partition blocks of P as children of D;
15: return Y ;
Algorithm 1. Algorithm for splitting a leaf node of a splitting tree.
Example 21. Let us apply Algorithm 1 on the suspension automaton in Fig. 5a.
Figure 5b shows the resulting splitting tree. We initialize the root node to
{1, 2, 3, 4, 5}. Condition 1 applies, since states 1 and 5 only have output y
enabled, while states 2, 3 and 4 only have outputs x and z enabled. Thus, we
add leaves {1, 5} and {2, 3, 4}.
We can split {1, 5} by taking an output transition for y according to condition
2, as 1 after y = 4 ∈ {2, 3, 4}, while 5 after y = 1 ∈ {1, 5}, i.e. 1 and 5 reach
different leaves. Condition 2 also applies for {2, 3, 4}. We have that {2, 3} after
x = {2, 4} ⊆ {2, 3, 4} while 4 after x = 5 ∈ {5}. Hence we obtain children {4}
[Figure 5 residue: (a) an example specification with five states 1–5, input a and outputs x, y, z; (b) its splitting tree with root {1,2,3,4,5} labelled x, y, z, children {1,5} labelled y and {2,3,4} labelled x, z, internal nodes {2,3} and {3,4} labelled a, and singleton leaves {1}, {2}, {3}, {4}, {5}.]
Fig. 5. Specification and its splitting tree. (a) Example specification with mutually incompatible states. (b) Splitting tree of Figure 5a.
and {2, 3} for output x. For z we have that 2 after z = 1 ∈ {1} while {3, 4}
after z = {3, 4} ⊆ {2, 3, 4}, so we obtain children {2} and {3, 4} for z.
We can split {2,3} by taking input transition a according to condition 3,
since 2 after a = 4 and 3 after a = 2, and no leaf of the splitting tree contains
both state 2 and state 4. Note that we could also have split on output transitions
x and z. Node {3, 4} cannot be split for output transition z, since {3, 4} after
z = {3, 4} which is a leaf, and hence condition 2 does not hold. However node
{3, 4} can be split for input transition a, as 3 after a = 2 and 4 after a = 4.
Now all leaves are singleton, so we can distinguish all states with this tree.
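The condition-1 split that starts Example 21 can be sketched in a few lines. The encoding of the enabled outputs below is hypothetical (the example only states out(s) for each state), and the sketch ignores the injectivity requirement of the condition:

```python
# Hypothetical encoding of the enabled outputs of the states in Fig. 5a;
# only out(s) is needed for the condition-1 split.
OUT = {1: {"y"}, 2: {"x", "z"}, 3: {"x", "z"}, 4: {"x", "z"}, 5: {"y"}}

def split_condition_1(states, out):
    """Group the states of a leaf by their sets of enabled outputs.

    Condition 1 applies when some output is enabled in some but not all
    states, i.e. when this grouping yields more than one block.
    """
    blocks = {}
    for s in states:
        blocks.setdefault(frozenset(out[s]), set()).add(s)
    return list(blocks.values())
```

Running `split_condition_1({1, 2, 3, 4, 5}, OUT)` yields the two leaves {1, 5} and {2, 3, 4} of Fig. 5b.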
A distinguishing tree Y ∈ DT (LI , LO , D) for D can be constructed from a
splitting tree with singleton leaf nodes. This follows the structure in Definition 8,
and we only need to choose whether to provide an input, or whether to observe
outputs. We look at the lowest node D′ in the splitting tree such that D ⊆ D′.
[Figure 6 residue: the distinguishing tree with root {1,2,3,4,5}, transitions for input a and outputs x, y, z to smaller sets, reset transitions after inputs, and singleton pass states.]
Fig. 6. Distinguishing tree of Fig. 5a. The states are named by the sets of states which
they distinguish. Singleton and empty sets are the pass states. Self-loops in verdict
states have been omitted, for brevity.
If labels(D′) contains an input, then Y has a transition for this input, and a transition
to reset for all outputs. If labels(D′) contains outputs, then Y has a transition for
all outputs. In this manner, we recursively construct states of the distinguishing
tree until |D| ≤ 1, in which case we have reached a pass state. Figure 6 shows
the distinguishing tree obtained from the splitting tree in Fig. 5b.
6 Conclusions
We firmly embedded theory on n-complete test suites into ioco theory, without making any restricting assumptions. We have identified several problems
where classical FSM techniques fail for suspension automata, in particular for
compatible states. An extension of the concept of distinguishing states has been
introduced such that compatible states can be handled, by testing the merge
of such states. This requires that the merge itself does not contain compatible
states. Furthermore, upper bounds for several parts of a test suite have been
given, such as reaching all states in the implementation.
These upper bounds are exponential in the number of states, and may limit
practical applicability. Further investigation is needed to efficiently tackle these
parts of the test suite. Alternatively, looser notions for completeness may circumvent these problems. Furthermore, experiments are needed to compare our
testing method and random testing as in [11] quantitatively, in terms of efficiency
of computation and execution time, and the ability to find bugs, preferably on
a real world case study.
Proceedings of Machine Learning Research 93:54–66, 2019
International Conference on Grammatical Inference
Learning Product Automata
Joshua Moerman
joshua.moerman@cs.ru.nl
Institute for Computing and Information Sciences
Radboud University
Nijmegen, the Netherlands
Editors: Olgierd Unold, Witold Dyrka, and Wojciech Wieczorek
Abstract
We give an optimisation for active learning algorithms, applicable to learning Moore machines with decomposable outputs. These machines can be decomposed themselves by
projecting on each output. This results in smaller components that can then be learnt with
fewer queries. We give experimental evidence that this is a useful technique which can reduce the number of queries substantially. Only in some cases is the performance worsened
by the slight overhead. Compositional methods are widely used throughout engineering,
and the decomposition presented in this article promises to be particularly interesting for
learning hardware systems.
Keywords: query learning, product automata, composition
1. Introduction
Query learning (or, active learning) is becoming a valuable tool in engineering of both
hardware and software systems (Vaandrager, 2017). Indeed, applications can be found in
a broad range of domains: finding bugs in network protocols as shown by Fiterău-Broştean et al. (2016, 2017), assisting with refactoring legacy software as shown by Schuts
et al. (2016), and reverse engineering bank cards by Chalupar et al. (2014).
These learning techniques originate from the field of grammatical inference. One of the
crucial steps for applying these to black box systems was to move from deterministic finite
automata to deterministic Moore or Mealy machines, capturing reactive systems with any
kind of output. With little adaptations, the algorithms work well, as shown by the many
applications. This is remarkable, since little specific knowledge is used besides the input
alphabet of actions.
Realising that composition techniques are ubiquitous in engineering, we aim to use more
structure of the system during learning. In the present paper we use the simplest type of
composition; we learn product automata, where outputs of several components are simply
paired. Other types of compositions, such as sequential composition of Mealy machines, are
discussed in Section 5.
To the best of the author's knowledge, this has not been done before explicitly. Furthermore, libraries such as LearnLib (see Isberner et al., 2015) and libalf (see Bollig et al.,
2010b) do not include such functionality out of the box. Implicitly, however, it has been
done before. Rivest and Schapire (1994) use two tricks to reduce the size of some automata
in their paper “Diversity-based inference of finite automata”. The first trick is to look at
© 2019 J. Moerman.
Figure 1: A Moore machine with two outputs (left) can be equivalently seen as two (potentially smaller) Moore machines with a single output each (right).
the reversed automaton (in their terminology, the diversity-based automaton). The second
trick (which is not explicitly mentioned, unfortunately) is to have a different automaton
for each observable (i.e., output). In one of their examples the two tricks combined give a
reduction from ±10^19 states to just 54 states.
In this paper, we isolate this trick and use it in query learning. We give an extension
of L* which handles products directly and we give a second algorithm which simply runs
two learners simultaneously. Furthermore, we argue that this is particularly interesting
in the context of model learning of hardware, as systems are commonly engineered in a
compositional way. We give preliminary experimental evidence that the technique works
and improves the learning process. As benchmarks, we learn (simulated) circuits which
provide several output bits.
2. Preliminaries
We use the formalism of Moore machines to describe our algorithms. Nonetheless, the
results can also be phrased in terms of Mealy machines.
Definition 1 A Moore machine is a tuple M = (Q, I, O, δ, o, q0 ) where Q, I and O are
finite sets of states, inputs and outputs respectively, δ : Q × I → Q is the transition function,
o : Q → O is the output function, and q0 ∈ Q is the initial state. We define the size of the
machine, |M |, to be the cardinality of Q.
We extend the definition of the transition function to words as δ : Q × I* → Q. The
behaviour of a state q is the map ⟦q⟧ : I* → O defined by ⟦q⟧(w) = o(δ(q, w)). Two states
q, q′ are equivalent if ⟦q⟧ = ⟦q′⟧. A machine is minimal if all states have different behaviour
and all states are reachable. We will often write ⟦M⟧ to mean ⟦q0⟧ and say that machines
are equivalent if their initial states are equivalent.
Definition 2 Given two Moore machines with equal input sets, M1 = (Q1 , I, O1 , δ1 , o1 , q01 )
and M2 = (Q2 , I, O2 , δ2 , o2 , q02 ), we define their product M1 × M2 by:
M1 × M2 = (Q1 × Q2 , I, O1 × O2 , δ, o, (q01 , q02 )),
where δ((q1 , q2 ), a) = (δ1 (q1 , a), δ2 (q2 , a)) and o((q1 , q2 )) = (o1 (q1 ), o2 (q2 )).
Figure 2: A state of the 8-bit register machine.
The product is formed by running both machines in parallel and letting I act on both
machines synchronously. The output of both machines is observed. Note that the product
Moore machine might have unreachable states, even if the components are reachable. The
product of more than two machines is defined by induction.
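Definitions 1 and 2 can be made concrete with a small sketch; the names and the dictionary-based encoding below are mine, not from the paper:

```python
from itertools import product as cartesian

class Moore:
    """Sketch of Definition 1: a Moore machine M = (Q, I, O, delta, o, q0)."""
    def __init__(self, states, inputs, delta, out, q0):
        self.states, self.inputs = states, inputs
        self.delta, self.out, self.q0 = delta, out, q0

    def behaviour(self, word):
        """[[q0]](w) = o(delta(q0, w)) for a word w over I."""
        q = self.q0
        for a in word:
            q = self.delta[(q, a)]
        return self.out[q]

def moore_product(m1, m2):
    """Definition 2: run both machines synchronously and pair the outputs."""
    states = set(cartesian(m1.states, m2.states))
    delta = {((q1, q2), a): (m1.delta[(q1, a)], m2.delta[(q2, a)])
             for (q1, q2) in states for a in m1.inputs}
    out = {(q1, q2): (m1.out[q1], m2.out[q2]) for (q1, q2) in states}
    return Moore(states, m1.inputs, delta, out, (m1.q0, m2.q0))
```

Because the product runs both machines synchronously, ⟦M1 × M2⟧(w) = (⟦M1⟧(w), ⟦M2⟧(w)) for every word w.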
Let M be a machine with outputs in O1 × O2 . By post-composing the output function with projection functions we get two machines, called components, M1 and M2 with
outputs in O1 and O2 respectively. This is depicted in Figure 1. Note that M is equivalent to M1 × M2 . If M and its components Mi are taken to be minimal, then we have
|M | ≤ |M1 | · |M2 | and |Mi | ≤ |M |. In the best case we have |Mi | = √|M | and so the
behaviour of M can be described using only 2√|M | states, which is less than |M | (if |M | > 4).
With iterated products the reduction can be more substantial, as shown in the following example. This reduction in state space is beneficial for learning algorithms.
We introduce basic notation: πi : A1 × A2 → Ai are the usual projection functions. On
a function f : X → A1 × A2 we use the shorthand πi f to denote πi ◦ f . As usual, uv denotes
concatenation of string u and v, and this is lifted to sets of strings U V = {uv | u ∈ U, v ∈ V }.
We define the set [n] = {1, . . . , n} and the set of Boolean values B = {0, 1}.
2.1. Example
We take the n-bit register machine example from Rivest and Schapire (1994). The state
space of the n-bit register machine Mn is given by n bits and a position of the reading/writing head, see Figure 2. The inputs are commands to control the position of the
head and to flip the current bit. The output is the current bit vector. Formally it is defined
as Mn = (Bn × [n], {L, R, F }, Bn , δ, o, i), where the initial state is i = ((0, . . . , 0), 1) and the
output is o(((b1 , . . . , bn ), k)) = (b1 , . . . , bn ). The transition function is defined such that L
moves the head to the left, R moves the head to the right (and wraps around at either
end), and F flips the current bit. Formally,
δ(((b1 , . . . , bn ), k), L) = ((b1 , . . . , bn ), k − 1) if k > 1, and ((b1 , . . . , bn ), n) if k = 1,
δ(((b1 , . . . , bn ), k), R) = ((b1 , . . . , bn ), k + 1) if k < n, and ((b1 , . . . , bn ), 1) if k = n,
δ(((b1 , . . . , bn ), k), F ) = ((b1 , . . . , ¬bk , . . . , bn ), k).
The machine Mn is minimal and has n · 2^n states. So although this machine has very
simple behaviour, learning it will require a lot of queries because of its size. Luckily, the
machine can be decomposed into smaller components. For each bit l, we define a component
Mn^l = (B × [n], {L, R, F }, B, δ^l , π1 , (0, 1)) which only stores one bit and the head position.
The transition function δ^l is defined similarly as before on L and R, but F only flips the bit
if the head is on position l (i.e., δ^l ((b, l), F ) = (¬b, l) and δ^l ((b, k), F ) = (b, k) if k ≠ l).
The product Mn^1 × · · · × Mn^n is equivalent to Mn . Each of the components Mn^l is minimal
and has only 2n states. So by this decomposition, we only need 2n^2 states to describe the
whole behaviour of Mn . Note, however, that the product Mn^1 × · · · × Mn^n is not minimal;
many states are unreachable.
3. Learning
We describe two approaches for active learning of product machines. One is a direct extension of the well-known L* algorithm. The other reduces the problem to any active learning
algorithm, so that one can use more optimised algorithms.
We fix an unknown target machine M with a known input alphabet I and output
alphabet O = O1 × O2 . The goal of the learning algorithm is to infer a machine equivalent
to M , given access to a minimally adequate teacher as introduced by Angluin (1987). The
teacher will answer the following two types of queries.
• Membership queries (MQs): The query consists of a word w ∈ I* and the teacher will
answer with the output ⟦M⟧(w) ∈ O.
• Equivalence queries (EQs): The query consists of a Moore machine H, the hypothesis,
and the teacher will answer with YES if M and H are equivalent and she will answer
with a word w such that ⟦M⟧(w) ≠ ⟦H⟧(w) otherwise.
3.1. Learning product automata with an L* extension
We can use the general framework for automata learning as set up by van Heerdt et al.
(2017). The general account does not directly give concrete algorithms, but it does give
generalised definitions for closedness and consistency. The main data structure for the
algorithm is an observation table.
Definition 3 An observation table is a triple (S, E, T ) where S, E ⊆ I* are finite sets of
words and T : S ∪ SI → O^E is defined by T (s)(e) = ⟦M⟧(se).
During the L* algorithm the sets S, E grow and T encodes the knowledge of ⟦M⟧ so far.
Definition 4 Let (S, E, T ) be an observation table.
• The table is product-closed if for all t ∈ SI there exist s1 , s2 ∈ S such that
πi T (t) = πi T (si ) for i = 1, 2.
• The table is product-consistent if for i = 1, 2 and for all s, s′ ∈ S we have
πi T (s) = πi T (s′) implies πi T (sa) = πi T (s′a) for all a ∈ I.
These definitions are related to the classical definitions of closedness and consistency as
shown in the following lemma. The converses of the first two points do not necessarily hold.
We also prove that if an observation table is product-closed and product-consistent, then a
well-defined product machine can be constructed which is consistent with the table.
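The product-closedness check of Definition 4 can be sketched directly, with a table row encoded as a tuple of paired outputs indexed by E (this encoding is my own):

```python
def proj(row, i):
    """pi_i T(t): project each (o1, o2) entry of a row onto component i."""
    return tuple(o[i] for o in row)

def is_product_closed(S, SI, T):
    """Product-closed: for every t in S·I and each component i there is
    some s_i in S whose row matches t's row on the i-th projection."""
    return all(any(proj(T[t], i) == proj(T[s], i) for s in S)
               for t in SI for i in (0, 1))

def is_closed(S, SI, T):
    """Classical closedness: some s in S must match t on the full row."""
    return all(any(T[t] == T[s] for s in S) for t in SI)
```

A table with rows (0, 1) and (1, 0) in S and row (0, 0) in SI is product-closed but not closed, illustrating that the converse of Lemma 5(1) below does not hold.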
Algorithm 1 The product-L* algorithm.
1:  Initialise S and E to {ε}
2:  Initialise T with MQs
3:  repeat
4:      while (S, E, T ) is not product-closed or -consistent do
5:          if (S, E, T ) not product-closed then
6:              find t ∈ SI such that there is no s ∈ S with πi T (t) = πi T (s) for some i
7:              add t to S and fill the new row using MQs
8:          if (S, E, T ) not product-consistent then
9:              find s, s′ ∈ S, a ∈ I and e ∈ E such that πi T (s) = πi T (s′) but πi T (sa)(e) ≠ πi T (s′a)(e) for some i
10:             add ae to E and fill the new column using MQs
11:     Construct H (by Lemma 6)
12:     if EQ(H) gives a counterexample w then
13:         add w and all its prefixes to S
14:         fill the new rows with MQs
15: until EQ(H) = YES
16: return H
Lemma 5 Let OT = (S, E, T ) be an observation table and let πi OT = (S, E, πi T ) be a
component. The following implications hold.
1. OT is closed =⇒ OT is product-closed.
2. OT is consistent ⇐= OT is product-consistent.
3. OT is product-closed ⇐⇒ πi OT is closed for each i.
4. OT is product-consistent ⇐⇒ πi OT is consistent for each i.
Proof (1) If OT is closed, then each t ∈ SI has an s ∈ S such that T (t) = T (s). This
implies in particular that πi T (t) = πi T (s), as required. (In terms of the definition, this
means we can take s1 = s2 = s.)
(2) Let OT be product-consistent and s, s′ ∈ S such that T (s) = T (s′). We then know
that πi T (s) = πi T (s′) for each i and hence πi T (sa) = πi T (s′a) for each i and a. This means
that T (sa) = T (s′a) as required.
Statements (3) and (4) just rephrase the definitions.
Lemma 6 Given a product-closed and -consistent table we can define a product Moore
machine consistent with the table, where each component is minimal.
Proof If the table OT is product-closed and -consistent, then by the previous lemma, the
tables πi OT are closed and consistent in the usual way. For these tables we can use the
construction of Angluin (1987). As a result we get a minimal machine Hi which is consistent
with table πi OT . Taking the product of these gives a machine which is consistent with OT .
(Beware that this product is not necessarily the minimal machine consistent with OT .)
Algorithm 2 Learning product machines with other learners.
1:  Initialise two learners L1 and L2
2:  repeat
3:      while Li queries MQ(w) do
4:          forward MQ(w) to the teacher and get output o
5:          return πi o to Li
        {at this point both learners constructed a hypothesis}
6:      Let Hi be the hypothesis of Li
7:      Construct H = H1 × H2
8:      if EQ(H) returns a counterexample w then
9:          if ⟦H1⟧(w) ≠ π1 ⟦M⟧(w) then
10:             return w to L1
11:         if ⟦H2⟧(w) ≠ π2 ⟦M⟧(w) then
12:             return w to L2
13: until EQ(H) = YES
14: return YES to both learners
15: return H
The product-L* algorithm (Algorithm 1) resembles the original L* algorithm, but uses
the new notions of closed and consistent. Its termination follows from the fact that L*
terminates on both components.
By Lemma 5 (1) we note that the algorithm does not need more rows than we would
need by running L* on M . By point (4) of the same lemma, we find that it does not need
more columns than L* would need on each component combined. This means that in the
worst case, the table is twice as big as the table the original L* algorithm would construct. However, in good cases
(such as the running example), the table is much smaller, as the number of rows is smaller for
each component and the columns needed for each component may be similar.
3.2. Learning product automata via a reduction
The previous algorithm constructs two machines from a single table. This suggests that we
can also run two learning algorithms to construct two machines. We lose the fact that the
data structure is shared between the learners, but we gain that we can use more efficient
algorithms than L* without any effort.
Algorithm 2 is the algorithm for learning product automata via this reduction. It runs
two learning algorithms at the same time. All membership queries are passed directly to the
teacher and only the relevant output is passed back to the learner. (In the implementation,
the query is cached, so that if the other learner poses the same query, it can be immediately answered.) If both learners are done posing membership queries, they will pose an
equivalence query at which point the algorithm constructs the product automaton. If the
equivalence query returns a counterexample, the algorithm forwards it to the learners.
The crucial observation is that a counterexample is necessarily a counterexample for at
least one of the two learners. (If at a certain stage only one learner makes an error, we keep
the other learner suspended, as we may obtain a counterexample for that one later on.)
This observation means that at least one of the learners makes progress and will eventually
converge. Hence, the whole algorithm will converge.
In the worst case, twice as many queries will be posed, compared to learning the whole
machine at once. (This is because learning the full machine also learns its components.)
In good cases, such as the running example, it requires far fewer queries. Typical learning
algorithms require roughly O(n^2) membership queries, where n is the number of states in
the minimal machine. For the example Mn this bound gives O((n · 2^n)^2) = O(n^2 · 2^(2n))
queries. When learning the components Mn^l with the above algorithm, the bound gives just
O((2n)^2 + · · · + (2n)^2) = O(n^3) queries.
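These bounds can be compared numerically; the functions below drop all constants and are purely illustrative:

```python
def full_bound(n):
    """Illustrative bound O((n * 2**n)**2) = O(n**2 * 2**(2n)) for
    learning M_n monolithically (constants dropped)."""
    return (n * 2 ** n) ** 2

def component_bound(n):
    """Illustrative bound for learning n components of 2n states each:
    n * O((2n)**2) = O(n**3)."""
    return n * (2 * n) ** 2

# The gap between the two bounds grows exponentially in n.
ratio_at_8 = full_bound(8) // component_bound(8)
```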
4. Experiments
We have implemented the algorithm via reduction in LearnLib.¹ As we expect the reduction
algorithm to be more efficient and simpler, we leave an implementation of the direct
extension of L* as future work. The implementation handles products of any size (as opposed
to only products of two machines). Additionally, the implementation also works on Mealy
machines and this is used for some of the benchmarks.
In this section, we compare the product learner with a regular learning algorithm. We
use the TTT algorithm by Isberner et al. (2014) for the comparison and also as the learners
used in Algorithm 2. We measure the number of membership and equivalence queries. The
results can be found in Table 1.
The equivalence queries are implemented by random sampling so as to imitate the intended application of learning black-box systems. This way, an exact learning algorithm
turns into a PAC (probably approximately correct) algorithm. Efficiency is typically measured by the total number of input actions which also accounts for the length of the membership queries (including the resets). This is a natural measure in the context of learning
black box systems, as each action requires some amount of time to perform.
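Such a sampling-based equivalence oracle might look as follows; the sample count, length distribution and seed are arbitrary choices of mine, not taken from the paper:

```python
import random

def random_eq_oracle(hypothesis, target, inputs,
                     samples=2000, max_len=20, rng=None):
    """PAC-style equivalence query: sample random words and compare the
    behaviours. Returns a counterexample word, or None if all samples
    agree. `hypothesis` and `target` map a word to an output."""
    rng = rng or random.Random(0)
    for _ in range(samples):
        w = tuple(rng.choice(list(inputs))
                  for _ in range(rng.randint(0, max_len)))
        if hypothesis(w) != target(w):
            return w  # counterexample found
    return None  # probably approximately equivalent
```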
We evaluated the product learning algorithm on the following two classes of machines.
n-bit register machine The machines Mn are as described before. We note that the
product learner is much more efficient, as expected.
Circuits In addition to the (somewhat artificial) examples Mn , we use circuits which
appeared in the logic synthesis workshops (LGSynth89/91/93), part of the ACM/SIGDA
benchmarks.2 These models have been used as benchmarks before for FSM-based testing
methods by Hierons and Türker (2015) and describe the behaviour of real-world circuits.
The circuits have bit vectors as outputs, and can hence naturally be decomposed by
taking each bit individually. As an example, Figure 3 depicts one of the circuits (bbara).
The behaviour of this particular circuit can be modelled with seven states, but when restricting to each individual output bit, we obtain two machines of just four states. For the
circuits bbsse and mark1, we additionally regrouped bits together in order to see how the
performance changes when we decompose differently.
1. The implementation and models can be found on-line at https://gitlab.science.ru.nl/moerman/learning-product-automata.
2. The original files describing these circuits can be found at https://people.engr.ncsu.edu/brglez/CBL/benchmarks/.
[Figure 3 residue: the transition labels of the bbara circuit and its two single-output components are not recoverable from the text extraction.]
Figure 3: The bbara circuit (left) has two output bits. This can be decomposed into two
smaller circuits with a single output bit (middle and right).
For some circuits the number of membership queries is reduced compared to a regular
learner. Unfortunately, the results are not as impressive as for the n-bit register machine.
An interesting case is ex3 where the number of queries is slightly increased, but the total
amount of actions performed is substantially reduced. The number of actions needed in
total is actually reduced in all cases, except for bbsse. This exception can be explained
by the fact that the biggest component of bbsse still has 25 states, which is close to the
original 31 states. We also note that the choice of decomposition matters, for both mark1
and bbsse it was beneficial to regroup components.
In Figure 4, we look at the size of each hypothesis generated during the learning process.
We note that, although each component grows monotonically, the number of reachable
states in the product does not grow monotonically. In this particular instance where we
learn mark1 there was a hypothesis of 58 128 states, much bigger than the target machine of
202 states. This is not an issue, as the teacher will allow it and answer the query regardless.
Even in the PAC model with membership queries, this poses no problem as we can still
efficiently determine membership. However, in some applications the equivalence queries
are implemented with a model checker (e.g., in the work by Fiterău-Broştean et al., 2016)
or a sophisticated test generation tool. In these cases, the increased size of intermediate
hypotheses may be undesirable.
5. Discussion
We have shown two query learning algorithms which exploit a decomposable output. If
the output can be split, then also the machine itself can be decomposed in components.
As the preliminary experiments show, this can be a very effective optimization for learning
black box reactive systems. It should be stressed that the improvement of the optimization
depends on the independence of the components. For example, the n-bit register machine
has nearly independent components and the reduction in the number of queries is big. The
more realistic circuits did not show such drastic improvements in terms of queries. When
Machine   States  Components |   Product learner          |   TTT learner
                             |  EQs     MQs     Actions   |  EQs     MQs     Actions
M2             8      2      |    3     100         621   |    5     115         869
M3            24      3      |    3     252       1 855   |    5     347       2 946
M4            64      4      |    8     456       3 025   |    6   1 058      13 824
M5           160      5      |    6     869       7 665   |   17   2 723      34 657
M6           384      6      |   11   1 383      12 870   |   25   6 250      90 370
M7           896      7      |   11   2 087      24 156   |   52  14 627     226 114
M8         2 048      8      |   13   3 289      41 732   |  160  34 024     651 678
bbara          7      2      |    3     167       1 049   |    3     216       1 535
keyb          41      2      |   25  12 464     153 809   |   24   6 024     265 805
ex3           28      2      |   24   1 133       9 042   |   18     878      91 494
bbsse         31      7      |   20  14 239     111 791   |    8   4 872      35 469
mark1        202     16      |   30  16 712     145 656   |   67  15 192     252 874
bbsse*        31      4      |   19  11 648      89 935   |    8   4 872      35 469
mark1*       202      8      |   22  13 027     117 735   |   67  15 192     252 874
Table 1: Comparison of the product learner with an ordinary learner.
[Figure 4 residue: a log-scale plot of the number of states (10^0 to 10^4) against the hypothesis number (2 to 22).]
Figure 4: The number of states for each hypothesis while learning mark1.
62
Learning Product Automata
taking the length of the queries into account as well (i.e., counting all actions performed on
the system), we see an improvement for most of the test cases.
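The central construction behind this optimisation can be sketched in a few lines. The following Python sketch is our own illustration, not the paper's implementation: it projects a Moore machine with pair-valued outputs onto its two component machines and rebuilds the product, using a toy 2-bit register machine (reading '0' toggles the first bit, '1' the second).

```python
class Moore:
    """Minimal Moore machine: initial state, transitions delta[(q, a)] = q',
    and a state-indexed output map out[q]."""
    def __init__(self, init, delta, out):
        self.init, self.delta, self.out = init, delta, out

    def run(self, word):
        # Semantics [[M]](w): the output of the state reached after reading w.
        q = self.init
        for a in word:
            q = self.delta[(q, a)]
        return self.out[q]

def project(m, i):
    """Component machine M_i: identical transitions, i-th output coordinate.
    (Minimisation is not shown; a learner would directly find the small one.)"""
    return Moore(m.init, m.delta, {q: o[i] for q, o in m.out.items()})

def product(m1, m2, alphabet):
    """Product machine with paired outputs, built over all state pairs."""
    states = [(p, q) for p in m1.out for q in m2.out]
    delta = {((p, q), a): (m1.delta[(p, a)], m2.delta[(q, a)])
             for (p, q) in states for a in alphabet}
    out = {(p, q): (m1.out[p], m2.out[q]) for (p, q) in states}
    return Moore((m1.init, m2.init), delta, out)

# Toy 2-bit register: the output is the pair of bits themselves.
alphabet = "01"
states = [(b0, b1) for b0 in (0, 1) for b1 in (0, 1)]
delta = {}
for (b0, b1) in states:
    delta[((b0, b1), "0")] = (1 - b0, b1)   # '0' toggles bit 0
    delta[((b0, b1), "1")] = (b0, 1 - b1)   # '1' toggles bit 1
m = Moore((0, 0), delta, {q: q for q in states})

m0, m1 = project(m, 0), project(m, 1)
rebuilt = product(m0, m1, alphabet)
for w in ["", "0", "01", "1101", "0010"]:
    assert rebuilt.run(w) == m.run(w)
```

Each projection depends on only one bit, so a learner run against a single coordinate would find a 2-state machine; the product of the two small hypotheses recovers the full 4-state behaviour, which is exactly where the query savings come from.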
In the remainder of this section we discuss related ideas and future work.
5.1. Measuring independence
As the results show, the proposed technique is often, but not always, beneficial. It would
therefore be useful to know in advance when decomposition pays off, which raises the question
of how to measure the independence of the components quantitatively. Such a measure could
potentially be used by the learning algorithm to decide whether or not to decompose.
5.2. Generalisation to subsets of products
In some cases, we might know even more about our output alphabet. The output set O may
be a proper subset of O1 × O2 , indicating that some outputs can only occur “synchronised”.
For example, we might have O = {(0, 0)} ∪ {(a, b) | a, b ∈ [3]}, that is, the output 0 for
either component can only occur if the other component is also 0.
In such cases we can still use the above algorithm, but we may insist that the teacher
only accepts machines with outputs in O for the equivalence queries (as opposed to outputs
in {0, 1, 2, 3}2 ). When constructing H = H1 × H2 in line 7 of Algorithm 2, we can perform a
reachability analysis on H to check for disallowed outputs. If such a trace exists, it is
a counterexample for at least one of the two learners, and we can fix the
defect ourselves, without having to rely on the teacher.
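A minimal sketch of such a reachability check, assuming Moore machines represented as (initial state, transition dict, output dict) triples (our representation and naming, not the paper's): a breadth-first search over the product finds a shortest word whose paired output falls outside the allowed set O.

```python
from collections import deque

def find_disallowed_trace(h1, h2, alphabet, allowed):
    """BFS over the product H1 x H2 for a shortest word whose paired output
    is not in the allowed set O; returns that word, or None if H respects O."""
    (i1, d1, o1), (i2, d2, o2) = h1, h2
    start = (i1, i2)
    seen, queue = {start}, deque([(start, "")])
    while queue:
        (p, q), word = queue.popleft()
        if (o1[p], o2[q]) not in allowed:
            return word          # a counterexample for at least one learner
        for a in alphabet:
            nxt = (d1[(p, a)], d2[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + a))
    return None

# Toy hypotheses: h1 outputs 1 once it has seen an 'a'; h2 always outputs 0.
h1 = (0, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}, {0: 0, 1: 1})
h2 = (0, {(0, "a"): 0, (0, "b"): 0}, {0: 0})
allowed = {(0, 0), (1, 1)}       # outputs must occur synchronised
print(find_disallowed_trace(h1, h2, "ab", allowed))  # prints "a": output (1, 0)
```

Because BFS explores words in order of length, the returned trace is a shortest witness, which keeps the counterexample processing in both component learners cheap.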
5.3. Product DFAs
For two DFAs (Q1 , δ1 , F1 , q0,1 ) and (Q2 , δ2 , F2 , q0,2 ), a state in the product automaton is
accepting if both components are accepting. In the formalism of Moore machines, the final
states are determined by their characteristic function, which means that the output is given
by o(q1 , q2 ) = o1 (q1 ) ∧ o2 (q2 ). Again, the components may be much smaller than the product
and this motivated Heinz and Rogers (2013) to learn (a subclass of) product DFAs. This
type of product is more difficult to learn as the two components are not directly observable.
Such automata are also relevant in model checking and some of the (open) problems are
discussed by Kupferman and Mosheiff (2015).
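The product-DFA construction itself is straightforward; the difficulty lies in learning the components, which are not directly observable. A brief sketch of the construction (our own representation: DFAs as (initial state, transition dict, accepting set) triples), building only the reachable part of the intersection product:

```python
def intersect(d1, d2, alphabet):
    """Product DFA where a pair state accepts iff both components accept,
    i.e. o(q1, q2) = o1(q1) and o2(q2).  Builds only reachable pairs."""
    (i1, t1, f1), (i2, t2, f2) = d1, d2
    init = (i1, i2)
    trans, accept = {}, set()
    seen, todo = {init}, [init]
    while todo:
        (p, q) = todo.pop()
        if p in f1 and q in f2:          # conjunction of acceptance
            accept.add((p, q))
        for a in alphabet:
            nxt = (t1[(p, a)], t2[(q, a)])
            trans[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return init, trans, accept

def accepts(dfa, word):
    init, trans, accept = dfa
    q = init
    for a in word:
        q = trans[(q, a)]
    return q in accept

# "even number of a's" AND "ends in b": 2-state components, 4-state product.
even_a = (0, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, {0})
ends_b = (0, {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, {1})
both = intersect(even_a, ends_b, "ab")
print(accepts(both, "aab"))   # True: two a's, ends in b
print(accepts(both, "ab"))    # False: odd number of a's
```

Note the asymmetry with Moore products: here only the conjunction is observable, so a learner sees a single bit and cannot split it into the two component bits, which is what makes this setting harder.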
5.4. Learning automata in reverse
The main result of Rivest and Schapire (1994) was to exploit the structure of the so-called
“diversity-based” automaton, which may also be called the reversed Moore machine.
Reversing provides a duality between reachability and equivalence. This duality is
explored theoretically by Rot (2016) and Bonchi et al. (2014) in the context of Brzozowski's
minimisation algorithm.
Let M^R denote the reverse of M ; then we have JM^R K(w) = JM K(w^R ). This allows us
to give an L* algorithm which learns M^R by posing membership queries with the words
reversed. We computed M^R for the circuit models and all but one of them was much larger
than the original. This suggests that reversal might not be useful as an optimisation when
learning hardware or software systems. However, a more thorough investigation is desired.
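The membership-query adapter needed for this is a one-liner; the following sketch (names and the toy semantics are our own) wraps an oracle for JM K into one for JM^R K using the identity above.

```python
def last_letter(word, default="-"):
    # [[M]] for a toy Moore machine that outputs the most recent input symbol.
    return word[-1] if word else default

def reversed_oracle(semantics):
    """Membership oracle for the reversed machine M^R, built from the
    oracle for M via [[M^R]](w) = [[M]](reverse of w)."""
    return lambda word: semantics(word[::-1])

# The reverse of the "last letter" machine answers with the *first* symbol.
first_letter = reversed_oracle(last_letter)
print(first_letter("abc"))   # prints "a"
```

An L* learner pointed at `reversed_oracle(...)` thus learns M^R without any change to the learner itself; only the query transport is adapted.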
[Diagram: the input i feeds machine A, whose output feeds machine B, producing the output o.]

Figure 5: The sequential composition A; B of two Mealy machines A and B.
5.5. Other types of composition
The case of learning a sequential composition is investigated by Abel and Reineke (2016).
In their work, there are two Mealy machines, A and B, and the output of A is fed into B, see
Figure 5. The goal is to learn a machine for B, assuming that A is known (i.e., white box).
The oracle only answers queries for the sequential composition, which is formally defined as
JA; BK(w) = JBK(JAK(w)). Since we can only interact with B through A, we cannot use
L* directly. The authors show how to learn B using a combination of L* and SAT solvers.
Moreover, they give evidence that this is more efficient than learning A; B as a whole.
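The composed semantics can be sketched directly from the definition; the machines and names below are our own toy examples, not Abel and Reineke's setup. Mealy machines are represented as (initial state, transition dict, output dict) triples, where outputs are emitted per transition.

```python
def mealy_run(machine, word):
    """Run a Mealy machine and return the output word:
    out[(q, a)] is emitted on each transition delta[(q, a)]."""
    init, delta, out = machine
    q, result = init, []
    for a in word:
        result.append(out[(q, a)])
        q = delta[(q, a)]
    return result

def compose(a, b):
    """Black-box semantics of the composition: [[A;B]](w) = [[B]]([[A]](w))."""
    return lambda word: mealy_run(b, mealy_run(a, word))

# A copies its input; B flips every bit; the composition flips bits.
A = (0, {(0, x): 0 for x in (0, 1)}, {(0, x): x for x in (0, 1)})
B = (0, {(0, x): 0 for x in (0, 1)}, {(0, x): 1 - x for x in (0, 1)})
print(compose(A, B)([0, 1, 1]))   # prints [1, 0, 0]
```

The point of the setting is that a learner only ever sees `compose(A, B)` as one black box; the intermediate word JAK(w) passed between the machines is not observable.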
An interesting generalisation of the above is to consider A as an unknown as well.
The goal is to learn A and B simultaneously, while observing the outputs of B and the
communication between the components. The authors conjecture that this would indeed
be possible and result in a learning algorithm which is more efficient than learning A; B
(private communication).
Another type of composition is used by Bollig et al. (2010a). Here, several automata
are put in parallel and communicate with each other. The goal is not to learn a black box
system, but to use learning when designing such a system. Instead of words, the teacher
(i.e., designer in this case) receives message sequence charts which encode the processes and
actions. Furthermore, they exploit partial order reduction in the learning algorithm.
We believe that a combination of our compositional techniques and those described above can
improve the scalability of learning black-box systems. We expect such techniques to be
especially important in the domain of software and hardware, since these systems are often
designed in a modular way.
Acknowledgments
We would like to thank Nathanaël Fijalkow, Ramon Janssen, Gerco van Heerdt, Harco Kuppens, Alexis Linard, Alexandra Silva, Rick Smetsers, and Frits Vaandrager for proofreading
this paper and providing useful feedback. Thanks to Andreas Abel for discussing the case
of learning a sequential composition of two black box systems. Also thanks to anonymous
reviewers for interesting references and comments.