Nominal Techniques and Black Box Testing for Automata Learning

Joshua Moerman

Work in this thesis has been carried out under the auspices of the research school IPA (Institute for Programming research and Algorithmics).
Printed by Gildeprint, Enschede.
Typeset using ConTeXt MkIV.
ISBN: 978-94-632-3696-6
IPA Dissertation series: 2019-06
Copyright © Joshua Moerman, 2019
www.joshuamoerman.nl

Nominal Techniques and Black Box Testing for Automata Learning

Doctoral thesis to obtain the degree of doctor from Radboud University Nijmegen, on the authority of the rector magnificus prof. dr. J.H.J.M. van Krieken, according to the decision of the Board of Deans, to be defended in public on Monday 1 July 2019 at 16:30 precisely, by Joshua Samuel Moerman, born on 1 October 1991 in Utrecht.

Supervisors:
– prof. dr. F.W. Vaandrager
– prof. dr. A. Silva (University College London, United Kingdom)

Co-supervisor:
– dr. S.A. Terwijn

Members of the manuscript committee:
– prof. dr. B.P.F. Jacobs
– prof. dr. A.R. Cavalli (Télécom SudParis, France)
– prof. dr. F. Howar (Technische Universität Dortmund, Germany)
– prof. dr. S. Lasota (Uniwersytet Warszawski, Poland)
– dr. D. Petrișan (Université Paris Diderot, France)

Paranymphs:
– Alexis Linard
– Tim Steenvoorden

Samenvatting

Automata learning plays an ever greater role in the verification of software. During learning, a learning algorithm explores the behaviour of software. In principle, this happens fully automatically, and the algorithm picks up interesting properties of the software by itself. This makes it possible to construct a reasonably precise model of the workings of the piece of software under scrutiny. Faults and unexpected behaviour of software can be exposed this way.

In this thesis we first look at techniques for test generation. These are needed to give the learning algorithm a helping hand. After automatically exploring behaviour, the learning algorithm formulates a hypothesis which does not yet model the software well enough. In order to refine the hypothesis and continue learning, we need tests. Efficiency is central here: we want to test as little as possible, since testing takes time. On the other hand, we do have to test completely: if there is a discrepancy between the learned model and the software, we want to be able to point it out with a test. In the first few chapters we show how testing of automata works. We give a theoretical framework in which to compare various existing n-complete test generation methods. On this basis we describe a new, efficient algorithm. This new algorithm plays a central role in an industrial case study in which we learn a model of complex printer software from Océ. We also show how one of the subproblems – distinguishing states with the shortest possible input – can be solved efficiently.

The second theme in this thesis is the theory of formal languages and automata with infinite alphabets. This, too, is useful for automata learning. Software, and in particular internet communication protocols, often make use of “identifiers”, for example to distinguish different users. Preferably, we assume infinitely many such identifiers, since we do not know how many are needed for learning the automaton. We show how the learning algorithms can easily be generalised to infinite alphabets by making use of nominal sets.
In particular, this allows us to learn register automata. We then develop the theory of nominal automata further. We show how these structures can be implemented efficiently. And we give a special class of nominal automata which has a much smaller representation. This could be used to learn such automata faster.

Summary

Automata learning plays an increasingly prominent role in the field of software verification. Learning algorithms are able to automatically explore the behaviour of software. By revealing interesting properties of the software, these algorithms can create models of the otherwise unknown software. These learned models can, in turn, be inspected and analysed, which often leads to finding bugs and inconsistencies in the software.

An important tool which we need when learning software is test generation. This is the topic of the first part of this thesis. After the learning algorithm has learned a model and constructed a hypothesis, test generation methods are used to validate this hypothesis. Efficiency is key: we want to test as little as possible, as testing may take valuable time. However, our tests have to be complete: if the hypothesis fails to model the software well, we had better have a test which shows this discrepancy. The first few chapters explain black box testing of automata. We present a theoretical framework in which we can compare existing n-complete test generation methods. From this comparison, we are able to define a new, efficient algorithm. In an industrial case study on embedded printer software, we show that this new algorithm works well for finding counterexamples for the hypothesis. Besides the test generation, we show that one of the subproblems – finding the shortest sequences to separate states – can be solved very efficiently.

The second part of this thesis is on the theory of formal languages and automata with infinite alphabets. This, too, is discussed in the context of automata learning. Many pieces of software make use of identifiers or sequence numbers. These are used, for example, in order to distinguish different users or messages. Ideally, we would like to model such systems with infinitely many identifiers, as we do not know beforehand how many of them will be used. Using the theory of nominal sets, we show that learning algorithms can easily be generalised to automata with infinite alphabets. In particular, this shows that we can learn register automata. Furthermore, we deepen the theory of nominal sets. First, we show that, in a special case, these sets can be implemented in an efficient way. Second, we give a subclass of nominal automata which allows for a much smaller representation. This could be useful for learning such automata more quickly.

Acknowledgements

Foremost, I would like to thank my supervisors. Having three of them ensured that there were always enough ideas to work on, theory to understand, papers to review, seminars to attend, and chats to have. Frits, thank you for being a very motivating supervisor, pushing creativity, and being only a few meters away. It started with a small puzzle (trying a certain test algorithm to help with a case study), which was a great, hands-on start of my Ph.D. You introduced me to the field of model learning in a way that showcases both the theoretical and practical aspects. Alexandra, thanks for introducing me to abstract reasoning about state machines, the coalgebraic way.
Although not directly shown in this thesis, this way of thinking has helped me, and you pushed me to pursue clear reasoning. Besides the theoretical things I’ve learned, you have also taught me many personal lessons inside and outside of academia; thanks for inviting me to London, Caribbean islands, hidden cocktail clubs, and the best food. And thanks for leaving me with Daniela and Matteo, who introduced me to nominal techniques, while you were on sabbatical. Bas, thanks for broadening my understanding of the topics touched upon in this thesis. Unfortunately, we have no papers together, but the connections you showed to logic, computational learning, and computability theory have influenced the thesis nevertheless. I am grateful for the many nice chats we had.

I would like to thank the members of the manuscript committee, Bart, Ana, Falk, Sławek, and Daniela. Reading a thesis is undoubtedly a lot of work, so thank you for the effort and feedback you have given me. Thanks, also, to the additional members coming to Nijmegen to oppose during the defence, Jan Friso, Jorge, and Paul.

On the first floor of the Mercator building, I had the pleasure of spending four years with fun office mates. Michele, thanks for introducing me to the Ph.D. life, by always joking around. Hopefully, we can play a game of Briscola again. Alexis, many thanks for all the tastings, whether it was beers, wines, poffertjes, kroketten, or anything else. Your French influences will be missed. Niels, thanks for the abstract nonsense and bashing on politics. Next to our office was the office of Tim, with whom I had the pleasure of working from various coffee houses in Nijmegen. Further down the corridor was the office of Paul and Rick. Paul, thanks for being the kindest colleague I’ve had and for inviting us to your musical endeavours. Rick, thanks for the algorithmic sparring; we had a great collaboration. Was there a more iconic duo on our floor? A good contender would be Petra and Ramon. Thanks for the fun we had with ioco, together with Jan and Mariëlle. Nils, thanks for steering me towards probabilistic things and opening a door to Aachen. I am also very grateful to Jurriaan for bringing back some coalgebra and category theory to our floor, and for hosting me in London. My other co-authors, Wouter, David, Bartek, Michał, and David, also deserve many credits for all the interesting discussions we had. Harco, thanks for the technical support. Special thanks go to Ingrid, for helping with the often-overlooked, but important, administrative matters.

Doing a Ph.D. would not be complete without a good amount of playing kicker, having borrels, and eating cakes at the iCIS institute. Thanks to all of you, Markus, Bram, Marc, Sam, Bas, Joost, Dan, Giso, Baris, Simone, Aleks, Manxia, Leon, Jacopo, Gabriel, Michael, Paulus, Marcos, Bas, and Henning.1 Thanks to the people I have met across the channel (which hopefully will remain part of the EU): Benni, Nath, Kareem, Rueben, Louis, Borja, Fred, Tobias, Paul, Gerco, and Carsten, for the theoretical adventure, but also for joining me at Phonox and other parties in London. I am especially thankful to Matteo and Emanuela for hosting me many times, and to Hillary and Justin for accommodating me for three months each. I had a lot of fun at the IPA events. I’m very thankful to Tim and Loek for organising these events. Special thanks to Nico and Priyanka for organising a Halloween social event with me.
Also thanks to all the participants in the IPA events; you made it a lot of fun! My gratitude extends to all the people I have met at summer schools and conferences. I had a lot of fun learning about different cultures, languages, and different ways of doing research. I hope we meet again!

Besides all the fun research, I had a great time with my friends and family. We went to nice parties, had excellent dinners, and much more; thanks, Nick, Edo, Gabe, Saskia, Stijn, Sandra, Geert, Marco, Carmen, and Wesley. Thanks to Marlon, Hannah, Wouter, Dennis, Christiaan, and others from #RU for borrels, bouldering, and jams. Thanks to Ragnar, Josse, Julian, Jeroen, Vincent, and others from the BAPC for algorithmic fun. Thanks to my parents, Kees and Irene, and my brother, David, and his wife, Germa, for their love and support. My gratitude extends to my family-in-law, Ine, Wim, Jolien and Jesse. My final words of praise go to Tessa, my wife: I am very happy to have you at my side. You inspire me in many ways, and I enjoy doing all the fun stuff we do. Thank you a lot.

1 In no particular order. These lists are randomised.

Contents

Samenvatting
Summary
Acknowledgements
1 Introduction
  Model Learning
  Applications of Model Learning
  Research challenges
  Black Box Testing
  Nominal Techniques
  Contributions
  Conclusion and Outlook
Part 1: Testing Techniques
2 FSM-based Test Methods
  Mealy machines and sequences
  Test generation methods
  Hybrid ADS method
  Overview
  Proof of completeness
  Related Work and Discussion
3 Applying Automata Learning to Embedded Control Software
  Engine Status Manager
  Learning the ESM
  Verification
  Conclusions and Future Work
4 Minimal Separating Sequences for All Pairs of States
  Preliminaries
  Minimal Separating Sequences
  Optimising the Algorithm
  Application in Conformance Testing
  Experimental Results
  Conclusion
Part 2: Nominal Techniques
5 Learning Nominal Automata
  Overview of the Approach
  Preliminaries
  Angluin’s Algorithm for Nominal DFAs
  Learning Non-Deterministic Nominal Automata
  Implementation and Preliminary Experiments
  Related Work
  Discussion and Future Work
6 Fast Computations on Ordered Nominal Sets
  Nominal sets
  Representation in the total order symmetry
  Implementation and Complexity of ONS
  Results and evaluation in automata theory
  Related work
  Conclusion and Future Work
7 Separation and Renaming in Nominal Sets
  Monoid actions and nominal sets
  A monoidal construction from Pm-sets to Sb-sets
  Nominal and separated automata
  Related and future work
Bibliography
Curriculum Vitae
Chapter 1
Introduction

When I was younger, I often learned how to play with new toys by messing about with them: by pressing buttons at random, observing their behaviour, pressing more buttons, and so on. I only resorted to the manual – or asked “experts” – to confirm my beliefs on how the toys work. Now that I am older, I do mostly the same with new devices, new tools, and new software. However, now I know that this is an established computer science technique, called model learning.

Model learning2 is an automated technique to construct a state-based model – often a type of automaton – from a black box system. The goal of this technique can be manifold: it can be used to reverse-engineer a system, to find bugs in it, to verify properties of the system, or to understand the system in one way or another. It is not just random testing: the information learned during the interaction with the system is actively used to guide subsequent interactions. Additionally, the information learned can be inspected and analysed.

This thesis is about model learning and related techniques. In the first part, I present results concerning black box testing of automata. Testing is a crucial part in learning software behaviour and often remains a bottleneck in applications of model learning. In the second part, I show how nominal techniques can be used to learn automata over structured infinite alphabets. The study on nominal automata was directly motivated by work on learning network protocols which rely on identifiers or sequence numbers.

But before we get ahead of ourselves, we should first understand what we mean by learning, as learning means very different things to different people. In educational science, learning may involve concepts such as teaching, blended learning, and interdisciplinarity. Data scientists may think of data compression, feature extraction, and neural networks. In this thesis we are mostly concerned with software verification. But even in the field of verification several types of learning are relevant.

1 Model Learning

In the context of software verification, we often look at stateful computations with inputs and outputs. For this reason, it makes sense to look at words, or traces. For an alphabet Σ, we denote the set of words by Σ∗.

2 There are many names for this type of learning, such as active automata learning. The generic name “model learning” is chosen as a counterpoint to model checking.

The learning problem is defined as follows. There is some fixed, but unknown, language ℒ ⊆ Σ∗. This language may define the behaviour of a software component, a property in model checking, a set of traces from a protocol, etc. We wish to infer a description of ℒ after only having observed a small part of this language. For example, we may have seen a hundred words belonging to the language and a few which do not belong to it. Concluding with a good description of ℒ is then difficult, as we are missing information about the infinitely many words we have not observed.

Such a learning problem can be stated and solved in a variety of ways. In the applications we do in our research group, we often try to infer a model of a software component. (Chapter 3 describes such an application.) In these cases, a learning algorithm can interact with the software. So it makes sense to study a learning paradigm which allows for queries, and not just a data set of samples.

A typical query learning framework was established by Angluin (1987).
In her framework, the learning algorithm may pose two types of queries to a teacher, or oracle:

Membership queries (MQ). The learner poses such a query by providing a word w ∈ Σ∗ to the teacher. The teacher will then reply whether w ∈ ℒ or not. This type of query is often generalised to richer outputs; in these cases we consider ℒ : Σ∗ → O and the teacher replies with ℒ(w). In some papers, such a query is then called an output query.

Equivalence queries (EQ). The learner can provide a hypothesised description H of ℒ to the teacher. If the hypothesis is correct, the teacher replies with yes. If, however, the hypothesis is incorrect, the teacher replies with no, together with a counterexample, i.e., a word which is in ℒ but not in the hypothesis, or vice versa.

By posing many such queries, the learning algorithm is supposed to converge to a correct model. This type of learning is hence called exact learning. Angluin (1987) showed that one can do this efficiently for deterministic finite automata (DFAs), when ℒ is in the class of regular languages.

It should be clear why this is called query learning or active learning. The learning algorithm initiates interaction with the teacher by posing queries; it may construct its own data points and ask for their corresponding labels. Active learning is in contrast to passive learning, where all observations are given to the algorithm up front.

Another paradigm which is relevant for our type of applications is PAC-learning with membership queries. Here, the algorithm can again use MQs as before, but the EQs are replaced by random sampling. So the allowed query is:

Random sample queries (EX). If the learner poses this query (there are no parameters), the teacher responds with a random word w together with its label, i.e., whether w ∈ ℒ or not. (Here, random means that the words are sampled according to some probability distribution known to the teacher.)

Instead of requiring that the learner exactly learns the model, we only require the following: the learner should probably return a model which approximates the target. This gives the name probably approximately correct (PAC). Note that there are two uncertainties: the probable part and the approximate part. Both parts are bounded by parameters, so one can determine the confidence.

As with many problems in computer science, we are also interested in the efficiency of learning algorithms. Instead of measuring time or space, we analyse the number of queries posed by an algorithm. Efficiency often means that we require a polynomial number of queries. But polynomial in what? The learner has no input, other than the access to a teacher. We ask the algorithms to be polynomial in the size of the target (i.e., the size of the description which has yet to be learned). In the case of PAC learning we also require it to be polynomial in the two parameters for confidence.

Deterministic automata can be efficiently learned in the PAC model. In fact, any efficient exact learning algorithm with MQs and EQs can be transformed into an efficient PAC algorithm with MQs (see Kearns & Vazirani, 1994, exercise 8.1). For this reason, we mostly focus on the former type of learning in this thesis. The transformation from exact learning to PAC learning is implemented by simply testing the hypothesis with random samples. This can be postponed until we actually implement a learning algorithm and apply it. When using only EQs, only MQs, or only EXs, there are hardness results for exact learning of DFAs.
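To make these queries concrete, the following minimal sketch shows what a teacher could look like in Python. All names are made up for illustration, and the equivalence query is implemented by a naive bounded breadth-first search over words – fine for toy languages, but of course not how a real oracle would be implemented.

```python
from typing import Callable, Optional

# A sketch of an Angluin-style teacher: the hidden language is a predicate
# on words, MQs are direct lookups, and EQs search for a counterexample up
# to a length bound (an assumption made to keep the example finite).
class Teacher:
    def __init__(self, language: Callable[[str], bool], alphabet: str, max_len: int):
        self.language = language
        self.alphabet = alphabet
        self.max_len = max_len

    def membership_query(self, word: str) -> bool:
        # MQ: is the word in the language?
        return self.language(word)

    def equivalence_query(self, hypothesis: Callable[[str], bool]) -> Optional[str]:
        # EQ: None means 'yes'; otherwise a shortest counterexample is returned.
        words = [""]
        for w in words:                       # breadth-first over all words
            if self.language(w) != hypothesis(w):
                return w
            if len(w) < self.max_len:
                words.extend(w + a for a in self.alphabet)
        return None

# The hidden language: words with an even number of a's.
teacher = Teacher(lambda w: w.count("a") % 2 == 0, alphabet="ab", max_len=6)
print(teacher.membership_query("abba"))           # True
print(teacher.equivalence_query(lambda w: True))  # 'a', a shortest counterexample
```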
So the combinations MQs + EQs (for exact learning) and MQs + EXs (for PAC learning) have been carefully picked: they provide a minimal basis for efficient learning. See the book of Kearns and Vazirani (1994) for such hardness results and more information on PAC learning.

So far, all the queries are assumed to be just there. Somehow, these are existing procedures which we can invoke with MQ(w), EQ(H), or EX(). This is a useful abstraction when designing a learning algorithm: one can analyse the complexity (in terms of the number of queries) independently of how these queries are resolved. Nevertheless, at some point one has to implement them. In our case of learning software behaviour, membership queries are easily implemented: simply provide the word w to a running instance of the software and observe the output.3 Equivalence queries, however, are in general not doable. Even if we have the (machine) code, it is often way too complicated to check equivalence. That is why we resort to testing with EX queries. The EX query from PAC learning normally assumes a fixed, unknown probability distribution on words. In our case, we choose and implement a distribution to test against. This cuts both ways: on the one hand, it allows us to only test behaviour we really care about; on the other hand, the results are only as good as our choice of distribution. We deviate even further from the PAC model, as we sometimes change our distribution while learning. Yet, as applications show, this is a useful way of learning software behaviour.

3 In reality, it is a bit harder than this. There are plenty of challenges to solve, such as timing, choosing your alphabet, choosing the kind of observations to make, and being able to reliably reset the software.

2 Applications of Model Learning

Since this thesis contains only one real-world application of learning (in Chapter 3), it is good to mention a few others. Although we remain in the context of learning software behaviour, the applications are quite different from each other. This is by no means a complete list.

Bug finding in protocols. A prominent example is by Fiterău-Broștean, et al. (2016). They learn models of TCP implementations – both client and server sides. Interestingly, they found bugs in the (closed source) Windows implementation. Later, Fiterău-Broștean and Howar (2017) also found a bug in the sliding window of the Linux implementation of TCP. Other protocols have been learned as well, such as the MQTT protocol by Tappler, et al. (2017), TLS by de Ruiter and Poll (2015), and SSH by Fiterău-Broștean, et al. (2017). Many of these applications reveal bugs by learning a model and subsequently applying model checking. The combination of learning and model checking was first described by Peled, et al. (2002).

Bug finding in smart cards. Aarts, et al. (2013) learn the software on smart cards of several Dutch and German banks. These cards use the EMV protocol, which is run on the card itself. So this is an example of a real black box system, where no other monitoring is possible and no code is available. No vulnerabilities were found, although each card had a slightly different state machine. The e.dentifier, a card reader implementing a challenge-response protocol, has been learned by Chalupar, et al. (2014). They built a Lego machine which could automatically press buttons, and they found a security flaw in this card reader.

Regression testing. Hungar, et al. (2003) describe the potential of automata learning in regression testing. The aim is not to find bugs, but to monitor the development process of a system. By considering the differences between models at different stages, one can generate regression tests.
Refactoring legacy software. Model learning can also be used in order to verify refactored software. Schuts, et al. (2016) have applied this in a project within Philips. They learn both an old version and a new version of the same component. By comparing the learned models, some differences could be seen. This gave developers opportunities to solve problems before replacing the old component with the new one.

3 Research challenges

In this thesis, we will mostly see learning of deterministic automata or Mealy machines. Although this is limited – many pieces of software require richer models – it has been successfully applied in the above examples. The limitations include the following.
– The system behaves deterministically.
– One can reliably reset the system.
– The system can be modelled with a finite state space. This also means that the model does not incorporate time or data.
– The input alphabet is finite.
– One knows when the target is reached.

Research challenge 1: Approximating equivalence queries. Having confidence in a learned model is difficult. We have PAC guarantees (as discussed before), but sometimes we may want to draw other conclusions. For example, we may require the hypothesis to be correct, provided that the real system is implemented with a certain number of states. Efficiency is important here: we want to obtain those guarantees fast and we want to quickly find counterexamples when the hypothesis is wrong. Test generation methods are the topic of the first part of this thesis. We will review existing algorithms and discuss new algorithms for test generation.
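As a rough illustration of how an equivalence query can be approximated by testing, the sketch below implements a random-sampling oracle in the spirit of the PAC setup above. The word distribution (geometric length, uniformly chosen inputs) and all names are assumptions made for the example.

```python
import random

# Sample a word: after each symbol we stop with probability stop_prob, so
# the expected length is (1 - stop_prob) / stop_prob.
def random_word(alphabet, stop_prob=0.1):
    word = []
    while random.random() > stop_prob:
        word.append(random.choice(alphabet))
    return tuple(word)

def approximate_eq(sut, hypothesis, alphabet, num_tests=10_000):
    """Return a counterexample word, or None if all sampled tests pass."""
    for _ in range(num_tests):
        w = random_word(alphabet)
        if sut(w) != hypothesis(w):
            return w
    return None
```

The choice of distribution is the crux: with more tests, or a distribution biased towards the behaviour we care about, we gain confidence – but never certainty.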
Research challenge 2: Generalisation to infinite alphabets. Automata over infinite alphabets are very useful for modelling protocols which involve identifiers or timestamps. Not only is the alphabet infinite in these cases, the state space is as well, since the values have to be remembered. In the second part of this thesis, we will see how nominal techniques can be used to tackle this challenge.

Learning automata over an infinite alphabet is not new. It has been tackled, for instance, by Howar, et al. (2012), Bollig, et al. (2013) and in the theses of Aarts (2014), Cassel (2015), and Fiterău-Broștean (2018). In the first of these theses, the problem is solved by considering abstractions, which reduce the alphabet to a finite one. These abstractions are automatically refined when a counterexample is presented to the algorithms. Fiterău-Broștean (2018) extends this approach to cope with “fresh values”, crucial for protocols such as TCP. In the thesis by Cassel (2015), another approach is taken: the queries are changed to tree queries. The approach in my thesis will be based on symmetries, which gives yet another perspective on the problem of learning such automata.

4 Black Box Testing

An important step in automata learning is equivalence checking. Normally, this is abstracted away and done by an oracle, but we intend to implement such an oracle ourselves for our applications. Concretely, the problem we need to solve is that of conformance checking,4 as it was first described by Moore (1956). The problem is as follows: given the description of a finite state machine and a black box system, does the system behave exactly according to the description? We wish to determine this by running experiments on the system (as it is black box). It should be clear that this is a hopelessly difficult task, as an error can be hidden arbitrarily deep in the system. That is why we often assume some knowledge of the system. In this thesis we often assume a bound on the number of states of the system. Under these conditions, Moore (1956) already solved the problem. Unfortunately, his experiment is exponential in size, or in his own words: “fantastically large.” Years later, Chow (1978) and Vasilevskii (1973) independently designed efficient experiments. In particular, the set of experiments is polynomial in the number of states. These techniques will be discussed in detail in Chapter 2. More background and other related problems, as well as their complexity results, are well presented in the survey by Lee and Yannakakis (1994).

4 Also known as machine verification or fault detection.

To give an example of conformance checking, we model a record player as a finite state machine. We will not model the audible output – that would depend not only on the device, but also on the record one chooses to play.5 Instead, the only observation we can make is how fast the turntable spins. The device has two buttons: a start-stop button and a speed button, which toggles between 33⅓ rpm and 45 rpm. When turned on, the system starts playing immediately at 33⅓ rpm – this is useful for DJing. The intended behaviour of the record player has four states, as depicted in Figure 1.1.

5 In particular, we would have to add time to the model, as one side of a record only lasts for roughly 25 minutes. Unless we take a record with sound on the locked groove, such as the Sgt. Pepper’s Lonely Hearts Club Band album by The Beatles.

Figure 1.1 Behaviour of a record player modelled as a finite state machine (the four states are slow spinning, fast spinning, and two non-spinning states).

Let us consider some faults which could be present in an implementation with four states. In Figure 1.2, two flawed record players are given. In the first (Figure 1.2a), a sequence of two button presses leads us to the wrong state. However, this is not immediately observable: the turntable is in a non-spinning state, as it should be. The fault only becomes visible when we press the start-stop button once more: now the turntable is spinning fast instead of slow. These three presses together form a counterexample. In the second example (Figure 1.2b), the fault is again not immediately obvious: after a single button press we are in the wrong state, as observed by pressing one more button. Here, the counterexample has length two.

Figure 1.2 Two faulty record players.

When a model of the implementation is given, it is not hard to find counterexamples. However, in a black box setting we do not have such a model. In order to test whether a black box system is equivalent to a model, we somehow need to test all possible counterexamples. In this example, a test suite should include sequences such as the two counterexamples above.
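To make the example executable, the sketch below encodes the intended record player and one faulty variant, in the spirit of Figure 1.2a, as Mealy machines. The button names, the exact transition structure, and the chosen fault are assumptions made for illustration.

```python
# States are pairs (playing or stopped, stored speed); outputs are the
# observed spinning speed after each button press.
SPEC = {
    ("play", "slow"): {"start_stop": (("stop", "slow"), "none"),
                       "speed":      (("play", "fast"), "fast")},
    ("play", "fast"): {"start_stop": (("stop", "fast"), "none"),
                       "speed":      (("play", "slow"), "slow")},
    ("stop", "slow"): {"start_stop": (("play", "slow"), "slow"),
                       "speed":      (("stop", "fast"), "none")},
    ("stop", "fast"): {"start_stop": (("play", "fast"), "fast"),
                       "speed":      (("stop", "slow"), "none")},
}

def run(machine, state, word):
    outputs = []
    for button in word:
        state, out = machine[state][button]
        outputs.append(out)
    return outputs

# A faulty implementation: the speed button does nothing while stopped, so
# the stored speed is wrong -- invisible until we press start_stop again.
FAULTY = {state: dict(trans) for state, trans in SPEC.items()}
FAULTY[("stop", "slow")]["speed"] = (("stop", "slow"), "none")

test = ["start_stop", "speed", "start_stop"]
print(run(SPEC,   ("play", "slow"), test))  # ['none', 'none', 'fast']
print(run(FAULTY, ("play", "slow"), test))  # ['none', 'none', 'slow']
```

The two runs agree on the first two outputs and only differ on the last one, mirroring the situation described above: the fault hides in a non-spinning state and only shows when the turntable starts spinning again.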
These are automata which have an infinite alphabet, often thought of as input actions with data. The control flow of the automaton may actually depend on the data. However, the data cannot be used in an arbitrary way as this would lead to many decision problems, such as emptiness and equivalence, being undecidable.6 A principal concept in nominal techniques is that of symmetries. To motivate the use of symmetries, we will look at an example of a register auttomaton. In the following automaton we model a (not-so-realistic) login system for a single person. The alphabet consists of the following actions: sign-up(p) login(p) logout() view() The sign-up action allows one to set a password p. This can only be done when the system is initialised. The login and logout actions speak for themselves and the view action allows one to see the secret data (we abstract away from what the user actually gets to see here). A simple automaton with roughly this behaviour is given in Figure 1.3. We will only informally discuss its semantics for now. q0 ∗/� sign-up(p) / � set r ≔ p login(p) / � if r = p q1 r q2 r logout() / � ∗/� view() / � ∗/� Figure 1.3 A simple register automaton. The symbol ∗ denotes any input otherwise not specified. The r in states q1 and q2 is a register. To model the behaviour, we want the domain of passwords to be infinite. After all, one should allow arbitrarily long passwords to be secure. This means that a register automaton is actually an automaton over an infinite alphabet. Common algorithms for automata, such as learning, will not work with an infinite alphabet. Any loop which iterates over the alphabet will diverge. In order to cope with this, we will use the symmetries present in the alphabet. Let us continue with the example and look at its symmetries. If a person signs up with a password “hello” and consequently logins with “hello”, then this is not distinguishable from a person signing up and logging in with “bye”. This is an example of symmetry: the values “hello” and “bye” can be permuted, or interchanged. Note, however, that the trace sign-up(hello) login(bye) is different from the two before: 6 The class of automata with arbitrary data operations is sometimes called extended finite state machines. Introduction 9 no permutation of “hello” and “bye” will bring us to a logged-in state with that trace. So we see that, despite the symmetry, we cannot simply identify the value “hello” and “bye”. For this reason, we keep the alphabet infinite and explicitly mention its symmetries. Using symmetries in automata theory is not a new idea. In the context of model checking, the first to use symmetries were Emerson and Sistla (1996) and Ip and Dill (1996). But only Ip and Dill (1996) used it to deal with infinite data domains. For automata learning with infinite domains, symmetries were used by Sakamoto (1997). He devised an L∗ learning algorithm for register automata, much like the one presented in Chapter 5. The symmetries are crucial to reduce the problem to a finite alphabet and use the regular L∗ algorithm. (Chapter 5 shows how to do it with more general symmetries.) Around the same time Ferrari, et al. (2005) worked on automata theoretic algorithms for the π-calculus. Their approach was based on the same symmetries and they developed a theory of named sets to implement their algorithms. Named sets are equivalent to nominal sets. However, nominal sets are defined in a more elementary way. 
Using symmetries in automata theory is not a new idea. In the context of model checking, the first to use symmetries were Emerson and Sistla (1996) and Ip and Dill (1996). But only Ip and Dill (1996) used them to deal with infinite data domains. For automata learning with infinite domains, symmetries were used by Sakamoto (1997). He devised an L∗ learning algorithm for register automata, much like the one presented in Chapter 5. The symmetries are crucial to reduce the problem to a finite alphabet and use the regular L∗ algorithm. (Chapter 5 shows how to do it with more general symmetries.) Around the same time, Ferrari, et al. (2005) worked on automata-theoretic algorithms for the π-calculus. Their approach was based on the same symmetries, and they developed a theory of named sets to implement their algorithms. Named sets are equivalent to nominal sets. However, nominal sets are defined in a more elementary way.

The nominal sets we will soon see were introduced by Gabbay and Pitts (2002) to solve certain problems of name binding in abstract syntax. Although this is not really related to automata theory, it was picked up by Bojańczyk, et al. (2014), who provide an equivalence between register automata and nominal automata. (This equivalence is exposed in more detail in the book of Bojańczyk, 2018.) Additionally, they generalise the work on nominal sets to other symmetries. The symmetries we encounter in this thesis are listed below, but other symmetries can be found in the literature. The symmetry directly corresponds to the data values (and operations) used in an automaton. The data values are often called atoms.

– The equality symmetry. Here the domain can be any countably infinite set. We can take, for example, the set of strings we used before as the domain from which we take passwords. No further structure is used on this domain, meaning that any value is just as good as any other. The symmetries therefore consist of all bijections on this domain.
– The total order symmetry. In this case, we take a countably infinite set with a dense total order. Typically, this means we use the rational numbers, ℚ, as data values, and symmetries which respect the ordering.

5.1 What is a nominal set?

So what exactly is a nominal set? I will not define it here and leave the formalities to the corresponding chapters. It suffices, for now, to think of nominal sets as abstract sets (often infinite) on which a group of symmetries acts. This action makes it possible to interpret the symmetries of the data values in the abstract set. For automata, this allows us to talk about symmetries on the state space, the set of transitions, and the alphabet. In order to implement these sets algorithmically, we impose two finiteness requirements. Both properties can be expressed using only the group action.

– Each element is finitely supported. A way to think of this requirement is that each element is “constructed” out of finitely many data values.
– The set is orbit-finite. This means that we can choose finitely many elements such that any other element is a permuted version of one of those elements.

If we wish to model the automaton from Figure 1.3 as a nominal automaton, then we can simply define the state space as Q = {q0} ∪ {q1,a | a ∈ 𝔸} ∪ {q2,a | a ∈ 𝔸}, where 𝔸 is the set of atoms. In this example, 𝔸 is the set of all possible passwords. The set Q is infinite, but satisfies the two finiteness requirements. The upshot of doing this is that the set Q (and the transition structure) corresponds directly to the semantics of the automaton. We do not have to encode how values relate or how they interact. Instead, the set (and the transition structure) defines all we need to know.
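As a small illustration of the two requirements, the sketch below represents the orbit-finite state space Q by its three orbits, storing for each element only its orbit name and the atoms supporting it. This encoding is merely meant to convey the idea; it is not the representation used by the libraries mentioned below.

```python
# Orbit name -> size of the support (number of atoms an element is built from).
ORBITS = {"q0": 0, "q1": 1, "q2": 1}

def make_state(orbit, *atoms):
    assert len(atoms) == ORBITS[orbit]
    return (orbit, atoms)

def act(state, pi):
    """Apply a permutation of atoms (a dict, identity elsewhere) to a state."""
    orbit, atoms = state
    return (orbit, tuple(pi.get(a, a) for a in atoms))

s = make_state("q1", "hello")
print(act(s, {"hello": "bye", "bye": "hello"}))  # ('q1', ('bye',)): same orbit
```

Even though Q is infinite, everything we ever store is one of the three orbit names plus a finite support, and the group action permutes supports without leaving the orbit.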
Algorithms such as reachability, minimisation, and learning can be run on such automata, despite the sets being infinite. These algorithms can be implemented rather easily by using libraries such as Nλ, Lois, or Ons from Chapter 6. These libraries implement a data structure for nominal sets, and provide ways to iterate over such (infinite) sets.

One has to be careful, as not all results from automata theory transfer to nominal automata. A notable example is the powerset construction, which converts a non-deterministic automaton into a deterministic one. The problem here is that the powerset of a set is generally not orbit-finite, and so the finiteness requirement is not met. Consequently, languages accepted by nominal DFAs are not closed under Kleene star, or even concatenation.

6 Contributions

This thesis is split into two parts. Part 1 contains material about black box testing, while Part 2 is about nominal techniques. The chapters can be read in isolation. However, the chapters do get more technical and mathematical – especially in Part 2. A detailed discussion of related work and future directions of research is presented in each chapter.

Chapter 2: FSM-based test methods. This chapter introduces test generation methods which can be used for learning or conformance testing. The methods are presented in a uniform way, which allows us to give a single proof of completeness for all of them. Moreover, the uniform presentation gives room to develop new test generation methods. The main contributions are:
– Uniform description of known methods: Theorem 26 (p. 35)
– A new proof of completeness: Section 5 (p. 36)
– A new algorithm (hybrid ADS) and its implementation: Section 3.2 (p. 34)

Chapter 3: Applying automata learning to embedded control software. In this chapter we apply model learning to an industrial case study. It is a unique benchmark, as it is much bigger than any of the applications seen before (3410 states and 77 inputs). This makes it challenging to learn a model, and the main obstacle is finding counterexamples. The main contributions are:
– Application of the hybrid ADS algorithm: Section 2.2 (p. 49)
– Successfully learning a large-scale system: Section 2.3 (p. 51)
This is based on the following publication: Smeenk, W., Moerman, J., Vaandrager, F. W., & Jansen, D. N. (2015). Applying Automata Learning to Embedded Control Software. In Formal Methods and Software Engineering - 17th International Conference on Formal Engineering Methods, ICFEM, Proceedings. Springer. doi:10.1007/978-3-319-25423-4_5.

Chapter 4: Minimal separating sequences for all pairs of states. Continuing on test generation methods, this chapter presents an efficient algorithm to construct separating sequences. Not only is the algorithm efficient – it runs in 𝒪(n log n) time – it also constructs minimal-length sequences. The algorithm is inspired by a minimisation algorithm of Hopcroft (1971), but extending it to construct witnesses is non-trivial. The main contributions are:
– Efficient algorithm for separating sequences: Algorithms 4.2 & 4.4 (pp. 66 & 68)
– Applications to black box testing: Section 4 (p. 70)
– Implementation: Section 5 (p. 71)
This is based on the following publication: Smetsers, R., Moerman, J., & Jansen, D. N. (2016). Minimal Separating Sequences for All Pairs of States. In Language and Automata Theory and Applications - 10th International Conference, LATA, Proceedings. Springer. doi:10.1007/978-3-319-30000-9_14.

Chapter 5: Learning nominal automata. In this chapter, we show how to learn automata over infinite alphabets. We do this by translating the L∗ algorithm directly to a nominal version, νL∗. The correctness proofs mimic the original proofs by Angluin (1987). Since our new algorithm is close to the original, we are able to translate variants of the L∗ algorithm as well. In particular, we provide a learning algorithm for nominal non-deterministic automata. The main contributions are:
– An L∗ algorithm for nominal automata: Section 3 (p. 86)
– Its correctness and complexity: Theorem 7 & Corollary 11 (pp. 89 & 93)
– Generalisation to non-deterministic automata: Section 4.2 (p. 96)
– Implementation in Nλ: Section 5.2 (p. 103)
This is based on the following publication: Moerman, J., Sammartino, M., Silva, A., Klin, B., & Szynwelski, M. (2017). Learning nominal automata. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL. ACM. doi:10.1145/3009837.3009879.

Chapter 6: Fast computations on ordered nominal sets. In this chapter, we provide a library to compute with nominal sets. We restrict our attention to nominal sets over the total order symmetry. This symmetry allows for a rather easy characterisation of orbits, and hence an easy implementation. We show experimentally that it is competitive with existing tools, which are based on SMT solvers. The main contributions are:
– Characterisation theorem of orbits: Table 6.1 (p. 118)
– Complexity results: Theorems 18 & 21 (pp. 119 & 123)
– Implementation: Section 3 (p. 118)
This is based on the following publication: Venhoek, D., Moerman, J., & Rot, J. (2018). Fast Computations on Ordered Nominal Sets. In Theoretical Aspects of Computing - ICTAC - 15th International Colloquium, Proceedings. Springer. doi:10.1007/978-3-030-02508-3_26.

Chapter 7: Separation and renaming in nominal sets. We investigate how to reduce the size of certain nominal automata. This is based on the observation that some languages (with outputs) are not just invariant under symmetries, but invariant under arbitrary transformations, or renamings. We define a new type of automaton, the separated nominal automaton, and show that these exactly accept the languages which are closed under renamings. All of this is shown using a theoretical framework: we establish a strong relationship between nominal sets on the one hand, and nominal renaming sets on the other. The main contributions are:
– Adjunction between nominal sets and renaming sets: Theorem 16 (p. 138)
– This adjunction is monoidal: Theorem 17 (p. 139)
– Separated automata have a reduced state space: Example 36 (p. 147)
This is based on a paper under submission: Moerman, J. & Rot, J. (2019). Separation and Renaming in Nominal Sets. (Under submission).

Besides the chapters in this thesis, I have published the following papers. These are not included in this thesis, but a short summary of each is presented below.

Complementing Model Learning with Mutation-Based Fuzzing. Our group at the Radboud University participated in the RERS challenge 2016. This is a challenge where reactive software is provided and researchers have to assess the validity of certain properties (given as LTL specifications). We approached this with model learning: instead of analysing the source code, we simply learned the external behaviour, and then used model checking on the learned model. This worked remarkably well, as the models of the external behaviour are not too big. Our results were presented at the RERS workshop (ISOLA 2016). The report can be found on arXiv: Smetsers, R., Moerman, J., Janssen, M., & Verwer, S. (2016). Complementing Model Learning with Mutation-Based Fuzzing. CoRR, abs/1611.02429. Retrieved from http://arxiv.org/abs/1611.02429.

n-Complete test suites for IOCO. In this paper, we investigate complete test suites for labelled transition systems (LTSs), instead of deterministic Mealy machines. This is a much harder problem than conformance testing of deterministic systems: the system may adversarially avoid certain states the tester wishes to test. We provide a test suite which is n-complete (provided the implementation is a suspension automaton).
My main personal contribution here is the proof of completeness, which closely resembles the proof presented in Chapter 2. The conference paper was presented at ICTSS: van den Bos, P., Janssen, R., & Moerman, J. (2017). n-Complete Test Suites for IOCO. In ICTSS 2017 Proceedings. Springer. doi:10.1007/978-3-319-67549-7_6. An extended version has appeared in: van den Bos, P., Janssen, R., & Moerman, J. (2018). n-Complete Test Suites for IOCO. Software Quality Journal. Advance online publication. doi:10.1007/s11219-018-9422-x.

Learning Product Automata. In this article, we consider Moore machines with multiple outputs. These machines can be decomposed by projecting on each output, resulting in smaller components that can be learned with fewer queries. We give experimental evidence that this is a useful technique which can reduce the number of queries substantially. This is all motivated by the idea that compositional methods are widely used throughout engineering, and that we should use them in model learning as well. This work was presented at ICGI 2018: Moerman, J. (2019). Learning Product Automata. In International Conference on Grammatical Inference, ICGI, Proceedings. Proceedings of Machine Learning Research. (To appear).

7 Conclusion and Outlook

With the current tools for model learning, it is possible to learn big state machines of black box systems. This involves using clever algorithms for learning (such as the TTT algorithm by Isberner, 2015) and efficient testing methods (see Chapter 2). However, as the industrial case study from Chapter 3 shows, the bottleneck often lies in conformance testing.

In order to improve on this bottleneck, one possible direction is to consider ‘grey box testing.’ The methods discussed in this thesis are all black box methods, which could be considered too pessimistic: often, we do have (parts of) the source code, and we do know relationships between different inputs. A question for future research is how this additional information can be integrated in a principled manner into the learning and testing of systems.

Black box testing still has theoretical challenges. Current generalisations to non-deterministic systems or language inclusion (such as black box testing for IOCO) often need exponentially big test suites. Whether this is necessary is unknown (to me): we only have upper bounds, but no lower bounds. An interesting approach could be to see whether there exists a notion of reduction between test suites, analogous to the reductions used in complexity theory to prove hardness of problems, or the reductions used in PAC theory to prove learning problems to be inherently unpredictable.

Another path taken in this thesis is the research on nominal automata. This was motivated by the problem of learning automata over infinite alphabets. So far, the results on nominal automata are mostly theoretical in nature. Nevertheless, we show that the nominal algorithms can be implemented and that they can be run concretely on black box systems (Chapter 5). The advantage of using the foundations of nominal sets is that the algorithms are closely related to the original L∗ algorithm. Consequently, variations of L∗ can easily be implemented. For instance, we show that the NL∗ algorithm for non-deterministic automata works in the nominal case too. (We have not attempted to implement more recent algorithms such as TTT.)
The nominal learning algorithms can be implemented in just a few hundred lines of code, much less than the approach taken by, e.g., Fiterău-Broștean (2018).

In this thesis, we tackle some efficiency issues when computing with nominal sets. In Chapter 6 we characterise orbits in order to give an efficient representation (for the total-order symmetry). Another result is the fact that some nominal automata can be ‘compressed’ to separated automata, which can be exponentially smaller (Chapter 7). However, the nominal tools still leave much to be desired in terms of efficiency.

Last, it would be interesting to marry the two paths taken in this thesis. I am not aware of n-complete test suites for register automata or nominal automata. The results on learning nominal automata in Chapter 5 show that this should be possible, as an observation table gives a test suite.7 However, there is an interesting twist to this problem. The test methods from Chapter 2 can all account for extra states. For nominal automata, we should be able to cope with extra states and extra registers. It would be interesting to see how the test suite grows as these two dimensions increase.

7 The rows of a table are access sequences, and the columns provide a characterisation set.

Part 1: Testing Techniques

Chapter 2
FSM-based Test Methods

In this chapter, we will discuss some of the theory of test generation methods for black box conformance checking. Since the systems we consider are black box, we cannot simply determine equivalence with a specification. The only way to gain confidence is to perform experiments on the system. A key aspect of test generation methods is the size and completeness of the test suites. On the one hand, we want to cover as much of the specification as possible, hopefully ensuring that we find mistakes in any faulty implementation. On the other hand, testing takes time, so we want to minimise the size of a test suite.

The test methods described here are well known in the literature on FSM-based testing. They all share similar concepts, such as access sequences and state identifiers. In this chapter we will define these concepts, relate them to one another, and show how to build test suites from them. This theoretical discussion is new and enables us to compare the different methods uniformly. For instance, we can prove all these methods to be n-complete with a single proof. The discussion also inspired a new algorithm: the hybrid ADS method, which is applied to an industrial case study in Chapter 3. It combines the strength of the ADS method (which is not always applicable) with the generality of the HSI method.

This chapter starts with the basics: Mealy machines, sequences, and what it means to test a black box system. Then, starting from Section 1.3, we define several concepts, such as state identifiers, in order to distinguish one state from another. These concepts are then combined in Section 2 to derive test suites. In a similar vein, we define a novel test method in Section 3, and we discuss some of the implementation details of the hybrid-ads tool. We summarise the various test methods in Section 4. All methods are proven to be n-complete in Section 5. Finally, in Section 6, we discuss related work.

1 Mealy machines and sequences

We will focus on Mealy machines, as those capture many protocol specifications and reactive systems. We fix finite alphabets I and O of inputs and outputs, respectively.
We use the usual notation for operations on sequences (also called words): uv for the concatenation of two sequences u, v ∈ I∗ and |u| for the length of u. For a sequence w = uv, we say that u is a prefix and v a suffix of w.

Definition 1. A (deterministic and complete) Mealy machine M consists of a finite set of states S, an initial state s0 ∈ S and two functions:
– a transition function δ : S × I → S, and
– an output function λ : S × I → O.

Both the transition function and the output function are extended inductively to sequences, as δ : S × I∗ → S and λ : S × I∗ → O∗:

δ(s, ϵ) = s
δ(s, aw) = δ(δ(s, a), w)
λ(s, ϵ) = ϵ
λ(s, aw) = λ(s, a) λ(δ(s, a), w)

The behaviour of a state s is given by the output function λ(s, −) : I∗ → O∗. Two states s and t are equivalent if they have equal behaviours, written s ∼ t, and two Mealy machines are equivalent if their initial states are equivalent.
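Definition 1 translates almost verbatim into code. The following sketch, with an assumed encoding of δ and λ as dictionaries, extends both functions to words exactly as in the definition.

```python
# A sketch of a Mealy machine; delta and lam map (state, input) pairs to a
# next state and an output, respectively.
class Mealy:
    def __init__(self, delta, lam, s0):
        self.delta, self.lam, self.s0 = delta, lam, s0

    def state_after(self, s, word):
        # The extended transition function delta(s, w).
        for a in word:
            s = self.delta[(s, a)]
        return s

    def output(self, s, word):
        # The extended output function lambda(s, w).
        out = []
        for a in word:
            out.append(self.lam[(s, a)])
            s = self.delta[(s, a)]
        return "".join(out)

# A two-state example: output '1' iff the number of a's seen so far is odd.
delta = {(0, "a"): 1, (1, "a"): 0}
lam   = {(0, "a"): "1", (1, "a"): "0"}
M = Mealy(delta, lam, 0)
print(M.output(0, "aaa"))   # '101'
```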
� /� s0 (a) � s′0 � /� s′1 � ⋯ /� � /� /� s′n (b) Figure 2.2 A basic example showing that finite test suites are incomplete. The implementation on the right will pass any test suite if we choose n big enough. Definition 6. Let M be a Mealy machine and T be a test suite. We say that T is m-complete (for M) if for all inequivalent machines M′ with at most m states there exists a t ∈ T such that λ(s0 , t) ≠ λ′ (s′0 , t). We are often interested in the case of m-completeness, where m = n + k for some k ∈ ℕ and n is the number of states in the specification. Here k will stand for the number of extra states we can test. 22 Chapter 2 Note the order of the quantifiers in the above definition. We ask for a single test suite which works for all implementations of bounded size. This is crucial for black box testing, as we do not know the implementation, so the test suite has to work for all of them. 1.3 Separating Sequences Before we construct test suites, we discuss several types of useful sequences. All the following notions are standard in the literature, and the corresponding references will be given in Section 2, where we discuss the test generation methods using these notions. We fix a Mealy machine M for the remainder of this chapter. Definition 7. We define the following kinds of sequences. – Given two states s, t in M we say that w is a separating sequence if λ(s, w) ≠ λ(t, w). – For a single state s in M, a sequence w is a unique input output sequence (UIO) if for every inequivalent state t in M we have λ(s, w) ≠ λ(t, w). – Finally, a (preset) distinguishing sequence (DS) is a single sequence w which separates all states of M, i.e., for every pair of inequivalent states s, t in M we have λ(s, w) ≠ λ(t, w). The above list is ordered from weaker to stronger notions, i.e., every distinguishing sequence is an UIO sequence for every state. Similarly, an UIO for a state s is a separating sequence for s and any inequivalent t. Separating sequences always exist for inequivalent states and finding them efficiently is the topic of Chapter 4. On the other hand, UIOs and DSs do not always exist for a machine. A machine M is minimal if every distinct pair of states is inequivalent (i.e., s ∼ t ⟹ s = t). We will not require M te be minimal, although this is often done in literature. Minimality is sometimes convenient, as one can write ‘every other state t’ instead of ‘every inequivalent state t’. Example 8. For the machine in Figure 2.1, we note that state s0 and s2 are separated by the sequence aa (but not by any shorter sequence). In fact, the sequence aa is an UIO for state s0 since it is the only state outputting 10 on that input. However, state s2 has no UIO: If the sequence were to start with b or c, state s3 and s4 respectively have equal transition, which makes it impossible to separate those states after the first symbol. If it starts with an a, states s3 and s4 are swapped and we make no progress in distinguishing these states from s2 . Since s2 has no UIO, the machine as a whole does not admit a DS. In this example, all other states actually have UIOs. For the states s0 , s1 , s3 and s4 , we can pick the sequences aa, a, c and ac respectively. In order to separate s2 from the other state, we have to pick multiple sequences. For instance, the set {aa, ac, c} will separate s2 from all other states. FSM-based Test Methods 23 1.4 Sets of separating sequences As the example shows, we need sets of sequences and sometimes even sets of sets of sequences – called families.8 Definition 9. 
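Separating sequences can be computed by a simple fixed-point computation over pairs of states: two states are separated by a single input if their outputs differ, and otherwise by an input followed by a separating sequence for the successor pair. The Python sketch below illustrates this idea on the dictionary representation used earlier; it is a naive quadratic version which does not guarantee shortest sequences, not the efficient algorithm of Chapter 4.

from itertools import combinations

def separating_sequences(states, inputs, delta, lam):
    # sep maps an ordered pair of states to a separating word
    sep = {}
    # base case: a single input gives different outputs
    for s, t in combinations(sorted(states), 2):
        for a in sorted(inputs):
            if lam[(s, a)] != lam[(t, a)]:
                sep[(s, t)] = a
                break
    # iterate: prepend an input leading to an already separated pair
    changed = True
    while changed:
        changed = False
        for s, t in combinations(sorted(states), 2):
            if (s, t) in sep:
                continue
            for a in sorted(inputs):
                u, v = delta[(s, a)], delta[(t, a)]
                pair = (u, v) if u <= v else (v, u)
                if u != v and pair in sep:
                    sep[(s, t)] = a + sep[pair]
                    changed = True
                    break
    return sep  # pairs missing from sep are equivalent states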
1.4 Sets of separating sequences

As the example shows, we need sets of sequences and sometimes even sets of sets of sequences – called families.8

Definition 9. We define the following kinds of sets of sequences. We require that all sets are prefix-closed; however, we only show the maximal sequences in examples.9
– A set of sequences W is called a characterisation set if it contains a separating sequence for each pair of inequivalent states in M.
– A state identifier for a state s ∈ M is a set Ws such that for every inequivalent t ∈ M a separating sequence for s and t exists in Ws.
– A set of state identifiers {Ws}s is harmonised if Ws ∩ Wt contains a separating sequence for inequivalent states s and t. This is also called a separating family.

8 A family is often written as {Xs}s∈M or simply {Xs}s, meaning that for each state s ∈ M we have a set Xs.
9 Taking these sets to be prefix-closed makes many proofs easier.

A state identifier Ws will be used to test against a single state. In contrast to a characterisation set, it only includes sequences which are relevant for s. The property of being harmonised might seem a bit strange. This property ensures that the same tests are used for different states. This extra consistency within a test suite is necessary for some test methods. We return to this notion in more detail in Example 22.

We may obtain a characterisation set by simply considering every pair of states and looking for a difference. However, it turns out that a harmonised set of state identifiers exists for every machine, and it can be constructed very efficiently (Chapter 4). From a set of state identifiers we may obtain a characterisation set by taking the union of all those sets.

Example 10. As mentioned before, state s2 from Figure 2.1 has a state identifier {aa, ac, b}. In fact, this set is a characterisation set for the whole machine. Since the other states have UIOs, we can pick singleton sets as state identifiers. For example, state s0 has the UIO aa, so a state identifier for s0 is W0 = {aa}. Similarly, we can take W1 = {a} and W3 = {c}. But note that such a family will not be harmonised, since the sets {a} and {c} have no common separating sequence.

One more type of state identifier is of interest to us: the adaptive distinguishing sequence. It is the strongest type of state identifier, and as a result not many machines have one. Like DSs, adaptive distinguishing sequences can identify a state using a single word. We give a slightly different (but equivalent) definition from the one of Lee and Yannakakis (1994).

Definition 11. A separating family ℋ is an adaptive distinguishing sequence (ADS) if each set max(Hs) is a singleton.

It is called an adaptive sequence, since it has a tree structure which depends on the output of the machine. To see this tree structure, consider the first symbols of each of the sequences in the family. Since the family is harmonised and each set is essentially given by a single word, there is only one first symbol. Depending on the output after the first symbol, the sequence continues.

Example 12. In Figure 2.3 we see a machine with an ADS. The ADS is given as follows:

H0 = {aba}    H1 = {aaba}    H2 = {aba}    H3 = {aaba}

Note that all sequences start with a. This already separates s0, s2 from s1, s3. To further separate the states, the sequences continue with either a b or another a. And so on.

Figure 2.3 (a): A Mealy machine with an ADS and (b): the tree structure of this ADS.

Given an ADS, there exists an UIO for every state. The converse – if every state has an UIO, then the machine admits an ADS – does not hold. The machine in Figure 2.1 admits no ADS, since s2 has no UIO.

1.5 Partial equivalence

Definition 13. We define the following notation.
– Let W be a set of sequences. Two states x, y are W-equivalent, written x ∼W y, if λ(x, w) = λ(y, w) for all w ∈ W.
– Let 𝒲 be a family. Two states x, y are 𝒲-equivalent, written x ∼𝒲 y, if λ(x, w) = λ(y, w) for all w ∈ Wx ∩ Wy.

The relation ∼W is an equivalence relation and W ⊆ V implies that V separates more states than W, i.e., x ∼V y ⟹ x ∼W y. Clearly, if two states are equivalent (i.e., s ∼ t), then for any set W we have s ∼W t.

Lemma 14. The relations ∼W and ∼𝒲 can be used to define characterisation sets and separating families. Concretely:
– W is a characterisation set if and only if for all s, t in M, s ∼W t implies s ∼ t.
– 𝒲 is a separating family if and only if for all s, t in M, s ∼𝒲 t implies s ∼ t.

Proof.
– W being a characterisation set by definition means s ̸∼ t ⟹ s ̸∼W t, as W contains a separating sequence (if it exists at all). This is equivalent to s ∼W t ⟹ s ∼ t.
– Let 𝒲 be a separating family and s ̸∼ t. Then there is a sequence w ∈ Ws ∩ Wt such that λ(s, w) ≠ λ(t, w), i.e., s ̸∼𝒲 t. We have shown s ̸∼ t ⟹ s ̸∼𝒲 t, which is equivalent to s ∼𝒲 t ⟹ s ∼ t. The converse is proven similarly. □

1.6 Access sequences

Besides sequences which separate states, we also need sequences which bring the machine to specified states.

Definition 15. An access sequence for s is a word w such that δ(s0, w) = s. A set P consisting of an access sequence for each state is called a state cover. If P is a state cover, then the set {pa | p ∈ P, a ∈ I} is called a transition cover.

1.7 Constructions on sets of sequences

In order to define a test suite modularly, we introduce notation for combining sets of words. For sets of words X and Y, we define
– their concatenation X ⋅ Y = {xy | x ∈ X, y ∈ Y},
– iterated concatenation X0 = {ϵ} and Xn+1 = X ⋅ Xn, and
– bounded concatenation X≤n = ⋃i≤n Xi.

On families we define
– flattening: ⋃ 𝒳 = {x | x ∈ Xs, s ∈ S},
– union: 𝒳 ∪ 𝒴 is defined point-wise: (𝒳 ∪ 𝒴)s = Xs ∪ Ys,
– concatenation:10 X ⊙ 𝒴 = {xy | x ∈ X, y ∈ Yδ(s0,x)}, and
– refinement: 𝒳; 𝒴 defined by11

(𝒳; 𝒴)s = Xs ∪ Ys ∩ ⋃_{s ∼𝒳 t, s ̸∼𝒴 t} Yt.

10 We will often see the combination P ⋅ I ⊙ 𝒳; this should be read as (P ⋅ I) ⊙ 𝒳.
11 We use the convention that ∩ binds stronger than ∪. In fact, all the operators here bind stronger than ∪.

The latter construction is new and will be used to define a hybrid test generation method in Section 3. It refines a family 𝒳, which need not be separating, by including sequences from a second family 𝒴. It only adds those sequences to states if 𝒳 does not distinguish those states. This is also the reason behind the ;-notation: first the tests from 𝒳 are used to distinguish states, and then for the remaining states 𝒴 is used.
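These constructions can be transcribed almost literally into code. The Python sketch below is a minimal illustration, assuming that words are strings, that a family is a dict mapping each state to a set of words, that delta_star(x) computes δ(s0, x), and that agree(fam, s, t) decides s ∼fam t; none of these names come from the hybrid-ads tool.

def concat(X, Y):
    # X . Y = { xy | x in X, y in Y }
    return {x + y for x in X for y in Y}

def bounded_concat(X, n):
    # X^{<=n} = the union of the iterated concatenations X^i for i <= n
    result, power = {''}, {''}
    for _ in range(n):
        power = concat(power, X)
        result |= power
    return result

def flatten(family):
    # the union of all the sets in the family
    return set().union(*family.values())

def odot(X, family, delta_star):
    # X (.) Y = { xy | x in X, y in Y_{delta(s0, x)} }
    return {x + y for x in X for y in family[delta_star(x)]}

def refine(X_fam, Y_fam, states, agree):
    # (X; Y)_s = X_s  u  ( Y_s  n  union of Y_t over t with s ~_X t and s !~_Y t )
    Z = {}
    for s in states:
        ts = [t for t in states if agree(X_fam, s, t) and not agree(Y_fam, s, t)]
        union_Yt = set().union(*(Y_fam[t] for t in ts)) if ts else set()
        Z[s] = X_fam[s] | (Y_fam[s] & union_Yt)
    return Z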
Lemma 16. For all families 𝒳 and 𝒴:
– 𝒳; 𝒳 = 𝒳,
– 𝒳; 𝒴 = 𝒳, whenever 𝒳 is a separating family, and
– 𝒳; 𝒴 is a separating family whenever 𝒴 is a separating family.

Proof. For the first item, note that there are no states t such that s ∼𝒳 t and s ̸∼𝒳 t. Consequently, the union is empty, and the expression simplifies to (𝒳; 𝒳)s = Xs ∪ (Xs ∩ ∅) = Xs. If 𝒳 is a separating family, then the only t for which s ∼𝒳 t holds are t such that s ∼ t (Lemma 14). But s ∼ t is ruled out by s ̸∼𝒴 t, so again (𝒳; 𝒴)s = Xs ∪ (Ys ∩ ∅) = Xs. For the last item, suppose that s ∼𝒳;𝒴 t. Then s and t agree on every sequence in (𝒳; 𝒴)s ∩ (𝒳; 𝒴)t. We distinguish two cases:
– Suppose s ∼𝒳 t. Then Ys ∩ Yt ⊆ (𝒳; 𝒴)s ∩ (𝒳; 𝒴)t, and so s and t agree on Ys ∩ Yt, meaning s ∼𝒴 t. Since 𝒴 is a separating family, we have s ∼ t.
– Suppose s ̸∼𝒳 t. This contradicts s ∼𝒳;𝒴 t, since Xs ∩ Xt ⊆ (𝒳; 𝒴)s ∩ (𝒳; 𝒴)t.
We conclude that s ∼ t. This proves that 𝒳; 𝒴 is a separating family. □

2 Test generation methods

In this section, we review the classical conformance testing methods: the W, Wp, UIO, UIOv, HSI and ADS methods. At the end of this section, we construct the test suites for the running example. Our hybrid ADS method uses a similar construction.

There are many more test generation methods. The literature shows, however, that not all of them are complete. For example, the method by Bernhard (1994) is falsified by Petrenko (1997), and the UIO-method from Sabnani and Dahbura (1988) is shown to be incomplete by Chan, et al. (1989). For that reason, completeness of the correct methods is shown in Theorem 26. The proof is general enough to capture all the methods at once. We fix a state cover P throughout this section and take the transition cover Q = P ⋅ I.

2.1 W-method (Chow, 1978 and Vasilevskii, 1973)

After the work of Moore (1956), it was unclear whether a test suite of polynomial size could exist. He presented a finite test suite which was complete; however, it was exponential in size. Both Chow (1978) and Vasilevskii (1973) independently prove that test suites of polynomial size exist.12 The W-method is a very structured test suite construction. It is called the W-method as the characterisation set is often called W.

12 More precisely: the size of TW is polynomial in the size of the specification for each fixed k.

Definition 17. Given a characterisation set W, we define the W test suite as TW = (P ∪ Q) ⋅ I≤k ⋅ W.

This – and all following methods – tests the machine in two phases. For simplicity, we explain these phases when k = 0. The first phase consists of the tests P ⋅ W and tests whether all states of the specification are (roughly) present in the implementation. The second phase is Q ⋅ W and tests whether the successor states are correct. Together, these two phases put enough constraints on the implementation to know that the implementation and specification coincide (provided that the implementation has no more states than the specification).

2.2 The Wp-method (Fujiwara, et al., 1991)

Fujiwara, et al. (1991) realised that one needs fewer tests in the second phase of the W-method. Since we already know the right states are present after phase one, we only need to check if the state after a transition is consistent with the expected state. This justifies the use of state identifiers for each state.

Definition 18. Let 𝒲 be a family of state identifiers. The Wp test suite is defined as TWp = P ⋅ I≤k ⋅ ⋃ 𝒲 ∪ Q ⋅ I≤k ⊙ 𝒲.

Note that ⋃ 𝒲 is a characterisation set as defined for the W-method. It is needed for completeness to test states with the whole set ⋃ 𝒲. Once states are tested as such, we can use the smaller sets Ws for testing transitions.

2.3 The HSI-method (Luo, et al., 1995 and Petrenko, et al., 1993)

The Wp-method in turn was refined by Luo, et al. (1995) and Petrenko, et al. (1993). They make use of harmonised state identifiers, which allows state identifiers to be used already in the initial phase of the test suite.

Definition 19. Let ℋ be a separating family. We define the HSI test suite by THSI = (P ∪ Q) ⋅ I≤k ⊙ ℋ.
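Under the same representation assumptions as before, and reusing concat, bounded_concat, flatten and odot from the earlier sketch, the three definitions above can be written down directly. This is a rough illustration only; the hybrid-ads tool constructs its suites differently, using tries and splitting trees (Section 3.2).

def test_suite_W(P, I, k, W):
    # T_W = (P u Q) . I^{<=k} . W, with Q = P . I
    PQ = P | concat(P, I)
    return concat(concat(PQ, bounded_concat(I, k)), W)

def test_suite_Wp(P, I, k, fam, delta_star):
    # T_Wp = P . I^{<=k} . (U fam)  u  Q . I^{<=k} (.) fam
    Q = concat(P, I)
    phase1 = concat(concat(P, bounded_concat(I, k)), flatten(fam))
    phase2 = odot(concat(Q, bounded_concat(I, k)), fam, delta_star)
    return phase1 | phase2

def test_suite_HSI(P, I, k, H, delta_star):
    # T_HSI = (P u Q) . I^{<=k} (.) H, for a separating family H
    PQ = P | concat(P, I)
    return odot(concat(PQ, bounded_concat(I, k)), H, delta_star)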
Our hybrid ADS method is an instance of the HSI-method as we define it here. However, Luo, et al. (1995) and Petrenko, et al. (1993) describe the HSI-method together with a specific way of generating the separating families, namely the set obtained from a splitting tree with shortest witnesses. The hybrid ADS method does not refine the HSI-method defined in this more restricted sense.

2.4 The ADS-method (Lee & Yannakakis, 1994)

As discussed before, when a Mealy machine admits an adaptive distinguishing sequence, only a single test has to be performed for identifying a state. This is exploited in the ADS-method.

Definition 20. Let 𝒵 be an adaptive distinguishing sequence. The ADS test suite is defined as TADS = (P ∪ Q) ⋅ I≤k ⊙ 𝒵.

2.5 The UIOv-method (Chan, et al., 1989)

Some Mealy machines which do not admit an adaptive distinguishing sequence may still admit state identifiers which are singletons. These are exactly the UIO sequences, and they give rise to the UIOv-method. In a way this is a generalisation of the ADS-method, since the requirement that state identifiers are harmonised is dropped.

Definition 21. Let 𝒰 = {a single UIO for s}s∈S be a family of UIO sequences. The UIOv test suite is defined as TUIOv = P ⋅ I≤k ⋅ ⋃ 𝒰 ∪ Q ⋅ I≤k ⊙ 𝒰.

One might think that using a single UIO sequence instead of the set ⋃ 𝒰 to verify the state is enough. In fact, this idea was used for the UIO-method, which defines the test suite (P ∪ Q) ⋅ I≤k ⊙ 𝒰. The following counterexample to this conjecture is due to Chan, et al. (1989).

Example 22. The Mealy machines in Figure 2.4 show that the UIO-method does not define a 3-complete test suite. Take for example the UIOs u0 = aa, u1 = a, u2 = ba for the states s0, s1, s2 respectively. The test suite then becomes {aaaa, abba, baaa, bba} and the faulty implementation passes this suite. This happens because the sequence u2 is not an UIO in the implementation, and the state s′2 simulates both UIOs u1 and u2. Hence we also want to check that a state does not behave as one of the other states, and therefore we use ⋃ 𝒰. With the same UIOs as above, the resulting UIOv test suite for the specification in Figure 2.4 is {aaaa, aba, abba, baaa, bba} of size 23. (Recall that we also count resets when measuring the size.)

Figure 2.4 An example where the UIO-method is not complete: a specification (left) and an implementation (right).

2.6 All test suites for Figure 2.1

Let us compute all the previous test suites on the specification in Figure 2.1. We will be testing without extra states, i.e., we construct 5-complete test suites. We start by defining the state and transition cover. For this, we take all shortest sequences from the initial state to the other states. This state cover is depicted in Figure 2.5. The transition cover is simply constructed by extending each access sequence with another symbol.

P = {ϵ, a, aa, b, ba}
Q = P ⋅ I = {a, b, c, aa, ab, ac, aaa, aab, aac, ba, bb, bc, baa, bab, bac}

Figure 2.5 A state cover for the specification from Figure 2.1.

As shown earlier, the set W = {aa, ac, c} is a characterisation set. The W-method, which simply combines P ∪ Q with W, gives the following test suite of size 169:

TW = { aaaaa, aaaac, aaac, aabaa, aabac, aabc, aacaa, aacac, aacc, abaa, abac, abc, acaa, acac, acc, baaaa, baaac, baac, babaa, babac, babc, bacaa, bacac, bacc, bbaa, bbac, bbc, bcaa, bcac, bcc, caa, cac, cc }

With the Wp-method we get to choose a different state identifier per state. Since many states have an UIO, we can use them as state identifiers. This defines the following family 𝒲:

W0 = {aa}    W1 = {a}    W2 = {aa, ac, c}    W3 = {c}    W4 = {ac}

For the first part of the Wp test suite we need ⋃ 𝒲 = {aa, ac, c}. For the second part, we only combine the sequences in the transition cover with the corresponding suffixes. All in all we get a test suite of size 75:

TWp = { aaaaa, aaaac, aaac, aabaa, aacaa, abaa, acaa, baaac, baac, babaa, bacc, bbac, bcaa, caa }

For the HSI-method we need a separating family ℋ. We pick the following sets:

H0 = {aa, c}    H1 = {a}    H2 = {aa, ac, c}    H3 = {a, c}    H4 = {aa, ac, c}

(We repeat that these sets are prefix-closed, but we only show the maximal sequences.) Note that these sets are harmonised, unlike the family 𝒲. For example, the separating sequence a is contained in both H1 and H3. This ensures that we do not have to consider ⋃ ℋ in the first part of the test suite. When combining this with the corresponding prefixes, we obtain the HSI test suite of size 125:

THSI = { aaaaa, aaaac, aaac, aabaa, aabc, aacaa, aacc, abaa, abc, acaa, acc, baaaa, baaac, baac, babaa, babc, baca, bacc, bbaa, bbac, bbc, bcaa, bcc, caa, cc }

On this particular example the Wp-method outperforms the HSI-method. The reason is that many states have UIOs and we picked those to be the state identifiers. In general, however, UIOs may not exist (and finding them is hard). The UIO-method and ADS-method are not applicable in this example because state s2 does not have an UIO.

Figure 2.6 A faulty implementation for the specification in Figure 2.1.

We can run these test suites on the faulty implementation shown in Figure 2.6. Here, the a-transition from state s′2 transitions to the wrong target state. It is not an obvious mistake, since the faulty target s′0 has transitions very similar to those of s2. Yet, all the test suites detect this error. When choosing the prefix aaa (included in the transition cover) and the suffix aa (included in the characterisation set and the state identifier for s2), we see that the specification outputs 10111 and the implementation outputs 10110. The sequence aaaaa is the only sequence (in any of the test suites here) which detects this fault. If, alternatively, the a-transition from s′2 were to transition to s′4, we would need the suffix ac, as aa would not detect the fault. Since the sequence ac is included in the state identifier for s2, this fault would also be detected. This shows that it is sometimes necessary to include multiple sequences in the state identifier.

Another approach to testing would be to enumerate all sequences up to a certain length. In this example, we need sequences of at least length 5. Consequently, the test suite contains 3⁵ = 243 sequences and, counting one reset per test, this boils down to a size of 1458. Such a brute-force approach is not scalable.

3 Hybrid ADS method

In this section, we describe a new test generation method for Mealy machines. Its completeness will be proven in Theorem 26, together with completeness for all methods defined in the previous section.

From a high-level perspective, the method uses the algorithm by Lee and Yannakakis (1994) to obtain an ADS. If no ADS exists, their algorithm still provides some sequences which separate some inequivalent states. Our extension is to refine the resulting set of sequences by using pairwise separating sequences. Hence, this method is a hybrid between the ADS-method and the HSI-method. The reason we do this is that the ADS-method generally constructs small test suites, as the experiments by Dorofeeva, et al. (2010) suggest. The test suites are small since an ADS can identify a state with a single word, instead of a set of words which is generally needed. Even if the ADS does not exist, using the partial result of Lee and Yannakakis’ algorithm can reduce the size of test suites.

We will now see the construction of this hybrid method. Instead of manipulating separating families directly, we use a splitting tree. This is a data structure which is used to construct separating families or adaptive distinguishing sequences.

Definition 23. A splitting tree (for M) is a rooted tree where each node u has
– a non-empty set of states l(u) ⊆ M, and
– if u is not a leaf, a sequence σ(u) ∈ I∗.
We require that if a node u has children C(u) then
– the sets of states of the children of u partition l(u), i.e., the set P(u) = {l(v) | v ∈ C(u)} is a non-trivial partition of l(u), and
– the sequence σ(u) witnesses the partition P(u), meaning that for all p, q ∈ P(u) we have p = q iff λ(s, σ(u)) = λ(t, σ(u)) for all s ∈ p, t ∈ q.
A splitting tree is called complete if all inequivalent states belong to different leaves.

Efficient construction of a splitting tree is described in more detail in Chapter 4. Briefly, the splitting tree records the execution of a partition refinement algorithm (such as Moore’s or Hopcroft’s algorithm). Each non-leaf node encodes a split together with a witness, which is a separating sequence for its children. From such a tree we can construct a state identifier for a state by locating the leaf containing that state and collecting all the sequences read while traversing to the root.

For adaptive distinguishing sequences an additional requirement is put on the splitting tree: for each non-leaf node u, the sequence σ(u) defines an injective map x ↦ (δ(x, σ(u)), λ(x, σ(u))) on the set l(u). Lee and Yannakakis (1994) call such splits valid. Figure 2.7 shows both valid and invalid splits. Validity precisely ensures that after performing a split, the states are still distinguishable. Hence, sequences of such splits can be concatenated.

Figure 2.7 A complete splitting tree with shortest witnesses for the specification of Figure 2.1. Only the splits a, aa, and ac are valid.

The following lemma is a result of Lee and Yannakakis (1994).

Lemma 24. A complete splitting tree with only valid splits exists if and only if there exists an adaptive distinguishing sequence.

Our method uses the exact same algorithm as the one by Lee and Yannakakis. However, we also apply it in the case when the splitting tree with valid splits is not complete (and hence no adaptive distinguishing sequence exists). Their algorithm still produces a family of sets, but this is not necessarily a separating family.
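To make Definition 23 concrete, here is a naive recursive sketch that builds a complete splitting tree from the pairwise separating sequences computed earlier. It assumes a minimal machine, makes no attempt to find shortest witnesses or valid splits, and is far from the efficient construction of Chapter 4; out(s, w) stands for the extended output function λ(s, w).

def splitting_tree(block, sep, out):
    # block: a frozenset of states; sep: pairwise separating words (earlier sketch)
    node = {'block': block, 'witness': None, 'children': []}
    if len(block) == 1:
        return node
    s, t = sorted(block)[:2]
    w = sep[(s, t)]              # a word separating two states of the block
    node['witness'] = w
    groups = {}                  # partition the block by the output on w
    for x in block:
        groups.setdefault(out(x, w), set()).add(x)
    for g in groups.values():
        node['children'].append(splitting_tree(frozenset(g), sep, out))
    return node

def identifier(node, state, acc=()):
    # a state identifier: the witnesses collected on the path between root and leaf
    if node['witness'] is None:
        return set(acc)
    acc = acc + (node['witness'],)
    child = next(c for c in node['children'] if state in c['block'])
    return identifier(child, state, acc)

For any two states, the witness stored at their lowest common ancestor separates them and occurs in both identifiers, so the family obtained this way is separating (harmonised).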
In order to recover separability, we refine that family. Let 𝒵′ be the result of Lee and Yannakakis’ algorithm (to distinguish it from their notation, we add a prime) and let ℋ be a separating family extracted from an ordinary splitting tree. The hybrid ADS family is defined as 𝒵′; ℋ, and can be computed as sketched in Algorithm 2.1 (the algorithm works on splitting trees instead of separating families). By Lemma 16 we note the following: in the best case this family is an adaptive distinguishing sequence; in the worst case it is equal to ℋ; and in general it is a combination of the two families. In all cases, the result is a separating family because ℋ is.

Require: A Mealy machine M
Ensure: A separating family 𝒵
1  T1 ← splitting tree for Moore’s minimisation algorithm
2  T2 ← splitting tree with valid splits (see Lee & Yannakakis, 1994)
3  𝒵′ ← (incomplete) family constructed from T2
4  for all inequivalent states s, t in the same leaf of T2 do
5      u ← lca(T1, s, t)
6      Zs ← Z′s ∪ {σ(u)}
7      Zt ← Z′t ∪ {σ(u)}
8  end for
9  return 𝒵

Algorithm 2.1 Obtaining the hybrid separating family 𝒵′; ℋ

With the hybrid family we can define the test suite as follows. Its m-completeness is proven in Section 5.

Definition 25. Let P be a state cover, 𝒵′ be a family of sets constructed with the Lee and Yannakakis algorithm, and ℋ be a separating family. The hybrid ADS test suite is Th-ADS = (P ∪ Q) ⋅ I≤k ⊙ (𝒵′; ℋ).

3.1 Example

In Figure 2.8a we see the (unique) result of Lee and Yannakakis’ algorithm. We note that the states s2, s3, s4 are not split, so we need to refine the family for those states. We take the separating family ℋ from before. From the incomplete ADS in Figure 2.8b we obtain the family 𝒵′. These families and the refinement 𝒵′; ℋ are given below.

Figure 2.8 (a): Largest splitting tree with only valid splits for Figure 2.1. (b): Its incomplete adaptive distinguishing tree.

H0 = {aa, c}        Z′0 = {aa}    (𝒵′; ℋ)0 = {aa}
H1 = {a}            Z′1 = {a}     (𝒵′; ℋ)1 = {a}
H2 = {aa, ac, c}    Z′2 = {aa}    (𝒵′; ℋ)2 = {aa, ac, c}
H3 = {a, c}         Z′3 = {aa}    (𝒵′; ℋ)3 = {aa, c}
H4 = {aa, ac, c}    Z′4 = {aa}    (𝒵′; ℋ)4 = {aa, ac, c}

With the separating family 𝒵′; ℋ we obtain the following test suite of size 96:

Th-ADS = { aaaaa, aaaac, aaac, aabaa, aacaa, abaa, acaa, baaaa, baaac, baac, babaa, bacaa, bacc, bbaa, bbac, bbc, bcaa, caa }

We note that this is indeed smaller than the HSI test suite. In particular, we have a smaller state identifier for s0: {aa} instead of {aa, c}. As a consequence, there are fewer combinations of prefixes and suffixes. We also observe that one of the state identifiers grew in length: {aa, c} instead of {a, c} for state s3.

3.2 Implementation

All the algorithms concerning the hybrid ADS-method have been implemented and can be found at https://github.com/Jaxan/hybrid-ads. We note that Algorithm 2.1 is implemented a bit more efficiently, as we can walk the splitting trees in a particular order. For constructing the splitting trees in the first place, we use Moore’s minimisation algorithm and the algorithms by Lee and Yannakakis (1994). We keep all relevant sets prefix-closed by maintaining a trie data structure. A trie also allows us to immediately obtain just the set of maximal tests.

3.3 Randomisation

Many constructions of the test suite generation can be randomised. There may exist many shortest access sequences to a state, and we can randomly pick any of them. Also in the construction of state identifiers many steps in the algorithm are non-deterministic: the algorithm may ask to find any input symbol which separates a set of states. The tool randomises many such choices. We have noticed that this can have a huge influence on the size of the test suite. However, a decent statistical investigation is still lacking at the moment.

In many applications, such as learning, no bound on the number of states of the SUT is known. In such cases it is possible to randomly select test cases from an infinite test suite. Unfortunately, we lose the theoretical guarantees of completeness with random generation. Still, as we will see in Chapter 3, this can work really well. We can generate random test cases as follows. In the above definition for the hybrid ADS test suite we replace I≤k by I∗ to obtain an infinite test suite. Then we sample tests as follows:
1. sample an element p from P uniformly,
2. sample a word w from I∗ with a geometric distribution, and
3. sample uniformly from (𝒵′; ℋ)s for the state s = δ(s0, pw).
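A minimal sketch of this sampling scheme, under the representation assumptions of the earlier sketches (p_stop is an assumed parameter of the geometric length distribution; random is Python’s standard library):

import random

def sample_test(P, I, family, delta_star, p_stop=0.2):
    # 1. a uniformly random access sequence
    p = random.choice(sorted(P))
    # 2. a random middle part whose length is geometrically distributed
    w = ''
    while random.random() > p_stop:
        w += random.choice(sorted(I))
    # 3. a uniformly random suffix from the identifier of the reached state
    s = delta_star(p + w)
    suffix = random.choice(sorted(family[s]))
    return p + w + suffix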
4 Overview

We give an overview of the aforementioned test methods. We classify them in two directions:
– whether they use harmonised state identifiers or not, and
– whether they use singleton state identifiers or not.

Theorem 26. Assume M to be minimal, reachable, and of size n. The following test suites are all n + k-complete:

                    Arbitrary                                   Harmonised
Many / pairwise     Wp:    P ⋅ I≤k ⋅ ⋃ 𝒲 ∪ Q ⋅ I≤k ⊙ 𝒲         HSI:        (P ∪ Q) ⋅ I≤k ⊙ ℋ
Hybrid                                                          Hybrid ADS: (P ∪ Q) ⋅ I≤k ⊙ (𝒵′; ℋ)
Single / global     UIOv:  P ⋅ I≤k ⋅ ⋃ 𝒰 ∪ Q ⋅ I≤k ⊙ 𝒰         ADS:        (P ∪ Q) ⋅ I≤k ⊙ 𝒵

Proof. See Corollaries 33 and 35. □

Each of the methods in the right column can be written more simply as P ⋅ I≤k+1 ⊙ ℋ, since Q = P ⋅ I. This makes them very easy to implement.

It should be noted that the ADS-method is a specific instance of the HSI-method and, similarly, the UIOv-method is an instance of the Wp-method. What is generally meant by the Wp-method and HSI-method is the above formula together with a particular way to obtain the (harmonised) state identifiers.

We are often interested in the size of the test suite. In the worst case, all methods generate a test suite with a size in 𝒪(pn³), and this bound is tight (Vasilevskii, 1973). Nevertheless, we intuitively expect the right column to perform better, as we are using a more structured set (given a separating family for the HSI-method, we can always forget about the common prefixes and apply the Wp-method, which will never be smaller if constructed in this way). Also, we expect the bottom row to perform better, as there is a single test for each state. Small experimental results confirm this intuition (Dorofeeva, et al., 2010).

On the example in Figure 2.1, we computed all applicable test suites in Sections 2.6 and 3.1. The UIO and ADS methods are not applicable. For the W, Wp, HSI and hybrid ADS methods we obtained test suites of size 169, 75, 125 and 96 respectively.

5 Proof of completeness

In this section, we will prove n-completeness of the discussed test methods. Before we dive into the proof, we give some background on the proof principle of bisimulation. The original proofs of completeness often involve an inductive argument (on the length of words) inlined with arguments about characterisation sets. This can be hard to follow, and so we prefer a proof based on bisimulations, which defers the inductive argument to a general statement about bisimulation.

Many notions of bisimulation exist in the theory of labelled transition systems, but for Mealy machines there is just one simple definition. We give the definition and the main proof principle, all of which can be found in a paper by Rutten (1998).

Definition 27. Let M be a Mealy machine. A relation R ⊆ S × S is called a bisimulation if for every (s, t) ∈ R we have
– equal outputs: λ(s, a) = λ(t, a) for all a ∈ I, and
– related successor states: (δ(s, a), δ(t, a)) ∈ R for all a ∈ I.

Lemma 28. If two states s, t are related by a bisimulation, then s ∼ t.13

13 The converse – which we do not need here – also holds, as ∼ is a bisimulation.

We use a slight generalisation of the bisimulation proof technique, called bisimulation up-to. This allows one to give a smaller set R which extends to a bisimulation. A good introduction to these up-to techniques is given by Bonchi and Pous (2015) or the thesis of Rot (2015). In our case we use bisimulation up-to ∼-union. The following lemma can be found in the given references.

Definition 29. Let M be a Mealy machine. A relation R ⊆ S × S is called a bisimulation up-to ∼-union if for every (s, t) ∈ R we have
– equal outputs: λ(s, a) = λ(t, a) for all a ∈ I, and
– related successor states: (δ(s, a), δ(t, a)) ∈ R or δ(s, a) ∼ δ(t, a) for all a ∈ I.

Lemma 30. Any bisimulation up-to ∼-union is contained in a bisimulation.
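Checking the two conditions of Definition 29 for a given finite relation is mechanical. A minimal sketch, under the same representation assumptions as before (for two machines, take the disjoint union of their transition and output tables, as justified by Remark 2; equiv(s, t) is an assumed oracle for s ∼ t, e.g. built from the separating sequences computed earlier):

def is_bisimulation_up_to(R, inputs, delta, lam, equiv):
    # R is a set of state pairs; we check Definition 29 for every pair
    for (s, t) in R:
        for a in inputs:
            if lam[(s, a)] != lam[(t, a)]:
                return False        # the outputs must agree
            s2, t2 = delta[(s, a)], delta[(t, a)]
            if (s2, t2) not in R and not equiv(s2, t2):
                return False        # successors related by R, or already equivalent
    return True

Taking equiv to be constantly False checks Definition 27, i.e., that R is a plain bisimulation.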
We fix a specification M which has a minimal representative with n states and an implementation M′ with at most n + k states. We assume that all states are reachable from the initial state in both machines (i.e., both are connected).

The next proposition gives sufficient conditions for a test suite of a certain shape to be complete. We then prove that these conditions hold for the test suites in this chapter.

Proposition 31. Let 𝒲 and 𝒲′ be two families of words and P a state cover for M. Let T = P ⋅ I≤k ⊙ 𝒲 ∪ P ⋅ I≤k+1 ⊙ 𝒲′ be a test suite. If
1. for all x, y ∈ M: x ∼Wx∩Wy y implies x ∼ y,
2. for all x, y ∈ M and z ∈ M′: x ∼Wx z and z ∼W′y y implies x ∼ y, and
3. the machines M and M′ agree on T,
then M and M′ are equivalent.

Proof. First, we prove that P ⋅ I≤k reaches all states in M′. For p, q ∈ P and x = δ(s0, p), y = δ(s0, q) such that x ̸∼Wx∩Wy y, we also have δ′(s′0, p) ̸∼Wx∩Wy δ′(s′0, q) in the implementation M′. By (1) this means that there are at least n different behaviours in M′, hence at least n states. Now n states are reached by the previous argument (using the set P). By assumption M′ has at most k extra states. If those extra states are reachable, they are reachable from an already visited state in at most k steps. So we can reach all states of M′ by using I≤k after P.

Second, we show that the reached states are bisimilar. Define the relation R = {(δ(s0, p), δ′(s′0, p)) | p ∈ P ⋅ I≤k}. Note that for each (s, i) ∈ R we have s ∼Ws i. For each state i ∈ M′ there is a state s ∈ M such that (s, i) ∈ R, since we reach all states in both machines by P ⋅ I≤k. We will prove that this relation is in fact a bisimulation up-to ∼-union. For output, we note that (s, i) ∈ R implies λ(s, a) = λ′(i, a) for all a, since the machines agree on P ⋅ I≤k+1. For the successors, let (s, i) ∈ R and a ∈ I and consider the successors s2 = δ(s, a) and i2 = δ′(i, a). We know that there is some t ∈ M with (t, i2) ∈ R. We also know that we tested i2 with the set Wt. So we have: s2 ∼W′s2 i2 ∼Wt t. By the second assumption, we conclude that s2 ∼ t. So s2 ∼ t and (t, i2) ∈ R, which means that R is a bisimulation up-to ∼-union. Moreover, R contains the pair (s0, s′0). By using Lemmas 30 and 28, we conclude that the initial states s0 and s′0 are equivalent. □

Before we show that the conditions hold for the test methods, we reflect on the above proof first. This proof is very similar to the completeness proof by Chow (1978).14 In the first part we argue that all states are visited by using some sort of counting and reachability argument. Then in the second part we show the actual equivalence. To the best of the author’s knowledge, this is the first m-completeness proof which explicitly uses the concept of a bisimulation. Using a bisimulation allows us to slightly generalise and use bisimulation up-to ∼-union, dropping the often-assumed requirement that M is minimal.

14 In fact, it is also similar to Lemma 4 by Angluin (1987), which proves termination in the L* learning algorithm. This correspondence was noted by Berg, et al. (2005).

Lemma 32. Let 𝒲′ be a family of state identifiers for M. Define the family 𝒲 by Ws = ⋃ 𝒲′. Then the conditions (1) and (2) in Proposition 31 are satisfied.

Proof. For the first condition, we note that Wx ∩ Wy = Wx = Wy, and so x ∼Wx∩Wy y implies x ∼Wx y; now by definition of state identifier we get x ∼ y. For the second condition, let x ∼⋃𝒲′ z ∼W′y y. Then we note that W′y ⊆ ⋃ 𝒲′, and so we get x ∼W′y z ∼W′y y. By transitivity we get x ∼W′y y, and so by definition of state identifier we get x ∼ y. □

Corollary 33. The W, Wp, and UIOv test suites are n + k-complete.

Lemma 34. Let ℋ be a separating family and take 𝒲 = 𝒲′ = ℋ. Then the conditions (1) and (2) in Proposition 31 are satisfied.

Proof. Let x ∼Hx∩Hy y, then by definition of separating family x ∼ y. For the second condition, let x ∼Hx z ∼Hy y. Then we get x ∼Hx∩Hy z ∼Hx∩Hy y and so by transitivity x ∼Hx∩Hy y, hence again x ∼ y. □

Corollary 35. The HSI, ADS and hybrid ADS test suites are n + k-complete.

6 Related Work and Discussion

In this chapter, we have mostly considered classical test methods which are all based on prefixes and state identifiers. There are more recent methods which almost fit in the same framework. We mention the P (Simão & Petrenko, 2010), H (Dorofeeva, et al., 2005), and SPY (Simão, et al., 2009) methods. The P method constructs a test suite by carefully considering sufficient conditions for a p-complete test suite (here p ≤ n, where n is the number of states). It does not generalise to extra states, but it seems to construct very small test suites. The H method is a refinement of the HSI-method where the state identifiers for testing transitions are reconsidered. (Note that Proposition 31 allows for a different family when testing transitions.) Last, the SPY method builds upon the HSI-method and changes the prefixes in order to minimise the size of a test suite, exploiting overlap in test sequences. We believe that this technique is independent of the HSI-method and can in fact be applied to all methods presented in this chapter. As such, the SPY method should be considered as an optimisation technique, orthogonal to the work in this chapter.

Recently, Hierons and Türker (2015) devise a novel test method which is based on incomplete distinguishing sequences and is similar to the hybrid ADS method. They use sequences which can be considered to be adaptive distinguishing sequences on a subset of the state space. With several of those, one can cover the whole state space, obtaining an m-complete test suite. This is somewhat dual to our approach, as our “incomplete” adaptive distinguishing sequences define a coarse partition of the complete state space. Our method becomes complete by refining the tests with pairwise separating sequences.

Some work has been put into minimising the adaptive distinguishing sequences themselves. Türker and Yenigün (2014) describe greedy algorithms which construct small adaptive distinguishing sequences. Moreover, they show that finding the minimal adaptive distinguishing sequence is NP-complete in general; even approximating it is NP-complete. We expect that similar heuristics also exist for the other test methods and that they will improve the performance. Note that minimal separating sequences do not guarantee a minimal test suite. In fact, we see that the hybrid ADS method outperforms the HSI-method on the example in Figure 2.1, since it prefers longer, but fewer, sequences.

Some of the assumptions made at the start of this chapter have also been challenged. For non-deterministic Mealy machines, we mention the work of Petrenko and Yevtushenko (2014). We also mention the work of van den Bos, et al. (2017) and Simão and Petrenko (2014) for input/output transition systems with the ioco relation. In both cases, the test suites are still defined in the same way as in this chapter: prefixes followed by state identifiers. However, for non-deterministic systems, guiding an implementation into a state is harder, as the implementation may choose its own path. For that reason, sequences are often replaced by automata, so that the testing can be adaptive. This adaptive testing is game-theoretic, and the automaton provides a strategy. This game-theoretic point of view is further investigated by van den Bos and Stoelinga (2018). The test suites are generally of exponential size, depending on how non-deterministic the systems are.

The assumption that the implementation is resettable was also challenged early on. If the machine has no reliable reset (or the reset is too expensive), one tests the system with a single checking sequence. Lee and Yannakakis (1994) give a randomised algorithm for constructing such a checking sequence using adaptive distinguishing sequences. There is a similarity with the randomised algorithm by Rivest and Schapire (1993) for learning non-resettable automata. Recently, Groz, et al. (2018) give a deterministic learning algorithm for non-resettable machines based on adaptive distinguishing sequences.

Many of the methods described here are benchmarked on small or random Mealy machines by Dorofeeva, et al. (2010) and Endo and Simão (2013). The benchmarks are of limited scope; the machine from Chapter 3, for instance, is neither small nor random. For this reason, we started to collect more realistic benchmarks at http://automata.cs.ru.nl/.

Chapter 3
Applying Automata Learning to Embedded Control Software

Wouter Smeenk, Océ Technologies B.V.
Joshua Moerman, Radboud University
Frits Vaandrager, Radboud University
David N. Jansen, Radboud University

Abstract
Using an adaptation of state-of-the-art algorithms for black-box automata learning, as implemented in the LearnLib tool, we succeeded in learning a model of the Engine Status Manager (ESM), a software component that is used in printers and copiers of Océ. The main challenge that we encountered was that LearnLib, although effective in constructing hypothesis models, was unable to find counterexamples for some hypotheses.
In fact, none of the existing FSM-based conformance testing methods that we tried worked for this case study. We therefore implemented an extension of the algorithm of Lee & Yannakakis for computing an adaptive distinguishing sequence. Even when an adaptive distinguishing sequence does not exist, Lee & Yannakakis’ algorithm produces an adaptive sequence that “almost” identifies states. In combination with a standard algorithm for computing separating sequences for pairs of states, we managed to verify states with on average 3 test queries. Altogether, we needed around 60 million queries to learn a model of the ESM with 77 inputs and 3.410 states. We also constructed a model directly from the ESM software and established equivalence with the learned model. To the best of our knowledge, this is the first paper in which active automata learning has been applied to industrial control software.

This chapter is based on the following publication: Smeenk, W., Moerman, J., Vaandrager, F. W., & Jansen, D. N. (2015). Applying Automata Learning to Embedded Control Software. In Formal Methods and Software Engineering - 17th International Conference on Formal Engineering Methods, ICFEM, Proceedings. Springer. doi:10.1007/978-3-319-25423-4_5

Once they have high-level models of the behaviour of software components, software engineers can construct better software in less time. A key problem in practice, however, is the construction of models for existing software components, for which no or only limited documentation is available. The construction of models from observations of component behaviour can be performed using regular inference – also known as automata learning (see Angluin, 1987; de la Higuera, 2010; Steffen, et al., 2011). The most efficient such techniques use the set-up of active learning, illustrated in Figure 3.1, in which a “learner” has the task to learn a model of a system by actively asking questions to a “teacher”.

Figure 3.1 Active learning of reactive systems.

The core of the teacher is a System Under Test (SUT), a reactive system to which one can apply inputs and whose outputs one may observe. The learner interacts with the SUT to infer a model by sending inputs and observing the resulting outputs (“membership queries”). In order to find out whether an inferred model is correct, the learner may pose an “equivalence query”. The teacher uses a model-based testing (MBT) tool to try and answer such queries: given a hypothesised model, an MBT tool generates a long test sequence using some conformance testing method. If the SUT passes this test, then the teacher informs the learner that the model is deemed correct. If the outputs of the SUT and the model differ, this constitutes a counterexample, which is returned to the learner. Based on such a counterexample, the learner may then construct an improved hypothesis. Hence, the task of the learner is to collect data by interacting with the teacher and to formulate hypotheses, and the task of the MBT tool is to establish the validity of these hypotheses. It is important to note that it may occur that an SUT passes the test for a hypothesis, even though this hypothesis is not valid.

Triggered by various theoretical and practical results, see for instance the work by Aarts (2014); Berg, et al. (2005); Cassel, et al. (2015); Howar, et al. (2012); Leucker (2006); Merten, et al. (2012); Raffelt, et al. (2009), there is a fast-growing interest in automata learning technology. In recent years, automata learning has been applied successfully, e.g., to regression testing of telecommunication systems (Hungar, et al., 2003), checking conformance of communication protocols to a reference implementation (Aarts, et al., 2014), finding bugs in Windows and Linux implementations of TCP (Fiterău-Broștean, et al., 2014), analysis of botnet command and control protocols (Cho, et al., 2010), and integration testing (Groz, et al., 2008 and Li, et al., 2006).

In this chapter, we explore whether LearnLib by Raffelt, et al. (2009), a state-of-the-art automata learning tool, is able to learn a model of the Engine Status Manager (ESM), a piece of control software that is used in many printers and copiers of Océ. Software components like the ESM can be found in many embedded systems in one form or another. Being able to retrieve models of such components automatically is potentially very useful. For instance, if the software is fixed or enriched with new functionality, one may use a learned model for regression testing. Also, if the source code of software is hard to read and poorly documented, one may use a model of the software for model-based testing of a new implementation, or even for generating an implementation on a new platform automatically. Using a model checker one may also study the interaction of the software with other components for which models are available.

The ESM software is actually well documented, and an extensive test suite exists. The ESM, which has been implemented using Rational Rose RealTime (RRRT), is stable and has been in use for 10 years. Due to these characteristics, the ESM is an excellent benchmark for assessing the performance of automata learning tools in this area. The ESM has also been studied in other research projects: Ploeger (2005) modelled the ESM and other related managers and verified properties based on the official specifications of the ESM, and Graaf and van Deursen (2007) have checked the consistency of the behavioural specifications defined in the ESM against the RRRT definitions.

Learning a model of the ESM turned out to be more complicated than expected. The top-level UML/RRRT statechart from which the software is generated only has 16 states. However, each of these states contains nested states, and in total there are 70 states that do not have further nested states. Moreover, the C++ code contained in the actions of the transitions also creates some complexity, and this explains why the minimal Mealy machine that models the ESM has 3.410 states. LearnLib has been used to learn models with tens of thousands of states by Raffelt, et al. (2009), and therefore we expected that it would be easy to learn a model for the ESM. However, finding counterexamples for incorrect hypotheses turned out to be challenging due to the large number of 77 inputs. The test algorithms implemented in LearnLib, such as random testing, the W-method by Chow (1978) and Vasilevskii (1973) and the Wp-method by Fujiwara, et al. (1991), failed to deliver counterexamples within an acceptable time. Automata learning techniques have been successfully applied to case studies in which the total number of input symbols is much larger, but in these cases it was possible to reduce the number of inputs to a small number (less than 10) using abstraction techniques (Aarts, et al., 2015 and Howar, et al., 2011).
In the case of the ESM, the use of abstraction techniques only allowed us to reduce the original 156 concrete actions to 77 abstract actions.

We therefore implemented an extension of an algorithm of Lee and Yannakakis (1994) for computing adaptive distinguishing sequences. Even when an adaptive distinguishing sequence does not exist, Lee & Yannakakis’ algorithm produces an adaptive sequence that “almost” identifies states. In combination with a standard algorithm for computing separating sequences for pairs of states, we managed to verify states with on average 3 test queries and to learn a model of the ESM with 77 inputs and 3.410 states. We also constructed a model directly from the ESM software and established equivalence with the learned model. To the best of our knowledge, this is the first paper in which active automata learning has been applied to industrial control software. Preliminary evidence suggests that our adaptation of Lee & Yannakakis’ algorithm outperforms existing FSM-based conformance algorithms.

During recent years most researchers working on active automata learning focused their efforts on efficient algorithms and tools for the construction of hypothesis models. Our work shows that if we want to further scale automata learning to industrial applications, we also need better algorithms for finding counterexamples for incorrect hypotheses. Following Berg, et al. (2005), our work shows that the context of automata learning provides both new challenges and new opportunities for the application of testing algorithms. All the models for the ESM case study together with the learning and testing statistics are available at http://www.mbsd.cs.ru.nl/publications/papers/fvaan/ESM/, as a benchmark for both the automata learning and testing communities. It is now also included in the automata wiki at http://automata.cs.ru.nl/.

1 Engine Status Manager

The focus of this article is the Engine Status Manager (ESM), a software component that is used to manage the status of the engine of Océ printers and copiers. In this section, the overall structure and context of the ESM will be explained.

1.1 ESRA

The requirements and behaviour of the ESM are defined in a software architecture called Embedded Software Reference Architecture (ESRA). The components defined in this architecture are reused in many of the products developed by Océ and form an important part of these products. This architecture is developed for cut-sheet printers or copiers. The term cut-sheet refers to the use of separate sheets of paper as opposed to a continuous feed of paper. An engine refers to the printing or scanning part of a printer or copier. Other products can be connected to an engine that pre- or post-process the paper, for example a cutter, folder, stacker or stapler.

Figure 3.2 Global overview of the engine software.

Figure 3.2 gives an overview of the software in a printer or copier. The controller communicates the required actions to the engine software. This includes transport of digital images, status control, print or scan actions and error handling. The controller is responsible for queuing, processing the actions received from the network and operators, and delegating the appropriate actions to the engine software. The managers communicate with the controller using the external interface adapters. These adapters translate the external protocols to internal protocols.

The managers manage the different functions of the engine. They are divided according to the different functionalities they implement, such as status control, print or scan actions, or error handling. In order to do this, a manager may communicate with other managers and functions. A function is responsible for a specific set of hardware components. It translates commands from the managers to the function hardware and reports the status and other information of the function hardware to the managers. This hardware can, for example, be the printing hardware or hardware that is not part of the engine hardware, such as a stapler. Other functionalities such as logging and debugging are orthogonal to the functions and managers.

1.2 ESM and connected components

The ESM is responsible for the transition from one status of the printer or copier to another. It coordinates the functions to bring them into the correct status. Moreover, it informs all its connected clients (managers or the controller) of status changes. Finally, it handles status transitions when an error occurs.

Figure 3.3 Overview of the managers and clients connected to the ESM.

Figure 3.3 shows the different components to which the ESM is connected. The Error Handling Manager (EHM), Action Control Manager (ACM) and other clients request engine statuses. The ESM decides whether a request can be honoured immediately, has to be postponed, or has to be ignored. If the requested action is processed, the ESM requests the functions to go to the appropriate status. The EHM has the highest priority and its requests are processed first. The EHM can request the engine to go into the defect status. The ACM has the next highest priority. The ACM requests the engine to switch between the running and standby statuses. The other clients request transitions between the other statuses, such as idle, sleep, standby and low power. All the other clients have the same, lowest priority. The Top Capsule instantiates the ESM and communicates with it during the initialisation of the ESM. The Information Manager provides some parameters during the initialisation. There are more managers connected to the ESM, but they are of less importance and are thus not mentioned here.

1.3 Rational Rose RealTime

The ESM has been implemented using Rational Rose RealTime (RRRT). In this tool so-called capsules can be created. Each of these capsules defines a hierarchical statechart diagram. Capsules can be connected with each other using structure diagrams. Each capsule contains a number of ports that can be connected to ports of other capsules by adding connections in the associated structure diagram. Each of these ports specifies which protocol should be used. This protocol defines which messages may be sent to and from the port. Transitions in the statechart diagram of the capsule can be triggered by arriving messages on a port of the capsule. Messages can be sent to these ports using the action code of the transition. The transitions between the states, the actions and the guards are defined in C++ code. From the state diagram, C++ source files are generated.

The RRRT language and semantics are based on UML (Object Management Group (OMG), 2004) and ROOM (Selic, et al., 1994). One important concept used in RRRT is the run-to-completion execution model (Eshuis, et al., 2002). This means that when a received message is processed, the execution cannot be interrupted by other arriving messages. These messages are placed in a queue to be processed later.

1.4 The ESM state diagram

Figure 3.4 Top states and transitions of the ESM.

Figure 3.4 shows the top states of the ESM statechart. The statuses that can be requested by the clients and managers correspond to the gray states. The other states are so-called transitory states. In transitory states the ESM is waiting for the functions to report that they have moved to the corresponding status. Once all functions have reported, the ESM moves to the corresponding status. The idle status indicates that the engine has started up but that it is still cold (uncontrolled temperature). The standby status indicates that the engine is warm and ready for printing or scanning. The running status indicates that the engine is printing or scanning. The transitions from the overarching state to the goingToSleep and goingToDefect states indicate that it is possible to move to the sleep or defect status from any state. In some cases it is possible to awake from the sleep status; in other cases the main power is turned off. The medium status is designed for diagnostics. In this status the functions can each be in a different status. For example, one function can be in standby status while another function is in idle status.

The statechart diagram in Figure 3.4 may seem simple, but it hides many details. Each of the states has up to 5 nested states. In total there are 70 states that do not have further nested states. The C++ code contained in the actions of the transitions is in some cases non-trivial. The possibility to transition from any state to the sleep or defect state also complicates the learning.

2 Learning the ESM

In order to learn a model of the ESM, we connected it to LearnLib by Merten, et al. (2011), a state-of-the-art tool for learning Mealy machines developed at the University of Dortmund. A Mealy machine is a tuple M = (I, O, Q, q0, δ, λ), where
– I is a finite set of input symbols,
– O is a finite set of output symbols,
– Q is a finite set of states,
– q0 ∈ Q is an initial state,
– δ : Q × I → Q is a transition function, and
– λ : Q × I → O is an output function.
The behaviour of a Mealy machine is deterministic, in the sense that the outputs are fully determined by the inputs. Functions δ and λ are extended to accept sequences in the standard way. We say that Mealy machines M = (I, O, Q, q0, δ, λ) and M′ = (I′, O′, Q′, q′0, δ′, λ′) are equivalent if they generate an identical sequence of outputs for every sequence of inputs, that is, if I = I′ and, for all w ∈ I∗, λ(q0, w) = λ′(q′0, w). If the behaviour of an SUT is described by a Mealy machine M, then the task of LearnLib is to learn a Mealy machine M′ that is equivalent to M.

2.1 Experimental set-up

A clear interface to the ESM has been defined in RRRT. The ESM defines ports from which it receives a predefined set of inputs and to which it can send a predefined set of outputs. However, this interface can only be used within RRRT. In order to communicate with the LearnLib software, a TCP connection was set up. An extra capsule was created in RRRT which connects to the ports defined by the ESM.
This extra capsule opened a TCP connection to LearnLib. Inputs and outputs are translated to and from a string format and sent over the connection. Before each membership query, the learner needs to bring the SUT back to its initial state. In other words, LearnLib needs a way to reset the SUT.

Some inputs and outputs sent to and from the ESM carry parameters. These parameters are enumerations of statuses, or integers bounded by the number of functions connected to the ESM. Currently, LearnLib cannot handle inputs with parameters; therefore, we introduced a separate input action for every parameter value. Based on domain knowledge and discussions with the Océ engineers, we could group some of these inputs together and reduce the total number of inputs. When learning the ESM using one function, 83 concrete inputs are grouped into four abstract inputs. When using two functions, 126 concrete inputs can be grouped. When an abstract input needs to be sent to the ESM, one concrete input of the represented group is randomly selected, as in the approach of Aarts, et al. (2015). This is a valid abstraction because all the inputs in the group have exactly the same behaviour in any state of the ESM. This was verified by code inspection. No other abstractions were found during the research. After the inputs are grouped, a total of 77 inputs remain when learning the ESM using 1 function, and 105 inputs remain when using 2 functions.

It was not immediately obvious how to model the ESM by a Mealy machine, since some inputs trigger no output, whereas other inputs trigger several outputs. In order to resolve this, we benefited from the run-to-completion execution model used in RRRT. Whenever an input is sent, all the outputs are collected until quiescence is detected. Next, all the outputs are concatenated and sent to LearnLib as a single aggregated output. In model-based testing, quiescence is usually detected by waiting for a fixed time-out period. However, this causes the system to be mostly idle while waiting for the time-out, which is inefficient. In order to detect quiescence faster, we exploited the run-to-completion execution model used by RRRT: we modified the ESM to respond to a new low-priority test input with a (single) special output. This test input is sent after each normal input. Only after the normal input has been processed and all the generated outputs have been sent is the test input processed and the special output generated; upon its reception, quiescence can be detected immediately and reliably.

2.2 Test selection strategies

In the ESM case study the most challenging problem was finding counterexamples for the hypotheses constructed during learning. LearnLib implements several algorithms for conformance testing, one of which is a random walk algorithm. The random walk algorithm works by first selecting the length of the test query according to a geometric distribution, cut off at a fixed upper bound. Each of the input symbols in the test query is then selected randomly from the input alphabet I according to a uniform distribution. In order to find counterexamples, a specific sequence of input symbols is needed to arrive at the state in the SUT that differentiates it from the hypothesis. The upper bound for the size of this search space is |I|ⁿ, where |I| is the size of the input alphabet used, and n the length of the counterexample that needs to be found. If this sequence is long, the chance of finding it is small.
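The random walk strategy just described is easy to state in code. Below is a hedged Go sketch of such a test-query generator; the parameter names and values are ours, and LearnLib's actual implementation differs in its details.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randomTestQuery draws one test query: its length follows a geometric
// distribution (after each symbol we continue with probability p),
// cut off at maxLen, and every symbol is chosen uniformly at random
// from the input alphabet.
func randomTestQuery(alphabet []string, p float64, maxLen int, rng *rand.Rand) []string {
	var query []string
	for len(query) < maxLen {
		query = append(query, alphabet[rng.Intn(len(alphabet))])
		if rng.Float64() > p {
			break
		}
	}
	return query
}

func main() {
	rng := rand.New(rand.NewSource(42))
	alphabet := []string{"I10", "I11", "I19", "I46"} // a toy alphabet
	for i := 0; i < 3; i++ {
		fmt.Println(randomTestQuery(alphabet, 0.9, 50, rng))
	}
}
```

With dozens of symbols to choose from at every position, the probability that such a uniformly drawn query hits one specific long distinguishing sequence is tiny, which is exactly the problem quantified next.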
Because the ESM has many different input symbols to choose from, finding the correct one is hard. When learning the ESM with 1 function there are 77 possible input symbols. If, for example, the length of the counterexample needs to be at least 6 inputs to identify a certain state, then the upper bound on the number of test queries would be around 2 × 10¹¹. An average test query takes around 1 ms, so it would take about 7 years to execute these test queries.

Augmented DS-method.¹⁵ In order to reduce the number of tests, Chow (1978) and Vasilevskii (1973) pioneered the so-called W-method. In their framework a test query consists of a prefix p bringing the SUT to a specific state, a (random) middle part m and a suffix s assuring that the SUT is in the appropriate state. This results in a test suite of the form P I≤k W, where P is a set of (shortest) access sequences, I≤k the set of all sequences of length at most k, and W is a characterisation set. Classically, this characterisation set is constructed by taking the set of all (pairwise) separating sequences. For k = 1 this test suite is complete in the sense that if the SUT passes all tests, then either the SUT is equivalent to the specification or the SUT has strictly more states than the specification. By increasing k we can check additional states.

¹⁵ This was later called the hybrid ADS-method.

We tried using the W-method as implemented by LearnLib to find counterexamples. The generated test suite, however, was still too big in our learning context. Fujiwara, et al. (1991) observed that it is possible to let the set W depend on the state the SUT is supposed to be in. This allows us to take only the subset of W which is relevant for a specific state. This slightly reduces the test suite without losing the power of the full test suite. This method is known as the Wp-method. More importantly, this observation allows for generalisations where we can carefully pick the suffixes. In the presence of an (adaptive) distinguishing sequence, one can take W to be a single suffix, greatly reducing the test suite. Lee and Yannakakis (1994) describe an algorithm (which we will refer to as the LY algorithm) to efficiently construct this sequence, if it exists. In our case, unfortunately, most hypotheses did not admit an adaptive distinguishing sequence. In these cases the incomplete result from the LY algorithm still contained a lot of information, which we augmented with pairwise separating sequences.

Figure 3.5 A small part of an incomplete distinguishing sequence as produced by the LY algorithm. Leaves contain a set of possible initial states, inner nodes have input sequences and edges correspond to different output symbols (of which we only drew some), where Q stands for quiescence.

As an example, we show an incomplete adaptive distinguishing sequence for one of the hypotheses in Figure 3.5. When we apply the input sequence I46 I6.0 I10 I19 I31.0 I37.3 I9.2 and observe the outputs O9 O3.3 Q … O28.0, we know for sure that the SUT was in state 788. Unfortunately, not all paths lead to a singleton set.
When, for instance, we apply the sequence I46 I6.0 I10 and observe the outputs O9 O3.14 Q, we know for sure that the SUT was in one of the states 18, 133, 1287 or 1295. In these cases we have to perform more experiments, and we resort to pairwise separating sequences. We note that this augmented DS-method is in the worst case no better than the classical Wp-method. In our case, however, it greatly reduced the test suites. Once we have our set of suffixes, which we now call Z, our test algorithm works as follows. The algorithm first exhausts the set P I≤1 Z. If this does not provide a counterexample, we randomly pick test queries from P I² I∗ Z, where the algorithm samples uniformly from P, I² and Z (if Z contains more than one sequence for the supposed state) and with a geometric distribution on I∗.

Sub-alphabet selection. Using the above method, the algorithm still failed to learn the ESM. By looking at the RRRT-based model we were able to see why. In the initialisation phase, the controller exhibits exceptional behaviour when a certain input is provided eight times consecutively. Of course, such a sequence is hard to find with the above testing method. With this knowledge we could construct a single counterexample by hand, by means of which the algorithm was able to learn the ESM. In order to automate this process, we defined a sub-alphabet of actions that are important during the initialisation phase of the controller. This sub-alphabet is used a bit more often than the full alphabet. We do this as follows. We start testing with the alphabet which provided a counterexample for the previous hypothesis (for the first hypothesis we take the sub-alphabet). If no counterexample can be found within a specified query bound, then we repeat with the next alphabet. If neither alphabet produces a counterexample within the bound, the bound is increased by some factor and the process repeats. This method only marginally increases the number of tests, but it did find the counterexample that we previously had to construct by hand.

2.3 Results

Using the learning set-up discussed in Section 2.1 and the test selection strategies discussed in Section 2.2, a model of the ESM using 1 function could be learned. After an additional eight hours of testing no counterexample was found and the experiment was stopped. The following list gives the most important statistics gathered during the learning:
– The learned model has 3,410 states.
– Altogether, 114 hypotheses were generated.
– The time needed for learning the final hypothesis was 8 h, 26 min, and 19 s.
– 29,933,643 membership queries were posed (on average 35.77 inputs per query).
– 30,629,711 test queries were required (on average 29.06 inputs per query).

3 Verification

To verify the correctness of the model that was learned using LearnLib, we checked its equivalence with a model that was generated directly from the code.

3.1 Approach

As mentioned already, the ESM has been implemented using Rational Rose RealTime (RRRT). Thus a statechart representation of the ESM is available. However, we have not been able to find a tool that translates RRRT models to Mealy machines, which would allow us to compare the RRRT-based model of the ESM with the learned model. We considered several formalisms and tools that were proposed in the literature for flattening statecharts to state machines. The first one was a tool for hierarchical timed automata (HTA) by David, et al. (2002).
However, we found it hard to translate the output of this tool, a network of Uppaal timed automata, to a Mealy machine that could be compared to the learned model. The second tool that we considered has been developed by Hansen, et al. (2010). This tool misses some essential features, for example the ability to assign new values to state variables on transitions. Finally, we considered a formalism called object-oriented action systems (OOAS) by Krenn, et al. (2009), but no tools for it could be found. In the end we decided to implement the required model transformations ourselves. Figure 3.6 displays the different formats for representing models that we used and the transformations between those formats.

Figure 3.6 Formats for representing models and transformations between formats.

We used the bisimulation checker of CADP by Garavel, et al. (2011) to check the equivalence of labelled transition system models in .aut format. The Mealy machine models learned by LearnLib are represented as .dot files. A small script converts these Mealy machines to labelled transition systems in .aut format. We used the Uppaal tool by Behrmann, et al. (2006) as an editor for defining extended finite state machines (EFSM), represented as .xml files. A script developed in the ITALIA project (http://www.italia.cs.ru.nl/) converts these EFSM models to LOTOS, and then CADP takes care of the conversion from LOTOS to the .aut format. The Uppaal syntax is not sufficiently expressive to directly encode the RRRT definition of the ESM, since this definition makes heavy use of UML (Object Management Group (OMG), 2004) concepts such as state hierarchy and transitions from composite states, which are not present in Uppaal. Using Uppaal would force us to duplicate many transitions and states. We decided to manually create an intermediate hierarchical EFSM (HEFSM) model using the UML drawing tool PapyrusUML (Lanusse, et al., 2009). The HEFSM model closely resembles the RRRT UML model, but many elements used in UML state machines are left out because they are not needed for modelling the ESM and complicate the transformation process.

3.2 Model transformations

We explain the transformation from the HEFSM model to the EFSM model using examples. The transformation is divided into five steps, which are executed in order:
1. combine transitions without input or output signal,
2. transform supertransitions,
3. transform internal transitions,
4. add input signals that do not generate an output, and
5. replace invocations of the next function.

1. Empty transitions. In order to make the model more readable, and to make it easy to model if and switch statements in the C++ code, the HEFSM model allows transitions without a signal. These transitions are called empty transitions. An empty transition can still contain a guard and an assignment. However, such transitions are only allowed on states whose outgoing transitions are all empty. This was done to keep the transformation simple and the model easy to read. In order to transform a state with empty transitions, all the incoming and outgoing transitions are collected. For each combination of incoming transition a and outgoing transition b, a new transition c is created with the source of a as source and the target of b as target.
The guard for transition c evaluates to true if and only if the guards of a and b both evaluate to true. The assignment of c is the concatenation of the assignments of a and b. The signal of c will be the signal of a, because b cannot have a signal. Once all the new transitions are created, all the states with empty transitions are removed, together with all their incoming and outgoing transitions. Figure 3.7 shows an example model with empty transitions and its transformed version. Each of the incoming transitions of the state B is combined with each of the outgoing transitions. This results in two new transitions. The old transitions and state B are removed.

Figure 3.7 Example of empty transition transformation. On the left the original version. On the right the transformed version.

2. Supertransitions. The RRRT model of the ESM contains many transitions originating from a composite state. Informally, these supertransitions can be taken in each of the substates of the composite state if the guard evaluates to true. In order to model the ESM as closely as possible, supertransitions are also supported in the HEFSM model. In RRRT, transitions are evaluated from bottom to top. This means that first the transitions from the leaf state are considered, then the transitions from its parent state, then those from its parent's parent state, etc. Once a transition with the correct signal whose guard evaluates to true has been found, it is taken. When flattening the statechart, we modified the guards of supertransitions to ensure the correct priorities.

Figure 3.8 Example of supertransition transformation. On the left the original version. On the right the transformed version.

Figure 3.8 shows an example model with supertransitions and its transformed version. The supertransition from state A can be taken at each of A's leaf states B and C. The transformation removes the original supertransition and creates a new transition at states B and C with the same target state. For leaf state C this is easy, because it does not contain a transition with the input signal IP. In state B the transition to state C would be taken if a signal IP was processed and the state variable a equals 1. The supertransition can only be taken if this other transition cannot be taken. This is why the negation of the other guard is added to the new transition. If the original supertransition is an internal transition, the model needs further transformation after this one; this is described in the next paragraph. If the original supertransition is not an internal transition, the new transitions will have the initial state of A as target.

3. Internal transitions. The ESM model also makes use of internal transitions in RRRT. Using such a transition the current state does not change. If such a transition is defined on a composite state, it can be taken from all of the substates and return to the same leaf state it originated from. If defined on a composite state, it is thus also a supertransition. This is also possible in the HEFSM model. In order to transform an internal transition, it is first treated as a supertransition and the above transformation is applied. Then the target of the transition is simply set to the leaf state it originates from. An example can be seen in Figure 3.8.
If the supertransition from state A were also defined to be an internal transition, the transformed version on the right would need another transformation: the new transitions that now have the target state A would be transformed to have the same target state as their current source state.

4. Quiescent transitions. In order to reduce the number of transitions in the HEFSM model, quiescent transitions are added automatically. For every state, all the transitions for each signal are collected in a set T. A new self-transition a is added for each signal. The guard for transition a evaluates to true if and only if none of the guards of the transitions in T evaluates to true. This makes the HEFSM input-enabled without having to specify all the transitions.

5. The next function. In RRRT it is possible to write the guard and assignment in C++ code. It is thus possible that the value of a variable changes while an input signal is processed. In the HEFSM, however, all the assignments only take effect after the input signal has been processed. In order to simulate this behaviour, the next function is used. This function takes a variable name and evaluates to the value of this variable after the transition.

3.3 Results

Figure 3.9 shows a visualisation of the learned model that was generated using Gephi (Bastian, et al., 2009). States are coloured according to the strongly connected components. The number of transitions between two states is represented by the thickness of the edge. The large number of states (3,410) and transitions (262,570) makes it hard to visualise this model. Nevertheless, the visualisation does provide insight into the behaviour of the ESM. The three protrusions at the bottom of Figure 3.9 correspond to deadlocks in the model. These deadlocks are "error" states that are present in the ESM by design. According to the Océ engineers, the sequences of inputs that are needed to drive the ESM into these deadlock states will always be followed by a system power reset. The protrusion at the top right of the figure corresponds to the initialisation phase of the ESM. This phase is performed only once, and thus only transitions from the initialisation cluster to the main body of states are present.

Figure 3.9 Final model of the ESM.

During the construction of the RRRT-based model, the ESM code was thoroughly inspected. This resulted in the discovery of missing behaviour in one transition of the ESM code. An Océ software engineer confirmed that this behaviour is a (minor) bug, which will be fixed. We have verified the equivalence of the learned model and the RRRT-based model by using CADP (Garavel, et al., 2011).

4 Conclusions and Future Work

Using an extension of the algorithm by Lee and Yannakakis (1994) for adaptive distinguishing sequences, we succeeded in learning a Mealy machine model of a piece of widely used industrial control software. Our extension of Lee & Yannakakis' algorithm is rather obvious, but nevertheless appears to be new. Preliminary evidence suggests that it outperforms existing conformance testing algorithms. We are currently performing experiments in which we compare the new algorithm with other test algorithms on a number of realistic benchmarks. There are several possibilities for extending the ESM case study. To begin with, one could try to learn a model of the ESM with more than one function. Another interesting possibility would be to learn models of the EHM, ACM, and other managers connected to the ESM.
Using these models, some of the properties discussed by Ploeger (2005) could be verified at a more detailed level. We expect that the combination of LearnLib with the extended Lee & Yannakakis algorithm can be applied to learn models of many other software components. In the specific case study described in this article, we know that our learning algorithm has succeeded in learning the correct model, since we established equivalence with a reference model that was constructed independently from the RRRT model of the ESM software. In the absence of a reference model, we can never guarantee that the actual system behaviour conforms to a learned model. In order to deal with this problem, it is important to define metrics that quantify the difference (or distance) between a hypothesis and a correct model of the SUT, and to develop test generation algorithms that guarantee an upper bound on this difference. Preliminary work in this area is reported by Smetsers, et al. (2014).

Acknowledgements

We thank Lou Somers for suggesting the ESM case study and for his support of our research. Fides Aarts and Harco Kuppens helped us with the use of LearnLib and CADP, and Jan Tretmans gave useful feedback.

Chapter 4 Minimal Separating Sequences for All Pairs of States

Rick Smetsers, Radboud University
Joshua Moerman, Radboud University
David N. Jansen, Radboud University

Abstract Finding minimal separating sequences for all pairs of inequivalent states in a finite state machine is a classic problem in automata theory. Sets of minimal separating sequences, for instance, play a central role in many conformance testing methods. Moore has already outlined a partition refinement algorithm that constructs such a set of sequences in 𝒪(mn) time, where m is the number of transitions and n is the number of states. In this chapter, we present an improved algorithm based on the minimisation algorithm of Hopcroft that runs in 𝒪(m log n) time. The efficiency of our algorithm is empirically verified and compared to the traditional algorithm.

This chapter is based on the following publication: Smetsers, R., Moerman, J., & Jansen, D. N. (2016). Minimal Separating Sequences for All Pairs of States. In Language and Automata Theory and Applications - 10th International Conference, LATA, Proceedings. Springer. doi:10.1007/978-3-319-30000-9_14

In diverse areas of computer science and engineering, systems can be modelled by finite state machines (FSMs). One of the cornerstones of automata theory is minimisation of such machines – and the many variations thereof. In this process one obtains an equivalent minimal FSM, where states are different if and only if they have different behaviour. The first to develop an algorithm for minimisation was Moore (1956). His algorithm has a time complexity of 𝒪(mn), where m is the number of transitions, and n is the number of states of the FSM. Later, Hopcroft (1971) improved this bound to 𝒪(m log n). Minimisation algorithms can be used as a framework for deriving a set of separating sequences that show why states are inequivalent. The separating sequences in Moore's framework are of minimal length (Gill, 1962). Obtaining minimal separating sequences in Hopcroft's framework, however, is a non-trivial task. In this chapter, we present an algorithm for finding such minimal separating sequences for all pairs of inequivalent states of an FSM in 𝒪(m log n) time.
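To make the 𝒪(mn) baseline concrete before the formal development, the following Go sketch computes a minimal separating sequence for every inequivalent pair of states by a breadth-first search over state pairs along the inverse transition relation. It is our own illustration of the Moore-style construction, not the thesis implementation, and all names are ours.

```go
package main

import "fmt"

// A complete, deterministic FSM with states 0..N-1 and inputs 0..I-1.
type FSM struct {
	N, I   int
	Delta  [][]int // Delta[s][a]: next state
	Lambda [][]int // Lambda[s][a]: output
}

type pair struct{ s, t int } // always stored with s < t

func norm(s, t int) pair {
	if s > t {
		s, t = t, s
	}
	return pair{s, t}
}

// minimalSeparating returns a minimal separating sequence for every
// inequivalent pair of states: a multi-source BFS starting from the
// pairs distinguished by one input, expanding along predecessors.
func minimalSeparating(m FSM) map[pair][]int {
	// Precompute the inverse transition relation.
	inv := make([][][]int, m.N)
	for s := range inv {
		inv[s] = make([][]int, m.I)
	}
	for s := 0; s < m.N; s++ {
		for a := 0; a < m.I; a++ {
			inv[m.Delta[s][a]][a] = append(inv[m.Delta[s][a]][a], s)
		}
	}
	sep := map[pair][]int{}
	var queue []pair
	// Base layer: pairs separated by a single input.
	for s := 0; s < m.N; s++ {
		for t := s + 1; t < m.N; t++ {
			for a := 0; a < m.I; a++ {
				if m.Lambda[s][a] != m.Lambda[t][a] {
					sep[pair{s, t}] = []int{a}
					queue = append(queue, pair{s, t})
					break
				}
			}
		}
	}
	// BFS: if x separates (u, v), then a·x separates every not yet
	// separated pair of a-predecessors of u and v.
	for len(queue) > 0 {
		p := queue[0]
		queue = queue[1:]
		for a := 0; a < m.I; a++ {
			for _, s := range inv[p.s][a] {
				for _, t := range inv[p.t][a] {
					if q := norm(s, t); s != t {
						if _, done := sep[q]; !done {
							sep[q] = append([]int{a}, sep[p]...)
							queue = append(queue, q)
						}
					}
				}
			}
		}
	}
	return sep
}

func main() {
	// A small example: states 0 and 1 need the length-2 witness 0·0.
	m := FSM{N: 3, I: 2,
		Delta:  [][]int{{1, 0}, {2, 0}, {2, 1}},
		Lambda: [][]int{{0, 0}, {0, 0}, {1, 0}},
	}
	fmt.Println(minimalSeparating(m))
}
```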
Coincidentally, Bonchi and Pous (2013) recently introduced a new algorithm for the equally fundamental problem of proving equivalence of states in non-deterministic automata. As both their and our work demonstrate, even classical problems in automata theory can still offer surprising research opportunities. Moreover, new ideas for well-studied problems may lead to algorithmic improvements that are of practical importance in a variety of applications. One such application for our work is in conformance testing. Here, the goal is to test whether a black box implementation of a system is functioning as described by a given FSM. It consists of applying sequences of inputs to the implementation, and comparing the output of the system to the output prescribed by the FSM. Minimal separating sequences are used in many test generation methods (Dorofeeva, et al., 2010). Therefore, our algorithm can be used to improve these methods.

1 Preliminaries

We define an FSM as a Mealy machine M = (I, O, S, δ, λ), where I, O and S are finite sets of inputs, outputs and states respectively, δ : S × I → S is a transition function and λ : S × I → O is an output function. The functions δ and λ are naturally extended to δ : S × I∗ → S and λ : S × I∗ → O∗. Moreover, given a set of states S′ ⊆ S and a sequence x ∈ I∗, we define δ(S′, x) = {δ(s, x) | s ∈ S′} and λ(S′, x) = {λ(s, x) | s ∈ S′}. The inverse transition function δ−1 : S × I → 𝒫(S) is defined as δ−1(s, a) = {t ∈ S | δ(t, a) = s}. Observe that Mealy machines are deterministic and input-enabled (i.e., complete) by definition. The initial state is not specified because it is of no importance in what follows. For the remainder of this chapter we fix a machine M = (I, O, S, δ, λ). We use n to denote its number of states, that is n = |S|, and m to denote its number of transitions, that is m = |S| × |I|.

Definition 1. States s and t are equivalent if λ(s, x) = λ(t, x) for all x in I∗.

We are interested in the case where s and t are not equivalent, i.e., inequivalent. If all pairs of distinct states of a machine M are inequivalent, then M is minimal. An example of a minimal FSM is given in Figure 4.1.

Definition 2. A separating sequence for states s and t in S is a sequence x ∈ I∗ such that λ(s, x) ≠ λ(t, x). We say x is minimal if |y| ≥ |x| for all separating sequences y for s and t.

A separating sequence always exists if two states are inequivalent, and there might be multiple minimal separating sequences. Our goal is to obtain minimal separating sequences for all pairs of inequivalent states of M.

1.1 Partition Refinement

In this section we will discuss the basics of minimisation. Both Moore's algorithm and Hopcroft's algorithm work by means of partition refinement. A similar treatment (for DFAs) is given by Gries (1973). A partition P of S is a set of pairwise disjoint non-empty subsets of S whose union is exactly S. Elements in P are called blocks. If P and P′ are partitions of S, then P′ is a refinement of P if every block of P′ is contained in a block of P. A partition refinement algorithm constructs the finest partition under some constraint. In our context the constraint is that equivalent states belong to the same block.

Definition 3. A partition is valid if equivalent states are in the same block.

Partition refinement algorithms for FSMs start with the trivial partition P = {S}, and iteratively refine P until it is the finest valid partition (where all states in a block are equivalent).
The blocks of such a complete partition form the states of the minimised FSM, whose transition and output functions are well-defined because states in the same block are equivalent. Let B be a block and a be an input. There are two possible reasons to split B (and hence refine the partition). First, we can split B with respect to output after a if the set λ(B, a) contains more than one output. Second, we can split B with respect to the state after a if there is no single block B′ containing the set δ(B, a). In both cases it is obvious what the new blocks are: in the first case each output in λ(B, a) defines a new block, in the second case each block containing a state in δ(B, a) defines a new block. Both types of refinement preserve validity. Partition refinement algorithms for FSMs first perform splits w.r.t. output, until there are no such splits to be performed. This is precisely the case when the partition is acceptable.

Definition 4. A partition is acceptable if for all pairs s, t of states contained in the same block and for all inputs a in I, λ(s, a) = λ(t, a).

Any refinement of an acceptable partition is again acceptable. The algorithm continues performing splits w.r.t. state, until no such splits can be performed. This is exactly the case when the partition is stable.

Definition 5. A partition is stable if it is acceptable and for any input a in I and states s and t that are in the same block, states δ(s, a) and δ(t, a) are also in the same block.

Since an FSM has only finitely many states, partition refinement will terminate. The output is the finest valid partition, which is acceptable and stable. For a more formal treatment of partition refinement we refer to Gries (1973).

1.2 Splitting Trees and Refinable Partitions

Both types of splits described above can be used to construct a separating sequence for the states that are split. In a split w.r.t. the output after a, this sequence is simply a. In a split w.r.t. the state after a, the sequence starts with an a and continues with the separating sequence for states in δ(B, a). In order to systematically keep track of this information, we maintain a splitting tree. The splitting tree was introduced by Lee and Yannakakis (1994) as a data structure for maintaining the operational history of a partition refinement algorithm.

Definition 6. A splitting tree for M is a rooted tree T with a finite set of nodes with the following properties:
– Each node u in T is labelled by a subset of S, denoted l(u).
– The root is labelled by S.
– For each inner node u, l(u) is partitioned by the labels of its children.
– Each inner node u is associated with a sequence σ(u) that separates states contained in different children of u.

We use C(u) to denote the set of children of a node u. The lowest common ancestor (lca) for a set S′ ⊆ S is the node u such that S′ ⊆ l(u) and S′ ⊈ l(v) for all v ∈ C(u); it is denoted by lca(S′). For a pair of states s and t we use the shorthand lca(s, t) for lca({s, t}). The labels l(u) can be stored as a refinable partition data structure (Valmari & Lehtinen, 2008). This is an array containing a permutation of the states, ordered so that states in the same block are adjacent. The label l(u) of a node then can be indicated by a slice of this array. If node u is split, some states in the slice l(u) may be moved to create the labels of its children, but this will not change the set l(u).
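As a concrete illustration of this data structure, here is a hedged Go sketch of a refinable partition with the two operations used later: constant-time marking of a state within its block, and splitting off the marked prefix as a new block. The names are ours and error handling is omitted; the actual structure of Valmari & Lehtinen (2008) has more operations.

```go
package main

import "fmt"

// Partition keeps all states in one array; every block is a
// contiguous slice of that array.
type Partition struct {
	elems  []int // permutation of the states
	loc    []int // loc[s]: position of s in elems
	blk    []int // blk[s]: block id of s
	lo, hi []int // per block: slice bounds [lo, hi) in elems
	marked []int // per block: number of marked states at the front
}

func New(n int) *Partition {
	p := &Partition{loc: make([]int, n), blk: make([]int, n),
		lo: []int{0}, hi: []int{n}, marked: []int{0}}
	for s := 0; s < n; s++ {
		p.elems = append(p.elems, s)
		p.loc[s] = s
	}
	return p
}

// Mark swaps s into the marked prefix of its block, in O(1).
// It assumes s is not yet marked.
func (p *Partition) Mark(s int) {
	b := p.blk[s]
	i, j := p.loc[s], p.lo[b]+p.marked[b]
	t := p.elems[j]
	p.elems[i], p.elems[j] = t, s
	p.loc[s], p.loc[t] = j, i
	p.marked[b]++
}

// Split turns the marked prefix of block b into a new block and
// returns its id, or -1 if no real split happened. Note that the
// slice of the old block as a whole is unchanged, matching the
// observation about l(u) above.
func (p *Partition) Split(b int) int {
	m := p.marked[b]
	p.marked[b] = 0
	if m == 0 || m == p.hi[b]-p.lo[b] {
		return -1
	}
	nb := len(p.lo)
	p.lo = append(p.lo, p.lo[b])
	p.hi = append(p.hi, p.lo[b]+m)
	p.marked = append(p.marked, 0)
	for i := p.lo[b]; i < p.lo[b]+m; i++ {
		p.blk[p.elems[i]] = nb
	}
	p.lo[b] += m
	return nb
}

func main() {
	p := New(5)
	p.Mark(1)
	p.Mark(3)
	fmt.Println(p.elems, p.Split(0)) // [1 3 2 0 4] 1
}
```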
A splitting tree T can be used to record the history of a partition refinement algorithm, because at any time the leaves of T define a partition on S, denoted P(T). We say a splitting tree T is valid (resp. acceptable, stable, complete) if P(T) is as such. A leaf can be expanded in one of two ways, corresponding to the two ways a block can be split. Given a leaf u and its block B = l(u) we define the following two splits:

(split-output) Suppose there is an input a such that B can be split w.r.t. output after a. Then we set σ(u) = a, and we create a node for each subset of B that produces the same output x on a. These nodes are set to be children of u.

(split-state) Suppose there is an input a such that B can be split w.r.t. the state after a. Then instead of splitting B as described before, we proceed as follows. First, we locate the node v = lca(δ(B, a)). Since v cannot be a leaf, it has at least two children whose labels contain elements of δ(B, a). We can use this information to expand the tree as follows. For each node w in C(v) we create a child of u labelled {s ∈ B | δ(s, a) ∈ l(w)} if the label contains at least one state. Finally, we set σ(u) = aσ(v).

A straightforward adaptation of partition refinement for constructing a stable splitting tree for M is shown in Algorithm 4.1. The termination and the correctness of the algorithm outlined in Section 1.1 are preserved. It follows directly that states are equivalent if and only if they are in the same label of a leaf node.

Require: An FSM M
Ensure: A valid and stable splitting tree T
1 initialise T to be a tree with a single node labelled S
2 repeat
3   find a ∈ I, B ∈ P(T) such that we can split B w.r.t. output λ(⋅, a)
4   expand the u ∈ T with l(u) = B as described in (split-output)
5 until P(T) is acceptable
6 repeat
7   find a ∈ I, B ∈ P(T) such that we can split B w.r.t. state δ(⋅, a)
8   expand the u ∈ T with l(u) = B as described in (split-state)
9 until P(T) is stable

Algorithm 4.1 Constructing a stable splitting tree.

Example 7. Figure 4.1 shows an FSM and a complete splitting tree for it. This tree is constructed by Algorithm 4.1 as follows. First, the root node is labelled by {s0, …, s5}. The even and odd states produce different outputs after a, hence the root node is split. Then we note that s4 produces a different output after b than s0 and s2, so {s0, s2, s4} is split as well. At this point T is acceptable: no more leaves can be split w.r.t. output. Now, the states δ({s1, s3, s5}, a) are contained in different leaves of T. Therefore, {s1, s3, s5} is split into {s1, s5} and {s3} and associated with the sequence ab. At this point, δ({s0, s2}, a) contains states that are in both children of {s1, s3, s5}, so {s0, s2} is split and the associated sequence is aab. We continue until T is complete.

Figure 4.1 An FSM (a) and a complete splitting tree for it (b).

2 Minimal Separating Sequences

In Section 1.2 we have described an algorithm for constructing a complete splitting tree. This algorithm is non-deterministic, as there is no prescribed order on the splits. In this section we order them to obtain minimal separating sequences. Let u be a non-root inner node in a splitting tree; then the sequence σ(u) can also be used to split the parent of u.
This allows us to construct splitting trees where children never have shorter sequences than their parents, as we can always split with those sequences first. Trees obtained in this way are guaranteed to be layered, which means that for all nodes u and all u′ ∈ C(u), |σ(u)| ≤ |σ(u′)|. Each layer consists of nodes for which the associated separating sequences have the same length. Our approach for constructing minimal sequences is to ensure that each layer is as large as possible before continuing to the next one. This idea is expressed formally by the following definitions.

Definition 8. A splitting tree T is k-stable if for all states s and t in the same leaf we have λ(s, x) = λ(t, x) for all x ∈ I≤k.

Definition 9. A splitting tree T is minimal if for all states s and t in different leaves λ(s, x) ≠ λ(t, x) implies |x| ≥ |σ(lca(s, t))| for all x ∈ I∗.

Minimality of a splitting tree can be used to obtain minimal separating sequences for pairs of states. If the tree is in addition stable, we obtain minimal separating sequences for all inequivalent pairs of states. Note that if a minimal splitting tree is (n − 1)-stable (n is the number of states of M), then it is stable (Definition 5). This follows from the well-known fact that n − 1 is an upper bound for the length of a minimal separating sequence (Moore, 1956).

Algorithm 4.2 ensures a stable and minimal splitting tree. The first repeat-loop is the same as before (in Algorithm 4.1). Clearly, we obtain a 1-stable and minimal splitting tree here. It remains to show that we can extend this to a stable and minimal splitting tree. Algorithm 4.3 performs precisely one such step towards stability, while maintaining minimality. Termination follows for the same reason as for Algorithm 4.1. Correctness for this algorithm is shown by the following key lemma. We will denote the input tree by T and the tree after performing Algorithm 4.3 by T′. Observe that T is an initial segment of T′.

Lemma 10. Algorithm 4.3 ensures a (k + 1)-stable minimal splitting tree.

Proof. Let us prove stability. Let s and t be in the same leaf of T′ and let x ∈ I∗ be such that λ(s, x) ≠ λ(t, x). We show that |x| > k + 1. Suppose for the sake of contradiction that |x| ≤ k + 1. Let u be the leaf containing s and t and write x = ax′. We see that δ(s, a) and δ(t, a) are separated, by k-stability of T. So the node v = lca(δ(l(u), a)) has children and an associated sequence σ(v). There are two cases:
– |σ(v)| < k, then aσ(v) separates s and t and is of length ≤ k. This case contradicts the k-stability of T.
– |σ(v)| = k, then the loop in Algorithm 4.3 will consider this case and split. Note that this may not split s and t (it may occur that aσ(v) splits different elements in l(u)). We can repeat the above argument inductively for the newly created leaf containing s and t. By finiteness of l(u), the induction will stop and, in the end, s and t are split.
Both cases end in contradiction, so we conclude that |x| > k + 1.

Let us now prove minimality. It suffices to consider only newly split states in T′. Let s and t be two states with |σ(lca(s, t))| = k + 1. Let x ∈ I∗ be a sequence such that λ(s, x) ≠ λ(t, x). We need to show that |x| ≥ k + 1. Since x ≠ ϵ we can write x = ax′ and consider the states s′ = δ(s, a) and t′ = δ(t, a), which are separated by x′. Two things can happen:
– The states s′ and t′ are in the same leaf in T. Then by k-stability of T we get λ(s′, y) = λ(t′, y) for all y ∈ I≤k.
So |x′| > k.
– The states s′ and t′ are in different leaves in T; let u = lca(s′, t′). Then aσ(u) separates s and t. Since s and t are in the same leaf in T, we get |aσ(u)| ≥ k + 1 by k-stability. This means that |σ(u)| ≥ k, and by minimality of T we get |x′| ≥ k.
In both cases we have shown that |x| ≥ k + 1 as required. □

Example 11. Figure 4.2(a) shows a stable and minimal splitting tree T for the machine in Figure 4.1. This tree is constructed by Algorithm 4.2 as follows. It executes in the same way as Algorithm 4.1 until we consider the node labelled {s0, s2}. At this point k = 1. We observe that the sequence of lca(δ({s0, s2}, a)) has length 2, which is too long, so we continue with the next input. We find that we can indeed split w.r.t. the state after b, so the associated sequence is ba. Continuing, we obtain the same partition as before, but with smaller witnesses. The internal data structure (a refinable partition) is shown in Figure 4.2(b): the array with the permutation of the states is at the bottom, and every block includes an indication of the slice containing its label and a pointer to its parent (as our final algorithm needs to find the parent block, but never the child blocks).

Require: An FSM M with n states
Ensure: A stable, minimal splitting tree T
1 initialise T to be a tree with a single node labelled S
2 repeat
3   find a ∈ I, B ∈ P(T) such that we can split B w.r.t. output λ(⋅, a)
4   expand the u ∈ T with l(u) = B as described in (split-output)
5 until P(T) is acceptable
6 for k = 1 to n − 1 do
7   invoke Algorithm 4.3 or Algorithm 4.4 on T for k
8 end for

Algorithm 4.2 Constructing a stable and minimal splitting tree.

Require: A k-stable and minimal splitting tree T
Ensure: T is a (k + 1)-stable, minimal splitting tree
1 for all leaves u ∈ T and all inputs a ∈ I do
2   v ← lca(δ(l(u), a))
3   if v is an inner node and |σ(v)| = k then
4     expand u as described in (split-state) (this generates new leaves)
5   end if
6 end for

Algorithm 4.3 A step towards the stability of a splitting tree.

Figure 4.2 (a) A complete and minimal splitting tree for the FSM in Figure 4.1 and (b) its internal refinable partition data structure.

3 Optimising the Algorithm

In this section, we present an improvement on Algorithm 4.3 that uses two ideas described by Hopcroft (1971) in his seminal paper on minimising finite automata: using the inverse transition set, and processing the smaller half. The algorithm that we present is a drop-in replacement, so that Algorithm 4.2 stays the same except for some bookkeeping. This way, we can establish correctness of the new algorithm more easily. The variant presented in this section reduces the amount of redundant computation performed by Algorithm 4.3. Using Hopcroft's first idea, we turn our algorithm upside down: instead of searching for the lca for each leaf, we search for the leaves u for which l(u) ⊆ δ−1(l(v), a), for each potential lca v and input a. To keep the order of splits as before, we define k-candidates.

Definition 12. A k-candidate is a node v with |σ(v)| = k.

A k-candidate v and an input a can be used to split a leaf u if v = lca(δ(l(u), a)), because in this case there are at least two states s, t in l(u) such that δ(s, a) and δ(t, a) are in labels of different nodes in C(v). Refining u this way is called splitting u with respect to (v, a).
The set C(u) is constructed according to (split-state), where each child w ∈ C(v) defines a child uw of u with states

l(uw) = {s ∈ l(u) | δ(s, a) ∈ l(w)} = l(u) ∩ δ−1(l(w), a).   (4.1)

In order to perform the same splits in each layer as before, we maintain a list Lk of k-candidates. We keep the list in order of the construction of nodes, because splitting w.r.t. a child of a node u before splitting w.r.t. u itself is not well-defined. Indeed, the order on Lk is the same as the order used by Algorithm 4.2. So far, the improved algorithm would still have time complexity 𝒪(mn). To reduce the complexity we have to use Hopcroft's second idea of processing the smaller half. The key idea is that, when we fix a k-candidate v, all leaves are split with respect to (v, a) simultaneously. Instead of iterating over all leaves to refine them, we iterate over s ∈ δ−1(l(w), a) for all w in C(v) and look up in which leaf it is contained, in order to move s out of it. From Lemma 8 by Knuutila (2001) it follows that we can skip one of the children of v. This lowers the time complexity to 𝒪(m log n). In order to move s out of its leaf, each leaf u is associated with a set of temporary children C′(u) that is initially empty, and that will be finalised after iterating over all s and w.

Require: A k-stable and minimal splitting tree T, and a list Lk
Ensure: T is a (k + 1)-stable and minimal splitting tree, and a list Lk+1
1  Lk+1 ← ∅
2  for all k-candidates v in Lk in order do
3    let w′ be a node in C(v) with |l(w′)| ≥ |l(w)| for all nodes w ∈ C(v)
4    for all inputs a in I do
5      for all nodes w in C(v) ∖ {w′} do
6        for all states s in δ−1(l(w), a) do
7          locate the leaf u such that s ∈ l(u)
8          if C′(u) does not contain node uw then
9            add a new node uw to C′(u)
10         end if
11         move s from l(u) to l(uw)
12       end for
13     end for
14     for all leaves u with C′(u) ≠ ∅ do
15       if |l(u)| = 0 then
16         if |C′(u)| = 1 then
17           recover u by moving its elements back and clear C′(u)
18           continue with the next leaf
19         end if
20         set p = u and C(u) = C′(u)
21       else
22         construct a new node p and set C(p) = C′(u) ∪ {u}
23         insert p in the tree in the place where u was
24       end if
25       set σ(p) = aσ(v)
26       append p to Lk+1 and clear C′(u)
27     end for
28   end for
29 end for

Algorithm 4.4 A better step towards the stability of a splitting tree.

In Algorithm 4.4 we use the ideas described above. For each k-candidate v and input a, we consider all children w of v, except for the largest one (in case of multiple largest children, we skip one of these arbitrarily). For each state s ∈ δ−1(l(w), a) we consider the leaf u containing it. If this leaf does not have an associated temporary child for w, we create such a child (line 9); if this child exists, we move s into that child (line 11). Once we have done the simultaneous splitting for the candidate v and input a, we finalise the temporary children. This is done at lines 14–26. If there is only one temporary child containing all the states, no split has been made and we recover this node (line 17). In the other case we make the temporary children permanent. The states remaining in u are those for which δ(s, a) is in the child of v that we have skipped; therefore we will call it the implicit child. To preserve the theoretical time bound we must not touch these states; therefore, we construct a new parent node p that will "adopt" the children in C′(u) together with u (line 22).
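The bucketing phase of one such simultaneous split can be pictured with a small sketch. The following Go fragment groups states by the pair (current leaf, child of the candidate), touching only the preimages of the non-largest children; the refinable partition, the finalisation and the candidate lists are all elided, and the names are illustrative.

```go
package main

import "fmt"

// splitStep performs the bucketing phase of one (v, a) split: for every
// non-largest child w of the candidate v, each state s in δ⁻¹(l(w), a)
// is moved into the temporary child (u, w) of the leaf u containing s.
// States whose a-successor lies in the skipped (implicit) child are
// never touched.
func splitStep(leafOf map[int]int, preimages map[int][]int) map[[2]int][]int {
	temp := map[[2]int][]int{} // temporary children: (leaf u, child w) to states
	for w, pre := range preimages {
		for _, s := range pre {
			u := leafOf[s]
			temp[[2]int{u, w}] = append(temp[[2]int{u, w}], s)
		}
	}
	return temp
}

func main() {
	// One leaf {0, 1, 2, 3}; the candidate has children w1 = 1, w2 = 2
	// and a largest child w3, which is skipped. Only the preimages of
	// w1 and w2 are iterated; state 3 implicitly stays behind.
	leafOf := map[int]int{0: 0, 1: 0, 2: 0, 3: 0}
	preimages := map[int][]int{1: {0, 2}, 2: {1}}
	fmt.Println(splitStep(leafOf, preimages))
	// Output (map order may vary): map[[0 1]:[0 2] [0 2]:[1]]
}
```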
We will now explain why considering all but the largest children of a node lowers the algorithm's time complexity. Let T be a splitting tree in which we colour all children of each node blue, except for the largest one. Then:

Lemma 13. A state s is in at most (log₂ n) − 1 labels of blue nodes.

Proof. Observe that every blue node u has a sibling u′ such that |l(u′)| ≥ |l(u)|. So the parent p(u) has at least 2|l(u)| states in its label, and the largest blue node has at most n/2 states. Suppose a state s is contained in m blue nodes. When we walk up the tree starting at the leaf containing s, we will visit these m blue nodes. With each visit we can double the lower bound on the number of states. Hence n/2 ≥ 2ᵐ and m ≤ (log₂ n) − 1. □

Corollary 14. A state s is in at most log₂ n sets δ−1(l(u), a), where u is a blue node and a is an input in I.

If we now quantify over all transitions, we immediately get the following result. We note that the number of blue nodes is at most n − 1, but since this fact is not used, we leave this to the reader.

Corollary 15. Let ℬ denote the set of blue nodes and define 𝒳 = {(b, a, s) | b ∈ ℬ, a ∈ I, s ∈ δ−1(l(b), a)}. Then 𝒳 has at most m log₂ n elements.

The important observation is that, when using Algorithm 4.4, we iterate in total over every element in 𝒳 at most once.

Theorem 16. Algorithm 4.2 using Algorithm 4.4 runs in 𝒪(m log n) time.

Proof. We prove that the bookkeeping does not increase the time complexity by discussing the implementation.

Inverse transition. δ−1 can be constructed as a preprocessing step in 𝒪(m).

State sorting. As described in Section 1.2, we maintain a refinable partition data structure. Each time a new pair of a k-candidate v and an input a is considered, leaves are split by performing a bucket sort. First, buckets are created for each node w ∈ C(v) ∖ {w′} and each leaf u that contains one or more elements from δ−1(l(w), a), where w′ is a largest child of v. The buckets are filled by iterating over the states in δ−1(l(w), a) for all w. Then, a pivot is set for each leaf u such that exactly the states that have been placed in a bucket can be moved right of the pivot (and untouched states in δ−1(l(w′), a) end up left of the pivot). For each leaf u, we iterate over the states in its buckets and the corresponding indices right of its pivot, and we swap the current state with the one that is at the current index. For each bucket a new leaf node is created. The refinable partition is updated such that the current state points to the most recently created leaf. This way, we ensure constant-time lookup of the leaf for a state, and we can update the array in constant time when we move elements out of a leaf.

Largest child. For finding the largest child, we maintain counts for the temporary children and a current biggest one. On finalising the temporary children, we store (a reference to) the biggest child in the node, so that we can skip this node later in the algorithm.

Storing sequences. The operation on line 25 is done in constant time by using a linked list. □

4 Application in Conformance Testing

A splitting tree can be used to extract relevant information for two classical test generation methods: a characterisation set for the W-method and a separating family for the HSI-method. For an introduction and comparison of FSM-based test generation methods we refer to Dorofeeva, et al. (2010) or Chapter 2.
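Both extractions amount to a single traversal of the splitting tree, as the following hedged Go sketch indicates; the definitions and lemmas below make this precise, and the Node type and all names are our own simplification.

```go
package main

import "fmt"

// Node is a splitting-tree node: its label l(u), its associated
// sequence σ(u) (only meaningful for inner nodes), and its children.
type Node struct {
	Label    []int
	Seq      []int
	Children []*Node
}

// characterisationSet collects σ(u) for every inner node of the tree
// (cf. Lemma 18 below).
func characterisationSet(root *Node) [][]int {
	var W [][]int
	var walk func(*Node)
	walk = func(u *Node) {
		if len(u.Children) == 0 {
			return
		}
		W = append(W, u.Seq)
		for _, c := range u.Children {
			walk(c)
		}
	}
	walk(root)
	return W
}

// separatingFamily collects, for every state s, the sequences of all
// inner nodes whose label contains s (cf. Lemmas 21 and 22 below).
func separatingFamily(root *Node) map[int][][]int {
	H := map[int][][]int{}
	var walk func(*Node)
	walk = func(u *Node) {
		if len(u.Children) == 0 {
			return
		}
		for _, s := range u.Label {
			H[s] = append(H[s], u.Seq)
		}
		for _, c := range u.Children {
			walk(c)
		}
	}
	walk(root)
	return H
}

func main() {
	// A two-level tree: root {0,1,2} with σ = [0], inner child {0,1}
	// with σ = [1,0], and leaves {0}, {1} and {2}.
	t := &Node{Label: []int{0, 1, 2}, Seq: []int{0}, Children: []*Node{
		{Label: []int{0, 1}, Seq: []int{1, 0}, Children: []*Node{
			{Label: []int{0}}, {Label: []int{1}},
		}},
		{Label: []int{2}},
	}}
	fmt.Println(characterisationSet(t))
	fmt.Println(separatingFamily(t))
}
```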
Definition 17. A set W ⊂ I∗ is called a characterisation set if for every pair of inequivalent states s, t there is a sequence w ∈ W such that λ(s, w) ≠ λ(t, w).

Lemma 18. Let T be a complete splitting tree; then the set {σ(u) | u ∈ T} is a characterisation set.

Proof. Let W = {σ(u) | u ∈ T}. Let s, t ∈ S be inequivalent states; then by completeness s and t are contained in different leaves of T. Hence u = lca(s, t) exists and σ(u) ∈ W separates s and t. This shows that W is a characterisation set. □

Lemma 19. A characterisation set with minimal-length sequences can be constructed in time 𝒪(m log n).

Proof. By Lemma 18 the sequences associated with the inner nodes of a splitting tree form a characterisation set. By Theorem 16, such a tree can be constructed in time 𝒪(m log n). Traversing the tree to obtain the characterisation set is linear in the number of nodes (and hence linear in the number of states). □

Definition 20. A collection of sets {Hs}s∈S is called a separating family if for every pair of inequivalent states s, t there is a sequence ℎ such that λ(s, ℎ) ≠ λ(t, ℎ) and ℎ is a prefix of some ℎs ∈ Hs and some ℎt ∈ Ht.

Lemma 21. Let T be a complete splitting tree; then the sets {σ(u) | s ∈ l(u), u ∈ T}s∈S form a separating family.

Proof. Let Hs = {σ(u) | s ∈ l(u)}. Let s, t ∈ S be inequivalent states; then by completeness s and t are contained in different leaves of T. Hence u = lca(s, t) exists. Since both s and t are contained in l(u), the separating sequence σ(u) is contained in both sets Hs and Ht. Therefore, it is a (trivial) prefix of some word ℎs ∈ Hs and some ℎt ∈ Ht. Hence {Hs}s∈S is a separating family. □

Lemma 22. A separating family with minimal-length sequences can be constructed in time 𝒪(m log n + n²).

Proof. The separating family can be constructed from the splitting tree by collecting all sequences of all parents of a state (by Lemma 21). Since we have to do this for every state, this takes 𝒪(n²) time. □

For test generation one also needs a transition cover. This can be constructed in linear time with a breadth-first search. We conclude that we can construct all necessary information for the W-method in time 𝒪(m log n), as opposed to the 𝒪(mn) algorithm used by Dorofeeva, et al. (2010). Furthermore, we conclude that we can construct all the necessary information for the HSI-method in time 𝒪(m log n + n²), improving on the reported bound 𝒪(mn³) by Hierons and Türker (2015). The original HSI-method was formulated differently and might generate smaller sets. We conjecture that our separating family has the same size if we furthermore remove redundant prefixes. This can be done in 𝒪(n²) time using a trie data structure.

5 Experimental Results

We have implemented Algorithms 4.3 and 4.4 in Go, and we have compared their running times on two sets of FSMs.¹⁶ The first set is from Smeenk, et al. (2015a), where FSMs for embedded control software were automatically constructed. These FSMs are of increasing size, varying from 546 to 3,410 states, with 78 inputs and up to 151 outputs. The second set is inferred from Hopcroft (1971), where two classes of finite automata, A and B, are described; these serve as worst cases for Algorithms 4.3 and 4.4, respectively. The FSMs that we have constructed for these automata have 1 input, 2 outputs, and 2² to 2¹⁵ states. The running times in seconds on an Intel Core i5-2500 are plotted in Figure 4.3. We note that different slopes imply different complexity classes, since both axes have a logarithmic scale.
¹⁶ Available at https://github.com/Jaxan/partition.

Figure 4.3 Running time in seconds of Algorithm 4.3 in grey and Algorithm 4.4 in black: (a) embedded control software; (b) class A (dashed) and class B (solid).

6 Conclusion

In this chapter we have described an efficient algorithm for constructing a set of minimal-length sequences that pairwise distinguish all states of a finite state machine. By extending Hopcroft's minimisation algorithm, we are able to construct such sequences in 𝒪(m log n) for a machine with m transitions and n states. This improves on the traditional 𝒪(mn) method that is based on the classic algorithm by Moore. As an upshot, the sequences obtained form a characterisation set and a separating family, which play a crucial role in conformance testing. Two key observations were required for a correct adaptation of Hopcroft's algorithm. First, it is required to perform splits in order of the length of their associated sequences. This guarantees minimality of the obtained separating sequences. Second, it is required to consider a node as a candidate before any of its children are considered as candidates. This order follows naturally from the construction of a splitting tree. Experimental results show that our algorithm outperforms the classic approach for both worst-case finite state machines and models of embedded control software. Applications of minimal separating sequences such as the ones described by Dorofeeva, et al. (2010) and Smeenk, et al. (2015a) therefore show that our algorithm is useful in practice.

Part 2: Nominal Techniques

Chapter 5 Learning Nominal Automata

Joshua Moerman, Radboud University
Matteo Sammartino, University College London
Bartek Klin, University of Warsaw
Alexandra Silva, University College London
Michał Szynwelski, University of Warsaw

Abstract We present an Angluin-style algorithm to learn nominal automata, which are acceptors of languages over infinite (structured) alphabets. The abstract approach we take allows us to seamlessly extend known variations of the algorithm to this new setting. In particular, we can learn a subclass of nominal non-deterministic automata. An implementation using a recently developed Haskell library for nominal computation is provided for preliminary experiments.

This chapter is based on the following publication: Moerman, J., Sammartino, M., Silva, A., Klin, B., & Szynwelski, M. (2017). Learning nominal automata. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL. ACM. doi:10.1145/3009837.3009879

Automata are a well-established computational abstraction with a wide range of applications, including modelling and verification of (security) protocols, hardware, and software systems. In an ideal world, a model would be available before a system or protocol is deployed, in order to provide ample opportunity for checking important properties that must hold; only then would the actual system be synthesised from the verified model. Unfortunately, this is not at all the reality: systems and protocols are developed and coded in short spans of time, and if mistakes occur they are most likely found after deployment. In this context, it has become popular to infer or learn a model from a given system just by observing its behaviour or response to certain queries.
The learned model can then be used to ensure the system complies with desired properties, or to detect bugs and design possible fixes. Automata learning, or regular inference, is a widely used technique for creating an automaton model from observations. The original algorithm by Angluin (1987) works for deterministic finite automata, but has since been extended to other types of automata, including Mealy machines and I/O automata (see Niese, 2003, §8.5, and Aarts & Vaandrager, 2010), and even a special class of context-free grammars (see Isberner, 2015, §6). Angluin's algorithm is sometimes referred to as active learning, because it is based on direct interaction of the learner with an oracle ("the Teacher") that can answer different types of queries. This is in contrast with passive learning, where a fixed set of positive and negative examples is given and no interaction with the system is possible. In this chapter, staying in the realm of active learning, we will extend Angluin's algorithm to a richer class of automata. We are motivated by situations in which a program model, besides control flow, needs to represent basic data flow, where data items are compared for equality (or for other theories such as total ordering). In these situations, values for individual symbols are typically drawn from an infinite domain, and automata over infinite alphabets become natural models, as witnessed by a recent trend (Aarts, et al., 2015; Bojańczyk, et al., 2014; Bollig, et al., 2013; Cassel, et al., 2016; D'Antoni & Veanes, 2014).

One of the foundational approaches to formal language theory for infinite alphabets uses the notion of nominal sets (Bojańczyk, et al., 2014). The theory of nominal sets originates from the work of Fraenkel in 1922; nominal sets were originally used to prove the independence of the axiom of choice and other axioms. They have been rediscovered in computer science by Gabbay and Pitts (see Pitts, 2013 for historical notes) as an elegant formalism for modelling name binding, and since then they form the basis of many research projects in the semantics and concurrency community. In a nutshell, nominal sets are infinite sets equipped with symmetries which make them finitely representable and tractable for algorithms. We make crucial use of this feature in the development of a learning algorithm.

Our main contributions are the following.
– A generalisation of Angluin's original algorithm to nominal automata. The generalisation follows a generic pattern for transporting computation models from finite sets to nominal sets, which leads to simple correctness proofs and opens the door to further generalisations. The use of nominal sets with different symmetries also creates potential for generalisation, e.g., to languages with time features (Bojańczyk & Lasota, 2012) or data dependencies represented as graphs (Montanari & Sammartino, 2014).
– An extension of the algorithm to nominal non-deterministic automata (nominal NFAs). To the best of our knowledge, this is the first learning algorithm for non-deterministic automata over infinite alphabets. It is important to note that, in the nominal setting, NFAs are strictly more expressive than DFAs. We learn a subclass of the languages accepted by nominal NFAs, which includes all the languages accepted by nominal DFAs. The main advantage of learning NFAs directly is that they can provide exponentially smaller automata when compared to their deterministic counterpart.
This can be seen both as a generalisation and as an optimisation of the algorithm.
– An implementation using a recently developed Haskell library tailored to nominal computation, NLambda, or Nλ, by Klin and Szynwelski (2016). Our implementation is the first non-trivial application of a novel programming paradigm of functional programming over infinite structures, which allows the programmer to rely on convenient intuitions of searching through infinite sets in finite time.

This chapter is organised as follows. In Section 1, we present an overview of our contributions (and the original algorithm), highlighting the challenges we faced in the various steps. In Section 2, we revise some basic concepts of nominal sets and automata. Section 3 contains the core technical contributions: The new algorithm and proof of correctness. In Section 4, we describe an algorithm to learn nominal non-deterministic automata. Section 5 contains a description of NLambda, details of the implementation, and results of preliminary experiments. Section 6 contains a discussion of related work. We conclude this chapter with a discussion section where future directions are also presented.

1 Overview of the Approach

In this section, we give an overview through examples. We will start by explaining the original algorithm for regular languages over finite alphabets, and then explain the challenges in extending it to nominal languages.

Angluin's algorithm L∗ provides a procedure to learn the minimal DFA accepting a certain (unknown) language ℒ. The algorithm has access to a teacher which answers two types of queries:
– membership queries, consisting of a single word w ∈ A∗, to which the teacher will reply whether w ∈ ℒ or not;
– equivalence queries, consisting of a hypothesis DFA H, to which the teacher replies yes if ℒ(H) = ℒ, and no otherwise, providing a counterexample w ∈ ℒ(H) △ ℒ (where △ denotes the symmetric difference of two languages).

The learning algorithm works by incrementally building an observation table, which at each stage contains partial information about the language ℒ. The algorithm is able to fill the table with membership queries. As an example, and to set notation, consider the following table (over the alphabet A = {a, b}).

               E
               ϵ   a   aa
  S        ϵ   0   0   1
  S⋅A      a   0   1   0
           b   0   0   0

  row : S ∪ S⋅A → 2^E,   row(u)(v) = 1 ⟺ uv ∈ ℒ

This table indicates that ℒ contains at least aa and definitely does not contain the words ϵ, a, b, ba, baa, aaa. Since row is fully determined by the language ℒ, we will from now on refer to an observation table as a pair (S, E), leaving the language ℒ implicit.

Given an observation table (S, E) one can construct a deterministic automaton M(S, E) = (Q, q0, δ, F) where
– Q = {row(s) | s ∈ S} is a finite set of states;
– F = {row(s) | s ∈ S, row(s)(ϵ) = 1} ⊆ Q is the set of final states;
– q0 = row(ϵ) is the initial state;
– δ : Q × A → Q is the transition function given by δ(row(s), a) = row(sa).

For this to be well-defined, we need to have ϵ ∈ S (for the initial state) and ϵ ∈ E (for final states), and for the transition function there are two crucial properties of the table that need to hold: Closedness and consistency. An observation table (S, E) is closed if for all t ∈ S⋅A there exists an s ∈ S such that row(t) = row(s). An observation table (S, E) is consistent if, whenever s1 and s2 are elements of S such that row(s1) = row(s2), then for all a ∈ A, row(s1a) = row(s2a). Each time the algorithm constructs an automaton, it poses an equivalence query to the teacher.
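To make these definitions concrete, the following is a minimal Haskell sketch, our own illustration and not the NLambda implementation discussed later in this chapter, of an observation table over a finite alphabet, with the closedness and consistency checks and the construction of the hypothesis M(S, E). All names (Table, closed, consistent, hypothesis) are hypothetical.

  import Data.List (nub)

  type Word' = String
  type Row   = [Bool]

  data Table = Table
    { alph :: [Char]            -- the finite alphabet A
    , s    :: [Word']           -- upper part S (prefix-closed, contains "")
    , e    :: [Word']           -- columns E (suffix-closed, contains "")
    , lang :: Word' -> Bool     -- membership oracle for the unknown language L
    }

  row :: Table -> Word' -> Row
  row t u = [ lang t (u ++ v) | v <- e t ]

  lower :: Table -> [Word']     -- the lower part S.A of the table
  lower t = [ u ++ [a] | u <- s t, a <- alph t ]

  -- closed: every lower row already occurs as an upper row
  closed :: Table -> Bool
  closed t = all (\u -> row t u `elem` map (row t) (s t)) (lower t)

  -- consistent: equal upper rows remain equal after any one-letter extension
  consistent :: Table -> Bool
  consistent t = and [ row t (u1 ++ [a]) == row t (u2 ++ [a])
                     | u1 <- s t, u2 <- s t, row t u1 == row t u2, a <- alph t ]

  -- The hypothesis M(S, E): states are the distinct rows of S.
  -- Assumes the table is closed and consistent, and that "" is the first column.
  data Hypothesis = Hypothesis
    { states  :: [Row]
    , initial :: Row
    , final   :: [Row]
    , delta   :: Row -> Char -> Row
    }

  hypothesis :: Table -> Hypothesis
  hypothesis t = Hypothesis qs (row t "") fs step
    where
      qs = nub (map (row t) (s t))
      fs = [ q | q <- qs, head q ]                 -- head q is row(s)(epsilon)
      step q a = head [ row t (u ++ [a]) | u <- s t, row t u == q ]

  main :: IO ()
  main = do
    -- the example table above, with L restricted to the observed word aa
    let t = Table "ab" [""] ["", "a", "aa"] (== "aa")
    print (closed t, consistent t)   -- (False, True)

Running main reports that the example table is consistent but not closed: row(a) = (0, 1, 0) does not occur in the upper part, so L∗ would move a into S before conjecturing an automaton.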
The algorithm terminates when the answer is yes; otherwise, it extends the table with the counterexample provided.

1.1 Simple Example of Execution

Angluin's algorithm is displayed in Algorithm 5.1. Throughout this section, we will consider the language(s)

  ℒn = {ww | w ∈ A∗, |w| = n}.

If the alphabet A is finite, then ℒn is regular for any n ∈ ℕ, and there is a finite DFA accepting it.

   1  S, E ← {ϵ}
   2  repeat
   3      while (S, E) is not closed or not consistent do
   4          if (S, E) is not closed then
   5              find s1 ∈ S, a ∈ A such that row(s1a) ≠ row(s) for all s ∈ S
   6              S ← S ∪ {s1a}
   7          end if
   8          if (S, E) is not consistent then
   9              find s1, s2 ∈ S, a ∈ A, and e ∈ E such that row(s1) = row(s2)
  10                  and ℒ(s1ae) ≠ ℒ(s2ae)
  11              E ← E ∪ {ae}
  12          end if
  13      end while
  14      Make the conjecture M(S, E)
  15      if the Teacher replies no, with a counter-example t then
  16          S ← S ∪ pref(t)
  17      end if
  18  until the Teacher replies yes to the conjecture M(S, E)
  19  return M(S, E)

Algorithm 5.1 The L∗ learning algorithm from Angluin (1987).

The language ℒ1 = {aa, bb} looks trivial, but the minimal DFA recognising it has as many as 5 states. Angluin's algorithm will terminate in (at most) 5 steps. We illustrate some relevant ones.

Step 1 We start from S, E = {ϵ}, and we fill the entries of the table below by asking membership queries for ϵ, a and b. The table is closed and consistent, so we construct the hypothesis 𝒜1, where q0 = row(ϵ) = {ϵ ↦ 0}:

      ϵ
  ϵ   0
  a   0
  b   0

[Diagram of 𝒜1: a single non-accepting state q0 with self-loops labelled a, b.]

The Teacher replies no and gives the counterexample aa, which is in ℒ1 but is not accepted by 𝒜1. Therefore, line 16 of the algorithm is triggered and we set S = {ϵ, a, aa}.

Step 2 The table becomes the one on the left below. It is closed, but not consistent: Rows ϵ and a are identical, but appending a leads to different rows, since row(a) ≠ row(aa). Therefore, line 11 is triggered and an extra column a is added. The new table is closed and consistent and a new hypothesis 𝒜2 is constructed.

         ϵ                ϵ   a
  ϵ      0          ϵ     0   0
  a      0          a     0   1
  aa     1          aa    1   0
  b      0          b     0   0
  ab     0          ab    0   0
  aaa    0          aaa   0   0
  aab    0          aab   0   0

[Diagram of 𝒜2: states q0 = row(ϵ), q1 = row(a) and q2 = row(aa), with q2 accepting; transitions q0 −a→ q1, q0 −b→ q0, q1 −a→ q2, q1 −b→ q0, q2 −a,b→ q0.]

The Teacher again replies no and gives the counterexample bb, which should be accepted by 𝒜2 but is not. Therefore we put S ← S ∪ {b, bb}.

Step 3 The new table is the one on the left. It is closed, but ϵ and b violate consistency when b is appended. Therefore we add the column b and we get the table on the right, which is closed and consistent. The new hypothesis is 𝒜3.

         ϵ   a              ϵ   a   b
  ϵ      0   0        ϵ     0   0   0
  a      0   1        a     0   1   0
  aa     1   0        aa    1   0   0
  b      0   0        b     0   0   1
  bb     1   0        bb    1   0   0
  ab     0   0        ab    0   0   0
  aaa    0   0        aaa   0   0   0
  aab    0   0        aab   0   0   0
  ba     0   0        ba    0   0   0
  bba    0   0        bba   0   0   0
  bbb    0   0        bbb   0   0   0

[Diagram of 𝒜3: states q0 = row(ϵ), q1 = row(a), q3 = row(b) and the accepting state q2 = row(aa) = row(bb); transitions q0 −a→ q1, q0 −b→ q3, q1 −a→ q2, q1 −b→ q0, q3 −b→ q2, q3 −a→ q0, q2 −a,b→ q0.]

The Teacher replies no and provides the counterexample babb, so S ← S ∪ {ba, bab}.

Step 4 One more step brings us to the correct hypothesis 𝒜4 (details are omitted).

[Diagram of 𝒜4: the minimal 5-state DFA for {aa, bb}, obtained from 𝒜3 by adding a non-accepting sink state q4; transitions q0 −a→ q1, q0 −b→ q3, q1 −a→ q2, q1 −b→ q4, q3 −b→ q2, q3 −a→ q4, q2 −a,b→ q4, q4 −a,b→ q4.]

1.2 Learning Nominal Languages

Consider now an infinite alphabet A = {a, b, c, d, …}. The language ℒ1 becomes {aa, bb, cc, dd, …}. Classical theory of finite automata does not apply to this kind of languages, but one may draw an infinite deterministic automaton that recognises ℒ1 in the standard sense:

[Diagram of 𝒜5: for every letter a ∈ A, the initial state q0 has a transition labelled a to a state qa; each qa moves to the accepting state q3 on a and to the sink state q4 on every other letter; q3 and q4 move to q4 on every letter of A.]

where −A→ and −≠a→ stand for the infinitely-many transitions labelled by elements of A and A ∖ {a}, respectively.
This automaton is infinite, but it can be finitely presented in a variety of ways, for example:

[Diagram of 𝒜6: for all x ∈ A, q0 −x→ qx, qx −x→ q3 (accepting), qx −≠x→ q4, q3 −A→ q4, q4 −A→ q4.]

One can formalise the quantifier notation above (or indeed the "dots" notation above that) in several ways. A popular solution is to consider finite register automata (Demri & Lazić, 2009 and Kaminski & Francez, 1994), i.e., finite automata equipped with a finite number of registers where alphabet letters can be stored and later compared for equality. Our language ℒ1 is recognised by a simple automaton with four states and one register. The problem of learning register automata has been successfully attacked before by, for instance, Howar, et al. (2012).

In this chapter, however, we consider nominal automata by Bojańczyk, et al. (2014) instead. These automata ostensibly have infinitely many states, but the set of states can be finitely presented in a way open to effective manipulation. More specifically, in a nominal automaton the set of states is subject to an action of permutations of a set 𝔸 of atoms, and it is finite up to that action. For example, the set of states of 𝒜5 is:

  {q0, q3, q4} ∪ {qa | a ∈ A}

and it is equipped with a canonical action of permutations π : 𝔸 → 𝔸 that maps every qa to qπ(a) and leaves q0, q3 and q4 fixed. Technically speaking, the set of states has four orbits (one infinite orbit and three fixed points) of the action of the group of permutations of 𝔸. Moreover, it is required that in a nominal automaton the transition relation is equivariant, i.e., closed under the action of permutations. The automaton 𝒜5 has this property: For example, it has a transition qa −a→ q3, and for any π : 𝔸 → 𝔸 there is also a transition π(qa) = qπ(a) −π(a)→ q3 = π(q3).

Nominal automata with finitely many orbits of states are equi-expressive with finite register automata (Bojańczyk, et al., 2014), but they have an important theoretical advantage: They are a direct reformulation of the classical notion of finite automaton, where one replaces finite sets with orbit-finite sets and functions (or relations) with equivariant ones. A research programme advocated by Bojańczyk, et al. is to transport various computation models, algorithms and theorems along this correspondence. This can often be done with remarkable accuracy, and our results are a witness to this. Indeed, as we shall see, nominal automata can be learned with an algorithm that is almost a verbatim copy of the classical Angluin's one.

Indeed, consider applying Angluin's algorithm to our new language ℒ1. The key idea is to change the basic data structure: Our observation table (S, E) will be such that S and E are equivariant subsets of A∗, i.e., they are closed under the canonical action of atom permutations. In general, such a table has infinitely many rows and columns, so the following aspects of Algorithm 5.1 seem problematic:
– line 4 and line 8: finding witnesses for closedness or consistency violations potentially requires checking all infinitely many rows;
– line 16: every counterexample t has infinitely many prefixes, so it is not clear how one constructs an infinite set S in finite time. However, an infinite S is necessary for the algorithm to ever succeed, because no finite automaton recognises ℒ1.
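Before addressing these issues, we note how such a finite presentation can be manipulated in practice. The following small Haskell sketch (our own, with Int standing in for the atoms 𝔸) presents the automaton 𝒜5/𝒜6: both the transition function and the permutation action are defined orbit-wise, by one clause per orbit.

  type Atom = Int

  -- States of A5: three single-element orbits and one orbit {q_a | a in A}.
  data State = Q0 | Q Atom | Acc | Sink deriving (Eq, Show)

  -- Equivariant transition function for L1 = { aa | a in A }.
  delta :: State -> Atom -> State
  delta Q0    a = Q a                            -- remember the first letter
  delta (Q a) b = if a == b then Acc else Sink
  delta Acc   _ = Sink
  delta Sink  _ = Sink

  -- Action of a permutation on states: only the stored atom is moved.
  act :: (Atom -> Atom) -> State -> State
  act perm (Q a) = Q (perm a)
  act _    st    = st

  accepts :: [Atom] -> Bool
  accepts w = foldl delta Q0 w == Acc

  main :: IO ()
  main = print (accepts [5, 5], accepts [5, 7])  -- (True, False), for any atoms

Since delta only compares atoms for equality, it commutes with every permutation, that is, act perm (delta q a) == delta (act perm q) (perm a); this is exactly the equivariance required of nominal automata.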
At this stage, we need to observe that, due to equivariance of S, E and ℒ1, the following crucial properties hold:
(P1) the sets S, S⋅A and E admit a finite representation up to permutations;
(P2) the function row is such that row(π(s))(π(e)) = row(s)(e), for all s ∈ S and e ∈ E, so the observation table admits a finite symbolic representation.

Intuitively, checking closedness and consistency, and finding a witness for their violations, can be done effectively on the representations up to permutations (P1). This is sound, as row is invariant w.r.t. permutations (P2).

We now illustrate these points through a few steps of the algorithm for ℒ1.

Step 1′ We start from S, E = {ϵ}. We have S⋅A = A, which is infinite but admits a finite representation. In fact, for any a ∈ A, we have A = {π(a) | π is a permutation}. Then, by (P2), row(π(a))(ϵ) = row(a)(ϵ) = 0, for all π, so the first table can be written as:

      ϵ
  ϵ   0
  a   0

It is closed and consistent. Our hypothesis is 𝒜′1, where δ𝒜′1(row(ϵ), x) = row(x) = q0, for all x ∈ A:

[Diagram of 𝒜′1: a single non-accepting state q0 with a self-loop labelled A.]

As in Step 1, the Teacher replies with the counterexample aa.

Step 2′ By equivariance of ℒ1, the counterexample tells us that all words of length 2 with two repeated letters are accepted. Therefore we extend S with the (infinite!) set of such words. The new symbolic table is:

        ϵ
  ϵ     0
  a     0
  aa    1
  ab    0
  aaa   0
  aab   0

The lower part stands for elements of S⋅A. For instance, ab stands for words obtained by appending a fresh letter to words of length 1 (row a). It can be easily verified that all cases are covered. Notice that the table is different from that of Step 2: A single b is not in the lower part, because it can be obtained from a via a permutation. The table is closed.

Now, for consistency we need to check row(ϵx) = row(ax), for all a, x ∈ A. Again, by (P2), it is enough to consider rows of the table above. Consistency is violated, because row(a) ≠ row(aa). We found a "symbolic" witness a for such violation. In order to fix consistency, while keeping E equivariant, we need to add columns for all π(a). The resulting table is

        ϵ   a   b   c   …
  ϵ     0   0   0   0   …
  a     0   1   0   0   …
  aa    1   0   0   0   …
  ab    0   0   0   0   …
  aaa   0   0   0   0   …
  aab   0   0   0   0   …

where non-specified entries are 0. Only finitely many entries of the table are relevant: row(s) is fully determined by its values on letters in s and on just one letter not in s. For instance, we have row(a)(a) = 1 and row(a)(a′) = 0, for all a′ ∈ A ∖ {a}. The table is trivially consistent. Notice that this step encompasses both Step 2 and 3, because the rows b and bb added by Step 2 are already represented by a and aa. The hypothesis automaton is

[Diagram of 𝒜′2, for all x ∈ A: q0 −x→ qx; qx −x→ q2 (accepting); qx −≠x→ q0; q2 −A→ q0.]

This is again incorrect, but one additional step will give the correct hypothesis automaton 𝒜6.

1.3 Generalisation to Non-Deterministic Automata

Since our extension of Angluin's L∗ algorithm stays close to her original development, exploring extensions of other variations of L∗ to the nominal setting can be done in a systematic way. We will show how to extend the algorithm NL∗ for learning NFAs by Bollig, et al. (2009). This has practical implications: It is well-known that NFAs can be exponentially more succinct than DFAs. This is true also in the nominal setting. However, there are challenges in the extension that require particular care.
– Nominal NFAs are strictly more expressive than nominal DFAs.
We will show that the nominal version of NL∗ terminates for all nominal NFAs that have a corresponding nominal DFA and, more surprisingly, that it is capable of learning some languages that are not accepted by nominal DFAs.
– Language equivalence of nominal NFAs is undecidable. This does not affect the correctness proof, as it assumes a teacher which is able to answer equivalence queries accurately. For our implementation, we will describe heuristics that produce correct results in many cases.

For the learning algorithm, the power of non-determinism means that we can take some shortcuts during learning: If we want to make the table closed, we were previously required to find an equivalent row in the upper part; now we may find a sum of rows which, together, are equivalent to an existing row. This means that in some cases fewer rows will be added for closedness.

2 Preliminaries

We recall the notions of nominal sets, nominal automata and nominal regular languages. We refer to Bojańczyk, et al. (2014) for a detailed account.

Let 𝔸 be a countable set and let Perm(𝔸) be the set of permutations on 𝔸, i.e., the bijective functions π : 𝔸 → 𝔸. Permutations form a group where the identity permutation id is the unit element, inverse is functional inverse and multiplication is function composition.

A nominal set (Pitts, 2013) is a set X together with a function ⋅ : Perm(𝔸) × X → X, interpreting permutations over X. Such a function must be a group action of Perm(𝔸), i.e., it must satisfy id ⋅ x = x and π ⋅ (π′ ⋅ x) = (π ∘ π′) ⋅ x. We say that a finite A ⊂ 𝔸 supports x ∈ X whenever, for all π acting as the identity on A, we have π ⋅ x = x. In other words, permutations that only move elements outside A do not affect x. The support of x ∈ X, denoted supp(x), is the smallest finite set supporting x. We require nominal sets to have finite support, meaning that supp(x) exists for all x ∈ X.

The orbit of x, denoted orb(x), is the set of elements in X reachable from x via permutations, explicitly orb(x) = {π ⋅ x | π ∈ Perm(𝔸)}. We say that X is orbit-finite whenever it is a union of finitely many orbits.

Given a nominal set X, a subset Y ⊆ X is equivariant if it is preserved by permutations, i.e., π ⋅ y ∈ Y, for all y ∈ Y. In other words, Y is a union of some orbits of X. This definition extends to the notion of an equivariant relation R ⊆ X × Y, by setting π ⋅ (x, y) = (π ⋅ x, π ⋅ y), for (x, y) ∈ R; similarly for relations of greater arity.

The dimension of a nominal set X is the maximal size of supp(x), for any x ∈ X. Every orbit-finite set has finite dimension. We define 𝔸^(k) = {(a1, …, ak) | ai ≠ aj for i ≠ j}. For every single-orbit nominal set X with dimension k, there is a surjective equivariant map fX : 𝔸^(k) → X. This map can be used to get an upper bound for the number of orbits of X1 × ⋯ × Xn, for Xi a nominal set with li orbits and dimension ki. Suppose Oi is an orbit of Xi. Then we have a surjection

  fO1 × ⋯ × fOn : 𝔸^(k1) × ⋯ × 𝔸^(kn) → O1 × ⋯ × On,

showing that the codomain cannot have more orbits than the domain. Let f𝔸(k1, …, kn) denote the number of orbits of 𝔸^(k1) × ⋯ × 𝔸^(kn), for any finite sequence of natural numbers k1, …, kn. We can form at most l = l1l2⋯ln tuples of the form O1 × ⋯ × On, so X1 × ⋯ × Xn has at most l f𝔸(k1, …, kn) orbits.
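For two factors, this quantity can be computed explicitly in the equality symmetry: an orbit of 𝔸^(k1) × 𝔸^(k2) is determined by recording which position of the first tuple equals which position of the second, and since entries within each tuple are pairwise distinct, these identifications form a partial matching between the two position sets. This gives f𝔸(k1, k2) = Σr C(k1, r) C(k2, r) r!. A short Haskell sketch of this computation (ours; the closed form follows from the counting argument just given and is not taken from the chapter):

  -- Number of orbits of A^(k1) x A^(k2) under the equality symmetry:
  -- one orbit per partial matching between k1 and k2 positions.
  choose :: Integer -> Integer -> Integer
  choose n k = product [n - k + 1 .. n] `div` product [1 .. k]

  factorial :: Integer -> Integer
  factorial n = product [1 .. n]

  fA :: Integer -> Integer -> Integer
  fA k1 k2 = sum [ choose k1 r * choose k2 r * factorial r | r <- [0 .. min k1 k2] ]

  main :: IO ()
  main = do
    print (fA 1 1)  -- 2: the pairs (a, a) and (a, b) with a distinct from b
    print (fA 1 2)  -- 3: in (a, (b1, b2)), either a is fresh, a = b1, or a = b2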
For X single-orbit, the local symmetries are defined by the group

  {g ∈ Sk | fX(x1, …, xk) = fX(xg(1), …, xg(k)) for all (x1, …, xk) ∈ 𝔸^(k)},

where k is the dimension of X and Sk is the symmetric group of permutations over k distinct elements.

NFAs on sets have a finite state space. Nominal NFAs are defined analogously, with the requirement that the state space is orbit-finite and the transition relation is equivariant. A nominal NFA is a tuple (Q, A, Q0, F, δ), where:
– Q is an orbit-finite nominal set of states;
– A is an orbit-finite nominal alphabet;
– Q0, F ⊆ Q are equivariant subsets of initial and final states;
– δ ⊆ Q × A × Q is an equivariant transition relation.

A nominal DFA is a special case of nominal NFA where Q0 = {q0} and the transition relation is an equivariant function δ : Q × A → Q. Equivariance here can be rephrased as requiring δ(π ⋅ q, π ⋅ a) = π ⋅ δ(q, a). In most examples we take the alphabet to be A = 𝔸, but it can be any orbit-finite nominal set. For instance, A = Act × 𝔸, where Act is a finite set of actions, represents actions act(x) with one parameter x ∈ 𝔸 (actions with arity n can be represented via n-fold products of 𝔸).

A language ℒ is nominal regular if it is recognised by a nominal DFA. The theory of nominal regular languages recasts the classical one using nominal concepts. A nominal Myhill-Nerode-style syntactic congruence is defined: w, w′ ∈ A∗ are equivalent w.r.t. ℒ, written w ≡ℒ w′, whenever

  wv ∈ ℒ ⟺ w′v ∈ ℒ for all v ∈ A∗.

This relation is equivariant and the set of equivalence classes [w]ℒ is a nominal set.

Theorem 1. (Myhill-Nerode theorem for nominal sets by Bojańczyk, et al., 2014) Let ℒ be a nominal language. The following conditions are equivalent:
1. the set of equivalence classes of ≡ℒ is orbit-finite;
2. ℒ is recognised by a nominal DFA.

Unlike what happens for ordinary regular languages, nominal NFAs and nominal DFAs are not equi-expressive. Here is an example of a language accepted by a nominal NFA, but not by a nominal DFA:

  ℒeq = {a1⋯an | ai = aj for some 1 ≤ i < j ≤ n}.

In the theory of nominal regular languages, several problems are decidable: Language inclusion and the minimality test for nominal DFAs. Moreover, orbit-finite nominal sets can be finitely represented, and so can be manipulated by algorithms. This is the key idea underpinning our implementation.

2.1 Different Atom Symmetries

An important advantage of nominal set theory as considered by Bojańczyk, et al. (2014) is that it retains most of its properties when the structure of atoms 𝔸 is replaced with an arbitrary infinite relational structure subject to a few model-theoretic assumptions. An example alternative structure of atoms is the total order of rational numbers (ℚ, <), with the group of monotone bijections of ℚ taking the role of the group of all permutations. The theory of nominal automata remains similar, and an example nominal language over the atoms (ℚ, <) is:

  {a1⋯an | ai ≤ aj for some 1 ≤ i < j ≤ n}

which is recognised by a nominal DFA over those atoms. To simplify the presentation, in this chapter we concentrate on the "equality atoms" only. However, both the theory and the implementation can be generalised to other atom structures, with the "ordered atoms" (ℚ, <) as the simplest other example. We investigate the total order symmetry (ℚ, <) in Chapter 6.

3 Angluin's Algorithm for Nominal DFAs

In our algorithm, we will assume a teacher as described at the start of Section 1.
In particular, the teacher is able to answer membership queries and equivalence queries, now in the setting of nominal languages. We fix a target language ℒ, which is assumed to be a nominal regular language.

The learning algorithm for nominal automata, νL∗, will be very similar to L∗ in Algorithm 5.1. In fact, we only change the following lines:

  6′:   S ← S ∪ orb(s1a)
  11′:  E ← E ∪ orb(ae)
  16′:  S ← S ∪ pref(orb(t))        (5.1)

The basic data structure is an observation table (S, E, T) where S and E are orbit-finite subsets of A∗ and T : (S ∪ S⋅A) × E → 2 is an equivariant function defined by T(s, e) = ℒ(se) for each s ∈ S ∪ S⋅A and e ∈ E. Since T is determined by ℒ, we omit it from the notation. Let row : S ∪ S⋅A → 2^E denote the curried counterpart of T. Let u ∼ v denote the relation row(u) = row(v).

Definition 2. The table is called closed if for each t ∈ S⋅A there is an s ∈ S with t ∼ s. The table is called consistent if for each pair s1, s2 ∈ S with s1 ∼ s2 we have s1a ∼ s2a for all a ∈ A.

The above definitions agree with the abstract definitions given by Jacobs and Silva (2014) and we may use some of their results implicitly. The intuition behind the definitions is as follows. Closedness assures us that for each state we have a successor state for each input. Consistency assures us that each state has at most one successor for each input. Together they allow us to construct a well-defined minimal automaton from the observations in the table.

The algorithm starts with a trivial observation table and tries to make it closed and consistent by adding orbits of rows and columns, filling the table via membership queries. When the table is closed and consistent, it constructs a hypothesis automaton and poses an equivalence query.

The pseudocode for the nominal version is the same as listed in Algorithm 5.1, modulo the changes displayed in (5.1). However, we have to take care to ensure that all manipulations and tests on the (possibly) infinite sets S, E and A terminate in finite time. We refer to Bojańczyk, et al. (2014) and Pitts (2013) for the full details on how to represent these structures and provide a brief sketch here.

The sets S, E, A and S⋅A can be represented by choosing a representative for each orbit. The function T in turn can be represented by cells Ti,j : orb(si) × orb(ej) → 2 for each pair of representatives si and ej. Note, however, that the product of two orbits may consist of several orbits, so that Ti,j is not a single boolean value. Each cell is still orbit-finite and can be filled with only finitely many membership queries. Similarly, the curried function row can be represented by a finite structure.

To check whether the table is closed, we observe that if we have a corresponding row s ∈ S for some t ∈ S⋅A, this holds for any permutation of t. Hence it is enough to check the following: For all representatives t ∈ S⋅A there is a representative s ∈ S with row(t) = π ⋅ row(s) for some permutation π. Note that we only have to consider finitely many permutations, since the support is finite, and so we can decide this property. Furthermore, if the property does not hold, we immediately find a witness represented by t.

Consistency is a bit more complicated, but it is enough to consider the set of inconsistencies, {(s1, s2, a, e) | row(s1) = row(s2) ∧ row(s1a)(e) ≠ row(s2a)(e)}. It is an equivariant subset of S × S × A × E and so it is orbit-finite. Hence we can decide emptiness and obtain representatives if it is non-empty.
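For the equality atoms, choosing a representative for each orbit of words can be done by canonising: two words lie in the same orbit exactly when their letters follow the same equality pattern. A small Haskell sketch of this idea (our own; Int stands in for the atoms):

  import Data.List (elemIndex)
  import Data.Maybe (fromJust)

  type Atom = Int

  -- Replace every letter by the index of its first occurrence: the resulting
  -- pattern is invariant under permutations of the atoms and identifies the
  -- orbit of the word.
  canon :: [Atom] -> [Int]
  canon w = map (\a -> fromJust (elemIndex a (uniques w))) w
    where uniques = foldl (\seen a -> if a `elem` seen then seen else seen ++ [a]) []

  sameOrbit :: [Atom] -> [Atom] -> Bool
  sameOrbit u v = canon u == canon v

  main :: IO ()
  main = do
    print (canon [7, 3, 7])             -- [0,1,0]
    print (sameOrbit [7,3,7] [1,2,1])   -- True: both match the pattern xyx
    print (sameOrbit [7,3,7] [1,1,2])   -- False: xxy is a different orbit

Comparing rows up to a permutation π additionally uses (P2): it suffices to test the finitely many π that map the support of one representative onto the support of the other.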
Constructing the hypothesis happens in the same way as before (Section 1), where we note that the state space is orbit-finite, since it is a quotient of S. Moreover, the function row is equivariant, so all the structure (Q0, F and δ) is equivariant as well.

The representation given above is not the only way to represent nominal sets. For example, first-order definable sets can be used as well (Klin & Szynwelski, 2016). From now on we assume to have set-theoretic primitives so that each line in Algorithm 5.1 is well defined.

3.1 Correctness

To prove correctness we only have to prove that the algorithm terminates, that is, that only finitely many hypotheses will be produced. Correctness follows trivially from termination, since the last step of the algorithm is an equivalence query to the teacher inquiring whether a hypothesis automaton accepts the target language.

We start out by listing some facts about observation tables.

Lemma 3. The relation ∼ is an equivariant equivalence relation. Furthermore, for all u, v ∈ S we have that u ≡ℒ v implies u ∼ v.

This lemma implies that at any stage of the algorithm the number of orbits of S/∼ does not exceed the number of orbits of the minimal acceptor with state space A∗/≡ℒ (recall that ≡ℒ is the nominal Myhill-Nerode equivalence relation). Moreover, the following lemma shows that the dimension of the state space never exceeds the dimension of the minimal acceptor. Recall that the dimension is the maximal size of the support of any state, which is different from the number of orbits.

Lemma 4. We have supp([u]∼) ⊆ supp([u]≡ℒ) ⊆ supp(u) for all u ∈ S.

Lemma 5. The automaton constructed from a closed and consistent table is minimal.

Proof. Follows from the categorical perspective by Jacobs and Silva (2014). □

We note that the constructed automaton is consistent with the table (here we use that the set S is prefix-closed and E is suffix-closed (Angluin, 1987)). The following lemma shows that there are no strictly "smaller" automata consistent with the table. So the automaton is not just minimal, it is minimal w.r.t. the table.

Lemma 6. Let H be the automaton associated with a closed and consistent table (S, E). If M′ is an automaton consistent with (S, E) (meaning that se ∈ ℒ(M′) ⟺ se ∈ ℒ(H) for all s ∈ S ∪ S⋅A and e ∈ E) and M′ has at most as many orbits as H, then there is a surjective map f : QM′ → QH. If moreover
– the dimension of M′ is bounded by the dimension of H, i.e., supp(m) ⊆ supp(f(m)) for all m ∈ QM′, and
– M′ has no fewer local symmetries than H, i.e., π ⋅ f(m) = f(m) implies π ⋅ m = m for all m ∈ QM′,
then f defines an isomorphism M′ ≅ H of nominal DFAs.

Proof. (All maps in this proof are equivariant.) Define a map row′ : QM′ → 2^E by restricting the language map QM′ → 2^(A∗) to E. First, observe that row′(δ′(q′0, s)) = row(s) for all s ∈ S ∪ S⋅A, since ϵ ∈ E and M′ is consistent with the table. Second, we have {row′(δ′(q′0, s)) | s ∈ S} ⊆ {row′(q) | q ∈ QM′}. Let n be the number of orbits of H. The former set has n orbits by the first observation, the latter set has at most n orbits by assumption. We conclude that the two sets (both being equivariant) must be equal. That means that for each q ∈ QM′ there is an s ∈ S such that row′(q) = row(s). We see that row′ : QM′ → {row′(δ′(q′0, s)) | s ∈ S} = QH is a surjective map. Since a surjective map cannot increase the dimensions of orbits and the dimensions of M′ are bounded, we note that the dimensions of the orbits in H and M′ have to agree.
Similarly, surjective maps preserve local symmetries. This map must hence be an isomorphism of nominal sets. Note that row′(q) = row′(δ′(q′0, s)) implies q = δ′(q′0, s). It remains to prove that it respects the automaton structures. It preserves the initial state: row′(q′0) = row′(δ′(q′0, ϵ)) = row(ϵ). Now let q ∈ QM′ be a state and s ∈ S such that row′(q) = row(s). It preserves final states: q ∈ F′ ⟺ row′(q)(ϵ) = 1 ⟺ row(s)(ϵ) = 1. Finally, it preserves the transition structure:

  row′(δ′(q, a)) = row′(δ′(δ′(q′0, s), a)) = row′(δ′(q′0, sa)) = row(sa) = δ(row(s), a). □

The above proof is an adaptation of Angluin's proof for automata over sets. We will now prove termination of the algorithm by proving that all steps are productive.

Theorem 7. The algorithm terminates and is hence correct.

Proof. Provided that the if-statements and set operations terminate, we are left to prove that the algorithm adds (orbits of) rows and columns only finitely often.

We start by proving that a table can be made closed and consistent in finite time. If the table is not closed, we find a row s1 ∈ S⋅A such that row(s1) ≠ row(s) for all s ∈ S. The algorithm then adds the orbit containing s1 to S. Since s1 was nonequivalent to all rows, we find that (S ∪ orb(s1))/∼ has strictly more orbits than S/∼. Since the orbits of S/∼ cannot be more than those of A∗/≡ℒ, this happens finitely often.

Columns are added in case of an inconsistency. Here the algorithm finds two elements s1, s2 ∈ S with row(s1) = row(s2) but row(s1ae) ≠ row(s2ae) for some a ∈ A and e ∈ E. Adding ae to E will ensure that row′(s1) ≠ row′(s2) (row′ is the function belonging to the updated observation table). If the two elements row′(s1), row′(s2) are in different orbits, the number of orbits is increased. If they are in the same orbit, we have row′(s2) = π ⋅ row′(s1) for some permutation π. Using row(s1) = row(s2) and row′(s1) ≠ row′(s2) we have:

  row(s1) = π ⋅ row(s1)
  row′(s1) ≠ π ⋅ row′(s1)

Consider all such π and suppose there is a π and x ∈ supp(row(s1)) such that π ⋅ x ∉ supp(row(s1)). Then we find that π ⋅ x ∈ supp(row′(s1)), and so the support of the row has grown. By Lemma 4 this happens finitely often. Suppose such π and x do not exist; then we consider the finite group R = {ρ|supp([s1]∼) | row(s1) = ρ ⋅ row(s1)}. We see that {ρ|supp([s1]∼) | row′(s1) = ρ ⋅ row′(s1)} is a proper subgroup of R. So, adding a column in this case decreases the size of the group R, which can happen only finitely often. In this case a local symmetry is removed.

In short, the algorithm will succeed in producing a hypothesis in each round. It remains to prove that it needs only finitely many equivalence queries. Let (S, E) be the closed and consistent table and H its corresponding hypothesis. If it is incorrect, then a second hypothesis H′ will be constructed which is consistent with the old table (S, E). The two hypotheses are nonequivalent, as H′ will handle the counterexample correctly and H does not. Therefore, H′ will have at least one orbit more, one local symmetry less, or one orbit with strictly bigger dimension (Lemma 6), all of which can only happen finitely often. □

We remark that all the lemmas and proofs above are close to the original ones of Angluin. However, two things are crucially different. First, adding a column does not always increase the number of (orbits of) states. It can happen that by adding a column a bigger support is found or that a local symmetry is broken.
Second, the new hypothesis does not necessarily have more states; again, it might have bigger dimensions or fewer local symmetries. From the proof of Theorem 7 we observe moreover that the way we handle counterexamples is not crucial. Any other method which ensures a nonequivalent hypothesis will work. In particular, our algorithm is easily adapted to include optimisations such as the ones by Maler and Pnueli (1995) and Rivest and Schapire (1993), where counterexamples are added as columns.17

17 The additional optimisation of omitting the consistency check (Rivest & Schapire, 1993) cannot be done: we always add a whole orbit to S (to keep the set equivariant) and inconsistencies can arise within an orbit.

[Figure 5.1: An example automaton to be learnt, with states q0, q1,x and q2,x,y, where x, y, z denote distinct atoms, and three subsequent symbolic observation tables T1, T2 and T3 computed by νL∗. The tables have rows ϵ, a, ab, aa, aba, abb, abc; T1 has the single column ϵ, with entries 0, 0, 1, 0, 0, 0, 1; T2 extends it with the column orbit a′, and T3 further with the column orbit b′a′, where entries are given by case distinctions on the atoms, e.g., row(a)(a′) depends on whether a′ equals a.]
Figure 5.1 Example automaton to be learnt and three subsequent tables computed by νL∗. In the automaton, x, y, z denote distinct atoms.

3.2 Example

Consider the target automaton in Figure 5.1 and an observation table T1 at some stage during the algorithm. We remind the reader that the table is represented in a symbolic way: The sequences in the rows and columns stand for whole orbits, and the cells denote functions from the product of the orbits to 2. Since the cells can consist of multiple orbits, where each orbit is allowed to have a different value, we use a formula to specify which orbits have a 1.

The table T1 has to be checked for closedness and consistency. We note that it is definitely closed. For consistency we check the rows row(ϵ) and row(a), which are equal. Observe, however, that row(ϵb)(ϵ) = 0 and row(ab)(ϵ) = 1, so we have an inconsistency. The algorithm adds the orbit orb(b) as a column and extends the table, obtaining T2. We note that, in this process, the number of orbits did grow, as the two rows are split. Furthermore, we see that both row(a) and row(ab) have empty support in T1, but not in T2, because row(a)(a′) depends on a′ being equal to or different from a, and similarly for row(ab)(a′).

The table T2 is still not consistent, as we see that row(ab) = row(ba) but row(abb)(c) = 1 and row(bab)(c) = 0. Hence the algorithm adds the columns orb(bc), obtaining table T3. We note that in this case no new orbits are obtained and no support has grown. In fact, the only change here is that the local symmetry between row(ab) and row(ba) is removed. This last table, T3, is closed and consistent and will produce the correct hypothesis.

3.3 Query Complexity

In this section, we will analyse the number of queries made by the algorithm in the worst case. Let M be the minimal target automaton with n orbits and of dimension k. We will use log in base two.

Lemma 8. The number of equivalence queries En,k is 𝒪(nk log k).

Proof. By Lemma 6 each hypothesis will be either 1) bigger in the number of orbits, which is bounded by n, or 2) bigger in the dimension of an orbit, which is bounded by k, or 3) smaller in the local symmetries of an orbit. For the last part we want to know how long a subgroup series of the permutation group Sk can be.
This is bounded by log(k!): the order of each subgroup divides the order of the group, so each step in the series at least halves the order. Since log(k!) ≤ k log k, one can take a proper subgroup at most k log k times when starting with Sk.18

18 After publication we found a better bound by Cameron, et al. (1989): the length of the longest chain of subgroups of Sk is ⌈3k/2⌉ − b(k) − 1, where b(k) is the number of ones in the binary representation of k. This gives a linear bound in k, instead of the 'linearithmic' bound.

Since the hypothesis will grow monotonically in the number of orbits and each orbit will grow monotonically w.r.t. the remaining two dimensions, the number of equivalence queries is bounded by n + n(k + k log k). □

Next we will give a bound for the size of the table.

Lemma 9. The table has at most n + mEn,k orbits in S, with sequences of length at most n + m, where m is the length of the longest counterexample given by the teacher. The table has at most n(k + k log k + 1) orbits in E, with sequences of length at most n(k + k log k + 1).

Proof. In the termination proof we noted that rows are added at most n times. In addition, (all prefixes of) counterexamples are added as rows, which adds another mEn,k rows. Obviously, counterexamples are of length at most m and are extended at most n times, making the length at most m + n in the worst case.

For columns we note that one of three dimensions approaches a bound, similarly to the proof of Lemma 8. So at most n(k + k log k + 1) columns are added. Since they are suffix-closed, the length is at most n(k + k log k + 1). □

Let p and l denote respectively the dimension and the number of orbits of A.

Lemma 10. The number of orbits in the lower part of the table, S⋅A, is bounded by (n + mEn,k) l f𝔸(p(n + m), p).

Proof. Any sequence in S is of length at most n + m, so it contains at most p(n + m) distinct atoms. When we consider S⋅A, the extension can either reuse atoms from those p(n + m), or use none of them. Since the extra letter has at most p distinct atoms, the set 𝔸^(p(n+m)) × 𝔸^(p) gives a bound f𝔸(p(n + m), p) for the number of orbits of OS × OA, with OX an orbit of X. Multiplying by the number of such ordered pairs, namely (n + mEn,k) l, gives a bound for S⋅A. □

Let Cn,k,m = (n + mEn,k)(l f𝔸(p(n + m), p) + 1) n(k + k log k + 1) be the maximal number of cells in the table. We note that this number is polynomial in k, l, m and n, but it is not polynomial in p.

Corollary 11. The number of membership queries is bounded by Cn,k,m f𝔸(p(n + m), pn(k + k log k + 1)).

4 Learning Non-Deterministic Nominal Automata

In this section, we introduce a variant of νL∗, which we call νNL∗, where the learnt automaton is non-deterministic. It will be based on the NL∗ algorithm by Bollig, et al. (2009), an Angluin-style algorithm for learning NFAs. The algorithm is shown in Algorithm 5.2. We first illustrate NL∗, then we discuss its extension to nominal automata.

NL∗ crucially relies on the use of residual finite-state automata (RFSA) (Denis, et al., 2002), which are NFAs admitting unique minimal canonical representatives. The states of this automaton correspond to Myhill-Nerode right-congruence classes, but can be exponentially smaller than the corresponding minimal DFA: Composed states, language-equivalent to sets of other states, can be dropped.
   1  S, E ← {ϵ}
   2  repeat
   3      while (S, E) is not RFSA-closed or not RFSA-consistent do
   4          if (S, E) is not RFSA-closed then
   5              find s ∈ S, a ∈ A such that row(sa) ∈ PR(S, E) ∖ PR⊤(S, E)
   6              S ← S ∪ {sa}
   7          end if
   8          if (S, E) is not RFSA-consistent then
   9              find s1, s2 ∈ S, a ∈ A, and e ∈ E such that row(s1) ⊑ row(s2)
  10                  and ℒ(s1ae) = 1, ℒ(s2ae) = 0
  11              E ← E ∪ {ae}
  12          end if
  13      end while
  14      Make the conjecture N(S, E)
  15      if the Teacher replies no, with a counter-example t then
  16          E ← E ∪ suff(t)
  17      end if
  18  until the Teacher replies yes to the conjecture N(S, E)
  19  return N(S, E)

Algorithm 5.2 Algorithm for learning NFAs by Bollig, et al. (2009).

The algorithm NL∗ equips the observation table (S, E) with a union operation, allowing for the detection of composed and prime rows.

Definition 12. Let (row(s1) ⊔ row(s2))(e) = row(s1)(e) ∨ row(s2)(e) (regarding cells as booleans). This operation induces an ordering between rows: row(s1) ⊑ row(s2) whenever row(s1)(e) = 1 implies row(s2)(e) = 1, for all e ∈ E. A row row(s) is composed if row(s) = row(s1) ⊔ ⋯ ⊔ row(sn), for rows row(si) ≠ row(s). Otherwise it is prime. We denote by PR⊤(S, E) the rows in the top part of the table (ranging over S) which are prime w.r.t. the whole table (not only w.r.t. the top part). We write PR(S, E) for all the prime rows of (S, E).

As in L∗, states of hypothesis automata will be rows of (S, E) but, as the aim is to construct a minimal RFSA, only prime rows are picked. New notions of closedness and consistency are introduced, to reflect features of RFSAs.

Definition 13. A table (S, E) is:
– RFSA-closed if, for all t ∈ S⋅A, row(t) = ⨆{row(s) ∈ PR⊤(S, E) | row(s) ⊑ row(t)};
– RFSA-consistent if, for all s1, s2 ∈ S and a ∈ A, row(s1) ⊑ row(s2) implies row(s1a) ⊑ row(s2a).

If (S, E) is not RFSA-closed, then there is a row in the bottom part of the table which is prime, but not contained in the top part. This row is then added to S (line 6). If (S, E) is not RFSA-consistent, then there is a suffix which does not preserve the containment of two existing rows, so those rows are actually incomparable. A new column is added to distinguish those rows (line 11). Notice that counterexamples supplied by the teacher are added to columns (line 16). Indeed, it is shown by Bollig, et al. (2009) that treating the counterexamples as in the original L∗, namely adding them to rows, does not lead to a terminating algorithm.

Definition 14. Given an RFSA-closed and RFSA-consistent table (S, E), the conjecture automaton is N(S, E) = (Q, Q0, F, δ), where:
– Q = PR⊤(S, E);
– Q0 = {r ∈ Q | r ⊑ row(ϵ)};
– F = {r ∈ Q | r(ϵ) = 1};
– the transition relation is given by δ(row(s), a) = {r ∈ Q | r ⊑ row(sa)}.

As observed by Bollig, et al. (2009), N(S, E) is not necessarily an RFSA, but it is a canonical RFSA if it is consistent with (S, E). If the algorithm terminates, then N(S, E) must be consistent with (S, E), which ensures correctness. The termination argument is more involved than that of L∗, but it still relies on the minimal DFA.

Developing an algorithm to learn nominal NFAs is not an obvious extension of NL∗: Non-deterministic nominal languages strictly contain nominal regular languages, so it is not clear what the developed algorithm should be able to learn. To deal with this, we introduce a nominal notion of RFSAs. They are a proper subclass of nominal NFAs, because they recognise nominal regular languages. Nonetheless, they are more succinct than nominal DFAs.
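When rows are finite bit-vectors, as in the finite-alphabet case, the operations of Definition 12 are immediate to implement. The following Haskell sketch (ours) implements ⊔ and ⊑, together with a naive primality test that compares a row against the join of all rows strictly below it:

  type Row = [Bool]

  join :: Row -> Row -> Row
  join = zipWith (||)

  -- r1 `below` r2 is the containment r1 ⊑ r2, i.e. pointwise implication.
  below :: Row -> Row -> Bool
  below r1 r2 = and (zipWith (\x y -> not x || y) r1 r2)

  -- A row is composed if it equals the join of the other rows strictly
  -- below it; otherwise it is prime.
  prime :: [Row] -> Row -> Bool
  prime rows r = r /= foldr join bottom [ r' | r' <- rows, r' `below` r, r' /= r ]
    where bottom = map (const False) r

  main :: IO ()
  main = do
    let rows = [[True, False], [False, True], [True, True]]
    print (map (prime rows) rows)
    -- [True,True,False]: the row [T,T] is the join of [T,F] and [F,T]

In the nominal setting the same definitions are applied orbit-wise; the difficulty, addressed in Section 4.2, is that the set of component rows below a given row may be infinite.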
4.1 Nominal Residual Finite-State Automata

Let ℒ be a nominal language and u be a finite string. The derivative of ℒ w.r.t. u is u⁻¹ℒ = {v ∈ A∗ | uv ∈ ℒ}. A language ℒ′ ⊆ 𝔸∗ is a residual of ℒ if there is a u with ℒ′ = u⁻¹ℒ. Note that a residual might not be equivariant, but it does have a finite support. We write R(ℒ) for the set of residuals of ℒ. Residuals form an orbit-finite nominal set: They are in bijection with the state space of the minimal nominal DFA for ℒ.

A nominal residual finite-state automaton for ℒ is a nominal NFA whose states are subsets of such a minimal automaton. Given a state q of an automaton, we write ℒ(q) for the set of words leading from q to a set of states containing a final one.

Definition 15. A nominal residual finite-state automaton (nominal RFSA) is a nominal NFA 𝒜 such that ℒ(q) ∈ R(ℒ(𝒜)), for all q ∈ Q𝒜.

Intuitively, all states of a nominal RFSA recognise residuals, but not all residuals are recognised by a single state: There may be a residual ℒ′ and a set of states Q′ such that ℒ′ = ⋃q∈Q′ ℒ(q), but no state q′ is such that ℒ(q′) = ℒ′. A residual ℒ′ is called composed if it is equal to the union of the components it strictly contains, explicitly

  ℒ′ = ⋃{ℒ″ ∈ R(ℒ) | ℒ″ ⊊ ℒ′};

otherwise it is called prime. In an ordinary RFSA, composed residuals have finitely many components. This is not the case in a nominal RFSA. However, the set of components of ℒ′ always has a finite support, namely supp(ℒ′). The set of prime residuals PR(ℒ) is an orbit-finite nominal set, and can be used to define a canonical nominal RFSA for ℒ, which has the minimal number of states and the maximal number of transitions. This can be regarded as obtained from the minimal nominal DFA, by removing composed states and adding all initial states and transitions that do not change the recognised language. This automaton is necessarily unique.

Lemma 16. The canonical nominal RFSA of ℒ is (Q, Q0, F, δ) such that:
– Q = PR(ℒ);
– Q0 = {ℒ′ ∈ Q | ℒ′ ⊆ ℒ};
– F = {ℒ′ ∈ Q | ϵ ∈ ℒ′};
– δ(ℒ1, a) = {ℒ2 ∈ Q | ℒ2 ⊆ a⁻¹ℒ1}.
It is a well-defined nominal NFA accepting ℒ.

4.2 νNL∗

Our nominal version of NL∗ again makes use of an observation table (S, E) where S and E are equivariant subsets of A∗ and row is an equivariant function. As in the basic algorithm, we equip (S, E) with a union operation ⊔ and a row containment relation ⊑, defined as in Definition 12. It is immediate to verify that ⊔ and ⊑ are equivariant. Our algorithm is a simple modification of Algorithm 5.2, where a few lines are replaced:

  6′:   S ← S ∪ orb(sa)
  11′:  E ← E ∪ orb(ae)
  16′:  E ← E ∪ suff(orb(t))

Switching to nominal sets, several decidability issues arise. The most critical one is that rows may be the union of infinitely many component rows, as happens for residuals of nominal languages, so finding all such components can be challenging. We adapt the notion of composed to rows: row(t) is composed whenever

  row(t) = ⨆{row(s) | row(s) ⊏ row(t)},

where ⊏ is strict row inclusion; otherwise row(t) is prime. We now check that three relevant parts of our algorithm terminate.

1. Row containment check. The basic containment check row(s) ⊑ row(t) is decidable, as row(s) and row(t) are supported by the finite supports of s and t respectively.

2. RFSA-Closedness and RFSA-Consistency Checks. (Line 3) We first show that prime rows form orbit-finite nominal sets.

Lemma 17. PR(S, E), PR⊤(S, E) and PR(S, E) ∖ PR⊤(S, E) are orbit-finite nominal sets.

Consider now RFSA-closedness.
It requires computing the set C(row(t)) of components of row(t) contained in PR⊤(S, E) (possibly including row(t)). This set may not be equivariant under all permutations in Perm(𝔸), but it is equivariant under a suitable subgroup.

Lemma 18. The set C(row(t)) has the following properties:
– supp(C(row(t))) ⊆ supp(row(t));
– it is equivariant and orbit-finite under the action of the group Gt = {π ∈ Perm(𝔸) | π|supp(row(t)) = id} of permutations fixing supp(row(t)).

We established that C(row(t)) can be effectively computed, and the same holds for ⨆ C(row(t)). In fact, ⨆ is equivariant w.r.t. the whole of Perm(𝔸) and then, in particular, w.r.t. Gt, so it preserves orbit-finiteness. Now, to check row(t) = ⨆ C(row(t)), we can just pick one representative of every orbit of S⋅A, because we have C(π ⋅ row(t)) = π ⋅ C(row(t)) and permutations distribute over ⊔, so permuting both sides of the equation gives again a valid equation.

For RFSA-consistency, consider the two sets

  N = {(s1, s2) ∈ S × S | row(s1) ⊑ row(s2)},
  M = {(s1, s2) ∈ S × S | ∀a ∈ A : row(s1a) ⊑ row(s2a)}.

They are both orbit-finite nominal sets, by equivariance of row, ⊑ and A. We can check RFSA-consistency in finite time by picking orbit representatives from N and M. For each representative n ∈ N, we look for a representative m ∈ M and a permutation π such that n = π ⋅ m. If no such m and π exist, then n does not belong to any orbit of M, so it violates RFSA-consistency.

3. Finding Witnesses for Violations. (Lines 5 and 9) We can find witnesses by comparing orbit representatives of orbit-finite sets, as we did with RFSA-consistency. Specifically, we can pick representatives in S × A and S × S × A × E and check them against the following orbit-finite nominal sets:
– {(s, a) ∈ S × A | row(sa) ∈ PR(S, E) ∖ PR⊤(S, E)};
– {(s1, s2, a, e) ∈ S × S × A × E | row(s1a)(e) = 1, row(s2a)(e) = 0, row(s1) ⊑ row(s2)}.

4.3 Correctness

Now we prove correctness and termination of the algorithm. First, we prove that hypothesis automata are nominal NFAs.

Lemma 19. The hypothesis automaton N(S, E) (see Definition 14) is a nominal NFA.

N(S, E), as in ordinary NL∗, is not always a nominal RFSA. However, we have the following.

Theorem 20. If the table (S, E) is RFSA-closed, RFSA-consistent and N(S, E) is consistent with (S, E), then N(S, E) is a canonical nominal RFSA.

This is proved by Bollig, et al. (2009) for ordinary RFSAs, using the standard theory of regular languages. The nominal proof is exactly the same, using derivatives of nominal regular languages and nominal RFSAs as defined in Section 4.1.

Lemma 21. The table (S, E) cannot have more than n orbits of distinct rows, where n is the number of orbits of the minimal nominal DFA for the target language.

Proof. Rows are residuals of ℒ, which are states of the minimal nominal DFA for ℒ, so there cannot be more than n orbits. □

Theorem 22. The algorithm νNL∗ terminates and returns the canonical nominal RFSA for ℒ.

Proof. If the algorithm terminates, then it must return the canonical nominal RFSA for ℒ by Theorem 20. We prove that a table can be made RFSA-closed and RFSA-consistent in finite time. This is similar to the proof of Theorem 7 and is inspired by the proof of Theorem 2 of Bollig, et al. (2009).

If the table is not RFSA-closed, we find a row s ∈ S⋅A such that row(s) ∈ PR(S, E) ∖ PR⊤(S, E). The algorithm then adds orb(s) to S.
Since s was nonequivalent to all upper prime rows, and thus to all the rows indexed by S, we find that (S ∪ orb(s))/∼ has strictly more orbits than S/∼ (recall that s ∼ t ⟺ row(s) = row(t)). This addition can only be done finitely many times, because the number of orbits of S/∼ is bounded, by Lemma 21.

Now, the case of RFSA-consistency needs some additional notions. Let R be the (orbit-finite) nominal set of all rows, and let I = {(r, r′) ∈ R × R | r ⊏ r′} be the set of all inclusion relations among rows. The set I is orbit-finite. In fact, consider

  J = {(s, t) ∈ (S ∪ S⋅A) × (S ∪ S⋅A) | row(s) ⊏ row(t)}.

This set is an equivariant, thus orbit-finite, subset of (S ∪ S⋅A) × (S ∪ S⋅A). The set I is the image of J via row × row, which is equivariant, so it preserves orbit-finiteness.

Now, suppose the algorithm finds two elements s1, s2 ∈ S with row(s1) ⊑ row(s2) but row(s1a)(e) = 1 and row(s2a)(e) = 0 for some a ∈ A and e ∈ E. Adding a column to fix RFSA-consistency may: (C1) increase the orbits of (S ∪ S⋅A)/∼; (C2) decrease the orbits of I; or (C3) decrease the local symmetries or increase the dimension of one orbit of rows. In fact, if no new rows are added (C1), we have two cases.
– If row(s1) ⊏ row(s2), i.e., (row(s1), row(s2)) ∈ I, then row′(s1) ⊏ row′(s2) no longer holds, where row′ is the new table. Therefore the orbit of (row′(s1), row′(s2)) is not in I. Moreover, row′(s) ⊏ row′(t) implies row(s) ⊏ row(t) (as no new rows are added), so no new pairs are added to I. Overall, I has fewer orbits (C2).
– If row(s1) = row(s2), then we must have row(s1) = π ⋅ row(s1), for some π, because lines 4–7 forbid equal rows in different orbits. In this case row′(s1) ≠ π ⋅ row′(s1) and we can use part of the proof of Theorem 7 to see that the orbit of row′(s1) has bigger dimension or fewer local symmetries than that of row(s1) (C3).

Orbits of (S ∪ S⋅A)/∼ and of I are finitely many, by Lemma 21 and what we proved above. Moreover, local symmetries can decrease finitely many times, and the dimension of each orbit of rows is bounded by the dimension of the minimal DFA state space. Therefore all the above changes can happen finitely many times. We have proved that the table eventually becomes RFSA-closed and RFSA-consistent.

Now we prove that a finite number of equivalence queries is needed to reach the final hypothesis automaton. To do this, we cannot use a suitable version of Lemma 6, because this relies on N(S, E) being consistent with (S, E), which in general is not true (see Bollig, et al., 2008, for an example of this). We can, however, use an argument similar to that for RFSA-consistency, because the algorithm adds columns in response to counterexamples. Let w be the counterexample provided by the teacher. When line 16′ is executed, the table must change. In fact, by Lemma 2 of Bollig, et al. (2009), if it does not, then w is already correctly classified by N(S, E), which is absurd. We have the following cases. (E1) The orbits of (S ∪ S⋅A)/∼ increase (C1). Or (E2) either the orbits in PR(S, E) increase, or any of the following happens: The orbits in I decrease (C2), or the local symmetries/dimension of an orbit of rows change (C3). In fact, if E1 does not happen and PR(S, E), I and the local symmetries/dimension of orbits of rows do not change, the automaton 𝒜 for the new table coincides with N(S, E). But N(S, E) = 𝒜 is a contradiction, because 𝒜 correctly classifies w (by Lemma 2 of Bollig, et al. (2009), as w now belongs to the columns), whereas N(S, E) does not. Both E1 and E2 can only happen finitely many times. □
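The orbit comparisons appearing in these proofs and in the checks of Section 4.2, deciding whether n = π ⋅ m for some permutation π, again reduce, for the equality atoms, to comparing equality patterns, this time of tuples of words. A self-contained Haskell sketch (our own, extending the canonisation idea shown earlier to pairs):

  import Data.List (elemIndex)
  import Data.Maybe (fromJust)

  type Atom = Int

  canon :: [Atom] -> [Int]
  canon w = map (\a -> fromJust (elemIndex a (uniques w))) w
    where uniques = foldl (\seen a -> if a `elem` seen then seen else seen ++ [a]) []

  -- Two pairs of words are in the same orbit iff the words have matching
  -- lengths and their concatenations have the same equality pattern.
  sameOrbit :: ([Atom], [Atom]) -> ([Atom], [Atom]) -> Bool
  sameOrbit (u1, u2) (v1, v2) =
    (length u1, length u2) == (length v1, length v2)
      && canon (u1 ++ u2) == canon (v1 ++ v2)

  -- Representatives of N lying in no orbit of M witness a consistency violation.
  violations :: [([Atom], [Atom])] -> [([Atom], [Atom])] -> [([Atom], [Atom])]
  violations repsN repsM = [ n | n <- repsN, not (any (sameOrbit n) repsM) ]

  main :: IO ()
  main = print (violations [([1], [2])] [([3], [4]), ([5], [5])])
    -- []: the orbit of a pair of distinct letters does occur among the M-representatives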
4.4 Query Complexity

We now give bounds for the number of equivalence and membership queries needed by νNL∗. Let n be the number of orbits of the minimal DFA M for the target language and let k be the dimension (i.e., the size of the maximum support) of its nominal set of states.

Lemma 23. The number of equivalence queries E′n,k is 𝒪(n² f𝔸(k, k) + nk log k).

Proof. In the proof of Theorem 22, we saw that equivalence queries lead to more orbits in (S ∪ S⋅A)/∼ or in PR(S, E), to fewer orbits in I, or to fewer local symmetries/bigger dimension for an orbit. Clearly the first two can happen at most n times. We now estimate how many times I can decrease. Suppose (S ∪ S⋅A)/∼ has d orbits and ℎ orbits are added to it. Recall that, given an orbit O of rows of dimension at most m, f𝔸(m, m) is an upper bound for the number of orbits in the product O × O. Since the support of rows is bounded by k, we can give a bound for the number of orbits added to I: dℎ f𝔸(k, k), for new pairs r ⊏ r′ with r in a new orbit of rows and r′ in an old one (or vice versa); plus (ℎ(ℎ − 1)/2) f𝔸(k, k), for r and r′ both in (distinct) new orbits; plus ℎ f𝔸(k, k), for r and r′ in the same new orbit. Notice that, if PR(S, E) grows but (S ∪ S⋅A)/∼ does not, I does not increase. By Lemma 21, ℎ, d ≤ n, so I cannot decrease more than (n² + n(n − 1)/2 + n) f𝔸(k, k) times. Local symmetries of an orbit of rows can decrease at most k log k times (see the proof of Lemma 8), and its dimension can increase at most k times. Therefore n(k + k log k) is a bound for all the orbits of rows, which are at most n, by Lemma 21. Summing up, we get the main result. □

Lemma 24. Let m be the length of the longest counterexample given by the teacher. Then the table has:
– at most n orbits in S, with words of length at most n;
– at most mE′n,k orbits in E, with words of length at most mE′n,k.

Proof. By Lemma 21, the number of orbits of rows indexed by S is at most n. Now, notice that line 6′ does not add orb(sa) to S if sa ∈ S, and lines 16′ and 11′ cannot identify rows, so S has at most n orbits. The length of the longest word in S must be at most n, as S = {ϵ} when the algorithm starts, and line 6′ adds words with one additional symbol compared to those in S.

For columns, we note that both fixing RFSA-consistency and adding counterexamples increase the number of columns, but this can happen at most E′n,k times (see the proof of Lemma 23). Each time at most m suffixes are added to E. □

We compute the maximum number of cells as in Section 3.3.

Lemma 25. The number of orbits in the lower part of the table, S⋅A, is bounded by nl f𝔸(pn, p). Then C′n,k,m = n(l f𝔸(pn, p) + 1) mE′n,k is the maximal number of cells in the table. This bound is polynomial in n, m and l, but not in k and p.

Corollary 26. The number of membership queries is bounded by C′n,k,m f𝔸(pn, pmE′n,k).

5 Implementation and Preliminary Experiments

Our algorithms for learning nominal automata operate on infinite sets of rows and columns, and hence it is not immediately clear how to actually implement them on a computer. We have used NLambda, a recently developed Haskell library by Klin and Szynwelski (2016) designed to allow direct manipulation of infinite (but orbit-finite) nominal sets, within the functional programming paradigm. The semantics of NLambda is based on the work of Bojańczyk, et al. (2012), and the library itself is inspired by Fresh O'Caml by Shinwell (2006), a language for functional programming over nominal data structures with binding.
5.1 NLambda

NLambda extends Haskell with a new type Atoms. Values of this type are atomic values that can be compared for equality and have no other discernible structure. They correspond to the elements of the infinite alphabet 𝔸 described in Section 2.

Furthermore, NLambda provides a unary type constructor Set. This appears similar to the Data.Set type constructor from the standard Haskell library, but its semantics is markedly different: Whereas the latter is used to construct finite sets, the former has orbit-finite sets as values. The new constructor Set can be applied to a range of equality types that include Atoms, but also the tuple type (Atoms, Atoms), the list type [Atoms], the set type Set Atoms, and other types that provide the basic infrastructure necessary to speak of supports and orbits. All these are instances of a type class NominalType specified in NLambda for this purpose.

NLambda, in addition to all the standard machinery of Haskell, offers primitives to manipulate values of any nominal types τ, σ:
– empty : Set τ, returns the empty set of any type;
– atoms : Set Atoms, returns the (infinite but single-orbit) set of all atoms;
– insert : τ → Set τ → Set τ, adds an element to a set;
– map : (τ → σ) → (Set τ → Set σ), applies a function to every element of a set;
– sum : Set (Set τ) → Set τ, computes the union of a family of sets;
– isEmpty : Set τ → Formula, checks whether a set is empty.

The type Formula has the role of a Boolean type. For technical reasons, it is distinct from the standard Haskell type Bool, but it provides standard logical operations, e.g., not : Formula → Formula and or : Formula → Formula → Formula, as well as a conditional operator ite : Formula → τ → τ → τ that mimics the standard if-then-else construction. It is also the result type of a built-in equality test on atoms: eq : Atoms → Atoms → Formula.

Using these primitives, one builds more functions to operate on orbit-finite sets, such as a function to build singleton sets:

  singleton : τ → Set τ
  singleton x = insert x empty

or a filtering function to select elements that satisfy a given predicate:

  filter : (τ → Formula) → Set τ → Set τ
  filter p s = sum (map (λx. ite (p x) (singleton x) empty) s)

or functions to quantify a predicate over a set:

  exists, forall : (τ → Formula) → Set τ → Formula
  exists p s = not (isEmpty (filter p s))
  forall p s = isEmpty (filter (λx. not (p x)) s)

and so on. Note that these functions are written in exactly the same way as they would be for finite sets and the standard Data.Set type. This is not an accident, and indeed the programmer can use the convenient set-theoretic intuition of NLambda primitives. For example, one could conveniently construct various orbit-finite sets, such as the set of all pairs of atoms:

  atomPairs = sum (map (λx. map (λy. (x, y)) atoms) atoms)

or the set of all pairs of distinct atoms:

  distPairs = filter (λ(x, y). not (eq x y)) atomPairs

and so on.

It should be stressed that all these constructions terminate in finite time, even though they formally involve infinite sets. To achieve this, values of orbit-finite set types Set τ are internally not represented as lists or trees of elements of type τ. Instead, they are stored and manipulated symbolically, using first-order formulas over variables that range over atom values.
For example, the value of distPairs above is stored as the formal expression

{(a, b) | a, b ∈ 𝔸, a ≠ b}

or, more specifically, as a triple:
– a pair (a, b) of “atom variables”,
– a list [a, b] of those atom variables that are bound in the expression (in this case, the expression contains no free variables),
– a formula a ≠ b over atom variables.

All the primitives listed above, such as isEmpty, map and sum, are implemented on this internal representation. In some cases, this involves checking the satisfiability of certain formulas over atoms. In the current implementation of NLambda, an external SMT solver Z3 (de Moura & Bjørner, 2008) is used for that purpose. For example, to evaluate the expression isEmpty distPairs, NLambda makes a system call to the SMT solver to check whether the formula a ≠ b is satisfiable in the first-order theory of equality and, after receiving the affirmative answer, returns the value False. For more details about the semantics and implementation of NLambda, see Klin and Szynwelski (2016). The library itself can be downloaded from https://www.mimuw.edu.pl/~szynwelski/nlambda/.

5.2 Implementation of νL∗ and νNL∗

Using NLambda we implemented the algorithms from Sections 3 and 4. We note that the internal representation is slightly different from the one discussed in Section 3. Instead of representing the table (S, E) with actual representatives of orbits, the sets are represented logically as described above. Furthermore, the control flow of the algorithm is adapted to fit the functional programming paradigm. In particular, recursion is used instead of a while loop. In addition to the nominal adaptation of Angluin's algorithm νL∗, we implemented a variant, νL∗col, which adds counterexamples to the columns instead of to the rows.

Target automata are defined using NLambda as well, using the automaton data type provided by the library. Membership queries are already implemented by the library. Equivalence queries are implemented by constructing a bisimulation (recall that bisimulation implies language equivalence), where a counterexample is obtained when two DFAs are not bisimilar. For nominal NFAs, however, we cannot implement a complete equivalence query, as their language equivalence is undecidable. We approximated equivalence by bounding the depth of the bisimulation for nominal NFAs. As an optimisation, we use bisimulation up to congruence as described by Bonchi and Pous (2015). Having an approximate teacher is a minor issue, since in many applications no complete teacher can be implemented and one relies on testing (Aarts, et al., 2015 and Bollig, et al., 2013). For the experiments listed here the bound was chosen large enough for the learner to terminate with the correct automaton. The code can be found at https://github.com/Jaxan/nominal-lstar.
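To make the bisimulation-based equivalence check concrete, here is a minimal sketch in plain Haskell for ordinary finite DFAs. This is not NLambda code and not the actual implementation; the names (Dfa, bisim) are ours, and the integer bound caps the number of state pairs explored, playing the role of the depth bound used above for nominal NFAs.

import qualified Data.Set as Set

data Dfa q a = Dfa
  { delta  :: q -> a -> q   -- total transition function
  , accept :: q -> Bool     -- acceptance predicate
  , start  :: q
  }

-- Explore pairs of states breadth-first, trying to build a bisimulation.
-- Returns a counterexample word if a pair with differing acceptance is
-- reachable (within the bound), and Nothing if no difference was found.
bisim :: (Ord q1, Ord q2) => Int -> [a] -> Dfa q1 a -> Dfa q2 a -> Maybe [a]
bisim bound alphabet m1 m2 = go bound Set.empty [(start m1, start m2, [])]
  where
    go _ _ [] = Nothing                       -- bisimulation found
    go 0 _ _  = Nothing                       -- bound exhausted: give up
    go d visited ((x, y, w) : todo)
      | (x, y) `Set.member` visited = go d visited todo
      | accept m1 x /= accept m2 y  = Just (reverse w)   -- counterexample
      | otherwise =
          let succs = [ (delta m1 x a, delta m2 y a, a : w) | a <- alphabet ]
          in  go (d - 1) (Set.insert (x, y) visited) (todo ++ succs)

In the nominal setting, the visited set and the successor pairs become orbit-finite nominal sets manipulated through NLambda's Set and Formula types, and the up-to-congruence technique of Bonchi and Pous (2015) prunes the pairs that still have to be explored.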
5.3 Test Cases

To provide a benchmark for future improvements, we tested our algorithms on the simple automata described below. We report the results in Table 5.1. The experiments were performed on a machine with an Intel Core i5 (Skylake, 2.4 GHz) and 8 GB RAM.

Model  | DFA (orbits, dim) | νL∗ (s) | νL∗col (s) | RFSA (orbits, dim) | νNL∗ (s)
FIFO0  |  2, 0             |    1.9  |     1.9    |  2, 0              |    2.4
FIFO1  |  3, 1             |   12.9  |     7.4    |  3, 1              |   17.3
FIFO2  |  5, 2             |   45.6  |    22.6    |  5, 2              |   70.3
FIFO3  | 10, 3             |    189  |     107    | 10, 3              |    476
FIFO4  | 25, 4             |    370  |     267    | 25, 4              |   1230
FIFO5  | 77, 5             |   1337  |     697    |  ∞, ∞              |      ∞
ℒ0     |  2, 0             |    1.3  |     1.4    |  2, 0              |    1.4
ℒ1     |  4, 1             |   29.6  |     4.7    |  4, 1              |    8.9
ℒ2     |  7, 2             |    229  |    23.1    |  7, 2              |   84.7
ℒ′0    |  3, 1             |    4.4  |     4.9    |  3, 1              |   11.3
ℒ′1    |  5, 1             |   15.4  |    15.4    |  4, 1              |   66.4
ℒ′2    |  9, 1             |   46.3  |    40.5    |  5, 1              |    210
ℒ′3    | 17, 1             |   89.0  |    66.8    |  6, 1              |    566
ℒeq    |   n/a             |    n/a  |     n/a    |  3, 1              |   16.3

Table 5.1 Results of experiments. The column DFA (resp. RFSA) shows the number of orbits and the dimension of the learnt minimal DFA (resp. canonical RFSA). We use ∞ when the running time is too high.

Queue Data Structure. A queue is a data structure to store elements which can later be retrieved in a first-in, first-out order. It has two operations: push and pop. We define the alphabet ΣFIFO = {push(a), pop(a) | a ∈ 𝔸}. The language FIFOn contains all valid traces of push and pop using a bounded queue of size n. The minimal nominal DFA for FIFO2 is given in Figure 5.2. The state reached from q1,x via push(x) is omitted: its outgoing transitions are those of q2,x,y, where y is replaced by x. Similar benchmarks appear in (Aarts, et al., 2015 and Isberner, et al., 2014). A small reference implementation of FIFOn is sketched at the end of this subsection.

[Figure 5.2 diagram omitted: states q0, q1,x, q2,x,y and a sink ⊥; push transitions extend the queue (q0 to q1,x to q2,x,y), pop(x) transitions return towards q0 (from q2,x,y to q1,y), while pop(≠ x) and pop on the empty queue lead to ⊥.]
Figure 5.2 A nominal automaton accepting FIFO2.

Double Word. ℒn = {ww | w ∈ 𝔸n} from Section 1.

NFA. Consider the language ℒeq = ⋃a∈𝔸 𝔸∗a𝔸∗a𝔸∗ of words where some letter appears twice. This is accepted by an NFA which guesses the position of the first occurrence of a repeated letter a and then waits for the second a to appear. The language is not accepted by a DFA (Bojańczyk, et al., 2014). Despite this, νNL∗ is able to learn the automaton shown in Figure 5.3.

[Figure 5.3 diagram omitted: states q′0, q′1,x, q′2,x and q′2, with 𝔸-labelled self-loops and x-labelled transitions for guessing and matching the repeated letter.]
Figure 5.3 A nominal NFA accepting ℒeq. Here, the transition from q′2 to q′1,x is defined as δ(q′2, a) = {q′1,b | b ∈ 𝔸}.

n-last Position. A prototypical example of regular languages which are accepted by very small NFAs is the set of words where a distinguished symbol a appears on the n-last position (Bollig, et al., 2009). We define a similar nominal language ℒ′n = ⋃a∈𝔸 a𝔸∗a𝔸n. To accept such words non-deterministically, one simply guesses the n-last position. This language is also accepted by a much larger deterministic automaton.
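As promised above, the following is a minimal reference implementation of the FIFOn languages, in plain Haskell over a concrete alphabet rather than NLambda's symbolic sets; the names (Sym, inFifo) are ours. A trace is in FIFOn exactly when replaying it on a queue of capacity n never underflows or overflows and every pop(a) returns the value a.

data Sym a = Push a | Pop a deriving (Eq, Show)

inFifo :: Eq a => Int -> [Sym a] -> Bool
inFifo n = go []
  where
    go _ []                          = True
    go q (Push a : w) | length q < n = go (q ++ [a]) w  -- enqueue at the back
    go (b : q) (Pop a : w) | a == b  = go q w           -- dequeue at the front
    go _ _                           = False            -- over-/underflow, or wrong value

For example, inFifo 2 [Push 1, Push 2, Pop 1, Pop 2] is True, while inFifo 2 [Push 1, Pop 2] and inFifo 1 [Push 1, Push 2] are both False.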
6 Related Work

This section compares νL∗ with other algorithms from the literature. We stress that no comparison is possible for νNL∗, as it is the first learning algorithm for non-deterministic automata over infinite alphabets.

The first to consider learning automata over infinite alphabets was Sakamoto (1997). In his work the problem is reduced to L∗ with some finite sub-alphabet. The sub-alphabet grows in stages and L∗ is rerun at every stage, until the alphabet is big enough to capture the whole language. In Sakamoto's approach, any learning algorithm can be used as a back-end. This, however, comes at a cost: it has to be rerun at every stage, and each symbol is treated in isolation, which might require more queries. Our algorithm νL∗, instead, works with the whole alphabet from the very start, and it exploits its symmetry. An example is in Sections 1.1 and 1.2: the ordinary learner uses four equivalence queries, whereas the nominal one, using the symmetry, only needs three. Moreover, our algorithm is easier to generalise to other alphabets and computational models, such as non-determinism.

More recently, papers have appeared on learning register automata by Cassel, et al. (2016) and Howar, et al. (2012). Their register automata are as expressive as our deterministic nominal automata. The state space is similar to our orbit-wise representation: it is formed by finitely many locations with registers. Transitions are defined symbolically using propositional logic. We remark that the most recent paper by Cassel, et al. (2016) generalises the algorithm to alphabets with different structures (which correspond to different atom symmetries in our work), but at the cost of changing Angluin's framework. Instead of membership queries the algorithm requires more sophisticated tree queries. In our approach, using a different symmetry affects neither the algorithm nor its correctness proof. Tree queries can be reduced to membership queries by enumerating all n-types for some n (n-types in logic correspond to orbits in the set of n-tuples). Keeping that in mind, their complexity results are roughly the same as ours, although this is hard to verify, as they do not give bounds on the length of individual tree queries. Finally, our approach lends itself better to being extended to other variations on L∗ (of which many exist), as it is closer to Angluin's original work.

Another class of learning algorithms for systems with large alphabets is based on abstraction and refinement. This is orthogonal to the approach in this thesis, but connections and possible transference of techniques are worth exploring in the future. Aarts, et al. (2015) reduce the alphabet to a finite alphabet of abstractions, and L∗ for ordinary DFAs over such a finite alphabet is used. Abstractions are refined by counterexamples. Other similar approaches are by Howar, et al. (2011) and Isberner, et al. (2013), where global and local per-state abstractions of the alphabet are used, and by Mens (2017), where the alphabet can also have additional structure (e.g., an ordering relation). We also mention that Botincan and Babic (2013) give a framework for learning symbolic models of software behaviour. Berg, et al. (2006 and 2008) cope with an infinite alphabet by running L∗ (adapted to Mealy machines) using a finite approximation of the alphabet, which may be augmented when equivalence queries are answered. A smaller symbolic model is derived subsequently. Their approach, unlike ours, does not exploit the symmetry over the full alphabet. The symmetry allows our algorithm to reduce queries and to produce the smallest possible automaton at every step.

Finally, we compare with results on session automata (Bollig, et al., 2013). Session automata are defined over finite alphabets, just like the work by Sakamoto. However, session automata are more restrictive than deterministic nominal automata. For example, the model cannot capture an acceptor for the language of words where consecutive data values are distinct. This language can be accepted by a three-orbit nominal DFA, which can be learned by our algorithm.

We implemented our algorithms in the nominal library NLambda as sketched before. Other implementation options include Fresh OCaml (Shinwell, 2006), a functional programming language designed for programming over nominal data structures with binding, and Lois by Kopczyński and Toruńczyk (2016 and 2017), a C++ library for imperative nominal programming. We chose NLambda for its convenient set-theoretic primitives, but the other options remain to be explored; in particular, the low-level Lois could be expected to provide more efficient implementations.
7 Discussion and Future Work

In this chapter we defined and implemented extensions of several versions of L∗ and of NL∗ for nominal automata. We highlight two features of our approach:
– It has strong theoretical foundations: the theory of nominal languages, covering different alphabets and symmetries (see Section 2.1), and category theory, where nominal automata have been characterised as coalgebras (Ciancia & Montanari, 2010 and Kozen, et al., 2015) and many properties and algorithms (e.g., minimisation) have been studied at this abstract level.
– It follows a generic pattern for transporting computation models and algorithms from finite sets to nominal sets, which leads to simple correctness proofs.

These features pave the way to several extensions and improvements.

Future work includes a general version of νNL∗, parametric in the notion of side-effect (an example is non-determinism). Different notions will yield models with different degrees of succinctness w.r.t. deterministic automata. The key observation here is that many forms of non-determinism and other side effects can be captured via the categorical notion of monad, i.e., an algebraic structure, on the state-space. Monads allow generalising the notion of composed and prime state: a state is composed whenever it is obtained from other states via an algebraic operation. Our algorithm νNL∗ is based on the powerset monad, representing classical non-determinism. We are currently investigating a substitution monad, where the operation is “applying a (possibly non-injective) substitution of atoms in the support”. A minimal automaton over this monad, akin to an RFSA, will have states that can generate all the states of the associated minimal DFA via a substitution, but cannot be generated by other states (they are prime). For instance, we can give an automaton over the substitution monad that recognises ℒ2 from Section 1:

[Diagram omitted: an automaton with states q0, qx, qxy, qy, q1 and q2, in which one of the transitions into qxy carries the label x, [y ↦ x].]

Here [y ↦ x] means that, if that transition is taken, qxy (hence its language) is subject to y ↦ x. In general, the size of the minimal DFA for ℒn grows more than exponentially with n, but an automaton with substitutions on transitions, like the one above, only needs 𝒪(n) states. This direction is investigated in Chapter 7.

In principle, thanks to the generic approach we have taken, all our algorithms should work for various kinds of atoms with more structure than just equality, as advocated by Bojańczyk, et al. (2014). Details, such as the precise assumptions on the underlying structure of atoms necessary for the proofs to go through, remain to be checked. In the next chapter (Chapter 6), we investigate learning with the total order symmetry. We implement this in NLambda, as well as in a new tool for computing with nominal sets over the total order symmetry.

The efficiency of our current implementation, as measured in Section 5.3, leaves much to be desired. There is plenty of potential for running time optimisation, ranging from improvements in the learning algorithms themselves, to optimisations in the NLambda library (such as replacing the external and general-purpose SMT solver with a purpose-built, internal one, or a tighter integration of nominal mechanisms with the underlying Haskell language as was done by Shinwell, 2006), to giving up the functional programming paradigm for an imperative language such as Lois (Kopczyński & Toruńczyk, 2016 and 2017).

Acknowledgements

We thank Frits Vaandrager and Gerco van Heerdt for useful comments and discussions. We also thank the anonymous reviewers.
Chapter 6
Fast Computations on Ordered Nominal Sets

David Venhoek, Radboud University
Joshua Moerman, Radboud University
Jurriaan Rot, Radboud University

Abstract

We show how to compute efficiently with nominal sets over the total order symmetry by developing a direct representation of such nominal sets and basic constructions thereon. In contrast to previous approaches, we work directly at the level of orbits, which allows for an accurate complexity analysis. The approach is implemented as the library Ons (Ordered Nominal Sets).

Our main motivation is nominal automata, which are models for recognising languages over infinite alphabets. We evaluate Ons in two applications: minimisation of automata and active automata learning. In both cases, Ons is competitive compared to existing implementations and outperforms them for certain classes of inputs.

This chapter is based on the following publication:
Venhoek, D., Moerman, J., & Rot, J. (2018). Fast Computations on Ordered Nominal Sets. In Theoretical Aspects of Computing - ICTAC - 15th International Colloquium, Proceedings. Springer. doi:10.1007/978-3-030-02508-3_26

Automata over infinite alphabets are natural models for programs with unbounded data domains. Such automata, often formalised as register automata, are applied in the modelling and analysis of communication protocols, hardware, and software systems (see Bojańczyk, et al., 2014; D'Antoni & Veanes, 2017; Grigore & Tzevelekos, 2016; Kaminski & Francez, 1994; Montanari & Pistore, 1997; Segoufin, 2006 and references therein). Typical infinite alphabets include sequence numbers, timestamps, and identifiers. This means one can model data flow in such automata besides the basic control flow provided by ordinary automata. Recently, it has been shown in a series of papers that such models are amenable to learning (Aarts, et al., 2015; Bollig, et al., 2013; Cassel, et al., 2016; Drews & D'Antoni, 2017; Moerman, et al., 2017; Vaandrager, 2017), with the verification of (closed source) TCP implementations by Fiterău-Broștean, et al. (2016) as a prominent example.

A foundational approach to infinite alphabets is provided by the notion of nominal set, originally introduced in computer science as an elegant formalism for name binding (Gabbay & Pitts, 2002 and Pitts, 2016). Nominal sets have been used in a variety of applications in semantics, computation, and concurrency theory (see Pitts, 2013 for an overview). Bojańczyk, et al. (2014) introduce nominal automata, which allow one to model languages over infinite alphabets with different symmetries. Their results are parametric in the structure of the data values. Important examples of data domains are ordered data values (e.g., timestamps) and data values that can only be compared for equality (e.g., identifiers). In both data domains, nominal automata and register automata are equally expressive.

Important for applications of nominal sets and automata are implementations. A couple of tools exist to compute with nominal sets. Notably, Nλ (Klin & Szynwelski, 2016) and Lois (Kopczyński & Toruńczyk, 2016 and 2017) provide a general purpose programming language to manipulate infinite sets.19 Both tools are based on SMT solvers and use logical formulas to represent the infinite sets. These implementations are very flexible, and the SMT solver does most of the heavy lifting, which makes the implementations themselves relatively straightforward. Unfortunately, this comes at a cost, as SMT solving is in general Pspace-hard. Since the formulas used to describe sets tend to grow as more calculations are done, running times can become unpredictable.

19 Other implementations of nominal techniques that are less directly related to our setting (Mihda, Fresh OCaml, and Nominal Isabelle) are discussed in Section 5.
In this chapter, we use a direct representation, based on symmetries and orbits, to represent nominal sets. We focus on the total order symmetry, where data values are rational numbers and can be compared for their order. Nominal automata over the total order symmetry are more expressive than automata over the equality symmetry (i.e., traditional register automata of Kaminski & Francez, 1994). A key insight is that the representation of nominal sets from Bojańczyk, et al. (2014) becomes rather simple in the total order symmetry; each orbit is represented solely by a natural number, intuitively the number of variables or registers.

Our main contributions include the following.
– We develop the representation theory of nominal sets over the total order symmetry. We give concrete representations of nominal sets, their products, and equivariant maps.
– We provide time complexity bounds for operations on nominal sets such as intersections and membership. Using those results we give the time complexity of Moore's minimisation algorithm (generalised to nominal automata) and prove that it is polynomial in the number of orbits.
– Using the representation theory, we are able to implement nominal sets in a C++ library Ons. The library includes all the results from the representation theory (sets, products, and maps).
– We evaluate the performance of Ons and compare it to Nλ and Lois, using two algorithms on nominal automata: minimisation (Bojańczyk & Lasota, 2012) and automata learning (Moerman, et al., 2017). We use randomly generated automata as well as concrete, logically structured models such as FIFO queues. For random automata, our methods are drastically faster than the other tools. On the other hand, Lois and Nλ are faster in minimising the structured automata, as they exploit their logical structure. In automata learning, the logical structure is not available a priori, and Ons is faster in most cases.

The structure of this chapter is as follows. Section 1 contains background on nominal sets and their representation. Section 2 describes the concrete representation of nominal sets, equivariant maps and products in the total order symmetry. Section 3 describes the implementation Ons with complexity results, and Section 4 the evaluation of Ons on algorithms for nominal automata. Related work is discussed in Section 5, and future work in Section 6.

1 Nominal sets

Nominal sets are infinite sets that carry certain symmetries, allowing a finite representation in many interesting cases. We recall their formalisation in terms of group actions, following Bojańczyk, et al. (2014) and Pitts (2013), to which we refer for an extensive introduction.

1.1 Group actions

Let G be a group and X be a set. A (left) G-action is a function ⋅ : G × X → X satisfying 1 ⋅ x = x and (ℎg) ⋅ x = ℎ ⋅ (g ⋅ x) for all x ∈ X and g, ℎ ∈ G. A set X with a G-action is called a G-set and we often write gx instead of g ⋅ x. The orbit of an element x ∈ X is the set {gx | g ∈ G}. A G-set is always a disjoint union of its orbits (in other words, the orbits partition the set). We say that X is orbit-finite if it has finitely many orbits, and we denote the number of orbits by N(X).
A map f : X → Y between G-sets is called equivariant if it preserves the group action, i.e., for all x ∈ X and g ∈ G we have g ⋅ f(x) = f(g ⋅ x). If an equivariant map f is bijective, then f is an isomorphism and we write X ≅ Y. A subset Y ⊆ X is equivariant if the corresponding inclusion map is equivariant. The product of two G-sets X and Y is given by the Cartesian product X × Y with the point-wise group action on it, i.e., g(x, y) = (gx, gy). Union and intersection of X and Y are well-defined if the two actions agree on their common elements.

1.2 Nominal sets

A data symmetry is a pair (𝒟, G) where 𝒟 is a set and G is a subgroup of Sym(𝒟), the group of bijections on 𝒟. Note that the group G naturally acts on 𝒟 by defining gx = g(x). In the most studied instance, called the equality symmetry, 𝒟 is a countably infinite set and G = Sym(𝒟). In this chapter, we focus on the total order symmetry given by 𝒟 = ℚ and G = {π | π ∈ Sym(ℚ), π is monotone}.

Let (𝒟, G) be a data symmetry and X be a G-set. A set of data values S ⊆ 𝒟 is called a support of an element x ∈ X if for all g ∈ G with ∀s ∈ S : gs = s we have gx = x. A G-set X is called nominal if every element x ∈ X has a finite support.

Example 1. We list several examples for the total order symmetry. The set ℚ² is nominal, as each element (q1, q2) ∈ ℚ² has the finite set {q1, q2} as its support. The set has the following three orbits:

{(q1, q2) | q1 < q2},  {(q1, q2) | q1 = q2},  {(q1, q2) | q1 > q2}.

For a set X, the set of all subsets of size n ∈ ℕ is denoted by 𝒫n(X) = {Y ⊆ X | #Y = n}. The set 𝒫n(ℚ) is a single-orbit nominal set for each n, with the action defined by direct image: gY = {gy | y ∈ Y}. The group of monotone bijections also acts by direct image on the full power set 𝒫(ℚ), but this is not a nominal set. For instance, the set ℤ ∈ 𝒫(ℚ) of integers has no finite support.

If S ⊆ 𝒟 is a support of an element x ∈ X, then any set S′ ⊆ 𝒟 such that S ⊆ S′ is also a support of x. A set S ⊂ 𝒟 is a least finite support of x ∈ X if it is a finite support of x and S ⊆ S′ for any finite support S′ of x. The existence of least finite supports is crucial for representing orbits. Unfortunately, even when elements have a finite support, in general they do not always have a least finite support. A data symmetry (𝒟, G) is said to admit least supports if every element of every nominal set has a least finite support. Both the equality and the total order symmetry admit least supports. (Bojańczyk, et al., 2014 give additional (counter)examples of data symmetries admitting least supports.) Having least finite supports is useful for a finite representation.

Henceforth, we will write least support to mean least finite support. Given a nominal set X, the size of the least support of an element x ∈ X is denoted by dim(x), the dimension of x. We note that all elements in the orbit of x have the same dimension. For an orbit-finite nominal set X, we define dim(X) = max{dim(x) | x ∈ X}. For a single-orbit set O, observe that dim(O) = dim(x) where x is any element x ∈ O.
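To make Example 1 concrete, the following plain-Haskell sketch (not part of Ons or Nλ; the names are ours) classifies a pair of rationals into one of the three orbits and computes least supports of tuples, whose sizes are the dimensions just defined.

import Data.List (nub, sort)

-- Two pairs lie in the same orbit of the monotone-bijection action
-- exactly when their components compare the same way, so the three
-- orbits of ℚ² are named by the three values of Ordering.
orbitOfPair :: Rational -> Rational -> Ordering
orbitOfPair = compare

-- The least support of a tuple of rationals is the set of values
-- occurring in it; its size is the dimension of the element.
leastSupport :: [Rational] -> [Rational]
leastSupport = nub . sort

dimOf :: [Rational] -> Int
dimOf = length . leastSupport

For instance, dimOf [1, 2, 1] is 2; more generally, the order type of a tuple (which positions coincide, and how the distinct values are ordered) identifies its orbit in ℚⁿ.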
1.3 Representing nominal orbits

We represent nominal sets as collections of single orbits. The finite representation of single orbits is based on the theory of Bojańczyk, et al. (2014), which uses the technical notions of restriction and extension. We only briefly report their definitions here. However, the reader can safely move to the concrete representation theory in Section 2 with only a superficial understanding of Theorem 2 below.

The restriction of an element π ∈ G to a subset C ⊆ 𝒟, written as π|C, is the restriction of the function π : 𝒟 → 𝒟 to the domain C. The restriction of a group G to a subset C ⊆ 𝒟 is defined as G|C = {π|C | π ∈ G, πC = C}. The extension of a subgroup S ≤ G|C is defined as extG(S) = {π ∈ G | π|C ∈ S}. For C ⊆ 𝒟 and S ≤ G|C, define [C, S]ec = {{gs | s ∈ extG(S)} | g ∈ G}, i.e., the set of right cosets of extG(S) in G. Then [C, S]ec is a single-orbit nominal set.

Using the above, we can formulate the representation theory from Bojańczyk, et al. (2014). This gives a finite description for all single-orbit nominal sets X, namely a finite set C together with some of its symmetries.

Theorem 2. Let X be a single-orbit nominal set for a data symmetry (𝒟, G) that admits least supports and let C ⊆ 𝒟 be the least support of some element x ∈ X. Then there exists a subgroup S ≤ G|C such that X ≅ [C, S]ec.

The proof by Bojańczyk, et al. (2014) uses a bit of category theory: it establishes an equivalence of categories between single-orbit sets and the pairs (C, S). We will not use the language of category theory much, in order to keep the chapter self-contained.

2 Representation in the total order symmetry

This section develops a concrete representation of nominal sets over the total order symmetry, as well as their equivariant maps and products. It is based on the abstract representation theory from Section 1.3. From now on, by nominal set we always refer to a nominal set over the total order symmetry. Hence, our data domain is ℚ and we take G to be the group of monotone bijections.

2.1 Orbits and nominal sets

From the representation in Section 1.3, we find that any single-orbit set X can be represented as a tuple (C, S). Our first observation is that the finite group S of 'local symmetries' in this representation is always trivial, i.e., S = I, where I = {1} is the trivial group. This follows from the following lemma and S ≤ G|C.

Lemma 3. For every finite subset C ⊂ ℚ, we have G|C = I.

Immediately, we see that (C, S) = (C, I), and hence that the orbit is fully represented by the set C. A further consequence of Lemma 3 is that each element of an orbit can be uniquely identified by its least support. This leads us to the following characterisation of [C, I]ec.

Lemma 4. Given a finite subset C ⊂ ℚ, we have [C, I]ec ≅ 𝒫#C(ℚ).

By Theorem 2 and the above lemmas, we can represent an orbit by a single integer n, the size of the least support of its elements. This naturally extends to (orbit-finite) nominal sets with multiple orbits by using a multiset of natural numbers, representing the size of the least support of each of the orbits. These multisets are formalised here as functions f : ℕ → ℕ.

Definition 5. Given a function f : ℕ → ℕ, we define a nominal set [f]o by

[f]o = ⋃_{n ∈ ℕ} ⋃_{1 ≤ i ≤ f(n)} {i} × 𝒫n(ℚ).

Proposition 6. For every orbit-finite nominal set X, there is a function f : ℕ → ℕ such that X ≅ [f]o and the set {n | f(n) ≠ 0} is finite. Furthermore, the mapping between X and f is one-to-one (up to isomorphism of nominal sets) when restricting to those f : ℕ → ℕ for which the set {n | f(n) ≠ 0} is finite.

The presentation in terms of a function f : ℕ → ℕ enforces that there are only finitely many orbits of any given dimension. The first part of the above proposition generalises to arbitrary nominal sets by replacing the codomain of f by the class of all sets and adapting Definition 5 accordingly. However, the resulting correspondence will no longer be one-to-one.

As a brief example, let us consider the set ℚ × ℚ. The elements (a, b) split into three orbits, one for a < b, one for a = b and one for a > b. These have dimension 2, 1 and 2 respectively, so the set ℚ × ℚ is represented by the multiset {1, 2, 2}.
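In code, this representation is pleasantly small. The following plain-Haskell sketch (our names, not the Ons API) stores an orbit-finite nominal set as a multiset of orbit dimensions, i.e., the function f of Proposition 6 restricted to its finite domain.

import qualified Data.Map.Strict as Map

-- An orbit is fully described by its dimension (Lemmas 3 and 4).
-- An orbit-finite nominal set maps each dimension n to the number
-- of orbits f(n) of that dimension.
type NomRep = Map.Map Int Int

-- The set ℚ × ℚ of the example above: one orbit of dimension 1
-- (the diagonal) and two of dimension 2.
qTimesQ :: NomRep
qTimesQ = Map.fromList [(1, 1), (2, 2)]

-- Disjoint union adds multiplicities; N(X) and dim(X) are simple folds.
disjointUnion :: NomRep -> NomRep -> NomRep
disjointUnion = Map.unionWith (+)

numOrbits :: NomRep -> Int
numOrbits = sum . Map.elems

dimension :: NomRep -> Int
dimension s = if Map.null s then 0 else fst (Map.findMax s)

This view supports counting, but it forgets how the orbits of one set relate to those of another; that is precisely the deficiency for products discussed in Section 2.3 below, and it is why the library stores extra information per orbit.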
2.2 Equivariant maps

We show how to represent equivariant maps, using two basic properties. Let f : X → Y be an equivariant map. The first property is that the direct image of an orbit (in X) is again an orbit (in Y), that is to say, f is defined 'orbit-wise'. Second, equivariant maps cannot introduce new elements in the support (but they can drop them). More precisely:

Lemma 7. Let f : X → Y be an equivariant map, and O ⊆ X a single orbit. The direct image f(O) = {f(x) | x ∈ O} is a single-orbit nominal set.

Lemma 8. Let f : X → Y be an equivariant map between two nominal sets X and Y. Let x ∈ X and let C be a support of x. Then C supports f(x).

Hence, equivariant maps are fully determined by associating two pieces of information for each orbit in the domain: the orbit on which it is mapped, and a string denoting which elements of the least support of the input are preserved. These ingredients are formalised in the first part of the following definition. The second part describes how these ingredients define an equivariant function. Proposition 10 then states that every equivariant function can be described in this way.

Definition 9. Let H = {(I1, F1, O1), …, (In, Fn, On)} be a finite set of tuples where the Ii's are disjoint single-orbit nominal sets, the Oi's are single-orbit nominal sets with dim(Oi) ≤ dim(Ii), and the Fi's are bit strings of length dim(Ii) with exactly dim(Oi) ones.

Given a set H as above, we define fH : ⋃ Ii → ⋃ Oi as the unique equivariant function such that, given x ∈ Ii with least support C, fH(x) is the unique element of Oi with support {C(j) | Fi(j) = 1}, where Fi(j) is the j-th bit of Fi and C(j) is the j-th smallest element of C.

Proposition 10. For every equivariant map f : X → Y between orbit-finite nominal sets X and Y there is a set H as in Definition 9 such that f = fH.

Consider the example function min : 𝒫3(ℚ) → ℚ which returns the smallest element of a 3-element set. Note that both 𝒫3(ℚ) and ℚ are single orbits. Since for the orbit 𝒫3(ℚ) we only keep the smallest element of the support, we can thus represent the function min with H = {(𝒫3(ℚ), 100, ℚ)}.
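Applying such a representation to a concrete element is a one-liner. The sketch below (plain Haskell, our names) realises the action of a single tuple (Ii, Fi, Oi) of Definition 9 on least supports, with a support given as a sorted list.

-- Keep the j-th smallest element of the least support iff the j-th bit
-- of F is set; the result is the least support of the image under f_H.
applyBits :: [Bool] -> [Rational] -> [Rational]
applyBits f support = [ c | (keep, c) <- zip f support, keep ]

-- The bit string 100 representing min : P3(Q) -> Q:
minBits :: [Bool]
minBits = [True, False, False]
-- applyBits minBits [1/2, 2/3, 5] == [1/2]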
2.3 Products

The product X × Y of two nominal sets is again a nominal set, and hence it can itself be represented in terms of the dimension of each of its orbits, as shown in Section 2.1. However, this approach has some disadvantages.

Example 11. We start by showing that the orbit structure of products can be nontrivial. Consider the product of X = ℚ and the set Y = {(a, b) ∈ ℚ² | a < b}. This product consists of five orbits, more than one might naively expect from the fact that both sets are single-orbit:

{(a, (b, c)) | a, b, c ∈ ℚ, a < b < c},
{(a, (a, b)) | a, b ∈ ℚ, a < b},
{(b, (a, c)) | a, b, c ∈ ℚ, a < b < c},
{(b, (a, b)) | a, b ∈ ℚ, a < b},
{(c, (a, b)) | a, b, c ∈ ℚ, a < b < c}.

We find that this product is represented by the multiset {2, 2, 3, 3, 3}. Unfortunately, this is not sufficient to accurately describe the product, as it abstracts away from the relation between its elements and those of X and Y. In particular, it is not possible to reconstruct the projection maps from such a representation.

The essence of our representation of products is that each orbit O in the product X × Y is described entirely by the dimension of O together with the two (equivariant) projections π1 : O → X and π2 : O → Y. This combination of the orbit and the two projection maps can already be represented using Propositions 6 and 10. However, as we will see, a combined representation for this has several advantages. For discussing such a representation, let us first introduce what it means for tuples of a set and two functions to be isomorphic:

Definition 12. Given nominal sets X, Y, Z1 and Z2, and equivariant functions l1 : Z1 → X, r1 : Z1 → Y, l2 : Z2 → X and r2 : Z2 → Y, we define (Z1, l1, r1) ≅ (Z2, l2, r2) if there exists an isomorphism ℎ : Z1 → Z2 such that l1 = l2 ∘ ℎ and r1 = r2 ∘ ℎ.

Our goal is to have a representation that, for each orbit O, produces a tuple (A, f1, f2) isomorphic to the tuple (O, π1, π2). The next lemma gives a characterisation that can be used to simplify such a representation.

Lemma 13. Let X and Y be nominal sets and (x, y) ∈ X × Y. If C, Cx, and Cy are the least supports of (x, y), x, and y respectively, then C = Cx ∪ Cy.

With Proposition 10 we represent the maps π1 and π2 by tuples (O, F1, O1) and (O, F2, O2) respectively. Using Lemma 13 and the definitions of F1 and F2, we see that at least one of F1(i) and F2(i) equals 1 for each i. We can thus combine the strings F1 and F2 into a single string P ∈ {L, R, B}∗ as follows. We set P(i) = L when only F1(i) is 1, P(i) = R when only F2(i) is 1, and P(i) = B when both are 1. The string P fully describes the strings F1 and F2.

This construction gives the string P two useful properties. The number of Ls and Bs in the string equals the dimension of O1; similarly, the number of Rs and Bs equals the dimension of O2. We will call strings with these properties valid. In conclusion, to describe a single orbit of the product X × Y, a valid string P together with the images of π1 and π2 is sufficient.

Definition 14. Let P ∈ {L, R, B}∗, and O1 ⊆ X, O2 ⊆ Y be single-orbit sets. Given a tuple (P, O1, O2), where the string P is valid, define [(P, O1, O2)]t = (𝒫|P|(ℚ), fH1, fH2), where Hi = {(𝒫|P|(ℚ), Fi, Oi)} and the string F1 is defined as the string P with Ls and Bs replaced by 1s and Rs by 0s. The string F2 is defined similarly, with the roles of L and R swapped.

Proposition 15. There exists a one-to-one correspondence between the orbits O ⊆ X × Y and the tuples (P, O1, O2) with O1 ⊆ X, O2 ⊆ Y and P a valid string, such that [(P, O1, O2)]t ≅ (O, π1|O, π2|O).

From the above proposition it follows that we can generate the product X × Y simply by enumerating all valid strings P for all pairs of orbits (O1, O2) of X and Y. Given this, we can calculate the multiset representation of a product from the multiset representations of both factors.

Theorem 16. For X ≅ [f]o and Y ≅ [g]o we have X × Y ≅ [ℎ]o, where

ℎ(n) = ∑_{0 ≤ i, j ≤ n, i + j ≥ n} f(i) g(j) C(n, j) C(j, n − i),

and C(a, b) denotes the binomial coefficient "a choose b".
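The enumeration behind Theorem 16 is easy to write out. The sketch below (plain Haskell, our names, not the Ons implementation) generates all valid product strings for a pair of orbits of dimensions i and j; counting the generated strings of each length n recovers the binomial coefficients in h(n).

data Sym = L | R | B deriving (Eq, Show)

-- Generate all strings over {L, R, B} in which the L's and B's together
-- number i (the left dimension) and the R's and B's together number j
-- (the right dimension): each L consumes from the left budget, each R
-- from the right budget, and each B from both.
productStrings :: Int -> Int -> [[Sym]]
productStrings = go
  where
    go 0 0 = [[]]
    go l r = [ L : p | l > 0,          p <- go (l - 1) r       ]
          ++ [ R : p | r > 0,          p <- go l       (r - 1) ]
          ++ [ B : p | l > 0, r > 0,   p <- go (l - 1) (r - 1) ]

For i = 1 and j = 2, productStrings 1 2 produces exactly the five strings LRR, RLR, RRL, RB and BR of Example 17 below.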
Example 17. To illustrate some aspects of the above representation, let us use it to calculate the product of Example 11. First, we observe that both ℚ and S = {(a, b) ∈ ℚ² | a < b} consist of a single orbit. Hence any orbit of the product corresponds to a triple (P, ℚ, S), where the string P satisfies |P|L + |P|B = dim(ℚ) = 1 and |P|R + |P|B = dim(S) = 2. We can now find the orbits of the product ℚ × S by enumerating all strings satisfying these equations. This yields
– LRR, corresponding to the orbit {(a, (b, c)) | a, b, c ∈ ℚ, a < b < c},
– RLR, corresponding to the orbit {(b, (a, c)) | a, b, c ∈ ℚ, a < b < c},
– RRL, corresponding to the orbit {(c, (a, b)) | a, b, c ∈ ℚ, a < b < c},
– RB, corresponding to the orbit {(b, (a, b)) | a, b ∈ ℚ, a < b}, and
– BR, corresponding to the orbit {(a, (a, b)) | a, b ∈ ℚ, a < b}.

Each product string fully describes the corresponding orbit. To illustrate, consider the string BR. The corresponding bit strings for the projection functions are F1 = 10 and F2 = 11. From the length of the string we conclude that the dimension of the orbit is 2. The string F1 further tells us that the left element of the tuple consists only of the smallest element of the support. The string F2 indicates that the right element of the tuple is constructed from both elements of the support. Combining this, we find that the orbit is {(a, (a, b)) | a, b ∈ ℚ, a < b}.

2.4 Summary

We summarise our concrete representation in the following table. Propositions 6, 10 and 15 correspond to the three rows of the table. Notice that in the case of maps and products, the orbits are inductively represented using the concrete representation. As a base case, we can represent single orbits by their dimension.

Object                            | Representation
Single orbit O                    | Natural number n = dim(O)
Nominal set X = ⋃i Oi             | Multiset of these numbers
Map from single orbit f : O → Y   | The orbit f(O) and a bit string F
Equivariant map f : X → Y         | Set of tuples (O, F, f(O)), one for each orbit
Orbit in a product O ⊆ X × Y      | The corresponding orbits of X and Y, and a string P relating their supports
Product X × Y                     | Set of tuples (P, OX, OY), one for each orbit

Table 6.1 Overview of representation.

3 Implementation and Complexity of Ons

The ideas outlined above have been implemented in a C++ library, Ons, and a Haskell library, Ons-hs.20 We focus here on the C++ library only, as the Haskell one is very similar. The library can represent orbit-finite nominal sets and their products, (disjoint) unions, and maps. A full description of the possibilities is given in the documentation included with Ons.

As an example, the following program computes the product from Example 11. Initially, the program creates the nominal set A, containing the entirety of ℚ. Then it creates a nominal set B, such that it consists of the orbit containing the element (1, 2) ∈ ℚ × ℚ. For this, the library determines to which orbit of the product ℚ × ℚ the element (1, 2) belongs, and then stores a description of that orbit as described in Section 2. Note that this means it internally never needs to store the element used to create the orbit. The function nomset_product then uses the enumeration of product strings mentioned in Section 2.3 to calculate the product of A and B. Finally, it prints a representative element for each of the orbits in the product. These elements are constructed based on the stored descriptions of the orbits, filled in to make their support equal to sets of the form {1, 2, …, n}.
nomset<rational> A = nomset_rationals();
nomset<pair<rational, rational>> B({rational(1), rational(2)});
auto AtimesB = nomset_product(A, B); // compute the product
for (auto orbit : AtimesB)
    cout << orbit.getElement() << " ";

Running this gives the following output (where /1 signifies the denominator):

(1/1,(2/1,3/1)) (1/1,(1/1,2/1)) (2/1,(1/1,3/1)) (2/1,(1/1,2/1)) (3/1,(1/1,2/1))

20 Ons can be found at https://github.com/davidv1992/ONS and Ons-hs can be found at https://github.com/Jaxan/ons-hs/.

Internally, orbit is implemented following the theory presented in Section 2, storing the dimension of the orbit it represents. It also contains sufficient information to reconstruct elements given their least support, such as the product string for orbits resulting from a product. The class nomset then uses a standard set data structure to store the collection of orbits contained in the nominal set it represents. In a similar way, eqimap stores equivariant maps by associating each orbit in the domain with the image orbit and the string representing which elements of the least support to keep. This is stored using a map data structure. For both nominal sets and equivariant maps, the underlying data structure is currently implemented using trees.

3.1 Complexity of operations

Using the concrete representation of nominal sets, we can determine the complexity of common operations. To simplify such an analysis, we will make the following assumptions:
– The comparison of two orbits takes O(1).
– Constructing an orbit from an element takes O(1).
– Checking whether an element is in an orbit takes O(1).

These assumptions are justified as each of these operations takes time proportional to the size of the representation of an individual orbit, which in practice is small and approximately constant. For instance, the orbit 𝒫n(ℚ) is represented by just the integer n and its type.

Theorem 18. If nominal sets are implemented with a tree-based set structure (as in Ons), the complexity of the following set operations is as listed below. Recall that N(X) denotes the number of orbits of X. We use p and f to denote functions implemented in whatever way the user wants, which we assume to take O(1) time. The software assumes these are equivariant, but this is not verified.

Operation                  | Complexity
Test x ∈ X                 | O(log N(X))
Test X ⊆ Y                 | O(min(N(X) + N(Y), N(X) log N(Y)))
Calculate X ∪ Y            | O(N(X) + N(Y))
Calculate X ∩ Y            | O(N(X) + N(Y))
Calculate {x ∈ X | p(x)}   | O(N(X))
Calculate {f(x) | x ∈ X}   | O(N(X) log N(X))
Calculate X × Y            | O(N(X × Y)) ⊆ O(3^(dim(X)+dim(Y)) N(X) N(Y))

Table 6.2 Time complexity of operations on nominal sets.

Proof. Since most parts are proven similarly, we only include proofs for the first and last item.

Membership. To decide x ∈ X, we first construct the orbit containing x, which is done in constant time. Then we use a logarithmic lookup to decide whether this orbit is in our set data structure. Hence, membership checking is O(log N(X)).

Products. Calculating the product of two nominal sets is the most complicated construction. For each pair of orbits in the original sets X and Y, all product strings need to be generated. Each product orbit itself is constructed in constant time. By generating these orbits in order, the resulting set takes O(N(X × Y)) time to construct. We can also give an explicit upper bound for the number of orbits in terms of the input. Recall that orbits in a product are represented by strings of length at most dim(X) + dim(Y).
(If the string is shorter, we pad it with one of the symbols.) Since there are three symbols (L, R and B), the product of X and Y will have at most 3^(dim(X)+dim(Y)) N(X) N(Y) orbits. It follows that taking products has time complexity of O(3^(dim(X)+dim(Y)) N(X) N(Y)). □

4 Results and evaluation in automata theory

In this section we consider applications of nominal sets to automata theory. As mentioned in the introduction, nominal sets are used to formalise languages over infinite alphabets. These languages naturally arise as the semantics of register automata. The definition of register automata is not as simple as that of ordinary finite automata. Consequently, transferring results from automata theory to this setting often requires non-trivial proofs. Nominal automata, instead, are defined just as ordinary automata, by replacing finite sets with orbit-finite nominal sets. The theory of nominal automata is developed by Bojańczyk, et al. (2014), where it is shown that many algorithms from automata theory, such as minimisation (based on the Myhill-Nerode equivalence), transfer to nominal automata. Not all algorithms work: e.g., the subset construction fails for nominal automata.

As an example we consider the following language on rational numbers:

ℒint = {a1 b1 ⋯ an bn | ai, bi ∈ ℚ, ai < ai+1 < bi+1 < bi for all i}.

We call this language the interval language, as a word w ∈ ℚ∗ is in the language when it denotes a sequence of nested intervals. This language contains arbitrarily long words. For this language it is crucial to work with an infinite alphabet, as for each finite set C ⊂ ℚ, the restriction ℒint ∩ C∗ is just a finite language. Note that the language is equivariant: w ∈ ℒint ⟺ wg ∈ ℒint for any monotone bijection g, because nested intervals are preserved by monotone maps.21 Indeed, ℒint is a nominal set, although it is not orbit-finite.

21 The G-action on words is defined point-wise: g(w1…wn) = (gw1)…(gwn).

Informally, the language ℒint can be accepted by the automaton depicted in Figure 6.1, where we allow the automaton to store rational numbers and compare them to new symbols. For example, the transition from q2 to q3 is taken if any value c between a and b is read, and then the currently stored value a is replaced by c. For any other value read at state q2 the automaton transitions to the sink state q4. Such a transition structure is made precise by the notion of nominal automata.

[Figure 6.1 diagram omitted: an automaton with states q0, q1, q2, q3 and a sink q4, storing the bounds of the last interval read.]
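As a sanity check of the definition of ℒint, here is a direct membership test in plain Haskell (our names; atoms modelled as Rational). It follows the defining condition ai < ai+1 < bi+1 < bi literally, so a word consisting of a single pair is accepted unconditionally, and we take the empty word (n = 0) to be in the language.

-- w = a1 b1 a2 b2 ... is in the interval language iff each interval
-- (a_{i+1}, b_{i+1}) is strictly inside its predecessor (a_i, b_i).
inIntervalLang :: [Rational] -> Bool
inIntervalLang = go
  where
    go (a : b : rest@(a' : b' : _)) = a < a' && a' < b' && b' < b && go rest
    go [_, _] = True    -- a single pair: the condition is vacuous
    go []     = True    -- the empty word (n = 0)
    go _      = False   -- words of odd length are never in the language

For example, inIntervalLang [1, 10, 2, 9, 3, 8] is True, while inIntervalLang [1, 10, 2, 11] is False; and, as observed above, the test is invariant under applying any monotone bijection of ℚ to all letters.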
Chapter 7
Separation and Renaming in Nominal Sets

[…] ⊥ otherwise,
δ(⊥, x) = ⊥.

The automaton is depicted in Figure 7.1 for the case n = 3. The language accepted by this automaton assigns to a word w the first element of the queue after executing the instructions in w from left to right, and ⊥ if the input is ill-behaved, i.e., Pop is applied to an empty queue or Put(a) to a full queue.27

27 We use a reactive version of the queue data structure which is slightly different from the versions of Isberner, et al. (2014) and Moerman, et al. (2017).

[Figure 7.1 diagram omitted: states ϵ, a, ab, abc and a sink ⊥, where ϵ and ⊥ have output ⊥ and the other states output their first letter; Put transitions extend the queue (Put on abc leads to ⊥), and Pop transitions consume its front (from ab to b, from abc to bc, and from ϵ to ⊥).]
Figure 7.1 The FIFO automaton from Example 29 with n = 3. The right-most state consists of five orbits, as we can take a, b, c distinct, all the same, or two of them equal in three different ways. Consequently, the complete state space has ten orbits. The output of each state is denoted in the lower part.

Definition 30. Let Σ, O be Pm-sets. A separated language is an equivariant map of the form Σ(∗) → O. A separated automaton 𝒜 = (Q, δ, o, q0) consists of Q, o and q0 defined as in a nominal automaton, and an equivariant transition function δ : Q ∗ Σ → Q. The separated language semantics of such an automaton is given by the map s : Q ∗ Σ(∗) → O, defined by

s(x, ϵ) = o(x),
s(x, aw) = s(δ(x, a), w)

for all x ∈ Q, a ∈ Σ and w ∈ Σ(∗) such that x # aw and a # w. Let s♭ : Q → (Σ(∗) −−∗ O) be the transpose of s. Then s♭(q0) : Σ(∗) → O corresponds to a separated language; this is called the separated language accepted by 𝒜.

By definition of the separated product, the transition function is only defined on a state x and letter a ∈ Σ if x # a. In Example 36 below, we describe the bounded FIFO as a separated automaton, and describe its accepted language. First, we show how the language semantics of separated nominal automata extends to a language over all words, provided that both the input alphabet Σ and the output alphabet O are Sb-sets.

Definition 31. Let Σ and O be nominal Sb-sets. An Sb-equivariant function L : Σ∗ → O is called an Sb-language.

Notice the difference between an Sb-language L : Σ∗ → O and a Pm-language L′ : (UΣ)∗ → U(O). They are both functions from Σ∗ to O, but the latter is only Pm-equivariant, while the former satisfies the stronger property of Sb-equivariance. Languages over separated words and Sb-languages are connected as follows.

Theorem 32. Suppose Σ, O are both nominal Sb-sets, and suppose dim(Σ) ≤ 1. There is a one-to-one correspondence between separated languages S : (UΣ)(∗) → UO (Pm-equivariant) and Sb-nominal languages S̄ : Σ∗ → O (Sb-equivariant). From S̄ to S, this correspondence is given by applying the forgetful functor and restricting to the subset of separated words. For the converse direction, given w = a1…an ∈ Σ∗, let b1, …, bn ∈ Σ be such that w # bi for all i, and bi # bj for all i, j with i ≠ j. Define m ∈ Sb by

m(a) = ai if a = bi for some i,
m(a) = a otherwise.

Then S̄(a1 a2 a3 ⋯ an) = m ⋅ S(b1 b2 b3 ⋯ bn).

Proof. There is the following chain of one-to-one correspondences, from the results of the previous section:

(UΣ)(∗) → UO
F((UΣ)(∗)) → O   (by Theorem 16)
(FUΣ)∗ → O       (by Corollary 18)
Σ∗ → O           (by Lemma 19)  □

Thus, every separated automaton over U(Σ), U(O) gives rise to an Sb-language S̄, corresponding to the language S accepted by the automaton. Any nominal automaton 𝒜 restricts to a separated automaton, formally described in Definition 33. It turns out that if the (Pm-)language accepted by 𝒜 is actually an Sb-language, then the restricted automaton already represents this language, as the extension S̄ of the associated separated language S (Theorem 34). Hence, in such a case, the restricted separated automaton suffices to describe the language of 𝒜.

Definition 33. Let i : Q ∗ U(Σ) → Q × U(Σ) be the natural inclusion map. A nominal automaton 𝒜 = (Q, δ, o, q0) induces a separated automaton 𝒜∗, by setting 𝒜∗ = (Q, δ ∘ i, o, q0).

Theorem 34. Suppose Σ, O are both Sb-sets, and suppose dim(Σ) ≤ 1. Let L : (UΣ)∗ → UO be the Pm-nominal language accepted by a nominal automaton 𝒜, and suppose L is Sb-equivariant. Let S be the separated language accepted by 𝒜∗. Then L = U(S̄).

Proof. It follows from the one-to-one correspondence in Theorem 32: on the bottom there are two languages (L and U(S̄)), while there is only the restriction of L on the top. We conclude that L = U(S̄). □
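The concrete content of Theorem 32 is that an Sb-language is determined by its values on separated words: to evaluate it on an arbitrary word, rename the letters apart, query the separated language, and substitute back. The sketch below (plain Haskell; atoms modelled as non-negative Ints, outputs in Maybe Int with Nothing playing the role of ⊥; all names are ours) implements exactly that recipe.

import qualified Data.Map.Strict as Map

-- 'sep' is a separated language: we only ever apply it to words of
-- pairwise distinct atoms that do not occur in the input word w.
extend :: ([Int] -> Maybe Int) -> [Int] -> Maybe Int
extend sep w = fmap rename (sep fresh)
  where
    n     = length w
    base  = 1 + maximum (0 : w)          -- all of these atoms avoid w
    fresh = [base .. base + n - 1]       -- b1, ..., bn, pairwise distinct
    subst = Map.fromList (zip fresh w)   -- the substitution m : bi ↦ ai
    rename a = Map.findWithDefault a a subst

For instance, taking sep w' = if null w' then Nothing else Just (head w') (the first letter of a separated word), extend sep [5, 5, 3] queries sep on the fresh word [6, 7, 8] and renames the answer Just 6 back to Just 5, as predicted by S̄(a1⋯an) = m ⋅ S(b1⋯bn).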
As we will see in Example 36, separated automata allow us to represent Sb-languages in a much smaller way than nominal automata. Given a nominal automaton 𝒜, a smaller separated automaton can be obtained by computing the reachable part of the restriction 𝒜∗. The reachable part is defined similarly (but only where δ is defined) and is denoted by R(𝒜∗) as well.

Proposition 35. For any nominal automaton 𝒜, we have R(𝒜∗) ⊆ R(𝒜).

The converse inclusion of the above proposition certainly does not hold, as shown by the following example.

Example 36. Let 𝒜 be the automaton modelling a bounded FIFO queue (for some n), from Example 29. The Pm-nominal language L accepted by 𝒜 is Sb-equivariant: it is closed under application of arbitrary substitutions. The separated automaton 𝒜∗ is given simply by restricting the transition function to Q ∗ Σ, i.e., a Put(a)-transition from a state w ∈ Q exists only if a does not occur in w. The separated language S accepted by this new automaton is the restriction of the nominal language of 𝒜 to separated words. By Theorem 34, we have L = U(S̄). Hence, the separated automaton 𝒜∗ represents L, essentially by closing the associated separated language S under all substitutions. The reachable part of 𝒜∗ is given by R(𝒜∗) = 𝔸(≤n) ∪ {⊥}. Clearly, restricting 𝒜∗ to the reachable part does not affect the accepted language. However, while the original state space Q has exponentially many orbits in n, R(𝒜∗) has only n + 1 orbits! Thus, taking the reachable part of 𝒜∗ yields a separated automaton which represents the FIFO language L in a much smaller way than the original automaton.

3.1 Separated automata: coalgebraic perspective

Nominal automata and separated automata can be presented as coalgebras on the category of Pm-nominal sets. In this section we revisit the above results from this perspective, and generalise from (equivariant) languages to finitely supported languages. In particular, we retrieve the extension from separated languages to Sb-languages by establishing Sb-languages as a final separated automaton. The latter result follows by instantiating a well-known technique for lifting adjunctions to categories of coalgebras, using the results of Section 2. In the remainder of this section we assume familiarity with the theory of coalgebras; see, e.g., Jacobs (2016) and Rutten (2000).

Definition 37. Let M be a submonoid of Sb, and let Σ, O be nominal M-sets, referred to as the input and output alphabet respectively. We define the functor BM : M-Nom → M-Nom by BM(X) = O × (Σ →M fs X). An M-nominal (Moore) automaton is a BM-coalgebra.

A BM-coalgebra can be presented as a nominal set Q together with the pairing ⟨o, δ♭⟩ : Q → O × (Σ →M fs Q) of an equivariant output function o : Q → O and (the transpose of) an equivariant transition function δ : Q × Σ → Q. In case M = Pm, this coincides with the automata of Definition 28, omitting initial states. The language semantics is generalised accordingly, as follows. Given such a BM-coalgebra (Q, ⟨o, δ♭⟩), the language semantics l : Q × Σ∗ → O is given by

l(x, ε) = o(x),
l(x, aw) = l(δ(x, a), w)

for all x ∈ Q, a ∈ Σ and w ∈ Σ∗.

Theorem 38. Let M be a submonoid of Sb, and let Σ, O be nominal M-sets. The nominal M-set Σ∗ →M fs O extends to a final BM-coalgebra (Σ∗ →M fs O, ζ), such that the unique homomorphism from a given BM-coalgebra is the transpose l♭ of the language semantics.
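The two defining equations of the language semantics are directly executable. The following plain-Haskell sketch (our names; ordinary sets rather than nominal M-sets) shows a Moore automaton as a coalgebra and its semantics l, whose transpose is the unique map into the final coalgebra of Theorem 38.

-- A Moore automaton: an output map together with a transition map,
-- i.e., a coalgebra q -> o × (a -> q).
data Moore q a o = Moore
  { out  :: q -> o
  , next :: q -> a -> q
  }

-- l(x, ε) = o(x) and l(x, aw) = l(δ(x, a), w).
semantics :: Moore q a o -> q -> [a] -> o
semantics m x []      = out m x
semantics m x (a : w) = semantics m (next m x a) w

-- A two-state example: the output records whether the number of
-- inputs read so far is even.
parity :: Moore Bool () Bool
parity = Moore { out = id, next = \x _ -> not x }
-- semantics parity True [(), ()] == True

The nominal content of Theorem 38 is that, for M-sets, the carrier of the final coalgebra can be taken to be the finitely supported function space Σ∗ →M fs O.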
A separated automaton (Definition 30, without initial states) corresponds to a coalgebra for the functor B∗ : Pm-Nom → Pm-Nom given by B∗(X) = O × (Σ −−∗ X). The separated language semantics arises by finality.

Theorem 39. The set Σ(∗) −−∗ O is the carrier of a final B∗-coalgebra, such that the unique coalgebra homomorphism from a given B∗-coalgebra (Q, ⟨o, δ⟩) is the transpose s♭ of the separated language semantics s : Q ∗ Σ(∗) → O (Definition 30).

Next, we provide an alternative final B∗-coalgebra which assigns Sb-nominal languages to states of separated nominal automata. The essence is to obtain a final B∗-coalgebra from the final BSb-coalgebra. In order to prove this, we use a technique to lift adjunctions to categories of coalgebras. This technique occurs regularly in the coalgebraic study of automata (Jacobs, et al., 2015; Kerstan, et al., 2014; Klin & Rot, 2016).

Theorem 40. Let Σ be a Pm-set, and O an Sb-set. Define B∗ and BSb accordingly, as B∗(X) = UO × (Σ −−∗ X) and BSb(X) = O × (FΣ →Sb fs X). There is an adjunction F̄ ⊣ Ū between CoAlg(B∗) and CoAlg(BSb), where F̄ and Ū coincide with F and U respectively on carriers.

Proof. There is a natural isomorphism λ : B∗U → UBSb, given as the composite

UO × (Σ −−∗ UX) → UO × U(FΣ →Sb fs X) → U(O × (FΣ →Sb fs X)),

where the first map is id × ϕ, with ϕ the isomorphism from Theorem 24, and the second isomorphism comes from U being a right adjoint. The result now follows from Theorem 2.14 of Hermida and Jacobs (1998). In particular, Ū(X, γ) = (UX, λ⁻¹ ∘ U(γ)). □

Since right adjoints preserve limits, and final objects in particular, we obtain the following. This gives an Sb-semantics of separated automata through finality.

Corollary 41. Let ((FΣ)∗ →Sb fs O, ζ) be the final BSb-coalgebra (Theorem 38). The B∗-coalgebra Ū((FΣ)∗ →Sb fs O, ζ) is final, and it is carried by the set (FΣ)∗ →Sb fs O of Sb-nominal languages.

4 Related and future work

Fiore and Turi (2001) described a similar adjunction between certain presheaf categories. However, Staton (2007) describes in his thesis that the usage of presheaves allows for many degenerate models and one should look at sheaves instead. The category of sheaves is equivalent to the category of nominal sets. Staton transfers the adjunction of Fiore and Turi to the sheaf categories. We conjecture that the adjunction presented in this paper is equivalent, but defined by more elementary means. The monoidal property of F, which is crucial for our application in automata, has not been discussed before.

An interesting line of research is the generalisation to other symmetries by Bojańczyk, et al. (2014). In particular, the total order symmetry is relevant, since it allows one to compare elements on their order, as is often done in data words. In this case the symmetries are given by the group of all monotone bijections. Many results on nominal sets generalise to this symmetry. For monotone substitutions, however, the situation seems more subtle. For example, we note that a substitution which maps two values to the same value actually maps all the values in between to that value. Whether the adjunction from Theorem 16 generalises to other symmetries is left as future work.

This research was motivated by learning nominal automata. If we know that a nominal automaton recognises an Sb-language, then we are better off learning a separated automaton directly. From the Sb-semantics of separated automata, it follows that we have a Myhill-Nerode theorem, which means that learning is feasible.
We expect that this can be useful, since we can achieve an exponential reduction this way. Bojańczyk, et al. (2014) prove that nominal automata are equivalent to register automata in terms of expressiveness. However, when translating from register automata with n states to nominal automata, we may get exponentially many orbits. This happens, for instance, in the FIFO automaton (Example 29). We have shown that the exponential blow-up is avoidable by using separated automata, for this example and in general for Sb-equivariant languages. An open problem is whether the latter requirement can be relaxed, by adding separated transitions only locally in a nominal automaton. A possible step in this direction is to consider the monad T = UF on Pm-Nom and incorporate it in the automaton model. We believe that this is the hypothesised “substitution monad” from Chapter 5. The monad is monoidal (sending separated products to Cartesian products) and, if X is an orbit-finite nominal set, then so is T(X). This means that we can consider nominal T-automata, and we can perhaps determinise them using coalgebraic methods (Silva, et al., 2013).

Acknowledgements

We would like to thank Gerco van Heerdt for his useful comments.

Curriculum Vitae

Joshua Moerman was born in 1991 in Utrecht, the Netherlands. After graduating from gymnasium at the Christiaan Huygens College in Eindhoven in 2009, he followed a double bachelor programme in mathematics and computer science at the Radboud University in Nijmegen. In 2013, he obtained both bachelor's degrees summa cum laude and continued with a master's in mathematics. He obtained the degree of Master of Science in Mathematics summa cum laude in 2015, with a specialisation in algebra and topology.

In February 2015, he started his Ph.D. research under the supervision of Frits Vaandrager, Sebastiaan Terwijn, and Alexandra Silva. This was a joint project between the computer science institute (iCIS) and the mathematics department (part of IMAPP) of the Radboud University. During the four years of his Ph.D. research, he spent a total of six months at University College London, UK. As of April 2019, Joshua works as a postdoctoral researcher in the group of Joost-Pieter Katoen at RWTH Aachen, Germany.