Active automata learning from membership and equivalence queries is a foundational problem with numerous applications. We propose a novel variant of the active automata learning problem: actively learn finite automata using preference queries---i.e., queries about the relative position of two sequences in a total preorder---instead of membership queries. Our solution is REMAP, a novel algorithm which leverages a symbolic observation table along with unification and constraint solving to navigate a space of symbolic hypotheses (each representing a set of automata), and uses satisfiability-solving to construct a concrete automaton (specifically a Moore machine) from a symbolic hypothesis. REMAP is guaranteed to correctly infer the minimal automaton with polynomial query complexity under exact equivalence queries, and achieves PAC--identification (\(\varepsilon\)-approximate, with high probability) of the minimal automaton using sampling-based equivalence queries. Our empirical evaluations of REMAP on the task of learning reward machines for two reinforcement learning domains indicate REMAP scales to large automata and is effective at learning correct automata from consistent teachers, under both exact and sampling-based equivalence queries.
Input alphabet \(\Sigma^I = \{a,b\}\)
Output alphabet \(\Sigma^O = \{0,1\}\)
Set of States \(Q = \{q_\varepsilon, q_b, q_{ba}\}\)
Initial State \(q_0 = q_\varepsilon\)
Regular language (set of sequences) \(\mathcal{R} = a^*b\)
Classification Function \(f(s) = \begin{cases} 1 & \text{if } s \in \mathcal{R}\\ 0 & \text{if } s \notin \mathcal{R}\end{cases}\)
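The example above can be made concrete as a small Moore machine. The states, alphabets, and language are taken from the definitions above; the transition function is one consistent choice (an assumption, since \(\delta\) is not written out in the example) under which the machine's output matches \(f\) for \(\mathcal{R} = a^*b\):

```python
# Sketch of the example Moore machine for R = a*b over {a, b} with outputs {0, 1}.
# The transition function below is an assumed reconstruction consistent with the
# states q_eps, q_b, q_ba given in the example.
delta = {
    ("q_eps", "a"): "q_eps",  # loop on a's before the single b
    ("q_eps", "b"): "q_b",    # read the b: accepting state
    ("q_b",   "a"): "q_ba",   # anything after the b leaves the language
    ("q_b",   "b"): "q_ba",
    ("q_ba",  "a"): "q_ba",   # q_ba is a rejecting sink
    ("q_ba",  "b"): "q_ba",
}
label = {"q_eps": 0, "q_b": 1, "q_ba": 0}  # L: state -> output symbol

def classify(s: str) -> int:
    """Run the machine on s and return the label of the final state."""
    q = "q_eps"
    for sigma in s:
        q = delta[(q, sigma)]
    return label[q]
```

For instance, `classify("aaab")` yields `1` since \(aaab \in a^*b\), while `classify("ba")` yields `0`.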
The goal of active automata learning is for the learner to exactly learn a representation of a sequence classifier that the teacher has in mind. For example, when learning deterministic finite automata, the objective is for the learner to correctly classify which sequences formed from an alphabet belong to which classes (e.g., which words are part of a regular language). The learner interacts with a teacher by asking queries. One interpretation of this problem is actively learning a multiclass sequence classification model, where the sequences belong to a finite number of classes. We can represent the target classification model as a function \(f:(\Sigma^I)^*\rightarrow\Sigma^O\) which maps sequences over the input alphabet \(\Sigma^I\) to the output alphabet \(\Sigma^O\). The goal is to use queries to arrive at an automaton \(\langle Q, q_0, \Sigma^I, \Sigma^O, \delta, L \rangle\) which exactly models the function \(f\). In this representation, \(Q\) is the set of states, \(q_0\) is the initial state, \(\Sigma^I\) and \(\Sigma^O\) are the input and output alphabets, respectively, \(\delta : Q \times \Sigma^I \rightarrow Q\) is the deterministic transition function, and \(L : Q\rightarrow \Sigma^O\) is the function which labels each state with an element of the output alphabet.
In the classic formulation of actively learning deterministic finite automata, the \(L^*\) algorithm relies on membership and equivalence queries to learn a sequence classifier with only two classes. Membership queries return the concrete class that a sequence belongs to, e.g. \(\text{memQ}(s) \in \Sigma^O\). In the example above, the membership query of a sequence returns either a \(0\) or a \(1\). For equivalence queries, a check is performed to see if the proposed hypothesis automaton classifies all sequences correctly. That is, does \(L(\delta(q_0,s))=f(s)\) for all \(s\in(\Sigma^I)^*\)? If yes, then the proposed hypothesis is correct. If not, then the teacher provides the learner with a counterexample sequence that was misclassified by the proposed hypothesis.
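The two query types can be sketched as a toy teacher for the running example. The names `mem_q` and `equiv_q` are illustrative, and the equivalence check here is a brute-force enumeration up to a bounded length (a stand-in for an exact oracle, which in general must consider all of \((\Sigma^I)^*\)):

```python
import re
from itertools import product

# Hypothetical teacher for the running example: f(s) = 1 iff s is in a*b.
def f(s):
    return 1 if re.fullmatch(r"a*b", s) else 0

def mem_q(s):
    """Membership query: return the concrete class of s in the output alphabet."""
    return f(s)

def equiv_q(hyp_classify, alphabet=("a", "b"), max_len=6):
    """Equivalence query (bounded stand-in for an exact oracle): return a
    misclassified counterexample sequence, or None if the hypothesis agrees
    with f on every sequence up to max_len."""
    for n in range(max_len + 1):
        for seq in product(alphabet, repeat=n):
            s = "".join(seq)
            if hyp_classify(s) != f(s):
                return s  # counterexample handed back to the learner
    return None
```

For example, a hypothesis that classifies every sequence as \(0\) fails the equivalence check: `equiv_q(lambda s: 0)` returns the counterexample `"b"`, while `equiv_q(f)` returns `None`.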
In REMAP, we consider how a multiclass classifier of sequences can be learned from preference information. Accordingly, we replace membership queries with preference queries, and we imbue the equivalence query with an extra piece of information: feedback.
Active automata learning algorithms store information received from queries in a data structure, usually an observation table. An observation table is a 2-dimensional array with rows and columns indexed by sequences. The rows are indexed by prefix sequences, and the columns are indexed by suffix sequences. Then, for a given prefix \(s\) and a given suffix \(e\), the value of the entry located at \((s,e)\) is \(T(s\cdot e)\) for the sequence \(s\cdot e\).
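As a toy sketch (not REMAP's actual data structure), the indexing scheme can be written as a nested mapping, here filled in with values of \(f\) for the running example \(\mathcal{R} = a^*b\):

```python
# Toy observation table: prefixes index rows, suffixes index columns,
# and entry (s, e) stores T(s.e). Values here come from f for R = a*b.
prefixes = ["", "b"]            # row index sequences (prefixes)
suffixes = ["", "a"]            # column index sequences (suffixes)
T = {"": 0, "a": 0, "b": 1, "ba": 0}   # T on the concatenated sequences

table = {s: {e: T[s + e] for e in suffixes} for s in prefixes}
# table[s][e] is the entry at (s, e), i.e. T(s.e):
# table[""][""] == T("") and table["b"]["a"] == T("ba")
```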
The structure of an observation table is such that when certain properties are satisfied, a hypothesis automaton can be constructed from it.
In particular, the states, transition function, and state labels can be read off from the table:
\[\begin{array}{ll}
Q = \{\text{row}(s) | s \in S\} & \text{The states are unique rows}\\
q_0 = \text{row}(\varepsilon) & \text{The initial state is the empty sequence row}\\
\delta(\text{row}(s),\sigma) = \text{row}(s\cdot\sigma) & \text{For all } s\in S \text{ and } \sigma \in \Sigma^I\\
L(\text{row}(s)) = T(s\cdot \varepsilon) & \text{Output label of each state}
\end{array}\]
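The read-off above can be sketched on a hypothetical table for the running example \(\mathcal{R} = a^*b\). The prefix set \(S\), suffix set \(E\), and entries below are illustrative choices (with \(E = \{\varepsilon, b\}\) picked so that the three states have distinct rows); the construction itself follows the equations line by line:

```python
import re

# Hypothetical closed table for R = a*b: prefixes S, suffixes E,
# entries T filled by the target classifier f.
f = lambda s: 1 if re.fullmatch(r"a*b", s) else 0
S, E, alphabet = ["", "b", "ba"], ["", "b"], ["a", "b"]
T = {s + e: f(s + e)
     for s in S + [s + a for s in S for a in alphabet]
     for e in E}

def row(s):
    """The row of prefix s: its observations under every suffix in E."""
    return tuple(T[s + e] for e in E)

# Read the automaton off the table, exactly as in the equations above.
Q = {row(s) for s in S}                              # states are unique rows
q0 = row("")                                         # initial state: empty-sequence row
delta = {(row(s), a): row(s + a) for s in S for a in alphabet}
L = {row(s): T[s] for s in S}                        # T(s.eps) labels each state

def run(s):
    """Classify s with the constructed hypothesis automaton."""
    q = q0
    for a in s:
        q = delta[(q, a)]
    return L[q]
```

Running the constructed hypothesis reproduces \(f\) on this example, e.g. `run("aab")` is `1`, and `len(Q)` is `3`, matching the minimal machine.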
For intuition on why, consider the following:
Entries in the table are empirical observations
States represent sets of sequences, given an initial state
States are also represented by suffix observations
Transitions are also encoded in the table
In \(L^*\), the value of \(T(s\cdot e)\) is a concrete value in the output alphabet \(\Sigma^O\), since concrete values are returned from membership queries. However, in REMAP, the values of \(T(s\cdot e)\) are symbols, specifically variables, because preference queries do not return concrete values. This means that REMAP utilizes a symbolic observation table, and the required properties on the table are that it be unified, closed, and consistent.
Because preference queries in REMAP return constraints on variables, the symbolic observation table must be unified before closedness and consistency checks can be made. In particular, without any constraint information, each entry in the table is a unique variable, e.g. \(T(s_1\cdot e_1) = v_{11}, T(s_2\cdot e_1) = v_{21}\). When a preference query is asked, the response provides a critical piece of information for the learner: it reveals whether a pair of variables in the table should be considered equal or unequal. By leveraging variable equality information, variables can be grouped into equivalence classes, and the representative chosen from each equivalence class replaces all non-representatives in the table and in the gathered constraints. This is the process of unification.
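One natural way to maintain these equivalence classes is a union-find structure. The sketch below is illustrative (the variable names \(v_{11}\) etc. follow the example above; it is not REMAP's internal implementation): equality constraints merge classes, and each table entry is then rewritten to its class representative.

```python
# Union-find sketch of unification: equality constraints from preference
# queries merge variables into equivalence classes, and each table entry
# is rewritten to the representative of its class.
parent = {}

def find(v):
    """Return the representative of v's equivalence class."""
    parent.setdefault(v, v)
    while parent[v] != v:
        parent[v] = parent[parent[v]]  # path halving
        v = parent[v]
    return v

def union(a, b):
    """Record an equality constraint a == b."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)  # smaller name becomes representative

# Initially every entry holds a unique variable.
table = {("s1", "e1"): "v11", ("s2", "e1"): "v21", ("s3", "e1"): "v31"}
union("v11", "v21")  # a preference query revealed these entries are equal
unified = {cell: find(v) for cell, v in table.items()}
```

After unification, the entries for \((s_1, e_1)\) and \((s_2, e_1)\) both hold the representative `v11`, while \((s_3, e_1)\) keeps its own variable.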
REMAP has several theoretical guarantees, including polynomial query complexity and PAC--identification. Below, we provide intuition for the guarantees on minimality, correctness, and termination.
REMAP returns the minimal Moore machine consistent with the teacher's preferences.
Intuition: An automaton with fewer states than the minimum would misclassify at least one sequence.
REMAP has termination and correctness guarantees; termination must occur once the upper bounds on the number of states and the number of known representative values are reached.
Intuition: Each time a new hypothesis is made, either the number of states or the number of known representative values must increase. This implies progress is always made toward the upper bounds.
@misc{hsiung2023remap,
  title={Automata Learning from Preference and Equivalence Queries},
  author={Eric Hsiung and Joydeep Biswas and Swarat Chaudhuri},
  year={2023},
  eprint={2308.09301},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}