Automata Learning from Preference and Equivalence Queries

The University of Texas at Austin
CAV 2025

REMAP is an active automata learning algorithm which collects preferences between sequences and represents them as constraints over a symbolic observation table. Using unification and constraint solving, REMAP navigates the space of symbolic hypotheses, and it applies satisfiability solving to arrive at a concrete hypothesis.

Abstract

Active automata learning from membership and equivalence queries is a foundational problem with numerous applications. We propose a novel variant of the active automata learning problem: actively learn finite automata using preference queries---i.e., queries about the relative position of two sequences in a total preorder---instead of membership queries. Our solution is REMAP, a novel algorithm which leverages a symbolic observation table along with unification and constraint solving to navigate a space of symbolic hypotheses (each representing a set of automata), and uses satisfiability-solving to construct a concrete automaton (specifically a Moore machine) from a symbolic hypothesis. REMAP is guaranteed to correctly infer the minimal automaton with polynomial query complexity under exact equivalence queries, and achieves PAC--identification (\(\varepsilon\)-approximate, with high probability) of the minimal automaton using sampling-based equivalence queries. Our empirical evaluations of REMAP on the task of learning reward machines for two reinforcement learning domains indicate REMAP scales to large automata and is effective at learning correct automata from consistent teachers, under both exact and sampling-based equivalence queries.

Active Automata Learning

An example finite automaton for the regular language a*b

Input alphabet \(\Sigma^I = \{a,b\}\)

Output alphabet \(\Sigma^O = \{0,1\}\)

Set of States \(Q = \{q_\varepsilon, q_b, q_{ba}\}\)

Initial State \(q_0 = q_\varepsilon\)

Regular language (set of sequences) \(\mathcal{R} = a^*b\)

Classification Function \(f(s) = \begin{cases} 1 & \text{if } s \in \mathcal{R}\\ 0 & \text{if } s \not\in \mathcal{R}\end{cases}\)

The goal of active automata learning is for the learner to exactly learn a representation of a sequence classifier that the teacher has in mind. For example, when learning deterministic finite automata, the objective is for the learner to correctly classify which sequences over an alphabet belong to which classes (e.g., which words are part of a regular language). The learner interacts with a teacher by asking queries. One interpretation of this problem is actively learning a multiclass sequence classification model, where the sequences belong to a finite number of classes. We represent the target classification model as a function \(f:(\Sigma^I)^*\rightarrow\Sigma^O\) which maps sequences over the input alphabet \(\Sigma^I\) to the output alphabet \(\Sigma^O\). The goal is to use queries to arrive at an automaton \(\langle Q, q_0, \Sigma^I, \Sigma^O, \delta, L \rangle\) which exactly models the function \(f\). In this representation, \(Q\) is the set of states, \(q_0\) is the initial state, \(\Sigma^I\) and \(\Sigma^O\) are the input and output alphabets, respectively, \(\delta : Q \times \Sigma^I \rightarrow Q\) is the deterministic transition function, and \(L : Q\rightarrow \Sigma^O\) labels each state with an element of the output alphabet.
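To make the example concrete, here is a minimal Python sketch of the automaton above for \(\mathcal{R} = a^*b\); the state names follow the example, while the classify helper is ours:

# A minimal sketch of the example Moore machine for R = a*b.
# State names follow the example above; classify() is our helper.
states = {"q_eps", "q_b", "q_ba"}
q0 = "q_eps"
delta = {
    ("q_eps", "a"): "q_eps",  # keep reading a's
    ("q_eps", "b"): "q_b",    # a*b recognized
    ("q_b",   "a"): "q_ba",   # any further symbol leaves R
    ("q_b",   "b"): "q_ba",
    ("q_ba",  "a"): "q_ba",   # absorbing sink state
    ("q_ba",  "b"): "q_ba",
}
L = {"q_eps": 0, "q_b": 1, "q_ba": 0}

def classify(s: str) -> int:
    """Run the machine on s and return the label of the final state."""
    q = q0
    for sigma in s:
        q = delta[(q, sigma)]
    return L[q]

assert classify("b") == 1 and classify("aab") == 1
assert classify("") == 0 and classify("ba") == 0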

Queries in Classical L*

Membership and Equivalence Queries in \(L^*\)

In the classic formulation of actively learning deterministic finite automata, the \(L^*\) algorithm relies on membership and equivalence queries to learn a sequence classifier with only two classes. A membership query returns the concrete class a sequence belongs to, i.e. \(\text{memQ}(s) \in \Sigma^O\). In the example above, the membership query for a sequence returns either a \(0\) or a \(1\). An equivalence query checks whether the proposed hypothesis automaton classifies all sequences correctly: does \(L(\delta(q_0,s))=f(s)\) hold for all \(s\in(\Sigma^I)^*\)? If yes, the proposed hypothesis is correct. If not, the teacher provides the learner with a counterexample sequence that the proposed hypothesis misclassified.
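For intuition, a toy teacher answering both query types might look like the following sketch; the bounded-length enumeration standing in for an exact equivalence check is our simplification, not how a real teacher must be implemented:

from itertools import product

def mem_q(f, s):
    """Membership query: return the concrete class f(s)."""
    return f(s)

def equiv_q(f, hypothesis, alphabet, max_len=8):
    """Equivalence query, approximated by enumerating all sequences
    up to max_len. Returns (True, None) if no mismatch is found,
    otherwise (False, counterexample)."""
    for n in range(max_len + 1):
        for word in product(alphabet, repeat=n):
            s = "".join(word)
            if hypothesis(s) != f(s):
                return False, s  # a sequence the hypothesis misclassifies
    return True, None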

Preference and Equivalence Queries in REMAP

In REMAP, we consider how a multiclass sequence classifier can be learned from preference information. Accordingly, we replace membership queries with preference queries, and we augment equivalence queries with an extra piece of information: feedback.
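A preference query reports the relative position of two sequences in the teacher's total preorder. The sketch below assumes the preorder is induced by an order on output values; the encoding of the three possible answers as \(-1/0/+1\) is our choice:

def pref_q(f, s1, s2):
    """Preference query: relative position of s1 and s2 in the
    teacher's total preorder. Assumes the preorder is induced by
    an order on outputs. Returns -1, 0, or +1."""
    a, b = f(s1), f(s2)
    return (a > b) - (a < b)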

The Symbolic Observation Table

Active automata learning algorithms store information received from queries in a data structure, usually an observation table. An observation table is a two-dimensional array whose rows and columns are indexed by sequences: rows are indexed by prefix sequences, and columns are indexed by suffix sequences. For a given prefix \(s\) and suffix \(e\), the entry at \((s,e)\) holds the value \(T(s\cdot e)\) for the concatenated sequence \(s\cdot e\).

The structure of an observation table is such that when certain properties are satisfied, a hypothesis automaton can be constructed from it. In particular, the states, transition function, and state labels can be read off from the table: \[\begin{array}{ll} Q = \{\text{row}(s) | s \in S\} & \text{The states are unique rows}\\ q_0 = \text{row}(\varepsilon) & \text{The initial state is the empty sequence row}\\ \delta(\text{row}(s),\sigma) = \text{row}(s\cdot\sigma) & \text{For all } s\in S \text{ and } \sigma \in \Sigma^I\\ L(\text{row}(s)) = T(s\cdot \varepsilon) & \text{Output label of each state} \end{array}\]
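The construction can be read as a direct recipe. Below is a sketch assuming the table is a mapping from concatenated sequences to concrete values and, as in \(L^*\), also contains rows for the one-step extensions \(S\cdot\Sigma^I\):

def make_hypothesis(S, E, A, T):
    """Read an automaton off a closed, consistent observation table.
    S: prefixes, E: suffixes (must contain the empty sequence ""),
    A: input alphabet, T: dict from a concatenated sequence to its value."""
    def row(s):
        return tuple(T[s + e] for e in E)

    Q = {row(s) for s in S}                  # states are unique rows
    q0 = row("")                             # row of the empty sequence
    delta = {(row(s), a): row(s + a)         # transitions read off the table
             for s in S for a in A}
    L = {row(s): T[s] for s in S}            # label of each state: T(s . eps)
    return Q, q0, delta, L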

For intuition on why, consider the following:

Entries in the table are empirical observations

States represent sets of sequences, given an initial state

States are also represented by suffix observations

Transitions are also encoded in the table

In \(L^*\), the value of \(T(s\cdot e)\) is a concrete value in the output alphabet \(\Sigma^O\), since concrete values are returned from membership queries. However, in REMAP, the values of \(T(s\cdot e)\) are symbols, specifically variables, because preference queries do not return concrete values. This means that REMAP utilizes a symbolic observation table, and the required properties on the table are that it be unified, closed, and consistent.
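For instance, a freshly initialized symbolic table might be populated as in the following sketch; the variable naming scheme is ours:

from itertools import count

_fresh = count()

def init_symbolic_table(prefixes, suffixes):
    """Create a symbolic observation table in which every entry
    T(s . e) starts out as a fresh, unconstrained variable."""
    return {(s, e): "v%d" % next(_fresh) for s in prefixes for e in suffixes}

T = init_symbolic_table(["", "a", "b"], ["", "b"])
# e.g. {("", ""): "v0", ("", "b"): "v1", ("a", ""): "v2", ...}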

Representatives, Equivalence Classes, and Unification

Unification in REMAP

Because preference queries in REMAP return constraints on variables, the symbolic observation table must be unified before closedness and consistency checks can be performed. In particular, without any constraint information, each entry in the table is a unique variable, e.g. \(T(s_1\cdot e_1) = v_{11}, T(s_2\cdot e_1) = v_{21}\). When a preference query is answered, the response provides a critical piece of information: it reveals to the learner whether a pair of variables in the table should be considered equal or unequal. By leveraging the equality information, variables can be grouped into equivalence classes, and the representative chosen from each equivalence class replaces all non-representatives in the table and in the gathered constraints. This process is unification.
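Unification can be implemented with a standard union-find over variables; the sketch below is illustrative and merges only on equality constraints, leaving inequality constraints for the solver:

parent = {}

def find(v):
    """Return the representative of v's equivalence class."""
    parent.setdefault(v, v)
    while parent[v] != v:
        parent[v] = parent[parent[v]]  # path compression
        v = parent[v]
    return v

def union(v1, v2):
    """Record the equality constraint v1 == v2 by merging classes."""
    r1, r2 = find(v1), find(v2)
    if r1 != r2:
        parent[max(r1, r2)] = min(r1, r2)  # pick one representative deterministically

def unify_table(T):
    """Rewrite every table entry to the representative of its class."""
    return {cell: find(v) for cell, v in T.items()}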

Theoretical Guarantees

REMAP has several theoretical guarantees, including guarantees on query complexity and on PAC--identification. Below, we provide intuition for the guarantees on minimalism, correctness, and termination.

Minimalism

REMAP returns the minimal Moore machine consistent with the teacher's preferences.

Intuition: An automaton with fewer states than the minimum must merge two states that some suffix distinguishes, so it will misclassify at least one sequence.

Termination

Termination in REMAP

REMAP has termination and correctness guarantees; termination must occur once the upper bounds on the number of states and the number of known representative values are reached.

Intuition: Each time a new hypothesis is made, at least one of the number of states or the number of known representative values must increase. Since both quantities are bounded above, progress is always made toward the upper bounds, and only finitely many hypotheses can be proposed.

BibTeX

@misc{hsiung2023remap,
  title={Automata Learning from Preference and Equivalence Queries},
  author={Eric Hsiung and Joydeep Biswas and Swarat Chaudhuri},
  year={2023},
  eprint={2308.09301},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}