probably approximately correct (PAC) learning is the scheme we will use to learn DFAs. Suppose you have a concept \(c \in C\). We want to find a hypothesis \(h \in H\) that gets as close to the true concept's boundary as possible.
We want to minimize false positives and false negatives.
constituents
- Instance space \(X\)
- Concept class \(C\) (functions over \(X\))
- Hypothesis class \(H\) (functions over \(X\))
- “proper learning” is the special case \(H = C\)
- \(A\) PAC-learns \(C\) if
- \(\forall c \in C\) and for every distribution \(D\) over \(X\): when \(A\) is given examples \((x, c(x))\) with \(x\) sampled from \(D\) and outputs some \(h \in H\), we want…
\begin{equation} P_{A} \left[ P_{x \sim D} \left[ h(x) \neq c(x) \right] > \delta \right] < \epsilon \end{equation}
our learning scheme wants the probability that \(h\)'s error under \(D\) exceeds the tolerance \(\delta\) to itself be small, \(< \epsilon\).
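to make the definition concrete, here is a small self-contained sketch (my own illustration, not from the notes): a toy concept class of thresholds on \([0,1]\) with \(D\) uniform, a naive learner, and a Monte Carlo estimate of the outer probability. All names and parameter values are hypothetical.

```python
import random

THETA = 0.5  # hypothetical true threshold for the toy concept

def concept(x):
    # the hidden concept c: positive iff x >= THETA
    return x >= THETA

def learn_threshold(sample):
    # a naive learner A: put the threshold at the smallest positive example seen
    positives = [x for x, label in sample if label]
    t = min(positives) if positives else 1.0
    return lambda x: x >= t

def true_error(h, n_test=20_000):
    # Monte Carlo estimate of P_{x ~ D}[h(x) != c(x)], with D = Uniform[0, 1)
    xs = [random.random() for _ in range(n_test)]
    return sum(h(x) != concept(x) for x in xs) / n_test

def failure_rate(m, delta=0.05, runs=200):
    # estimate the outer probability P_A[ error of h > delta ]
    bad = 0
    for _ in range(runs):
        sample = [(x, concept(x)) for x in (random.random() for _ in range(m))]
        if true_error(learn_threshold(sample)) > delta:
            bad += 1
    return bad / runs

print(failure_rate(m=10))   # small sample: h often exceeds the error tolerance
print(failure_rate(m=200))  # larger sample: the failure rate drops toward zero
```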
Occam’s Razor
if an algorithm \(A\) outputs a small hypothesis \(h\) that is consistent with its input sample, then \(A\) has PAC-learned \(C\), and \(h\) is likely to generalize (you would have to get very unlucky for a simple explanation to fit a large batch of inputs by coincidence, because it's quite hard to overfit with a small hypothesis).
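a minimal sketch of what such an \(A\) could look like over a finite hypothesis class, assuming each hypothesis comes with a size (description length); the names here are my own placeholders:

```python
def occam_learner(hypotheses, size, sample):
    """Return the smallest hypothesis consistent with the labeled sample.

    hypotheses: iterable of callables h(x) -> bool (the class H)
    size:       callable giving a description length for each h
    sample:     list of (x, label) pairs drawn from D
    """
    consistent = [h for h in hypotheses
                  if all(h(x) == label for x, label in sample)]
    return min(consistent, key=size) if consistent else None
```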
PAC-learn a DFA
by Occam’s Razor, we should just keep building DFAs, starting from the smallest, until we get one consistent with the input.
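as a sketch (the helper names and toy sample are mine, not the course's algorithm), the naive version of that search enumerates DFAs with 1, 2, 3, … states over a two-letter alphabet and returns the first one that agrees with every labeled example:

```python
from itertools import product

def dfa_accepts(delta, accepting, word, start=0):
    # delta: dict (state, symbol) -> state; accepting: set of accepting states
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting

def smallest_consistent_dfa(sample, alphabet=("a", "b"), max_states=4):
    # try 1 state, then 2, ... and return the first DFA that agrees with every example
    for n in range(1, max_states + 1):
        states = range(n)
        for transitions in product(states, repeat=n * len(alphabet)):
            delta = {(q, a): transitions[q * len(alphabet) + i]
                     for q in states for i, a in enumerate(alphabet)}
            for bits in product([False, True], repeat=n):
                accepting = {q for q in states if bits[q]}
                if all(dfa_accepts(delta, accepting, w) == label
                       for w, label in sample):
                    return n, delta, accepting
    return None

# toy sample for "even number of a's"; a 2-state DFA is found
sample = [("", True), ("a", False), ("aa", True), ("ab", False), ("ba", False), ("aab", True)]
print(smallest_consistent_dfa(sample))
```

with \(n\) states there are \(n^{2n}\) transition functions and \(2^{n}\) choices of accepting states to try.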
but… that’s exponential. so:
let’s define a partial language \(L^{?}\): for each word \(w\), either \(w \in L^{?}\), or \(w \not\in L^{?}\), or we don’t know. We say \(L^{?}\) distinguishes \(x\) and \(y\) if there exists some \(z\) such that \(xz \in L^{?}\) and \(yz \not\in L^{?}\), or vice versa.
this does not give us an equivalence relation, because indistinguishability under \(L^{?}\) is not transitive: two words can each be indistinguishable from a third (for lack of known memberships) without being indistinguishable from each other.
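a small sketch of that (the dict representation and names are mine, and \(z\) ranges over a finite candidate set of suffixes for illustration rather than all strings):

```python
def distinguishes(partial, x, y, suffixes):
    # partial: dict word -> bool for the words whose membership in L^? is known
    # x and y are distinguished if some suffix z has known, opposite answers for xz and yz
    for z in suffixes:
        a, b = partial.get(x + z), partial.get(y + z)
        if a is not None and b is not None and a != b:
            return True
    return False

# indistinguishability is not transitive: "a" ~ "b" and "b" ~ "c" (nothing known
# separates them), yet "a" and "c" are distinguished.
partial = {"a": True, "c": False}   # membership of "b" is unknown
suffixes = [""]                     # only the empty suffix, for illustration
print(distinguishes(partial, "a", "b", suffixes))  # False
print(distinguishes(partial, "b", "c", suffixes))  # False
print(distinguishes(partial, "a", "c", suffixes))  # True
```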
you probably can’t, though, PAC-learn DFAs passively from samples like this; you can learn automata actively, by asking queries.
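“actively” here means query learning in the style of Angluin’s \(L^{*}\): the learner may ask membership queries (“is \(w\) in the language?”) and equivalence queries (“is this hypothesis right? if not, give a counterexample”). A sketch of that oracle interface only (the class and its finite test-word approximation are my own placeholders, not a full \(L^{*}\) implementation):

```python
from typing import Callable, Iterable, Optional

class Teacher:
    """Oracle for active automaton learning: membership + equivalence queries."""

    def __init__(self, target: Callable[[str], bool], test_words: Iterable[str]):
        self.target = target                # hidden language, as a membership function
        self.test_words = list(test_words)  # finite stand-in for a real equivalence check

    def member(self, w: str) -> bool:
        # membership query: is w in the target language?
        return self.target(w)

    def equivalent(self, hypothesis: Callable[[str], bool]) -> Optional[str]:
        # equivalence query: return a counterexample, or None if none is found
        for w in self.test_words:
            if hypothesis(w) != self.target(w):
                return w
        return None
```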