probably approximately correct (PAC) learning is the scheme we will use to learn DFAs. Suppose you have a concept \(c \in C\). We want to find a hypothesis \(h \in H\) that gets as close to the true concept's boundary as possible.
We want to minimize false positives and false negatives.
constituents
- Instance space \(X\)
- Concept class \(C\) (functions over \(X\))
- Hypothesis class \(H\) (functions over \(X\))
- “proper learning” is the special case \(H = C\)
- \(A\) PAC-learns \(C\) if
- \(\forall c \in C\) and for every distribution \(D\) over \(X\): when \(A\) is given examples \((x, c(x))\) with \(x\) sampled from \(D\) and outputs some \(h \in H\), we want…
\begin{equation} P_{A} \left[ P_{x \sim D} \left[ h(x) \neq c(x) \right] > \delta \right] < \epsilon \end{equation}
our learning scheme wants the probability that \(h\)'s error under \(D\) exceeds the tolerance \(\delta\) to itself be small, \(< \epsilon\).
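to make the definition concrete, here is a small self-contained sketch (my own illustration, not from the notes): a toy concept class of thresholds on \([0,1]\) with \(D\) uniform, a naive learner, and a Monte Carlo estimate of the outer probability. All names and parameter values are hypothetical.

```python
import random

THETA = 0.5  # hypothetical true threshold for the toy concept

def concept(x):
    # the hidden concept c: positive iff x >= THETA
    return x >= THETA

def learn_threshold(sample):
    # a naive learner A: put the threshold at the smallest positive example seen
    positives = [x for x, label in sample if label]
    t = min(positives) if positives else 1.0
    return lambda x: x >= t

def true_error(h, n_test=20_000):
    # Monte Carlo estimate of P_{x ~ D}[h(x) != c(x)], with D = Uniform[0, 1)
    xs = [random.random() for _ in range(n_test)]
    return sum(h(x) != concept(x) for x in xs) / n_test

def failure_rate(m, delta=0.05, runs=200):
    # estimate the outer probability P_A[ error of h > delta ]
    bad = 0
    for _ in range(runs):
        sample = [(x, concept(x)) for x in (random.random() for _ in range(m))]
        if true_error(learn_threshold(sample)) > delta:
            bad += 1
    return bad / runs

print(failure_rate(m=10))   # small sample: h often exceeds the error tolerance
print(failure_rate(m=200))  # larger sample: the failure rate drops toward zero
```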
Occam’s Razor
if an algorithm \(A\) outputs a small hypothesis \(h\) that is consistent with its input sample, then \(A\) has PAC-learned \(C\), and \(h\) is likely to generalize (you would have to get very unlucky for a simple explanation to fit a large batch of inputs by coincidence, because it's quite hard to overfit with a small hypothesis).
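a minimal sketch of what such an \(A\) could look like over a finite hypothesis class, assuming each hypothesis comes with a size (description length); the names here are my own placeholders:

```python
def occam_learner(hypotheses, size, sample):
    """Return the smallest hypothesis consistent with the labeled sample.

    hypotheses: iterable of callables h(x) -> bool (the class H)
    size:       callable giving a description length for each h
    sample:     list of (x, label) pairs drawn from D
    """
    consistent = [h for h in hypotheses
                  if all(h(x) == label for x, label in sample)]
    return min(consistent, key=size) if consistent else None
```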
PAC-learn a DFA
by Occam’s Razor, we should just keep building DFAs, starting from the smallest, until we get one consistent with the input.
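as a sketch (the helper names and toy sample are mine, not the course's algorithm), the naive version of that search enumerates DFAs with 1, 2, 3, … states over a two-letter alphabet and returns the first one that agrees with every labeled example:

```python
from itertools import product

def dfa_accepts(delta, accepting, word, start=0):
    # delta: dict (state, symbol) -> state; accepting: set of accepting states
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting

def smallest_consistent_dfa(sample, alphabet=("a", "b"), max_states=4):
    # try 1 state, then 2, ... and return the first DFA that agrees with every example
    for n in range(1, max_states + 1):
        states = range(n)
        for transitions in product(states, repeat=n * len(alphabet)):
            delta = {(q, a): transitions[q * len(alphabet) + i]
                     for q in states for i, a in enumerate(alphabet)}
            for bits in product([False, True], repeat=n):
                accepting = {q for q in states if bits[q]}
                if all(dfa_accepts(delta, accepting, w) == label
                       for w, label in sample):
                    return n, delta, accepting
    return None

# toy sample for "even number of a's"; a 2-state DFA is found
sample = [("", True), ("a", False), ("aa", True), ("ab", False), ("ba", False), ("aab", True)]
print(smallest_consistent_dfa(sample))
```

with \(n\) states there are \(n^{2n}\) transition functions and \(2^{n}\) choices of accepting states to try.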
but… that’s exponential. so:
let’s define a partial language \(L^{?}\): for each word \(w\), either \(w \in L^{?}\), or \(w \not\in L^{?}\), or we don’t know. We say \(L^{?}\) distinguishes \(x\) and \(y\) if there exists some \(z\) such that \(xz \in L^{?}\) and \(yz \not\in L^{?}\), or vice versa.
this does not give us an equivalence relation, because indistinguishability under \(L^{?}\) is not transitive: two words can each be indistinguishable from a third (for lack of known memberships) without being indistinguishable from each other.
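a small sketch of that (the dict representation and names are mine, and \(z\) ranges over a finite candidate set of suffixes for illustration rather than all strings):

```python
def distinguishes(partial, x, y, suffixes):
    # partial: dict word -> bool for the words whose membership in L^? is known
    # x and y are distinguished if some suffix z has known, opposite answers for xz and yz
    for z in suffixes:
        a, b = partial.get(x + z), partial.get(y + z)
        if a is not None and b is not None and a != b:
            return True
    return False

# indistinguishability is not transitive: "a" ~ "b" and "b" ~ "c" (nothing known
# separates them), yet "a" and "c" are distinguished.
partial = {"a": True, "c": False}   # membership of "b" is unknown
suffixes = [""]                     # only the empty suffix, for illustration
print(distinguishes(partial, "a", "b", suffixes))  # False
print(distinguishes(partial, "b", "c", suffixes))  # False
print(distinguishes(partial, "a", "c", suffixes))  # True
```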
you probably can’t, though, PAC-learn DFAs passively from samples like this; you can learn automata actively, by asking queries.
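“actively” here means query learning in the style of Angluin’s \(L^{*}\): the learner may ask membership queries (“is \(w\) in the language?”) and equivalence queries (“is this hypothesis right? if not, give a counterexample”). A sketch of that oracle interface only (the class and its finite test-word approximation are my own placeholders, not a full \(L^{*}\) implementation):

```python
from typing import Callable, Iterable, Optional

class Teacher:
    """Oracle for active automaton learning: membership + equivalence queries."""

    def __init__(self, target: Callable[[str], bool], test_words: Iterable[str]):
        self.target = target                # hidden language, as a membership function
        self.test_words = list(test_words)  # finite stand-in for a real equivalence check

    def member(self, w: str) -> bool:
        # membership query: is w in the target language?
        return self.target(w)

    def equivalent(self, hypothesis: Callable[[str], bool]) -> Optional[str]:
        # equivalence query: return a counterexample, or None if none is found
        for w in self.test_words:
            if hypothesis(w) != self.target(w):
                return w
        return None
```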