Houjun Liu

PAC Learning

probably approximately correct (PAC) learning is a general learning framework, which we apply here to learning DFAs. Suppose you have a target concept \(c \in C\). We want to find a hypothesis \(h \in H\) whose decision boundary gets as close to the boundary of the true concept as possible.

We want to minimize false positives and false negatives.

constituents

  • Instance space \(X\)
  • Concept class \(C\) (functions over \(X\))
  • Hypothesis class \(H\) (functions over \(X\))
    • “proper learning”: the special case where \(H=C\)
  • \(A\) PAC-learns \(C\) if
    • \(\forall c \in C\) and for all distributions \(D\) over \(X\), when \(A\) gets inputs sampled from \(D\) (labeled by \(c\)) and outputs \(h \in H\), we want…

\begin{equation} P_{A} \left[ P_{x \sim D}[h(x) \neq c(x)] > \delta \right] < \epsilon \end{equation}

our learning scheme wants the probability (over \(A\)'s random sample) that \(h\)'s error rate exceeds \(\delta\) to be small, i.e. less than \(\epsilon\).
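To make the definition concrete, here is a small Python sketch (an illustration, not part of the note) that simulates the condition above for an assumed toy concept class of thresholds on \([0,1)\): it estimates the probability, over the learner's random sample, that the returned hypothesis's error exceeds \(\delta\), and compares that probability to \(\epsilon\).

```python
import random

# Toy, simulated check of the PAC condition above, using the note's roles:
# delta = error tolerance on h, epsilon = allowed probability (over A's
# random sample) of exceeding that tolerance.
# Concept class (assumed for illustration): thresholds on [0, 1);
# the learner returns the largest negatively-labeled point it saw.

def concept(theta):
    return lambda x: x >= theta

def learner(sample):
    """A consistent learner: threshold just above the biggest negative example."""
    negatives = [x for x, label in sample if label == 0]
    return max(negatives) if negatives else 0.0

def error_exceeds_delta(theta, m, delta, n_test=5000):
    c = concept(theta)
    sample = [(x, int(c(x))) for x in (random.random() for _ in range(m))]
    h = concept(learner(sample))
    err = sum(h(x) != c(x) for x in (random.random() for _ in range(n_test))) / n_test
    return err > delta

# Estimate P_A[ error(h) > delta ] and compare against epsilon.
theta, m, delta, epsilon = 0.3, 200, 0.05, 0.1
failures = sum(error_exceeds_delta(theta, m, delta) for _ in range(200))
print(f"empirical P[error > {delta}] ~ {failures / 200:.3f}  (want < {epsilon})")
```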

Occam’s Razor

if any algorithm \(A\) outputs a small hypothesis \(h\) that works on the input sample, then \(A\) has PAC-learned \(C\) and \(h\) is likely to generalize (you have to get very unlucky for a simple explanation to fit a large batch of inputs by chance, because a simple hypothesis is quite hard to overfit).
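A minimal sketch of such an “Occam” learner, under the assumption that we have some way of enumerating hypotheses in order of increasing size (`enumerate_hypotheses` below is hypothetical): return the first, i.e. smallest, hypothesis consistent with every labeled example.

```python
# Occam-style learner sketch: scan hypotheses from smallest to largest and
# return the first one that agrees with the whole labeled sample.

def occam_learn(sample, enumerate_hypotheses):
    """sample: iterable of (x, label); enumerate_hypotheses: yields h in size order."""
    for h in enumerate_hypotheses():
        if all(h(x) == label for x, label in sample):
            return h  # smallest consistent hypothesis found
    return None       # no hypothesis in the class fits the sample
```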

PAC-learn a DFA

By Occam’s Razor, we should just keep building DFAs, starting from the smallest, until we get one that is consistent with the sample.
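For instance, a brute-force version of this search (a sketch, and exactly the naive strategy the next line objects to) enumerates all DFAs with \(1, 2, 3, \dots\) states over the alphabet and returns the first one consistent with the labeled sample:

```python
from itertools import product

# Brute-force "smallest consistent DFA" search: for n = 1, 2, ... states,
# try every transition function and every accepting set. The number of
# candidates grows like n^(n*|alphabet|) * 2^n, which is the blow-up
# complained about below.

def dfa_accepts(delta, accepting, word, start=0):
    state = start
    for ch in word:
        state = delta[(state, ch)]
    return state in accepting

def smallest_consistent_dfa(sample, alphabet, max_states=4):
    """sample: list of (word, bool). Returns (delta, accepting) or None."""
    for n in range(1, max_states + 1):
        keys = [(q, a) for q in range(n) for a in alphabet]
        for targets in product(range(n), repeat=len(keys)):
            delta = dict(zip(keys, targets))
            for bits in product([False, True], repeat=n):
                accepting = {q for q in range(n) if bits[q]}
                if all(dfa_accepts(delta, accepting, w) == lab for w, lab in sample):
                    return delta, accepting
    return None

# e.g. strings over {a, b} labeled by "even number of a's"
sample = [("", True), ("a", False), ("aa", True), ("ab", False),
          ("ba", False), ("bab", False), ("aba", True)]
print(smallest_consistent_dfa(sample, "ab"))
```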

but… that’s exponential. So:

let’s define \(L^{?}\) such that for each word \(w\), either \(w \in L^{?}\), or \(w \not\in L^{?}\), or we don’t know. We say \(L^{?}\) distinguishes \(x\) and \(y\) if there exists some \(z\) such that \(xz \in L^{?}\) and \(yz \not\in L^{?}\), or vice versa.

the induced indistinguishability relation is not an equivalence relation, because there is no transitivity (the “don’t know” answers break it).
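A small sketch of this, representing \(L^{?}\) (as an assumption) by a Python dict that maps words to True/False where membership is known and leaves unknown words out; it also shows indistinguishability failing to be transitive once “don’t know” answers are allowed:

```python
# L^? as a partial language: present keys have known membership,
# absent keys mean "don't know".

def distinguishes(L, x, y, suffixes):
    """True if some z in suffixes gives x+z and y+z known, differing membership."""
    for z in suffixes:
        mx, my = L.get(x + z), L.get(y + z)
        if mx is not None and my is not None and mx != my:
            return True
    return False

# Non-transitivity of indistinguishability: no known witness separates
# "a" from "c", or "c" from "b", yet "a" and "b" are distinguished directly.
L = {"a": True, "b": False}          # everything else is unknown
suffixes = [""]                      # only the empty suffix as a test
print(distinguishes(L, "a", "c", suffixes))  # False: "c" is unknown
print(distinguishes(L, "c", "b", suffixes))  # False: "c" is unknown
print(distinguishes(L, "a", "b", suffixes))  # True: "a" in L^?, "b" not
```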

you probably can’t, though, learn DFAs efficiently from passive samples alone; you can learn automata actively (by asking queries).
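For reference (this construction is not spelled out in the note), active automata learning in the style of Angluin’s L* assumes a teacher that answers membership queries exactly and equivalence queries at least approximately; a PAC-style teacher can approximate the equivalence query by random sampling, as in this sketch:

```python
import random

# Sketch of the query interface active automata-learning algorithms assume:
# membership queries answered exactly, equivalence queries approximated by
# sampling random words and looking for a counterexample.

class Teacher:
    def __init__(self, target, alphabet, n_samples=500, max_len=8):
        self.target, self.alphabet = target, alphabet
        self.n_samples, self.max_len = n_samples, max_len

    def membership(self, word):
        return self.target(word)                 # "is word in the language?"

    def equivalence(self, hypothesis):
        # Return a word where hypothesis and target disagree,
        # or None if no counterexample is found (approximate equivalence).
        for _ in range(self.n_samples):
            w = "".join(random.choice(self.alphabet)
                        for _ in range(random.randint(0, self.max_len)))
            if hypothesis(w) != self.target(w):
                return w
        return None
```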