Reliable RL

Thinking about advances in the capabilities of RL: Knowledge Discovery -> Reasoning (programming assistance) ->(ongoing)-> Robotics

Insight: as time goes on, the “risk-criticality” of our applications increases; yet as scenarios become more risk-critical, it's harder to get data.

Reliable Feedback Loop

General desirable structure…

Verify (claims and requirements) => Safeguard (safe continuous deployment) => Generalize (via compositional generalization: incrementally adding behavior without losing existing behavior) => Verify => …

Deal with Stochasticity

An RL algorithm is replicable if, with high probability (WHP), running it on the same MDP with fixed internal randomness produces the same outcome.
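
One way to write this down, as a sketch in notation that is assumed here and not taken from these notes: let \(\mathcal{A}\) be the algorithm, \(S_1, S_2\) two independently collected sets of samples from the same MDP, \(r\) the internal randomness shared by both runs, and \(\rho\) the allowed failure probability.

\[
\Pr_{S_1, S_2, r}\bigl[\mathcal{A}(S_1; r) = \mathcal{A}(S_2; r)\bigr] \ge 1 - \rho
\]

Read this way, "the same outcome" means the two runs return the identical output (for example, the same policy), even though the data each run sees is different.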

=> \(\epsilon\)-optimal replicable algorithms for tabular / linear settings with sample complexity polynomial in the parameters.

Quantization for Tie-Breaking
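
A minimal sketch of how quantization can help with ties, under assumptions that are not in these notes: two runs estimate the same action values with slightly different noise, so a plain argmax can flip between near-tied actions. Snapping each estimate to a grid whose offset comes from the shared randomness turns near-ties into exact ties, which a fixed rule (lowest index) then breaks the same way in both runs. The names `shared_offset`, `quantize`, `replicable_argmax`, the grid width, and the example numbers are all hypothetical.

```python
import math
import random


def shared_offset(seed: int, width: float) -> float:
    """Draw the grid offset from the randomness shared by both runs (the seed)."""
    return random.Random(seed).uniform(0.0, width)


def quantize(value: float, width: float, offset: float) -> int:
    """Return the index of the grid cell (of the given width, shifted by offset)
    that contains value."""
    return math.floor((value - offset) / width)


def replicable_argmax(estimates, width, offset):
    """Argmax over quantized estimates; exact ties go to the lowest index."""
    cells = [quantize(q, width, offset) for q in estimates]
    return cells.index(max(cells))


# Two runs with independent noise on the same three action values.
# Actions 0 and 1 are nearly tied, so the raw argmax differs between runs.
run_a = [0.500, 0.512, 0.200]
run_b = [0.508, 0.503, 0.210]

width = 0.1
offset = shared_offset(seed=0, width=width)  # identical in both runs

pick_a = replicable_argmax(run_a, width, offset)
pick_b = replicable_argmax(run_b, width, offset)
```

The two picks agree whenever no grid boundary falls between the near-tied estimates. Randomizing the offset is what makes that failure unlikely when the estimation noise is small relative to `width`; the price is that actions whose values differ by less than the grid width are treated as equally good.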

Compositional Generalization

We can decompose relevant problems into subparts, which lets us compose the solutions to those subparts to solve new tasks.