Thinking about advances in the capabilities of RL: Knowledge Discovery -> Reasoning (programming assistance) ->(ongoing)-> Robotics
Insight: as time goes on, the “risk-criticality” of our applications increases; yet the more risk-critical the scenario, the harder it is to get data.
Reliable Feedback Loop
General desirable structure…
Verify (claims and requirements) => Safeguard (safe continuous deployment) => Generalize (via compositional generalization: incrementally adding behavior without losing behavior) => Verify => …
Deal with Stochasticity
An RL algorithm is replicable if, with high probability (WHP), running it on the same MDP with fixed randomness results in the same outcome.
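One way to write this condition down, assuming the notes follow the usual shared-randomness definition from the replicability literature (the symbols \(\mathcal{A}\), \(S_1\), \(S_2\), \(r\), and \(\rho\) are notation introduced here, not taken from the notes):

\[
\Pr_{S_1, S_2, r}\bigl[\mathcal{A}(S_1; r) = \mathcal{A}(S_2; r)\bigr] \ge 1 - \rho
\]

Here \(S_1\) and \(S_2\) are two independent sets of samples drawn from the same MDP, \(r\) is the internal randomness held fixed across both runs, and \(\rho\) is the allowed failure probability.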
=> \(\epsilon\)-optimal replicable algorithms for tabular / linear settings with sample complexity polynomial in the parameters.
Quantization for Tie Break
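The notes do not spell out the quantization scheme, so the following is only a minimal sketch of one common idea: round each estimate onto a grid whose shift is drawn from the shared randomness. Two runs with slightly different estimates then usually land on the same grid point, so near-ties (for example, between almost equal value estimates) get resolved the same way in both runs. The names `replicable_mean`, `agreement_rate`, `grid_width`, and `shared_seed` are hypothetical and chosen for illustration.

```python
import random


def replicable_mean(samples, grid_width, shared_seed):
    """Average the samples, then snap the average to a randomly shifted grid."""
    # Hypothetical helper: the grid shift depends on the shared seed only,
    # so every run that uses this seed rounds onto the same grid.
    shift = random.Random(shared_seed).uniform(0.0, grid_width)
    empirical_mean = sum(samples) / len(samples)
    # Nearest grid point of the form shift + k * grid_width.
    k = round((empirical_mean - shift) / grid_width)
    return shift + k * grid_width


def agreement_rate(mean_a, mean_b, grid_width, num_seeds=1000):
    """Fraction of shared seeds for which two nearby estimates round identically."""
    agree = 0
    for seed in range(num_seeds):
        a = replicable_mean([mean_a], grid_width, seed)
        b = replicable_mean([mean_b], grid_width, seed)
        agree += (a == b)
    return agree / num_seeds


# Two runs whose raw estimates differ slightly (0.50 vs 0.51) because they
# saw different samples, rounded on a grid ten times wider than the gap.
rate = agreement_rate(0.50, 0.51, grid_width=0.1)
```

Under these assumptions the two runs should agree for roughly a fraction `1 - gap / grid_width` of the shared seeds, so a wider grid buys more agreement at the cost of a coarser estimate.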
Compositional Generalization
We can decompose relevant problems into subparts, which allows us to compose the parts into solutions for new tasks.
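As a toy illustration of the idea (not a method from the notes), the sketch below treats each subpart as a small reusable skill and builds a new task by chaining existing skills. The skill names, the dictionary state, and the `compose` helper are all hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, bool]
Skill = Callable[[State], State]


def pick_up(state: State) -> State:
    # Hypothetical skill: grasp the object.
    return {**state, "holding": True}


def move_to_goal(state: State) -> State:
    # Hypothetical skill: travel to the goal location.
    return {**state, "at_goal": True}


def put_down(state: State) -> State:
    # Hypothetical skill: release the object.
    return {**state, "holding": False}


# Library of already-learned subparts.
SKILLS: Dict[str, Skill] = {
    "pick_up": pick_up,
    "move_to_goal": move_to_goal,
    "put_down": put_down,
}


def compose(names: List[str]) -> Skill:
    """Build a new behavior by running existing skills in sequence."""
    def run(state: State) -> State:
        for name in names:
            state = SKILLS[name](state)
        return state
    return run


# A new task assembled from existing parts; the parts themselves are unchanged.
deliver = compose(["pick_up", "move_to_goal", "put_down"])
final_state = deliver({"holding": False, "at_goal": False})
```

Because `compose` only reads from the skill library, adding a new composed behavior does not alter any existing skill, which mirrors the goal of adding behavior without losing behavior.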
