Sarsa (Lambda)
Last edited: August 8, 2025
Sarsa (Lambda) is SARSA with Eligibility Traces (\(\lambda\)).
Previous approaches to dealing with Partially Observable Markov Decision Processes (POMDPs):
- memory-based state estimation (beliefs)
- special planning methods
Key question: Can we use MDP reinforcement learning to deal with POMDPs?
Background
Recall MDP SARSA:
\begin{equation} Q(s,a) \leftarrow Q(s,a) + \alpha \qty[(r + \gamma Q(s', a')) - Q(s,a)] \end{equation}
Recall that SARSA can take a long time to learn under sparse rewards, because a reward takes many updates to propagate back to the earlier state-action pairs that led to it.
Hence, we use Eligibility Traces, which keep track of which state-action pairs are “eligible” for updates.
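A minimal sketch of how the trace changes the update, assuming the common accumulating-trace variant: each visited pair gets a counter \(N(s,a)\), every pair is updated in proportion to its counter, and all counters decay by \(\gamma\lambda\) after each step. The helper name `sarsa_lambda_update`, the dictionary tables, and the default parameter values are illustrative assumptions, not taken from these notes.

```python
from collections import defaultdict


def sarsa_lambda_update(Q, N, s, a, r, s_next, a_next,
                        alpha=0.5, gamma=0.9, lam=0.8):
    """One SARSA(lambda) step with accumulating traces (hypothetical helper)."""
    # Mark the pair we just left as eligible.
    N[(s, a)] += 1.0
    # Same temporal-difference error as in one-step SARSA.
    delta = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    # Every eligible pair shares the error, weighted by its trace.
    for key in list(N):
        Q[key] += alpha * delta * N[key]
        N[key] *= gamma * lam  # decay the trace


# Two steps along a chain; only the second step is rewarded.
# N should be reset to empty at the start of each episode.
Q = defaultdict(float)
N = defaultdict(float)
sarsa_lambda_update(Q, N, 0, "right", 0.0, 1, "right")
sarsa_lambda_update(Q, N, 1, "right", 1.0, 2, "right")
```

After the second call, `Q[(0, "right")]` is already nonzero (about 0.36 with these defaults), whereas one-step SARSA would have updated only `Q[(1, "right")]` on that step.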
SARSOP
Last edited: August 8, 2025
Big problem: curse of dimensionality and the curse of history.
PBVI and HSVI try to sample the belief simplex broadly. Instead, we should try to sample only the optimally reachable set, \(R^{*}\).
Background
Recall one-step lookahead in a POMDP. The difficulty here is that the sum over all of the alpha vectors is still very hard to compute. So, in PBVI, we only do this for a small set of beliefs.
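For reference, one common way to write the one-step lookahead over a set of alpha vectors \(\Gamma\) is sketched below. The notation is an assumption, not taken from these notes: \(R(b,a) = \sum_{s} R(s,a)\, b(s)\) is the expected immediate reward, and \(b'\) is the belief obtained by updating \(b\) with action \(a\) and observation \(o\).

\begin{equation} U(b) = \max_{a} \left[ R(b,a) + \gamma \sum_{o} P(o \mid b,a) \max_{\alpha \in \Gamma} \alpha^{\top} b' \right] \end{equation}

The inner maximum ranges over every alpha vector for every action-observation pair, which is why point-based methods restrict the backup to a small set of beliefs.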
SARSOP
- sample \(R^{*}\)
- backup
- prune
Initialization
Choose an initial belief, action, and observation using “suitable heuristics”. Initialize a set of alpha vectors corresponding to this belief.
SAT is in NP
Last edited: August 8, 2025
Recall SAT is in NP because if \(\phi \in \text{SAT}\), then there is a short (polynomial length in \(n\)), efficiently (poly-time) checkable proof: just read out the satisfying assignment.
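A minimal sketch of such a checker, assuming \(\phi\) is given in CNF as a list of clauses, where each clause is a list of nonzero integers (a positive integer is a variable, a negative integer its negation). The function name `verify` and this encoding are illustrative assumptions. The check reads each literal once, so it runs in time linear in the size of the formula.

```python
def verify(cnf, assignment):
    """Return True if `assignment` (dict: variable -> bool) satisfies `cnf`."""
    def literal_value(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value

    # Every clause needs at least one true literal.
    return all(any(literal_value(lit) for lit in clause) for clause in cnf)


# phi = (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
phi = [[1, -2], [2, 3], [-1, -3]]
good = verify(phi, {1: True, 2: True, 3: False})
bad = verify(phi, {1: True, 2: False, 3: True})
```

Here `good` is `True` and `bad` is `False`: the second assignment violates the last clause.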
Satellite Assignment Problem
Last edited: August 8, 2025
Goal: for a bunch of satellites with
\begin{equation} \alpha \qty(\beta) = \text{argmax}_{x \in X}\sum_{i=1}^{n} \sum_{j=1}^{m} \beta_{ij}x_{ij} \end{equation}
where \(\beta\) is the benefit matrix of each Agent being assigned to each Task. This one-shot (greedy) problem can be solved with the Hungarian Method. But it becomes hard when satellites MOVE and start running out of time: the problem becomes sequential, with dependencies from past to future.
Solution: Multi-Agent RL. But the vanilla solution will produce conflicts, because the dominant strategy may be the same for each agent.
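A minimal sketch of the one-shot objective, under two assumptions that are not in these notes: the problem is square (as many agents as tasks), and the feasible set \(X\) contains exactly the one-task-per-agent assignments. This is a brute-force search over permutations, not the Hungarian Method itself; it only illustrates what the argmax selects. The benefit values are made up for illustration.

```python
from itertools import permutations

# beta[i][j]: benefit of assigning agent i to task j (illustrative values).
beta = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]


def best_assignment(beta):
    """Try every one-to-one assignment and keep the one with highest benefit."""
    n = len(beta)
    best_perm, best_value = None, float("-inf")
    for perm in permutations(range(n)):  # perm[i] is the task given to agent i
        value = sum(beta[i][perm[i]] for i in range(n))
        if value > best_value:
            best_perm, best_value = perm, value
    return best_perm, best_value


best_perm, best_value = best_assignment(beta)
```

For this matrix the search returns `(0, 2, 1)` with total benefit 11. Brute force scales factorially, so for larger instances a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` with `maximize=True` would be the practical choice.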
scalander notes
Last edited: August 8, 2025
- adding emails needs sequential typing
- hitting enter should move on to the next page
- date selection
- “Monday next week” isn’t parsed by the NLP
- reading calendar output isn’t sorted
bugs
reading other people’s calendars isn’t working
need some information about what the heck is actually happening during scheduling
show other people’s overall availability in the scheduling page
the weight number doesn’t make sense; correct the alt-text and make the number work
have the idea of a “meeting owner”, and we only reach out to them to confirm final date + have an ability to add a message; also allow the meeting owner to change to alternate schedules on that date