How Did Economists Get It So Wrong?
A reading: (Krugman 2009)
Reflection
The discussion here of the conflict between “saltwater” and “freshwater” (Keynesian and Neoclassical) economists is very interesting when evaluated from the perspective of the recession that has recently seemed impending.
One statement in the essay that particularly resonated with me was that the crisis simply “pushed the freshwater economists into further absurdity.” It is interesting to see that, once a theory has become well established and insulated within a community, it becomes much harder to single out as something that could be wrong.
HSVI
Improving PBVI without sacrificing quality.
Initialization
We first initialize HSVI with a set of alpha vectors \(\Gamma\), representing the lower bound, and a list of tuples \((b, U(b))\) named \(\Upsilon\), representing the upper bound. We call the value functions they generate \(\underline{V}\) and \(\bar{V}\), respectively.
Lower Bound
Initialized as a set of alpha vectors: best-action worst-state (HSVI1) or the blind lower bound (HSVI2); see the sketch below.
Calculating \(\underline{V}(b)\)
\begin{equation} \underline{V}_{\Gamma}(b) = \max_{\alpha \in \Gamma} \alpha^{\top}b \end{equation}
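A minimal sketch of this, assuming NumPy, beliefs stored as probability vectors over states, and a \(|S| \times |A|\) reward matrix `R`; the best-action worst-state seed follows the usual \(\max_a \min_s R(s,a)/(1-\gamma)\) construction, and all names and shapes here are my own rather than the papers':

```python
import numpy as np

def best_action_worst_state_alpha(R, gamma):
    """HSVI1-style lower-bound seed (sketch): one alpha vector whose every
    entry is max_a min_s R(s, a) / (1 - gamma)."""
    worst_case_best_action = R.min(axis=0).max()   # each action's worst state, then the best action
    return np.full(R.shape[0], worst_case_best_action / (1.0 - gamma))

def lower_bound(Gamma, b):
    """V_lower(b) = max over alpha in Gamma of alpha^T b."""
    return max(alpha @ b for alpha in Gamma)

# Example usage (hypothetical 2-state, 2-action reward matrix):
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
Gamma = [best_action_worst_state_alpha(R, gamma=0.95)]
b = np.array([0.6, 0.4])
print(lower_bound(Gamma, b))
```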
Upper Bound
- Initialized by solving the fully observable MDP
- To evaluate a belief \(b\), project it into the point set
- The upper bound is the projection onto the convex hull of the points in \(\Upsilon\) (HSVI2: via an approximate convex-hull projection)
Calculating \(\bar{V}(b)\)
Recall that while the lower bound is given by alpha vectors, the upper bound is given as a set of tuples \((b, U(b)) \in \Upsilon\).
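A minimal sketch of the HSVI2-style approximate convex-hull (“sawtooth”) projection, assuming the corner values \(U(e_s)\) come from the fully observable MDP solution and the remaining \((b_i, U(b_i))\) tuples come from \(\Upsilon\); the function name and array layout are my assumptions (HSVI1 instead computes the exact convex-hull projection with a linear program):

```python
import numpy as np

def sawtooth_upper_bound(b, corner_values, interior_points):
    """Approximate convex-hull ("sawtooth") projection for the upper bound.

    corner_values:   length-|S| array of U(e_s) at the belief-simplex corners
                     (e.g., from solving the fully observable MDP)
    interior_points: iterable of (b_i, v_i) tuples taken from Upsilon
    Returns an upper bound on V(b) for the query belief b.
    """
    best = b @ corner_values                      # interpolation through the corners alone
    for b_i, v_i in interior_points:
        support = b_i > 0
        c = np.min(b[support] / b_i[support])     # how far b_i can be "stretched" toward b
        candidate = b @ corner_values + c * (v_i - b_i @ corner_values)
        best = min(best, candidate)
    return best

# Example usage (hypothetical 2-state problem):
corner_values = np.array([10.0, 8.0])             # U at corner beliefs e_1, e_2
Upsilon_interior = [(np.array([0.5, 0.5]), 6.0)]
print(sawtooth_upper_bound(np.array([0.7, 0.3]), corner_values, Upsilon_interior))
```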
Hungarian Method
HybPlan
“Can we come up with a policy that, if not fast, at least reaches the goal!”
Background
Stochastic Shortest-Path
We are at an initial state, we have a set of goal states, and we want to reach one of the goal states.
We can solve this just by:
- value iteration (a sketch appears after the problem definition below)
- simulating a trajectory and only updating reachable states: RTDP, LRTDP
- MBP
Problem
MDP + Goal States
- \(S\): set of states
- \(A\): actions
- \(P(s'|s,a)\): transition function
- \(C(s,a)\): cost
- \(G\): absorbing goal states
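A minimal value-iteration sketch for this SSP formulation (a baseline from the list above, not HybPlan itself); the array shapes, tolerance, and goal handling are my assumptions:

```python
import numpy as np

def ssp_value_iteration(P, C, goal_states, tol=1e-6, max_iters=10_000):
    """Plain value iteration for the SSP above (sketch).

    P: array of shape (|A|, |S|, |S|) with P[a, s, s'] = P(s' | s, a)
    C: array of shape (|S|, |A|) of per-step costs
    goal_states: absorbing goal state indices (cost-to-go fixed at 0)
    Returns the expected cost-to-go V(s) for every state.
    """
    n_states = C.shape[0]
    V = np.zeros(n_states)
    goal_mask = np.zeros(n_states, dtype=bool)
    goal_mask[list(goal_states)] = True
    for _ in range(max_iters):
        # Bellman backup: V(s) = min_a [ C(s,a) + sum_{s'} P(s'|s,a) V(s') ]
        Q = C + np.einsum("ast,t->sa", P, V)
        V_new = Q.min(axis=1)
        V_new[goal_mask] = 0.0
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V

# Example usage (hypothetical 2-state, 1-action chain; state 1 is the goal):
P = np.array([[[0.2, 0.8],
               [0.0, 1.0]]])        # shape (1, 2, 2)
C = np.array([[1.0], [0.0]])        # cost 1 per step outside the goal
print(ssp_value_iteration(P, C, goal_states={1}))   # ~[1.25, 0.0]
```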
Approach
Combining LRTDP with anytime dynamics
