It's like a transforming-distributions procedure, but \(f\) is not constrained to be differentiable. Sampling only requires evaluating \(f\) pointwise, so you can still sample from the transformed distribution.
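A minimal sketch of this idea: push base samples through a non-differentiable \(f\). The specific choice of \(f\) (a floor function) is just an illustration, not from the original text; the point is that sampling from the pushforward needs no derivative of \(f\), whereas computing its density via change of variables would.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Illustrative non-differentiable transform: a floor/step function.
    # It has no usable derivative, but is trivial to evaluate pointwise.
    return np.floor(4 * x)

base_samples = rng.uniform(size=10_000)   # samples from the base distribution
pushed_samples = f(base_samples)          # samples from the pushforward of f
```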
we draw a random sample of a possible next state (weighted by the transition probabilities given the action taken, i.e. an instantiation of \(s' \sim T(\cdot \mid s, a)\)) and the reward \(R(s,a)\) from the current state
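A minimal sketch of one such simulated transition, assuming a small tabular MDP where the transition model \(T\) and reward \(R\) are stored as arrays (the array names and sizes here are hypothetical, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# T[s, a] is a probability vector over next states; R[s, a] is the reward
# for taking action a in state s. Both are random placeholders here.
T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

def step(s, a):
    """One simulated transition: draw s' ~ T(. | s, a) and return (s', R(s, a))."""
    s_next = rng.choice(n_states, p=T[s, a])  # instantiation of s' ~ T(. | s, a)
    return s_next, R[s, a]

s_next, reward = step(s=0, a=1)
```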