MOEReview Rajbhandari: DeepSpeed MoE
Proposes: more experts in later layers (pyramid) + a shared expert that every token passes through alongside its routed expert (residual).
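
Rough sketch of the two ideas in plain Python, to make the note concrete. Everything here is an illustrative assumption, not the paper's or DeepSpeed's code: the names `pyramid_expert_counts` and `residual_moe_forward`, the "double the experts in the second half" schedule, the single-matrix experts, and the way the shared and routed outputs are summed.

```python
import math

def pyramid_expert_counts(num_layers, base_experts, late_fraction=0.5, multiplier=2):
    """Hypothetical schedule: the last `late_fraction` of layers get
    `multiplier` times as many experts as the earlier layers."""
    cutoff = int(num_layers * (1 - late_fraction))
    return [base_experts if i < cutoff else base_experts * multiplier
            for i in range(num_layers)]

def linear(weights, x):
    """Matrix-vector product; stands in for a full MLP expert."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def residual_moe_forward(x, shared_w, expert_ws, gate_w):
    """Every token goes through the shared expert; top-1 gating picks one
    routed expert whose output is added on top (assumed combination rule)."""
    probs = softmax(linear(gate_w, x))
    top = max(range(len(probs)), key=probs.__getitem__)
    shared_out = linear(shared_w, x)
    expert_out = linear(expert_ws[top], x)
    out = [s + probs[top] * e for s, e in zip(shared_out, expert_out)]
    return out, top

if __name__ == "__main__":
    print(pyramid_expert_counts(num_layers=4, base_experts=2))
    identity = [[1.0, 0.0], [0.0, 1.0]]
    experts = [[[2.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [0.0, 3.0]]]
    print(residual_moe_forward([1.0, 0.0], identity, experts, identity))
```

The shared expert runs for every token regardless of the gate, so routing only decides which one extra expert contributes.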