MoE Review: Rajbhandari et al., DeepSpeed-MoE

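Proposes: more experts in later layers (Pyramid-MoE) + a shared expert, i.e. a fixed dense MLP that every token passes through alongside its single top-1 routed expert (Residual-MoE). Combined, the two form PR-MoE.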
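The sketch below illustrates the two ideas in plain Python. It is not DeepSpeed's implementation, and all names are hypothetical: `pyramid_expert_counts`, `residual_moe_forward`, and the toy layers. The expert counts (32 early, 64 late) and the 50% split point are illustrative only. Scaling the routed expert's output by its gate probability is a Switch-style assumption; the paper's exact combination may differ.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def pyramid_expert_counts(num_layers: int, base: int = 32, top: int = 64,
                          split: float = 0.5) -> List[int]:
    """Hypothetical pyramid schedule: later MoE layers get more experts."""
    cutoff = int(num_layers * split)
    return [base if i < cutoff else top for i in range(num_layers)]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def residual_moe_forward(x: Vector, shared_mlp: Layer, experts: Sequence[Layer],
                         gate_weights: Sequence[Vector]) -> Tuple[Vector, int]:
    """Shared dense MLP plus one top-1 routed expert, summed per token."""
    # Gate logit for expert j is the dot product of its gate vector with x.
    logits = [sum(w * xi for w, xi in zip(wj, x)) for wj in gate_weights]
    probs = softmax(logits)
    k = max(range(len(probs)), key=lambda j: probs[j])  # top-1 routing
    dense_out = shared_mlp(x)   # fixed branch: every token takes it
    expert_out = experts[k](x)  # sparse branch: only the chosen expert runs
    # Assumption: scale the expert branch by its gate probability (Switch-style).
    out = [d + probs[k] * e for d, e in zip(dense_out, expert_out)]
    return out, k


if __name__ == "__main__":
    out, k = residual_moe_forward(
        [1.0, 0.0],
        shared_mlp=lambda v: [2 * t for t in v],
        experts=[lambda v: list(v), lambda v: [10 * t for t in v],
                 lambda v: [-t for t in v]],
        gate_weights=[[0.0, 0.0], [5.0, 0.0], [-5.0, 0.0]],
    )
    print(pyramid_expert_counts(4), k, out)  # k is 1: expert 1 has the largest logit
```

The shared branch always runs, so each token still gets a dense MLP's capacity. Each MoE layer then pays for only one extra expert instead of the two that top-2 gating would require.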