MOEReview Pan: Dense Training, Sparse Inference
Train with all experts active (dense), then at inference keep only the top-k experts.
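
A minimal NumPy sketch of the idea, not the paper's implementation. Several things here are assumptions made for illustration: the function name `moe_forward`, a softmax router, linear experts, per-token selection, and renormalizing the kept gate weights after top-k selection. With `top_k=None` every expert contributes (the dense training path); with `top_k` set, only the k highest-gated experts per token contribute (the sparse inference path).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(x, router_w, expert_ws, top_k=None):
    """Hypothetical MoE layer.

    x:         (tokens, d) input activations
    router_w:  (d, n_experts) router weights
    expert_ws: (n_experts, d, d) one linear map per expert
    top_k:     None for dense mode, or an int >= 1 for sparse mode
    """
    gates = softmax(x @ router_w)  # (tokens, n_experts)

    if top_k is not None:
        assert 1 <= top_k <= gates.shape[-1]
        # Value of the k-th largest gate for each token.
        kth = np.sort(gates, axis=-1)[:, -top_k][:, None]
        # Zero out every gate below that threshold (exact ties would keep extras).
        gates = np.where(gates >= kth, gates, 0.0)
        # Assumption: renormalize so the kept gates sum to 1.
        gates = gates / gates.sum(axis=-1, keepdims=True)

    # For clarity all experts are evaluated here; a real sparse
    # implementation would skip experts whose gate is zero.
    expert_out = np.einsum("td,edh->teh", x, expert_ws)  # (tokens, n_experts, d)
    out = np.einsum("te,teh->th", gates, expert_out)     # (tokens, d)
    return out, gates
```

Calling `moe_forward(x, router_w, expert_ws)` gives the dense output used during training in this sketch, while `moe_forward(x, router_w, expert_ws, top_k=2)` mixes only two experts per token.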