MoE Review, Pan: Dense Training, Sparse Inference

Train the experts densely, so that every expert processes every token. Then, at inference, keep only the top-k experts per token.
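The sketch below contrasts the two modes for a single scalar input. It assumes a softmax router whose logits are passed in directly; a real model computes them from the token. It also assumes that the kept top-k weights are renormalized to sum to 1. This is one common choice and is not confirmed as the reviewed method's exact rule. The experts are hypothetical toy functions.

```python
import math


def softmax(logits):
    # Numerically stable softmax over the router logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense_moe(x, experts, gate_logits):
    """Training-style (dense) forward: every expert runs on x, and the
    outputs are mixed by the softmax gate weights."""
    weights = softmax(gate_logits)
    return sum(w * expert(x) for w, expert in zip(weights, experts))


def sparse_moe(x, experts, gate_logits, k):
    """Inference-style (sparse) forward: only the k experts with the
    largest gate weights run. The other experts are never called."""
    weights = softmax(gate_logits)
    top = sorted(range(len(experts)), key=lambda i: weights[i], reverse=True)[:k]
    # Assumption: renormalize the kept weights so they sum to 1.
    kept = sum(weights[i] for i in top)
    return sum(weights[i] / kept * experts[i](x) for i in top)


# Hypothetical toy experts: expert i scales its input by s.
# The default argument binds s per lambda instead of the loop's last value.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, 0.3, 1.5]  # made-up router logits for one token

print(dense_moe(3.0, experts, logits))        # all 4 experts run
print(sparse_moe(3.0, experts, logits, k=1))  # 6.0: only expert 1 runs
```

With `k` equal to the number of experts, the sparse path reproduces the dense output. Smaller `k` skips the remaining experts' compute entirely, which is where the inference savings come from.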