Split the \(Q\) (query) projection and the attention output projection into experts, with a single router coordinating them. Better performance than standard multi-head attention (MHA).
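Below is a minimal pure-Python sketch of the idea. It makes several assumptions that the note above does not specify, in the style of a mixture-of-attention-heads layer:

- Keys and values come from one shared projection.
- Each expert owns a query projection and an output projection.
- One router picks the top-k experts for each token and gates their outputs with a softmax over the selected router logits.

The class and parameter names (`MixtureOfAttentionHeads`, `W_r`, `top_k`) are hypothetical. Load-balancing losses, causal masking, and batching are omitted.

```python
import math
import random


def matvec(v, W):
    """Row vector v (length n) times matrix W (n x m) -> list of length m."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class MixtureOfAttentionHeads:
    """Hypothetical sketch: expert Q/O projections, shared K/V, one router."""

    def __init__(self, d_model, d_head, n_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.d_head = d_head
        self.top_k = top_k
        # Assumption: key/value projections are shared by all experts.
        self.W_k = rand_matrix(d_model, d_head, rng)
        self.W_v = rand_matrix(d_model, d_head, rng)
        # Expert-specific query and output projections.
        self.W_q = [rand_matrix(d_model, d_head, rng) for _ in range(n_experts)]
        self.W_o = [rand_matrix(d_head, d_model, rng) for _ in range(n_experts)]
        # One router selects the (W_q, W_o) pair used by each token.
        self.W_r = rand_matrix(d_model, n_experts, rng)

    def route(self, x_t):
        """Return [(expert_index, gate)] for the top-k experts of token x_t."""
        logits = matvec(x_t, self.W_r)
        top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[: self.top_k]
        gates = softmax([logits[e] for e in top])
        return list(zip(top, gates))

    def forward(self, X):
        """X is a list of token vectors (length d_model); returns the same shape."""
        keys = [matvec(x, self.W_k) for x in X]
        values = [matvec(x, self.W_v) for x in X]
        scale = 1.0 / math.sqrt(self.d_head)
        out = []
        for x_t in X:
            y_t = [0.0] * len(x_t)
            for e, gate in self.route(x_t):
                q = matvec(x_t, self.W_q[e])  # expert query
                scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
                attn = softmax(scores)
                head = [sum(a * v[j] for a, v in zip(attn, values)) for j in range(self.d_head)]
                proj = matvec(head, self.W_o[e])  # expert output projection
                y_t = [y + gate * p for y, p in zip(y_t, proj)]
            out.append(y_t)
        return out


# Usage: 4 tokens of width 8, 6 experts, 2 active per token.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
layer = MixtureOfAttentionHeads(d_model=8, d_head=4, n_experts=6, top_k=2)
Y = layer.forward(X)
print(len(Y), len(Y[0]))  # 4 8
```

The router selects the query projection and the output projection of an expert together, as one pair. Each token therefore runs only `top_k` query and output projections. The keys and values are computed once and reused by every expert.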