An advantage function scores an action (and hence the policy that chooses it) by how much value it gains or loses relative to the greedy choice in the same state:

\begin{align} A(s,a) &= Q(s,a) - U(s) \\ &= Q(s,a) - \max_{a'}Q(s,a') \end{align}

That is, it measures how far the action-value of the action your policy takes falls short of the action-value of the utility-maximizing action.

For a greedy policy, which always picks the maximizing action, \(A = 0\); for any other action, \(A \le 0\), since no action can score above the maximum.
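
As a minimal sketch of the definition, assuming a small tabular setting where \(Q\) is stored as a list of rows (one row per state, one entry per action), the advantage can be computed by subtracting each row's maximum. The function name `advantage` and the values in `Q_example` are made up for illustration only.

```python
def advantage(Q):
    """Return A(s, a) = Q(s, a) - max_a' Q(s, a') for a tabular Q.

    Q is assumed to be a list of rows, one row per state,
    with one action-value per action in each row.
    """
    return [[q - max(row) for q in row] for row in Q]


# Hypothetical action-values: 2 states, 3 actions each.
Q_example = [
    [1.0, 3.0, 2.0],
    [0.5, 0.5, -1.0],
]

A_example = advantage(Q_example)

# The greedy action in each state is the first one that attains the row maximum.
greedy_actions = [row.index(max(row)) for row in Q_example]

for s, row in enumerate(A_example):
    print(f"state {s}: advantages {row}, greedy action {greedy_actions[s]}")
```

In this sketch every entry of `A_example` is at most zero, and the entry at the greedy action is exactly zero, matching the statement above.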