sfc: Use write-combining to reduce TX latency

Based on work by Neil Turton <nturton@solarflare.com> and
Kieran Mansley <kmansley@solarflare.com>.

The BIU has now been verified to handle 3- and 4-dword writes within a
single 128-bit register correctly.  This means we can enable write-
combining and only insert write barriers between writes to distinct
registers.

This has been observed to save about 0.5 us when pushing a TX
descriptor to an empty TX queue.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
3 files changed