MIPS32: improvements in code generation (mostly 64-bit ALU ops)

Specifically:
- Use the delay slot in InvokeRuntime() for direct entry points
- Use kNoOutputOverlap wherever possible
- Improve and/or/xor/add/sub with 64-bit integer constants
- Improve 64-bit shifts by a constant amount on R2+
- More efficient load/store of 64-bit constants (especially 0 and +0.0)
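
Illustration of the idea behind the 64-bit constant item for add (a
minimal sketch, not the code in this change): on MIPS32 a 64-bit value
lives in a pair of 32-bit registers, so an add with a constant can skip
the low-word add and the carry computation when the low half of the
constant is zero. Pair32 and AddConst64 are hypothetical names used only
for this example; the actual code generator emits instructions rather
than computing values.

```cpp
#include <cstdint>

// Hypothetical model of a 64-bit value held in two 32-bit registers.
struct Pair32 {
  uint32_t lo;
  uint32_t hi;
};

// Hypothetical helper: adds the 64-bit constant c to a, one 32-bit half
// at a time, the way a 32-bit code generator has to.
Pair32 AddConst64(Pair32 a, uint64_t c) {
  const uint32_t c_lo = static_cast<uint32_t>(c);
  const uint32_t c_hi = static_cast<uint32_t>(c >> 32);
  Pair32 r;
  if (c_lo == 0) {
    // Low half of the constant is zero: no low-word add and no carry.
    r.lo = a.lo;
    r.hi = a.hi + c_hi;
  } else {
    r.lo = a.lo + c_lo;
    // Unsigned wrap-around means a carry out of the low word
    // (an sltu-style comparison).
    const uint32_t carry = (r.lo < c_lo) ? 1u : 0u;
    r.hi = a.hi + c_hi + carry;
  }
  return r;
}
```

The same case split applies to and/or/xor, where a constant half that is
the identity for the operation (0 for or/xor, 0xFFFFFFFF for and) needs
no instruction for that half at all.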

Change-Id: I86d2217c8b5b8e2a9371effc2ce38b9eec62782b
5 files changed