New instruction simplifications. Extra DCE pass. Allow more per-block repeats.

Rationale:
We were missing some obvious simplifications, which left performance
on the table, e.g. for CaffeineLogic compiled with dx (4200us -> 2700us).
The constant limiting repeats on a basic block (BB) seemed very low; at
the very least, it should depend on the BB size.

Test: test-art-host

Change-Id: Ic234566e117593e12c936d556222e4cd4f928105
9 files changed