ART: Optimize ADD/SUB+ADD_shift into ADDS/SUBS+CINC for HDiv/HRem

HDiv/HRem with a constant divisor are optimized by multiplying the
dividend by a fixed-point approximation of the divisor's reciprocal.
The product may need several corrections before it becomes the final
result. The last correction increments it by 1 if it is negative.
Currently this is done with
'add result, temp_result, temp_result, lsr #31' (or 'lsr #63' for
64-bit operands). Such an ADD with a shifted operand usually has a
latency of 2, e.g. on Cortex-A55.
However, if one of the earlier corrections is an ADD or SUB, the sign
can be detected with ADDS/SUBS, which set the N flag if the result is
negative. This allows using CINC, which has a latency of 1:
  adds temp_result, temp_result, dividend
  cinc out, temp_result, mi
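
The equivalence of the two sequences can be sketched in C++. This is a
rough model, not the ART code generator itself. The divisor 7, its magic
constant 0x92492493 with shift 2, and all helper names are illustrative
assumptions. The ASR placed between ADDS and CINC is also an assumption;
it relies on ASR preserving the sign and leaving the flags untouched.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: high 32 bits of the signed 64-bit product.
// Assumes >> on a negative value is an arithmetic shift (guaranteed
// since C++20, true on mainstream compilers before that).
inline int32_t MulHigh(int32_t a, int32_t b) {
  return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> 32);
}

// Wrapping 32-bit add, modelling the ADD/ADDS result register.
inline int32_t WrappingAdd(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) +
                              static_cast<uint32_t>(b));
}

constexpr int32_t kMagic7 = static_cast<int32_t>(0x92492493u);
constexpr int kShift7 = 2;

// Models: add t, t, n ; asr t, t, #2 ; add out, t, t, lsr #31
inline int32_t DivBy7WithShiftedAdd(int32_t n) {
  int32_t t = WrappingAdd(MulHigh(kMagic7, n), n);
  t >>= kShift7;
  return WrappingAdd(t, static_cast<int32_t>(static_cast<uint32_t>(t) >> 31));
}

// Models: adds t, t, n ; asr t, t, #2 ; cinc out, t, mi
inline int32_t DivBy7WithCinc(int32_t n) {
  int32_t t = WrappingAdd(MulHigh(kMagic7, n), n);
  bool n_flag = t < 0;  // N flag as set by ADDS.
  t >>= kShift7;        // Sign is unchanged by the arithmetic shift.
  return n_flag ? WrappingAdd(t, 1) : t;
}

// Checks that both models agree with plain truncating division.
inline void CheckSample(int32_t n) {
  assert(DivBy7WithShiftedAdd(n) == n / 7);
  assert(DivBy7WithCinc(n) == n / 7);
}
```

Calling CheckSample() on values such as -7, -1, 0, 6, 7 and INT32_MIN
exercises both the negative and non-negative paths.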

This CL implements this optimization.

Test: test.py --host --optimizing --jit
Test: test.py --target --optimizing --jit
Change-Id: Ia6aac6771908e992c86e32fe1694a82bd1b7af0b
4 files changed