ARM64: Combine LSR+ADD into ADD_shift for Int32 HDiv/HRem

HDiv/HRem with a constant divisor are optimized by multiplying the
dividend by a sort of reciprocal of the divisor (the magic number).
For Int32 the multiplication is done in a 64-bit register, of which
only the high 32 bits are used. The multiplication result might need
an ADD/SUB correction. Currently this is done by extracting the high
32 bits with LSR and then applying ADD/SUB. However, we can apply the
correcting ADD/SUB to the high 32 bits directly, using ADD/SUB with a
shifted register operand, and extract those bits with the final right
shift. This eliminates the extracting LSR instruction.

This CL implements this optimization.
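A minimal C sketch of the transformation, using the standard magic
number 0x92492493 and shift 2 for divisor 7 (Hacker's Delight
constants; function names are hypothetical, for illustration only):

```cpp
#include <cassert>
#include <cstdint>

// Old sequence: extract the high 32 bits of the 64-bit product first
// (the LSR this CL removes), then apply the ADD correction and the
// final right shift.
int32_t Div7WithLsr(int32_t n) {
  int64_t prod = static_cast<int64_t>(n) * static_cast<int32_t>(0x92492493);
  int32_t q = static_cast<int32_t>(prod >> 32);  // LSR-style high-half extract
  q += n;                                        // correction (magic number < 0)
  q >>= 2;                                       // final right shift
  return q + (static_cast<uint32_t>(q) >> 31);   // round toward zero
}

// New sequence: apply the correction in the upper half of the 64-bit
// register (ADD with a shifted operand, e.g. ADD x, x, x LSL #32) and
// fold the high-half extract into one final ASR.
int32_t Div7Combined(int32_t n) {
  int64_t prod = static_cast<int64_t>(n) * static_cast<int32_t>(0x92492493);
  int64_t corrected = prod + (static_cast<int64_t>(n) << 32);
  int32_t q = static_cast<int32_t>(corrected >> (32 + 2));  // single ASR
  return q + (static_cast<uint32_t>(q) >> 31);
}
```

The two sequences agree because adding n to the high half equals
adding n << 32 to the full 64-bit value, and an arithmetic shift by
32 + s equals shifting by 32 and then by s.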

Test: test.py --host --optimizing --jit
Test: test.py --target --optimizing --jit
Change-Id: I5ba557aa283291fd76d61ac0eb733cf6ea975116
diff --git a/compiler/optimizing/code_generator_arm64.h b/compiler/optimizing/code_generator_arm64.h
index 1e1c2a9..8349732 100644
--- a/compiler/optimizing/code_generator_arm64.h
+++ b/compiler/optimizing/code_generator_arm64.h
@@ -342,22 +342,14 @@
                              vixl::aarch64::Label* false_target);
   void DivRemOneOrMinusOne(HBinaryOperation* instruction);
   void DivRemByPowerOfTwo(HBinaryOperation* instruction);
-
-  // Helper to generate code producing the final result of HDiv/HRem with a constant divisor.
-  // 'temp_result' holds the result of multiplication of the dividend by a sort of reciprocal
-  // of the divisor (magic_number). Based on magic_number and divisor, temp_result might need
-  // to be corrected before applying final_right_shift.
-  // If the code is generated for HRem the final temp_result is used for producing the
-  // remainder.
-  void GenerateResultDivRemWithAnyConstant(bool is_rem,
-                                           int final_right_shift,
-                                           int64_t magic_number,
-                                           int64_t divisor,
-                                           vixl::aarch64::Register dividend,
-                                           vixl::aarch64::Register temp_result,
-                                           vixl::aarch64::Register out,
-                                           // This function may acquire a scratch register.
-                                           vixl::aarch64::UseScratchRegisterScope* temps_scope);
+  void GenerateIncrementNegativeByOne(vixl::aarch64::Register out,
+                                      vixl::aarch64::Register in, bool use_cond_inc);
+  void GenerateResultRemWithAnyConstant(vixl::aarch64::Register out,
+                                        vixl::aarch64::Register dividend,
+                                        vixl::aarch64::Register quotient,
+                                        int64_t divisor,
+                                        // This function may acquire a scratch register.
+                                        vixl::aarch64::UseScratchRegisterScope* temps_scope);
   void GenerateInt64DivRemWithAnyConstant(HBinaryOperation* instruction);
   void GenerateInt32DivRemWithAnyConstant(HBinaryOperation* instruction);
   void GenerateDivRemWithAnyConstant(HBinaryOperation* instruction);