Fix ARM64 SystemArrayCopy intrinsic with large constant dest position.

Make sure we do not deplete the whole VIXL scratch register pool, so
that VIXL can still use IP0 as a temporary when emitting
macro-instructions.

Test: art/test/testrunner/testrunner.py --optimizing --target --64
Bug: 37256530
Change-Id: I5da22e552297fad87db5763e2dab60ae6a7a43af
diff --git a/compiler/optimizing/intrinsics_arm64.cc b/compiler/optimizing/intrinsics_arm64.cc
index 423fd3c..8485c32 100644
--- a/compiler/optimizing/intrinsics_arm64.cc
+++ b/compiler/optimizing/intrinsics_arm64.cc
@@ -2755,9 +2755,17 @@
         // Make sure `tmp` is not IP0, as it is clobbered by
         // ReadBarrierMarkRegX entry points in
         // ReadBarrierSystemArrayCopySlowPathARM64.
+        DCHECK(temps.IsAvailable(ip0));
         temps.Exclude(ip0);
         Register tmp = temps.AcquireW();
         DCHECK_NE(LocationFrom(tmp).reg(), IP0);
+        // Put IP0 back in the pool so that VIXL has at least one
+        // scratch register available to emit macro-instructions (note
+        // that IP1 is already used for `tmp`). Indeed some
+        // macro-instructions used in GenSystemArrayCopyAddresses
+        // (invoked hereunder) may require a scratch register (for
+        // instance to emit a load with a large constant offset).
+        temps.Include(ip0);
 
         // /* int32_t */ monitor = src->monitor_
         __ Ldr(tmp, HeapOperand(src.W(), monitor_offset));