ARM64: Implicit suspend checks using LDR.

Implement implicit suspend checks in compiled managed code.
Use a single instruction `ldr x21, [x21, #0]` for the check
where `x21` points to a field in `Thread` that points to
itself until we request a checkpoint or suspension and set
it to null. Once the null becomes visible to a running
thread, two loads are needed to get a segmentation fault:
the first loads null into `x21` and the second dereferences
it. The fault is intercepted and redirected to a suspend
check.
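
The self-pointing slot can be modeled in plain C++ to show
why two loads are needed. This is only an illustrative
sketch: `ThreadModel`, `PollOnce` and `LoadsUntilFault` are
hypothetical names, not ART code, and the fault is simulated
by a `false` return instead of a real SIGSEGV.

```cpp
#include <cassert>

// Hypothetical model of the per-thread slot; not ART's actual Thread layout.
struct ThreadModel {
  // Points to itself until a checkpoint or suspension is requested.
  void* suspend_trigger;

  ThreadModel() : suspend_trigger(&suspend_trigger) {}
  ThreadModel(const ThreadModel&) = delete;  // A copy would not point to itself.
  ThreadModel& operator=(const ThreadModel&) = delete;

  void RequestSuspend() { suspend_trigger = nullptr; }
};

// Models `ldr x21, [x21, #0]` with `reg` standing in for x21.
// Returns false where the real instruction would fault on a null address.
bool PollOnce(void*& reg) {
  if (reg == nullptr) {
    return false;  // Real code: SIGSEGV, redirected to a suspend check.
  }
  reg = *static_cast<void**>(reg);
  return true;
}

// Counts the loads executed after a suspend request, including the one
// that would fault.
int LoadsUntilFault(ThreadModel& thread, void*& reg) {
  assert(thread.suspend_trigger == nullptr);  // Otherwise this never ends.
  int loads = 0;
  bool ok;
  do {
    ok = PollOnce(reg);
    ++loads;
  } while (ok);
  return loads;
}
```

Starting with `reg` set to `&thread.suspend_trigger`, polls
leave `reg` unchanged until `RequestSuspend()` is called;
after that the first poll loads null into `reg` and the
second one is the load that would fault.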

This involves a trade-off between the speed of a single
suspend check (a single LDR is faster than LDR+TST+BEQ/BNE)
and the time to suspend, where we now need to wait for two
LDRs and incur fault handling overhead. The time to suspend
was previously measured to be acceptable, with the long
tail being comparable to the explicit suspend check.

Golem results for art-opt-cc (higher is better):
linux-armv8         before   after
Jacobi              597.49  637.92 (+6.766%) [1.3 noise]
Towers              934.00  991.00 (+6.103%) [1.4 noise]
QuicksortTest      5108.82 5622.46 (+10.05%) [1.6 noise]
StringPoolBench    8353.00 9806.00 (+17.39%) [4.4 noise]
LongInductionBench  1.0468  1.5100 (+44.26%) [0.4 noise]
IntInductionBench   1.1710  1.7715 (+51.28%) [0.4 noise]
(These are the four benchmarks with the highest
"significance" and the two with the highest improvement, as
reported by Golem.)

It is also interesting to compare this with a revert of
    https://android-review.googlesource.com/1905055
which was the last change dealing with suspend checks and
which regressed these benchmarks.

Golem results for art-opt-cc (higher is better):
linux-armv8         revert   after
Jacobi              616.36  637.92 (+3.497%) [0.7 noise]
Towers              943.00  991.00 (+5.090%) [1.2 noise]
QuicksortTest      5186.83 5622.46 (+8.399%) [1.4 noise]
StringPoolBench    8992.00 9806.00 (+9.052%) [2.4 noise]
LongInductionBench  1.1895  1.5100 (+26.94%) [0.3 noise]
IntInductionBench   1.3210  1.7715 (+34.10%) [0.3 noise]

Prebuilt sizes for aosp_blueline-userdebug:
 - before:
   arm64/boot*.oat: 16994120
   oat/arm64/services.odex: 45848752
 - revert https://android-review.googlesource.com/1905055 :
   arm64/boot*.oat: 16870672 (-121KiB)
   oat/arm64/services.odex: 45577248 (-265KiB)
 - after:
   arm64/boot*.oat: 16575552 (-409KiB; -288KiB v. revert)
   oat/arm64/services.odex: 44877064 (-949KiB; -684KiB v. revert)

Test: testrunner.py --target --optimizing --jit --interpreter --64
Bug: 38383823
Change-Id: I1827689a3fb7f3c38310b87c80c7724bd7364a66
diff --git a/compiler/optimizing/code_generator_arm64.cc b/compiler/optimizing/code_generator_arm64.cc
index 775bfcf..6272276 100644
--- a/compiler/optimizing/code_generator_arm64.cc
+++ b/compiler/optimizing/code_generator_arm64.cc
@@ -1974,8 +1974,22 @@
   __ Dmb(InnerShareable, type);
 }
 
+bool CodeGeneratorARM64::CanUseImplicitSuspendCheck() const {
+  // Use implicit suspend checks if requested in compiler options unless there are SIMD
+  // instructions in the graph. The implicit suspend check saves all FP registers as
+  // 64-bit (in line with the calling convention) but SIMD instructions can use 128-bit
+  // registers, so they need to be saved in an explicit slow path.
+  return GetCompilerOptions().GetImplicitSuspendChecks() && !GetGraph()->HasSIMD();
+}
+
 void InstructionCodeGeneratorARM64::GenerateSuspendCheck(HSuspendCheck* instruction,
                                                          HBasicBlock* successor) {
+  if (codegen_->CanUseImplicitSuspendCheck()) {
+    __ Ldr(kImplicitSuspendCheckRegister, MemOperand(kImplicitSuspendCheckRegister));
+    codegen_->RecordPcInfo(instruction, instruction->GetDexPc());
+    return;
+  }
+
   SuspendCheckSlowPathARM64* slow_path =
       down_cast<SuspendCheckSlowPathARM64*>(instruction->GetSlowPath());
   if (slow_path == nullptr) {
@@ -3569,7 +3583,9 @@
   if (info != nullptr && info->IsBackEdge(*block) && info->HasSuspendCheck()) {
     codegen_->MaybeIncrementHotness(/* is_frame_entry= */ false);
     GenerateSuspendCheck(info->GetSuspendCheck(), successor);
-    return;
+    if (!codegen_->CanUseImplicitSuspendCheck()) {
+      return;  // `GenerateSuspendCheck()` emitted the jump.
+    }
   }
   if (block->IsEntryBlock() && (previous != nullptr) && previous->IsSuspendCheck()) {
     GenerateSuspendCheck(previous->AsSuspendCheck(), nullptr);