ART: Change main-thread thread paging scheme

Modify the code that ensures we can install a stack guard page into
the main thread.

A recent kernel change means that our previous approach of using a
free pointer does not work. It is important to actually extend the
stack properly. For portability, use a function with a large stack
frame (suggested by and adapted from hboehm).

Bug: 62952017
Test: m
Test: m test-art-host
Test: Device boots (x86_64 emulator)
Test: Device boots (bullhead)
Change-Id: Ic2a0c3d6d05a1ea9f655329d147b46949e1b9db3
diff --git a/runtime/thread.cc b/runtime/thread.cc
index be1614b..5edd071 100644
--- a/runtime/thread.cc
+++ b/runtime/thread.cc
@@ -549,27 +549,40 @@
   //
   // We map in the stack by reading every page from the stack bottom (highest address)
   // to the stack top. (We then madvise this away.) This must be done by reading from the
-  // current stack pointer downwards. Any access more than a page below the current SP
-  // might cause a segv.
-  // TODO: This comment may be out of date. It seems possible to speed this up. As
-  //       this is normally done once in the zygote on startup, ignore for now.
+  // current stack pointer downwards.
   //
-  // AddressSanitizer does not like the part of this functions that reads every stack page.
-  // Looks a lot like an out-of-bounds access.
+  // Accesses too far below the current machine register corresponding to the stack pointer (e.g.,
+  // ESP on x86[-32], SP on ARM) might cause a SIGSEGV (at least on x86 with newer kernels). We
+  // thus have to move the stack pointer. We do this portably by using a recursive function with a
+  // large stack frame size.
 
-  // (Defensively) first remove the protection on the protected region as will want to read
+  // (Defensively) first remove the protection on the protected region as we'll want to read
   // and write it. Ignore errors.
   UnprotectStack();
 
   VLOG(threads) << "Need to map in stack for thread at " << std::hex <<
       static_cast<void*>(pregion);
 
-  // Read every page from the high address to the low.
-  volatile uint8_t dont_optimize_this;
-  UNUSED(dont_optimize_this);
-  for (uint8_t* p = stack_top; p >= pregion; p -= kPageSize) {
-    dont_optimize_this = *p;
-  }
+  struct RecurseDownStack {
+    // This function has an intentionally large stack size.
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wframe-larger-than="
+    NO_INLINE
+    static void Touch(uintptr_t target) {
+      volatile size_t zero = 0;
+      // Use a large local volatile array to ensure a large frame size. Do not use anything close
+      // to a full page for ASAN. It would be nice to ensure the frame size is at most a page, but
+      // there is no pragma support for this.
+      volatile char space[kPageSize - 256];
+      char sink ATTRIBUTE_UNUSED = space[zero];
+      if (reinterpret_cast<uintptr_t>(space) >= target + kPageSize) {
+        Touch(target);
+      }
+      zero *= 2;  // Try to avoid tail recursion.
+    }
+#pragma GCC diagnostic pop
+  };
+  RecurseDownStack::Touch(reinterpret_cast<uintptr_t>(pregion));
 
   VLOG(threads) << "(again) installing stack protected region at " << std::hex <<
       static_cast<void*>(pregion) << " to " <<