Speed up MterpShouldSwitchInterpreters check

We were often performing a pair of TLS reads in order to determine in
the current thread has any pending asynchronous exceptions (exceptions
thrown by the JVMTI StopThread function). This is quite slow and was
impacting some benchmarks. Since it is expected that asynchronous
exceptions are extremely rare we will first check to see if any
asynchronous exceptions have been sent on the current process. Only if
at least one asynchronous exception has been thrown will we do the
expensive TLS lookups to determine if one has been thrown on the
current thread.

Using a global instance value without synchronization or atomics is ok
here since the checkpoint that actually sets the async_exception_thrown
flag provides synchronization by either occurring on the target thread
or passing the checkpoint.

According to go/lem this gives us a 7% increase on the caffeine
string benchmark.

Test: go/lem runs
Test: ./test.py --host -j50
Bug: 68010816
Change-Id: I62684a5b3a7fc7cc600f5efd2a2393d9c4025917
diff --git a/runtime/interpreter/mterp/mterp.cc b/runtime/interpreter/mterp/mterp.cc
index 92dd19e..987298b 100644
--- a/runtime/interpreter/mterp/mterp.cc
+++ b/runtime/interpreter/mterp/mterp.cc
@@ -151,8 +151,14 @@
       Dbg::IsDebuggerActive() ||
       // An async exception has been thrown. We need to go to the switch interpreter. MTerp doesn't
       // know how to deal with these so we could end up never dealing with it if we are in an
-      // infinite loop.
-      UNLIKELY(Thread::Current()->IsAsyncExceptionPending());
+      // infinite loop. Since this can be called in a tight loop and getting the current thread
+      // requires a TLS read we instead first check a short-circuit runtime flag that will only be
+      // set if something tries to set an async exception. This will make this function faster in
+      // the common case where no async exception has ever been sent. We don't need to worry about
+      // synchronization on the runtime flag since it is only set in a checkpoint which will either
+      // take place on the current thread or act as a synchronization point.
+      (UNLIKELY(runtime->AreAsyncExceptionsThrown()) &&
+       Thread::Current()->IsAsyncExceptionPending());
 }
 
 
diff --git a/runtime/runtime.cc b/runtime/runtime.cc
index f09b6c9..d15de38 100644
--- a/runtime/runtime.cc
+++ b/runtime/runtime.cc
@@ -256,6 +256,7 @@
       force_native_bridge_(false),
       is_native_bridge_loaded_(false),
       is_native_debuggable_(false),
+      async_exceptions_thrown_(false),
       is_java_debuggable_(false),
       zygote_max_failed_boots_(0),
       experimental_flags_(ExperimentalFlags::kNone),
diff --git a/runtime/runtime.h b/runtime/runtime.h
index 6b01cc2..476b71f 100644
--- a/runtime/runtime.h
+++ b/runtime/runtime.h
@@ -586,6 +586,14 @@
     is_native_debuggable_ = value;
   }
 
+  bool AreAsyncExceptionsThrown() const {
+    return async_exceptions_thrown_;
+  }
+
+  void SetAsyncExceptionsThrown() {
+    async_exceptions_thrown_ = true;
+  }
+
   // Returns the build fingerprint, if set. Otherwise an empty string is returned.
   std::string GetFingerprint() {
     return fingerprint_;
@@ -899,6 +907,10 @@
   // Whether we are running under native debugger.
   bool is_native_debuggable_;
 
+  // whether or not any async exceptions have ever been thrown. This is used to speed up the
+  // MterpShouldSwitchInterpreters function.
+  bool async_exceptions_thrown_;
+
   // Whether Java code needs to be debuggable.
   bool is_java_debuggable_;
 
diff --git a/runtime/thread.cc b/runtime/thread.cc
index bec1c90..cb350ed 100644
--- a/runtime/thread.cc
+++ b/runtime/thread.cc
@@ -3700,6 +3700,7 @@
 
 void Thread::SetAsyncException(ObjPtr<mirror::Throwable> new_exception) {
   CHECK(new_exception != nullptr);
+  Runtime::Current()->SetAsyncExceptionsThrown();
   if (kIsDebugBuild) {
     // Make sure we are in a checkpoint.
     MutexLock mu(Thread::Current(), *Locks::thread_suspend_count_lock_);