JNI: Inline fast-path for `JniMethodEnd()`.

Golem results for art-opt-cc (higher is better):
linux-ia32                       before after
NativeDowncallStaticNormal       46.766 51.016 (+9.086%)
NativeDowncallStaticNormal6      42.268 45.748 (+8.235%)
NativeDowncallStaticNormalRefs6  41.355 44.776 (+8.272%)
NativeDowncallVirtualNormal      46.361 52.527 (+13.30%)
NativeDowncallVirtualNormal6     41.812 45.206 (+8.118%)
NativeDowncallVirtualNormalRefs6 40.500 44.169 (+9.059%)
(The NativeDowncallVirtualNormal result for x86 is skewed
by one extra good run, as Golem reports the best result in
the summary. Using the second-best and most frequent
result, 50.5, the improvement is only around 8.9%.)
linux-x64                        before after
NativeDowncallStaticNormal       44.169 47.976 (+8.620%)
NativeDowncallStaticNormal6      43.198 46.836 (+8.423%)
NativeDowncallStaticNormalRefs6  38.481 44.687 (+16.13%)
NativeDowncallVirtualNormal      43.672 47.405 (+8.547%)
NativeDowncallVirtualNormal6     42.268 45.726 (+8.182%)
NativeDowncallVirtualNormalRefs6 41.355 44.687 (+8.057%)
(The NativeDowncallStaticNormalRefs6 result for x86-64 is
a bit inflated because recent results jump between ~38.5
and ~40.5. If we take the latter as the baseline, the
improvement is only around 10.3%.)
linux-armv7                      before after
NativeDowncallStaticNormal       10.659 14.620 (+37.16%)
NativeDowncallStaticNormal6      9.8377 13.120 (+33.36%)
NativeDowncallStaticNormalRefs6  8.8714 11.454 (+29.11%)
NativeDowncallVirtualNormal      10.511 14.349 (+36.51%)
NativeDowncallVirtualNormal6     9.9701 13.347 (+33.87%)
NativeDowncallVirtualNormalRefs6 8.9241 11.454 (+28.35%)
linux-armv8                      before after
NativeDowncallStaticNormal       10.608 16.329 (+53.93%)
NativeDowncallStaticNormal6      10.179 15.347 (+50.76%)
NativeDowncallStaticNormalRefs6  9.2457 13.705 (+48.23%)
NativeDowncallVirtualNormal      9.9850 14.903 (+49.25%)
NativeDowncallVirtualNormal6     9.9206 14.757 (+48.75%)
NativeDowncallVirtualNormalRefs6 8.8235 12.789 (+44.94%)

Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 172332525
Change-Id: Ie144bc4f7f82be95790ea7d3123b81a3b6bfa603
diff --git a/compiler/utils/jni_macro_assembler.h b/compiler/utils/jni_macro_assembler.h
index 659ff4c..0d82458 100644
--- a/compiler/utils/jni_macro_assembler.h
+++ b/compiler/utils/jni_macro_assembler.h
@@ -252,9 +252,18 @@
   virtual void CallFromThread(ThreadOffset<kPointerSize> offset) = 0;
 
   // Generate fast-path for transition to Native. Go to `label` if any thread flag is set.
+  // The implementation may use `scratch_regs`, which should be callee-save core
+  // registers (already saved before this call), and must preserve all argument registers.
   virtual void TryToTransitionFromRunnableToNative(
       JNIMacroLabel* label, ArrayRef<const ManagedRegister> scratch_regs) = 0;
 
+  // Generate fast-path for transition to Runnable. Go to `label` if any thread flag is set.
+  // The implementation may use `scratch_regs`, which should be core argument registers
+  // not used as return registers, and it must preserve `return_reg`, if any.
+  virtual void TryToTransitionFromNativeToRunnable(JNIMacroLabel* label,
+                                                   ArrayRef<const ManagedRegister> scratch_regs,
+                                                   ManagedRegister return_reg) = 0;
+
   // Generate suspend check and branch to `label` if there is a pending suspend request.
   virtual void SuspendCheck(JNIMacroLabel* label) = 0;