arm/arm64: Use marking register in JNI stubs.

Do not load `is_gc_marking` from the `Thread` when it is
already available in the marking register (r8 on arm,
x20 on arm64).
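
As a rough illustration of the difference, the sketch below models the
two check sequences as pseudo-assembly strings. It is not the code in
the JNI stub compiler: the `Isa` enum, the function names, and the
`tr`, `tmp`, `slow_path` and `IS_GC_MARKING_OFFSET` operands are all
hypothetical placeholders. Only the register names r8 and x20 come from
this change, and the arm64 test is assumed to use the 32-bit view w20.

```cpp
#include <string>
#include <vector>

// Hypothetical ISA tag for this sketch; not ART's own enum.
enum class Isa { kArm, kArm64 };

// Marking register named in the commit message: r8 on arm, x20 on arm64.
// Assumption: on arm64 the check tests the 32-bit view, w20.
std::string MarkingRegister(Isa isa) {
  return isa == Isa::kArm ? "r8" : "w20";
}

// Appends "branch to slow_path if reg != 0" as pseudo-assembly.
void EmitBranchIfNonZero(Isa isa, const std::string& reg,
                         std::vector<std::string>* out) {
  if (isa == Isa::kArm) {
    out->push_back("cmp " + reg + ", #0");
    out->push_back("bne slow_path");
  } else {
    out->push_back("cbnz " + reg + ", slow_path");
  }
}

// Before: reload the flag from the Thread object, then test it.
// "tr", "tmp" and IS_GC_MARKING_OFFSET are placeholders, not real names.
std::vector<std::string> EmitCheckViaThreadLoad(Isa isa) {
  std::vector<std::string> code;
  code.push_back("ldr tmp, [tr, #IS_GC_MARKING_OFFSET]");
  EmitBranchIfNonZero(isa, "tmp", &code);
  return code;
}

// After: test the marking register directly, with no memory access.
std::vector<std::string> EmitCheckViaMarkingRegister(Isa isa) {
  std::vector<std::string> code;
  EmitBranchIfNonZero(isa, MarkingRegister(isa), &code);
  return code;
}
```

In this model the marking-register variant is one instruction shorter
on both architectures and drops the load and the scratch register.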

Golem results for art-opt-cc on Odroid-C2 (higher is better):
linux-armv7                     before after
NativeDowncallStaticNormal      5.4429 5.5021 (+1.088%)
NativeDowncallStaticNormal6     5.1163 5.1498 (+0.6554%)
NativeDowncallStaticNormalRefs6 4.8876 4.9188 (+0.6394%)
NativeDowncallStaticFast        15.992 16.505 (+3.207%)
NativeDowncallStaticFast6       13.466 13.705 (+1.775%)
NativeDowncallStaticFastRefs6   11.994 12.183 (+1.578%)
linux-armv8                     before after
NativeDowncallStaticNormal      5.8594 5.9026 (+0.7378%)
NativeDowncallStaticNormal6     5.5198 5.5607 (+0.7414%)
NativeDowncallStaticNormalRefs6 5.1498 5.1862 (+0.7072%)
NativeDowncallStaticFast        17.057 17.439 (+2.242%)
NativeDowncallStaticFast6       14.478 14.757 (+1.922%)
NativeDowncallStaticFastRefs6   12.183 12.376 (+1.584%)

Test: m test-art-host-gtest
Test: run-gtests.sh
Test: testrunner.py --target --optimizing --gcstress
Bug: 172332525
Change-Id: I595cd0e17a480cdfd86c548a4f9853f4b86f4047
2 files changed