|
It was the only enum in the file
Bug: 329378408
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Change-Id: If0e385324afa3685f648135ba9b60e6bc79ba0ed
|
|
Passing a `dex_file` and `method_idx` makes testing
unnecessarily difficult.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 288983053
Change-Id: Ice79423ec568e254547acd4448fb82e2ad11b79c
|
|
Golem results for art-opt-cc (higher is better):
linux-armv7 (Odroid-C2) before after
NativeDowncallStaticFast 21.622 21.923 (+1.395%)
NativeDowncallStaticFast6 18.491 18.719 (+1.236%)
NativeDowncallStaticFastRefs6 15.347 15.504 (+1.025%)
NativeDowncallVirtualFast 20.741 21.319 (+2.787%)
NativeDowncallVirtualFast6 18.953 19.183 (+1.218%)
NativeDowncallVirtualFastRefs6 15.500 15.663 (+1.053%)
NativeDowncallStaticNormal 14.620 14.757 (+0.9495%)
NativeDowncallStaticNormal6 13.120 13.235 (+0.8823%)
NativeDowncallStaticNormalRefs6 11.454 11.538 (+0.7258%)
NativeDowncallVirtualNormal 14.216 14.486 (+1.898%)
NativeDowncallVirtualNormal6 13.347 13.466 (+0.8978%)
NativeDowncallVirtualNormalRefs6 11.538 11.628 (+0.7752%)
linux-armv7 (Raspberry Pi 4) before after
NativeDowncallStaticFast 43.305 42.331 (-2.250%)
NativeDowncallStaticFast6 35.608 37.369 (+4.945%)
NativeDowncallStaticFastRefs6 31.390 31.793 (+1.285%)
NativeDowncallVirtualFast 33.814 31.825 (-5.882%)
NativeDowncallVirtualFast6 34.311 36.445 (+6.220%)
NativeDowncallVirtualFastRefs6 31.762 32.419 (+2.069%)
NativeDowncallStaticNormal 13.848 14.244 (+2.859%)
NativeDowncallStaticNormal6 13.592 13.725 (+0.9804%)
NativeDowncallStaticNormalRefs6 12.671 12.536 (-1.061%)
NativeDowncallVirtualNormal 13.979 13.848 (-0.9397%)
NativeDowncallVirtualNormal6 13.242 13.592 (+2.647%)
NativeDowncallVirtualNormalRefs6 12.364 12.358 (-0.094%)
linux-armv8 (Odroid-C2) before after
NativeDowncallStaticFast 24.752 25.160 (+1.648%)
NativeDowncallStaticFast6 22.571 22.908 (+1.494%)
NativeDowncallStaticFastRefs6 19.183 19.183 (unchanged)
NativeDowncallVirtualFast 21.622 22.244 (+2.879%)
NativeDowncallVirtualFast6 21.319 21.934 (+2.887%)
NativeDowncallVirtualFastRefs6 17.448 17.848 (+2.296%)
NativeDowncallStaticNormal 17.048 17.250 (+1.183%)
NativeDowncallStaticNormal6 15.992 16.161 (+1.054%)
NativeDowncallStaticNormalRefs6 14.085 14.216 (+0.9314%)
NativeDowncallVirtualNormal 15.504 15.826 (+2.077%)
NativeDowncallVirtualNormal6 15.347 15.663 (+2.064%)
NativeDowncallVirtualNormalRefs6 13.466 13.586 (+0.8859%)
linux-armv8 (Raspberry Pi 4) before after
NativeDowncallStaticFast 38.366 40.796 (+6.335%)
NativeDowncallStaticFast6 38.347 40.419 (+5.405%)
NativeDowncallStaticFastRefs6 31.636 32.528 (+2.820%)
NativeDowncallVirtualFast 35.201 37.406 (+6.266%)
NativeDowncallVirtualFast6 34.000 35.626 (+4.782%)
NativeDowncallVirtualFastRefs6 27.201 27.201 (unchanged)
NativeDowncallStaticNormal 14.808 15.107 (+2.024%)
NativeDowncallStaticNormal6 14.955 14.428 (-3.526%)
NativeDowncallStaticNormalRefs6 14.174 13.855 (-2.254%)
NativeDowncallVirtualNormal 14.735 14.307 (-2.904%)
NativeDowncallVirtualNormal6 14.244 14.385 (+0.9921%)
NativeDowncallVirtualNormalRefs6 14.105 14.244 (+0.9823%)
linux-ia32 before after
NativeDowncallStaticFast 223.66 233.77 (+4.516%)
NativeDowncallStaticFast6 159.76 163.92 (+2.602%)
NativeDowncallStaticFastRefs6 137.16 141.72 (+3.324%)
NativeDowncallVirtualFast 211.79 224.05 (+5.791%)
NativeDowncallVirtualFast6 149.85 154.00 (+2.769%)
NativeDowncallVirtualFastRefs6 132.17 136.93 (+3.603%)
NativeDowncallStaticNormal 51.091 51.091 (unchanged)
NativeDowncallStaticNormal6 45.680 45.703 (+0.0497%)
NativeDowncallStaticNormalRefs6 44.732 45.161 (+0.9606%)
NativeDowncallVirtualNormal 50.450 50.450 (unchanged)
NativeDowncallVirtualNormal6 45.161 45.161 (unchanged)
NativeDowncallVirtualNormalRefs6 44.125 44.147 (+0.0496%)
linux-x64 before after
NativeDowncallStaticFast 173.07 181.05 (+4.611%)
NativeDowncallStaticFast6 156.50 161.34 (+3.092%)
NativeDowncallStaticFastRefs6 130.37 131.61 (+0.9499%)
NativeDowncallVirtualFast 169.00 174.83 (+3.447%)
NativeDowncallVirtualFast6 148.13 149.35 (+0.8243%)
NativeDowncallVirtualFastRefs6 127.31 130.11 (+2.200%)
NativeDowncallStaticNormal 47.952 47.952 (unchanged)
NativeDowncallStaticNormal6 46.789 46.789 (unchanged)
NativeDowncallStaticNormalRefs6 44.643 44.643 (unchanged)
NativeDowncallVirtualNormal 47.358 47.358 (unchanged)
NativeDowncallVirtualNormal6 45.703 45.680 (-0.0497%)
NativeDowncallVirtualNormalRefs6 44.643 44.643 (unchanged)
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 172332525
Change-Id: I9606412c658cae8b7583308facf5ba095a982349
|
|
... on arm/arm64 for local reference frame manipulation.
Golem results for art-opt-cc (higher is better):
linux-armv7 (Odroid-C2) before after
NativeDowncallStaticFast 21.622 21.622 (unchanged)
NativeDowncallStaticFast6 18.500 18.491 (-0.0500%)
NativeDowncallStaticFastRefs6 15.354 15.354 (unchanged)
NativeDowncallVirtualFast 21.027 20.741 (-1.361%)
NativeDowncallVirtualFast6 18.953 18.953 (unchanged)
NativeDowncallVirtualFastRefs6 15.504 15.504 (unchanged)
NativeDowncallStaticNormal 14.620 14.620 (unchanged)
NativeDowncallStaticNormal6 13.120 13.120 (unchanged)
NativeDowncallStaticNormalRefs6 11.454 11.454 (unchanged)
NativeDowncallVirtualNormal 14.342 14.216 (-0.8823%)
NativeDowncallVirtualNormal6 13.347 13.347 (unchanged)
NativeDowncallVirtualNormalRefs6 11.538 11.544 (+0.0481%)
linux-armv7 (Raspberry Pi 4) before after
NativeDowncallStaticFast 41.937 41.564 (-0.8906%)
NativeDowncallStaticFast6 33.234 35.608 (+7.144%)
NativeDowncallStaticFastRefs6 30.527 31.469 (+3.085%)
NativeDowncallVirtualFast 37.531 35.429 (-5.600%)
NativeDowncallVirtualFast6 32.803 34.125 (+4.028%)
NativeDowncallVirtualFastRefs6 30.500 31.500 (+3.279%)
NativeDowncallStaticNormal 13.599 14.112 (+3.773%)
NativeDowncallStaticNormal6 13.599 13.599 (unchanged)
NativeDowncallStaticNormalRefs6 12.358 12.677 (+2.580%)
NativeDowncallVirtualNormal 13.473 13.848 (+2.781%)
NativeDowncallVirtualNormal6 13.235 13.242 (+0.0495%)
NativeDowncallVirtualNormalRefs6 12.165 12.364 (+1.632%)
linux-armv8 (Odroid-C2) before after
NativeDowncallStaticFast 23.988 24.765 (+3.238%)
NativeDowncallStaticFast6 21.923 22.571 (+2.955%)
NativeDowncallStaticFastRefs6 18.719 19.183 (+2.480%)
NativeDowncallVirtualFast 21.027 21.622 (+2.828%)
NativeDowncallVirtualFast6 20.267 21.319 (+5.190%)
NativeDowncallVirtualFastRefs6 16.683 17.448 (+4.583%)
NativeDowncallStaticNormal 16.683 17.057 (+2.239%)
NativeDowncallStaticNormal6 15.656 15.992 (+2.149%)
NativeDowncallStaticNormalRefs6 13.958 14.085 (+0.9054%)
NativeDowncallVirtualNormal 15.196 15.504 (+2.026%)
NativeDowncallVirtualNormal6 15.049 15.347 (+1.980%)
NativeDowncallVirtualNormalRefs6 13.006 13.466 (+3.541%)
linux-armv8 (Raspberry Pi 4) before after
NativeDowncallStaticFast 36.482 38.366 (+5.164%)
NativeDowncallStaticFast6 37.406 38.366 (+2.564%)
NativeDowncallStaticFastRefs6 28.770 31.652 (+10.02%)
NativeDowncallVirtualFast 34.000 35.201 (+3.532%)
NativeDowncallVirtualFast6 33.251 34.000 (+2.254%)
NativeDowncallVirtualFastRefs6 26.474 27.201 (+2.747%)
NativeDowncallStaticNormal 14.237 14.606 (+2.592%)
NativeDowncallStaticNormal6 14.244 14.948 (+4.942%)
NativeDowncallStaticNormalRefs6 13.012 14.181 (+8.983%)
NativeDowncallVirtualNormal 14.105 14.663 (+3.954%)
NativeDowncallVirtualNormal6 13.979 14.735 (+5.406%)
NativeDowncallVirtualNormalRefs6 13.725 14.244 (+3.775%)
The Odroid-C2 results appear essentially unchanged for armv7
(with some minor regressions within noise) and only a little
better for armv8 (but still likely within noise). On the
Raspberry Pi 4, there appears to be some improvement for
armv7 and a decent improvement for armv8, but there is a higher
level of noise than on Odroid-C2. Results from this single
run are not very conclusive but we expect to see a clear
trend in the data after submission.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 172332525
Change-Id: I01033950355c988c8a0e7ed6bdb6e585587dcfb4
|
|
Test: Modify kPreferredAllocSpaceBegin = 0x90000000, then
testrunner.py --target --64 --ndebug --optimizing
Bug: 283082089
Change-Id: Ifb82d616a0d9664a2e7f5f96a1a79ddce5862cdf
|
|
Always return the same scratch registers, regardless of the
return type. This helps make more JNI stubs identical for
better reuse, such as deduplicating them in oat files.
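The reuse argument can be illustrated with a minimal sketch of content-based deduplication: if two stubs compile to the same bytes, only one copy needs to be stored. The `StubDeduplicator` class below is hypothetical and is not the oat writer's actual mechanism; it only shows why byte-identical stubs are cheap to share.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stub store: byte-identical code sequences share one slot.
class StubDeduplicator {
 public:
  // Returns the index of the stored copy of `code`, adding it if it is new.
  size_t Intern(const std::vector<uint8_t>& code) {
    auto [it, inserted] = index_.emplace(code, stubs_.size());
    if (inserted) {
      stubs_.push_back(code);
    }
    return it->second;
  }

  size_t NumUniqueStubs() const { return stubs_.size(); }

 private:
  std::map<std::vector<uint8_t>, size_t> index_;  // code bytes -> slot index
  std::vector<std::vector<uint8_t>> stubs_;       // one entry per unique stub
};
```

Any difference in the emitted bytes, such as a different scratch register, would make two otherwise equivalent stubs land in separate slots.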
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: testrunner.py --target --optimizing
Bug: 288983053
Change-Id: I7e7cebde1555de5a9d36e2bfca539a3bb918e6fa
|
|
Leave a few `gUseReadBarrier` uses in JNI macro assemblers.
We shall deal with these later.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 289805127
Change-Id: I9d2aa245cee4c650129f169a82beda7dc0dd6a35
|
|
Implement the required `WriteCIE()`, fix a bug in the
`art_jni_dlsym_lookup_critical_stub`, fix reference loads
to be zero-extended and enable the JNI compiler for riscv64.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --64 --ndebug --prebuild --no-prebuild -t 178
Test: # Edit `run-test` to disable checker, then
testrunner.py --target --64 --ndebug --cdex-none --optimizing
# 7 tests fail (pre-existing failures): 004-StackWalk, 137-cfi,
# 2042-reference-processing, 597-deopt-busy-loop, 629-vdex-speed,
# 638-checker-inline-cache-intrinsic and 661-oat-writer-layout.
Test: aosp_cf_riscv64_phone-userdebug boots without crashes.
Bug: 283082089
Change-Id: Ifd47098b7428919b601dd22a130ad1bd51ae516d
|
|
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Bug: 283082089
Change-Id: I2d6e8d029a74004076b6d514205a147ce1145f03
|
|
The code used to copy the final generated code twice: from assembler to
CodeAllocator, and then from CodeAllocator to SwapAllocator/JitMemory.
The assemblers never depended on the exact location of the generated
code, so just drop that feature.
Test: test.py
Change-Id: I8dc82e4926097092b9aac336a5a5d40f79dc62ca
|
|
Implement all JNI macro assembler functions needed by the
JNI compiler to compile stubs for @CriticalNative methods.
Enable most JNI compiler tests for @CriticalNative methods
and document the reasons for keeping the remaining few tests
disabled.
Change `Riscv64Assembler::AddConst*` to store intermediate
results in `TMP` to avoid unaligned SP in the middle of a
macro operation.
Test: m test-art-host-gtest
Test: run-gtests.sh
Bug: 283082089
Change-Id: I226cab7b2ffcab375a67eb37efdd093779c5c8c4
|
|
Test: m test-art-host-gtest
Bug: 283082089
Change-Id: Ie088ad01f6170ecea9c96c10199cc7efd722210c
|
|
Bug: 169680875
Test: mmm art
Change-Id: Ic0cc320891c42b07a2b5520a584d2b62052e7235
|
|
This reverts commit 4297f22d902cf156e14c330147215d5f2fa9bd7f.
Bug: 279728780
Reason for revert: Resolve classes in inliner.
Change-Id: I4f93ac5d195eb2f473ec50fe7cc70881dcddee6f
|
|
This reverts commit 1c262ad3f1fd9f9b07d16afe70990cd8bfdc3bda.
Reason for revert: CHECKer failures e.g. https://ci.chromium.org/ui/p/art/builders/ci/host-x86-cms/8703/overview
Change-Id: I48fd251a52b5f18e3ea192bc2102df509d578aaf
|
|
Tweak a few compiler behaviors to make it work and match test
expectations.
Test: test.py
Bug: 279728780
Change-Id: I350ff313ca53e2c19b637af7521683cc2b09d66f
|
|
Results for the timeGetBytesAscii#EMPTY benchmark from the
libcore's StringToBytesBenchmark suite on blueline-userdebug
with the cpu frequencies fixed at 1420800 (cpus 0-3; little)
and 1459200 (cpus 4-7; big):
32-bit little: ~415 -> ~390
64-bit little: ~415 -> ~390
32-bit big: ~180 -> ~170
64-bit big: ~180 -> ~170
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --debug --ndebug
Test: run-gtests.sh
Test: testrunner.py --target --optimizing --debug --ndebug
Bug: 172332525
Change-Id: I0e19d583e5141e99a8b8c6fd9ae125fe7c9e02e7
|
|
The code compiles on other architectures that implement JNI compiler,
because they use these variables. However the code won't compile for
RISC-V as it falls into the default (unsupported) case.
Test: lunch aosp_riscv64-userdebug && m dist
Change-Id: I16010e806fe6c51fb0a7a20111e0d1feefde018c
|
|
We had instrumentation_levels and instrumentation_stubs_installed which
were kind of similar but slightly different in what they actually
represent. Their meaning also changed with the recent changes to avoid
instrumentation stubs. They were used sometimes incorrectly in the code.
This CL:
1. Renames instrumentation_stubs_installed to run_exit_hooks
2. Renames the instrumentation level to not refer to instrumentation stubs
3. Fixes a few places that should have checked for the instrumentation
level but checked for instrumentation_stubs_installed.
Bug: 206029744
Test: art/test.py
Change-Id: I20a6e9442661a6465c92321904c846d35ebb1e53
|
|
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I04dc99e1491219442ba128f57a08112ae4783b92
|
|
Using NthCallerStackVisitor is expensive since that involves decoding
method header and other tasks that are reasonably expensive especially
when called on every method exit. When calling method exit hooks from
JITed code a lot of this information like the frame_size, calling method
are already known and can be directly passed to the method exit hook
instead of computing them.
Locally this change improves the performance by 70% on debuggable-cc
config of golem benchmarks.
Bug: 253232638
Test: art/test.py
Change-Id: I3a1d80748c6d85e5fa1d3bd4aec0b29962ba0156
|
|
We don't need to do anything when there are no method entry listeners on
method entry. So tighten the check so we call method entry hook only
when method entry listeners are installed. Earlier, we used to call
whenever the stack is instrumented or instrumentation stubs are
installed.
Drive by fix: remove unused constant from runtime.def
Bug: 253232638
Test: art/test.py
Change-Id: I6bdb4207804fd9c79fd7f21500c00b47e12beef3
|
|
This CL would compile both CC and userfaultfd GC in the art library,
enabling us to choose either of the two during boot time depending on
whether the device has userfaultfd kernel feature or not.
The CC GC is still chosen unless we use ART_USE_READ_BARRIER=false
during build time. This behavior will later be changed to choosing CC
*only* if ART_USE_READ_BARRIER=true is used. In other cases, if the
device has userfaultfd support then that GC will be chosen.
Bug: 160737021
Bug: 230021033
Test: art/test/testrunner/testrunner.py
Change-Id: I370f1a9f6b8cdff8c2ce3cf7aa936bccd7ed675f
|
|
This reverts commit 26aef1213dbdd7ab03688d898cf802c8c8d7e610.
Reason for revert: Relanding after a fix. When checking if the caller
is deoptimizeable we should consider the outer caller and not the
inlined method that we could be executing currently.
Bug: 222479430
Change-Id: I37cbc8f1b34113a36a92c3801db72b16d2b9c81a
|
|
This reverts commit fc067a360d14db5f84fd4b58e0dee6cb04ee759b.
Reason for revert: test failures on jit-on-first-use: https://android-build.googleplex.com/builds/submitted/8821659/art-jit-on-first-use/latest/view/logs/build_error.log
Change-Id: Ie9bc243baac777ecc4f47cc961494ca6ab3ef4c6
|
|
Introduce a new flag to identify if JITed code was compiled with
instrumentation support. We used to check if the runtime is java
debuggable to check for instrumentation support of JITed code. We only
set the java debuggable at runtime init and never changed it after. So
this check was sufficient since we always JIT code with instrumentation
support in debuggable runtimes.
We want to be able to change the runtime to debuggable after the runtime
has started. As a first step, introduce a new flag to explicitly check
if JITed code was compiled with instrumentation support. Use this flag
to check if code needs entry / exit stubs and to check if code is async
deoptimizeable.
Bug: 222479430
Test: art/test.py
Change-Id: Ibcaeab869aa8ce153920a801dcc60988411c775b
|
|
This reverts commit 1d1d25eea72cf22aed802352a82588d97403f7b6.
Reason for revert: Relanding after fix to failures:
https://android-review.googlesource.com/c/platform/cts/+/2145979
Bug: 206029744
Change-Id: Id3c7508c86f9aeb0ddfc1c4792ed54f003b88e77
|
|
debuggable""
This reverts commit 6fb0acc14459a856c35b642e3368aff853259260.
Reason for revert: Breaks android.jvmti.cts.JvmtiHostTest
https://buganizer.corp.google.com/issues/237991413
Change-Id: I00fb58080693ddebc03c7b62ea67c91150ef7a21
|
|
This reverts commit 5c9b55aa95295a287abd86f1e7fbe98c3f35ffd6.
Reason for revert: Relanding with fixes for failure
Fixes:
1. Arm64 needs to use 64-bit registers
2. We cannot deoptimize directly from GenericJniEndTrampoline since we
only have a refs and args frame. So call the method exit hooks from
art_quick_generic_jni_trampoline.
Change-Id: If1f08eca69626f60f42f10205b482a3764610846
|
|
This reverts commit 90f12677f80169dc3ef919c2067349f94b943e7f.
Reason for revert: Failures on device
https://ci.chromium.org/ui/p/art/builders/ci/angler-armv7-ndebug/3058/overview
https://ci.chromium.org/ui/p/art/builders/ci/angler-armv8-ndebug/3049/overview
Change-Id: I43f943f9180b8c76db02a2a5c228a209a2f18a82
|
|
Don't install instrumentation stubs for native methods in debuggable
runtimes. The GenericJniTrampoline is updated to call method entry /
exit hooks. When JITing JNI stubs in debuggable runtimes we also include
calls to method entry / exit hooks when required.
Bug: 206029744
Test: art/test.py
Change-Id: I1d92ddb1d03daed74d88f5c70d38427dc6055446
|
|
This reverts commit fb1b08cbb9c6ac149d75de16c14fdaa8b68baaa4.
Reason for revert: Reland after a fix. We had to update untagging in jni_dlsym_lookup_stub as well.
Change-Id: Id936e9e60f9e87e96f1a9a79cd2118631ad1616b
|
|
runtime""
This reverts commit 5da52cd20ea0d24b038ae20c6c96aa22ac3a24a0.
Reason for revert: https://ci.chromium.org/ui/p/art/builders/ci/host-x86_64-cdex-fast/5172/overview
Change-Id: I9cebbaa145810547531a90af9da7961c0b6255d1
|
|
This reverts commit 570ade8a6600d368a9e24b64cfa0a1907929166a.
Reason for revert: Relanding after a fix for failures. The original cl breaks the invariant that we would always use AOT code for native methods if there is AOT code. This invariant is necessary to get the header when walking the stack. This CL fixes it by not relying on the invariant but instead tagging the sp to differentiate between JIT and AOT code in debuggable runtimes. Non-debuggable runtimes still have the invariant.
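As a rough illustration of the tagging idea, an aligned stack pointer has spare low bits that can carry a flag. The sketch below assumes bit 0 is free and uses hypothetical names (`kJitTagBit`, `TagAsJit`, `IsJitTagged`, `Untag`); which bit the runtime really uses is not stated here.

```cpp
#include <cstdint>

// Assumption: stack pointers are at least 2-byte aligned, so bit 0 is free.
constexpr uintptr_t kJitTagBit = 1u;  // hypothetical tag bit

// Marks a stack pointer as belonging to a JIT-compiled frame.
constexpr uintptr_t TagAsJit(uintptr_t sp) { return sp | kJitTagBit; }

// Tells whether a stored stack pointer carries the JIT tag.
constexpr bool IsJitTagged(uintptr_t tagged_sp) {
  return (tagged_sp & kJitTagBit) != 0;
}

// Recovers the real stack pointer before it is dereferenced.
constexpr uintptr_t Untag(uintptr_t tagged_sp) {
  return tagged_sp & ~kJitTagBit;
}
```

A stack walker following this scheme would test the tag first and untag the value before using it as an address.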
Change-Id: I5141281f04202d41988021d53bfe30a48bc4db9c
|
|
This reverts commit aa5a644f17aab27dee172642a276bd24e69a5b54.
Reason for revert: The original CL was wrongly blamed for
unrelated breakage. The LUCI row contains 6 red cells but
one is a known flake and the other 5 are mis-attributed and
clicking through to the manifests reveals the real culprit
to be https://android-review.googlesource.com/2049787 .
Change-Id: Ie34d9652d3cbe882a73f9eece0d30dfd9a3d15a6
Test: Rely on TreeHugger.
Bug: 181943478
|
|
This reverts commit 601f4e9955be4d25b5ecfe7779d6981a5c1fcbca.
Reason for revert: Bot redness e.g. https://ci.chromium.org/ui/p/art/builders/ci/angler-armv7-debug/2490/overview
Change-Id: If4d84625273305453ff4bb80554b5c8baca241d1
|
|
Avoid using a new arena for every JNI compilation.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 181943478
Change-Id: I7d0b51941116ab0ad90f7e509577a7a3f32550ac
|
|
Follow-up to aosp/1988868 in which we added the (D)CHECK_IMPLIES
macro. This CL uses it on compiler/ occurrences found by a regex.
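For readers unfamiliar with the idiom, an "implies" check asserts that whenever a premise holds, a conclusion holds too. The sketch below is a hypothetical stand-in built on `assert()`; the names `Implies` and `SKETCH_CHECK_IMPLIES` are illustrative and this is not the definition added in aosp/1988868.

```cpp
#include <cassert>

// Hypothetical helper: material implication. It is false only when
// `premise` is true and `conclusion` is false.
constexpr bool Implies(bool premise, bool conclusion) {
  return !premise || conclusion;
}

// Hypothetical stand-in for a CHECK_IMPLIES-style macro.
#define SKETCH_CHECK_IMPLIES(premise, conclusion) \
  assert(Implies((premise), (conclusion)))
```

Such a macro typically replaces hand-written checks of the form `CHECK(!a || b)`, which are easy to misread.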
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Change-Id: If63aed969bfb8b31d6fbbcb3bca2b04314c894b7
|
|
Golem results for art-opt-cc (higher is better):
linux-ia32 before after
NativeDowncallStaticNormal 46.766 51.016 (+9.086%)
NativeDowncallStaticNormal6 42.268 45.748 (+8.235%)
NativeDowncallStaticNormalRefs6 41.355 44.776 (+8.272%)
NativeDowncallVirtualNormal 46.361 52.527 (+13.30%)
NativeDowncallVirtualNormal6 41.812 45.206 (+8.118%)
NativeDowncallVirtualNormalRefs6 40.500 44.169 (+9.059%)
(The NativeDowncallVirtualNormal result for x86 is skewed
by one extra good run as Golem reports the best result in
the summary. Using the second best and most frequent
result 50.5, the improvement is only around 8.9%.)
linux-x64 before after
NativeDowncallStaticNormal 44.169 47.976 (+8.620%)
NativeDowncallStaticNormal6 43.198 46.836 (+8.423%)
NativeDowncallStaticNormalRefs6 38.481 44.687 (+16.13%)
NativeDowncallVirtualNormal 43.672 47.405 (+8.547%)
NativeDowncallVirtualNormal6 42.268 45.726 (+8.182%)
NativeDowncallVirtualNormalRefs6 41.355 44.687 (+8.057%)
(The NativeDowncallStaticNormalRefs6 result for x86-64 is
a bit inflated because recent results jump between ~38.5
and ~40.5. If we take the latter as the baseline, the
improvement is only around 10.3%.)
linux-armv7 before after
NativeDowncallStaticNormal 10.659 14.620 (+37.16%)
NativeDowncallStaticNormal6 9.8377 13.120 (+33.36%)
NativeDowncallStaticNormalRefs6 8.8714 11.454 (+29.11%)
NativeDowncallVirtualNormal 10.511 14.349 (+36.51%)
NativeDowncallVirtualNormal6 9.9701 13.347 (+33.87%)
NativeDowncallVirtualNormalRefs6 8.9241 11.454 (+28.35%)
linux-armv8 before after
NativeDowncallStaticNormal 10.608 16.329 (+53.93%)
NativeDowncallStaticNormal6 10.179 15.347 (+50.76%)
NativeDowncallStaticNormalRefs6 9.2457 13.705 (+48.23%)
NativeDowncallVirtualNormal 9.9850 14.903 (+49.25%)
NativeDowncallVirtualNormal6 9.9206 14.757 (+48.75%)
NativeDowncallVirtualNormalRefs6 8.8235 12.789 (+44.94%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 172332525
Change-Id: Ie144bc4f7f82be95790ea7d3123b81a3b6bfa603
|
|
And add a regression test. This was broken by
https://android-review.googlesource.com/1898923
but it was not caught by any direct tests.
Test: Additional test in JniCompilerTest.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 172332525
Bug: 208831945
Change-Id: I41d4999bbf43f8e58c88b87da47be6f7409d9ce1
|
|
Golem results for art-opt-cc (higher is better):
linux-ia32 before after
NativeDowncallStaticNormal 35.306 47.382 (+34.20%)
NativeDowncallStaticNormal6 32.951 42.247 (+28.21%)
NativeDowncallStaticNormalRefs6 17.866 41.355 (+131.5%)
NativeDowncallVirtualNormal 35.341 46.836 (+32.53%)
NativeDowncallVirtualNormal6 32.403 41.791 (+28.97%)
NativeDowncallVirtualNormalRefs6 32.131 40.500 (+26.05%)
linux-x64 before after
NativeDowncallStaticNormal 33.350 43.716 (+31.08%)
NativeDowncallStaticNormal6 31.096 43.176 (+38.85%)
NativeDowncallStaticNormalRefs6 30.617 38.500 (+25.75%)
NativeDowncallVirtualNormal 33.234 43.672 (+32.41%)
NativeDowncallVirtualNormal6 30.617 42.247 (+37.98%)
NativeDowncallVirtualNormalRefs6 32.131 42.701 (+32.90%)
linux-armv7 before after
NativeDowncallStaticNormal 7.8701 9.9651 (+26.62%)
NativeDowncallStaticNormal6 7.4147 8.9463 (+20.66%)
NativeDowncallStaticNormalRefs6 6.8830 8.3868 (+21.85%)
NativeDowncallVirtualNormal 7.8316 9.8377 (+25.61%)
NativeDowncallVirtualNormal6 7.4147 9.3596 (+26.23%)
NativeDowncallVirtualNormalRefs6 6.6794 8.4325 (+26.25%)
linux-armv8 before after
NativeDowncallStaticNormal 7.6372 9.8571 (+29.07%)
NativeDowncallStaticNormal6 7.4147 9.4905 (+28.00%)
NativeDowncallStaticNormalRefs6 6.8527 8.6705 (+26.53%)
NativeDowncallVirtualNormal 7.4147 9.3183 (+25.67%)
NativeDowncallVirtualNormal6 7.0755 9.2593 (+30.86%)
NativeDowncallVirtualNormalRefs6 6.5604 8.2967 (+26.47%)
Note that NativeDowncallStaticNormalRefs6 on x86 has been
jumping like crazy since
https://android-review.googlesource.com/1905055
between ~17.6 and ~32.4 for completely unrelated changes,
so if we take the 32.4 as a baseline, the improvement is
only ~27.6% in line with the other x86 benchmarks.
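The baseline adjustment above is plain relative-change arithmetic. A minimal sketch, with the hypothetical helper name `PercentChange`:

```cpp
// Relative change from `before` to `after`, in percent.
// Hypothetical helper; assumes `before` is non-zero.
constexpr double PercentChange(double before, double after) {
  return (after - before) / before * 100.0;
}
```

With the reported baseline, `PercentChange(17.866, 41.355)` is about 131.5, while with the alternative baseline `PercentChange(32.4, 41.355)` is about 27.6.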
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 172332525
Change-Id: I771a4765bd3a7c4e58b94be4155515241ea6fa3c
|
|
Implement implicit suspend checks in compiled managed code.
Use a single instruction `ldr x21, [x21, #0]` for the check
where `x21` points to a field in `Thread` that points to
itself until we request a checkpoint or suspension and set
it to null. After the null becomes visible to a running
thread, it requires two loads to get a segmentation fault
that is intercepted and redirected to a suspend check.
This involves a trade-off between the speed of a single
suspend check (a single LDR is faster than LDR+TST+BEQ/BNE)
and time to suspend where we now need to wait for two LDRs
and incur fault handling overhead. The time to suspend was
previously measured to be acceptable with the long tail
being comparable to the explicit suspend check.
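The self-pointing field can be modeled in portable C++ to show why two loads are needed after a suspend request. This is only a simulation under stated assumptions: `FakeThread`, `trigger` and `SuspendCheck` are hypothetical names, and the function returns `false` where real hardware would take a segmentation fault.

```cpp
// Hypothetical model of a thread with a self-pointing trigger field.
struct FakeThread {
  void* trigger;  // points to itself while no suspension is requested

  FakeThread() : trigger(&trigger) {}
  FakeThread(const FakeThread&) = delete;
  FakeThread& operator=(const FakeThread&) = delete;

  // Requesting a checkpoint or suspension clears the field.
  void RequestSuspend() { trigger = nullptr; }
};

// Models one `ldr x21, [x21, #0]`, with `reg` playing the role of x21.
// Returns false at the point where a real load from null would fault.
bool SuspendCheck(void**& reg) {
  if (reg == nullptr) {
    return false;  // simulated fault, i.e. the redirected suspend check
  }
  reg = static_cast<void**>(*reg);  // load the field into the register
  return true;
}
```

In this model the first check after `RequestSuspend()` still succeeds but leaves `reg` null, and only the second check hits the simulated fault.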
Golem results for art-opt-cc (higher is better):
linux-armv8 before after
Jacobi 597.49 637.92 (+6.766%) [1.3 noise]
Towers 934.00 991.00 (+6.103%) [1.4 noise]
QuicksortTest 5108.82 5622.46 (+10.05%) [1.6 noise]
StringPoolBench 8353.00 9806.00 (+17.39%) [4.4 noise]
LongInductionBench 1.0468 1.5100 (+44.26%) [0.4 noise]
IntInductionBench 1.1710 1.7715 (+51.28%) [0.4 noise]
(These are four benchmarks with highest "significance" and
two with highest improvement as reported by Golem.)
It is also interesting to compare this with a revert of
https://android-review.googlesource.com/1905055
which was the last change dealing with suspend checks and
which regressed these benchmarks.
Golem results for art-opt-cc (higher is better):
linux-armv8 revert after
Jacobi 616.36 637.92 (+3.497%) [0.7 noise]
Towers 943.00 991.00 (+5.090%) [1.2 noise]
QuicksortTest 5186.83 5622.46 (+8.399%) [1.4 noise]
StringPoolBench 8992.00 9806.00 (+9.052%) [2.4 noise]
LongInductionBench 1.1895 1.5100 (+26.94%) [0.3 noise]
IntInductionBench 1.3210 1.7715 (+34.10%) [0.3 noise]
Prebuilt sizes for aosp_blueline-userdebug:
- before:
arm64/boot*.oat: 16994120
oat/arm64/services.odex: 45848752
- revert https://android-review.googlesource.com/1905055 :
arm64/boot*.oat: 16870672 (-121KiB)
oat/arm64/services.odex: 45577248 (-265KiB)
- after:
arm64/boot*.oat: 16575552 (-409KiB; -288KiB v. revert)
oat/arm64/services.odex: 44877064 (-949KiB; -684KiB v. revert)
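The KiB figures in parentheses follow from the byte counts by subtracting and dividing by 1024. A minimal sketch, assuming rounding to the nearest KiB and using the hypothetical helper name `SavedKiB`:

```cpp
#include <cstdint>

// Size reduction in KiB, rounded to nearest.
// Hypothetical helper; assumes before_bytes >= after_bytes.
constexpr int64_t SavedKiB(int64_t before_bytes, int64_t after_bytes) {
  return (before_bytes - after_bytes + 512) / 1024;
}
```

For example, `SavedKiB(16994120, 16575552)` gives 409 and `SavedKiB(45848752, 44877064)` gives 949, matching the "after" entries.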
Test: testrunner.py --target --optimizing --jit --interpreter --64
Bug: 38383823
Change-Id: I1827689a3fb7f3c38310b87c80c7724bd7364a66
|
|
Move JNI entrypoints to `jni_entrypoints_<arch>.S` and
shared helper macros to `asm_support_<arch>.S`. Introduce
some new macros to reduce code duplication. Fix x86-64
using ESP in the JNI lock slow path.
Rename JNI lock/unlock and read barrier entrypoints to pull
the "jni" to the front and drop "quick" from their names.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 172332525
Change-Id: I20d059b07b308283db6c4e36a508480d91ad07fc
|
|
This reverts commit 02e0eb7eef35b03ae9eed60f02c889a6be400de9.
Reason for revert: Fixed the arm64 UNLOCK_OBJECT_FAST_PATH
macro to use the correct label for one branch to slow path.
Change-Id: I311687e877c54229af1613db2928e47b3ef0b6f2
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 172332525
|
|
This reverts commit c17656bcf477e57d59ff051037c96994fd0ac8f2.
Reason for revert: Broke tests.
At least the arm64 macro UNLOCK_OBJECT_FAST_PATH uses
an incorrect label for one branch to slow path.
Bug: 172332525
Bug: 207408813
Change-Id: I6764dcfcba3b3d780fc13a66d6e676a3e3946a0f
|
|
Lock and unlock in dedicated entrypoints instead of the
`JniMethodStart*()` and `JniMethodEnd*()` entrypoints.
Update x86 and x86-64 lock/unlock entrypoints to use the
same checks as arm and arm64.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 172332525
Change-Id: I82b5af211aa22479f8b0eec7f3a50bc92ec87eca
|
|
Spill outgoing stack arguments directly to their stack slots
(except for `this` on x86) and convert such references to
`jobject` while spilling. Use the `MoveArguments()` call for
both argument spilling and loading register arguments to
let the assembler use multi-register stores.
Improve arm64 JNI assembler to use LDP/STP in the relevant
situations when spilling and loading registers.
Fix arm JNI assembler that called `CreateJObject()` with
a bogus input register in one case.
Golem results for art-opt-cc (higher is better):
linux-ia32 before after
NativeDowncallStaticNormal6 25.074 25.578 (+2.011%)
NativeDowncallStaticNormalRefs6 25.248 25.248 (0%)
NativeDowncallVirtualNormal6 24.913 25.248 (+1.344%)
NativeDowncallVirtualNormalRefs6 25.074 25.086 (+0.0482%)
linux-x64 before after
NativeDowncallStaticNormal6 27.000 26.987 (-0.0500%)
NativeDowncallStaticNormalRefs6 25.411 25.411 (0%)
NativeDowncallVirtualNormal6 25.248 25.086 (-0.6395%)
NativeDowncallVirtualNormalRefs6 25.086 25.074 (-0.0492%)
linux-armv7 before after
NativeDowncallStaticNormal6 5.9259 6.0663 (+2.368%)
NativeDowncallStaticNormalRefs6 5.6232 5.7061 (+1.474%)
NativeDowncallVirtualNormal6 5.3659 5.4536 (+1.636%)
NativeDowncallVirtualNormalRefs6 5.0879 5.1595 (+1.407%)
linux-armv8 before after
NativeDowncallStaticNormal6 6.0663 6.2651 (+3.277%)
NativeDowncallStaticNormalRefs6 5.7279 5.8824 (+2.696%)
NativeDowncallVirtualNormal6 5.9494 6.0663 (+1.964%)
NativeDowncallVirtualNormalRefs6 5.5581 5.6630 (+1.888%)
(The x86 and x86-64 differences seem to be lost in noise.)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Test: run-gtests.sh
Test: testrunner.py --target --optimizing --jit
Bug: 172332525
Change-Id: Iaba8244c44d410bb1a4e31f90e4387ee5cc51bec
|
|
Golem results for art-opt-cc (higher is better):
linux-ia32 before after
NativeDowncallStaticFast 222.00 222.17 (+0.0751%)
NativeDowncallStaticFast6 139.86 161.00 (+15.11%)
NativeDowncallStaticFastRefs6 131.00 137.86 (+5.238%)
NativeDowncallVirtualFast 211.79 217.17 (+2.543%)
NativeDowncallVirtualFast6 137.36 150.55 (+9.599%)
NativeDowncallVirtualFastRefs6 131.50 132.60 (+0.8382%)
linux-x64 before after
NativeDowncallStaticFast 173.15 173.24 (+0.0499%)
NativeDowncallStaticFast6 135.50 157.61 (+16.31%)
NativeDowncallStaticFastRefs6 127.06 134.87 (+6.147%)
NativeDowncallVirtualFast 163.67 165.83 (+1.321%)
NativeDowncallVirtualFast6 128.18 147.35 (+14.96%)
NativeDowncallVirtualFastRefs6 123.44 130.74 (+5.914%)
linux-armv7 before after
NativeDowncallStaticFast 21.622 21.622 (0%)
NativeDowncallStaticFast6 17.250 18.719 (+8.518%)
NativeDowncallStaticFastRefs6 14.757 15.663 (+6.145%)
NativeDowncallVirtualFast 21.027 21.319 (+1.388%)
NativeDowncallVirtualFast6 17.439 18.953 (+8.680%)
NativeDowncallVirtualFastRefs6 14.764 15.992 (+8.319%)
linux-armv8 before after
NativeDowncallStaticFast 23.244 23.610 (+1.575%)
NativeDowncallStaticFast6 18.719 21.622 (+15.50%)
NativeDowncallStaticFastRefs6 14.757 18.491 (+25.30%)
NativeDowncallVirtualFast 20.197 21.319 (+5.554%)
NativeDowncallVirtualFast6 18.272 21.027 (+15.08%)
NativeDowncallVirtualFastRefs6 13.951 16.865 (+20.89%)
(The arm64 NativeDowncallVirtualFast reference value is very
low, resulting in an unexpected +5.554% improvement. As the
previous results seem to jump between 20.197 and 20.741,
the actual improvement is probably just around 2.5%.)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 172332525
Change-Id: I2b596414458b48a758826eafc223529e9f2fe059
|
|
Preserve all argument registers in the slow path to prepare
for moving arguments in registers for @FastNative. Move the
read barrier check earlier as it logically belongs to the
transition frame creation. For Baker read barriers, add a
mark bit check with fast return to the main path.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 172332525
Change-Id: I50bbc0bc9d54577281e7667aafebb4a53a539af1
|
|
This reverts commit 2ca0900e98d826644960eefeb8a21c84850c9e04.
Reason for revert: Fixed instrumentation for suspend check
from JNI stub, added a commented-out DCHECK() and a test.
The commented-out DCHECK() was correctly catching the bug
with the original submission but it also exposed deeper
issues with the instrumentation framework, so we cannot
fully enable it - bug 204766614 has been filed for this.
Original message:
Inline suspend check from `GoToRunnableFast()` to JNI stubs.
The only remaining code in `JniMethodFast{Start,End}()` is a
debug mode check that the method is @FastNative, so remove
the call altogether as we prefer better performance over the
debug mode check. Replace `JniMethodFastEndWithReference()`
with a simple `JniDecodeReferenceResult()`.
Golem results for art-opt-cc (higher is better):
linux-ia32 before after
NativeDowncallStaticFast 149.00 226.77 (+52.20%)
NativeDowncallStaticFast6 107.39 140.29 (+30.63%)
NativeDowncallStaticFastRefs6 104.50 130.54 (+24.92%)
NativeDowncallVirtualFast 147.28 207.09 (+40.61%)
NativeDowncallVirtualFast6 106.39 136.93 (+28.70%)
NativeDowncallVirtualFastRefs6 104.50 130.54 (+24.92%)
linux-x64 before after
NativeDowncallStaticFast 133.10 173.50 (+30.35%)
NativeDowncallStaticFast6 109.12 135.73 (+24.39%)
NativeDowncallStaticFastRefs6 105.29 127.18 (+20.79%)
NativeDowncallVirtualFast 127.74 167.66 (+31.25%)
NativeDowncallVirtualFast6 106.39 128.12 (+20.42%)
NativeDowncallVirtualFastRefs6 105.29 127.18 (+20.79%)
linux-armv7 before after
NativeDowncallStaticFast 18.058 21.622 (+19.74%)
NativeDowncallStaticFast6 14.903 17.057 (+14.45%)
NativeDowncallStaticFastRefs6 13.006 14.620 (+12.41%)
NativeDowncallVirtualFast 17.848 21.027 (+17.81%)
NativeDowncallVirtualFast6 15.196 17.439 (+14.76%)
NativeDowncallVirtualFastRefs6 12.897 14.764 (+14.48%)
linux-armv8 before after
NativeDowncallStaticFast 19.183 23.610 (+23.08%)
NativeDowncallStaticFast6 16.161 19.183 (+18.71%)
NativeDowncallStaticFastRefs6 13.235 15.041 (+13.64%)
NativeDowncallVirtualFast 17.839 20.741 (+16.26%)
NativeDowncallVirtualFast6 15.500 18.272 (+17.88%)
NativeDowncallVirtualFastRefs6 12.481 14.209 (+13.84%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Test: testrunner.py --host --jit --no-image
Test: testrunner.py --host --optimizing --debuggable -t 2005
Bug: 172332525
Bug: 204766614
Change-Id: I9cc7583fc11c457a53fe2d1a24a8befc0f36410d
|