arm: Implement VarHandle CAS intrinsics.

Using benchmarks provided by
    https://android-review.googlesource.com/1420959
on blueline little cores with fixed frequency 1420800:
                                             before  after
  CompareAndSetStaticFieldInt                26.452  0.031
  CompareAndSetStaticFieldString             31.672  0.037
  CompareAndSetFieldInt                      29.569  0.033
  CompareAndSetFieldString                   34.095  0.042
  WeakCompareAndSetStaticFieldInt            26.470  0.031
  WeakCompareAndSetStaticFieldString         31.604  0.038
  WeakCompareAndSetFieldInt                  29.619  0.033
  WeakCompareAndSetFieldString               34.058  0.040
  WeakCompareAndSetPlainStaticFieldInt       26.508  0.026
  WeakCompareAndSetPlainStaticFieldString    31.675  0.031
  WeakCompareAndSetPlainFieldInt             29.635  0.028
  WeakCompareAndSetPlainFieldString          34.116  0.034
  WeakCompareAndSetAcquireStaticFieldInt     26.512  0.030
  WeakCompareAndSetAcquireStaticFieldString  31.661  0.035
  WeakCompareAndSetAcquireFieldInt           29.661  0.032
  WeakCompareAndSetAcquireFieldString        34.120  0.038
  WeakCompareAndSetReleaseStaticFieldInt     26.566  0.027
  WeakCompareAndSetReleaseStaticFieldString  31.659  0.034
  WeakCompareAndSetReleaseFieldInt           29.676  0.029
  WeakCompareAndSetReleaseFieldString        34.204  0.037
  CompareAndExchangeStaticFieldInt           25.550  0.031
  CompareAndExchangeStaticFieldString        31.219  0.039
  CompareAndExchangeFieldInt                 28.923  0.032
  CompareAndExchangeFieldString              33.622  0.040
  CompareAndExchangeAcquireStaticFieldInt    25.559  0.029
  CompareAndExchangeAcquireStaticFieldString 31.177  0.037
  CompareAndExchangeAcquireFieldInt          28.807  0.031
  CompareAndExchangeAcquireFieldString       33.524  0.038
  CompareAndExchangeReleaseStaticFieldInt    25.481  0.027
  CompareAndExchangeReleaseStaticFieldString 31.132  0.036
  CompareAndExchangeReleaseFieldInt          28.825  0.029
  CompareAndExchangeReleaseFieldString       33.511  0.038
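
For orientation, the benchmark names above map onto java.lang.invoke.VarHandle
access modes applied to static and instance fields of type int and String.
The following is a minimal sketch of such calls, not the benchmark code from
the linked change; the class, field, and method names are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical class; it only illustrates the access modes named in the table.
class VarHandleCasSketch {
    static int staticInt = 0;      // hypothetical static int field
    String instanceString = "a";   // hypothetical instance String field

    static final VarHandle STATIC_INT;
    static final VarHandle FIELD_STRING;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            STATIC_INT = lookup.findStaticVarHandle(
                    VarHandleCasSketch.class, "staticInt", int.class);
            FIELD_STRING = lookup.findVarHandle(
                    VarHandleCasSketch.class, "instanceString", String.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Strong CAS on a static int field: returns true if the swap happened.
    static boolean casStaticInt(int expected, int update) {
        return STATIC_INT.compareAndSet(expected, update);
    }

    // compareAndExchange returns the witness value instead of a boolean.
    static int exchangeStaticInt(int expected, int update) {
        return (int) STATIC_INT.compareAndExchange(expected, update);
    }

    // Weak CAS may fail spuriously, so retry while the field still holds
    // the expected reference (references are compared by identity).
    boolean replaceString(String expected, String update) {
        while ((String) FIELD_STRING.get(this) == expected) {
            if (FIELD_STRING.weakCompareAndSetPlain(this, expected, update)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(casStaticInt(0, 1));
        System.out.println(exchangeStaticInt(1, 5));
        System.out.println(new VarHandleCasSketch().replaceString("a", "b"));
    }
}
```

The Acquire and Release variants in the table use the correspondingly named
VarHandle methods (for example weakCompareAndSetAcquire and
compareAndExchangeRelease) with the same argument shapes.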

Oddly, this rewrite makes the Unsafe CAS benchmarks regress
a bit on this configuration. However, experiments show that
adding a useless CLZ+LSR pair operating on a temporary register
(corresponding to the old code's result calculation) restores
the performance to the old level. We prefer not to add these
useless instructions, as the situation is likely to be reversed
on different CPU cores.

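
As an illustration of what a CLZ+LSR pair can compute, here is a small sketch
of the common "is zero" idiom. It assumes the pair is CLZ followed by a
logical shift right by 5 on a 32-bit value, which this message does not spell
out; the class and method names are hypothetical.

```java
// Hypothetical helper mirroring CLZ followed by LSR #5 on a 32-bit register.
class ClzLsrSketch {
    // Integer.numberOfLeadingZeros(0) is 32, and 32 >>> 5 is 1.
    // Any non-zero input has at most 31 leading zeros, so the shift yields 0.
    static int isZero(int value) {
        return Integer.numberOfLeadingZeros(value) >>> 5;
    }

    public static void main(String[] args) {
        System.out.println(isZero(0));
        System.out.println(isZero(42));
    }
}
```

Under that assumption, the pair turns a zero/non-zero value into a 1/0
success flag without a branch.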
Test: Covered by existing tests.
Test: testrunner.py --target --32 --optimizing
Test: Repeat with ART_USE_READ_BARRIER=false ART_HEAP_POISONING=true.
Test: Repeat with ART_READ_BARRIER_TYPE=TABLELOOKUP.
Test: run-gtests.sh
Bug: 71781600
Change-Id: I591009d7494533cdf60a47be2f8826144e059ff5