Age | Commit message | Author
|
... to `HControlFlowSimplifier` because "control flow" is
the correct technical term.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I2607ac699fa33c3e7ca7f54364e1e8497148412b
|
|
... to `HCodeFlowSimplifier` in preparation for adding
more optimizations to this pass.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: Icb05c3455d93a7b939f82ced9b08165e533bb21a
|
|
After GVN deduplicates instructions, we might have more information
regarding the nullability of an SSA variable. We can run RTP to insert
BoundType instructions, which will later be used by
InstructionSimplifier to remove the NullCheck. RTP is run
conditionally, only if GVN made at least one replacement.
It improves ~0.1% of odex size for speed compiled apps.
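A minimal sketch of the kind of code this targets (the class, field
and method names are hypothetical):
```
class Holder {
  Object obj;

  static int use(Holder h) {
    Object a = h.obj;
    if (a == null) return 0;
    Object b = h.obj;     // GVN replaces this load with `a`
    return b.hashCode();  // RTP can then bound `b` as non-null,
                          // letting the simplifier drop the NullCheck
  }
}
```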
Bug: 369206455
Test: art/test/testrunner/testrunner.py --host --64 -b --optimizing
Change-Id: I0a4a104690b3fe9ac4118642ab9e9916dc30a9c5
|
|
Create InstructionSimplifierRiscv64 optimization.
Replace Shl (by 1, 2 or 3) followed by Add with the Riscv64ShiftAdd IR
instruction.
Compiling all the methods of the applications below with dex2oat found
the following numbers of matches for the pattern:
Facebook: 45 cases
TikTok: 26 cases
YouTube: 19 cases
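A hedged sketch of code that produces the pattern (the method is
hypothetical; the mapping to a Zba shift-add instruction such as
sh3add is the assumed intent):
```
static long addScaled(long base, long index) {
  // Shl(3) feeding an Add: a candidate for the Riscv64ShiftAdd IR node.
  return base + (index << 3);
}
```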
Test: art/test/testrunner/testrunner.py --target --64 --ndebug --optimizing
Change-Id: I88903450d998983bb2a628942112d7518099c3f5
|
|
This reverts commit 3dccb13f4e92db37a13359e126c5ddc12cb674b5.
Also includes the fix for incrementing hotness that got reverted:
aosp/2906378
Bug: 313040662
Reduces jank on compose view scrolling for 4 iterations:
- For Go Mokey:
- Before: ~698 frames drawn / ~13.87% janky frames
- After: ~937 frames drawn / ~5.52% janky frames
- For Pixel 8 pro:
- Before: ~2440 frames drawn / ~0.90% janky frames
- After: ~2450 frames drawn / ~0.55% janky frames
Reason for revert: Reduce inlining threshold for baseline.
Change-Id: Iee5cd4c3ceb7715caf9299b56551aae6f0259769
|
|
This reverts commit 5eb1fd0dae3832ceee2102613bb08c291daca6f3.
Reason for revert: In aosp/2903248 we implemented a faster way of
doing `ReplaceUsesDominatedBy` which is used by `VisitIf`. The impact
of `VisitIf` is now small enough that running it in all passes
is faster than the previous implementation, which ran it only some of
the time.
This CL re-enables the optimization in all constant folding passes
because it:
A) Lets this optimization (and others that can use its result) kick in
earlier.
B) Runs it for callee graphs in the inliner (which had been disabled as
of CL aosp/2543831).
C) Keeps the ConstantFolding pass consistent, which helps to maintain a
simpler mental model.
Bug: 278626992
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Test: Locally compiled GMS and compared time to compile
Change-Id: I5dc5f591557c8de0bc4d23dbfd0b91b5b7e56ab5
|
|
This reverts commit 1a6b5b318aa69903a74dd10312a77bd8ee7c4cf6.
Reason for revert: asan failure
Change-Id: Ie9da0b04c899d6cb37148e7a3542190e65737787
|
|
This reverts commit c8309515d099992b7cab8f2b8c6db3ed77671ff4.
Bug: 313040662
Reason for revert: remove call to slow path on back edges.
Change-Id: I3fe52295afcb0be4b4062f8d9060adb4abb64375
|
|
This reverts commit 41c5dde40d1c75d36a7f984c8d72ec65fbff3111.
Reason for revert: breaks test.java.util.Arrays.Sorting
Change-Id: I03385c9f1efff4b8e8bd315827dde6ed774bbb52
|
|
And introduce inlined inline caches, which customize an inline cache for
the top-level method being compiled.
Reduces jank on compose view scrolling for 20 seconds:
- For Go Mokey:
- Before: ~525 frames drawn / ~14.64% janky frames
- After: ~891 frames drawn / ~4.74% janky frames
- For Pixel 8 pro:
- Before: ~2443 frames drawn / ~0.91% janky frames
- After: ~2447 frames drawn / ~0.65% janky frames
Bug: 313040662
Test: test.py
Change-Id: Ibaa746c6bd3c665b18ec9cd29cb477cf21023467
|
|
Leave a few `gUseReadBarrier` uses in JNI macro assemblers.
We shall deal with these later.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 289805127
Change-Id: I9d2aa245cee4c650129f169a82beda7dc0dd6a35
|
|
And pass integral stack args sign-extended to 64 bits for
direct @CriticalNative calls. Enable direct @CriticalNative
call codegen unconditionally, also enable `HClinitCheck`
codegen, and extend the 178-app-image-native-method run-test
to properly test these use cases.
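A hedged sketch of a declaration this affects (the class and method
are hypothetical): with this many register arguments in use, the
trailing int ends up on the stack for the direct call and is now
passed sign-extended to 64 bits.
```
import dalvik.annotation.optimization.CriticalNative;

final class NativeOps {
  @CriticalNative
  static native long combine(long a0, long a1, long a2, long a3,
                             long a4, long a5, long a6, long a7,
                             int tail);
}
```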
Test: # Edit `run-test` to disable checker, then
testrunner.py --target --64 --ndebug --optimizing
# Ignore 6 pre-existing failures (down from 7).
Bug: 283082089
Change-Id: Ia514c62006c7079b04182cc39e413eb2deb089c1
|
|
Test: Rely on TreeHugger.
Change-Id: I4e3c0ba13d576ef62121d47ebc4965f6667b624f
|
|
It was taking a lot of time for the improvement it got. We can get
99.99% of the improvement, with only one VisitIf call. This is
roughly 20% of the compile time it used to take.
Bug: 278626992
Fixes: 278626992
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Change-Id: Icc00c9ad6a9eb4f4fd18677bcb65655cbbe9d027
|
|
Having a long chain of loop header phis hangs the compiler.
Note that we can still compile the method if we skip the
InductionVarAnalysis phase.
Bug: 246249941
Fixes: 246249941
Bug: 32545907
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Test: dex2oat compile the app in 246249941
Change-Id: Id2d14b1c4d787f98d656055274c7cfcae6491686
|
|
We can eliminate redundant write barriers as we don't need several
for the same receiver. For example:
```
MyObject o;
o.inner_obj = io;
o.inner_obj2 = io2;
o.inner_obj3 = io3;
```
We can keep the write barrier for `inner_obj` and remove the other
two.
Note that we cannot perform this optimization across
invokes, suspend checks, or instructions that can throw.
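For contrast, a hedged sketch of a case where the second barrier must
stay because an invoke sits between the stores (the call used here is
only for illustration):
```
class MyObject {
  Object inner_obj;
  Object inner_obj2;

  static void storeWithCall(MyObject o, Object io, Object io2) {
    o.inner_obj = io;    // write barrier emitted
    System.gc();         // invoke between the stores
    o.inner_obj2 = io2;  // keeps its own write barrier
  }
}
```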
Local improvements (pixel 5, speed compile):
* System server: -280KB (-0.56%)
* SystemUIGoogle: -330KB (-1.16%)
* AGSA: -3876KB (-1.19%)
Bug: 260843353
Fixes: 260843353
Change-Id: Ibf98efbe891ee00e46125853c3e97ae30aa3ff30
|
|
This reverts commit 0a51605ddd81635135463dab08b6f7c21b58ffb0.
Reason for revert: Reland after some of the required work
was merged in other CLs.
Also address a TODO from the original CL to mark required
symbols with EXPORT in `intrinsic_objects.h`.
Also mark symbols in new files as HIDDEN.
Bug: 186902856
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I936d448983928af23614ca82c2d0bf9a645e2c52
|
|
Notable changes:
1) Wiring of the graph now allows for inlinee graphs ending in
TryBoundary, or in Goto in some special cases.
2) Building a graph with try/catch for inlining may add an extra
Goto block.
3) Oat version bump.
4) Reduced kMaximumNumberOfCumulatedDexRegisters from 32 to 20.
Bug: 227283224
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Change-Id: Ic2fd956de24b72d1de29b4cd3d0b2a1ddab231d8
|
|
This reverts commit fa1034c563b44c4f557814c50e2678e14dcd1d13.
Reason for revert: Relanding after float/double fix. In short,
don't deal with floats/doubles since they bring a lot of edge cases e.g.
if (f == 0.0f) {
// f is not guaranteed to be 0.0f, e.g. it could be -0.0f.
}
Bug: 240543764
Change-Id: I400bdab71dba0934e6f1740538fe6e6c0a7bf5fc
|
|
This reverts commit c6b816ceb2b35300c937ef2e7d008598b6afba21.
Reason for revert: Broke libcore test https://ci.chromium.org/ui/p/art/builders/ci/angler-armv7-ndebug/3179/overview
Change-Id: I4f238bd20cc485e49078104e0225c373cac23415
|
|
We know the value of some variables at compile time because they are
used in if clauses. For example:
if (variable == constant) {
// SSA `variable` guaranteed to be equal to constant here.
} else {
// No guarantees can be made here (except for booleans since
// they only have two values).
}
Similarly with `variable != constant`.
We can also apply this to boolean parameters e.g.
void foo (boolean val) {
if (val) {
// `val` guaranteed to be true here.
...
}
...
}
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Change-Id: I55df0252d672870993d06e5ac92f5bba44d902bd
|
|
Create "Phi placeholders" for tracking heap values that can
merge from different values and try to match existing Phis
or create new Phis to replace loads. For Phi placeholders
from loop headers we do not know whether they are fed by
unknown values through back-edges when processing the loop
header, so we delay processing loads that depend on them
until we have walked the entire graph. We then try to match them
with existing instructions (when the location is unchanged
in the loop) or Phis or create new Phis if needed. If we
find a loop Phi placeholder fed with unknown value from a
back-edge, we mark the Phi placeholder unreplaceable and
reprocess loads and stores to propagate the unknown value.
This can sometimes allow other loads to be replaced. At the
end we re-calculate the heap values to find stores that can
be eliminated because they write over the same value.
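A minimal sketch of the loop case described above (the class and field
are hypothetical): the field is not written inside the loop, so the
loop Phi placeholder matches the load done before the loop and the
in-loop load is eliminated.
```
class Point {
  int x;

  static int sumX(Point p, int n) {
    int first = p.x;  // load before the loop
    int s = first;
    for (int i = 0; i < n; i++) {
      s += p.x;       // location unchanged in the loop: this load is
                      // replaced by `first`
    }
    return s;
  }
}
```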
Golem results:
art-opt-cc         arm      arm64    x86      x86-64
CaffeineFloat      +6.7%    +3.0%    +5.9%    +3.8%
KotlinMicroWhen    +33.7%   +4.8%    +1.8%    +0.6%
art-opt (more noisy than art-opt-cc)
CaffeineFloat      +4.1%    +4.4%    +7.8%    +10.5%
KotlinMicroWhen    +33.6%   +2.0%    +1.8%    +1.8%
The MoveLiteralColumn benchmark seems to gain significantly
(up to 22% on art-opt-cc but under 10% on art-opt) but it is
very noisy and the results are therefore unreliable.
Insignificant code size changes for aosp_blueline-userdebug:
- before:
arm boot*.oat: 15303468
arm64 boot*.oat: 18184736
services.odex: 25195944
grep -c pAllocObject boot.arm64.oatdump.txt: 27213
grep -c pAllocArray boot.arm64.oatdump.txt: 3620
- after:
arm boot*.oat: 15299524 (-4KiB, -0.03%)
arm64 boot*.oat: 18176528 (-8KiB, -0.05%)
services.odex: 25191832 (-4KiB, -0.02%)
grep -c pAllocObject boot.arm64.oatdump.txt: 27206 (-7)
grep -c pAllocArray boot.arm64.oatdump.txt: 3615 (-5)
Test: New tests in 530-checker-lse.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: blueline-userdebug boots.
Bug: 77906240
Change-Id: Ia9fe0cd3530f9d3941650dfefc00a7f7fd821994
|
|
If a float or double argument needs to be passed in core
register to a @CriticalNative method due to soft-float
native ABI, insert a fake call to Float.floatToRawIntBits()
or Double.doubleToRawLongBits() to satisfy type checks in
the compiler.
We cannot do that for intrinsics that expect those inputs in
actual FP registers, so we still prevent such intrinsics
from using `kCallCriticalNative`. This should be irrelevant
if an actual intrinsic implementation is emitted. There are
currently two unimplemented intrinsics that are affected by
the carve-out, namely MathRoundDouble and FP16ToHalf, and
four intrinsics implemented only when ARMv8A is supported,
namely MathRint, MathRoundFloat, MathCeil and MathFloor.
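Conceptually the inserted fake call is the same bit-cast a user could
write by hand; `Float.floatToRawIntBits` and `Double.doubleToRawLongBits`
are the real java.lang APIs, while the method below is only a
hypothetical illustration:
```
static long rawBits(float f, double d) {
  int fBits  = Float.floatToRawIntBits(f);     // value now usable in a core register
  long dBits = Double.doubleToRawLongBits(d);  // likewise for the 64-bit case
  return ((long) fBits << 32) ^ dBits;
}
```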
Test: testrunner.py --target --32 -t 178-app-image-native-method
Bug: 112189621
Change-Id: Id14ef4f49f8a0e6489f97dc9588c0e6a5c122632
|
|
Test: m
Bug: 161896447
Bug: 161850439
Bug: 161336379
Change-Id: Iabc29fa43b4b5a403699d6bca95e9a2cb8945d77
|
|
A pattern seen in libcore and SPECjvm2008 workloads is a pair of HRem/HDiv
having the same dividend and divisor. The code generator processes
them separately and generates duplicated instructions calculating HDiv.
This CL adds detection of such a pattern to the instruction simplifier.
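A minimal sketch of the pattern (the method is hypothetical): the
quotient is computed once and the remainder can be derived as
`a - q * b` instead of issuing a second division.
```
static int[] divMod(int a, int b) {
  int q = a / b;  // HDiv
  int r = a % b;  // HRem with the same dividend and divisor
  return new int[] { q, r };
}
```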
This optimization interferes with HInductionVarAnalysis and
HLoopOptimization, preventing some loop optimizations. To avoid this,
the instruction simplifier has a loop_friendly mode in which HRems
inside loops are not optimized.
A microbenchmark run on Pixel 3 shows the following improvements:
            | little cores | big cores
arm32 Int32 | +21%         | +40%
arm32 Int64 | +46%         | +44%
arm64 Int32 | +27%         | +14%
arm64 Int64 | +33%         | +27%
Test: 411-checker-instruct-simplifier-hrem
Test: test.py --host --optimizing --jit --gtest --interpreter
Test: test.py --target --optimizing --jit --interpreter
Test: run-gtests.sh
Change-Id: I376a1bd299d7fe10acad46771236edd5f85dfe56
|
|
Make LSA a helper class, not an optimization pass. Move all
its allocations to ScopedArenaAllocator to reduce the peak
memory usage a little bit.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I7fc634abe732d22c99005921ffecac5207bcf05f
|
|
This avoids passing the `VariableSizedHandleScope*` argument
around and eliminates HGraph::inexact_object_rti_ and its
initialization. The latter shall allow running Optimizing
gtests that do not require type information without creating
a Runtime in the future. (To be implemented in a separate CL.)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: aosp_taimen-userdebug boots.
Change-Id: I36fe9bc556c6d610d644c8c14cc74c9985a14d64
|
|
The ART vectorizer assumes that a single SIMD register size is used
for the whole program. Make this assumption explicit and refactor the
code.
Note: This is a base for the future introduction of SIMD slots of
size other than 8 or 16 bytes.
Test: test-art-target, test-art-host.
Change-Id: Id699d5e3590ca8c655ecd9f9ed4e63f49e3c4f9c
|
|
Test: aosp_taimen-userdebug boots.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 147346243
Change-Id: I97fdc15e568ae3fe390efb1da690343025f84944
|
|
This reverts commit e2727154f25e0db9a5bb92af494d8e47b181dfcf.
Reason for revert: Breaks ASAN tests (ODR violation).
Bug: 142365358
Change-Id: I38103d74a1297256c81d90872b6902ff1e9ef7a4
|
|
Make symbols in compiler/optimizing hidden by a namespace
attribute. The unit intrinsic_objects.{h,cc} is excluded as
it is needed by dex2oat.
As the symbols are no longer exported, gtests are now linked
with the static version of the libartd-compiler library.
libart-compiler.so size:
- before:
arm: 2396152
arm64: 3345280
- after:
arm: 2016176 (-371KiB, -15.9%)
arm64: 2874480 (-460KiB, -14.1%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 142365358
Change-Id: I1fb04a33351f53f00b389a1642e81a68e40912a8
|
|
Handles compiler.
Bug: 116054210
Test: WITH_TIDY=1 mmma art
Change-Id: I5cdfe73c31ac39144838a2736146b71de037425e
|
|
Treat verification results and image classes as mutable
only in CompilerDriver::PreCompile(), and treat them as
immutable during compilation, accessed through the
CompilerOptions. This severs the dependency of the inliner
on the CompilerDriver.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I594a0213ca6a5003c19b4bd488af98db4358d51d
|
|
This patch performs instruction simplification to
generate the andn, blsmsk and blsr instructions on
CPUs that have AVX2.
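Hedged examples of the bit patterns involved (the helper methods are
hypothetical, and the exact forms the simplifier matches are an
assumption):
```
final class BitTricks {
  static int andNot(int a, int b)     { return ~a & b; }       // andn
  static int maskUpToLowestSet(int x) { return x ^ (x - 1); }  // blsmsk
  static int resetLowestSet(int x)    { return x & (x - 1); }  // blsr
}
```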
Test: test.py --host --64, test-art-host-gtest
Change-Id: Ie41a1b99ac2980f1e9f6a831a7d639bc3e248f0f
Signed-off-by: Shalini Salomi Bodapati <shalini.salomi.bodapati@intel.com>
|
|
Instead just recognize the intrinsic when creating an invoke
instruction.
Also remove some old code related to compiler driver sharpening.
Test: test.py
Change-Id: Iecb668f30e95034970fcf57160ca12092c9c610d
|
|
Make the last sharpening helper (for methods) like the other
helpers: invoked by the instruction builder.
Test: test.py
Change-Id: Ic80a454f9b59b0b4ef7825590b24402500ba851c
|
|
|
|
This reverts commit 61908880e6565acfadbafe93fa64de000014f1a6.
Reason for revert: By failing to round multiply results, it does not follow Java rounding rules.
Change-Id: Ic0ef08691bef266c9f8d91973e596e09ff3307c6
|
|
|
|
This patch adds a new CPU variant named kabylake and performs
instruction simplification to generate VectorMultiplyAccumulate.
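A hedged sketch of a loop the vectorizer could turn into
multiply-accumulate operations (the method is hypothetical):
```
static int dot(int[] a, int[] b) {
  int acc = 0;
  int n = Math.min(a.length, b.length);
  for (int i = 0; i < n; i++) {
    acc += a[i] * b[i];  // acc = acc + a * b: multiply-accumulate shape
  }
  return acc;
}
```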
Test: ./test.py --host --64
Change-Id: Ie6cc882dadf1322dd4d3ae49bfdb600b0c447765
Signed-off-by: Gupta Kumar, Sanjiv <sanjiv.kumar.gupta@intel.com>
|
|
Check for non-PIC boot image as a testing config instead.
Honor the config for HInvokeStaticOrDirect sharpening.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I3645f4fefe322f1fd64ea88a2b41a35ceccea688
|
|
Removes CompilerDriver dependency from ImageWriter and
several other classes.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: Pixel 2 XL boots.
Test: m test-art-target-gtest
Test: testrunner.py --target --optimizing
Change-Id: I3c5b8ff73732128b9c4fad9405231a216ea72465
|
|
Rationale:
The change introduces actual conditional passes
(dependence on inliner). This ensures more
cases are optimized downstream without
needlessly increasing compile time.
NOTE:
Some checker tests needed to be rewritten
due to subtle changes in the phase ordering.
No optimizations were harmed in the process,
though.
Bug: b/78171933, b/74026074
Test: test-art-host,target
Change-Id: I335260df780e14ba1f22499ad74d79060c7be44d
|
|
Change constructor to use a reference to a dex file.
Remove duplicated logic for GetCodeItemSize.
Bug: 63756964
Test: test-art-host
Change-Id: I69af8b93abdf6bdfa4454e16db8f4e75883bca46
|
|
Move all the DexFile related source to a common subdirectory dex/ of
runtime.
Bug: 71361973
Test: make -j 50 test-art-host
Change-Id: I59e984ed660b93e0776556308be3d653722f5223
|
|
Make code item fields private and use accessors. Added a handful of
friend classes to reduce the size of the change.
Changed the default to be nullable and removed CreateNullable.
CreateNullable was a bad API since it defaulted to the unsafe option;
we may add a CreateNonNullable if it's important for performance.
Motivation:
Have a different layout for code items in cdex.
Bug: 63756964
Test: test-art-host-gtest
Test: test/testrunner/testrunner.py --host
Test: art/tools/run-jdwp-tests.sh '--mode=host' '--variant=X32' --debug
Change-Id: I42bc7435e20358682075cb6de52713b595f95bf9
|
|
This helps save memory by avoiding the allocation of
HEnvironment and related objects for AOT references to
boot image strings and classes (kBootImage* load kinds)
and also for JIT references (kJitTableAddress).
Compiling aosp_taimen-userdebug boot image, the most memory
hungry method BatteryStats.dumpLocked() needs
- before:
Used 55105384 bytes of arena memory...
...
UseListNode 10009704
Environment 423248
EnvVRegs 20676560
...
- after:
Used 50559176 bytes of arena memory...
...
UseListNode 8568936
Environment 365680
EnvVRegs 17628704
...
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 34053922
Change-Id: I68e73a438e6ac8e8908e6fccf53bbeea8a64a077
|
|
Rationale:
Refactors the way we set up optimization passes
in the compiler into a more centralized approach.
The refactoring also found some "holes" in the
existing mechanism (missing string lookup in
the debugging mechanism, or the inability to set an
alternative name for optimizations that may repeat).
Bug: 64538565
Test: test-art-host test-art-target
Change-Id: Ie5e0b70f67ac5acc706db91f64612dff0e561f83
|
|
Remove all copies of 'MaybeRecordStat', replacing them with a single
OptimizingCompilerStats::MaybeRecordStat helper.
Change-Id: I83b96b41439dccece3eee2e159b18c95336ea933
|
|
This patch refactors the way GraphChecker is invoked, utilizing the
same scoping mechanism as pass timing and graph visualizer. Therefore,
GraphChecker will now run not just after instances of HOptimization
but after the builders and reg alloc, too.
Change-Id: I8173b98b79afa95e1fcbf3ac9630a873d7f6c1d4
|