Age | Commit message (Collapse) | Author |
|
Move `SchedulingLatencyVisitor{ARM,ARM64}` to .cc files.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Change-Id: I15cb1a4cbef00a328fec947189412c502bf80f46
|
|
Add a comment and remove dead code.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: Ie14a1bd9633f322be7cb4bb312a1232df6697cbd
|
|
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 181943478
Change-Id: I1fc3234243315ee1133d5a12c52d7f42ea8273bc
|
|
It has been disabled for a while and it has bit rotted
Bug: 298176183
Test: art/test/testrunner/testrunner.py --host --64 -b --optimizing
Test: m test-art-host-gtest-art_compiler_tests64
Change-Id: I4fcd8b3d18a3388e078b5cb3c340b2e270aefef7
|
|
After the old implementation was renamed in
https://android-review.googlesource.com/2526708 ,
we introduce a new function with the old name but new
behavior, just `DCHECK()`-ing the instruction kind before
casting down the pointer. We change appropriate calls from
`As##type##OrNull()` to `As##type()` to avoid unncessary
run-time checks and reduce the size of libart-compiler.so.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 181943478
Change-Id: I025681612a77ca2157fed4886ca47f2053975d4e
|
|
The null type check in the current implementation of
`HInstruction::As##type()` often cannot be optimized away
by clang++. It is therefore beneficial to have two functions
HInstruction::As##type()
HInstruction::As##type##OrNull()
where the first function never returns null but the second
one can return null. The additional text "OrNull" shall also
flag the possibility of yielding null to the developer which
may help avoid bugs similar to what we have seen previously.
This requires renaming the existing function that can return
null and introducing new function that cannot. However,
defining the new function `HInstruction::As##type()` in the
same change as renaming the old one would risk introducing
bugs by missing a rename. Therefore we simply rename the old
function here and the new function shall be introduced in a
separate change with all behavioral changes being explicit.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: buildbot-build.sh --target
Bug: 181943478
Change-Id: I4defd85038e28fe3506903ba3f33f723682b3298
|
|
This reverts commit 0a51605ddd81635135463dab08b6f7c21b58ffb0.
Reason for revert: Reland after some of the required work
was merged in other CLs.
Also address a TODO from the original CL to mark required
symbols with EXPORT in `intrinsic_objects.h`.
Also mark symbols in new files as HIDDEN.
Bug: 186902856
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I936d448983928af23614ca82c2d0bf9a645e2c52
|
|
LSA will run in graphs with acquire loads (i.e. monitor enter and
volatile load) and release stores (i.e. monitor exit and volatile
stores).
Helps both LSE and the Scheduler, and brings code size and memory use
reductions. For example, ~40KB (~0.1%) reduction in memory use when
compiling android framework in armv8.
Code size gains (locally run on Pixel 5 w/ AOSP):
Android Google Search App (AGSA): 209KB
System server: 44KB
System UI: 20KB
which is ~0.1% for each compile.
Bug: 227283233
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Change-Id: I9ac79cf2324348414186f95e531c98b4215b28ea
|
|
We can generalize HNativeDebugInfo to be used as a Nop (i.e. no
instructions are generated), and give it the option of having an
environment to keep the current HNativeDebugInfo logic working.
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Change-Id: I06b3a36e8b124bcda858d2c9cd8ff0ab21caea36
|
|
Follow-up to aosp/1988868 in which we added the (D)CHECK_IMPLIES
macro. This CL uses it on compiler/ occurrences found by a regex.
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Change-Id: If63aed969bfb8b31d6fbbcb3bca2b04314c894b7
|
|
Use the correct target for predicated get.
Also remove an always-false condition from LSE.
Test: m
Bug: 188188275
Bug: 188847019
Change-Id: I731e181c8c0d812120dc4fad0c011158053fa7a8
|
|
This reverts commit 791df7a161ecfa28eb69862a4bc285282463b960.
This unreverts commit fc1ce4e8be0d977e3d41699f5ec746d68f63c024.
This unreverts commit b8686ce4c93eba7192ed7ef89e7ffd9f3aa6cd07.
We incorrectly failed to include PredicatedInstanceFieldGet in a few
conditions, including a DCHECK. This caused tests to fail under the
read-barrier-table-lookup configuration.
Reason for revert: Fixed 2 incorrect checks
Bug: 67037140
Test: ./art/test/testrunner/run_build_test_target.py -j70 art-gtest-read-barrier-table-lookup
Change-Id: I32b01b29fb32077fb5074e7c77a0226bd1fcaab4
|
|
This reverts commit fc1ce4e8be0d977e3d41699f5ec746d68f63c024.
Bug: 67037140
Reason for revert: Fails read-barrier-table-lookup tests.
Change-Id: I373867c728789bc14a4370b93a045481167d5f76
|
|
This reverts commit 47ac53100303e7e864b7f6d65f17b23088ccf1d6.
There was a bug in LSE where we would incorrectly record the
shadow$_monitor_ field as not having a default initial value. This
caused partial LSE to be unable to compile the Object.identityHashCode
function, causing crashes. This issue was fixed in a parent CL. Also
updated all Offsets in LSE_test to be outside of the object header
regardless of configuration.
Test: ./test.py --host
Bug: 67037140
Reason for revert: Fixed issue with shadow$_monitor_ field and offsets
Change-Id: I4fb2afff4d410da818db38ed833927dfc0f6be33
|
|
This reverts commit b8686ce4c93eba7192ed7ef89e7ffd9f3aa6cd07.
Bug: 67037140
Reason for revert: Fails a few tests.
Change-Id: Icf0635bffbfbba93bf0a5b854a9582c418198136
|
|
Add partial load-store elimination to the LSE pass. Partial LSE will
move object allocations which only escape along certain execution
paths closer to the escape point and allow more values to be
eliminated. It does this by creating new predicated load and store
instructions that are used when an object has only escaped some of the
time. In cases where the object has not escaped a default value will
be used.
Test: ./test.py --host
Test: ./test.py --target
Bug: 67037140
Change-Id: Idde67eb59ec90de79747cde17b552eec05b58497
|
|
We incorrectly handled merging unknowns in some situations.
Specifically in cases where we are unable to materialize loop-phis we
could end up with PureUnknowns which could end up hiding stores that
need to be kept.
In an unrelated issue we were incorrectly considering some values as
escapes when live at the point of an invoke. Since
SearchPhiPlaceholdersForKeptStores used a more precise notion of
escapes we could end up removing stores without being able to replace
the values.
This reverts commit 2316b3a0779f3721a78681f5c70ed6624ecaebef.
This unreverts commit b6837f0350ff66c13582b0e94178dd5ca283ff0a
This reverts commit fe270426c8a2a69a8f669339e83b86fbf40e25a1.
This unreverts commit bb6cda60e4418c0ab557ea4090e046bed8206763.
Bug: 67037140
Bug: 173120044
Reason for revert: Fixed issue causing incorrect store elimination
Test: ./test.py --host
Test: Boot cuttlefish
atest FrameworksServicesTests:com.android.server.job.BackgroundRestrictionsTest#testPowerWhiteList
Change-Id: I2ebae9ccfaf5169d551c5019b547589d0fce1dc9
|
|
This reverts commit b6837f0350ff66c13582b0e94178dd5ca283ff0a
This unreverts commit fe270426c8a2a69a8f669339e83b86fbf40e25a1.
This rereverts commit bb6cda60e4418c0ab557ea4090e046bed8206763.
Bug: 67037140
Bug: 173120044
Reason for revert: Git-blame seems to point to the CL as cause of
b/173120044. Revert during investigation.
Change-Id: I46f557ce79c15f07f4e77aacded1926b192754c3
|
|
A ScopedArenaAllocator in a single test was accidentally loaded using
operator new which is not supported. This caused a memory leak.
This reverts commit fe270426c8a2a69a8f669339e83b86fbf40e25a1.
This unreverts commit bb6cda60e4418c0ab557ea4090e046bed8206763.
Bug: 67037140
Reason for revert: Fixed memory leak in
LoadStoreAnalysisTest.PartialEscape test case
Test: SANITIZE_HOST=address ASAN_OPTIONS=detect_leaks=0 m test-art-host-gtest-dependencies
Run art_compiler_tests
Change-Id: I34fa2079df946ae54b8c91fa771a44d56438a719
|
|
This reverts commit bb6cda60e4418c0ab557ea4090e046bed8206763.
Bug: 67037140
Reason for revert: memory leak detected in the test.
Change-Id: I81cc2f61494e96964d8be40389eddcd7c66c9266
|
|
This is the first piece of partial LSE for art. This CL adds analysis
tools needed to implement partial LSE. More immediately, it improves
LSE so that it will remove stores that are provably non-observable
based on the location they occur. For example:
```
Foo o = new Foo();
if (xyz) {
check(foo);
foo.x++;
} else {
foo.x = 12;
}
return foo.x;
```
The store of 12 can be removed because the only escape in this method
is unreachable and was not executed by the point we reach the store.
The main purpose of this CL is to add the analysis tools needed to
implement partial Load-Store elimination. Namely it includes tracking
of which blocks are escaping and the groups of blocks that we cannot
remove allocations from.
The actual impact of this change is incredibly minor, being triggered
only once in a AOSP code. go/lem shows only minor effects to
compile-time and no effect on the compiled code. See
go/lem-allight-partial-lse-2 for numbers. Compile time shows an
average of 1.4% regression (max regression is 7% with 0.2 noise).
This CL adds a new 'reachability' concept to the HGraph. If this has
been calculated it allows one to quickly query whether there is any
execution path containing two blocks in a given order. This is used to
define a notion of sections of graph from which the escape of some
allocation is inevitable.
Test: art_compiler_tests
Test: treehugger
Bug: 67037140
Change-Id: I0edc8d6b73f7dd329cb1ea7923080a0abe913ea6
|
|
Make LSA a helper class, not an optimization pass. Move all
its allocations to ScopedArenaAllocator to reduce the peak
memory usage a little bit.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I7fc634abe732d22c99005921ffecac5207bcf05f
|
|
When scheduling code like the following:
LOOP:
v2=phi(v0, v1)
use(v2)
v1=...
goto LOOP
the instruction scheduler can move 'v1=...' before 'use(v2)'. This
causes live ranges of v1 and v2 to intersect and results to a MOV
instruction to be created.
The CL fixes this.
Improvements, Pixel3:
Little CPU, arm64
micro/GCCLoops
Example12 14.1%
Example10b 11.0%
Example23 8.1%
Example24 6.6%
Example10a 5.0%
FFT workload 4.7%
Compress workload 1.2%
Little CPU, arm32
micro/GCCLoops
Example23 7.5%
Example24 4.3%
MonteCarlo workload 1.35%
Big CPU, arm32 and arm64
No significant improvements
No significant regressions (> 5%) are found.
Test: test.py --host --optimizing --jit --gtest
Test: test.py --target --optimizing --jit
Test: run-gtests.sh
Change-Id: I1e4282af18f2d51fde5325a0c00a57e8bbc4fbed
|
|
This reverts commit e2727154f25e0db9a5bb92af494d8e47b181dfcf.
Reason for revert: Breaks ASAN tests (ODR violation).
Bug: 142365358
Change-Id: I38103d74a1297256c81d90872b6902ff1e9ef7a4
|
|
Make symbols in compiler/optimizing hidden by a namespace
attribute. The unit intrinsic_objects.{h,cc} is excluded as
it is needed by dex2oat.
As the symbols are no longer exported, gtests are now linked
with the static version of the libartd-compiler library.
libart-compiler.so size:
- before:
arm: 2396152
arm64: 3345280
- after:
arm: 2016176 (-371KiB, -15.9%)
arm64: 2874480 (-460KiB, -14.1%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 142365358
Change-Id: I1fb04a33351f53f00b389a1642e81a68e40912a8
|
|
The CL moves functionality from SchedulingGraph to other classes,
deletes unused code and moves code used for testing to the tests source
file:
1. SchedulingGraph::AddDependency: move checks whether a dependency has
been added to SchedulingNode::Add*Predecessor as it is a SchedulingNode
responsibility to keep a unique set of predecessors.
2. Create SideEffectDependencyAnalysis class. Code doing side effect dependency
analysis is moved from SchedulingGraph into the class.
3. Remove SchedulingGraph::HasImmediate*Dependency methods as there are
SchedulingNode::Has*Dependency methods for such kind of checks.
4. SchedulingGraph::HasImmediate*Dependency(HInstruction,HInstruction) methods
are only used by tests. Their code is moved to a new class TestSchedulingGraph in
the tests source file.
Test: test.py --host --optimizing --jit --gtest
Test: test.py --target --optimizing --jit
Test: run-gtests.sh
Change-Id: Id16eb6e9f8b9706e616dff0ccc1d0353ed968367
|
|
Handles compiler.
Bug: 116054210
Test: WITH_TIDY=1 mmma art
Change-Id: I5cdfe73c31ac39144838a2736146b71de037425e
|
|
Rely on transitive dependencies instead of adding the full
set of side effect dependencies.
Compiling a certain apk, the most memory hungry method has
the scheduler memory allocations in ArenaStack hidden by
the register allocator:
- before:
MEM: used: 155744672, allocated: 168446408, lost: 12036488
Scheduler 155744672
- after:
MEM: used: 5181680, allocated: 7096776, lost: 114752
SsaLiveness 4683440
RegAllocator 314312
RegAllocVldt 183928
The total arena memory used, including the ArenaAllocator
not listed above, goes from 167170024 to 16607032 (-90%).
(Measured with kArenaAllocatorCountAllocations=true,
kArenaAllocatorPreciseTracking=false.)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 64312607
Change-Id: I825bcfb490171070c46ad6d1f460785f4e75cfd7
|
|
This reduces the peak memory used for large methods with
multiple blocks to schedule.
Compiling the aosp_taimen-userdebug boot image, the most
memory hungry method BatteryStats.dumpLocked has the
Scheduler memory allocations in ArenaStack hidden by the
register allocator:
- before:
MEM: used: 8300224, allocated: 9175040, lost: 197360
Scheduler 8300224
- after:
MEM: used: 5914296, allocated: 7864320, lost: 78200
SsaLiveness 5532840
RegAllocator 144968
RegAllocVldt 236488
The total arena memory used, including the ArenaAllocator
not listed above, goes from 44333648 to 41950324 (-5.4%).
(Measured with kArenaAllocatorCountAllocations=true,
kArenaAllocatorPreciseTracking=false.)
Also remove one unnecessary -Wframe-larger-than= workaround
and add one workaround for large frame with the above arena
alloc tracking flags.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 34053922
Change-Id: I7fd8d90dcc13b184b1e5bd0bcac072388710a129
|
|
Rationale:
This refactoring introduces data types to heap locations.
This will allow better type disambiguation in the future.
As a first showcase, it already removes rather error-prone
"exceptional" code in LSE dealing with array types on null
values. Furthermore, many LSA specific details started to "leak"
into clients, which is also error-prone. This refactoring moves
such details back into just LSA, where it belongs.
Test: test-art-host,target
Bug: b/77906240
Change-Id: Id327bbe86dde451a942c9c5f9e83054c36241882
|
|
Rationale:
The change adds a return value to Run() in preparation of
conditional pass execution. The value returned by Run() is
best effort, returning false means no optimizations were
applied or no useful information was obtained. I filled
in a few cases with more exact information, others
still just return true. In addition, it integrates inlining
as a regular pass, avoiding the ugly "break" into
optimizations1 and optimziations2.
Bug: b/78171933, b/74026074
Test: test-art-host,target
Change-Id: Ia39c5c83c01dcd79841e4b623917d61c754cf075
|
|
Rationale:
Having explicit MIN/MAX/ABS operations (in contrast
with intrinsics) simplifies recognition and optimization
of these common operations (e.g. constant folding, hoisting,
detection of saturation arithmetic). Furthermore, mapping
conditionals, selectors, intrinsics, etc. (some still TBD)
onto these operations generalizes the way they are optimized
downstream substantially.
Bug: b/65164101
Test: test-art-host,target
Change-Id: I69240683339356e5a012802f179298f0b04c6726
|
|
NOTE: step 1 of 2 for
"Introduce MIN/MAX/ABS as HIR nodes."
Rationale:
Having explicit MIN/MAX/ABS operations (in contrast
with intrinsics) simplifies recognition and optimization
of these common operations (e.g. constant folding, hoisting,
detection of saturation arithmetic). Furthermore, mapping
conditionals, selectors, intrinsics, etc. (some still TBD)
onto these operations generalizes the way they are optimized
downstream substantially.
Bug: b/65164101
Test: test-art-host,target
Change-Id: I9c93987197216158ba02c8aca2385086adedabc4
|
|
Test: test-art-host
Test: test-art-target
Test: load_store_analysis_test
Change-Id: I7d819061ec9ea12f86a926566c3845231fce6e26
|
|
Adding InstructionSet::kLast shall make it easier to encode
the InstructionSet in fewer bits using BitField<>. However,
introducing `kLast` into the `art` namespace is not a good
idea, so we change the InstructionSet to an enum class.
This also uncovered a case of InstructionSet::kNone being
erroneously used instead of vixl32::Condition::None(), so
it's good to remove `kNone` from the `art` namespace.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I6fa6168dfba4ed6da86d021a69c80224f09997a6
|
|
Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 21.1MiB -> 20.2MiB
BatteryStats.dumpLocked(): 42.0MiB -> 40.3MiB
This is because all the memory previously used by the graph
builder is reused by later passes.
And finish the "arena"->"allocator" renaming; make renamed
allocator pointers that are members of classes const when
appropriate (and make a few more members around them const).
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 64312607
Change-Id: Ia50aafc80c05941ae5b96984ba4f31ed4c78255e
|
|
Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 25.1MiB -> 21.1MiB
BatteryStats.dumpLocked(): 49.6MiB -> 42.0MiB
This is because all the memory previously used by Scheduler
is reused by the register allocator; the register allocator
has a higher peak usage of the ArenaStack.
And continue the "arena"->"allocator" renaming.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 64312607
Change-Id: Idfd79a9901552b5147ec0bf591cb38120de86b01
|
|
Passes using local ArenaAllocator were hiding their memory
usage from the allocation counting, making it difficult to
track down where memory was used. Using ScopedArenaAllocator
reveals the memory usage.
This changes the HGraph constructor which requires a lot of
changes in tests. Refactor these tests to limit the amount
of work needed the next time we change that constructor.
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: Build with kArenaAllocatorCountAllocations = true.
Bug: 64312607
Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
|
|
Replace most uses of the runtime's Primitive in compiler
with a new class DataType. This prepares for introducing
new types, such as Uint8, that the runtime does not need
to know about.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 23964345
Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
|
|
|
|
Unresolved field accesses are not scheduled either since it's not know
whether they are volatile or not, and they are already expensive anyway.
Test: 706-checker-scheduler
Change-Id: Ie736542590a2459ee9b597e090fbedd4b527782a
|
|
HeapLocationCollector does alias analysis globally instead of at block
scale. For example doing it locally breaks the pre-existence based alias
analysis. It's also expensive to do it for each basic block.
Test: run-test/gtest on target/host, 662-regression-alias
Bug: 64018485
Change-Id: If001e2961b5a52b50b1bcefd5e4a89d9c25f25b8
|
|
Based on aliasing information from heap location collector,
instruction scheduling can further eliminate side-effect
dependencies between memory accesses to different locations,
and perform better scheduling on memory loads and stores.
Performance improvements of this CL, measured on Cortex-A53:
| benchmarks | ARM64 backend | ARM backend |
|----------------+---------------|-------------|
| algorithm | 0.1 % | 0.1 % |
| benchmarksgame | 0.5 % | 1.3 % |
| caffeinemark | 0.0 % | 0.0 % |
| math | 5.1 % | 5.0 % |
| stanford | 1.1 % | 0.6 % |
| testsimd | 0.4 % | 0.1 % |
Compilation time impact is negligible, because this
heap location load store analysis is only performed
on loop basic blocks that get instruction scheduled.
Test: m test-art-host
Test: m test-art-target
Test: 706-checker-scheduler
Change-Id: I43d7003c09bfab9d3a1814715df666aea9a7360b
|
|
bug: 62855731
Test: test.py
Change-Id: I7904257174ce11a138ca769172dbc2e33e10ef76
|
|
Make actual types more explicit, either by replacing "auto"
with actual type or by assigning std::pair<> elements of
an "auto" variable to typed variables. Avoid binding const
references to temporaries. Avoid copying a container.
Test: m test-art-host-gtest
Change-Id: I1a59f9ba1ee15950cacfc5853bd010c1726de603
|
|
Performance improvements on various benchmarks with this CL:
benchmarks improvements
---------------------------
algorithm 1%
benchmarksgame 2%
caffeinemark 2%
math 3%
stanford 4%
Tested on ARM Cortex-A53 CPU.
The code size impact is negligible.
Test: m test-art-host
Test: m test-art-target
Change-Id: I314c90c09ce27e3d224fc686ef73c7d94a6b5a2c
|
|
This commit adds a new `HInstructionScheduling` pass that performs
basic scheduling on the `HGraph`.
Currently, scheduling is performed at the block level, so no
`HInstruction` ever leaves its block in this pass.
The scheduling process iterates through blocks in the graph. For
blocks that we can and want to schedule:
1) Build a dependency graph for instructions. It includes data
dependencies (inputs/uses), but also environment dependencies and
side-effect dependencies.
2) Schedule the dependency graph. This is a topological sort of the
dependency graph, using heuristics to decide what node to schedule
first when there are multiple candidates. Currently the heuristics
only consider instruction latencies and schedule first the
instructions that are on the critical path.
Test: m test-art-host
Test: m test-art-target
Change-Id: Iec103177d4f059666d7c9626e5770531fbc5ccdc
|