Age | Commit message (Collapse) | Author |
|
Move `SchedulingLatencyVisitor{ARM,ARM64}` to .cc files.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Change-Id: I15cb1a4cbef00a328fec947189412c502bf80f46
|
|
This reverts commit 0a51605ddd81635135463dab08b6f7c21b58ffb0.
Reason for revert: Reland after some of the required work
was merged in other CLs.
Also address a TODO from the original CL to mark required
symbols with EXPORT in `intrinsic_objects.h`.
Also mark symbols in new files as HIDDEN.
Bug: 186902856
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I936d448983928af23614ca82c2d0bf9a645e2c52
|
|
When scheduling code like the following:
LOOP:
v2=phi(v0, v1)
use(v2)
v1=...
goto LOOP
the instruction scheduler can move 'v1=...' before 'use(v2)'. This
causes live ranges of v1 and v2 to intersect and results to a MOV
instruction to be created.
The CL fixes this.
Improvements, Pixel3:
Little CPU, arm64
micro/GCCLoops
Example12 14.1%
Example10b 11.0%
Example23 8.1%
Example24 6.6%
Example10a 5.0%
FFT workload 4.7%
Compress workload 1.2%
Little CPU, arm32
micro/GCCLoops
Example23 7.5%
Example24 4.3%
MonteCarlo workload 1.35%
Big CPU, arm32 and arm64
No significant improvements
No significant regressions (> 5%) are found.
Test: test.py --host --optimizing --jit --gtest
Test: test.py --target --optimizing --jit
Test: run-gtests.sh
Change-Id: I1e4282af18f2d51fde5325a0c00a57e8bbc4fbed
|
|
This reverts commit e2727154f25e0db9a5bb92af494d8e47b181dfcf.
Reason for revert: Breaks ASAN tests (ODR violation).
Bug: 142365358
Change-Id: I38103d74a1297256c81d90872b6902ff1e9ef7a4
|
|
Make symbols in compiler/optimizing hidden by a namespace
attribute. The unit intrinsic_objects.{h,cc} is excluded as
it is needed by dex2oat.
As the symbols are no longer exported, gtests are now linked
with the static version of the libartd-compiler library.
libart-compiler.so size:
- before:
arm: 2396152
arm64: 3345280
- after:
arm: 2016176 (-371KiB, -15.9%)
arm64: 2874480 (-460KiB, -14.1%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 142365358
Change-Id: I1fb04a33351f53f00b389a1642e81a68e40912a8
|
|
The CL moves functionality from SchedulingGraph to other classes,
deletes unused code and moves code used for testing to the tests source
file:
1. SchedulingGraph::AddDependency: move checks whether a dependency has
been added to SchedulingNode::Add*Predecessor as it is a SchedulingNode
responsibility to keep a unique set of predecessors.
2. Create SideEffectDependencyAnalysis class. Code doing side effect dependency
analysis is moved from SchedulingGraph into the class.
3. Remove SchedulingGraph::HasImmediate*Dependency methods as there are
SchedulingNode::Has*Dependency methods for such kind of checks.
4. SchedulingGraph::HasImmediate*Dependency(HInstruction,HInstruction) methods
are only used by tests. Their code is moved to a new class TestSchedulingGraph in
the tests source file.
Test: test.py --host --optimizing --jit --gtest
Test: test.py --target --optimizing --jit
Test: run-gtests.sh
Change-Id: Id16eb6e9f8b9706e616dff0ccc1d0353ed968367
|
|
Preparation for moving CompilerDriver and other stuff
from libart-compiler.so to dex2oat.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: Ic221ebca4b8c79dd1549316921ace655f2e3f0fe
|
|
Remove all uses of macros 'FINAL' and 'OVERRIDE' and replace them with
'final' and 'override' specifiers. Remove all definitions of these
macros as well, which were located in these files:
- libartbase/base/macros.h
- test/913-heaps/heaps.cc
- test/ti-agent/ti_macros.h
ART is now using C++14; the 'final' and 'override' specifiers have
been introduced in C++11.
Test: mmma art
Change-Id: I256c7758155a71a2940ef2574925a44076feeebf
|
|
This reduces the peak memory used for large methods with
multiple blocks to schedule.
Compiling the aosp_taimen-userdebug boot image, the most
memory hungry method BatteryStats.dumpLocked has the
Scheduler memory allocations in ArenaStack hidden by the
register allocator:
- before:
MEM: used: 8300224, allocated: 9175040, lost: 197360
Scheduler 8300224
- after:
MEM: used: 5914296, allocated: 7864320, lost: 78200
SsaLiveness 5532840
RegAllocator 144968
RegAllocVldt 236488
The total arena memory used, including the ArenaAllocator
not listed above, goes from 44333648 to 41950324 (-5.4%).
(Measured with kArenaAllocatorCountAllocations=true,
kArenaAllocatorPreciseTracking=false.)
Also remove one unnecessary -Wframe-larger-than= workaround
and add one workaround for large frame with the above arena
alloc tracking flags.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 34053922
Change-Id: I7fd8d90dcc13b184b1e5bd0bcac072388710a129
|
|
Change the default parameters for HashSet<std::string> to
allow passing StringPiece as a key, avoiding an unnecessary
allocation. Use the HashSet<std::string> instead of
std::unordered_set<std::string>. Rename HashSet<> functions
that mirror std::unordered_multiset<> to lower-case.
Fix CompilerDriver::LoadImageClasses() to avoid using
invalidated iterator.
Test: m test-art-host-gtest
Test: testrunner.py --host
Change-Id: I7f8b82ee0b07befc5a0ee1c420b08a2068ad931e
|
|
Rationale:
This refactoring introduces data types to heap locations.
This will allow better type disambiguation in the future.
As a first showcase, it already removes rather error-prone
"exceptional" code in LSE dealing with array types on null
values. Furthermore, many LSA specific details started to "leak"
into clients, which is also error-prone. This refactoring moves
such details back into just LSA, where it belongs.
Test: test-art-host,target
Bug: b/77906240
Change-Id: Id327bbe86dde451a942c9c5f9e83054c36241882
|
|
Rationale:
The change adds a return value to Run() in preparation of
conditional pass execution. The value returned by Run() is
best effort, returning false means no optimizations were
applied or no useful information was obtained. I filled
in a few cases with more exact information, others
still just return true. In addition, it integrates inlining
as a regular pass, avoiding the ugly "break" into
optimizations1 and optimziations2.
Bug: b/78171933, b/74026074
Test: test-art-host,target
Change-Id: Ia39c5c83c01dcd79841e4b623917d61c754cf075
|
|
Treat as scheduling barriers those vector instructions whose live
ranges exceed the vectorized loop boundaries. This is a workaround
for the lack of notion of SIMD register in the compiler; around a
call we have to save/restore all live SIMD&FP registers (only
lower 64 bits of SIMD&FP registers are callee saved) so don't
reorder such vector instructions.
Test: 706-checker-scheduler, test-art-host, test-art-target
Bug: 69667779
Change-Id: I31e57518339d41545a0c519f7299afe381a8286c
|
|
Rationale:
Refactors the way we set up optimization passes
in the compiler into a more centralized approach.
The refactoring also found some "holes" in the
existing mechanism (missing string lookup in
the debugging mechanism, or inablity to set
alternative name for optimizations that may repeat).
Bug: 64538565
Test: test-art-host test-art-target
Change-Id: Ie5e0b70f67ac5acc706db91f64612dff0e561f83
|
|
Cleanup errors from upstream cpplint in preparation
for moving art's cpplint fork to upstream tip-of-tree cpplint.
Test: cd art && mm
Bug: 68951293
Change-Id: I15faed4594cbcb8399850f8bdee39d42c0c5b956
|
|
Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 21.1MiB -> 20.2MiB
BatteryStats.dumpLocked(): 42.0MiB -> 40.3MiB
This is because all the memory previously used by the graph
builder is reused by later passes.
And finish the "arena"->"allocator" renaming; make renamed
allocator pointers that are members of classes const when
appropriate (and make a few more members around them const).
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 64312607
Change-Id: Ia50aafc80c05941ae5b96984ba4f31ed4c78255e
|
|
Memory needed to compile the two most expensive methods for
aosp_angler-userdebug boot image:
BatteryStats.dumpCheckinLocked() : 25.1MiB -> 21.1MiB
BatteryStats.dumpLocked(): 49.6MiB -> 42.0MiB
This is because all the memory previously used by Scheduler
is reused by the register allocator; the register allocator
has a higher peak usage of the ArenaStack.
And continue the "arena"->"allocator" renaming.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 64312607
Change-Id: Idfd79a9901552b5147ec0bf591cb38120de86b01
|
|
Passes using local ArenaAllocator were hiding their memory
usage from the allocation counting, making it difficult to
track down where memory was used. Using ScopedArenaAllocator
reveals the memory usage.
This changes the HGraph constructor which requires a lot of
changes in tests. Refactor these tests to limit the amount
of work needed the next time we change that constructor.
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: Build with kArenaAllocatorCountAllocations = true.
Bug: 64312607
Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
|
|
Let clang-format reorder the header includes.
Derived with:
* .clang-format:
BasedOnStyle: Google
IncludeIsMainRegex: '(_test|-inl)?$'
* Steps:
find . -name '*.cc' -o -name '*.h' | xargs sed -i.bak -e 's/^#include/ #include/' ; git commit -a -m 'ART: Include cleanup'
git-clang-format -style=file HEAD^
manual inspection
git commit -a --amend
Test: mmma art
Change-Id: Ia963a8ce3ce5f96b5e78acd587e26908c7a70d02
|
|
Based on aliasing information from heap location collector,
instruction scheduling can further eliminate side-effect
dependencies between memory accesses to different locations,
and perform better scheduling on memory loads and stores.
Performance improvements of this CL, measured on Cortex-A53:
| benchmarks | ARM64 backend | ARM backend |
|----------------+---------------|-------------|
| algorithm | 0.1 % | 0.1 % |
| benchmarksgame | 0.5 % | 1.3 % |
| caffeinemark | 0.0 % | 0.0 % |
| math | 5.1 % | 5.0 % |
| stanford | 1.1 % | 0.6 % |
| testsimd | 0.4 % | 0.1 % |
Compilation time impact is negligible, because this
heap location load store analysis is only performed
on loop basic blocks that get instruction scheduled.
Test: m test-art-host
Test: m test-art-target
Test: 706-checker-scheduler
Change-Id: I43d7003c09bfab9d3a1814715df666aea9a7360b
|
|
Performance improvements on various benchmarks with this CL:
benchmarks improvements
---------------------------
algorithm 1%
benchmarksgame 2%
caffeinemark 2%
math 3%
stanford 4%
Tested on ARM Cortex-A53 CPU.
The code size impact is negligible.
Test: m test-art-host
Test: m test-art-target
Change-Id: I314c90c09ce27e3d224fc686ef73c7d94a6b5a2c
|
|
Add explicit field initialization to default value where necessary.
Also clean up interpreter intrinsics header.
Test: m
Change-Id: I7a850ac30dcccfb523a5569fb8400b9ac892c8e5
|
|
This commit adds a new `HInstructionScheduling` pass that performs
basic scheduling on the `HGraph`.
Currently, scheduling is performed at the block level, so no
`HInstruction` ever leaves its block in this pass.
The scheduling process iterates through blocks in the graph. For
blocks that we can and want to schedule:
1) Build a dependency graph for instructions. It includes data
dependencies (inputs/uses), but also environment dependencies and
side-effect dependencies.
2) Schedule the dependency graph. This is a topological sort of the
dependency graph, using heuristics to decide what node to schedule
first when there are multiple candidates. Currently the heuristics
only consider instruction latencies and schedule first the
instructions that are on the critical path.
Test: m test-art-host
Test: m test-art-target
Change-Id: Iec103177d4f059666d7c9626e5770531fbc5ccdc
|