| Age | Commit message (Collapse) | Author |
|
Support all condition types inside the condition when performing
diamond loop auto-vectorization. This allows diamond loop
auto-vectorization to be performed on a greater variety of loops.
To support this change, new vector condition nodes are added to
mirror the scalar condition nodes.
Also add a new gtest class to test whether predicated vectorization
can be performed on different combinations of condition types and
data types.
Authors: Chris Jones <christopher.jones@arm.com>,
Konstantin Baladurin <konstantin.baladurin@arm.com>
Test: export ART_FORCE_TRY_PREDICATED_SIMD=true && \
art/test.py --target --optimizing
Test: art/test.py --target --host --optimizing
Test: 661-checker-simd-cf-loops
Test: art/test.py --gtest art_compiler_tests
Change-Id: Ic9c925f1a58ada13d9031de3b445dcd4f77764b7
|
|
Make `OptimizingUnitTestHelper::Make*()` functions add the
the new instruction to the block. If the block already ends
with a control flow instruction, the new instruction is
inserted before the control flow instruction (some tests
create the control flow before adding instruction). Add new
helper functions for additional instruction types, rename
and clean up existing helpers.
Test: m test-art-host-gtest
Change-Id: I0bb88bc4d2ff6ce98ddbec25990a1ae68f582042
|
|
Keep tests disabled for riscv64 if they are disabled for arm
and arm64.
Fix `GetInstructionSetFromELF()` to recognize riscv64.
This partially reverts commit
476491deb9f9abb04f25888a20280d950714048b .
Test: run-gtests.sh
Bug: 271573990
Change-Id: Iccf1e2b1e93ed09eaf27884c34696f42fd752ec4
|
|
Bug: b/271573990
Test: gtests on host:
lunch aosp_riscv64-userdebug && m test-art-host-gtest
Test: gtests on target (on a Linux RISC-V VM):
lunch aosp_riscv64-userdebug
export ART_TEST_SSH_USER=ubuntu
export ART_TEST_SSH_HOST=localhost
export ART_TEST_SSH_PORT=10001
export ART_TEST_ON_VM=true
. art/tools/buildbot-utils.sh
art/tools/buildbot-build.sh --target
# Create, boot and configure the VM.
art/tools/buildbot-vm.sh create
art/tools/buildbot-vm.sh boot
art/tools/buildbot-vm.sh setup-ssh # password: 'ubuntu'
art/tools/buildbot-cleanup-device.sh
art/tools/buildbot-setup-device.sh
art/tools/buildbot-sync.sh
art/tools/run-gtests.sh
Change-Id: I278e3453406a91a5e9d03645cafb9a9d1f82d896
|
|
After the old implementation was renamed in
https://android-review.googlesource.com/2526708 ,
we introduce a new function with the old name but new
behavior, just `DCHECK()`-ing the instruction kind before
casting down the pointer. We change appropriate calls from
`As##type##OrNull()` to `As##type()` to avoid unncessary
run-time checks and reduce the size of libart-compiler.so.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 181943478
Change-Id: I025681612a77ca2157fed4886ca47f2053975d4e
|
|
The null type check in the current implementation of
`HInstruction::As##type()` often cannot be optimized away
by clang++. It is therefore beneficial to have two functions
HInstruction::As##type()
HInstruction::As##type##OrNull()
where the first function never returns null but the second
one can return null. The additional text "OrNull" shall also
flag the possibility of yielding null to the developer which
may help avoid bugs similar to what we have seen previously.
This requires renaming the existing function that can return
null and introducing new function that cannot. However,
defining the new function `HInstruction::As##type()` in the
same change as renaming the old one would risk introducing
bugs by missing a rename. Therefore we simply rename the old
function here and the new function shall be introduced in a
separate change with all behavioral changes being explicit.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: buildbot-build.sh --target
Bug: 181943478
Change-Id: I4defd85038e28fe3506903ba3f33f723682b3298
|
|
Check that the flags are up to date in graph checker. Mainly a
correctness check CL but it brings slight code size reduction
(e.g. not needing vreg info if HasMonitorOperations is false).
Update loop_optimization_test to stop using `LocalRun` directly
as it meant that it was breaking assumptions (i.e. top_loop_ was
nullptr when it was expected to have a value).
Bug: 264278131
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Change-Id: I29765b3be46d4bd7c91ea9c80f7565a3c88fae2e
|
|
This reverts commit 0a51605ddd81635135463dab08b6f7c21b58ffb0.
Reason for revert: Reland after some of the required work
was merged in other CLs.
Also address a TODO from the original CL to mark required
symbols with EXPORT in `intrinsic_objects.h`.
Also mark symbols in new files as HIDDEN.
Bug: 186902856
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I936d448983928af23614ca82c2d0bf9a645e2c52
|
|
The only Optimizing test that actually needs a Runtime is
the ReferenceTypePropagationTest, so we make it subclass
CommonCompilerTest explicitly and change OptimizingUnitTest
to subclass CommonArtTest for the other tests.
On host, each test that initializes the Runtime takes ~220ms
more than without initializing the Runtime. For example, the
ConstantFoldingTest that has 10 individual tests previously
took over 2.2s to run but without the Runtime initialization
it takes around 3-5ms. On target, running 32-bit gtests on
taimen with run-gtests.sh (single-threaded) goes from
~28m47s to ~26m13s, a reduction of ~9%.
Test: m test-art-host-gtest
Test: run-gtests.sh
Change-Id: I43e50ed58e52cc0ad04cdb4d39801bfbae840a3d
|
|
ART vectorizer assumes that there is single size of SIMD
register used for the whole program. Make this assumption explicit
and refactor the code.
Note: This is a base for the future introduction of SIMD slots of
size other than 8 or 16 bytes.
Test: test-art-target, test-art-host.
Change-Id: Id699d5e3590ca8c655ecd9f9ed4e63f49e3c4f9c
|
|
This reverts commit e2727154f25e0db9a5bb92af494d8e47b181dfcf.
Reason for revert: Breaks ASAN tests (ODR violation).
Bug: 142365358
Change-Id: I38103d74a1297256c81d90872b6902ff1e9ef7a4
|
|
Make symbols in compiler/optimizing hidden by a namespace
attribute. The unit intrinsic_objects.{h,cc} is excluded as
it is needed by dex2oat.
As the symbols are no longer exported, gtests are now linked
with the static version of the libartd-compiler library.
libart-compiler.so size:
- before:
arm: 2396152
arm64: 3345280
- after:
arm: 2016176 (-371KiB, -15.9%)
arm64: 2874480 (-460KiB, -14.1%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 142365358
Change-Id: I1fb04a33351f53f00b389a1642e81a68e40912a8
|
|
Handles compiler.
Bug: 116054210
Test: WITH_TIDY=1 mmma art
Change-Id: I5cdfe73c31ac39144838a2736146b71de037425e
|
|
Removes CompilerDriver dependency from ImageWriter and
several other classes.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: Pixel 2 XL boots.
Test: m test-art-target-gtest
Test: testrunner.py --target --optimizing
Change-Id: I3c5b8ff73732128b9c4fad9405231a216ea72465
|
|
Implement loop peeling/unrolling routines using SuperblockCloner.
Fixes bug b/74198030 and provides tests for it.
Bug: b/74198030
Test: superblock_cloner_test.cc, loop_optimization_test.cc.
Change-Id: Id0d9a91575c88f8e45186441b14e903b89b007dd
|
|
Original implementation of "Make sure the loop has only one
pre-header" had an assumption that the header had no phi
functions since loops with multiple preheaders now only may exist
during graph building before ssa construction; all of the
optimizations preserve the single-preheader invariant. This code is
used by DCE; DCE was called multiple times but after graph building
preheader transformation was never executed. However if someone
introduces a optimization which might not keep the invariant
(e.g. loop peeling) the data flow adjustments must be performed.
Test: loop_optimization_test.cc
Test: test-art-target, test-art-host
Change-Id: I88bb0aad2dd5241addef7fe9cda474a6868bf532
|
|
Passes using local ArenaAllocator were hiding their memory
usage from the allocation counting, making it difficult to
track down where memory was used. Using ScopedArenaAllocator
reveals the memory usage.
This changes the HGraph constructor which requires a lot of
changes in tests. Refactor these tests to limit the amount
of work needed the next time we change that constructor.
Test: m test-art-host-gtest
Test: testrunner.py --host
Test: Build with kArenaAllocatorCountAllocations = true.
Bug: 64312607
Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
|
|
Replace most uses of the runtime's Primitive in compiler
with a new class DataType. This prepares for introducing
new types, such as Uint8, that the runtime does not need
to know about.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 23964345
Change-Id: Iec2ad82454eec678fffcd8279a9746b90feb9b0c
|
|
Test: market scan.
Change-Id: I58b23b8d254883f30619ea3602d34bf93618d432
|
|
Fix the issue when after loop header's predecessors reordering in
SimplifyLoops phi inputs are not reordered correspondingly.
Test: loop_optimization_test.cc, test-art-host, test-art-target.
Change-Id: I8a251a0a953d751f9bb67da58181e47d225d90e6
|
|
Rationale:
Break-out CL of ART Vectorizer: number 3.
The purpose is making the original CL smaller
and easier to review.
Bug: 34083438
Test: test-art-host
Change-Id: I7cece807ee4f5fcaeae41f1deed33ac263447b77
|
|
Add abstraction for uint16_t type index.
Test: m test-art-host
Change-Id: I47708741c7c579cbbe59ab723c1e31c5fe71f83a
|
|
Rationale:
Ownership of graph's linear order and iterators was
a bit unclear now that other phases are using it.
New approach allows phases to compute their own
order, while ssa_liveness is sole owner for graph
(since it is not mutated afterwards).
Also shortens lifetime of loop's arena.
Test: test-art-host
Change-Id: Ib7137d1203a1e0a12db49868f4117d48a4277f30
|
|
loop_optimization_test uses memory from HLoopOptimization's
allocator, which is scoped by the Run method.
Fix is to pass custom allocator.
test: m test-art-host-gtest
Change-Id: I359330e22202519f400a26da5403eeb00f0b2db4
|
|
Rationale:
We are planning to add more and more loop related optimizations
and this framework provides the basis to do so. For starters,
the framework optimizes dead induction, induction that can be
replaced with a simpler closed-form, and eliminates dead loops
completely (either pre-existing or as a result of induction
removal).
Speedup on e.g. Benchpress Loop is 73x (17.5us. -> 0.24us.)
[with the potential for more exploiting outer loop too]
Test: 618-checker-induction et al.
Change-Id: If80a809acf943539bf6726b0030dcabd50c9babc
|