Age | Commit message (Collapse) | Author |
|
We were missing to update some cases when `instruction` wasn't present
in the induction analysis but `fetch` was. We can use `fetch` itself
when querying for the InductionInfo to cover both the previous and
new cases.
Bug: 351868772
Fixes: 351868772
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Test: m test-art-host-gtest
Change-Id: I3c412efdef36d242b0182ed80ce62567c0e3eafc
|
|
We can now detect and remove loops that require an is_taken test e.g.
int a = 0;
for (int i = 0; i < n; i++) {
a += 1;
}
can be turned into `if (n < 0) then 0 else n`.
Part of this logic can be reused in the future to help eliminate
BoundsCheck instructions.
Bug: 304967775
Fixes: 304967775
Test: art/test/testrunner/testrunner.py --host --64 -b --optimizing
Change-Id: I944f3408e623a0652977d4c3f72d29caf9c1f908
|
|
We can move the logic from GenerateLastValueLinear to GenerateCode,
where we can:
* be more granular, and
* reuse that code for other methods.
To do so we add an allow_potential_overflow variable. This is done
because BCE does perform operations that might overflow, but it
uses HDeoptimize instructions to guard against that. Loop
optimization doesn't add HDeoptimize instructions so we must be
more careful.
Bug: 231415860
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Change-Id: I1d689e5ddd87d96707673b6065ab0cc48b288617
|
|
After the old implementation was renamed in
https://android-review.googlesource.com/2526708 ,
we introduce a new function with the old name but new
behavior, just `DCHECK()`-ing the instruction kind before
casting down the pointer. We change appropriate calls from
`As##type##OrNull()` to `As##type()` to avoid unncessary
run-time checks and reduce the size of libart-compiler.so.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 181943478
Change-Id: I025681612a77ca2157fed4886ca47f2053975d4e
|
|
The null type check in the current implementation of
`HInstruction::As##type()` often cannot be optimized away
by clang++. It is therefore beneficial to have two functions
HInstruction::As##type()
HInstruction::As##type##OrNull()
where the first function never returns null but the second
one can return null. The additional text "OrNull" shall also
flag the possibility of yielding null to the developer which
may help avoid bugs similar to what we have seen previously.
This requires renaming the existing function that can return
null and introducing new function that cannot. However,
defining the new function `HInstruction::As##type()` in the
same change as renaming the old one would risk introducing
bugs by missing a rename. Therefore we simply rename the old
function here and the new function shall be introduced in a
separate change with all behavioral changes being explicit.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: buildbot-build.sh --target
Bug: 181943478
Change-Id: I4defd85038e28fe3506903ba3f33f723682b3298
|
|
This reverts commit 0a51605ddd81635135463dab08b6f7c21b58ffb0.
Reason for revert: Reland after some of the required work
was merged in other CLs.
Also address a TODO from the original CL to mark required
symbols with EXPORT in `intrinsic_objects.h`.
Also mark symbols in new files as HIDDEN.
Bug: 186902856
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: I936d448983928af23614ca82c2d0bf9a645e2c52
|
|
In the calcuation of `a * i + b`, `a` itself is calculated by doing:
`(end - start) + (step - 1) / step`
(Note that we add `step - 1` as a way of doing `ceiling`).
This way of calculating `a` can overflow and produce the wrong result
if end and start are in opposite sides of the spectrum.
We can force `a` to be a constant to guarantee that the right result
will be generated when doing loop optimization.
Bug: 231415860
Test: art/test/testrunner/testrunner.py --host --64 --optimizing -b
Change-Id: Ic056441f8d672b3c48cbbd2f3e4ebd7528e2c65b
|
|
Instead of `in_body`, propagate the context block and loop
information to make better decisions for trip count if the
context is outside the loop.
In particular, fix `InductionVarRange::IsConstant()` to take
and use this information instead of assuming that we are
asking about values in the loop body.
For trip count with context outside the loop, we know that
the value shall be the maximum trip count if the context
is dominated by the loop control exit block.
Test: Enable run-test 835-b216762268.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Bug: 216762268
Change-Id: Id564ba75c812d54abdd9b229e643cc8ab4701c52
|
|
This reverts commit e2727154f25e0db9a5bb92af494d8e47b181dfcf.
Reason for revert: Breaks ASAN tests (ODR violation).
Bug: 142365358
Change-Id: I38103d74a1297256c81d90872b6902ff1e9ef7a4
|
|
Make symbols in compiler/optimizing hidden by a namespace
attribute. The unit intrinsic_objects.{h,cc} is excluded as
it is needed by dex2oat.
As the symbols are no longer exported, gtests are now linked
with the static version of the libartd-compiler library.
libart-compiler.so size:
- before:
arm: 2396152
arm64: 3345280
- after:
arm: 2016176 (-371KiB, -15.9%)
arm64: 2874480 (-460KiB, -14.1%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 142365358
Change-Id: I1fb04a33351f53f00b389a1642e81a68e40912a8
|
|
Implement scalar loop unrolling for small loops
(on arm64) with known trip count to reduce loop check
and branch penalty and to provide more opportunities
for instruction scheduling.
Note: this functionality is turned off by default now.
Test: cloner_test.cc
Test: test-art-target, test-art-host
Change-Id: Ic27fd8fb0bc0d7b69251252da37b8b510bc30acc
|
|
Rationale:
Recognize reductions in loops. Note that reductions are *not*
optimized yet (we would proceed with e.g. unrolling and vectorization).
This CL merely sets up the basic detection framework. Also does
a bit of cleanup on loop optimization code.
Bug: 64091002
Test: test-art-host
Change-Id: I0f52bd7ca69936315b03d02e83da743b8ad0ae72
|
|
Rationale:
We missed vectorizing a simple stencil operation
due to inaccurate unit stride analysis and failure
to detect single runtime data dependence test.
Test: test-art-host, test-art-target
Change-Id: I07ba03455bfb1c0aff371c1244a1328f885d0916
|
|
With regression test (found by fuzz testing).
Bug: 37033123
Test: test-art-target
Change-Id: Id738b2a3a353985c3d0bf3beeb581a31f1fcbc3f
|
|
Rationale:
Break-out CL of ART Vectorizer: 2 OF many.
The purpose is making the original CL smaller
and easier to review.
Bug: 34083438
Test: test-art-host
Change-Id: I46d297eba504af3850a5998ee279ea9f7b38bed8
|
|
Rationale:
Avoids the unnecessary loop control overhead, suspend check,
and exposes more opportunities for constant folding in the
resulting loop body. Fully unrolls loop in execute() of
the Dhrystone benchmark (3% to 8% improvements).
Test: test-art-host
Change-Id: If30f38caea9e9f87a929df041dfb7ed1c227aba3
|
|
Rationale:
Information on polynomial sequences is nice to further enhance
BCE and last-value assignment. In this case, this CL enables more
loop optimizations for benchpress' Sum (80 x speedup). Also
changed rem-based geometric induction to wrap-around induction.
Test: test-art-host
Change-Id: Ie4d2659edefb814edda2c971c1f70ba400c31111
|
|
Rationale:
Information on geometric and polynomial (coming soon) sequences
are nice to have to further enhance BCE and last-value assignment.
Test: test-art-host
Change-Id: Ib5e2998c3eb1009def6fd00b82935da7c3ba7c6e
|
|
Rationale:
Rather than half-baked reconstructing cycles during loop optimizations,
this CL passes the SCC computed during induction variable analysis
to the loop optimizer (trading some memory for more optimizations).
This further improves CaffeineLogic from 6000us down to 4200us (dx)
and 2200us to 1690us (jack). Note that this is on top of prior
improvements in previous CLs. Also, some narrowing type concerns
are taken care of during transfer operations.
Test: test-art-host
Change-Id: Ice2764811a70073c5014b3a05fb51f39fd2f4c3c
|
|
Rationale:
This helps to eliminate more dead induction. For example,
CaffeineLogic when compiled with latest Jack improves with
a 1.3 speedup (2900us -> 2200us) due to eliminating first
loop (second loop can be removed also, but for a later
case). The currently benchmarks.dex has a different construct
for the periodics, however, still to be recognized.
Test: test-art-host
Change-Id: Ia81649a207a2b1f03ead0855436862ed4e4f45e0
|
|
Rationale:
This is a commonly used construct (e.g. x = !x for booleans
and x ^= 1 for integers). This CL prepares some upcoming
optimizations that exploit such inductions.
Change-Id: I46edffb9de1075a836995daf5c2dfff7891f3034
Test: 530-checker-loops2 and induction_var_analysis_test
|
|
Rationale:
Empty preheader simplification has been simplified
to a much more general empty block removal optimization
step. Incremental updating of induction variable
analysis enables repeated elimination or simplification
of induction cycles.
This enabled an extra layer of optimization for
e.g. Benchpress Loop (17.5us. -> 0.24us. -> 0.08us).
So the original 73x speedup is now multiplied
by another 3x, for a total of about 218x.
Test: 618-checker-induction et al.
Change-Id: I394699981481cdd5357e0531bce88cd48bd32879
|
|
Also added utility to update fetches in induction nodes.
Rationale:
This is a first step towards the larger CL that introduces
a new loop optimization framework in the optimizing compiler
(see https://android-review.googlesource.com/#/c/271392/3).
Change-Id: Ibecd674c8146d9665340e68718c498555646129a
Tests: induction_var_range_test
|
|
Rationale:
Uses range analysis while determining whether trip-counts
are "safe", which improves analysis of triangular loops.
Also implements more effective triangular loop analysis
by evaluating induction information only once and using
a top level hint (instead of the "iterative refinement"
that was used earlier). Also fixes analysis of triangular
trip counts that may wrap-around. All with tests.
Test: see induction_var_range_test/530-checker-loops*
BUG=27151190
Change-Id: I1877c8ce0c9a52005900eb9dfdbb1918df100278
|
|
Rationale:
Marking more loops as always-taken avoids generating
unnecessary new top tests while marking more loops
are non-infinite enables more optimizations. This
CL helps with these improvements. Also, some more
code is shared between induction and range analysis
and a bug with refining ranges has been fixed.
Bug: 27151190
Change-Id: Iecc0d7f32ae4779ee5424cda9dcc20816220935e
|
|
Rationale:
Avoid testing initial range if nothing is known.
Change-Id: I22646a5fd6e4481245d1a2f57891d2805550489f
|
|
Rationale: this change list improves analysis of triangular loops
both by changing loop order for induction analysis
(enabling range analysis in inner loops) and by
some symbolic improvements during range analysis;
also, a mul/div bug has been fixed (with pass/fail
unit tests); lastly this change list prepares some
follow up optimizations.
Change-Id: I84a03e848405009541c3fa8e3d3c2f430e100087
|
|
Rationale: Using a step-wise approach (rather than expanding all ranges
at once) increases the opportunities for statically removing
bound checks, as demonstrated by the new checker tests.
Change-Id: Icbfd9406523a069e1fb7508546ea94f896e5a255
|
|
Rationale: added ability to generate taken-test, prompt back need
for finite-test; cleaned up the API now that bounds
check needs are all known.
Change-Id: I3d09b249965d1a980c09381240de175ca4b2e455
|
|
Rationale: used by dynamic BCE (done in another CL).
Change-Id: Ia6ce75da57b5298fba74622822ae0bae69c74188
|
|
As shown in the induction analysis presentation, trip-counts need to
deal with potential taken/not-taken situations (so that trip-count
is either valid in the full loop or just in the loop-body proper)
and potential finite/infinite situations (the latter can still be
analyzed but may need to run-time test later to guard against the
infinite conditions). This CL provides that information.
Change-Id: I0445d8e836b80a3614af217ce3e39d766e77b986
|
|
(1) replaced min/max macro as previously required.
(2) removed some redundant code by merging min/max into one.
Change-Id: I610879a06d550346bfac7e6e12ec0299ba226a37
|
|
Rationale:
Using min/max values for "unknowns" is a bit wasteful,
since it eliminates two useful values. Replaced this
with additional boolean to make cases more accurate.
Added few cases to handle examples found in real-life.
Change-Id: I211f8d9a28b1ae79abdb55fb4569716f21d8043b
|
|
Rationale: Finally! After lots of very large CLs, now a small CL
that uses the new induction variable analysis in BCE
(statically, using this dynamically with de-opt is TBD).
Despite its relative small size, be aware though,
since the CL introduces a new phase to the compiler.
Change-Id: If5555a173fd5d55d147c63138ef51fc296fa1414
|
|
Rationale: by computing an upper bound on the trip count of each
loop after induction var analysis has completed, a
simple range analysis yields lower and upper bounds on
all induced expressions in a loop; this analysis
plugs directly into BCE (follow-up CL).
Change-Id: I46a3fe48721ca372547199b39a3498c47992597d
|