path: root/compiler/optimizing/scheduler.h
Age Commit message Author
2024-02-13Optimizing: Refactor `HScheduler`. Vladimir Marko
Move `SchedulingLatencyVisitor{ARM,ARM64}` to .cc files. Test: m test-art-host-gtest Test: testrunner.py --host --optimizing Test: run-gtests.sh Test: testrunner.py --target --optimizing Change-Id: I15cb1a4cbef00a328fec947189412c502bf80f46
2022-11-07Reland "Make compiler/optimizing/ symbols hidden." Vladimír Marko
This reverts commit 0a51605ddd81635135463dab08b6f7c21b58ffb0. Reason for revert: Reland after some of the required work was merged in other CLs. Also address a TODO from the original CL to mark required symbols with EXPORT in `intrinsic_objects.h`. Also mark symbols in new files as HIDDEN. Bug: 186902856 Test: m test-art-host-gtest Test: testrunner.py --host --optimizing Change-Id: I936d448983928af23614ca82c2d0bf9a645e2c52
2019-10-29Fix intersecting live ranges created by instruction scheduler Evgeny Astigeevich
When scheduling code like the following:
  LOOP:
    v2=phi(v0, v1)
    use(v2)
    v1=...
    goto LOOP
the instruction scheduler can move 'v1=...' before 'use(v2)'. This causes the live ranges of v1 and v2 to intersect and results in a MOV instruction being created. The CL fixes this.
Improvements, Pixel3:
  Little CPU, arm64:
    micro/GCCLoops:
      Example12 14.1%
      Example10b 11.0%
      Example23 8.1%
      Example24 6.6%
      Example10a 5.0%
    FFT workload 4.7%
    Compress workload 1.2%
  Little CPU, arm32:
    micro/GCCLoops:
      Example23 7.5%
      Example24 4.3%
    MonteCarlo workload 1.35%
  Big CPU, arm32 and arm64:
    No significant improvements
No significant regressions (> 5%) are found.
Test: test.py --host --optimizing --jit --gtest
Test: test.py --target --optimizing --jit
Test: run-gtests.sh
Change-Id: I1e4282af18f2d51fde5325a0c00a57e8bbc4fbed
2019-10-14Revert "Make compiler/optimizing/ symbols hidden." Vladimir Marko
This reverts commit e2727154f25e0db9a5bb92af494d8e47b181dfcf. Reason for revert: Breaks ASAN tests (ODR violation). Bug: 142365358 Change-Id: I38103d74a1297256c81d90872b6902ff1e9ef7a4
2019-10-14Make compiler/optimizing/ symbols hidden. Vladimir Marko
Make symbols in compiler/optimizing hidden by a namespace attribute. The unit intrinsic_objects.{h,cc} is excluded as it is needed by dex2oat. As the symbols are no longer exported, gtests are now linked with the static version of the libartd-compiler library.
libart-compiler.so size:
  - before:
      arm: 2396152
      arm64: 3345280
  - after:
      arm: 2016176 (-371KiB, -15.9%)
      arm64: 2874480 (-460KiB, -14.1%)
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing --jit
Bug: 142365358
Change-Id: I1fb04a33351f53f00b389a1642e81a68e40912a8
2019-05-16ART: Refactor SchedulingGraph for consistency and clarity Evgeny Astigeevich
The CL moves functionality from SchedulingGraph to other classes, deletes unused code and moves code used for testing to the tests source file:
1. SchedulingGraph::AddDependency: move checks whether a dependency has been added to SchedulingNode::Add*Predecessor as it is a SchedulingNode responsibility to keep a unique set of predecessors.
2. Create SideEffectDependencyAnalysis class. Code doing side effect dependency analysis is moved from SchedulingGraph into the class.
3. Remove SchedulingGraph::HasImmediate*Dependency methods as there are SchedulingNode::Has*Dependency methods for such kind of checks.
4. SchedulingGraph::HasImmediate*Dependency(HInstruction,HInstruction) methods are only used by tests. Their code is moved to a new class TestSchedulingGraph in the tests source file.
Test: test.py --host --optimizing --jit --gtest
Test: test.py --target --optimizing --jit
Test: run-gtests.sh
Change-Id: Id16eb6e9f8b9706e616dff0ccc1d0353ed968367
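Point 1 above, a node that owns the uniqueness check for its own predecessors, might look roughly like the sketch below. The `Node` class and all of its member names are hypothetical stand-ins chosen for illustration; they are not the actual `SchedulingNode` interface from scheduler.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scheduling node; names are illustrative only.
class Node {
 public:
  // Records `predecessor` as a data dependency.
  // Returns false (and changes nothing) if it was already recorded.
  bool AddDataPredecessor(const Node* predecessor) {
    return AddUnique(&data_predecessors_, predecessor);
  }

  // Same contract for non-data (e.g. side-effect) dependencies.
  bool AddOtherPredecessor(const Node* predecessor) {
    return AddUnique(&other_predecessors_, predecessor);
  }

  bool HasDataDependency(const Node* node) const {
    return Contains(data_predecessors_, node);
  }

  bool HasOtherDependency(const Node* node) const {
    return Contains(other_predecessors_, node);
  }

  std::size_t NumPredecessors() const {
    return data_predecessors_.size() + other_predecessors_.size();
  }

 private:
  static bool Contains(const std::vector<const Node*>& list, const Node* node) {
    return std::find(list.begin(), list.end(), node) != list.end();
  }

  // The uniqueness check lives here, so callers building the graph
  // do not need to test for duplicates themselves.
  static bool AddUnique(std::vector<const Node*>* list, const Node* node) {
    assert(node != nullptr);
    if (Contains(*list, node)) {
      return false;
    }
    list->push_back(node);
    return true;
  }

  std::vector<const Node*> data_predecessors_;
  std::vector<const Node*> other_predecessors_;
};
```

With this shape, a graph builder can call `AddDataPredecessor` unconditionally; a repeated call for the same pair is a no-op that returns false.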
2019-02-20ART: Reduce dependencies on CompilerDriver. Vladimir Marko
Preparation for moving CompilerDriver and other stuff from libart-compiler.so to dex2oat. Test: m test-art-host-gtest Test: testrunner.py --host --optimizing Change-Id: Ic221ebca4b8c79dd1549316921ace655f2e3f0fe
2018-08-28Use 'final' and 'override' specifiers directly in ART. Roland Levillain
Remove all uses of macros 'FINAL' and 'OVERRIDE' and replace them with 'final' and 'override' specifiers. Remove all definitions of these macros as well, which were located in these files:
- libartbase/base/macros.h
- test/913-heaps/heaps.cc
- test/ti-agent/ti_macros.h
ART is now using C++14; the 'final' and 'override' specifiers have been introduced in C++11.
Test: mmma art
Change-Id: I256c7758155a71a2940ef2574925a44076feeebf
2018-08-02Reuse arena memory for each block in scheduler. Vladimir Marko
This reduces the peak memory used for large methods with multiple blocks to schedule. Compiling the aosp_taimen-userdebug boot image, the most memory hungry method BatteryStats.dumpLocked has the Scheduler memory allocations in ArenaStack hidden by the register allocator:
  - before:
      MEM: used: 8300224, allocated: 9175040, lost: 197360
      Scheduler 8300224
  - after:
      MEM: used: 5914296, allocated: 7864320, lost: 78200
      SsaLiveness 5532840
      RegAllocator 144968
      RegAllocVldt 236488
The total arena memory used, including the ArenaAllocator not listed above, goes from 44333648 to 41950324 (-5.4%). (Measured with kArenaAllocatorCountAllocations=true, kArenaAllocatorPreciseTracking=false.)
Also remove one unnecessary -Wframe-larger-than= workaround and add one workaround for large frame with the above arena alloc tracking flags.
Test: m test-art-host-gtest
Test: testrunner.py --host
Bug: 34053922
Change-Id: I7fd8d90dcc13b184b1e5bd0bcac072388710a129
2018-06-21Use HashSet<std::string> instead of unordered_set<>. Vladimir Marko
Change the default parameters for HashSet<std::string> to allow passing StringPiece as a key, avoiding an unnecessary allocation. Use the HashSet<std::string> instead of std::unordered_set<std::string>. Rename HashSet<> functions that mirror std::unordered_multiset<> to lower-case. Fix CompilerDriver::LoadImageClasses() to avoid using invalidated iterator. Test: m test-art-host-gtest Test: testrunner.py --host Change-Id: I7f8b82ee0b07befc5a0ee1c420b08a2068ad931e
2018-05-15Refactoring LSE/LSA: introduce heap location type Aart Bik
Rationale: This refactoring introduces data types to heap locations. This will allow better type disambiguation in the future. As a first showcase, it already removes rather error-prone "exceptional" code in LSE dealing with array types on null values. Furthermore, many LSA specific details started to "leak" into clients, which is also error-prone. This refactoring moves such details back into just LSA, where it belongs. Test: test-art-host,target Bug: b/77906240 Change-Id: Id327bbe86dde451a942c9c5f9e83054c36241882
2018-04-26Step 1 of 2: conditional passes. Aart Bik
Rationale: The change adds a return value to Run() in preparation for conditional pass execution. The value returned by Run() is best effort: returning false means no optimizations were applied or no useful information was obtained. I filled in a few cases with more exact information, others still just return true. In addition, it integrates inlining as a regular pass, avoiding the ugly "break" into optimizations1 and optimizations2. Bug: b/78171933, b/74026074 Test: test-art-host,target Change-Id: Ia39c5c83c01dcd79841e4b623917d61c754cf075
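As a rough illustration of what a boolean result from Run() makes possible, the sketch below runs a list of passes and skips any pass whose gating pass returned false. Everything in it (`PassEntry`, `depends_on`, `RunPasses`, the pass names) is a hypothetical construction for this example; the commit message only says the return value prepares for conditional execution and does not describe how the follow-up change implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical pass descriptor; not an ART type.
struct PassEntry {
  std::string name;
  std::function<bool()> run;  // Best effort: false means "nothing useful happened".
  int depends_on;             // Index of the gating pass, or -1 if unconditional.
};

// Runs passes in order and returns the names of those actually executed.
// A pass is skipped when the pass it depends on returned false or was skipped.
std::vector<std::string> RunPasses(const std::vector<PassEntry>& passes) {
  std::vector<bool> result(passes.size(), false);
  std::vector<std::string> executed;
  for (std::size_t i = 0; i < passes.size(); ++i) {
    const int dep = passes[i].depends_on;
    assert(dep < static_cast<int>(i));  // A pass may only depend on an earlier one.
    if (dep >= 0 && !result[dep]) {
      continue;  // Gating pass did nothing useful; result[i] stays false.
    }
    result[i] = passes[i].run();
    executed.push_back(passes[i].name);
  }
  return executed;
}
```

Because the result is best effort, a pass that cannot tell whether it changed anything would return true, which only costs a possibly unnecessary run of its dependents.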
2017-12-13ARM64: Workaround for the callee saved FP registers and SIMD. Artem Serov
Treat as scheduling barriers those vector instructions whose live ranges exceed the vectorized loop boundaries. This is a workaround for the lack of notion of SIMD register in the compiler; around a call we have to save/restore all live SIMD&FP registers (only lower 64 bits of SIMD&FP registers are callee saved) so don't reorder such vector instructions. Test: 706-checker-scheduler, test-art-host, test-art-target Bug: 69667779 Change-Id: I31e57518339d41545a0c519f7299afe381a8286c
2017-11-20Refactored optimization passes setup. Aart Bik
Rationale: Refactors the way we set up optimization passes in the compiler into a more centralized approach. The refactoring also found some "holes" in the existing mechanism (missing string lookup in the debugging mechanism, or inability to set an alternative name for optimizations that may repeat). Bug: 64538565 Test: test-art-host test-art-target Change-Id: Ie5e0b70f67ac5acc706db91f64612dff0e561f83
2017-11-08cpplint: Cleanup errors Igor Murashkin
Cleanup errors from upstream cpplint in preparation for moving art's cpplint fork to upstream tip-of-tree cpplint. Test: cd art && mm Bug: 68951293 Change-Id: I15faed4594cbcb8399850f8bdee39d42c0c5b956
2017-10-11Use ScopedArenaAllocator for building HGraph. Vladimir Marko
Memory needed to compile the two most expensive methods for aosp_angler-userdebug boot image: BatteryStats.dumpCheckinLocked() : 21.1MiB -> 20.2MiB BatteryStats.dumpLocked(): 42.0MiB -> 40.3MiB This is because all the memory previously used by the graph builder is reused by later passes. And finish the "arena"->"allocator" renaming; make renamed allocator pointers that are members of classes const when appropriate (and make a few more members around them const). Test: m test-art-host-gtest Test: testrunner.py --host Bug: 64312607 Change-Id: Ia50aafc80c05941ae5b96984ba4f31ed4c78255e
2017-10-09Use ScopedArenaAllocator for register allocation. Vladimir Marko
Memory needed to compile the two most expensive methods for aosp_angler-userdebug boot image: BatteryStats.dumpCheckinLocked() : 25.1MiB -> 21.1MiB BatteryStats.dumpLocked(): 49.6MiB -> 42.0MiB This is because all the memory previously used by Scheduler is reused by the register allocator; the register allocator has a higher peak usage of the ArenaStack. And continue the "arena"->"allocator" renaming. Test: m test-art-host-gtest Test: testrunner.py --host Bug: 64312607 Change-Id: Idfd79a9901552b5147ec0bf591cb38120de86b01
2017-10-06ART: Use ScopedArenaAllocator for pass-local data. Vladimir Marko
Passes using local ArenaAllocator were hiding their memory usage from the allocation counting, making it difficult to track down where memory was used. Using ScopedArenaAllocator reveals the memory usage. This changes the HGraph constructor which requires a lot of changes in tests. Refactor these tests to limit the amount of work needed the next time we change that constructor. Test: m test-art-host-gtest Test: testrunner.py --host Test: Build with kArenaAllocatorCountAllocations = true. Bug: 64312607 Change-Id: I34939e4086b500d6e827ff3ef2211d1a421ac91a
2017-07-24ART: Include cleanup Andreas Gampe
Let clang-format reorder the header includes.
Derived with:
* .clang-format:
    BasedOnStyle: Google
    IncludeIsMainRegex: '(_test|-inl)?$'
* Steps:
    find . -name '*.cc' -o -name '*.h' | xargs sed -i.bak -e 's/^#include/ #include/' ; git commit -a -m 'ART: Include cleanup'
    git-clang-format -style=file HEAD^
    manual inspection
    git commit -a --amend
Test: mmma art
Change-Id: Ia963a8ce3ce5f96b5e78acd587e26908c7a70d02
2017-06-30Disambiguate memory accesses in instruction scheduling xueliang.zhong
Based on aliasing information from heap location collector, instruction scheduling can further eliminate side-effect dependencies between memory accesses to different locations, and perform better scheduling on memory loads and stores.
Performance improvements of this CL, measured on Cortex-A53:
| benchmarks     | ARM64 backend | ARM backend |
|----------------+---------------|-------------|
| algorithm      | 0.1 %         | 0.1 %       |
| benchmarksgame | 0.5 %         | 1.3 %       |
| caffeinemark   | 0.0 %         | 0.0 %       |
| math           | 5.1 %         | 5.0 %       |
| stanford       | 1.1 %         | 0.6 %       |
| testsimd       | 0.4 %         | 0.1 %       |
Compilation time impact is negligible, because this heap location load store analysis is only performed on loop basic blocks that get instruction scheduled.
Test: m test-art-host
Test: m test-art-target
Test: 706-checker-scheduler
Change-Id: I43d7003c09bfab9d3a1814715df666aea9a7360b
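The idea of dropping side-effect dependencies between accesses to different locations can be sketched as below. This is a simplified model, not the ART code: `MemoryAccess`, `kUnknownLocation`, `MayAlias` and `NeedsSideEffectDependency` are hypothetical names, and the sketch assumes that two distinct known heap location indices never alias, which is stricter than what a real alias analysis would guarantee.

```cpp
#include <cassert>

enum class AccessKind { kLoad, kStore };

// Hypothetical marker for an access whose heap location could not be determined.
constexpr int kUnknownLocation = -1;

// Hypothetical, simplified description of a memory access.
struct MemoryAccess {
  AccessKind kind;
  int heap_location;  // Index assigned by an alias analysis, or kUnknownLocation.
};

// Assumption for this sketch: distinct known indices never alias,
// and an unknown location may alias anything.
bool MayAlias(const MemoryAccess& a, const MemoryAccess& b) {
  if (a.heap_location == kUnknownLocation || b.heap_location == kUnknownLocation) {
    return true;
  }
  return a.heap_location == b.heap_location;
}

// Returns true if `later` must stay after `earlier` because of memory side effects.
bool NeedsSideEffectDependency(const MemoryAccess& earlier, const MemoryAccess& later) {
  // Two loads never conflict, whatever they read.
  if (earlier.kind == AccessKind::kLoad && later.kind == AccessKind::kLoad) {
    return false;
  }
  // At least one store: order matters only if both may touch the same location.
  return MayAlias(earlier, later);
}
```

Without location information every store/load pair would have to be treated as dependent; with it, a store to one location and a load from another become free to be reordered.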
2017-05-08Instruction scheduling for ARM. xueliang.zhong
Performance improvements on various benchmarks with this CL:
benchmarks       improvements
---------------------------
algorithm        1%
benchmarksgame   2%
caffeinemark     2%
math             3%
stanford         4%
Tested on ARM Cortex-A53 CPU. The code size impact is negligible.
Test: m test-art-host
Test: m test-art-target
Change-Id: I314c90c09ce27e3d224fc686ef73c7d94a6b5a2c
2017-03-27ART: Clean up field initialization Andreas Gampe
Add explicit field initialization to default value where necessary. Also clean up interpreter intrinsics header. Test: m Change-Id: I7a850ac30dcccfb523a5569fb8400b9ac892c8e5
2017-01-25AArch64: Add HInstruction scheduling support. Alexandre Rames
This commit adds a new `HInstructionScheduling` pass that performs basic scheduling on the `HGraph`. Currently, scheduling is performed at the block level, so no `HInstruction` ever leaves its block in this pass. The scheduling process iterates through blocks in the graph. For blocks that we can and want to schedule:
1) Build a dependency graph for instructions. It includes data dependencies (inputs/uses), but also environment dependencies and side-effect dependencies.
2) Schedule the dependency graph. This is a topological sort of the dependency graph, using heuristics to decide what node to schedule first when there are multiple candidates. Currently the heuristics only consider instruction latencies and schedule first the instructions that are on the critical path.
Test: m test-art-host
Test: m test-art-target
Change-Id: Iec103177d4f059666d7c9626e5770531fbc5ccdc
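The two steps described above can be sketched as a small list scheduler: a topological sort that, among ready nodes, picks the one with the longest latency path to the end of the block. This is an illustrative model only. `Node`, `ScheduleBlock`, the latency values and the forward (top-down) direction are assumptions made for the example and are not taken from the pass itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified node; not ART's scheduling node.
struct Node {
  int latency;                            // Assumed cost of the instruction.
  std::vector<std::size_t> predecessors;  // Indices of nodes that must come first.
};

// Returns one dependency-respecting order of node indices for a single block.
// Assumes nodes are listed in original program order, so every predecessor
// index is smaller than the node's own index (the graph is acyclic).
std::vector<std::size_t> ScheduleBlock(const std::vector<Node>& nodes) {
  const std::size_t n = nodes.size();
  std::vector<std::vector<std::size_t>> successors(n);
  std::vector<std::size_t> unscheduled_preds(n, 0);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t p : nodes[i].predecessors) {
      assert(p < i);
      successors[p].push_back(i);
      ++unscheduled_preds[i];
    }
  }

  // Critical path length: own latency plus the longest path through successors.
  // Successors have larger indices, so a reverse walk sees them first.
  std::vector<int> critical_path(n, 0);
  for (std::size_t i = n; i-- > 0;) {
    int longest = 0;
    for (std::size_t s : successors[i]) {
      longest = std::max(longest, critical_path[s]);
    }
    critical_path[i] = nodes[i].latency + longest;
  }

  std::vector<std::size_t> ready;
  for (std::size_t i = 0; i < n; ++i) {
    if (unscheduled_preds[i] == 0) ready.push_back(i);
  }

  std::vector<std::size_t> order;
  while (!ready.empty()) {
    // Heuristic: longest critical path first; ties keep original program order.
    auto best = std::max_element(
        ready.begin(), ready.end(), [&](std::size_t a, std::size_t b) {
          if (critical_path[a] != critical_path[b]) {
            return critical_path[a] < critical_path[b];
          }
          return a > b;
        });
    const std::size_t node = *best;
    ready.erase(best);
    order.push_back(node);
    for (std::size_t s : successors[node]) {
      if (--unscheduled_preds[s] == 0) ready.push_back(s);
    }
  }
  return order;
}
```

In this model a long-latency instruction with dependents is issued as early as its dependencies allow, so shorter independent instructions can fill the time it takes to complete.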