Quick compiler compile-time/memory use improvement - platform/art

diff options

author	buzbee <buzbee@google.com>	2013-11-08 11:07:02 -0800
committer	buzbee <buzbee@google.com>	2013-11-14 08:01:23 -0800
commit	17189ac098b2f156713db1821b49db7b2f018bbe (patch)
tree	112bbb910fadb310ef98d3b1bb306f9ef876984e /tools/cpplint.py
parent	3b6f0fae76fddf81930a263a075dc87b6039b7fc (diff)

Quick compiler compile-time/memory use improvement

This CL delivers a surprisingly large reduction in compile time, as well as a significant reduction in memory usage by conditionally removing a CFG construction feature introduced to support LLVM bitcode generation. In short, bitcode requires all potential exception edges to be explicitly present in the CFG. The Quick compiler (based on the old JIT), can ignore, at least for the purposes of dataflow analysis, potential throw points that do not have a corresponding catch block. To support LLVM, we create a target basic block for every potentially throwing instruction to give us a destination for the exception edge. Then, following the check elimination pass, we remove blocks whose edges have gone away. However, if we're not using LLVM, we can skip the creation of those things in the first place. The savings are significant. Single-threaded compilation time on the host looks to be reduced by something in the vicinity of 10%. We create roughly 60% fewer basic blocks (and, importantly, the creation of fewer basic block nodes has a multiplying effect on memory use reduction because it results in fewer dataflow bitmaps). Some basic block redution stats: boot: 2325802 before, 844846 after. Phonesky: 485472 before, 156014 after. PlusOne: 806232 before, 243156 after. Thinkfree: 864498 before, 264858 after. Another nice side effect of this change is giving the basic block optimization pass generally larger scope. For arena memusage in the boot class path (compiled on the host): Average Max Before: 50,863 88,017,820 After: 41,964 4,914,208 The huge reduction in max arena memory usage is due to the collapsing of a large initialization method. Specifically, with complete exception edges org.ccil.cowan.tagsoup.Scheme requires 13,802 basic blocks. With exception edges collapsed, it requires 4. This change also caused 2 latent bugs to surface. 1) The dex parsing code did not expect that the target of a switch statement could reside in the middle of the same basic block ended by that same switch statement. 2) The x86 backend introduced a 5-operand LIR instruction for indexed memops. However, there was no corresponding change to the use/def mask creation code. Thus, the 5th operand was never included in the use/def mask. This allowed the instruction scheduling code to incorrectly move a use above a definition. We didn't see this before because the affected x86 instructions were only used for aget/aput, and prior to this CL those Dalvik opcodes caused a basic block break because of the implied exception edge - which prevented the code motion. And finally, also included is some minor tuning of register use weighting. Change-Id: I3f2cab7136dba2bded71e9e33b452b95e8fffc0e

Diffstat (limited to 'tools/cpplint.py')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: