New instruction simplifications. Extra dce pass. Allow more per block repeats.
Rationale:
We were missing some obvious simplifications, which left performance
at the table for e.g. CaffeineLogic compiled with dx (4200us->2700us).
The constant for allowing a repeat on a BB seemed very low, at the
very least it should depend on the BB size.
Test: test-art-host
Change-Id: Ic234566e117593e12c936d556222e4cd4f928105
diff --git a/compiler/optimizing/optimizing_compiler.cc b/compiler/optimizing/optimizing_compiler.cc
index 19fd6f9..a484760 100644
--- a/compiler/optimizing/optimizing_compiler.cc
+++ b/compiler/optimizing/optimizing_compiler.cc
@@ -755,6 +755,8 @@
HDeadCodeElimination* dce1 = new (arena) HDeadCodeElimination(
graph, stats, "dead_code_elimination$initial");
HDeadCodeElimination* dce2 = new (arena) HDeadCodeElimination(
+ graph, stats, "dead_code_elimination$after_inlining");
+ HDeadCodeElimination* dce3 = new (arena) HDeadCodeElimination(
graph, stats, "dead_code_elimination$final");
HConstantFolding* fold1 = new (arena) HConstantFolding(graph);
InstructionSimplifier* simplify1 = new (arena) InstructionSimplifier(graph, stats);
@@ -795,6 +797,7 @@
select_generator,
fold2, // TODO: if we don't inline we can also skip fold2.
simplify2,
+ dce2,
side_effects,
gvn,
licm,
@@ -804,7 +807,7 @@
fold3, // evaluates code generated by dynamic bce
simplify3,
lse,
- dce2,
+ dce3,
// The codegen has a few assumptions that only the instruction simplifier
// can satisfy. For example, the code generator does not expect to see a
// HTypeConversion from a type to the same type.