New instruction simplifications. Extra dce pass. Allow more per block repeats.

Rationale:
We were missing some obvious simplifications, which left performance
at the table for e.g. CaffeineLogic compiled with dx (4200us->2700us).
The constant for allowing a repeat on a BB seemed very low, at the
very least it should depend on the BB size.

Test: test-art-host

Change-Id: Ic234566e117593e12c936d556222e4cd4f928105
diff --git a/compiler/optimizing/optimizing_compiler.cc b/compiler/optimizing/optimizing_compiler.cc
index 19fd6f9..a484760 100644
--- a/compiler/optimizing/optimizing_compiler.cc
+++ b/compiler/optimizing/optimizing_compiler.cc
@@ -755,6 +755,8 @@
   HDeadCodeElimination* dce1 = new (arena) HDeadCodeElimination(
       graph, stats, "dead_code_elimination$initial");
   HDeadCodeElimination* dce2 = new (arena) HDeadCodeElimination(
+      graph, stats, "dead_code_elimination$after_inlining");
+  HDeadCodeElimination* dce3 = new (arena) HDeadCodeElimination(
       graph, stats, "dead_code_elimination$final");
   HConstantFolding* fold1 = new (arena) HConstantFolding(graph);
   InstructionSimplifier* simplify1 = new (arena) InstructionSimplifier(graph, stats);
@@ -795,6 +797,7 @@
     select_generator,
     fold2,  // TODO: if we don't inline we can also skip fold2.
     simplify2,
+    dce2,
     side_effects,
     gvn,
     licm,
@@ -804,7 +807,7 @@
     fold3,  // evaluates code generated by dynamic bce
     simplify3,
     lse,
-    dce2,
+    dce3,
     // The codegen has a few assumptions that only the instruction simplifier
     // can satisfy. For example, the code generator does not expect to see a
     // HTypeConversion from a type to the same type.