New instruction simplifications. Extra dce pass. Allow more per block repeats.

Rationale:
We were missing some obvious simplifications, which left performance
at the table for e.g. CaffeineLogic compiled with dx (4200us->2700us).
The constant for allowing a repeat on a BB seemed very low, at the
very least it should depend on the BB size.

Test: test-art-host

Change-Id: Ic234566e117593e12c936d556222e4cd4f928105
diff --git a/compiler/optimizing/nodes.h b/compiler/optimizing/nodes.h
index 6a45149..ce2edde 100644
--- a/compiler/optimizing/nodes.h
+++ b/compiler/optimizing/nodes.h
@@ -1855,6 +1855,15 @@
   size_t InputCount() const { return GetInputRecords().size(); }
   HInstruction* InputAt(size_t i) const { return InputRecordAt(i).GetInstruction(); }
 
+  bool HasInput(HInstruction* input) const {
+    for (const HInstruction* i : GetInputs()) {
+      if (i == input) {
+        return true;
+      }
+    }
+    return false;
+  }
+
   void SetRawInputAt(size_t index, HInstruction* input) {
     SetRawInputRecordAt(index, HUserRecord<HInstruction*>(input));
   }