optimizing: add block-scoped constructor fence merging pass

Introduce a new "Constructor Fence Redundancy Elimination" (CFRE) pass.
The pass currently performs local optimization only, i.e. it only considers
instructions within the same basic block.

All constructor fences preceding a publish (e.g. store, invoke) get
merged into one instruction.
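
A minimal sketch of the block-local merging idea, using toy types rather
than the real HInstruction classes. ToyInstruction, Kind, MergeInputs and
MergeFencesInBlock are hypothetical names for illustration only. Two
choices here are assumptions, not confirmed by this message: the last
fence before the publish is kept as the merge target, and the end of the
block is treated like a publish.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-ins for illustration; NOT the real ART instruction classes.
enum class Kind { kConstructorFence, kPublish, kOther };

struct ToyInstruction {
  Kind kind;
  std::vector<int> inputs;  // Ids of the objects a fence protects.
};

// Append the inputs of `other` to `fence`, skipping duplicates so that
// the inputs stay unique after the merge.
void MergeInputs(ToyInstruction& fence, const ToyInstruction& other) {
  assert(&fence != &other);  // A fence must not be merged with itself.
  for (int id : other.inputs) {
    if (std::find(fence.inputs.begin(), fence.inputs.end(), id) ==
        fence.inputs.end()) {
      fence.inputs.push_back(id);
    }
  }
}

// Merge each run of fences that precedes a publish into the last fence
// of that run (assumption). Returns the number of fences removed.
std::size_t MergeFencesInBlock(std::vector<ToyInstruction>& block) {
  std::vector<bool> removed(block.size(), false);
  std::vector<std::size_t> candidates;  // Indices of fences not yet merged.
  std::size_t removed_count = 0;

  auto flush = [&]() {
    if (candidates.size() > 1) {
      ToyInstruction& target = block[candidates.back()];
      for (std::size_t i = 0; i + 1 < candidates.size(); ++i) {
        MergeInputs(target, block[candidates[i]]);
        removed[candidates[i]] = true;
        ++removed_count;
      }
    }
    candidates.clear();
  };

  for (std::size_t i = 0; i < block.size(); ++i) {
    if (block[i].kind == Kind::kConstructorFence) {
      candidates.push_back(i);
    } else if (block[i].kind == Kind::kPublish) {
      flush();
    }
  }
  flush();  // Assumption: the end of the block also ends the run.

  // Rebuild the block without the removed fences.
  std::vector<ToyInstruction> kept;
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (!removed[i]) {
      kept.push_back(std::move(block[i]));
    }
  }
  block = std::move(kept);
  return removed_count;
}
```

In this toy model, three fences followed by a publish collapse into a
single fence that protects the union of their inputs, while a fence after
the publish is left alone.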

==============

OptStat#ConstructorFenceGeneratedNew:   43825
OptStat#ConstructorFenceGeneratedFinal: 17631  <+++
OptStat#ConstructorFenceRemovedLSE:     164
OptStat#ConstructorFenceRemovedPFRA:    9391
OptStat#ConstructorFenceRemovedCFRE:    16133  <---

Removes ~91.5% of the 'final' constructor fences in RitzBenchmark
(16133 removed by CFRE out of 17631 generated).

(We do not distinguish the exact reason that a fence was created, so
it's possible some "new" fences were also removed.)

==============

Test: art/test/run-test --host --optimizing 476-checker-ctor-fence-redun-elim
Bug: 36656456
Change-Id: I8020217b448ad96ce9b7640aa312ae784690ad99
diff --git a/compiler/optimizing/nodes.h b/compiler/optimizing/nodes.h
index 29be8ac..3e4928b 100644
--- a/compiler/optimizing/nodes.h
+++ b/compiler/optimizing/nodes.h
@@ -6634,13 +6634,24 @@
   // Returns how many HConstructorFence instructions were removed from graph.
   static size_t RemoveConstructorFences(HInstruction* instruction);
 
+  // Combine all inputs of `this` and `other` instruction and remove
+  // `other` from the graph.
+  //
+  // Inputs are unique after the merge.
+  //
+  // Requirement: `this` must not be the same as `other`.
+  void Merge(HConstructorFence* other);
+
   // Check if this constructor fence is protecting
   // an HNewInstance or HNewArray that is also the immediate
   // predecessor of `this`.
   //
+  // If `ignore_inputs` is true, then the immediate predecessor doesn't need
+  // to be one of the inputs of `this`.
+  //
   // Returns the associated HNewArray or HNewInstance,
   // or null otherwise.
-  HInstruction* GetAssociatedAllocation();
+  HInstruction* GetAssociatedAllocation(bool ignore_inputs = false);
 
   DECLARE_INSTRUCTION(ConstructorFence);