optimizing: add block-scoped constructor fence merging pass

Introduce a new "Constructor Fence Redundancy Elimination" (CFRE) pass.
The pass currently performs local optimization only, i.e. it considers
only instructions within the same basic block.

All constructor fences preceding a publish (e.g. store, invoke) get
merged into one instruction.
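
As a rough illustration of the merging described above, here is a minimal
sketch on a toy instruction list. The names (Kind, Instr, MergeFencesInBlock)
are hypothetical and are not ART's HInstruction API. The sketch also assumes
that the merged fence is the last one before the publish, and it leaves
fences that are not followed by a publish in the block untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy instruction model (hypothetical; not ART's HInstruction hierarchy).
enum class Kind { kConstructorFence, kPublish, kOther };

struct Instr {
  Kind kind;
  std::vector<std::string> fence_targets;  // Objects guarded; used only by fences.
};

// Merges every run of fences that precedes a publish into the last fence of
// that run (an assumption of this sketch). Returns the number of fences removed.
std::size_t MergeFencesInBlock(std::vector<Instr>& block) {
  std::vector<Instr> out;
  std::vector<std::size_t> pending;  // Indices into `out` of unmerged fences.
  std::size_t removed = 0;
  for (Instr& instr : block) {
    if (instr.kind == Kind::kConstructorFence) {
      pending.push_back(out.size());
    } else if (instr.kind == Kind::kPublish) {
      if (pending.size() > 1) {
        // Collect the targets of all pending fences, in program order.
        std::vector<std::string> merged;
        for (std::size_t i : pending) {
          merged.insert(merged.end(), out[i].fence_targets.begin(),
                        out[i].fence_targets.end());
        }
        const std::size_t extra = pending.size() - 1;
        // Erase all but the last pending fence, back to front so that the
        // remaining indices stay valid.
        for (std::size_t k = extra; k-- > 0;) {
          out.erase(out.begin() + static_cast<std::ptrdiff_t>(pending[k]));
        }
        assert(pending.back() >= extra);
        // The surviving fence shifted left by the number of erased fences.
        out[pending.back() - extra].fence_targets = std::move(merged);
        removed += extra;
      }
      pending.clear();
    }
    out.push_back(std::move(instr));
  }
  block = std::move(out);
  return removed;
}
```

For a block of the form [fence(a), other, fence(b), publish, fence(c)], this
sketch keeps a single fence guarding both a and b ahead of the publish and
does not touch the trailing fence(c).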

==============

OptStat#ConstructorFenceGeneratedNew:   43825
OptStat#ConstructorFenceGeneratedFinal: 17631  <+++
OptStat#ConstructorFenceRemovedLSE:     164
OptStat#ConstructorFenceRemovedPFRA:    9391
OptStat#ConstructorFenceRemovedCFRE:    16133  <---

Removes ~91.5% of the 'final' constructor fences in RitzBenchmark
(16133 of 17631).

(We do not distinguish the exact reason that a fence was created, so
it's possible some "new" fences were also removed.)

==============

Test: art/test/run-test --host --optimizing 476-checker-ctor-fence-redun-elim
Bug: 36656456
Change-Id: I8020217b448ad96ce9b7640aa312ae784690ad99
diff --git a/compiler/optimizing/optimizing_compiler_stats.h b/compiler/optimizing/optimizing_compiler_stats.h
index d6da73c..af7ab2f 100644
--- a/compiler/optimizing/optimizing_compiler_stats.h
+++ b/compiler/optimizing/optimizing_compiler_stats.h
@@ -91,6 +91,7 @@
   kConstructorFenceGeneratedFinal,
   kConstructorFenceRemovedLSE,
   kConstructorFenceRemovedPFRA,
+  kConstructorFenceRemovedCFRE,
   kLastStat
 };
 
@@ -211,6 +212,7 @@
       case kConstructorFenceGeneratedFinal: name = "ConstructorFenceGeneratedFinal"; break;
       case kConstructorFenceRemovedLSE: name = "ConstructorFenceRemovedLSE"; break;
       case kConstructorFenceRemovedPFRA: name = "ConstructorFenceRemovedPFRA"; break;
+      case kConstructorFenceRemovedCFRE: name = "ConstructorFenceRemovedCFRE"; break;
 
       case kLastStat:
         LOG(FATAL) << "invalid stat "