ART: ARM64: Support DotProd SIMD idiom.

Implement support for vectorization idiom which performs dot
product of two vectors and adds the result to wider precision
components in the accumulator.

viz. DOT_PRODUCT([ a1, .. , am], [ x1, .. , xn ], [ y1, .. , yn ]) =
                 [ a1 + sum(xi * yi), .. , am + sum(xj * yj) ],
     for m <= n, non-overlapping sums,
     for either both signed or both unsigned operands x, y.

The patch shows up to 7x performance improvement on a micro
benchmark on Cortex-A57.

Test: 684-checker-simd-dotprod.
Test: test-art-host, test-art-target.

Change-Id: Ibab0d51f537fdecd1d84033197be3ebf5ec4e455
diff --git a/compiler/optimizing/loop_optimization.h b/compiler/optimizing/loop_optimization.h
index 2b202fd..1a842c4 100644
--- a/compiler/optimizing/loop_optimization.h
+++ b/compiler/optimizing/loop_optimization.h
@@ -82,6 +82,7 @@
     kNoReduction     = 1 << 9,   // no reduction
     kNoSAD           = 1 << 10,  // no sum of absolute differences (SAD)
     kNoWideSAD       = 1 << 11,  // no sum of absolute differences (SAD) with operand widening
+    kNoDotProd       = 1 << 12,  // no dot product
   };
 
   /*
@@ -217,6 +218,11 @@
                          bool generate_code,
                          DataType::Type type,
                          uint64_t restrictions);
+  bool VectorizeDotProdIdiom(LoopNode* node,
+                             HInstruction* instruction,
+                             bool generate_code,
+                             DataType::Type type,
+                             uint64_t restrictions);
 
   // Vectorization heuristics.
   Alignment ComputeAlignment(HInstruction* offset,