Implemented BitCount as an intrinsic, with unit test.
Rationale:
Recognizing this important operation as an intrinsic has
various advantages:
(1) Marking the call as no-side-effects/no-throw allows for
much more GVN/LICM/BCE.
(2) Some architectures, like x86_64, provide direct
support for this operation.
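To illustrate both points, here is a minimal standalone sketch; it is not
ART code. BitCount32, BitCount64 and SumWeighted are hypothetical names used
only for this example. The bit-twiddling fallback shows the work that a
single hardware population-count instruction can replace, and the loop shows
the kind of invariant call that can be hoisted once the operation is known
to have no side effects and never throw.

```cpp
#include <cstdint>

// Portable SWAR population count for a 32-bit value.
// Hypothetical helper for illustration only, not part of ART.
inline int BitCount32(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);                  // counts per 2-bit group
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // counts per 4-bit group
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  // counts per byte
  return static_cast<int>((x * 0x01010101u) >> 24);  // sum of the four bytes
}

// 64-bit variant built from the two 32-bit halves.
inline int BitCount64(uint64_t x) {
  return BitCount32(static_cast<uint32_t>(x)) +
         BitCount32(static_cast<uint32_t>(x >> 32));
}

// Hypothetical evaluation loop. BitCount64(board) does not depend on i,
// so a compiler that knows the call is pure may compute it once before
// the loop (LICM) instead of on every iteration.
inline int SumWeighted(uint64_t board, const int* weights, int n) {
  int total = 0;
  for (int i = 0; i < n; ++i) {
    total += weights[i] * BitCount64(board);
  }
  return total;
}
```

Without the purity information the call would have to be treated as opaque
and re-executed on every iteration.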
Performance improvements on x86_64:
CheckersEvalBench (32-bit bitboard): 27,210KNS -> 36,798KNS = + 35%
ReversiEvalBench (64-bit bitboard): 52,562KNS -> 89,086KNS = + 69%
Change-Id: I65d549b0469b7909b12c6611cdc34a8640a5751f
diff --git a/compiler/optimizing/intrinsics.cc b/compiler/optimizing/intrinsics.cc
index c6da9a3..5caf077 100644
--- a/compiler/optimizing/intrinsics.cc
+++ b/compiler/optimizing/intrinsics.cc
@@ -176,6 +176,16 @@
}
// Misc data processing.
+ case kIntrinsicBitCount:
+ switch (GetType(method.d.data, true)) {
+ case Primitive::kPrimInt:
+ return Intrinsics::kIntegerBitCount;
+ case Primitive::kPrimLong:
+ return Intrinsics::kLongBitCount;
+ default:
+ LOG(FATAL) << "Unknown/unsupported op size " << method.d.data;
+ UNREACHABLE();
+ }
case kIntrinsicNumberOfLeadingZeros:
switch (GetType(method.d.data, true)) {
case Primitive::kPrimInt: