Implement BitCount as an intrinsic, with unit test.
Rationale:
Recognizing this important operation as an intrinsic has
various advantages:
(1) Having the no-side-effects/no-throw properties enables
much more GVN/LICM/BCE (global value numbering,
loop-invariant code motion, bounds-check elimination).
(2) Some architectures, like x86_64, provide direct
support for this operation.
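On x86_64 that direct support is the POPCNT instruction. The `popcntl`/`popcntq` entry points declared in the assembler below map to `POPCNT r32, r/m32` (`F3 0F B8 /r`) and `POPCNT r64, r/m64` (`F3 REX.W 0F B8 /r`). The sketch covers only the register-to-register form. `EncodePopcntRegReg` is a hypothetical helper that illustrates the byte layout. It is not the assembler's actual emitter, and it does not handle the `Address` (memory operand) overloads.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the real assembler): encodes POPCNT dst, src
// for register operands. Register numbers 0-15 use the x86-64 numbering
// (0 = RAX/EAX ... 15 = R15/R15D).
//   wide == false -> POPCNT r32, r/m32   (F3 0F B8 /r)
//   wide == true  -> POPCNT r64, r/m64   (F3 REX.W 0F B8 /r)
std::vector<uint8_t> EncodePopcntRegReg(int dst, int src, bool wide) {
  assert(dst >= 0 && dst < 16 && src >= 0 && src < 16);
  std::vector<uint8_t> bytes;
  bytes.push_back(0xF3);  // Mandatory prefix; it must come before any REX byte.
  uint8_t rex = 0x40;                  // REX base pattern 0100WRXB.
  if (wide) rex |= 0x08;               // REX.W: 64-bit operand size.
  if (dst & 8) rex |= 0x04;            // REX.R: extends ModRM.reg (dst).
  if (src & 8) rex |= 0x01;            // REX.B: extends ModRM.rm (src).
  if (rex != 0x40) bytes.push_back(rex);  // Omit REX when no bit is set.
  bytes.push_back(0x0F);               // Two-byte opcode: 0F B8.
  bytes.push_back(0xB8);
  // ModRM: mod = 11 (register direct), reg = dst, rm = src.
  bytes.push_back(static_cast<uint8_t>(0xC0 | ((dst & 7) << 3) | (src & 7)));
  return bytes;
}
```

For example, `EncodePopcntRegReg(0, 1, true)` yields `F3 48 0F B8 C1` (`popcnt rax, rcx`). With extended registers such as R8/R9, the REX.R and REX.B bits are also set.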
Performance improvements on x86_64:
CheckersEvalBench (32-bit bitboard): 27,210 KNS -> 36,798 KNS = +35%
ReversiEvalBench (64-bit bitboard): 52,562 KNS -> 89,086 KNS = +69%
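Each percentage is the ratio of new to old throughput, minus one:

```latex
\text{Checkers: } \frac{36{,}798}{27{,}210} \approx 1.352 \;\Rightarrow\; +35\%
\qquad
\text{Reversi: } \frac{89{,}086}{52{,}562} \approx 1.695 \;\Rightarrow\; +69\%
```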
Change-Id: I65d549b0469b7909b12c6611cdc34a8640a5751f
diff --git a/compiler/utils/x86_64/assembler_x86_64.h b/compiler/utils/x86_64/assembler_x86_64.h
index 01d28e3..6f0847e 100644
--- a/compiler/utils/x86_64/assembler_x86_64.h
+++ b/compiler/utils/x86_64/assembler_x86_64.h
@@ -647,6 +647,11 @@
void bsrq(CpuRegister dst, CpuRegister src);
void bsrq(CpuRegister dst, const Address& src);
+ void popcntl(CpuRegister dst, CpuRegister src);
+ void popcntl(CpuRegister dst, const Address& src);
+ void popcntq(CpuRegister dst, CpuRegister src);
+ void popcntq(CpuRegister dst, const Address& src);
+
void rorl(CpuRegister reg, const Immediate& imm);
void rorl(CpuRegister operand, CpuRegister shifter);
void roll(CpuRegister reg, const Immediate& imm);