ARM64: FP16.compare() intrinsic for ARMv8

This CL implements an intrinsic for the FP16.compare() method using
ARMv8.2 FP16 instructions.
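Any intrinsic has to match the ordering that the Java method already
defines. The sketch below shows that ordering in plain Java. It assumes
FP16.compare() follows the same total order as Float.compare(): NaN
equals itself and is greater than +Inf, and -0.0 orders before +0.0. It
is not libcore's code and not the arm64 intrinsic. The class and helper
names (FP16CompareSketch, toFloat) are hypothetical.

```java
// Hypothetical sketch of the ordering FP16.compare() is assumed to follow:
// the same total order as Float.compare() (NaN == NaN, NaN > +Inf, -0.0 < +0.0).
// Not libcore's implementation and not the arm64 intrinsic.
public final class FP16CompareSketch {
    private FP16CompareSketch() {}

    // Widen an IEEE 754 binary16 bit pattern to a float. Every half value,
    // including subnormals, is exactly representable as a float.
    static float toFloat(short h) {
        int bits = h & 0xffff;
        int sign = (bits >>> 15) != 0 ? -1 : 1;
        int exp = (bits >>> 10) & 0x1f;
        int mant = bits & 0x3ff;
        if (exp == 0x1f) {
            // All-ones exponent: infinity if the mantissa is zero, otherwise NaN.
            return mant == 0 ? sign * Float.POSITIVE_INFINITY : Float.NaN;
        }
        if (exp == 0) {
            // Zero or subnormal: mant * 2^-24, keeping the sign of zero.
            return sign * Math.scalb((float) mant, -24);
        }
        // Normal: (1 + mant / 1024) * 2^(exp - 15) == (1024 + mant) * 2^(exp - 25).
        return sign * Math.scalb((float) (0x400 | mant), exp - 25);
    }

    // Delegates to Float.compare(), which collapses all NaNs into one value
    // and orders -0.0f before +0.0f.
    static int compare(short x, short y) {
        return Float.compare(toFloat(x), toFloat(y));
    }

    public static void main(String[] args) {
        short negZero = (short) 0x8000, posZero = 0x0000;
        short one = 0x3c00, posInf = 0x7c00;
        short nanA = 0x7e00, nanB = 0x7c01;
        System.out.println(compare(negZero, posZero)); // -1: -0.0 orders before +0.0
        System.out.println(compare(one, posInf));      // -1: 1.0 < +Inf
        System.out.println(compare(nanA, posInf));     // 1: NaN is greater than +Inf
        System.out.println(compare(nanA, nanB));       // 0: all NaNs compare equal
    }
}
```

The ±0 and NaN cases are the ones a bare FCMP result cannot express on
its own, because an IEEE comparison treats -0.0 and +0.0 as equal and
reports NaN operands as unordered.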

Performance improvements measured with the timeCompareFP16 benchmark
of the FP16Intrinsic microbenchmark on a Pixel 4 (lower is better):
- Java implementation libcore.util.FP16.compare:
    - big cluster only: 742
    - little cluster only: 2286
- arm64 compare() intrinsic implementation:
    - big cluster only: 492 (~34% faster)
    - little cluster only: 1535 (~33% faster)
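The percentages above can be reproduced from the raw scores. This sketch
assumes "faster" means the relative reduction (old - new) / old, which
matches the quoted figures. The class and method names are hypothetical.

```java
// Recomputes the quoted speedups from the raw benchmark scores.
// Assumes lower scores are better and "% faster" == (old - new) / old * 100.
public final class SpeedupCheck {
    private SpeedupCheck() {}

    static double percentFaster(double javaScore, double intrinsicScore) {
        return 100.0 * (javaScore - intrinsicScore) / javaScore;
    }

    public static void main(String[] args) {
        System.out.printf("big cluster: %.1f%%%n", percentFaster(742, 492));     // 33.7%
        System.out.printf("little cluster: %.1f%%%n", percentFaster(2286, 1535)); // 32.9%
    }
}
```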
The benchmark can be found in the following patch:
https://android-review.linaro.org/c/linaro/art-testing/+/21039

Authors: Usama Arif, Edward Pickup, Joel Goddard

Test: 580-checker-fp16
Test: art/test/testrunner/run_build_test_target.py -j80 art-test-javac

Change-Id: Idbe9f56f964f044e6d725bd696459fb04d2ac76c
9 files changed