Assembly region TLAB allocation fast path for ARM.

This is for the concurrent copying (CC) collector.

Share the common fast path code with the existing TLAB fast path.

Speedup (on Nexus 5):
        BinaryTrees:  2291 ->  902 ms (-60%)
        MemAllocTest: 2137 -> 1845 ms (-14%)

Bug: 9986565
Bug: 12687968

Change-Id: Ica63094ec2f85eaa4fd04d202a20090399275d85
diff --git a/runtime/asm_support.h b/runtime/asm_support.h
index 879364e..d5f0dff 100644
--- a/runtime/asm_support.h
+++ b/runtime/asm_support.h
@@ -101,6 +101,11 @@
 ADD_TEST_EQ(THREAD_ID_OFFSET,
             art::Thread::ThinLockIdOffset<__SIZEOF_POINTER__>().Int32Value())
 
+// Offset of field Thread::tls32_.is_gc_marking.
+#define THREAD_IS_GC_MARKING_OFFSET 52
+ADD_TEST_EQ(THREAD_IS_GC_MARKING_OFFSET,
+            art::Thread::IsGcMarkingOffset<__SIZEOF_POINTER__>().Int32Value())
+
 // Offset of field Thread::tlsPtr_.card_table.
 #define THREAD_CARD_TABLE_OFFSET 128
 ADD_TEST_EQ(THREAD_CARD_TABLE_OFFSET,