Parallel mark stack processing

Enabled parallel mark stack processing by using a thread pool.
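As a rough illustration of the general technique (not the ART code in this change), the C++ sketch below drains a shared mark stack with several worker threads. The names `Object`, `ParallelMarker`, `TryMark` and `Work` are hypothetical. Each object is marked with an atomic compare-and-swap, so it is pushed at most once. A worker exits only when the stack is empty and no other worker is still scanning, because only a busy worker can push new entries. To keep the sketch short, it starts fresh std::thread workers on every Mark() call instead of reusing a persistent thread pool, and it guards a single shared stack with a mutex rather than giving each thread its own stack.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical heap object: a mark flag plus outgoing references.
struct Object {
  std::atomic<bool> marked{false};
  std::vector<Object*> refs;
};

class ParallelMarker {
 public:
  explicit ParallelMarker(std::size_t num_threads) : num_threads_(num_threads) {}

  // Marks every object reachable from `roots` using num_threads_ workers.
  void Mark(const std::vector<Object*>& roots) {
    for (Object* root : roots) {
      if (root != nullptr && TryMark(root)) stack_.push_back(root);
    }
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < num_threads_; ++i) {
      workers.emplace_back([this] { Work(); });
    }
    for (std::thread& t : workers) t.join();
  }

 private:
  // Atomically sets the mark bit; returns true only for the thread that set it.
  static bool TryMark(Object* obj) {
    bool expected = false;
    return obj->marked.compare_exchange_strong(expected, true);
  }

  void Work() {
    std::unique_lock<std::mutex> lock(mu_);
    for (;;) {
      // Sleep until there is work, or until nobody is busy (no more work can appear).
      cv_.wait(lock, [this] { return !stack_.empty() || busy_ == 0; });
      if (stack_.empty()) {
        cv_.notify_all();  // Wake the other workers so they can exit too.
        return;
      }
      Object* obj = stack_.back();
      stack_.pop_back();
      ++busy_;
      lock.unlock();

      // Scan the object's references without holding the lock.
      std::vector<Object*> found;
      for (Object* child : obj->refs) {
        if (child != nullptr && TryMark(child)) found.push_back(child);
      }

      lock.lock();
      assert(busy_ > 0);
      --busy_;
      stack_.insert(stack_.end(), found.begin(), found.end());
      cv_.notify_all();
    }
  }

  std::size_t num_threads_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::vector<Object*> stack_;  // Shared mark stack, guarded by mu_.
  std::size_t busy_ = 0;        // Workers currently scanning an object.
};
```

A collector would call Mark() with its root set. Reusing pooled threads and giving each worker its own stack (with work stealing) would likely remove much of the thread-creation and locking overhead this sketch pays.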

Optimized object scanning by removing dependent loads from IsClass checks.

Performance:
Prime: ~10% speedup of partial GC.
Nakasi: ~50% speedup of partial GC.

Change-Id: I43256a068efc47cb52d93108458ea18d4e02fccc
diff --git a/src/common_test.h b/src/common_test.h
index 67d2266..f564bbd 100644
--- a/src/common_test.h
+++ b/src/common_test.h
@@ -388,6 +388,10 @@
     compiler_.reset(new Compiler(compiler_backend, instruction_set, true, 2, false, image_classes_.get(),
                                  true, true));
 
+    // Create the heap thread pool so that the GC runs in parallel for tests. Normally, the thread
+    // pool is created by the runtime.
+    runtime_->GetHeap()->CreateThreadPool();
+
     runtime_->GetHeap()->VerifyHeap();  // Check for heap corruption before the test
   }