Replace weak-ref access disable checkpoint with STW pause
Disabling weak-ref access via a checkpoint in the ConcurrentCopying
collector can lead to deadlocks. For instance: mutator M1 acquires mutex
W1, participates in the checkpoint, and then blocks in getReferent(),
waiting for the GC thread to finish reference processing. Mutator M2
waits for M1 to release W1, because it must acquire the mutex before it
can participate in the checkpoint. The GC thread, in turn, waits for M2
to finish the checkpoint, closing the cycle.
A STW pause avoids the deadlock: it guarantees that no mutator is
blocked on weak-ref access before the pause begins, so the GC thread can
make progress on reference processing after the pause.
Bug: 195336624
Bug: 195261575
Test: art/test/testrunner/testrunner.py
Change-Id: I03d6bcd4d53f37ec84064edd8292951d30f48eaf
diff --git a/runtime/gc/collector/concurrent_copying.cc b/runtime/gc/collector/concurrent_copying.cc
index a850da7..47ee3b6 100644
--- a/runtime/gc/collector/concurrent_copying.cc
+++ b/runtime/gc/collector/concurrent_copying.cc
@@ -1630,11 +1630,16 @@
// for the last time before transitioning to the shared mark stack mode, which would process new
// refs that may have been concurrently pushed onto the mark stack during the ProcessMarkStack()
// call above. At the same time, disable weak ref accesses using a per-thread flag. It's
- // important to do these together in a single checkpoint so that we can ensure that mutators
- // won't newly gray objects and push new refs onto the mark stack due to weak ref accesses and
+ // important to do these together so that we can ensure that mutators won't
+ // newly gray objects and push new refs onto the mark stack due to weak ref accesses and
// mutators safely transition to the shared mark stack mode (without leaving unprocessed refs on
// the thread-local mark stacks), without a race. This is why we use a thread-local weak ref
// access flag Thread::tls32_.weak_ref_access_enabled_ instead of the global ones.
+ // We must use a stop-the-world pause to disable weak ref access. A checkpoint may lead to a
+ // deadlock if one mutator acquires a low-level mutex and then gets blocked while accessing
+ // a weak-ref (after participating in the checkpoint), and another mutator indefinitely waits
+ // for the mutex before it participates in the checkpoint. Consequently, the gc-thread blocks
+ // forever as the checkpoint never finishes (see runtime/mutator_gc_coord.md).
SwitchToSharedMarkStackMode();
CHECK(!self->GetWeakRefAccessEnabled());
// Now that weak refs accesses are disabled, once we exhaust the shared mark stack again here
@@ -2044,21 +2049,36 @@
void ConcurrentCopying::RevokeThreadLocalMarkStacks(bool disable_weak_ref_access,
Closure* checkpoint_callback) {
Thread* self = Thread::Current();
- RevokeThreadLocalMarkStackCheckpoint check_point(this, disable_weak_ref_access);
+ Locks::mutator_lock_->AssertSharedHeld(self);
ThreadList* thread_list = Runtime::Current()->GetThreadList();
- gc_barrier_->Init(self, 0);
- size_t barrier_count = thread_list->RunCheckpoint(&check_point, checkpoint_callback);
- // If there are no threads to wait which implys that all the checkpoint functions are finished,
- // then no need to release the mutator lock.
- if (barrier_count == 0) {
- return;
+ RevokeThreadLocalMarkStackCheckpoint check_point(this, disable_weak_ref_access);
+ if (disable_weak_ref_access) {
+ // We're the only thread that could possibly ask for exclusive access here.
+ Locks::mutator_lock_->SharedUnlock(self);
+ {
+ ScopedPause pause(this);
+ MutexLock mu(self, *Locks::thread_list_lock_);
+ checkpoint_callback->Run(self);
+ for (Thread* thread : thread_list->GetList()) {
+ check_point.Run(thread);
+ }
+ }
+ Locks::mutator_lock_->SharedLock(self);
+ } else {
+ gc_barrier_->Init(self, 0);
+ size_t barrier_count = thread_list->RunCheckpoint(&check_point, checkpoint_callback);
+ // If there are no threads to wait for, then all the checkpoint functions have
+ // already finished, and there is no need to release the mutator lock.
+ if (barrier_count == 0) {
+ return;
+ }
+ Locks::mutator_lock_->SharedUnlock(self);
+ {
+ ScopedThreadStateChange tsc(self, kWaitingForCheckPointsToRun);
+ gc_barrier_->Increment(self, barrier_count);
+ }
+ Locks::mutator_lock_->SharedLock(self);
}
- Locks::mutator_lock_->SharedUnlock(self);
- {
- ScopedThreadStateChange tsc(self, kWaitingForCheckPointsToRun);
- gc_barrier_->Increment(self, barrier_count);
- }
- Locks::mutator_lock_->SharedLock(self);
}
void ConcurrentCopying::RevokeThreadLocalMarkStack(Thread* thread) {