Reduce and consistently report suspend timeouts

Break suspend waits into a larger number of shorter futex timed waits. This means
that in overload situations, the suspending thread has to wake up
more often before we time out. This should reduce timeouts, but
incurs the risk of turning some of them into ANRs.
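
A minimal sketch of the idea, not the actual ART code: the helper below
(WaitInSlices is a hypothetical name) counts wakeups instead of measuring one
long wall-clock interval, and uses sleep_for as a stand-in for a single short
futex timed wait. If the waiter is scheduled late, each slice stretches, so
the effective time before giving up grows under overload.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: waits until `pending` drops to zero, giving up only
// after `slices` short waits that together nominally add up to `total`.
// Returns true if `pending` reached zero, false if every slice expired.
inline bool WaitInSlices(const std::atomic<int>& pending,
                         std::chrono::milliseconds total,
                         int slices) {
  assert(slices > 0);
  const auto slice = total / slices;
  for (int i = 0; i < slices; ++i) {
    if (pending.load(std::memory_order_acquire) == 0) {
      return true;
    }
    // Stand-in for one short futex timed wait. A late wakeup here delays
    // the next check but does not consume any of the remaining slices.
    std::this_thread::sleep_for(slice);
  }
  // One last look after the final slice.
  return pending.load(std::memory_order_acquire) == 0;
}
```
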
Refactor most of the SuspendThreadByThreadId and SuspendThreadByPeer
code into a single SuspendThread function, which
a) Reports timeouts with consistently useful information
b) Supports suspension attempts with 1/4 the normal timeout.
(I intentionally made the 1/4 a very hard-coded constant.
Making it tunable didn't seem very useful, and I think this makes
the code and comments easier to read.)
This required that we be able to recover from a failed suspend
attempt, by decrementing the suspend count. Added that facility.
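
Roughly, the recovery path can be pictured as below. This is an illustrative
sketch under assumptions: ToyThread, TrySuspend, SuspendTimeout and the 40 ms
kFullTimeout are all made up for the example, and a per-thread std::mutex
stands in for the real suspend count lock.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Toy stand-in for a runtime thread; not ART's Thread class.
struct ToyThread {
  std::mutex lock;            // plays the role of the suspend count lock
  int suspend_count = 0;      // outstanding suspend requests
  bool acknowledged = false;  // set by the target once it has stopped
};

// Hypothetical full timeout; the real value is not stated in this message.
constexpr std::chrono::milliseconds kFullTimeout{40};

enum class SuspendTimeout { kFull, kQuarter };

// Requests suspension and waits for the acknowledgement. On timeout the
// request is withdrawn by decrementing the suspend count again.
inline bool TrySuspend(ToyThread& target, SuspendTimeout which) {
  const auto timeout =
      which == SuspendTimeout::kQuarter ? kFullTimeout / 4 : kFullTimeout;
  {
    std::lock_guard<std::mutex> guard(target.lock);
    ++target.suspend_count;
  }
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (std::chrono::steady_clock::now() < deadline) {
    {
      std::lock_guard<std::mutex> guard(target.lock);
      if (target.acknowledged) return true;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  std::lock_guard<std::mutex> guard(target.lock);
  if (target.acknowledged) return true;  // acknowledged at the last moment
  assert(target.suspend_count > 0);
  --target.suspend_count;  // failed attempt: leave no pending request behind
  return false;
}
```

Because the count is restored on failure, a caller is free to try again later
without the target ending up with a stale suspend request.
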
For consistency and improved debug error checking, added a parameter
to RemoveFirstSuspendBarrier.
Have SuspendAll report more information about the culprit task,
as we already did for SuspendThreadByThreadId.
To support that, also ensure that PassActiveSuspendBarriers
continues to hold suspend_count_lock_ until all barriers are
decremented.
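
For illustration only, the locking shape described here might look like the
following. The names (ToyThread, PassBarriers, g_suspend_count_lock) are
hypothetical stand-ins, and a plain std::mutex replaces suspend_count_lock_;
the point is just that the lock is held across the whole loop rather than
released between barriers.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <vector>

// Stand-in for suspend_count_lock_; an assumption of this sketch.
inline std::mutex g_suspend_count_lock;

// Toy thread holding the barriers it still has to pass.
struct ToyThread {
  std::vector<std::atomic<int>*> active_barriers;
};

// Decrements every active barrier while holding the lock throughout, so
// anyone who takes the lock sees either all of this thread's barriers
// passed or none of them. Returns the number of barriers passed.
inline int PassBarriers(ToyThread& self) {
  std::lock_guard<std::mutex> guard(g_suspend_count_lock);
  int passed = 0;
  for (std::atomic<int>* barrier : self.active_barriers) {
    const int old = barrier->fetch_sub(1, std::memory_order_release);
    assert(old > 0);  // a barrier should never be passed more often than armed
    (void)old;
    ++passed;
  }
  self.active_barriers.clear();
  return passed;
}
```
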
Arrange for the monitor inflation code to use these fractional
timeouts so that we can retry acquiring the lock monitor after
each shorter timeout. This gives us one way to minimize the timeouts.
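
A hedged sketch of such a retry loop follows. The callbacks, the Outcome enum
and the choice of four attempts (so that the quarter timeouts add up to
roughly one normal timeout) are assumptions made for the example, not taken
from the monitor code.

```cpp
#include <cassert>
#include <functional>

enum class Outcome { kLockAcquired, kOwnerSuspended, kTimedOut };

// Alternates between trying the lock and a quarter-timeout suspend attempt.
// `try_lock` returns true if the lock was acquired; `suspend_owner_quarter`
// returns true if the owner was suspended within 1/4 of the normal timeout.
inline Outcome InflateWithRetries(
    const std::function<bool()>& try_lock,
    const std::function<bool()>& suspend_owner_quarter) {
  assert(try_lock && suspend_owner_quarter);
  constexpr int kAttempts = 4;  // assumed: 4 x 1/4 of the normal timeout
  for (int i = 0; i < kAttempts; ++i) {
    // The owner may have released the lock since the previous attempt.
    if (try_lock()) return Outcome::kLockAcquired;
    if (suspend_owner_quarter()) return Outcome::kOwnerSuspended;
  }
  return Outcome::kTimedOut;
}
```

With this shape, a lock that becomes free partway through ends the loop early
and no suspension of the owner is needed at all.
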
Have SuspendThread re-check at the end whether the barrier was
decremented while we were preparing failure diagnostics. If so,
just succeed instead. Empirically, this does arise occasionally.
Change one of the signatures of FromManagedThread to accommodate our
use case in which it is hard to obtain a ScopedObjectAccess.
Report Resume() calls on unsuspended threads a bit more aggressively.
This must never happen unless it is requested through jvmti.
Test: Checked kShortSuspendTimeouts error messages for run tests.
Test: Treehugger
Bug: 297973401
Bug: 321625381
Change-Id: I7de7e8478fca14e20f7b7628b4a0a95df6a402a2