From 476ed072f0dcd30f83188ccb3200a963c4ae11d7 Mon Sep 17 00:00:00 2001 From: Mitch Phillips Date: Wed, 29 May 2024 13:18:31 +0200 Subject: Workaround bad GC of tombstone watcher. There's some bad behaviour going on that's exclusive to WatchOS (discovered on the Pixel Watch 2). The TombstoneWatcher class is responsible for watching the /data/tombstones/ directory. It is a member of the NativeTombstoneManager class, which is created when ActivityManager starts, and lives until the device is powered off. Unfortunately, some bad behaviour (unclear exactly what at this stage) means that the TombstoneWatcher class is spuriously garbage collected, even though the parent NativeTombstoneMenager class is still alive and reachable. I chatted with hboehm@ from the ART team, and his hypothesis is that because the TombstoneWatcher member is initialized in the constructor and never used again, some optimiser is doing the wrong thing and thinking that it's unused and can be GC'd. The TombstoneWatcher uses inotify in native code to execute a callback when a file is added to a directory, so it's clearly in use, but unfortunately the bug allows this class to be GC'd anyway. What this means, in practice, is that app native crashes on WatchOS are not being handled correctly. This includes: - Apps cannot get their own crashes through the ApplicationExitInfo::getHistoricalProcessExitReasons API. - Crash reports are never sent to Google, with the exception of tombstones that last until the next reboot, where they're one-time scooped up and sent to Google. This has substantially increased latency (requires a reboot) and also probably drops quite a number of reports because there's a limit of tombstones on the device, and a rate limit of how many are sent by GMS. Bug: 339371242 Test: atest CtsGwpAsanTestCases on eos (Pixel Watch 2, WatchOS) Change-Id: I9226e4368b03bd4742fccdafde6018f145da63e6 --- .../java/com/android/server/os/NativeTombstoneManager.java | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/services/core/java/com/android/server/os/NativeTombstoneManager.java b/services/core/java/com/android/server/os/NativeTombstoneManager.java index f6e7ef3d50e9..3bcaf58d0a13 100644 --- a/services/core/java/com/android/server/os/NativeTombstoneManager.java +++ b/services/core/java/com/android/server/os/NativeTombstoneManager.java @@ -56,6 +56,7 @@ import java.io.File; import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.IOException; +import java.lang.ref.Reference; import java.util.ArrayList; import java.util.Collections; import java.util.Optional; @@ -73,6 +74,9 @@ public final class NativeTombstoneManager { private final Context mContext; private final Handler mHandler; + // TODO(b/339371242): The garbage collector is misbehaving, and we must have + // a reference to this member outside the constructor. More details in the + // corresponding comment elsewhere in this class. private final TombstoneWatcher mWatcher; private final ReentrantLock mTmpFileLock = new ReentrantLock(); @@ -139,6 +143,14 @@ public final class NativeTombstoneManager { processName = parsedTombstone.get().getProcessName(); } BootReceiver.addTombstoneToDropBox(mContext, path, isProtoFile, processName, mTmpFileLock); + + // TODO(b/339371242): An optimizer on WearOS is misbehaving and this member is being garbage + // collected as it's never referenced inside this class outside of the constructor. But, + // it's a file watcher, and needs to stay alive to do its job. So, add a cheap check here to + // force the GC to behave itself. From a technical perspective, it's possible that we need + // to add this trick to every single member function, but this seems to work correctly in + // practice and avoids polluting a lot more of this class. + Reference.reachabilityFence(mWatcher); } private Optional handleProtoTombstone(File path, boolean addToList) { -- cgit v1.2.3-59-g8ed1b