mm, oom_reaper: hide oom reaped tasks from OOM killer more carefully

Commit 36324a990cf5 ("oom: clear TIF_MEMDIE after oom_reaper managed to
unmap the address space") not only clears TIF_MEMDIE for the oom reaped
task but also sets OOM_SCORE_ADJ_MIN for the target task to hide it from
the oom killer. This works in simple cases, but it is not sufficient for
the (unlikely) cases where the mm is shared between independent processes,
because they do not share the signal struct. If the mm had only a small
amount of memory which could be reaped, another task sharing the mm could
be selected, and killing it would not help to get out of the oom
situation.

Introduce the MMF_OOM_REAPED mm flag, which is checked in oom_badness()
(same as OOM_SCORE_ADJ_MIN); a task is skipped if the flag is set. Set
the flag after __oom_reap_task() is done with a task. This forces
select_bad_process() to ignore all already oom reaped tasks, and it also
ensures that no such task is sacrificed for its parent.

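The difference between hiding a task through its signal struct and hiding
it through its mm can be illustrated with a small userspace model. This is
only a sketch and not kernel code: every model_* name, the MODEL_ bit
value and the "score equals rss pages" heuristic are assumptions made for
the illustration, and the real oom_badness() computes its score
differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OOM_SCORE_ADJ_MIN (-1000)
/* Assumption: the bit position is arbitrary in this model. */
#define MODEL_MMF_OOM_REAPED (1UL << 0)

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct model_mm { unsigned long flags; long rss_pages; };
struct model_signal { long oom_score_adj; };
struct model_task { struct model_signal *signal; struct model_mm *mm; };

/* Simplified stand-in for oom_badness(): 0 means "never select". */
static long model_badness(const struct model_task *p, bool check_mm_flag)
{
	if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
		return 0;
	if (check_mm_flag && (p->mm->flags & MODEL_MMF_OOM_REAPED))
		return 0;
	return p->mm->rss_pages;
}

/* Simplified stand-in for select_bad_process(): highest score wins. */
static int model_select(const struct model_task *tasks, size_t n,
			bool check_mm_flag)
{
	int victim = -1;
	long best = 0;

	assert(tasks != NULL);
	for (size_t i = 0; i < n; i++) {
		long points = model_badness(&tasks[i], check_mm_flag);

		if (points > best) {
			best = points;
			victim = (int)i;
		}
	}
	return victim;
}

/*
 * Tasks 0 and 1 share one mm but have separate signal structs; task 2
 * has its own, smaller mm. Task 0 has already been reaped, but only a
 * little memory could be freed, so the shared mm is still large.
 */
static int model_shared_mm_demo(bool use_mm_flag)
{
	struct model_mm shared = { .flags = 0, .rss_pages = 5000 };
	struct model_mm other = { .flags = 0, .rss_pages = 100 };
	struct model_signal sig[3] = { { 0 }, { 0 }, { 0 } };
	struct model_task tasks[3] = {
		{ &sig[0], &shared }, { &sig[1], &shared }, { &sig[2], &other },
	};

	if (use_mm_flag)
		shared.flags |= MODEL_MMF_OOM_REAPED;     /* mark the mm */
	else
		sig[0].oom_score_adj = OOM_SCORE_ADJ_MIN; /* mark one signal */

	return model_select(tasks, 3, use_mm_flag);
}
```

In this model, model_shared_mm_demo(false) selects task 1, which shares
the already reaped mm, while model_shared_mm_demo(true) skips both sharers
and selects task 2. The functions have no main(); they are meant to be
called from a small test harness.
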
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 415f7eb..c0376ef 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -174,8 +174,13 @@
 	if (!p)
 		return 0;
 
+	/*
+	 * Do not even consider tasks which are explicitly marked oom
+	 * unkillable or have been already oom reaped.
+	 */
 	adj = (long)p->signal->oom_score_adj;
-	if (adj == OOM_SCORE_ADJ_MIN) {
+	if (adj == OOM_SCORE_ADJ_MIN ||
+			test_bit(MMF_OOM_REAPED, &p->mm->flags)) {
 		task_unlock(p);
 		return 0;
 	}
@@ -513,7 +518,7 @@
 	 * This task can be safely ignored because we cannot do much more
 	 * to release its memory.
 	 */
-	tsk->signal->oom_score_adj = OOM_SCORE_ADJ_MIN;
+	set_bit(MMF_OOM_REAPED, &mm->flags);
 out:
 	mmput(mm);
 	return ret;