CHROMIUM: DROP: mm/oom_kill: Avoid deadlock; allow multiple victims

DROP THIS PATCH ON REBASE. It is part of a patch series that attempts to solve similar problems to the OOM reaper upstream. NOTE that the OOM reaper patches weren't backported because mm patches in general are fairly intertwined and picking the OOM reaper without an mm expert and lots of careful testing could cause new mysterious problems. Compared to the OOM reaper, this patch series: + Is simpler to reason about. + Is not very intertwined to the rest of the system. + Can be fairly easily reasoned about to see that it should do no harm, even if it doesn't fix every problem. - Can more easily get into the state where all memory reserves are gone, since it allows more than one task to be in "MEMDIE". - Doesn't free memory as quickly. The reaper kills anonymous / swapped memory even before the doomed task exits. - Contains the magic "100 ms" heuristic, which is non-ideal.
author: Douglas Anderson <dianders@chromium.org> 2017-03-30 16:43:24 -0700
committer: Moyster <oysterized@gmail.com> 2018-12-01 22:08:45 +0100
commit: 02cf60c4811080078829bcf8574de547eaad3b79 (patch)
tree: f70aa75b21c288984e4feb905675f6d8cc3eb3f8 /mm
parent: 2bdd9a2042a0e12d96c545773d9d8038c920f813 (diff)
1 files changed, 27 insertions, 4 deletions
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 21dd99eac..1a15cc0e1 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -44,6 +44,8 @@ int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
 static DEFINE_SPINLOCK(zone_scan_lock);
 
+static unsigned long last_victim;
+
 #ifdef CONFIG_NUMA
 /**
  * has_intersects_mems_allowed() - check task eligiblity for kill
@@ -264,14 +266,32 @@ enum oom_scan_t oom_scan_process_thread(struct task_struct *task,
 		return OOM_SCAN_CONTINUE;
 
 	/*
-	 * This task already has access to memory reserves and is being killed.
-	 * Don't allow any other task to have access to the reserves.
+	 * We found a task that we already tried to kill, but it hasn't
+	 * finished dying yet.  Generally we want to avoid choosing another
+	 * victim until it finishes.  If we choose lots of victims then we'll
+	 * use up our memory reserves and none of the tasks will be able to
+	 * exit.
+	 *
+	 * ...but we can't wait forever.  If a task persistently refuses to
+	 * die then it might be waiting on a resource (mutex or whatever) that
+	 * won't be released until _some other_ task runs.  ...and maybe that
+	 * other is blocked waiting on memory (deadlock!).  If it's been
+	 * "long enough" then we'll just skip over existing victims and pick
+	 * someone new to kill.
 	 */
 	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
 		if (unlikely(frozen(task)))
 			__thaw_task(task);
-		if (!force_kill)
-			return OOM_SCAN_ABORT;
+		if (!force_kill) {
+			if (time_after(jiffies,
+				       last_victim + msecs_to_jiffies(100))) {
+				pr_warn("Task %s:%d refused to die\n",
+					task->comm, task->pid);
+				return OOM_SCAN_CONTINUE;
+			} else {
+				return OOM_SCAN_ABORT;
+			}
+		}
 	}
 	if (!task->mm)
 		return OOM_SCAN_CONTINUE;
@@ -435,6 +455,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	 */
 	if (p->flags & PF_EXITING) {
 		set_tsk_thread_flag(p, TIF_MEMDIE);
+		last_victim = jiffies;
 		put_task_struct(p);
 		return;
 	}
@@ -538,6 +559,7 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	rcu_read_unlock();
 
 	set_tsk_thread_flag(victim, TIF_MEMDIE);
+	last_victim = jiffies;
 	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
 	put_task_struct(victim);
 }
@@ -670,6 +692,7 @@ void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	 */
 	if (fatal_signal_pending(current) || current->flags & PF_EXITING) {
 		set_thread_flag(TIF_MEMDIE);
+		last_victim = jiffies;
 		return;
 	}
author	Douglas Anderson <dianders@chromium.org>	2017-03-30 16:43:24 -0700
committer	Moyster <oysterized@gmail.com>	2018-12-01 22:08:45 +0100
commit	02cf60c4811080078829bcf8574de547eaad3b79 (patch)
tree	f70aa75b21c288984e4feb905675f6d8cc3eb3f8 /mm
parent	2bdd9a2042a0e12d96c545773d9d8038c920f813 (diff)