destroy_workqueue() can livelock

Pointed out by Michal Schmidt <mschmidt@redhat.com>.

The bug was introduced in 2.6.22 by me.

cleanup_workqueue_thread() does flush_cpu_workqueue(cwq) in a loop until
->worklist becomes empty. This is live-lockable, a re-niced caller can get
CPU after wake_up() and insert a new barrier before the lower-priority
cwq->thread has a chance to clear ->current_work.

Change cleanup_workqueue_thread() to do flush_cpu_workqueue(cwq) only once.
We can rely on the fact that run_workqueue() won't return until it flushes
all works. So it is safe to call kthread_stop() after that, the "should
stop" request won't be noticed until run_workqueue() returns.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Michal Schmidt <mschmidt@redhat.com>
Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit: 13c22168b7276dffe49dc66675d5a78f6d288e0d
author: Oleg Nesterov <oleg@tv-sign.ru> Tue Jul 17 04:03:55 2007 -0700
committer: Linus Torvalds <torvalds@woody.linux-foundation.org> Tue Jul 17 10:23:03 2007 -0700
tree: 4062929954f04db9c24be08cba94a0ed6e7fd65f
parent: 87a7defb0d4255d5aea2c5067813b26836127983
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 1935302..58e5c15 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c

@@ -752,18 +752,17 @@
 	if (cwq->thread == NULL)
 		return;
 
+	flush_cpu_workqueue(cwq);
 	/*
-	 * If the caller is CPU_DEAD the single flush_cpu_workqueue()
-	 * is not enough, a concurrent flush_workqueue() can insert a
-	 * barrier after us.
+	 * If the caller is CPU_DEAD and cwq->worklist was not empty,
+	 * a concurrent flush_workqueue() can insert a barrier after us.
+	 * However, in that case run_workqueue() won't return and check
+	 * kthread_should_stop() until it flushes all work_struct's.
 	 * When ->worklist becomes empty it is safe to exit because no
 	 * more work_structs can be queued on this cwq: flush_workqueue
 	 * checks list_empty(), and a "normal" queue_work() can't use
 	 * a dead CPU.
 	 */
-	while (flush_cpu_workqueue(cwq))
-		;
-
 	kthread_stop(cwq->thread);
 	cwq->thread = NULL;
 }
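
The guarantee the patch relies on (the stop request is only noticed between calls to run_workqueue(), never in the middle of one) can be illustrated with a small single-threaded userspace model. This is only a sketch and not kernel code: every name ending in `_model`, the fixed-size array standing in for ->worklist, and the plain `should_stop` flag standing in for kthread_should_stop() are assumptions made for illustration. The first work item plays the role of the destroyer's barrier: while it runs, a second barrier is queued (the concurrent flush_workqueue() case) and the stop request is raised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WORKS 8

typedef void (*work_fn)(void);

/* Toy stand-in for cwq->worklist; NOT the kernel's data structure. */
static work_fn worklist[MAX_WORKS];
static size_t head, tail;
static bool should_stop;	/* models kthread_should_stop() */
static int executed;		/* number of work items that have run */

static void queue_work_model(work_fn fn)
{
	assert(tail < MAX_WORKS);
	worklist[tail++] = fn;
}

/* Barrier inserted by a concurrent flusher after the first one. */
static void late_barrier(void)
{
	executed++;
}

/* Barrier inserted by the single flush in the cleanup path. */
static void first_barrier(void)
{
	executed++;
	queue_work_model(late_barrier);	/* concurrent flush adds a barrier */
	should_stop = true;		/* cleanup path requests a stop */
}

/* Like run_workqueue(): does not return until the list is empty. */
static void run_workqueue_model(void)
{
	while (head < tail)
		worklist[head++]();
}

/* The stop flag is checked only between calls to run_workqueue_model(). */
static int worker_thread_model(void)
{
	while (!should_stop)
		run_workqueue_model();
	return executed;
}

/* Returns how many work items ran before the worker exited. */
int model_stop_during_run(void)
{
	head = tail = 0;
	executed = 0;
	should_stop = false;
	queue_work_model(first_barrier);
	return worker_thread_model();
}
```

In this model, calling model_stop_during_run() runs both barriers before the worker loop exits, even though the stop request arrived while the second barrier was still pending. The model leaves out real concurrency, scheduling priorities, and the sleep/wake-up path, so it shows only the ordering argument, not the livelock itself.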