- 09 May, 2007 40 commits
-
-
Frank Filz authored
I have been investigating a module reference count leak on the server for rpcsec_gss_krb5.ko. It turns out the problem is a reference count leak for the security context in net/sunrpc/auth_gss/svcauth_gss.c. The problem is that gss_write_init_verf() calls gss_svc_searchbyctx() which does a rsc_lookup() but never releases the reference to the context. There is another issue that rpc.svcgssd sets an "end of time" expiration for the context By adding a cache_put() call in gss_svc_searchbyctx(), and setting an expiration timeout in the downcall, cache_clean() does clean up the context and the module reference count now goes to zero after unmount. I also verified that if the context expires and then the client makes a new request, a new context is established. Here is the patch to fix the kernel, I will start a separate thread to discuss what expiration time should be set by rpc.svcgssd. Acked-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
NeilBrown authored
It's not necessarily correct to assume that the xdr_buf used to hold the server's reply must have page data whenever it has tail data. And there's no need for us to deal with that case separately anyway. Acked-by: "J. Bruce Fields" <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
NeilBrown authored
We need to zero various parts of 'exp' before any 'goto out', otherwise when we go to free the contents... we die. Signed-off-by: Neil Brown <neilb@suse.de> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Akinobu Mita authored
register_rpc_pipefs() needs to clean up rpc_inode_cache by kmem_cache_destroy() on register_filesystem() failure. init_sunrpc() needs to unregister rpc_pipe_fs by unregister_rpc_pipefs() when rpc_init_mempool() returns error. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Jeff Layton authored
When the kernel calls svc_reserve to downsize the expected size of an RPC reply, it fails to account for the possibility of a checksum at the end of the packet. If a client mounts a NFSv2/3 with sec=krb5i/p, and does I/O then you'll generally see messages similar to this in the server's ring buffer: RPC request reserved 164 but used 208 While I was never able to verify it, I suspect that this problem is also the root cause of some oopses I've seen under these conditions: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227726 This is probably also a problem for other sec= types and for NFSv4. The large reserved size for NFSv4 compound packets seems to generally paper over the problem, however. This patch adds a wrapper for svc_reserve that accounts for the possibility of a checksum. It also fixes up the appropriate callers of svc_reserve to call the wrapper. For now, it just uses a hardcoded value that I determined via testing. That value may need to be revised upward as things change, or we may want to eventually add a new auth_op that attempts to calculate this somehow. Unfortunately, there doesn't seem to be a good way to reliably determine the expected checksum length prior to actually calculating it, particularly with schemes like spkm3. Signed-off-by: Jeff Layton <jlayton@redhat.com> Acked-by: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Acked-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Eric W. Biederman authored
Acked-by: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
NeilBrown authored
Now that sk_defer_lock protects two different things, make the name more generic. Also don't bother with disabling _bh as the lock is only ever taken from process context. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Staubach authored
The NFSv2 and NFSv3 servers do not handle WRITE requests for 0 bytes correctly. The specifications indicate that the server should accept the request, but it should mostly turn into a no-op. Currently, the server will return an XDR decode error, which it should not. Attached is a patch which addresses this issue. It also adds some boundary checking to ensure that the request contains as much data as was requested to be written. It also correctly handles an NFSv3 request which requests to write more data than the server has stated that it is prepared to handle. Previously, there was some support which looked like it should work, but wasn't quite right. Signed-off-by: Peter Staubach <staubach@redhat.com> Acked-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Adrian Bunk authored
nfs4_acl_add_ace() can now be removed. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Neil Brown <neilb@cse.unsw.edu.au> Acked-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Thanks to Jarek Poplawski for the ideas and for spotting the bug in the initial draft patch. cancel_rearming_delayed_work() currently has many limitations, because it requires that dwork always re-arms itself via queue_delayed_work(). So it hangs forever if dwork doesn't do this, or cancel_rearming_delayed_work/ cancel_delayed_work was already called. It uses flush_workqueue() in a loop, so it can't be used if workqueue was freezed, and it is potentially live- lockable on busy system if delay is small. With this patch cancel_rearming_delayed_work() doesn't make any assumptions about dwork, it can re-arm itself via queue_delayed_work(), or queue_work(), or do nothing. As a "side effect", cancel_work_sync() was changed to handle re-arming works as well. Disadvantages: - this patch adds wmb() to insert_work(). - slowdowns the fast path (when del_timer() succeeds on entry) of cancel_rearming_delayed_work(), because wait_on_work() is called unconditionally. In that case, compared to the old version, we are doing "unneeded" lock/unlock for each online CPU. On the other hand, this means we don't need to use cancel_work_sync() after cancel_rearming_delayed_work(). - complicates the code (.text grows by 130 bytes). [akpm@linux-foundation.org: fix speling] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: David Chinner <dgc@sgi.com> Cc: David Howells <dhowells@redhat.com> Cc: Gautham Shenoy <ego@in.ibm.com> Acked-by: Jarek Poplawski <jarkao2@o2.pl> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Gautham R Shenoy authored
We are anyway kthread_stop()ping other per-cpu kernel threads after move_task_off_dead_cpu(), so we can do it with the stop_machine_run thread as well. I just checked with Vatsa if there was any subtle reason why they had put in the kthread_bind() in cpu.c. Vatsa cannot seem to recollect any and I can't see any. So let us just remove the kthread_bind. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Currently kernel threads use sigprocmask(SIG_BLOCK) to protect against signals. This doesn't prevent the signal delivery, this only blocks signal_wake_up(). Every "killall -33 kthreadd" means a "struct siginfo" leak. Change kthreadd_setup() to set all handlers to SIG_IGN instead of blocking them (make a new helper ignore_signals() for that). If the kernel thread needs some signal, it should use allow_signal() anyway, and in that case it should not use CLONE_SIGHAND. Note that we can't change daemonize() (should die!) in the same way, because it can be used along with CLONE_SIGHAND. This means that allow_signal() still should unblock the signal to work correctly with daemonize()ed threads. However, disallow_signal() doesn't block the signal any longer but ignores it. NOTE: with or without this patch the kernel threads are not protected from handle_stop_signal(), this seems harmless, but not good. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
worker_thread() inherits ignored SIGCHLD and numa_default_policy() from its parent, kthreadd. No need to setup this again. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
allow_signal(SIGCHLD) does all necessary job, no need to call do_sigaction() prior to. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
usbatm_do_heavy_init() calls allow_signal() which plays with parent process's ->sighand. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Duncan Sands <duncan.sands@free.fr> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Eric W. Biederman authored
When a kernel thread calls daemonize, instead of reparenting the thread to init reparent the thread to kthreadd next to the threads created by kthread_create. This is really just a stop gap until daemonize goes away, but it does ensure no kernel threads are under init and they are all in one place that is easy to find. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Eric W. Biederman authored
Currently there is a circular reference between work queue initialization and kthread initialization. This prevents the kthread infrastructure from initializing until after work queues have been initialized. We want the properties of tasks created with kthread_create to be as close as possible to the init_task and to not be contaminated by user processes. The later we start our kthreadd that creates these tasks the harder it is to avoid contamination from user processes and the more of a mess we have to clean up because the defaults have changed on us. So this patch modifies the kthread support to not use work queues but to instead use a simple list of structures, and to have kthreadd start from init_task immediately after our kernel thread that execs /sbin/init. By being a true child of init_task we only have to change those process settings that we want to have different from init_task, such as our process name, the cpus that are allowed, blocking all signals and setting SIGCHLD to SIG_IGN so that all of our children are reaped automatically. By being a true child of init_task we also naturally get our ppid set to 0 and do not wind up as a child of PID == 1. Ensuring that tasks generated by kthread_create will not slow down the functioning of the wait family of functions. [akpm@linux-foundation.org: use interruptible sleeps] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
____call_usermodehelper() has no reason for flush_signals(). It is a fresh forked process which is going to exec a user-space application or exit on failure. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq (this was possible from the very beginnig, I missed this). So we can unify flush_work_keventd and flush_work. Also, rename flush_work() to cancel_work_sync() and fix all callers. Perhaps this is not the best name, but "flush_work" is really bad. (akpm: this is why the earlier patches bypassed maintainers) Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Jeff Garzik <jeff@garzik.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Tejun Heo <htejun@gmail.com> Cc: Auke Kok <auke-jan.h.kok@intel.com>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Christoph Lameter authored
Shutdown the cache_reaper if the cpu is brought down and set the cache_reap.func to NULL. Otherwise hotplug shuts down the reaper for good. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
We already depend on fact that all sub-threads have ->exit_signal == -1, no need to set it in zap_other_threads(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
worker_thread() can miss freeze_process()->signal_wake_up() if it happens between try_to_freeze() and prepare_to_wait(). We should check freezing() before entering schedule(). This race was introduced by me in [PATCH 1/1] workqueue: don't migrate pending works from the dead CPU Looks like mm/vmscan.c:kswapd() has the same race. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
worker_thread() doesn't need to "Block and flush all signals", this was already done by its caller, kthread(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
We don't have any users, and it is not so trivial to use NOAUTOREL works correctly. It is better to simplify API. Delete NOAUTOREL support and rename work_release to work_clear_pending to avoid a confusion. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
net/ipv4/ipvs/ip_vs_core.c module_exit ip_vs_cleanup ip_vs_control_cleanup cancel_rearming_delayed_work // done This is unsafe. The module may be unloaded and the memory may be freed while defense_work's handler is still running/preempted. Do flush_work(&defense_work.work) after cancel_rearming_delayed_work(). Alternatively, we could add flush_work() to cancel_rearming_delayed_work(), but note that we can't change cancel_delayed_work() in the same manner because it may be called from atomic context. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
cancel_rearming_delayed_workqueue(wq, dwork) doesn't need the first parameter. We don't hang on un-queued dwork any longer, and work->data doesn't change its type. This means we can always figure out "wq" from dwork when it is needed. Remove this parameter, and rename the function to cancel_rearming_delayed_work(). Re-create an inline "obsolete" cancel_rearming_delayed_workqueue(wq) which just calls cancel_rearming_delayed_work(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Cleanup. A number of per_cpu_ptr(wq->cpu_wq, cpu) users have to check that cpu is valid for this wq. Make a simple helper. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Change queue_delayed_work() to use queue_delayed_work_on() to avoid the code duplication (saves 133 bytes). Q: queue_delayed_work() enqueues &dwork->work directly when delay == 0, why? [jirislaby@gmail.com: oops fix] Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Currently typeof(delayed_work->work.data) is "struct workqueue_struct" when the timer is pending "struct cpu_workqueue_struct" whe the work is queued This makes impossible to use flush_fork(delayed_work->work) in addition to cancel_delayed_work/cancel_rearming_delayed_work, not good. Change queue_delayed_work/delayed_work_timer_fn to use cwq, not wq. This complicates (and uglifies) these functions a little bit, but alows us to use flush_fork(dwork) and imho makes the whole code more consistent. Also, document the fact that cancel_rearming_delayed_work() doesn't garantee the completion of work->func() upon return. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
CPU_UP_PREPARE binds cwq->thread to the new CPU. So CPU_UP_CANCELED tries to wake up the task which is bound to the failed CPU. With this patch we don't bind cwq->thread until CPU becomes online. The first wake_up() after kthread_create() is a bit special, make a simple helper for that. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
The only caller of init_workqueues() is do_basic_setup(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Add explicit workqueue_struct->singlethread flag. This lessens .text a little, but most importantly this allows us to manipulate wq->list without changine the meaning of is_single_threaded(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
The code like if (is_single_threaded(wq)) do_something(singlethread_cpu); else { for_each_cpu_mask(cpu, cpu_populated_map) do_something(cpu); } looks very annoying. We can add "static cpumask_t cpu_singlethread_map" and simplify the code. Lessens .text a bit, and imho makes the code more readable. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
cancel_rearming_delayed_workqueue(dwork) will hang forever if dwork was not scheduled, because in that case cancel_delayed_work()->del_timer_sync() never returns true. I don't know if there are any callers which may have problems, but this is not so convenient, and the fix is very simple. Q: looks like we don't need "struct workqueue_struct *wq" parameter. If the timer was aborted successfully, get_wq_data() == wq. Is it worth to add the new function? Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
work->func() may sleep, it's a bug to call run_workqueue() with irqs disabled. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Because it has no callers. Actually, I think the whole idea of run_scheduled_work() was not right, not good to mix "unqueue this work and execute its ->func()" in one function. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Currently CPU_DEAD uses kthread_stop() to stop cwq->thread and then transfers cwq->worklist to another CPU. However, it is very unlikely that worker_thread() will notice kthread_should_stop() before flushing cwq->worklist. It is only possible if worker_thread() was preempted after run_workqueue(cwq), a new work_struct was added, and CPU_DEAD happened before cwq->thread has a chance to run. This means that take_over_work() mostly adds unneeded complications. Note also that kthread_stop() is not good per se, wake_up_process() may confuse work->func() if it sleeps waiting for some event. Remove take_over_work() and migrate_sequence complications. CPU_DEAD sets the cwq->should_stop flag (introduced by this patch) and waits for cwq->thread to flush cwq->worklist and exit. Because the dead CPU is not on cpu_online_map, no more works can be added to that cwq. cpu_populated_map was introduced to optimize for_each_possible_cpu(), it is not strictly needed, and it is more a documentation in fact. Saves 418 bytes. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Gautham shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Pointed out by Srivatsa Vaddagiri. cleanup_workqueue_thread() sets cwq->thread = NULL and does kthread_stop(). This breaks the "if (cwq->thread == current)" logic in flush_cpu_workqueue() and leads to deadlock. Kill the thead first, then clear cwq->thread. workqueue_mutex protects us from create_workqueue_thread() so we don't need cwq->lock. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Gautham shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Many thanks to Srivatsa Vaddagiri for the helpful discussion and for spotting the bug in my previous attempt. work->func() (and thus flush_workqueue()) must not use workqueue_mutex, this leads to deadlock when CPU_DEAD does kthread_stop(). However without this mutex held we can't detect CPU_DEAD in progress, which can move pending works to another CPU while the dead one is not on cpu_online_map. Change flush_workqueue() to use for_each_possible_cpu(). This means that flush_cpu_workqueue() may hit CPU which is already dead. However in that case !list_empty(&cwq->worklist) || cwq->current_work != NULL means that CPU_DEAD in progress, it will do kthread_stop() + take_over_work() so we can proceed and insert a barrier. We hold cwq->lock, so we are safe. Also, add migrate_sequence incremented by take_over_work() under cwq->lock. If take_over_work() happened before we checked this CPU, we should see the new value after spin_unlock(). Further possible changes: remove CPU_DEAD handling (along with take_over_work, migrate_sequence) from workqueue.c. CPU_DEAD just sets cwq->please_exit_after_flush flag. CPU_UP_PREPARE->create_workqueue_thread() clears this flag, and creates the new thread if cwq->thread == NULL. This way the workqueue/cpu-hotplug interaction is almost zero, workqueue_mutex just protects "workqueues" list, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE go away. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Gautham shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Oleg Nesterov authored
Currently ->freezeable is per-cpu, this is wrong. CPU_UP_PREPARE creates cwq->thread which is not freezeable. Move ->freezeable to workqueue_struct. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com> Cc: Gautham shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-