• Paul Mackerras's avatar
    perf_counter: Fix race in attaching counters to tasks and exiting · c93f7669
    Paul Mackerras authored
    Commit 564c2b21 ("perf_counter: Optimize context switch between
    identical inherited contexts") introduced a race where it is possible
    that a counter being attached to a task could get attached to the
    wrong task, if the task is one that has inherited its context from
    another task via fork.  This happens because the optimized context
    switch could switch the context to another task after find_get_context
    has read task->perf_counter_ctxp.  In fact, it's possible that the
    context could then get freed, if the other task then exits.
    
    This fixes the problem by protecting both the context switch and the
    critical code in find_get_context with spinlocks.  The context switch
    locks the cxt->lock of both the outgoing and incoming contexts before
    swapping them.  That means that once code such as find_get_context
    has obtained the spinlock for the context associated with a task,
    the context can't get swapped to another task.  However, the context
    may have been swapped in the interval between reading
    task->perf_counter_ctxp and getting the lock, so it is necessary to
    check and retry.
    
    To make sure that none of the contexts being looked at in
    find_get_context can get freed, this changes the context freeing code
    to use RCU.  Thus an rcu_read_lock() is sufficient to ensure that no
    contexts can get freed.  This part of the patch is lifted from a patch
    posted by Peter Zijlstra.
    
    This also adds a check to make sure that we can't add a counter to a
    task that is exiting.
    
    There is also a race between perf_counter_exit_task and
    find_get_context; this solves the race by moving the get_ctx that
    was in perf_counter_alloc into the locked region in find_get_context,
    so that once find_get_context has got the context for a task, it
    won't get freed even if the task calls perf_counter_exit_task.  It
    doesn't matter if new top-level (non-inherited) counters get attached
    to the context after perf_counter_exit_task has detached the context
    from the task.  They will just stay there and never get scheduled in
    until the counters' fds get closed, and then perf_release will remove
    them from the context and eventually free the context.
    
    With this, we are now doing the unclone in find_get_context rather
    than when a counter was added to or removed from a context (actually,
    we were missing the unclone_ctx() call when adding a counter to a
    context).  We don't need to unclone when removing a counter from a
    context because we have no way to remove a counter from a cloned
    context.
    
    This also takes out the smp_wmb() in find_get_context, which Peter
    Zijlstra pointed out was unnecessary because the cmpxchg implies a
    full barrier anyway.
    Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
    Acked-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Marcelo Tosatti <mtosatti@redhat.com>
    Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
    Cc: John Kacur <jkacur@redhat.com>
    LKML-Reference: <18974.33033.667187.273886@cargo.ozlabs.ibm.com>
    Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
    c93f7669
perf_counter.c 89.9 KB