• Zach Brown's avatar
    [PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED · 8459d86a
    Zach Brown authored
    The only time it is safe to call aio_complete() is when the ->ki_retry
    function returns -EIOCBQUEUED to the AIO core.  direct_io_worker() has
    historically done this by relying on its caller to translate positive return
    codes into -EIOCBQUEUED for the aio case.  It did this by trying to keep
    conditionals in sync.  direct_io_worker() knew when finished_one_bio() was
    going to call aio_complete().  It would reverse the test and wait and free the
    dio in the cases it thought that finished_one_bio() wasn't going to.
    
    Not surprisingly, it ended up getting it wrong.  'ret' could be a negative
    errno from the submission path but it failed to communicate this to
    finished_one_bio().  direct_io_worker() would return < 0, it's callers
    wouldn't raise -EIOCBQUEUED, and aio_complete() would be called.  In the
    future finished_one_bio()'s tests wouldn't reflect this and aio_complete()
    would be called for a second time which can manifest as an oops.
    
    The previous cleanups have whittled the sync and async completion paths down
    to the point where we can collapse them and clearly reassert the invariant
    that we must only call aio_complete() after returning -EIOCBQUEUED.
    direct_io_worker() will only return -EIOCBQUEUED when it is not the last to
    drop the dio refcount and the aio bio completion path will only call
    aio_complete() when it is the last to drop the dio refcount.
    direct_io_worker() can ensure that it is the last to drop the reference count
    by waiting for bios to drain.  It does this for sync ops, of course, and for
    partial dio writes that must fall back to buffered and for aio ops that saw
    errors during submission.
    
    This means that operations that end up waiting, even if they were issued as
    aio ops, will not call aio_complete() from dio.  Instead we return the return
    code of the operation and let the aio core call aio_complete().  This is
    purposely done to fix a bug where AIO DIO file extensions would call
    aio_complete() before their callers have a chance to update i_size.
    
    Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers
    no longer have to translate for it.  XFS needs to be careful not to free
    resources that will be used during AIO completion if -EIOCBQUEUED is returned.
     We maintain the previous behaviour of trying to write fs metadata for O_SYNC
    aio+dio writes.
    Signed-off-by: default avatarZach Brown <zach.brown@oracle.com>
    Cc: Badari Pulavarty <pbadari@us.ibm.com>
    Cc: Suparna Bhattacharya <suparna@in.ibm.com>
    Acked-by: default avatarJeff Moyer <jmoyer@redhat.com>
    Cc: <xfs-masters@oss.sgi.com>
    Signed-off-by: default avatarAndrew Morton <akpm@osdl.org>
    Signed-off-by: default avatarLinus Torvalds <torvalds@osdl.org>
    8459d86a
direct-io.c 34.3 KB