commit f420d4dc
Author: Josef Bacik <jbacik@redhat.com>

    jbd: improve fsync batching

    There is a flaw in the way jbd handles fsync batching.  If we fsync() a
    file and we were not the last process to run fsync() on this fs, we
    automatically sleep for 1 jiffy in order to wait for new writers to join
    the transaction before forcing the commit.  The problem with this is
    that with really fast storage (e.g. a Clariion) the time it takes to
    commit a transaction to disk is in most cases much shorter than 1 jiffy,
    so sleeping means waiting longer, with nothing to do, than if we had just
    committed the transaction and kept going.  Ric Wheeler noticed this when
    using fs_mark with more than one thread: throughput would plummet as he
    added more threads.
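
    The old behaviour looks roughly like the sketch below.  This is a
    simplified illustration, not the verbatim jbd code; the handle and journal
    field names (h_sync, j_last_sync_writer) follow jbd's existing naming.

        /*
         * Old heuristic in journal_stop(): if another process was the last
         * sync writer, always wait one full jiffy in the hope that more
         * writers join the running transaction before we force the commit.
         */
        if (handle->h_sync && journal->j_last_sync_writer != current->pid) {
                journal->j_last_sync_writer = current->pid;
                schedule_timeout_uninterruptible(1);    /* fixed 1-jiffy wait */
        }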
    
    This patch attempts to fix this problem by recording the average time, in
    nanoseconds, that it takes to commit a transaction to disk, along with the
    time at which we started the transaction.  If we run an fsync() and the
    transaction has been running for less time than it takes to commit to
    disk, we sleep for the remaining delta and then commit.  We achieve
    sub-jiffy sleeping using schedule_hrtimeout().  This means that the wait
    time is auto-tuned to the speed of the underlying disk instead of being a
    static timeout.  I weighted the average according to somebody's comments
    (Andreas Dilger, I think) in order to smooth out random outliers where a
    commit takes much longer or much less time than the average.  I also have
    a min() check in there to make sure we don't sleep longer than a jiffy in
    case our storage is super slow; this was requested by Andrew.
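
    Roughly, the new logic looks like the sketch below.  This is a simplified
    illustration of the mechanism rather than the literal diff: the
    j_average_commit_time and t_start_time fields stand for the new state this
    patch tracks (average commit time and transaction start time), and the 3:1
    weighting is shown only as an example of the smoothing.

        #include <linux/jbd.h>
        #include <linux/ktime.h>
        #include <linux/jiffies.h>
        #include <linux/hrtimer.h>
        #include <linux/sched.h>

        /*
         * When a commit finishes, fold the measured commit time (ns) into a
         * running average, weighted 3:1 towards the old value here for
         * illustration, so a single outlier doesn't dominate.
         */
        static void update_average_commit_time(journal_t *journal,
                                               ktime_t start_time)
        {
                u64 commit_time = ktime_to_ns(ktime_sub(ktime_get(),
                                                        start_time));

                journal->j_average_commit_time =
                        (commit_time + 3 * journal->j_average_commit_time) / 4;
        }

        /*
         * From journal_stop(), for a synchronous handle: if the running
         * transaction is younger than an average commit, sleep for the
         * remaining delta (capped at one jiffy) so more writers can batch
         * into the same commit.
         */
        static void wait_for_commit_batching(journal_t *journal,
                                             transaction_t *transaction)
        {
                u64 avg = journal->j_average_commit_time;
                u64 trans_time = ktime_to_ns(ktime_sub(ktime_get(),
                                                transaction->t_start_time));

                /* never sleep longer than one jiffy on very slow storage */
                avg = min_t(u64, avg, (u64)jiffies_to_usecs(1) * 1000);

                if (trans_time < avg) {
                        ktime_t expires = ktime_add_ns(ktime_get(),
                                                       avg - trans_time);

                        set_current_state(TASK_UNINTERRUPTIBLE);
                        schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
                }
        }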
    
    I unfortunately do not have access to a Clariion, so I had to use a
    ramdisk to represent a super fast array.  I tested with a SATA drive with
    barrier=1 to make sure there was no regression with local disks, with a
    4-way multipathed Apple Xserve RAID array, and of course with the
    ramdisk.  I ran the following command:
    
    fs_mark -d /mnt/ext3-test -s 4096 -n 2000 -D 64 -t $i
    
    where $i was 2, 4, 8, 16 and 32.  I mkfs'ed the fs each time.  Here are my
    results:
    
    type	threads		with patch	without patch
    sata	2		24.6		26.3
    sata	4		49.2		48.1
    sata	8		70.1		67.0
    sata	16		104.0		94.1
    sata	32		153.6		142.7
    
    xserve	2		246.4		222.0
    xserve	4		480.0		440.8
    xserve	8		829.5		730.8
    xserve	16		1172.7		1026.9
    xserve	32		1816.3		1650.5
    
    ramdisk	2		2538.3		1745.6
    ramdisk	4		2942.3		661.9
    ramdisk	8		2882.5		999.8
    ramdisk	16		2738.7		1801.9
    ramdisk	32		2541.9		2394.0

    Signed-off-by: Josef Bacik <jbacik@redhat.com>
    Cc: Andreas Dilger <adilger@sun.com>
    Cc: Arjan van de Ven <arjan@infradead.org>
    Cc: Ric Wheeler <rwheeler@redhat.com>
    Cc: <linux-ext4@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>