1. 30 May, 2009 4 commits
    • [ARM] alternative copy_to_user: more precise fallback threshold · c626e3f5
      Nicolas Pitre authored
      The previous size thresholds were guessed from various user space
      benchmarks run on kernels with and without the alternative uaccess
      option. This is, however, less precise than a kernel-based test that
      measures the real speed of each method.
      
      This adds a simple test bench to show the time needed by each method.
      With this, the optimal size threshold for the alternative implementation
      can be determined with more confidence. It appears that the optimal
      threshold for both copy_to_user and clear_user is around 64 bytes. This
      is not a surprise, given that the memcpy and memset implementations
      need at least 64 bytes to achieve maximum throughput.
      
      One might suggest using such a test to determine the optimal threshold
      at runtime instead. However, the results are close enough to 64 on the
      tested targets concerned by this alternative copy_to_user
      implementation that the overhead of a variable threshold is probably
      not worth it for now.
      Signed-off-by: Nicolas Pitre <nico@marvell.com>
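      The commit does not spell out how the bench results are turned into a
      threshold. As a hedged illustration only, the sketch below assumes that
      per-size timings have already been collected for both methods, with sizes
      in ascending order. It picks the smallest size from which the alternative
      is never slower. The function `pick_threshold` and its inputs are
      hypothetical and not part of the kernel. The commit itself settles on a
      fixed value of about 64 bytes rather than computing one at runtime.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /*
       * Hypothetical helper (not kernel code): sizes[] is ascending, and
       * t_std[i] / t_alt[i] are the measured times for sizes[i].  Returns the
       * smallest size from which the alternative method is never slower, or
       * SIZE_MAX if it does not win even at the largest measured size.
       */
      size_t pick_threshold(const size_t *sizes, const double *t_std,
                            const double *t_alt, size_t n)
      {
          size_t threshold = SIZE_MAX;

          /* Walk down from the largest size and stop at the first size where
           * the alternative loses, so the result covers a contiguous tail. */
          for (size_t i = n; i-- > 0; ) {
              if (t_alt[i] > t_std[i])
                  break;
              threshold = sizes[i];
          }
          return threshold;
      }
      ```

      Walking down from the largest size means that an isolated small size
      where the alternative happens to win cannot lower the threshold.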
    • [ARM] lower overhead with alternative copy_to_user for small copies · cb9dc92c
      Nicolas Pitre authored
      Because the alternative copy_to_user implementation has a higher setup
      cost than the standard implementation, the size of the memory area to
      copy is tested, and the standard implementation is invoked instead when
      that size is too small. However, that test is made only after the
      function has saved several registers on the stack, which in that case
      have to be reloaded right away for nothing. This causes a measurable
      performance regression compared to using only the standard
      implementation.
      
      To make the overhead of the size test negligible, let's factor it out of
      the alternative copy_to_user function into code where it is clear to the
      compiler that no stack frame is needed. Because CONFIG_ARM_UNWIND allows
      frame pointers to be disabled and tail call optimization to kick in, the
      overhead in the small copy case drops to only 3 assembly instructions.
      
      A similar trick is applied to clear_user as well.
      Signed-off-by: Nicolas Pitre <nico@marvell.com>
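      A minimal user-space sketch of the pattern described above follows. The
      names `copy_std`, `copy_alt`, `copy_dispatch` and `ALT_COPY_THRESHOLD`
      are hypothetical and do not come from the kernel, and memcpy stands in for
      both real implementations. The size test sits in a tiny wrapper that
      saves no registers, so with optimization enabled each branch can compile
      to a tail call. The costlier function is entered only for large copies.
      The 64-byte value is taken from the threshold commit (c626e3f5).

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical threshold, matching the ~64 bytes found in c626e3f5. */
      #define ALT_COPY_THRESHOLD 64

      /* Call counters so the chosen path can be observed. */
      static unsigned long std_calls, alt_calls;

      /* Hypothetical stand-ins for the two implementations.  Like
       * copy_to_user, both return the number of bytes NOT copied. */
      unsigned long copy_std(void *to, const void *from, unsigned long n)
      {
          std_calls++;
          memcpy(to, from, n);
          return 0;
      }

      unsigned long copy_alt(void *to, const void *from, unsigned long n)
      {
          alt_calls++;            /* the real one has a costlier setup */
          memcpy(to, from, n);
          return 0;
      }

      /* Leaf wrapper: it only compares n and branches, so no stack frame is
       * needed, and when optimizing, each call can become a tail call. */
      static inline unsigned long copy_dispatch(void *to, const void *from,
                                                unsigned long n)
      {
          if (n < ALT_COPY_THRESHOLD)
              return copy_std(to, from, n);
          return copy_alt(to, from, n);
      }
      ```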
    • [ARM] alternative copy_to_user/clear_user implementation · 39ec58f3
      Lennert Buytenhek authored
      This implements {copy_to,clear}_user() by faulting in the userland
      pages and then using the regular kernel mem{cpy,set}() to copy the
      data (while holding the page table lock). This is a win if the regular
      mem{cpy,set}() implementations are faster than the user copy functions.
      This is the case, e.g., on Feroceon, where 8-word STMs (which memcpy()
      uses under the right conditions) give significantly higher memory write
      throughput than a sequence of individual 32-bit stores.
      
      Here are numbers for page sized buffers on some Feroceon cores:
      
       - copy_to_user on Orion5x goes from 51 MB/s to 83 MB/s
       - clear_user on Orion5x goes from 89 MB/s to 314 MB/s
       - copy_to_user on Kirkwood goes from 240 MB/s to 356 MB/s
       - clear_user on Kirkwood goes from 367 MB/s to 1108 MB/s
       - copy_to_user on Disco-Duo goes from 248 MB/s to 398 MB/s
       - clear_user on Disco-Duo goes from 328 MB/s to 1741 MB/s
      
      Because the setup cost is non-negligible, this is worthwhile only if
      the amount of data to copy is large enough. The operation falls back
      to the standard implementation when the amount of data is below a
      certain threshold. This threshold was determined empirically. However,
      some targets could eventually benefit from a lower, runtime-determined
      value for optimal results.
      
      In the copy_from_user() case, this technique does not provide any
      worthwhile performance gain. Any kind of read access allocates a cache
      line, so subsequent 32-bit loads are just as fast as the equivalent
      8-word LDM.
      Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
      Signed-off-by: Nicolas Pitre <nico@marvell.com>
      Tested-by: Martin Michlmayr <tbm@cyrius.com>
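      The page-by-page structure of this approach can be sketched in user space
      as follows. This is an illustration under stated assumptions, not the
      commit's code. `paged_copy`, `fault_in_fn`, the `always_present` hook and
      the 4096-byte page size are all hypothetical, and the page table locking
      appears only as a comment. As with copy_to_user, the function returns the
      number of bytes not copied.

      ```c
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      #define SKETCH_PAGE_SIZE 4096UL   /* assumed page size */

      /* Hypothetical hook standing in for "fault in the user page containing
       * addr".  Returns 0 on success, nonzero if the page cannot be made present. */
      typedef int (*fault_in_fn)(void *addr);

      static unsigned long fault_ins;   /* counts hook calls, for demonstration */

      /* Demo hook: every page is assumed to be present already. */
      static int always_present(void *addr)
      {
          (void)addr;
          fault_ins++;
          return 0;
      }

      unsigned long paged_copy(void *to, const void *from, unsigned long n,
                               fault_in_fn fault_in)
      {
          unsigned char *dst = to;
          const unsigned char *src = from;

          while (n > 0) {
              /* Copy at most up to the end of the current destination page. */
              unsigned long off = (uintptr_t)dst & (SKETCH_PAGE_SIZE - 1);
              unsigned long chunk = SKETCH_PAGE_SIZE - off;
              if (chunk > n)
                  chunk = n;

              if (fault_in(dst) != 0)
                  break;                /* remaining bytes count as not copied */

              /* The kernel version copies with the page table lock held, so the
               * page cannot go away between the check and the copy. */
              memcpy(dst, src, chunk);
              dst += chunk;
              src += chunk;
              n -= chunk;
          }
          return n;                     /* bytes not copied */
      }
      ```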
    • [ARM] allow for alternative __copy_to_user/__clear_user implementations · a1f98849
      Nicolas Pitre authored
      This allows for optional alternative implementations of __copy_to_user
      and __clear_user, with a possible runtime fallback to the standard
      version when the alternative provides no gain over that standard
      version. This is done by making the standard __copy_to_user into a weak
      alias for the symbol __copy_to_user_std. The same is done for __clear_user.
      
      Those two functions are particularly good candidates for alternative
      implementations, since they rely on the STRT instruction, which performs
      worse than STM instructions on some CPU cores such as the ARM1176 and
      Marvell Feroceon.
      Signed-off-by: Nicolas Pitre <nico@marvell.com>
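      The weak-alias mechanism can be illustrated with GCC's `weak` and `alias`
      function attributes, assuming an ELF target such as Linux. The names
      `demo_copy` and `demo_copy_std` are hypothetical stand-ins for
      `__copy_to_user` and `__copy_to_user_std`. The sketch ignores the fault
      handling that the real user copy routines perform.

      ```c
      #include <string.h>

      /* Standard implementation, always available under its own name.
       * Returns the number of bytes not copied. */
      unsigned long demo_copy_std(void *to, const void *from, unsigned long n)
      {
          memcpy(to, from, n);
          return 0;
      }

      /* Public entry point: a weak alias for demo_copy_std.  If another object
       * file provides a strong demo_copy, the linker picks that one instead,
       * and the alternative can still call demo_copy_std as its fallback. */
      unsigned long demo_copy(void *to, const void *from, unsigned long n)
          __attribute__((weak, alias("demo_copy_std")));
      ```

      With no alternative linked in, calls to `demo_copy` resolve to the
      standard version at no extra cost, because both names refer to the same
      code.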
  2. 23 May, 2009 5 commits
    • Linux 2.6.30-rc7 · 59a3759d
      Linus Torvalds authored
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6 · 4a5dacec
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
        [SCSI] mpt2sas: fix driver version inconsistency
        [SCSI] 3w-xxxx: scsi_dma_unmap fix
        [SCSI] 3w-9xxx: scsi_dma_unmap fix
        [SCSI] ses: fix problems caused by empty SES provided name
        [SCSI] fc-transport: Close state transition-window during rport deletion.
        [SCSI] initialize max_target_blocked in scsi_alloc_target
        [SCSI] fnic: Add new Cisco PCI-Express FCoE HBA
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 · 3eb9c8be
      Linus Torvalds authored
      * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
        [CIFS] Avoid open on possible directories since Samba now rejects them
    • [CIFS] Avoid open on possible directories since Samba now rejects them · 8db14ca1
      Steve French authored
      Small change (mostly formatting) to limit lookup-based open calls to
      file create only.
      
      After discussion yesterday on samba-technical about the posix lookup
      regression, and after looking at a problem with cifs posix open to one
      particular Samba version, Jeff and JRA realized that the Samba server's
      behavior had changed in this area (posix open behavior on files vs.
      directories). To make this behavior consistent, JRA just made a fix to
      the Samba server that alters how it handles opens of directories (it
      now returns the equivalent of EISDIR instead of success). We don't know
      at lookup time whether the inode is a directory or a file, and so we
      don't know whether posix open will succeed with most current Samba
      servers. This change therefore skips the posix open code on lookup open
      and issues posix open only on creates. This keeps the semantic benefits
      we want (atomicity, posix byte range locks, improved write semantics on
      newly created files), and file create is still fast. It also avoids the
      problem Jeff noticed yesterday with "openat" (and some open directory
      calls) of non-cached directories against one version of Samba server.
      It will also work with future Samba versions, which include the fix JRA
      just pushed into the Samba server. I confirmed this approach with JRA
      yesterday and with Shirish today.
      
      Posix open is now called at lookup time only for file create. For opens
      (rather than creates), we do not yet know whether the target is a file
      or a directory, and current Samba no longer allows posix open on
      directories. We could therefore end up wasting an open call on what
      turns out to be a directory. For file opens, we wait until cifs_open to
      call posix open. It could be added here (at lookup) in the future.
      However, the performance cost of the extra network request when EISDIR
      or EACCES is returned would have to be weighed against the 50%
      reduction in network traffic in the other paths.
      Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com>
      Tested-by: Jeff Layton <jlayton@redhat.com>
      CC: Jeremy Allison <jra@samba.org>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
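      The resulting policy comes down to one decision based on the open flags.
      The sketch below is a hedged illustration, not the CIFS code.
      `choose_posix_open_site` and `enum posix_open_site` are hypothetical
      names, and server capability checks are left out.

      ```c
      #include <fcntl.h>

      /* Where the POSIX open request is issued under the policy above. */
      enum posix_open_site { POSIX_OPEN_AT_LOOKUP, POSIX_OPEN_AT_CIFS_OPEN };

      /* Hypothetical helper: only a create is known to target a file at lookup
       * time, so only creates use POSIX open during lookup.  Plain opens are
       * deferred to cifs_open, which avoids a wasted request that newer Samba
       * rejects with EISDIR when the target turns out to be a directory. */
      enum posix_open_site choose_posix_open_site(int open_flags)
      {
          if (open_flags & O_CREAT)
              return POSIX_OPEN_AT_LOOKUP;
          return POSIX_OPEN_AT_CIFS_OPEN;
      }
      ```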
    • icom: fix rmmod crash · 95caa0a9
      Breno Leitao authored
      Currently the icom driver crashes when it is removed, because the
      driver kfree()s the adapter structure before calling
      pci_release_regions(), which results in the following error:
      
        Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d33
        Faulting instruction address: 0xc000000000246b80
        Oops: Kernel access of bad area, sig: 11 [#1]
        ....
        [c000000012d436a0] [c0000000001002d0] .kfree+0x120/0x34c (unreliable)
        [c000000012d43730] [c000000000246d60] .pci_release_selected_regions+0x3c/0x68
        [c000000012d437c0] [d000000002d54700] .icom_kref_release+0xf4/0x118 [icom]
        [c000000012d43850] [c000000000232e50] .kref_put+0x74/0x94
        [c000000012d438d0] [d000000002d56c58] .icom_remove+0x40/0xa4 [icom]
        [c000000012d43960] [c000000000249e48] .pci_device_remove+0x50/0x90
        [c000000012d439e0] [c0000000002d68d8] .__device_release_driver+0x94/0xd4
        [c000000012d43a70] [c0000000002d7104] .driver_detach+0xf8/0x12c
        [c000000012d43b00] [c0000000002d549c] .bus_remove_driver+0xbc/0x11c
        [c000000012d43b90] [c0000000002d71dc] .driver_unregister+0x60/0x80
        [c000000012d43c20] [c00000000024a07c] .pci_unregister_driver+0x44/0xe8
        [c000000012d43cb0] [d000000002d56bf4] .icom_exit+0x1c/0x40 [icom]
        [c000000012d43d30] [c000000000095fa8] .SyS_delete_module+0x214/0x2a8
        [c000000012d43e30] [c00000000000852c] syscall_exit+0x0/0x40
      Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
      Cc: stable@kernel.org
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
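      The fix amounts to a teardown ordering rule: release what is reached
      through the adapter structure before freeing the structure itself. The
      0x6b6b6b6b... address in the oops is the slab allocator's free-poison
      pattern, a typical sign of a use-after-free. The user-space sketch below
      illustrates the corrected order. Its types and helpers (`fake_pci_dev`,
      `fake_adapter`, `fake_release_regions`, `fake_adapter_release`) are
      hypothetical, and it is not the icom driver code.

      ```c
      #include <stdlib.h>

      /* Hypothetical stand-ins: a device whose regions must be released, and an
       * adapter structure that holds the pointer to it. */
      struct fake_pci_dev { int regions_held; };
      struct fake_adapter { struct fake_pci_dev *pci_dev; };

      static void fake_release_regions(struct fake_pci_dev *dev)
      {
          dev->regions_held = 0;
      }

      /* Release in the fixed order: use adapter->pci_dev while the adapter is
       * still valid, and free the adapter last.  Freeing it first and then
       * reading adapter->pci_dev would be the use-after-free described above. */
      static void fake_adapter_release(struct fake_adapter *adapter)
      {
          fake_release_regions(adapter->pci_dev);
          free(adapter);
      }
      ```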
  3. 22 May, 2009 25 commits
  4. 21 May, 2009 2 commits
  5. 20 May, 2009 4 commits