1. 22 Oct, 2007 16 commits
    • Intel IOMMU: Intel iommu cmdline option - forcedac · 7d3b03ce
      Keshavamurthy, Anil S authored
      Introduce the intel_iommu=forcedac command-line option.  This option helps
      verify that a PCI device can handle physical DMA-able addresses greater
      than 4G.
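      As a rough userspace illustration of how the value of such an option could
      be scanned for the "forcedac" token, here is a minimal sketch.  The names
      parse_intel_iommu_opts() and struct iommu_opts are hypothetical, and the
      comma-separated token syntax is an assumption; this is not the driver's
      actual parsing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical container for parsed options; not a kernel structure. */
struct iommu_opts {
	bool forcedac;	/* set when "forcedac" appears in the option string */
};

/*
 * Scan the value of an "intel_iommu=" parameter, e.g. "forcedac", and
 * record whether the "forcedac" token is present.  Tokens are assumed to
 * be comma-separated; unknown tokens are ignored.
 */
static void parse_intel_iommu_opts(const char *str, struct iommu_opts *opts)
{
	assert(str != NULL && opts != NULL);
	opts->forcedac = false;

	while (*str) {
		size_t len = strcspn(str, ",");	/* length of the current token */

		if (len == 8 && strncmp(str, "forcedac", 8) == 0)
			opts->forcedac = true;

		str += len;
		if (*str == ',')
			str++;	/* skip the separator */
	}
}
```

      Calling parse_intel_iommu_opts("forcedac", &opts) sets opts.forcedac, while
      a near-miss such as "forcedacx" leaves it clear.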
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Intel IOMMU: Avoid memory allocation failures in dma map api calls · eb3fa7cb
      Keshavamurthy, Anil S authored
      The Intel IOMMU driver needs memory during DMA map calls to set up its
      internal page tables and other data structures.  These DMA map calls are
      mostly made in interrupt context or with a spinlock held by the upper-level
      drivers (network/storage drivers).  To avoid memory allocation failures under
      low-memory conditions, this patch temporarily sets the PF_MEMALLOC flag for
      the current task before making memory allocation calls.
      
      We evaluated mempools as a backup when kmem_cache_alloc() fails
      and found that mempools are really not useful here because
       1) We don't know for sure how much to reserve in advance
       2) And mempools are not useful for GFP_ATOMIC case (as we call
          memory alloc functions with GFP_ATOMIC)
      
      (akpm: point 2 is wrong...)
      
      With the PF_MEMALLOC flag set in current->flags, the VM subsystem skips the
      watermark checks before allocating memory, thus guaranteeing memory down to
      the last free page.  Further, judging by the __alloc_pages() function in
      mm/page_alloc.c, this flag appears to be useful only in non-interrupt
      context.
      
      If we are in interrupt context and memory allocation in the IOMMU driver fails
      for some reason, the DMA map APIs will return failure and it is up to the
      higher-level drivers to retry.  If an upper-level driver then programs the
      controller with a bad DMA virtual address, the IOMMU will block that DMA
      transaction, preventing any corruption of main memory.
      
      So far in our test scenario, we were unable to create any memory allocation
      failure inside dma map api calls.
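
      The "temporarily set PF_MEMALLOC around the allocation" pattern described
      above can be sketched in userspace roughly as follows.  Everything here is a
      stand-in: struct task, current_task, mock_atomic_alloc() and
      iommu_alloc_sketch() are hypothetical names, malloc() replaces
      kmem_cache_alloc(), and the flag value is only illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the kernel flag; the value is illustrative only. */
#define PF_MEMALLOC 0x00000800u

/* Hypothetical stand-in for the kernel's task structure. */
struct task {
	unsigned int flags;
};

/* Hypothetical stand-in for the kernel's "current" task. */
static struct task current_task;

/* Demo aid: records whether PF_MEMALLOC was set while allocating. */
static int memalloc_seen_during_alloc;

/* Hypothetical stand-in for kmem_cache_alloc(cachep, GFP_ATOMIC). */
static void *mock_atomic_alloc(size_t size)
{
	memalloc_seen_during_alloc = !!(current_task.flags & PF_MEMALLOC);
	return malloc(size);
}

/* Allocate with PF_MEMALLOC set, then put the flag back as it was. */
static void *iommu_alloc_sketch(size_t size)
{
	unsigned int saved = current_task.flags & PF_MEMALLOC;
	void *p;

	current_task.flags |= PF_MEMALLOC;
	p = mock_atomic_alloc(size);

	/* Clear the bit only if the caller did not already have it set. */
	if (!saved)
		current_task.flags &= ~PF_MEMALLOC;
	return p;
}
```

      Saving the previous state of the bit matters because a caller that already
      runs with PF_MEMALLOC set must not have it cleared behind its back.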
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Intel IOMMU: Intel IOMMU driver · ba395927
      Keshavamurthy, Anil S authored
      Actual intel IOMMU driver.  Hardware spec can be found at:
      http://www.intel.com/technology/virtualization
      
      This driver sets the X86_64 'dma_ops' and so hooks into the standard DMA APIs.
      This way, PCI drivers get virtual DMA addresses.  The change is transparent
      to PCI drivers.
      
      [akpm@linux-foundation.org: remove unneeded cast]
      [akpm@linux-foundation.org: build fix]
      [bunk@stusta.de: fix duplicate CONFIG_DMAR Makefile line]
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Intel IOMMU: IOVA allocation and management routines · f8de50eb
      Keshavamurthy, Anil S authored
      This code implements generic IOVA allocation and management.  As per Dave's
      suggestion, we now allocate IO virtual addresses from the higher DMA limit
      address rather than from the lower end, which eliminates the need to preserve
      the IO virtual address for multiple devices sharing the same domain virtual
      address.

      This code also uses red-black trees to store the allocated and reserved iova
      nodes.  This showed a good performance improvement over the previous linear
      linked list.
      
      [akpm@linux-foundation.org: remove inlines]
      [akpm@linux-foundation.org: coding style fixes]
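
      The idea of handing out addresses from the upper limit downward can be
      sketched as below.  To keep the example short it stores ranges in a small
      sorted array rather than the red-black tree the commit uses, so it shows
      only the top-down search, not the data structure.  All names
      (struct iova_space, alloc_top_down(), MAX_RANGES) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANGES 32

/* One allocated or reserved range of page frame numbers, inclusive. */
struct range {
	unsigned long lo, hi;
};

/* Ranges are kept sorted by ascending address and never overlap. */
struct iova_space {
	struct range r[MAX_RANGES];
	size_t n;
};

/*
 * Allocate `size` pages at the highest free position not above `limit`.
 * Returns 0 and stores the first pfn in *out, or -1 when nothing fits.
 * Assumes limit and all stored pfns are below ULONG_MAX.
 */
static int alloc_top_down(struct iova_space *s, unsigned long size,
			  unsigned long limit, unsigned long *out)
{
	size_t i = s->n;		/* slot where a new range would go */
	unsigned long top = limit;	/* highest pfn still considered free */

	assert(size > 0);
	if (s->n == MAX_RANGES)
		return -1;

	for (;;) {
		/* Lowest free pfn of the gap below slot i. */
		unsigned long floor = (i > 0) ? s->r[i - 1].hi + 1 : 0;

		if (top >= floor && top - floor + 1 >= size) {
			size_t j;

			/* Shift higher ranges up one slot to keep order. */
			for (j = s->n; j > i; j--)
				s->r[j] = s->r[j - 1];
			s->r[i].lo = top - size + 1;
			s->r[i].hi = top;
			s->n++;
			*out = s->r[i].lo;
			return 0;
		}
		if (i == 0)
			return -1;	/* no gap left below */
		i--;
		if (s->r[i].lo == 0)
			return -1;	/* lowest range starts at pfn 0 */
		if (s->r[i].lo - 1 < top)
			top = s->r[i].lo - 1;
	}
}
```

      With an empty space and a limit of 0xFFFFF, two 16-page requests return
      0xFFFF0 and then 0xFFFE0, i.e. each allocation sits directly below the
      previous one.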
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Intel IOMMU: clflush_cache_range now takes size param · a9c55b3b
      Keshavamurthy, Anil S authored
      Introduce the size param for clflush_cache_range().
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Intel IOMMU: PCI generic helper function · 994a65e2
      Keshavamurthy, Anil S authored
      When devices are under a p2p bridge, upstream transactions get replaced by the
      device id of the bridge as it owns the PCIE transaction.  Hence it's necessary
      to set up translations on behalf of the bridge as well.  Due to this limitation
      all devices under a p2p bridge share the same domain in a DMAR.

      We just cache the type of device, whether it's a native PCIe device
      or not, for later use.
      
      [akpm@linux-foundation.org: BUG_ON -> WARN_ON+recover]
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Intel IOMMU: DMAR detection and parsing logic · 10e5247f
      Keshavamurthy, Anil S authored
      This patch supports the upcoming Intel IOMMU hardware, a.k.a. Intel(R)
      Virtualization Technology for Directed I/O Architecture.  The hardware spec
      can be found here:
      http://www.intel.com/technology/virtualization/index.htm
      
      FAQ! (questions from akpm, answers from ak)
      
      > So...  what's all this code for?
      >
      > I assume that the intent here is to speed things up under Xen, etc?
      
      Yes in some cases, but not this code.  That would be the Xen version of this
      code that could potentially assign whole devices to guests.  I expect this to
      be only useful in some special cases though because most hardware is not
      virtualizable and you typically want an own instance for each guest.
      
      Ok at some point KVM might implement this too; i likely would use this code
      for this.
      
      > Do we
      > have any benchmark results to help us to decide whether a merge would be
      > justified?
      
      The main advantage for doing it in the normal kernel is not performance, but
      more safety.  Broken devices won't be able to corrupt memory by doing random
      DMA.
      
      Unfortunately that doesn't work for graphics yet; for that, user space
      interfaces for the X server are needed.
      
      There are some potential performance benefits too:
      
      - When you have a device that cannot address the complete address range an
        IOMMU can remap its memory instead of bounce buffering.  Remapping is likely
        cheaper than copying.
      
      - The IOMMU can merge sg lists into a single virtual block.  This could
        potentially speed up SG IO when the device is slow walking SG lists.  [I
        long ago benchmarked 5% on some block benchmark with an old MPT Fusion; but
        it probably depends a lot on the HBA]
      
      And you get better driver debugging because unexpected memory accesses from
      the devices will cause a trappable event.
      
      >
      > Does it slow anything down?
      
      It adds more overhead to each IO so yes.
      
      This patch:
      
      Add support for early detection and parsing of DMAR's (DMA Remapping) reported
      to OS via ACPI tables.
      
      DMA remapping (DMAR) device support enables independent address translations
      for Direct Memory Access (DMA) from devices.  These DMA remapping devices are
      reported via ACPI tables, which include the PCI device scope covered by each
      DMA remapping device.
      
      For detailed info on the specification of "Intel(R) Virtualization Technology
      for Directed I/O Architecture" please see
      http://www.intel.com/technology/virtualization/index.htm
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Len Brown <lenb@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ext2: avoid rec_len overflow with 64KB block size · 89910ccc
      Jan Kara authored
      With 64KB blocksize, a directory entry can have size 64KB which does not
      fit into 16 bits we have for entry length.  So we store 0xffff instead and
      convert the value when read from / written to disk.
      
      [akpm@linux-foundation.org: coding-style fixes]
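
      The conversion described above can be sketched as a pair of helpers.  The
      names rec_len_to_disk(), rec_len_from_disk() and MAX_REC_LEN are
      illustrative and may differ from the patch; byte-order handling of the
      on-disk field is left out.  The scheme relies on 0xffff never being a
      genuine length, which holds if entry lengths are always multiples of 4.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_REC_LEN 0xffffu	/* largest value a 16-bit field can hold */

/* Convert an in-memory entry length (at most 65536) to its 16-bit form. */
static uint16_t rec_len_to_disk(unsigned int len)
{
	assert(len <= 65536u);
	if (len == 65536u)
		return MAX_REC_LEN;	/* 64KB does not fit; store the marker */
	return (uint16_t)len;
}

/* Convert the 16-bit stored value back to the real entry length. */
static unsigned int rec_len_from_disk(uint16_t dlen)
{
	if (dlen == MAX_REC_LEN)
		return 65536u;		/* marker means a full 64KB entry */
	return dlen;
}
```

      A round trip through both helpers returns the original length for 65536
      as well as for any smaller valid length such as 12 or 4096.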
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dcache: don't expose uninitialized memory in /proc/<pid>/fd/<fd> · 321bcf92
      J. Bruce Fields authored
      Well, it's not especially important that target->d_iname get the contents
      of dentry->d_iname, but it's important that it get initialized with
      *something*, otherwise we're just exposing some random piece of memory to
      anyone who reads the link at /proc/<pid>/fd/<fd> for the deleted file, when
      it's still held open by someone.
      
      I've run a test program that copies a short (<36 character) name on top of a
      long (>=36 character) name and see that the first time I run it, without
      this patch, I get unpredictable results out of /proc/<pid>/fd/<fd>.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • capabilities: clean up file capability reading · b68680e4
      Serge E. Hallyn authored
      Simplify the vfs_cap_data structure.
      
      Also fix get_file_caps which was declaring
      __le32 v1caps[XATTR_CAPS_SZ] on the stack, but
      XATTR_CAPS_SZ is already * sizeof(__le32).
      
      [akpm@linux-foundation.org: coding-style fixes]
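
      The stack-size mistake mentioned above (using a byte count as an element
      count) can be illustrated with made-up constants.  CAPS_WORDS and
      CAPS_SZ_BYTES are hypothetical stand-ins, not the values from the kernel
      headers, and uint32_t stands in for __le32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration; not taken from kernel headers. */
#define CAPS_WORDS	3u
#define CAPS_SZ_BYTES	(sizeof(uint32_t) * CAPS_WORDS)	/* already in bytes */

/* Buggy shape: a byte count used as the number of 32-bit elements. */
static size_t oversized_buffer_bytes(void)
{
	uint32_t buf[CAPS_SZ_BYTES];

	return sizeof(buf);
}

/* Intended shape: divide the byte count back down to an element count. */
static size_t right_sized_buffer_bytes(void)
{
	uint32_t buf[CAPS_SZ_BYTES / sizeof(uint32_t)];

	return sizeof(buf);
}
```

      With these constants the first buffer takes 48 bytes of stack and the
      second 12, a factor of sizeof(uint32_t) apart.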
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Andrew Morgan <morgan@kernel.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory hotplug: make kmem_cache_node for SLUB on memory online avoid panic · b9049e23
      Yasunori Goto authored
      Fix a panic caused by a NULL pointer access to kmem_cache_node in
      discard_slab() after memory online.

      When memory online is called, kmem_cache_nodes are created for all SLUBs
      for the new node whose memory is available.
      
      slab_mem_going_online_callback() is called to make kmem_cache_node() in
      callback of memory online event.  If it (or other callbacks) fails, then
      slab_mem_offline_callback() is called for rollback.
      
      In memory offline, slab_mem_going_offline_callback() is called to shrink
      all slub cache, then slab_mem_offline_callback() is called later.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: locking fix]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory hotplug: rearrange memory hotplug notifier · 7b78d335
      Yasunori Goto authored
      The current memory notifier still has some defects.  (Fortunately, nothing
      uses it.)  This patch fixes and rearranges it.

        - Add start_pfn, nr_pages, and, if the node's status changes from/to a
          memoryless node, the node id to the information passed to callback
          functions.  Callbacks can't do anything without that information.
        - Add a going-online status notification.
          It is necessary for creating per-node structures before the node's
          pages are available.
        - Move the GOING_OFFLINE status notification after page isolation.
          It is a good place for callbacks to return memory such as caches,
          because returned pages are not used again.
        - Add CANCEL events for rolling back when an error occurs.
        - Delete the MEM_MAPPING_INVALID notification.  It will not be used.
        - Fix compile error of (un)register_memory_notifier().
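
      How a callback might use the extra information and the new going-online and
      cancel events can be sketched as follows.  This is a userspace mock: the enum
      values, struct mem_notify_arg, node_ready[] and demo_callback() are all
      hypothetical, and the convention that a negative node id means "node status
      unchanged" is an assumption made for the example.

```c
#include <assert.h>

/* Event codes named after the states listed above; values are arbitrary. */
enum mem_event {
	EV_GOING_ONLINE, EV_CANCEL_ONLINE, EV_ONLINE,
	EV_GOING_OFFLINE, EV_CANCEL_OFFLINE, EV_OFFLINE
};

/* Hypothetical argument block handed to each callback. */
struct mem_notify_arg {
	unsigned long start_pfn;
	unsigned long nr_pages;
	int status_change_nid;	/* node id, or negative if unchanged (assumed) */
};

#define MAX_NODES 4

/* Per-node state a subsystem might keep; 1 means the structure exists. */
static int node_ready[MAX_NODES];

static int demo_callback(enum mem_event ev, const struct mem_notify_arg *arg)
{
	int nid = arg->status_change_nid;

	if (nid < 0 || nid >= MAX_NODES)
		return 0;	/* no per-node work for this event */

	switch (ev) {
	case EV_GOING_ONLINE:
		node_ready[nid] = 1;	/* set up before pages are available */
		break;
	case EV_CANCEL_ONLINE:
	case EV_OFFLINE:
		node_ready[nid] = 0;	/* roll back, or tear down after offline */
		break;
	default:
		break;		/* other events need no per-node action here */
	}
	return 0;
}
```

      The going-online event gives the subsystem a point to build its per-node
      structure, and the cancel event lets it undo that work if a later step of
      the online operation fails.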
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory hotplug: document the memory hotplug notifier · 10020ca2
      Yasunori Goto authored
      Add a description of the event notification callback routine to the document.
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • i386: paravirt boot sequence · a24e7851
      Rusty Russell authored
      This patch uses the updated boot protocol to do paravirtualized boot.
      If the boot version is >= 2.07, then it will do two things:
      
       1. Check the bootparams loadflags to see if we should reload the
          segment registers and clear interrupts.  This is appropriate
          for normal native boot and some paravirtualized environments, but
          inappropriate for others.
      
       2. Check the hardware architecture, and dispatch to the appropriate
          kernel entrypoint.  If the bootloader doesn't set this, then we
          simply do the normal boot sequence.
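
      The two checks can be sketched as a small decision function.  This is an
      illustration only: struct boot_hdr is a hypothetical subset of the boot
      header, the 0x0207 version encoding and the KEEP_SEGMENTS bit value are
      assumptions, and the subarchitecture numbers are placeholders.

```c
#include <assert.h>
#include <stdint.h>

#define BOOT_PROTO_2_07	0x0207u	/* assumed encoding of version 2.07 */
#define KEEP_SEGMENTS	0x40u	/* assumed bit in loadflags */

/* Hypothetical subset of the boot header fields used here. */
struct boot_hdr {
	uint16_t version;
	uint8_t  loadflags;
	uint32_t hardware_subarch;
};

/* Placeholder entry points; real subarchitecture numbering may differ. */
enum entry { ENTRY_DEFAULT, ENTRY_SUBARCH_1, ENTRY_SUBARCH_2 };

struct boot_plan {
	int reload_segments_and_cli;	/* step 1 decision */
	enum entry entry;		/* step 2 decision */
};

static struct boot_plan plan_boot(const struct boot_hdr *h)
{
	struct boot_plan p = { 1, ENTRY_DEFAULT };

	if (h->version < BOOT_PROTO_2_07)
		return p;	/* older protocol: normal boot sequence */

	/* Step 1: skip the segment reload when the flag asks us to. */
	if (h->loadflags & KEEP_SEGMENTS)
		p.reload_segments_and_cli = 0;

	/* Step 2: dispatch on the subarchitecture; 0 or unknown is default. */
	switch (h->hardware_subarch) {
	case 1:
		p.entry = ENTRY_SUBARCH_1;
		break;
	case 2:
		p.entry = ENTRY_SUBARCH_2;
		break;
	default:
		p.entry = ENTRY_DEFAULT;
		break;
	}
	return p;
}
```

      A header with a version below 2.07 always yields the default plan, whatever
      the other fields contain, which matches the fallback to the normal boot
      sequence.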
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • update boot spec to 2.07 · e5371ac5
      Rusty Russell authored
      Updates for version 2.07 of the boot protocol.  This includes:
      
      load_flags.KEEP_SEGMENTS - flag to request/inhibit segment reloads
      hardware_subarch         - what subarchitecture we're booting under
      hardware_subarch_data    - per-architecture data
      
      The intention of these changes is to make booting a paravirtualized
      kernel work via the normal Linux boot protocol.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 21 Oct, 2007 24 commits