Commit c41ade2e authored by Matthew Wilcox, committed by Jesse Barnes

Rewrite MSI-HOWTO

I didn't find the previous version very useful, so I rewrote it.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
Reviewed-by: Grant Grundler <grundler@parisc-linunx.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
parent 0994375e
@@ -4,506 +4,302 @@
 Revised Feb 12, 2004 by Martine Silbermann
 email: Martine.Silbermann@hp.com
 Revised Jun 25, 2004 by Tom L Nguyen
+Revised Jul 9, 2008 by Matthew Wilcox <willy@linux.intel.com>
+Copyright 2003, 2008 Intel Corporation

 1. About this guide

-This guide describes the basics of Message Signaled Interrupts (MSI),
-the advantages of using MSI over traditional interrupt mechanisms,
-and how to enable your driver to use MSI or MSI-X. Also included is
-a Frequently Asked Questions (FAQ) section.
+This guide describes the basics of Message Signaled Interrupts (MSIs),
+the advantages of using MSI over traditional interrupt mechanisms, how
+to change your driver to use MSI or MSI-X and some basic diagnostics to
+try if a device doesn't support MSIs.
-
-1.1 Terminology
-
-PCI devices can be single-function or multi-function. In either case,
-when this text talks about enabling or disabling MSI on a "device
-function," it is referring to one specific PCI device and function and
-not to all functions on a PCI device (unless the PCI device has only
-one function).
-
-2. Copyright 2003 Intel Corporation
-
-3. What is MSI/MSI-X?
-
-Message Signaled Interrupt (MSI), as described in the PCI Local Bus
-Specification Revision 2.3 or later, is an optional feature, and a
-required feature for PCI Express devices. MSI enables a device function
-to request service by sending an Inbound Memory Write on its PCI bus to
-the FSB as a Message Signal Interrupt transaction. Because MSI is
-generated in the form of a Memory Write, all transaction conditions,
-such as a Retry, Master-Abort, Target-Abort or normal completion, are
-supported.
+
+2. What are MSIs?
+
+A Message Signaled Interrupt is a write from the device to a special
+address which causes an interrupt to be received by the CPU.
+
+The MSI capability was first specified in PCI 2.2 and was later enhanced
+in PCI 3.0 to allow each interrupt to be masked individually. The MSI-X
+capability was also introduced with PCI 3.0. It supports more interrupts
+per device than MSI and allows interrupts to be independently configured.
+
+Devices may support both MSI and MSI-X, but only one can be enabled at
+a time.
+
+3. Why use MSIs?
+
+There are three reasons why using MSIs can give an advantage over
+traditional pin-based interrupts.
-
-A PCI device that supports MSI must also support pin IRQ assertion
-interrupt mechanism to provide backward compatibility for systems that
-do not support MSI. In systems which support MSI, the bus driver is
-responsible for initializing the message address and message data of
-the device function's MSI/MSI-X capability structure during device
-initial configuration.
-
-An MSI capable device function indicates MSI support by implementing
-the MSI/MSI-X capability structure in its PCI capability list. The
-device function may implement both the MSI capability structure and
-the MSI-X capability structure; however, the bus driver should not
-enable both.
-
-The MSI capability structure contains Message Control register,
-Message Address register and Message Data register. These registers
-provide the bus driver control over MSI. The Message Control register
-indicates the MSI capability supported by the device. The Message
-Address register specifies the target address and the Message Data
-register specifies the characteristics of the message. To request
-service, the device function writes the content of the Message Data
-register to the target address. The device and its software driver
-are prohibited from writing to these registers.
-
-The MSI-X capability structure is an optional extension to MSI. It
-uses an independent and separate capability structure. There are
-some key advantages to implementing the MSI-X capability structure
-over the MSI capability structure as described below.
-
-    - Support a larger maximum number of vectors per function.
-
-    - Provide the ability for system software to configure
-      each vector with an independent message address and message
-      data, specified by a table that resides in Memory Space.
+
+Pin-based PCI interrupts are often shared amongst several devices.
+To support this, the kernel must call each interrupt handler associated
+with an interrupt, which leads to reduced performance for the system as
+a whole. MSIs are never shared, so this problem cannot arise.
+
+When a device writes data to memory, then raises a pin-based interrupt,
+it is possible that the interrupt may arrive before all the data has
+arrived in memory (this becomes more likely with devices behind PCI-PCI
+bridges). In order to ensure that all the data has arrived in memory,
+the interrupt handler must read a register on the device which raised
+the interrupt. PCI transaction ordering rules require that all the data
+arrives in memory before the value can be returned from the register.
+Using MSIs avoids this problem as the interrupt-generating write cannot
+pass the data writes, so by the time the interrupt is raised, the driver
+knows that all the data has arrived in memory.
+
+PCI devices can only support a single pin-based interrupt per function.
+Often drivers have to query the device to find out what event has
+occurred, slowing down interrupt handling for the common case. With
+MSIs, a device can support more interrupts, allowing each interrupt
+to be specialised to a different purpose. One possible design gives
+infrequent conditions (such as errors) their own interrupt which allows
+the driver to handle the normal interrupt handling path more efficiently.
+Other possible designs include giving one interrupt to each packet queue
+in a network card or each port in a storage controller.
+
+4. How to use MSIs
+
+PCI devices are initialised to use pin-based interrupts. The device
+driver has to set up the device to use MSI or MSI-X. Not all machines
+support MSIs correctly, and for those machines, the APIs described below
+will simply fail and the device will continue to use pin-based interrupts.
-
-    - MSI and MSI-X both support per-vector masking. Per-vector
-      masking is an optional extension of MSI but a required
-      feature for MSI-X. Per-vector masking provides the kernel the
-      ability to mask/unmask a single MSI while running its
-      interrupt service routine. If per-vector masking is
-      not supported, then the device driver should provide the
-      hardware/software synchronization to ensure that the device
-      generates MSI when the driver wants it to do so.
-
-4. Why use MSI?
-
-As a benefit to the simplification of board design, MSI allows board
-designers to remove out-of-band interrupt routing. MSI is another
-step towards a legacy-free environment.
+
+4.1 Include kernel support for MSIs
+
+To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
+option enabled. This option is only available on some architectures,
+and it may depend on some other options also being set. For example,
+on x86, you must also enable X86_UP_APIC or SMP in order to see the
+CONFIG_PCI_MSI option.
+
+4.2 Using MSI
+
+Most of the hard work is done for the driver in the PCI layer. It simply
+has to request that the PCI layer set up the MSI capability for this
+device.
+
+4.2.1 pci_enable_msi
-
-Due to increasing pressure on chipset and processor packages to
-reduce pin count, the need for interrupt pins is expected to
-diminish over time. Devices, due to pin constraints, may implement
-messages to increase performance.
-
-PCI Express endpoints uses INTx emulation (in-band messages) instead
-of IRQ pin assertion. Using INTx emulation requires interrupt
-sharing among devices connected to the same node (PCI bridge) while
-MSI is unique (non-shared) and does not require BIOS configuration
-support. As a result, the PCI Express technology requires MSI
-support for better interrupt performance.
-
-Using MSI enables the device functions to support two or more
-vectors, which can be configured to target different CPUs to
-increase scalability.
-
-5. Configuring a driver to use MSI/MSI-X
-
-By default, the kernel will not enable MSI/MSI-X on all devices that
-support this capability. The CONFIG_PCI_MSI kernel option
-must be selected to enable MSI/MSI-X support.
-
-5.1 Including MSI/MSI-X support into the kernel
-
-To allow MSI/MSI-X capable device drivers to selectively enable
-MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
-below), the VECTOR based scheme needs to be enabled by setting
-CONFIG_PCI_MSI during kernel config.
-
-Since the target of the inbound message is the local APIC, providing
-CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.
-
-5.2 Configuring for MSI support
-
-Due to the non-contiguous fashion in vector assignment of the
-existing Linux kernel, this version does not support multiple
-messages regardless of a device function is capable of supporting
-more than one vector. To enable MSI on a device function's MSI
-capability structure requires a device driver to call the function
-pci_enable_msi() explicitly.
-
-5.2.1 API pci_enable_msi

 int pci_enable_msi(struct pci_dev *dev)

-With this new API, a device driver that wants to have MSI
-enabled on its device function must call this API to enable MSI.
-A successful call will initialize the MSI capability structure
-with ONE vector, regardless of whether a device function is
-capable of supporting multiple messages. This vector replaces the
-pre-assigned dev->irq with a new MSI vector. To avoid a conflict
-of the new assigned vector with existing pre-assigned vector requires
-a device driver to call this API before calling request_irq().
+A successful call will allocate ONE interrupt to the device, regardless
+of how many MSIs the device supports. The device will be switched from
+pin-based interrupt mode to MSI mode. The dev->irq number is changed
+to a new number which represents the message signaled interrupt.
+This function should be called before the driver calls request_irq()
+since enabling MSIs disables the pin-based IRQ and the driver will not
+receive interrupts on the old interrupt.
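The ordering rule in 4.2.1 (enable MSI first, then read dev->irq and request it, falling back to the pin-based interrupt if enabling fails) can be illustrated with a small user-space model. It is only a sketch: `struct model_dev`, `model_enable_msi()` and `example_setup_irq()` are hypothetical stand-ins for struct pci_dev, pci_enable_msi() and a driver's probe path, and the IRQ numbers are invented. A real driver would call pci_enable_msi() and then pass dev->irq to request_irq().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct pci_dev used here. */
struct model_dev {
	unsigned int irq;	/* pin-based IRQ until MSI is enabled */
	bool msi_supported;	/* can this platform/device do MSI? */
	bool msi_enabled;
};

/* Models pci_enable_msi(): on success the device switches to MSI mode
 * and dev->irq is replaced (200 is an arbitrary example number). */
static int model_enable_msi(struct model_dev *dev)
{
	if (!dev->msi_supported)
		return -1;	/* device keeps using its pin-based IRQ */
	dev->msi_enabled = true;
	dev->irq = 200;
	return 0;
}

/* Probe-time sketch: try MSI first, and only then read dev->irq, which is
 * the number the driver would pass to request_irq(). */
static unsigned int example_setup_irq(struct model_dev *dev)
{
	/* Failure is not fatal: the pin-based interrupt still works. */
	(void)model_enable_msi(dev);
	return dev->irq;
}
```

Reading dev->irq only after the enable attempt is the point of the sketch: caching it earlier would leave the driver requesting the pin-based IRQ that MSI mode no longer delivers.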

-5.2.2 API pci_disable_msi
+4.2.2 pci_disable_msi

 void pci_disable_msi(struct pci_dev *dev)

-This API should always be used to undo the effect of pci_enable_msi()
-when a device driver is unloading. This API restores dev->irq with
-the pre-assigned IOAPIC vector and switches a device's interrupt
-mode to PCI pin-irq assertion/INTx emulation mode.
-
-Note that a device driver should always call free_irq() on the MSI vector
-that it has done request_irq() on before calling this API. Failure to do
-so results in a BUG_ON() and a device will be left with MSI enabled and
-leaks its vector.
+This function should be used to undo the effect of pci_enable_msi().
+Calling it restores dev->irq to the pin-based interrupt number and frees
+the previously allocated message signaled interrupt(s). The interrupt
+may subsequently be assigned to another device, so drivers should not
+cache the value of dev->irq.
-
-5.2.3 MSI mode vs. legacy mode diagram
-
-The below diagram shows the events which switch the interrupt
-mode on the MSI-capable device function between MSI mode and
-PIN-IRQ assertion mode.
-
-     --------------  pci_enable_msi         ------------------------
-    |              | <===================  |                        |
-    |   MSI MODE   |                       | PIN-IRQ ASSERTION MODE |
-    |              | ===================>  |                        |
-     --------------  pci_disable_msi        ------------------------
-
-Figure 1. MSI Mode vs. Legacy Mode
-
-In Figure 1, a device operates by default in legacy mode. Legacy
-in this context means PCI pin-irq assertion or PCI-Express INTx
-emulation. A successful MSI request (using pci_enable_msi()) switches
-a device's interrupt mode to MSI mode. A pre-assigned IOAPIC vector
-stored in dev->irq will be saved by the PCI subsystem and a new
-assigned MSI vector will replace dev->irq.
-
-To return back to its default mode, a device driver should always call
-pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
-device driver should always call free_irq() on the MSI vector it has
-done request_irq() on before calling pci_disable_msi(). Failure to do
-so results in a BUG_ON() and a device will be left with MSI enabled and
-leaks its vector. Otherwise, the PCI subsystem restores a device's
-dev->irq with a pre-assigned IOAPIC vector and marks the released
-MSI vector as unused.
-
-Once being marked as unused, there is no guarantee that the PCI
-subsystem will reserve this MSI vector for a device. Depending on
-the availability of current PCI vector resources and the number of
-MSI/MSI-X requests from other drivers, this MSI may be re-assigned.
-
-For the case where the PCI subsystem re-assigns this MSI vector to
-another driver, a request to switch back to MSI mode may result
-in being assigned a different MSI vector or a failure if no more
-vectors are available.
-
-5.3 Configuring for MSI-X support
-
-Due to the ability of the system software to configure each vector of
-the MSI-X capability structure with an independent message address
-and message data, the non-contiguous fashion in vector assignment of
-the existing Linux kernel has no impact on supporting multiple
-messages on an MSI-X capable device functions. To enable MSI-X on
-a device function's MSI-X capability structure requires its device
-driver to call the function pci_enable_msix() explicitly.
-
-The function pci_enable_msix(), once invoked, enables either
-all or nothing, depending on the current availability of PCI vector
-resources. If the PCI vector resources are available for the number
-of vectors requested by a device driver, this function will configure
-the MSI-X table of the MSI-X capability structure of a device with
-requested messages. To emphasize this reason, for example, a device
-may be capable for supporting the maximum of 32 vectors while its
-software driver usually may request 4 vectors. It is recommended
-that the device driver should call this function once during the
-initialization phase of the device driver.
-
-Unlike the function pci_enable_msi(), the function pci_enable_msix()
-does not replace the pre-assigned IOAPIC dev->irq with a new MSI
-vector because the PCI subsystem writes the 1:1 vector-to-entry mapping
-into the field vector of each element contained in a second argument.
-Note that the pre-assigned IOAPIC dev->irq is valid only if the device
-operates in PIN-IRQ assertion mode. In MSI-X mode, any attempt at
-using dev->irq by the device driver to request for interrupt service
-may result in unpredictable behavior.
-
-For each MSI-X vector granted, a device driver is responsible for calling
-other functions like request_irq(), enable_irq(), etc. to enable
-this vector with its corresponding interrupt service handler. It is
-a device driver's choice to assign all vectors with the same
-interrupt service handler or each vector with a unique interrupt
-service handler.
-
-5.3.1 Handling MMIO address space of MSI-X Table
-
-The PCI 3.0 specification has implementation notes that MMIO address
-space for a device's MSI-X structure should be isolated so that the
-software system can set different pages for controlling accesses to the
-MSI-X structure. The implementation of MSI support requires the PCI
-subsystem, not a device driver, to maintain full control of the MSI-X
-table/MSI-X PBA (Pending Bit Array) and MMIO address space of the MSI-X
-table/MSI-X PBA. A device driver should not access the MMIO address
-space of the MSI-X table/MSI-X PBA.
-
-5.3.2 API pci_enable_msix
-
-int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
-
-This API enables a device driver to request the PCI subsystem
-to enable MSI-X messages on its hardware device. Depending on
-the availability of PCI vectors resources, the PCI subsystem enables
-either all or none of the requested vectors.
+
+A device driver must always call free_irq() on the interrupt(s)
+for which it has called request_irq() before calling this function.
+Failure to do so will result in a BUG_ON(), the device will be left with
+MSI enabled and will leak its vector.
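To illustrate the teardown rule for pci_disable_msi() (free_irq() first, then disable), here is a user-space model in which the hypothetical `model_disable_msi()` returns -1 where the kernel would hit BUG_ON(). None of these names are kernel APIs; the sketch only shows the call order and the restoration of the pin-based dev->irq.

```c
#include <stdbool.h>

/* Hypothetical model of the state that pci_disable_msi() depends on. */
struct model_dev {
	unsigned int irq;	/* current IRQ number (MSI while enabled) */
	unsigned int pin_irq;	/* pin-based IRQ to restore */
	bool msi_enabled;
	bool irq_requested;	/* set by request_irq(), cleared by free_irq() */
};

/* Models free_irq() on the device's interrupt. */
static void model_free_irq(struct model_dev *dev)
{
	dev->irq_requested = false;
}

/* Models pci_disable_msi(). Where the kernel would hit BUG_ON() because
 * the interrupt is still requested, this model returns -1 instead. */
static int model_disable_msi(struct model_dev *dev)
{
	if (dev->irq_requested)
		return -1;
	dev->msi_enabled = false;
	dev->irq = dev->pin_irq;	/* dev->irq reverts to the pin-based IRQ */
	return 0;
}

/* Correct order on driver removal: free the interrupt, then disable MSI. */
static int example_teardown(struct model_dev *dev)
{
	model_free_irq(dev);
	return model_disable_msi(dev);
}
```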
-
-Argument 'dev' points to the device (pci_dev) structure.
-Argument 'entries' is a pointer to an array of msix_entry structs.
-The number of entries is indicated in argument 'nvec'.
-struct msix_entry is defined in /driver/pci/msi.h:
+
+4.3 Using MSI-X
+
+The MSI-X capability is much more flexible than the MSI capability.
+It supports up to 2048 interrupts, each of which can be controlled
+independently. To support this flexibility, drivers must use an array of
+`struct msix_entry':

 struct msix_entry {
 	u16	vector; /* kernel uses to write alloc vector */
 	u16	entry; /* driver uses to specify entry */
 };

-A device driver is responsible for initializing the field 'entry' of
-each element with a unique entry supported by MSI-X table. Otherwise,
--EINVAL will be returned as a result. A successful return of zero
-indicates the PCI subsystem completed initializing each of the requested
-entries of the MSI-X table with message address and message data.
-Last but not least, the PCI subsystem will write the 1:1
+This allows for the device to use these interrupts in a sparse fashion;
+for example it could use interrupts 3 and 1027 and allocate only a
+two-element array. The driver is expected to fill in the 'entry' value
+in each element of the array to indicate which entries it wants the kernel
+to assign interrupts for. It is invalid to fill in two entries with the
+same number.
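A sketch of the sparse-table example in 4.3 (entries 3 and 1027 in a two-element array), written for user space with `u16` typedef'd to uint16_t so it compiles outside the kernel. The struct layout mirrors the one shown above; `example_fill_sparse()` and `example_entries_valid()` are hypothetical helpers, the latter enforcing the "no two entries with the same number" rule before the array would be handed to pci_enable_msix().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;	/* the kernel's u16, for a user-space build */

/* Same layout as the struct msix_entry shown above. */
struct msix_entry {
	u16 vector;	/* kernel uses to write alloc vector */
	u16 entry;	/* driver uses to specify entry */
};

/* Sparse use: only table entries 3 and 1027 are wanted, so a
 * two-element array is enough. */
static void example_fill_sparse(struct msix_entry e[2])
{
	e[0].entry = 3;
	e[1].entry = 1027;
}

/* Reject arrays in which two elements name the same table entry. */
static bool example_entries_valid(const struct msix_entry *e, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (size_t j = i + 1; j < n; j++)
			if (e[i].entry == e[j].entry)
				return false;
	return true;
}
```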
-vector-to-entry mapping into the field 'vector' of each element. A
-device driver is responsible for keeping track of allocated MSI-X
-vectors in its internal data structure.
-
-A return of zero indicates that the number of MSI-X vectors was
-successfully allocated. A return of greater than zero indicates
-MSI-X vector shortage. Or a return of less than zero indicates
-a failure. This failure may be a result of duplicate entries
-specified in second argument, or a result of no available vector,
-or a result of failing to initialize MSI-X table entries.
-
-5.3.3 API pci_disable_msix
+
+4.3.1 pci_enable_msix
+
+int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
+
+Calling this function asks the PCI subsystem to allocate 'nvec' MSIs.
+The 'entries' argument is a pointer to an array of msix_entry structs
+which should be at least 'nvec' entries in size. On success, the
+function will return 0 and the device will have been switched into
+MSI-X interrupt mode. The 'vector' elements in each entry will have
+been filled in with the interrupt number. The driver should then call
+request_irq() for each 'vector' that it decides to use.
+
+If this function returns a negative number, it indicates an error and
+the driver should not attempt to allocate any more MSI-X interrupts for
+this device. If it returns a positive number, it indicates the maximum
+number of interrupt vectors that could have been allocated.
+
+This function, in contrast with pci_enable_msi(), does not adjust
+dev->irq. The device will not generate interrupts for this interrupt
+number once MSI-X is enabled. The device driver is responsible for
+keeping track of the interrupts assigned to the MSI-X vectors so it can
+free them again later.
+
+Device drivers should normally call this function once per device
+during the initialization phase.
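The return convention of pci_enable_msix() described in 4.3.1 (0 on success, a positive maximum when fewer vectors are available, negative on error) suggests a retry loop. Below is a user-space sketch; `model_enable_msix()` is a hypothetical stand-in that mimics that convention, and the minimum-vector threshold in `example_setup_msix()` is an assumed driver policy, not something the text prescribes.

```c
#include <stdint.h>

typedef uint16_t u16;

struct msix_entry {
	u16 vector;
	u16 entry;
};

/* Hypothetical stand-in for a device plus the platform's free vectors. */
struct model_dev {
	int available;	/* how many MSI-X vectors could be granted */
	int next_irq;	/* first interrupt number handed out (made up) */
};

/* Models the documented convention: 0 = success with vectors filled in,
 * > 0 = the most that could have been allocated, < 0 = error. */
static int model_enable_msix(struct model_dev *dev, struct msix_entry *entries,
			     int nvec)
{
	if (dev->available <= 0)
		return -1;
	if (nvec > dev->available)
		return dev->available;
	for (int i = 0; i < nvec; i++)
		entries[i].vector = (u16)(dev->next_irq + i);
	return 0;
}

/* Ask for 'want' vectors, retry with the smaller count reported, and give
 * up below 'min'. Returns the number obtained, or a negative value. */
static int example_setup_msix(struct model_dev *dev, struct msix_entry *entries,
			      int want, int min)
{
	int nvec = want;

	while (nvec >= min) {
		int rc = model_enable_msix(dev, entries, nvec);

		if (rc == 0)
			return nvec;	/* entries[0..nvec-1].vector are valid */
		if (rc < 0)
			return rc;	/* error: do not try again */
		if (rc >= nvec)
			return -1;	/* defensive: a retry would not shrink */
		nvec = rc;		/* retry with the reported maximum */
	}
	return -1;			/* fewer than 'min' vectors available */
}
```

After a successful return the driver would call request_irq() on each entries[i].vector it intends to use and remember those numbers for teardown.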
+
+4.3.2 pci_disable_msix

 void pci_disable_msix(struct pci_dev *dev)

-This API should always be used to undo the effect of pci_enable_msix()
-when a device driver is unloading. Note that a device driver should
-always call free_irq() on all MSI-X vectors it has done request_irq()
-on before calling this API. Failure to do so results in a BUG_ON() and
-a device will be left with MSI-X enabled and leaks its vectors.
-
-5.3.4 MSI-X mode vs. legacy mode diagram
-
-The below diagram shows the events which switch the interrupt
+This API should be used to undo the effect of pci_enable_msix(). It frees
+the previously allocated message signaled interrupts. The interrupts may
+subsequently be assigned to another device, so drivers should not cache
+the value of the 'vector' elements over a call to pci_disable_msix().
+
+A device driver must always call free_irq() on the interrupt(s)
+for which it has called request_irq() before calling this function.
+Failure to do so will result in a BUG_ON(), the device will be left with
+MSI enabled and will leak its vector.
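A matching teardown for MSI-X, following 4.3.2: free every vector that was requested, then call pci_disable_msix(). This user-space model uses hypothetical names (`example_priv`, `model_free_irq()`, `model_disable_msix()`) and an assumed fixed count of four vectors; the model returns -1 where the kernel would hit BUG_ON().

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t u16;

struct msix_entry {
	u16 vector;
	u16 entry;
};

#define EXAMPLE_NVEC 4	/* hypothetical number of vectors in use */

/* Hypothetical driver state: which granted vectors had request_irq()
 * called on them, and whether MSI-X is still enabled. */
struct example_priv {
	struct msix_entry entries[EXAMPLE_NVEC];
	bool requested[EXAMPLE_NVEC];
	bool msix_enabled;
};

/* Models free_irq() on one vector. */
static void model_free_irq(struct example_priv *p, int i)
{
	p->requested[i] = false;
}

/* Models pci_disable_msix(): refuse (where the kernel would BUG_ON())
 * while any vector is still requested. */
static int model_disable_msix(struct example_priv *p)
{
	for (int i = 0; i < EXAMPLE_NVEC; i++)
		if (p->requested[i])
			return -1;
	p->msix_enabled = false;
	return 0;
}

/* Free every requested vector, then disable MSI-X. The driver must not
 * reuse entries[i].vector after this returns. */
static int example_teardown_msix(struct example_priv *p)
{
	for (int i = 0; i < EXAMPLE_NVEC; i++)
		if (p->requested[i])
			model_free_irq(p, i);
	return model_disable_msix(p);
}
```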
-mode on the MSI-X capable device function between MSI-X mode and
-PIN-IRQ assertion mode (legacy).
-
-     --------------  pci_enable_msix(,,n)   ------------------------
-    |              | <===================  |                        |
-    |  MSI-X MODE  |                       | PIN-IRQ ASSERTION MODE |
-    |              | ===================>  |                        |
-     --------------  pci_disable_msix       ------------------------
-
-Figure 2. MSI-X Mode vs. Legacy Mode
-
-In Figure 2, a device operates by default in legacy mode. A
-successful MSI-X request (using pci_enable_msix()) switches a
-device's interrupt mode to MSI-X mode. A pre-assigned IOAPIC vector
-stored in dev->irq will be saved by the PCI subsystem; however,
-unlike MSI mode, the PCI subsystem will not replace dev->irq with
-assigned MSI-X vector because the PCI subsystem already writes the 1:1
-vector-to-entry mapping into the field 'vector' of each element
-specified in second argument.
+
+4.3.3 The MSI-X Table
+
+The MSI-X capability specifies a BAR and offset within that BAR for the
+MSI-X Table. This address is mapped by the PCI subsystem, and should not
+be accessed directly by the device driver. If the driver wishes to
+mask or unmask an interrupt, it should call disable_irq() / enable_irq().
+
+4.4 Handling devices implementing both MSI and MSI-X capabilities
+
+If a device implements both MSI and MSI-X capabilities, it can
+run in either MSI mode or MSI-X mode but not both simultaneously.
+This is a requirement of the PCI spec, and it is enforced by the
+PCI layer. Calling pci_enable_msi() when MSI-X is already enabled or
+pci_enable_msix() when MSI is already enabled will result in an error.
+If a device driver wishes to switch between MSI and MSI-X at runtime,
+it must first quiesce the device, then switch it back to pin-interrupt
+mode, before calling pci_enable_msi() or pci_enable_msix() and resuming
+operation. This is not expected to be a common operation but may be
+useful for debugging or testing during development.
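The one-mode-at-a-time rule in 4.4 can be modelled as a three-state machine. This is a user-space sketch with hypothetical names (`model_enable()`, `model_disable()`, `example_switch()`); it deliberately leaves out quiescing the device and freeing its interrupts, which a real driver would do before the disable step.

```c
/* Hypothetical model: a device is in exactly one interrupt mode. */
enum example_mode { MODE_PIN, MODE_MSI, MODE_MSIX };

struct model_dev {
	enum example_mode mode;
};

/* Models pci_enable_msi()/pci_enable_msix(): enabling one flavour while
 * the other is active is an error, so this model only succeeds from
 * pin-based mode. */
static int model_enable(struct model_dev *dev, enum example_mode want)
{
	if (dev->mode != MODE_PIN)
		return -1;
	dev->mode = want;
	return 0;
}

/* Models pci_disable_msi()/pci_disable_msix(): back to pin-based mode. */
static void model_disable(struct model_dev *dev)
{
	dev->mode = MODE_PIN;
}

/* Runtime switch, assuming the device is already quiesced and its
 * interrupts freed: return to pin mode first, then enable the other mode. */
static int example_switch(struct model_dev *dev, enum example_mode want)
{
	model_disable(dev);
	return model_enable(dev, want);
}
```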
-
-To return back to its default mode, a device driver should always call
-pci_disable_msix() to undo the effect of pci_enable_msix(). Note that
-a device driver should always call free_irq() on all MSI-X vectors it
-has done request_irq() on before calling pci_disable_msix(). Failure
-to do so results in a BUG_ON() and a device will be left with MSI-X
-enabled and leaks its vectors. Otherwise, the PCI subsystem switches a
-device function's interrupt mode from MSI-X mode to legacy mode and
-marks all allocated MSI-X vectors as unused.
-
-Once being marked as unused, there is no guarantee that the PCI
-subsystem will reserve these MSI-X vectors for a device. Depending on
-the availability of current PCI vector resources and the number of
-MSI/MSI-X requests from other drivers, these MSI-X vectors may be
-re-assigned.
+
+4.5 Considerations when using MSIs
+
+4.5.1 Choosing between MSI-X and MSI
+
+If your device supports both MSI-X and MSI capabilities, you should use
+the MSI-X facilities in preference to the MSI facilities. As mentioned
+above, MSI-X supports any number of interrupts between 1 and 2048.
+In constrast, MSI is restricted to a maximum of 32 interrupts (and
+must be a power of two). In addition, the MSI interrupt vectors must
+be allocated consecutively, so the system may not be able to allocate
+as many vectors for MSI as it could for MSI-X. On some platforms, MSI
+interrupts must all be targetted at the same set of CPUs whereas MSI-X
+interrupts can all be targetted at different CPUs.
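The size limits in 4.5.1 (MSI: a power of two, at most 32; MSI-X: anything from 1 to 2048) are easy to express as checks. The helpers below are hypothetical; in particular, rounding a wanted count down to the largest acceptable MSI block is just one possible driver policy.

```c
#include <stdbool.h>

#define EXAMPLE_MSI_MAX  32	/* MSI limit stated in the text */
#define EXAMPLE_MSIX_MAX 2048	/* MSI-X limit stated in the text */

/* An MSI allocation must be a power of two no larger than 32. */
static bool example_msi_count_ok(unsigned int n)
{
	return n >= 1 && n <= EXAMPLE_MSI_MAX && (n & (n - 1)) == 0;
}

/* Any count from 1 to 2048 is acceptable for MSI-X. */
static bool example_msix_count_ok(unsigned int n)
{
	return n >= 1 && n <= EXAMPLE_MSIX_MAX;
}

/* Hypothetical policy: the largest valid MSI block not exceeding 'want'. */
static unsigned int example_msi_round_down(unsigned int want)
{
	unsigned int n = 1;

	if (want == 0)
		return 0;
	while (n * 2 <= want && n * 2 <= EXAMPLE_MSI_MAX)
		n *= 2;
	return n;
}
```

For a device that would like six interrupts, MSI can give it at most four under this policy, whereas MSI-X can give it exactly six.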
-
-For the case where the PCI subsystem re-assigned these MSI-X vectors
-to other drivers, a request to switch back to MSI-X mode may result
-being assigned with another set of MSI-X vectors or a failure if no
-more vectors are available.
-
-5.4 Handling function implementing both MSI and MSI-X capabilities
-
-For the case where a function implements both MSI and MSI-X
-capabilities, the PCI subsystem enables a device to run either in MSI
-mode or MSI-X mode but not both. A device driver determines whether it
-wants MSI or MSI-X enabled on its hardware device. Once a device
-driver requests for MSI, for example, it is prohibited from requesting
-MSI-X; in other words, a device driver is not permitted to ping-pong
-between MSI mod MSI-X mode during a run-time.
-
-5.5 Hardware requirements for MSI/MSI-X support
-
-MSI/MSI-X support requires support from both system hardware and
-individual hardware device functions.
+
+4.5.2 Spinlocks
+
+Most device drivers have a per-device spinlock which is taken in the
+interrupt handler. With pin-based interrupts or a single MSI, it is not
+necessary to disable interrupts (Linux guarantees the same interrupt will
+not be re-entered). If a device uses multiple interrupts, the driver
+must disable interrupts while the lock is held. If the device sends
+a different interrupt, the driver will deadlock trying to recursively
+acquire the spinlock.
+
+There are two solutions. The first is to take the lock with
+spin_lock_irqsave() or spin_lock_irq() (see
+Documentation/DocBook/kernel-locking). The second is to specify
+IRQF_DISABLED to request_irq() so that the kernel runs the entire
+interrupt routine with interrupts disabled.
+
+If your MSI interrupt routine does not hold the lock for the whole time
+it is running, the first solution may be best. The second solution is
+normally preferred as it avoids making two transitions from interrupt
+disabled to enabled and back again.
-
-5.5.1 Required x86 hardware support
-
-Since the target of MSI address is the local APIC CPU, enabling
-MSI/MSI-X support in the Linux kernel is dependent on whether existing
-system hardware supports local APIC. Users should verify that their
-system supports local APIC operation by testing that it runs when
-CONFIG_X86_LOCAL_APIC=y.
+
+4.6 How to tell whether MSI/MSI-X is enabled on a device
+
+Using 'lspci -v' (as root) may show some devices with "MSI", "Message
+Signalled Interrupts" or "MSI-X" capabilities. Each of these capabilities
+has an 'Enable' flag which will be followed with either "+" (enabled)
+or "-" (disabled).
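The 'Enable' flag described in 4.6 can be picked out of a line of 'lspci -v' output with a plain string search. This is a sketch under the assumption that the flag appears literally as "Enable+" or "Enable-" on the capability line; the exact lspci line format varies between versions, and `example_msi_enable_flag()` is a hypothetical helper, not part of any tool.

```c
#include <string.h>

/* Returns 1 if the line contains "Enable+", 0 if "Enable-", and -1 if
 * no Enable flag is found. Occurrences of "Enable" followed by anything
 * other than '+' or '-' are skipped. */
static int example_msi_enable_flag(const char *line)
{
	const char *p = strstr(line, "Enable");

	while (p != NULL) {
		char c = p[strlen("Enable")];

		if (c == '+')
			return 1;
		if (c == '-')
			return 0;
		p = strstr(p + 1, "Enable");
	}
	return -1;
}
```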
-
-In SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
-however, in UP environment, users must manually set
-CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
-CONFIG_PCI_MSI enables the VECTOR based scheme and the option for
-MSI-capable device drivers to selectively enable MSI/MSI-X.
-
-Note that CONFIG_X86_IO_APIC setting is irrelevant because MSI/MSI-X
-vector is allocated new during runtime and MSI/MSI-X support does not
-depend on BIOS support. This key independency enables MSI/MSI-X
-support on future IOxAPIC free platforms.
+
+5. MSI quirks
+
+Several PCI chipsets or devices are known not to support MSIs.
+The PCI stack provides three ways to disable MSIs:
+
+1. globally
+2. on all devices behind a specific bridge
+3. on a single device
-
-5.5.2 Device hardware support
-
-The hardware device function supports MSI by indicating the
-MSI/MSI-X capability structure on its PCI capability list. By
-default, this capability structure will not be initialized by
-the kernel to enable MSI during the system boot. In other words,
-the device function is running on its default pin assertion mode.
-Note that in many cases the hardware supporting MSI have bugs,
-which may result in system hangs. The software driver of specific
-MSI-capable hardware is responsible for deciding whether to call
-pci_enable_msi or not. A return of zero indicates the kernel
-successfully initialized the MSI/MSI-X capability structure of the
-device function. The device function is now running on MSI/MSI-X mode.
+
+5.1. Disabling MSIs globally
+
+Some host chipsets simply don't support MSIs properly. If we're
+lucky, the manufacturer knows this and has indicated it in the ACPI
+FADT table. In this case, Linux will automatically disable MSIs.
+Some boards don't include this information in the table and so we have
+to detect them ourselves. The complete list of these is found near the
+quirk_disable_all_msi() function in drivers/pci/quirks.c.
+
+If you have a board which has problems with MSIs, you can pass pci=nomsi
+on the kernel command line to disable MSIs on all devices. It would be
+in your best interests to report the problem to linux-pci@vger.kernel.org
+including a full 'lspci -v' so we can add the quirks to the kernel.
-
-5.6 How to tell whether MSI/MSI-X is enabled on device function
-
-At the driver level, a return of zero from the function call of
-pci_enable_msi()/pci_enable_msix() indicates to a device driver that
-its device function is initialized successfully and ready to run in
-MSI/MSI-X mode.
-
-At the user level, users can use the command 'cat /proc/interrupts'
-to display the vectors allocated for devices and their interrupt
-MSI/MSI-X modes ("PCI-MSI"/"PCI-MSI-X"). Below shows MSI mode is
-enabled on a SCSI Adaptec 39320D Ultra320 controller.
-
-           CPU0       CPU1
-  0:     324639          0    IO-APIC-edge  timer
-  1:       1186          0    IO-APIC-edge  i8042
-  2:          0          0          XT-PIC  cascade
- 12:       2797          0    IO-APIC-edge  i8042
- 14:       6543          0    IO-APIC-edge  ide0
- 15:          1          0    IO-APIC-edge  ide1
-169:          0          0   IO-APIC-level  uhci-hcd
-185:          0          0   IO-APIC-level  uhci-hcd
-193:        138         10         PCI-MSI  aic79xx
-201:         30          0         PCI-MSI  aic79xx
-225:         30          0   IO-APIC-level  aic7xxx
-233:         30          0   IO-APIC-level  aic7xxx
-NMI:          0          0
-LOC:     324553     325068
-ERR:          0
-MIS:          0
-
-6. MSI quirks
+
+5.2. Disabling MSIs below a bridge
+
+Some PCI bridges are not able to route MSIs between busses properly.
+In this case, MSIs must be disabled on all devices behind the bridge.
+
+Some bridges allow you to enable MSIs by changing some bits in their
+PCI configuration space (especially the Hypertransport chipsets such
+as the nVidia nForce and Serverworks HT2000). As with host chipsets,
+Linux mostly knows about them and automatically enables MSIs if it can.
+If you have a bridge which Linux doesn't yet know about, you can enable
+MSIs in configuration space using whatever method you know works, then
+enable MSIs on that bridge by doing:
+
+       echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
+
+where $bridge is the PCI address of the bridge you've enabled (eg
+0000:00:0e.0).
+
+To disable MSIs, echo 0 instead of 1. Changing this value should be
+done with caution as it can break interrupt handling for all devices
+below this bridge.
+
+Again, please notify linux-pci@vger.kernel.org of any bridges that need
+special handling.
+
+5.3. Disabling MSIs on a single device
+
+Some devices are known to have faulty MSI implementations. Usually this
+is handled in the individual device driver but occasionally it's necessary
+to handle this with a quirk. Some drivers have an option to disable use
+of MSI. While this is a convenient workaround for the driver author,
+it is not good practise, and should not be emulated.
-
-Several PCI chipsets or devices are known to not support MSI.
-The PCI stack provides 3 possible levels of MSI disabling:
-* on a single device
-* on all devices behind a specific bridge
-* globally
-
-6.1. Disabling MSI on a single device
-
-Under some circumstances it might be required to disable MSI on a
-single device. This may be achieved by either not calling pci_enable_msi()
-or all, or setting the pci_dev->no_msi flag before (most of the time
-in a quirk).
+
+5.4. Finding why MSIs are disabled on a device
+
+From the above three sections, you can see that there are many reasons
+why MSIs may not be enabled for a given device. Your first step should
+be to examine your dmesg carefully to determine whether MSIs are enabled
+for your machine. You should also check your .config to be sure you
+have enabled CONFIG_PCI_MSI.
+
+Then, 'lspci -t' gives the list of bridges above a device. Reading
+/sys/bus/pci/devices/*/msi_bus will tell you whether MSI are enabled (1)
+or disabled (0). If 0 is found in any of the msi_bus files belonging
+to bridges between the PCI root and the device, MSIs are disabled.
6.2. Disabling MSI below a bridge It is also worth checking the device driver to see whether it supports MSIs.
For example, it may contain calls to pci_enable_msi(), pci_enable_msix() or
The vast majority of MSI quirks are required by PCI bridges not pci_enable_msi_block().
being able to route MSI between busses. In this case, MSI have to be
disabled on all devices behind this bridge. It is achieves by setting
the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the bridge
subordinate bus. There is no need to set the same flag on bridges that
are below the broken bridge. When pci_enable_msi() is called to enable
MSI on a device, pci_msi_supported() takes care of checking the NO_MSI
flag in all parent busses of the device.
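The parent-bus check described above can be illustrated with a small userspace model. This is a sketch, not the kernel's pci_msi_supported(). The names model_bus, model_dev, MODEL_BUS_FLAGS_NO_MSI and model_msi_supported() are hypothetical stand-ins for struct pci_bus, struct pci_dev, PCI_BUS_FLAGS_NO_MSI and the real function. Other checks the kernel performs, such as honouring pci=nomsi, are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PCI_BUS_FLAGS_NO_MSI in this model. */
#define MODEL_BUS_FLAGS_NO_MSI 0x1u

/* Hypothetical, simplified stand-in for struct pci_bus. */
struct model_bus {
	unsigned int bus_flags;
	struct model_bus *parent;   /* NULL for a root bus */
};

/* Hypothetical, simplified stand-in for struct pci_dev. */
struct model_dev {
	bool no_msi;                /* plays the role of pci_dev->no_msi */
	struct model_bus *bus;      /* bus the device sits on */
};

/*
 * Model of the check: MSI is refused if the device itself is quirked,
 * or if any bus between the device and the root carries the NO_MSI flag.
 */
static bool model_msi_supported(const struct model_dev *dev)
{
	if (dev->no_msi)
		return false;
	for (const struct model_bus *b = dev->bus; b != NULL; b = b->parent)
		if (b->bus_flags & MODEL_BUS_FLAGS_NO_MSI)
			return false;
	return true;
}
```

Only the broken bridge's subordinate bus needs the flag. A device two levels further down still fails the check, because the loop visits every parent bus on the way to the root.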

Some bridges allow MSI support to be enabled or disabled dynamically
by changing bits in their PCI configuration space (especially the
HyperTransport chipsets such as the nVidia nForce and Serverworks
HT2000). It may then be necessary to update the NO_MSI flag on the
corresponding devices in the sysfs hierarchy. To enable MSI support
on device "0000:00:0e.0", do:

       echo 1 > /sys/bus/pci/devices/0000:00:0e.0/msi_bus

To disable MSI support, echo 0 instead of 1. Note that this should be
used with caution, since changing this value might break interrupts.
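The echo command can also be issued from a program. This is a minimal sketch assuming ordinary sysfs behaviour: stdio buffers the value, so the kernel only receives it (and can only reject it) when fclose() flushes the buffer. set_msi_bus() and get_msi_bus() are hypothetical helper names. Writing to msi_bus requires root.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "1" (enable) or "0" (disable) to a msi_bus
 * attribute such as /sys/bus/pci/devices/0000:00:0e.0/msi_bus.
 * Returns 0 on success, -1 on failure.
 */
static int set_msi_bus(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	if (f == NULL) {
		perror(path);
		return -1;
	}
	int ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	/* The buffered write reaches the kernel here, so check fclose(). */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Hypothetical helper: read the value back; 1, 0, or -1 if unknown. */
static int get_msi_bus(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (f == NULL)
		return -1;
	c = fgetc(f);
	fclose(f);
	if (c == '1')
		return 1;
	if (c == '0')
		return 0;
	return -1;
}
```

For example, set_msi_bus("/sys/bus/pci/devices/0000:00:0e.0/msi_bus", 1) does the same thing as the echo command above. The same caution applies: disabling MSIs on a bridge affects every device below it.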

6.3. Disabling MSI globally

Some extreme cases may require MSI to be disabled globally on the system.
For now, the only known case is a Serverworks PCI-X chipset (MSIs are
not supported on several busses that are not all connected to the
chipset in the Linux PCI hierarchy). In the vast majority of other
cases, disabling MSI behind a specific bridge is enough.

For debugging purposes, the user may also pass pci=nomsi on the kernel
command line to explicitly disable MSI globally. Once the appropriate
quirks have been added to the kernel, however, this option should no
longer be required.
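To check whether MSIs were switched off globally with pci=nomsi, a tool can inspect the kernel command line, for example the contents of /proc/cmdline. This sketch assumes that pci= options are comma separated, as in pci=noacpi,nomsi. cmdline_has_pci_nomsi() is a hypothetical helper, and quoted parameters are not handled.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if "nomsi" appears among the
 * comma-separated options of any "pci=" parameter in a kernel command
 * line string. Quoting and other kernel parsing corner cases are ignored.
 */
static bool cmdline_has_pci_nomsi(const char *cmdline)
{
	const char *p = cmdline;

	while (*p != '\0') {
		/* Skip whitespace, then find the end of this parameter. */
		while (*p == ' ' || *p == '\t' || *p == '\n')
			p++;
		const char *end = p;
		while (*end != '\0' && *end != ' ' && *end != '\t' && *end != '\n')
			end++;

		if (end - p > 4 && strncmp(p, "pci=", 4) == 0) {
			/* Check each comma-separated option after "pci=". */
			const char *opt = p + 4;
			while (opt < end) {
				const char *comma = opt;
				while (comma < end && *comma != ',')
					comma++;
				if (comma - opt == 5 && strncmp(opt, "nomsi", 5) == 0)
					return true;
				opt = comma + 1;
			}
		}
		p = end;
	}
	return false;
}
```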

6.4. Finding why MSI cannot be enabled on a device

If MSIs are not enabled on a device, look in dmesg for messages that
quirks may print when disabling MSI on some devices, some bridges or
even globally.

Then, 'lspci -t' gives the list of bridges above a device. Reading
/sys/bus/pci/devices/0000:00:0e.0/msi_bus will tell you whether MSIs
are enabled (1) or disabled (0). If 0 is found in the msi_bus file of
any bridge above the device, MSIs cannot be enabled.

7. FAQ

Q1. Are there any limitations on using MSI?

A1. If the PCI device supports MSI and conforms to the
specification and the platform supports the APIC local bus,
then using MSI should work.

Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
AMD processors)? In P3, IPIs are transmitted on the APIC local
bus, and in P4 and Xeon they are transmitted on the system
bus. Are there any implications with this?

A2. MSI support enables a PCI device to send an inbound
memory write (0xfeexxxxx as target address) on its PCI bus
directly to the FSB. Since the message address has the
redirection hint bit cleared, it should work.
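A2 relies on the layout of the x86 MSI message address. The decoder below is a sketch based on the address format in the Intel 64 and IA-32 Architectures Software Developer's Manual, not on this document. In that format, bits 31:20 hold 0xFEE, bits 19:12 the destination APIC ID, bit 3 the redirection hint (RH) and bit 2 the destination mode (DM). decode_msi_addr() and struct msi_addr_fields are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct holding the fields of an x86 MSI message address. */
struct msi_addr_fields {
	bool    in_window;      /* bits 31:20 == 0xFEE, i.e. 0xFEExxxxx */
	uint8_t dest_id;        /* bits 19:12: destination APIC ID */
	bool    redir_hint;     /* bit 3: RH, redirection hint */
	bool    dest_logical;   /* bit 2: DM, 0 = physical, 1 = logical */
};

/* Hypothetical helper: split a 32-bit MSI address into its fields. */
static struct msi_addr_fields decode_msi_addr(uint32_t addr)
{
	struct msi_addr_fields f;

	f.in_window    = (addr >> 20) == 0xFEE;
	f.dest_id      = (uint8_t)((addr >> 12) & 0xFF);
	f.redir_hint   = (addr >> 3) & 1;
	f.dest_logical = (addr >> 2) & 1;
	return f;
}
```

For 0xFEE01000, in_window is true, dest_id is 1 and redir_hint is false. This is the "redirection hint bit cleared" case that A2 describes.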

Q3. The target address 0xfeexxxxx will be translated by the
Host Bridge into an interrupt message. Are there any
limitations on chipsets such as Intel 8xx, Intel e7xxx,
or VIA?

A3. If these chipsets support an inbound memory write with
the target address set to 0xfeexxxxx, as required by PCI
specification 2.3 or later, then it should work.

Q4. From the driver's point of view, if the MSI is lost because
of errors occurring during the inbound memory write, the driver
may wait forever. Is there a mechanism for it to recover?

A4. Since the transaction is an inbound memory write, all
transaction termination conditions (Retry, Master-Abort,
Target-Abort, or normal completion) are supported. A device
sending an MSI must abide by all the PCI rules and conditions
regarding that inbound memory write. So, if a retry is
signaled it must retry, etc. We believe that the
recommendation for Abort is also a retry (refer to PCI
specification 2.3 or later).