Commits · 30bc4dfd3b64eb1fbefe2c63e30d8fc129273e20 · linux / linux-davinci

22 Oct, 2008 4 commits

nfsd: clean up expkey_parse error cases · 30bc4dfd

J. Bruce Fields authored Oct 20, 2008

We might as well do all of these at the end.  Fix up a couple minor
style nits while we're there.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

30bc4dfd

nfsd: Drop reference in expkey_parse error cases · 6dfcde98

Krishna Kumar authored Oct 20, 2008

Drop reference to export key on error. Compile tested.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

6dfcde98

nfsd: Fix memory leak in nfsd_getxattr · 6c6a426f

Krishna Kumar authored Oct 22, 2008

Fix a memory leak in nfsd_getxattr. nfsd_getxattr should free up memory
	that it allocated if vfs_getxattr fails.
Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

6c6a426f

NFSD: Fix BUG during NFSD shutdown processing · 1cd9cd16

Chuck Lever authored Oct 22, 2008

The Linux NFS server can be started via a user-space write to
/proc/fs/nfs/threads or to /proc/fs/nfs/portlist.  In the first case,
all default listeners are started (both UDP and TCP).  In the second,
a listener is started only for one specified transport.

The NFS server has to make sure lockd stays up until the last listener
transport goes away.  To support both start-up interfaces, it should
do one lockd_up() for each NFSD listener.

The nfsd_init_socks() function used to do one lockd_up() call for each
svc_create_xprt().  Recently commit
26a41409 mistakenly changed
nfsd_init_socks() to do only one lockd_up() call even though it still
does two svc_create_xprt() calls.

The end result is a lockd_down() BUG during NFSD shutdown processing
because nfsd_last_threads() does a lockd_down() call for each entry
on the sv_permsocks list, but the start-up code doesn't do a matching
number of lockd_up() calls.

Add a second lockd_up() in nfsd_init_socks() to make sure the number
of lockd_up() calls matches the number of entries on the NFS servers's
sv_permsocks list.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

1cd9cd16

08 Oct, 2008 1 commit
- Merge branch 'from-tomtucker' into for-2.6.28 · 107e0008
  J. Bruce Fields authored Oct 08, 2008
  
  107e0008
06 Oct, 2008 9 commits

svcrdma: Fix IRD/ORD polarity · 67080c82

Tom Tucker authored Oct 03, 2008

The inititator/responder resources in the event have been swapped. They
no represent what the local peer would set their values to in order to
match the peer. Note that iWARP does not exchange these on the wire and
the provider is simply putting in the local device max.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>

67080c82

svcrdma: Update svc_rdma_send_error to use DMA LKEY · 04911b53

Tom Tucker authored Aug 11, 2008

Update the svc_rdma_send_error code to use the DMA LKEY which is valid
regardless of the memory registration strategy in use.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>

04911b53

svcrdma: Modify the RPC reply path to use FRMR when available · afd566ea

Tom Tucker authored Oct 03, 2008

Use FRMR to map local RPC reply data. This allows RDMA_WRITE to send reply
data using a single WR. The FRMR is invalidated by linking the LOCAL_INV WR
to the RDMA_SEND message used to complete the reply.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>

afd566ea

svcrdma: Modify the RPC recv path to use FRMR when available · 146b6df6

Tom Tucker authored Aug 12, 2008

RPCRDMA requests that specify a read-list are fetched with RDMA_READ. Using
an FRMR to map the data sink improves NFSRDMA security on transports that
place the RDMA_READ data sink LKEY on the wire because the valid lifetime
of the MR is only the duration of the RDMA_READ. The LKEY is invalidated
when the last RDMA_READ WR completes.

Mapping the data sink also allows for very large amounts to data to be
fetched with a single WR, so if the client is also using FRMR, the entire
RPC read-list can be fetched with a single WR.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>

146b6df6

svcrdma: Add support to svc_rdma_send to handle chained WR · 5b180a9a

Tom Tucker authored Aug 11, 2008

WR can be submitted as linked lists of WR. Update the svc_rdma_send
routine to handle WR chains. This will be used to submit a WR that
uses an FRMR with another WR that invalidates the FRMR.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>

5b180a9a

svcrdma: Modify post recv path to use local dma key · a5abf4e8

Tom Tucker authored Sep 30, 2008

Update the svc_rdma_post_recv routine to use the adapter's global LKEY
instead of sc_phys_mr which is only valid when using a DMA MR.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>

a5abf4e8

svcrdma: Add a service to register a Fast Reg MR with the device · e1183210

Tom Tucker authored Oct 03, 2008

Fast Reg MR introduces a new WR type. Add a service to register the
region with the adapter and update the completion handling to support
completions with a NULL WR context.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>

e1183210

svcrdma: Query device for Fast Reg support during connection setup · 3a5c6380

Tom Tucker authored Sep 30, 2008

Query the device capabilities in the svc_rdma_accept function to determine
what advanced memory management capabilities are supported by the device.
Based on the query, select the most secure model available given the
requirements of the transport and capabilities of the adapter.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>

3a5c6380

svcrdma: Add FRMR get/put services · 64be8608

Tom Tucker authored Oct 06, 2008

Add services for the allocating, freeing, and unmapping Fast Reg MR. These
services will be used by the transport connection setup, send and receive
routines.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>

64be8608

04 Oct, 2008 3 commits

NLM: Remove unused argument from svc_addsock() function · 29373913

Chuck Lever authored Oct 03, 2008

Clean up: The svc_addsock() function no longer uses its "proto"
argument, so remove it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

29373913

NLM: Remove "proto" argument from lockd_up() · 26a41409

Chuck Lever authored Oct 03, 2008

Clean up: Now that lockd_up() starts listeners for both transports, the
"proto" argument is no longer needed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

26a41409

NLM: Always start both UDP and TCP listeners · 8c3916f4

Chuck Lever authored Oct 03, 2008

Commit 24e36663, which first appeared in 2.6.19, changed lockd so that
the client side starts a UDP listener only if there is a UDP NFSv2/v3
mount.  Its description notes:

    This... means that lockd will *not* listen on UDP if the only
    mounts are TCP mount (and nfsd hasn't started).

    The latter is the only one that concerns me at all - I don't know
    if this might be a problem with some servers.

Unfortunately it is a problem for Linux itself.  The rpc.statd daemon
on Linux uses UDP for contacting the local lockd, no matter which
protocol is used for NFS mounts.  Without a local lockd UDP listener,
NFSv2/v3 lock recovery from Linux NFS clients always fails.

Revert parts of commit 24e36663 so lockd_up() always starts both
listeners.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

8c3916f4

03 Oct, 2008 11 commits

lockd: Remove unused fields in the nlm_reboot structure · 9a38a838

Chuck Lever authored Oct 03, 2008

The nlm_reboot structure is used to store information provided by the
NSM_NOTIFY procedure.  This procedure is not specified by the NLM or NSM
protocols, other than to say that the procedure can be used to transmit
information private to a particular NLM/NSM implementation.

For Linux, the callback arguments include the name of the monitored host,
the new NSM state of the host, and a 16-byte private opaque.

As a clean up, remove the unused fields and the server-side XDR logic that
decodes them.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

9a38a838

lockd: Add helper to sanity check incoming NOTIFY requests · b85e4676

Chuck Lever authored Oct 03, 2008

lockd accepts SM_NOTIFY calls only from a privileged process on the
local system.  If lockd uses an AF_INET6 listener, the sender's address
(ie the local rpc.statd) will be the IPv6 loopback address, not the
IPv4 loopback address.

Make sure the privilege test in nlmsvc_proc_sm_notify() and
nlm4svc_proc_sm_notify() works for both AF_INET and AF_INET6 family
addresses by refactoring the test into a helper and adding support for
IPv6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

b85e4676

lockd: change nlmclnt_grant() to take a "struct sockaddr *" · dcff09f1

Chuck Lever authored Oct 03, 2008

Adjust the signature and callers of nlmclnt_grant() to pass a "struct
sockaddr *" instead of a "struct sockaddr_in *" in order to support IPv6
addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

dcff09f1

lockd: Adjust nlmsvc_lookup_host() to accomodate AF_INET6 addresses · 6bfbe8af

Chuck Lever authored Oct 03, 2008

Fix up nlmsvc_lookup_host() to pass AF_INET6 source addresses to
nlm_lookup_host().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

6bfbe8af

lockd: Adjust nlmclnt_lookup_host() signature to accomodate non-AF_INET · d7d20440

Chuck Lever authored Oct 03, 2008

Pass a struct sockaddr * and a length to nlmclnt_lookup_host() to
accomodate non-AF_INET family addresses.

As a side benefit, eliminate the hostname_len argument, as the hostname
is always NUL-terminated.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

d7d20440

lockd: Support non-AF_INET addresses in nlm_lookup_host() · 88541c84

Chuck Lever authored Oct 03, 2008

Use struct sockaddr * and length in nlm_lookup_host_info to all callers
to pass in either AF_INET or AF_INET6 addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

88541c84

NLM: Convert nlm_lookup_host() to use a single argument · 7f1ed18b

Chuck Lever authored Oct 03, 2008

The nlm_lookup_host() function already has a large number of arguments,
and I'm about to add a few more.  As a clean up, convert the function
to use a single data structure argument.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

7f1ed18b

svcrdma: Add Fast Reg MR Data Types · 0d3ebb9a

Tom Tucker authored Sep 30, 2008

Add data types to track Fast Reg Memory Regions. The core data type is
svc_rdma_fastreg_mr that associates a device MR with a host kva and page
list. A field is added to the WR context to keep track of the FRMR
used to map the local memory for an RPC.

An FRMR list and spin lock are added to the transport instance to keep
track of all FRMR allocated for the transport. Also added are device
capability flags to indicate what the memory registration
capabilities are for the underlying device and whether or not fast
memory registration is supported.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>

0d3ebb9a

lockd: reject reclaims outside the grace period · d22b1cff

J. Bruce Fields authored Feb 06, 2008

The current lockd does not reject reclaims that arrive outside of the
grace period.

Accepting a reclaim means promising to the client that no conflicting
locks were granted since last it held the lock. We can meet that
promise if we assume the only lockers are nfs clients, and that they are
sufficiently well-behaved to reclaim only locks that they held before,
and that only reclaim locks have been permitted so far. Once we leave
the grace period (and start permitting non-reclaims), we can no longer
keep that promise. So we must start rejecting reclaims at that point.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

d22b1cff

lockd: move grace period checks to common code · b2b50289

J. Bruce Fields authored Feb 06, 2008

Do all the grace period checks in svclock.c.  This simplifies the code a
bit, and will ease some later changes.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

b2b50289

nfsd: common grace period control · af558e33

J. Bruce Fields authored Sep 06, 2007

Rewrite grace period code to unify management of grace period across
lockd and nfsd.  The current code has lockd and nfsd cooperate to
compute a grace period which is satisfactory to them both, and then
individually enforce it.  This creates a slight race condition, since
the enforcement is not coordinated.  It's also more complicated than
necessary.

Here instead we have lockd and nfsd each inform common code when they
enter the grace period, and when they're ready to leave the grace
period, and allow normal locking only after both of them are ready to
leave.

We also expect the locks_start_grace()/locks_end_grace() interface here
to be simpler to build on for future cluster/high-availability work,
which may require (for example) putting individual filesystems into
grace, or enforcing grace periods across multiple cluster nodes.
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

af558e33

29 Sep, 2008 12 commits

nfsd: use nfs client rpc callback program · d5b337b4

Benny Halevy authored Sep 28, 2008

since commit ff7d9756
"nfsd: use static memory for callback program and stats"
do_probe_callback uses a static callback program
(NFS4_CALLBACK) rather than the one set in clp->cl_callback.cb_prog
as passed in by the client in setclientid (4.0)
or create_session (4.1).

This patches introduces rpc_create_args.prognumber that allows
overriding program->number when creating rpc_clnt.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

d5b337b4

nfsd: do_probe_callback should not clear rpc stats · 97eb89bb

Benny Halevy authored Sep 26, 2008

Now that cb_stats are static (since commit
ff7d9756)
there's no need to clear them.

Initially I thought it might make sense to do
that every callback probing but since the stats
are per-program and they are shared between possibly
several client callback instances, zeroing them out
seems like the wrong thing to do.

Note that that commit also introduced a bug
since stats.program is also being cleared in the process
and it is not restored after the memset as it used to be.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

97eb89bb

SUNRPC: Clean up debug messages in rpcb_clnt.c · db820d63

Chuck Lever authored Sep 25, 2008

The RPCB XDR functions are used for multiple procedures.  For instance,
rpcb_encode_getaddr() is used for RPCB_GETADDR, RPCB_SET, and
RPCB_UNSET.  Make the XDR debug messages more generic so they are less
confusing.

And, unlike in other RPC consumers in the kernel, a single debug flag
enables all levels of debug messages in the RPC bind client, including
XDR debug messages.  Since the XDR decoders already report success or
failure in this case, remove redundant debug messages in the mid-level
rpcb_register_call() function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

db820d63

SUNRPC: Fix up svc_unregister() · f6fb3f6f

Chuck Lever authored Sep 25, 2008

With the new rpcbind code, a PMAP_UNSET will not have any effect on
services registered via rpcbind v3 or v4.

Implement a version of svc_unregister() that uses an RPCB_UNSET with
an empty netid string to make sure we have cleared *all* entries for
a kernel RPC service when shutting down, or before starting a fresh
instance of the service.

Use the new version only when CONFIG_SUNRPC_REGISTER_V4 is enabled;
otherwise, the legacy PMAP version is used to ensure complete
backwards-compatibility with the Linux portmapper daemon.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

f6fb3f6f

SUNRPC: Use short-hand IPv6 ANYADDR for RPCB_SET · 9d548b9c

Chuck Lever authored Sep 15, 2008

Clean up: When doing an RPCB_SET, make the kernel's rpcb client use the
shorthand "::" for the universal form of the IPv6 ANY address.

Without this patch, rpcbind will advertise:

  0000:0000:0000:0000:0000:0000:0000:0000.x.y

This is cosmetic only.  It cleans up the display of information from
/sbin/rpcinfo.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

9d548b9c

SUNRPC: Register both netids for AF_INET6 servers · 2c7eb0b2

Chuck Lever authored Sep 15, 2008

TI-RPC is a user-space library of RPC functions that replaces ONC RPC
and allows RPC to operate in the new world of IPv6.

TI-RPC combines the concept of a transport protocol (UDP and TCP)
and a protocol family (PF_INET and PF_INET6) into a single identifier
called a "netid."  For example, "udp" means UDP over IPv4, and "udp6"
means UDP over IPv6.

For rpcbind, then, the RPC service tuple that is registered and
advertised is:

  [RPC program, RPC version, service address and port, netid]

instead of

  [RPC program, RPC version, port, protocol]

Service address is typically ANYADDR, but can be a specific address
of one of the interfaces on a multi-homed host.  The third item in
the new tuple is expressed as a universal address.

The current Linux rpcbind implementation registers a netid for both
protocol families when RPCB_SET is done for just the PF_INET6 version
of the netid (ie udp6 or tcp6).  So registering "udp6" causes a
registration for "udp" to appear automatically as well.

We've recently determined that this is incorrect behavior.  In the
TI-RPC world, "udp6" is not meant to imply that the registered RPC
service handles requests from AF_INET as well, even if the listener
socket does address mapping.  "udp" and "udp6" are entirely separate
capabilities, and must be registered separately.

The Linux kernel, unlike TI-RPC, leverages address mapping to allow a
single listener socket to handle requests for both AF_INET and AF_INET6.
This is still OK, but the kernel currently assumes registering "udp6"
will cover "udp" as well.  It registers only "udp6" for it's AF_INET6
services, even though they handle both AF_INET and AF_INET6 on the same
port.

So svc_register() actually needs to register both "udp" and "udp6"
explicitly (and likewise for TCP).  Until rpcbind is fixed, the
kernel can ignore the return code for the second RPCB_SET call.

Please merge this with commit 15231312:

    SUNRPC: Support IPv6 when registering kernel RPC services
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Olaf Kirch <okir@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

2c7eb0b2

lockd: Update nsm_find() to support non-AF_INET addresses · e018040a

Chuck Lever authored Sep 03, 2008

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

e018040a

lockd: Combine __nsm_find() and nsm_find(). · bc48e4d6

Chuck Lever authored Sep 03, 2008

Clean up: Having two separate functions doesn't add clarity, so
eliminate one of them.  Use contemporary kernel coding conventions
where appropriate.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

bc48e4d6

lockd: Support AF_INET6 when hashing addresses in nlm_lookup_host · ede2fea0

Chuck Lever authored Sep 03, 2008

Adopt an approach similar to the RPC server's auth cache (from Aurelien
Charbon and Brian Haley).

Note nlm_lookup_host()'s existing IP address hash function has the same
issue with correctness on little-endian systems as the original IPv4 auth
cache hash function, so I've also updated it with a hash function similar
to the new auth cache hash function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

ede2fea0

lockd: Teach nlm_cmp_addr() to support AF_INET6 addresses · 781b61a6

Chuck Lever authored Sep 03, 2008

Update the nlm_cmp_addr() helper to support AF_INET6 as well as AF_INET
addresses.  New version takes two "struct sockaddr *" arguments instead of
"struct sockaddr_in *" arguments.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

781b61a6

NSM: Use sockaddr_storage for sm_addr field · 7e9d7746

Chuck Lever authored Sep 03, 2008

To store larger addresses in the nsm_handle structure, make sm_addr a
sockaddr_storage.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

7e9d7746

lockd: Use sockaddr_storage for h_saddr field · 90151e6e

Chuck Lever authored Sep 03, 2008

To store larger addresses in the nlm_host structure, make h_saddr a
sockaddr_storage.  And let's call it something more self-explanatory:
"saddr" could easily be mistaken for "server address".
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

90151e6e