Commits · 3c8c45dfab78a1919f6f8a3ea46998c487eb7e12 · linux / linux-davinci

28 Mar, 2009 23 commits

NFS: Simplify logic to compare socket addresses in client.c · 3c8c45df

Chuck Lever authored Mar 18, 2009

Callback requests from IPv4 servers are now always guaranteed to be
AF_INET, and never mapped IPv4 AF_INET6 addresses.  Both
nfs_match_client() and nfs_find_client() can now share the same
address comparison logic, so fold them together.

We can also dispense with most of the conditional compilation
in here.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
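
For illustration only, the kind of comparison this enables can be sketched in user-space C. Once mapped IPv4 AF_INET6 addresses are ruled out, two addresses match only if their families are equal and their host addresses are equal. The helper name sketch_addr_match() is hypothetical and this is not the code in client.c; ports are deliberately ignored here.

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: compare two socket addresses by family and host
 * address only.  Not the kernel's implementation. */
bool sketch_addr_match(const struct sockaddr *a, const struct sockaddr *b)
{
    /* With no mapped IPv4 addresses, differing families never match. */
    if (a->sa_family != b->sa_family)
        return false;

    switch (a->sa_family) {
    case AF_INET: {
        const struct sockaddr_in *a4 = (const struct sockaddr_in *)a;
        const struct sockaddr_in *b4 = (const struct sockaddr_in *)b;

        return a4->sin_addr.s_addr == b4->sin_addr.s_addr;
    }
    case AF_INET6: {
        const struct sockaddr_in6 *a6 = (const struct sockaddr_in6 *)a;
        const struct sockaddr_in6 *b6 = (const struct sockaddr_in6 *)b;

        return memcmp(&a6->sin6_addr, &b6->sin6_addr,
                      sizeof(struct in6_addr)) == 0;
    }
    default:
        /* Unknown families are treated as non-matching. */
        return false;
    }
}
```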

3c8c45df

Merge commit '9f4c899c' into devel · d188262d

Trond Myklebust authored Mar 28, 2009

d188262d

NFS: Start PF_INET6 callback listener only if IPv6 support is available · f738f517

Chuck Lever authored Mar 18, 2009

Apparently a lot of people need to disable IPv6 completely on their
distributor-built systems, which have CONFIG_IPV6_MODULE enabled at
build time.

They do this by blacklisting the ipv6.ko module.  This causes the
creation of the NFSv4 callback service listener to fail if
CONFIG_IPV6_MODULE is set, but the module cannot be loaded.

Now that the kernel's PF_INET6 RPC listeners are completely separate
from PF_INET listeners, we can always start PF_INET.  Then the NFS
client can try to start a PF_INET6 listener, but it isn't required
to be available.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
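
A minimal user-space sketch of the start-up policy described above: the PF_INET listener is mandatory, the PF_INET6 listener is attempted but optional. All names are hypothetical, and treating -EAFNOSUPPORT as the "IPv6 is unavailable" signal is an assumption of this sketch, not something the message states.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical callback: create one listener for the given protocol
 * family; returns 0 on success or a negative errno value. */
typedef int (*sketch_listener_fn)(int family);

int sketch_start_listeners(sketch_listener_fn create)
{
    int err;

    /* PF_INET must always come up. */
    err = create(PF_INET);
    if (err < 0)
        return err;

    /* PF_INET6 is best effort.  Assumption: -EAFNOSUPPORT means the
     * ipv6 module could not be loaded, so it is tolerated. */
    err = create(PF_INET6);
    if (err < 0 && err != -EAFNOSUPPORT)
        return err;

    return 0;
}

/* Stand-ins for three kinds of system, for demonstration only. */
int sketch_create_no_ipv6(int family)
{
    return family == PF_INET6 ? -EAFNOSUPPORT : 0;
}

int sketch_create_dual(int family)
{
    (void)family;
    return 0;
}

int sketch_create_broken(int family)
{
    (void)family;
    return -ENOMEM;
}
```

With ipv6.ko blacklisted (the sketch_create_no_ipv6() stand-in), start-up still succeeds; any other failure is reported to the caller.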

f738f517

lockd: Start PF_INET6 listener only if IPv6 support is available · eb16e907

Chuck Lever authored Mar 18, 2009

Apparently a lot of people need to disable IPv6 completely on their
distributor-built systems, which have CONFIG_IPV6_MODULE enabled at
build time.

They do this by blacklisting the ipv6.ko module.  This causes the
creation of the lockd service listener to fail if CONFIG_IPV6_MODULE
is set, but the module cannot be loaded.

Now that the kernel's PF_INET6 RPC listeners are completely separate
from PF_INET listeners, we can always start PF_INET.  Then lockd can
try to start PF_INET6, but it isn't required to be available.

Note this has the added benefit that NLM callbacks from AF_INET6
servers will never come from AF_INET remotes.  We no longer have to
worry about matching mapped IPv4 addresses to AF_INET when comparing
addresses.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

eb16e907

SUNRPC: Remove CONFIG_SUNRPC_REGISTER_V4 · 93559828

Chuck Lever authored Mar 18, 2009

We just augmented the kernel's RPC service registration code so that
it automatically adjusts to what is supported in user space. Thus we
no longer need the kernel configuration option to enable registering
RPC services with v4 -- it's all done automatically.

This patch is part of a series that addresses
http://bugzilla.kernel.org/show_bug.cgi?id=12256
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

93559828

SUNRPC: rpcb_register() should handle errors silently · 363f724c

Chuck Lever authored Mar 18, 2009

Move error reporting for RPC registration to rpcb_register's caller.

This way the caller can choose to recover silently from certain
errors, but report errors it does not recognize. Error reporting
for kernel RPC service registration is now handled in one place.

363f724c

SUNRPC: Simplify kernel RPC service registration · cadc0fa5

Chuck Lever authored Mar 18, 2009

The kernel registers RPC services with the local portmapper via an
rpcbind SET upcall.  Traditionally, this used rpcbind v2 (PMAP), but
registering RPC services that support IPv6 requires rpcbind v3 or v4.

Since we now want separate PF_INET and PF_INET6 listeners for each
kernel RPC service, svc_register() will do only one of those
registrations at a time.

For PF_INET, it tries an rpcb v4 SET upcall first; if that fails, it
does a legacy portmap SET.  This makes it entirely backwards
compatible with legacy user space, but allows a proper v4 SET to be
used if rpcbind is available.

For PF_INET6, it does an rpcb v4 SET upcall.  If that fails, it fails
the registration, and thus the transport creation.  This lets the
kernel detect if user space is able to support IPv6 RPC services, and
thus whether it should maintain a PF_INET6 listener for each service
at all.

This provides complete backwards compatibility with legacy user space
that only supports rpcbind v2.  The only down-side is that registering
a new kernel RPC service may take an extra exchange with the local
portmapper on legacy systems, but this is an infrequent operation and
is done over UDP (no lingering sockets in TIMEWAIT), so it shouldn't
be consequential.

This patch is part of a series that addresses
   http://bugzilla.kernel.org/show_bug.cgi?id=12256
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
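
The per-family policy above can be sketched as follows. This is a user-space illustration with stubbed upcalls, not svc_register() itself; the struct, the function names and the stub return values are all hypothetical.

```c
#include <errno.h>
#include <sys/socket.h>

/* Hypothetical stand-ins for the two kinds of SET upcall. */
struct sketch_rpcbind_ops {
    int (*v4_set)(int family);   /* rpcbind v4 SET */
    int (*v2_set)(void);         /* legacy portmap (PMAP) SET */
};

int sketch_register(const struct sketch_rpcbind_ops *ops, int family)
{
    int err;

    switch (family) {
    case PF_INET:
        /* Try v4 first, then fall back to the legacy SET. */
        err = ops->v4_set(PF_INET);
        if (err < 0)
            err = ops->v2_set();
        return err;
    case PF_INET6:
        /* No fallback: a failure here fails the registration. */
        return ops->v4_set(PF_INET6);
    default:
        return -EAFNOSUPPORT;
    }
}

/* Stubs modelling legacy user space (rpcbind v2 only). */
int sketch_legacy_v4_set(int family)
{
    (void)family;
    return -EPROTONOSUPPORT;
}

int sketch_legacy_v2_set(void)
{
    return 0;
}

/* Stub modelling user space that runs rpcbind with v4 support. */
int sketch_modern_v4_set(int family)
{
    (void)family;
    return 0;
}
```

Against the legacy stubs, a PF_INET registration succeeds through the fallback while a PF_INET6 registration fails, which is the signal that no PF_INET6 listener should be kept.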

cadc0fa5

SUNRPC: Simplify svc_unregister() · d5a8620f

Chuck Lever authored Mar 18, 2009

Our initial implementation of svc_unregister() assumed that PMAP_UNSET
cleared all rpcbind registrations for a [program, version] tuple.
However, we now have evidence that PMAP_UNSET clears only "inet"
entries, and not "inet6" entries, in the rpcbind database.

For backwards compatibility with the legacy portmapper, the
svc_unregister() function also must work if user space doesn't support
rpcbind version 4 at all.

Thus we'll send an rpcbind v4 UNSET, and if that fails, we'll send a
PMAP_UNSET.

This simplifies the code in svc_unregister() and provides better
backwards compatibility with legacy user space that does not support
rpcbind version 4.  We can get rid of the conditional compilation in
here as well.

This patch is part of a series that addresses
   http://bugzilla.kernel.org/show_bug.cgi?id=12256
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

d5a8620f

SUNRPC: Allow callers to pass rpcb_v4_register a NULL address · 1673d0de

Chuck Lever authored Mar 18, 2009

The user space TI-RPC library uses an empty string for the universal
address when unregistering all target addresses for [program, version].
The kernel's rpcb client should behave the same way.

Here, we are switching between several registration methods based on
the protocol family of the incoming address.  Rename the other rpcbind
v4 registration functions to make it clear that they, as well, are
switched on protocol family.  In /etc/netconfig, this is either "inet"
or "inet6".

NB: The loopback protocol families are not supported in the kernel.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
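
As a rough illustration of the NULL-address convention, here is a user-space sketch that builds an IPv4 universal address ("h1.h2.h3.h4.p1.p2", with the port split into its high and low bytes) and emits an empty string when no address is given. The function name sketch_uaddr4() is hypothetical; this is not the kernel's rpcb client code.

```c
#include <stdio.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: write the universal address for @sin into @buf.
 * A NULL @sin yields an empty string, meaning "all target addresses".
 * Returns 0 on success, -1 on error or truncation. */
int sketch_uaddr4(const struct sockaddr_in *sin, char *buf, size_t len)
{
    char host[INET_ADDRSTRLEN];
    unsigned int port;
    int n;

    if (len == 0)
        return -1;
    if (sin == NULL) {
        buf[0] = '\0';
        return 0;
    }
    if (inet_ntop(AF_INET, &sin->sin_addr, host, sizeof(host)) == NULL)
        return -1;

    /* The port is appended as two decimal octets: high byte, low byte. */
    port = ntohs(sin->sin_port);
    n = snprintf(buf, len, "%s.%u.%u", host, port >> 8, port & 0xff);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```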

1673d0de

SUNRPC: rpcbind actually interprets r_owner string · 126e4bc3

Chuck Lever authored Mar 18, 2009

RFC 1833 has little to say about the contents of r_owner; it only
specifies that it is a string, and states that it is used to control
who can UNSET an entry.

Our port of rpcbind (from Sun) assumes this string contains a numeric
UID value, not alphabetical or symbolic characters, but checks this
value only for AF_LOCAL RPCB_SET or RPCB_UNSET requests.  In all other
cases, rpcbind ignores the contents of the r_owner string.

The reference user space implementation of rpcb_set(3) uses a numeric
UID for all SET/UNSET requests (even via the network) and an empty
string for all other requests.  We emulate that behavior here to
maintain bug-for-bug compatibility.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

126e4bc3

SUNRPC: Clean up address type casts in rpcb_v4_register() · 3aba4553

Chuck Lever authored Mar 18, 2009

Clean up: Simplify rpcb_v4_register() and its helpers by moving the
details of sockaddr type casting to rpcb_v4_register()'s helper
functions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

3aba4553

SUNRPC: Don't return EPROTONOSUPPORT in svc_register()'s helpers · ba5c35e0

Chuck Lever authored Mar 18, 2009

The RPC client returns -EPROTONOSUPPORT if there is a protocol version
mismatch (i.e., the remote RPC server doesn't support the RPC protocol
version sent by the client).

Helpers for the svc_register() function return -EPROTONOSUPPORT if they
don't recognize the passed-in IPPROTO_ value.

These are two entirely different failure modes.

Have the helpers return -ENOPROTOOPT instead of -EPROTONOSUPPORT.  This
will allow callers to determine more precisely what the underlying
problem is, and decide to report or recover appropriately.

This patch is part of a series that addresses
   http://bugzilla.kernel.org/show_bug.cgi?id=12256
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
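
A sketch of the distinction, assuming a hypothetical helper that maps a protocol family and an IPPROTO_ value to a netid string ("udp", "tcp", "udp6", "tcp6"). An unrecognized IPPROTO_ value yields -ENOPROTOOPT, leaving -EPROTONOSUPPORT free to mean an RPC version mismatch. This is a user-space illustration, not the helpers changed by this patch.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: look up the netid for (family, protocol).
 * Returns 0 and sets *netid, or a negative errno value. */
int sketch_netid(int family, int protocol, const char **netid)
{
    int v6;

    switch (family) {
    case PF_INET:
        v6 = 0;
        break;
    case PF_INET6:
        v6 = 1;
        break;
    default:
        return -EAFNOSUPPORT;
    }

    switch (protocol) {
    case IPPROTO_UDP:
        *netid = v6 ? "udp6" : "udp";
        return 0;
    case IPPROTO_TCP:
        *netid = v6 ? "tcp6" : "tcp";
        return 0;
    default:
        /* Unknown transport: distinct from an RPC version mismatch. */
        return -ENOPROTOOPT;
    }
}
```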

ba5c35e0

SUNRPC: Use IPv4 loopback for registering AF_INET6 kernel RPC services · fc28decd

Chuck Lever authored Mar 18, 2009

The kernel uses an IPv6 loopback address when registering its AF_INET6
RPC services so that it can tell whether the local portmapper is
actually IPv6-enabled.

Since the legacy portmapper doesn't listen on IPv6, however, this
causes a long timeout on older systems if the kernel happens to try
creating and registering an AF_INET6 RPC service.  Originally I wanted
to use a connected transport (either TCP or connected UDP) so that the
upcall would fail immediately if the portmapper wasn't listening on
IPv6, but we never agreed on what transport to use.

In the end, it's of little consequence to the kernel whether the local
portmapper is listening on IPv6.  It's only important whether the
portmapper supports rpcbind v4.  And the kernel can't tell that at all
if it is sending requests via IPv6 -- the portmapper will just ignore
them.

So, send both rpcbind v2 and v4 SET/UNSET requests via IPv4 loopback
to maintain better backwards compatibility between new kernels and
legacy user space, and prevent multi-second hangs in some cases when
the kernel attempts to register RPC services.

This patch is part of a series that addresses

   http://bugzilla.kernel.org/show_bug.cgi?id=12256
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

fc28decd

SUNRPC: Set IPV6ONLY flag on PF_INET6 RPC listener sockets · 7d21c0f9

Chuck Lever authored Mar 18, 2009

We are about to convert to using separate RPC listener sockets for
PF_INET and PF_INET6. This echoes the way IPv6 is handled in user
space by TI-RPC, and eliminates the need for ULPs to worry about
mapped IPv4 AF_INET6 addresses when doing address comparisons.

Start by setting the IPV6ONLY flag on PF_INET6 RPC listener sockets.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
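
For comparison, the user-space equivalent of this flag is the IPV6_V6ONLY socket option. The sketch below creates an AF_INET6 TCP socket that will not accept mapped IPv4 connections; the function name is hypothetical, and the kernel patch sets the flag through its own internal interfaces rather than this call sequence.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: returns a new IPv6-only TCP socket descriptor,
 * or a negative errno value (for example when IPv6 is unavailable). */
int sketch_v6only_socket(void)
{
    int on = 1;
    int fd = socket(AF_INET6, SOCK_STREAM, 0);

    if (fd < 0)
        return -errno;

    /* Restrict the socket to IPv6 traffic only. */
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on)) < 0) {
        int err = errno;

        close(fd);
        return -err;
    }
    return fd;
}
```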

7d21c0f9

NFS: Revert creation of IPv6 listeners for lockd and NFSv4 callbacks · 26298caa

Chuck Lever authored Mar 18, 2009

We're about to convert over to using separate PF_INET and PF_INET6
listeners, instead of a single PF_INET6 listener that also receives
AF_INET requests and maps them to AF_INET6.

Clear the way by removing the logic in lockd and the NFSv4 callback
server that creates an AF_INET6 service listener.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

26298caa

SUNRPC: Remove @family argument from svc_create() and svc_create_pooled() · 49a9072f

Chuck Lever authored Mar 18, 2009

Since an RPC service listener's protocol family is specified now via
svc_create_xprt(), it no longer needs to be passed to svc_create() or
svc_create_pooled(). Remove that argument from the synopsis of those
functions, and remove the sv_family field from the svc_serv struct.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

49a9072f

SUNRPC: Change svc_create_xprt() to take a @family argument · 9652ada3

Chuck Lever authored Mar 18, 2009

The sv_family field is going away.  Pass a protocol family argument to
svc_create_xprt() instead of extracting the family from the passed-in
svc_serv struct.

Again, as this is a listener socket and not an address, we make this
new argument an "int" protocol family, instead of an "sa_family_t."
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

9652ada3

SUNRPC: svc_setup_socket() gets protocol family from socket · baf01caf

Chuck Lever authored Mar 18, 2009

Since the sv_family field is going away, modify svc_setup_socket() to
extract the protocol family from the passed-in socket instead of from
the passed-in svc_serv struct.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

baf01caf

SUNRPC: Pass a family argument to svc_register() · 4b62e58c

Chuck Lever authored Mar 18, 2009

The sv_family field is going away. Instead of using sv_family, have
the svc_register() function take a protocol family argument.

Since this argument represents a protocol family, and not an address
family, this argument takes an int, as this is what is passed to
sock_create_kern(). Also make sure svc_register's helpers are
checking for PF_FOO instead of AF_FOO. The values of [AP]F_FOO are
equivalent; this is simply a symbolic change to reflect the semantics
of the value stored in that variable.

sock_create_kern() should return EPFNOSUPPORT if the passed-in
protocol family isn't supported, but it uses EAFNOSUPPORT for this
case. We will stick with that tradition here, as svc_register()
is called by the RPC server in the same path as sock_create_kern().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

4b62e58c

SUNRPC: Clean up svc_find_xprt() calling sequence · 156e6209

Chuck Lever authored Mar 18, 2009

Clean up: add documenting comment and use appropriate data types for
svc_find_xprt()'s arguments.

This also eliminates a mixed sign comparison: @port was an int, while
the return value of svc_xprt_local_port() is an unsigned short.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

156e6209

NFSD: If port value written to /proc/fs/nfsd/portlist is invalid, return EINVAL · adbbe929

Chuck Lever authored Mar 18, 2009

Make sure the port value read from user space by write_ports is valid
before passing it to svc_find_xprt().  Without this check, a writer that
passed an invalid port would get ENOENT instead of EINVAL.
Noticed-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
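
A minimal sketch of such a check in user-space C, assuming the valid range is 1 through 65535 (the message does not spell out the range) and using a hypothetical helper name:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal port number from @text.
 * Returns 0 and stores the port, or -EINVAL if the text is not a
 * number or is out of range.  Assumed valid range: 1..USHRT_MAX. */
int sketch_parse_port(const char *text, unsigned short *port)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -EINVAL;
    if (value < 1 || value > USHRT_MAX)
        return -EINVAL;

    *port = (unsigned short)value;
    return 0;
}
```

Rejecting the value up front means a later lookup miss can only mean "no such transport", never "nonsense port".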

adbbe929

SUNRPC: Clean up static inline functions in svc_xprt.h · efb3288b

Chuck Lever authored Mar 18, 2009

Clean up: Enable the use of const arguments in higher level svc_ APIs
by adding const to the arguments of the helper functions in svc_xprt.h
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

efb3288b

SUNRPC: Don't flag empty RPCB_GETADDR reply as bogus · 776bd5c7

Chuck Lever authored Mar 18, 2009

In 2007, commit e65fe397 added additional sanity checking to
rpcb_decode_getaddr() to make sure we
were getting a reply that was long enough to be an actual universal
address.  If the uaddr string isn't long enough, the XDR decoder
returns EIO.

However, an empty string is a valid RPCB_GETADDR response if the
requested service isn't registered.  Moreover, "::.n.m" is also a
valid RPCB_GETADDR response for IPv6 addresses that is shorter
than rpcb_decode_getaddr()'s lower limit of 11.  So this sanity
check introduced a regression for rpcbind requests against IPv6
remotes.

So revert the lower bound check added by commit e65fe397, and add an
explicit check for an empty uaddr string, similar to libtirpc's
rpcb_getaddr(3).
Pointed-out-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
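
To make the two valid short replies concrete, here is a user-space sketch that extracts the port from a universal address. An empty string is treated as "not registered" (port 0), and a short IPv6 reply such as "::.8.1" decodes normally. The function name and the 64-byte scratch buffer are assumptions of the sketch; this is not rpcb_decode_getaddr().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the port encoded in the last two
 * dot-separated fields of @uaddr, 0 if @uaddr is empty (service not
 * registered), or -EINVAL if the string is malformed. */
int sketch_uaddr_port(const char *uaddr)
{
    char copy[64];
    char *dot, *end;
    long hi, lo;
    size_t len = strlen(uaddr);

    if (len == 0)
        return 0;               /* valid reply: nothing registered */
    if (len >= sizeof(copy))
        return -EINVAL;
    memcpy(copy, uaddr, len + 1);

    /* Low-order port byte is the final field. */
    dot = strrchr(copy, '.');
    if (dot == NULL)
        return -EINVAL;
    lo = strtol(dot + 1, &end, 10);
    if (end == dot + 1 || *end != '\0' || lo < 0 || lo > 255)
        return -EINVAL;
    *dot = '\0';

    /* High-order port byte is the field before it. */
    dot = strrchr(copy, '.');
    if (dot == NULL)
        return -EINVAL;
    hi = strtol(dot + 1, &end, 10);
    if (end == dot + 1 || *end != '\0' || hi < 0 || hi > 255)
        return -EINVAL;

    return (int)(hi * 256 + lo);
}
```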

776bd5c7

19 Mar, 2009 8 commits

NFS: Optimise NFS close() · 7fe5c398

Trond Myklebust authored Mar 19, 2009

Close-to-open cache consistency rules really only require us to flush out
writes on calls to close(), and require us to revalidate attributes on the
very last close of the file.

Currently we appear to be doing a lot of extra attribute revalidation
and cache flushes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

7fe5c398

NFS: Fix the notifications when renaming onto an existing file · b1e4adf4

Trond Myklebust authored Mar 19, 2009

NFS appears to be returning an unnecessary "delete" notification when
we're doing an atomic rename. See

  http://bugzilla.gnome.org/show_bug.cgi?id=575684

The fix is to get rid of the redundant call to d_delete().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b1e4adf4

NFS: Fix up a mismerged patch · 47c62564

Trond Myklebust authored Mar 16, 2009

Move the definition of nfs_need_commit() into the #ifdef CONFIG_NFS_V3
section as originally intended in the patch "NFS: cleanup - remove
struct nfs_inode->ncommit"
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

47c62564

SVCRDMA: fix recent printk format warnings. · 2e3c230b

Tom Talpey authored Mar 12, 2009

printk formats in prior commit were reversed/incorrect.
Compiled without warning on x86 and x86_64, but detected on ppc.
Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

2e3c230b

SUNRPC: Ensure we close the socket on EPIPE errors too... · 55420c24

Trond Myklebust authored Mar 11, 2009

As long as one task is holding the socket lock, then calls to
xprt_force_disconnect(xprt) will not succeed in shutting down the socket.
In particular, this would mean that a server initiated shutdown will not
succeed until the lock is relinquished.
In order to avoid the deadlock, we should ensure that xs_tcp_send_request()
closes the socket on EPIPE errors too.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

55420c24

SUNRPC: xs_tcp_connect_worker{4,6}: merge common code · b61d59ff

Trond Myklebust authored Mar 11, 2009

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

b61d59ff

SUNRPC: Add a sysctl to control the duration of the socket linger timeout · 25fe6142

Trond Myklebust authored Mar 11, 2009

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

25fe6142

SUNRPC: Add the equivalent of the linger and linger2 timeouts to RPC sockets · 7d1e8255

Trond Myklebust authored Mar 11, 2009

This fixes a regression against FreeBSD servers as reported by Tomas
Kasparek. Apparently when using RPC over a TCP socket, the FreeBSD servers
don't ever react to the client closing the socket, and so commit e06799f9
(SUNRPC: Use shutdown() instead of close() when disconnecting a TCP socket)
causes the setup to hang forever whenever the client attempts to close and
then reconnect.

We break the deadlock by adding a 'linger2' style timeout to the socket,
after which, the client will abort the connection using a TCP 'RST'.

The default timeout is set to 15 seconds. A subsequent patch will put it
under user control by means of a sysctl.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
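
The patch implements its timeout inside the RPC transport code. As a loose user-space analogue only, the sketch below shows one common way to abort a TCP connection with a reset instead of an orderly FIN: enable SO_LINGER with a zero timeout, then close the socket. The function name is hypothetical, and the 15-second wait that precedes the abort in the patch is not modelled.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>

/* Hypothetical helper: close @fd abortively.  On a connected TCP
 * socket, a zero linger timeout makes close() discard unsent data and
 * reset the connection rather than performing a normal shutdown.
 * Returns 0 on success or a negative errno value. */
int sketch_abort_connection(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };

    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
        return -errno;
    if (close(fd) < 0)
        return -errno;
    return 0;
}
```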

7d1e8255

12 Mar, 2009 1 commit

NFS: Fix the fix to Bugzilla #11061, when IPv6 isn't defined... · 9f4c899c

Trond Myklebust authored Mar 12, 2009

Stephen Rothwell reports:

Today's linux-next build (powerpc ppc64_defconfig) failed like this:

fs/built-in.o: In function `.nfs_get_client':
client.c:(.text+0x115010): undefined reference to `.__ipv6_addr_type'

Fix by moving the IPV6 specific parts of commit d7371c41 ("Bug 11061, NFS
mounts dropped") into the '#ifdef IPV6...' section.

Also fix up a couple of formatting issues.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

9f4c899c

11 Mar, 2009 8 commits

SUNRPC: Ensure that xs_nospace return values are propagated · 5e3771ce

Trond Myklebust authored Mar 11, 2009

If xs_nospace() finds that the socket has disconnected, it attempts to
return ENOTCONN, however that value is then squashed by the callers.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

5e3771ce

SUNRPC: Delay, then retry on connection errors. · 8a2cec29

Trond Myklebust authored Mar 11, 2009

Enforce the comment in xs_tcp_connect_worker4/xs_tcp_connect_worker6 that
we should delay, then retry on certain connection errors.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

8a2cec29

SUNRPC: Return EAGAIN instead of ENOTCONN when waking up xprt->pending · 2a491991

Trond Myklebust authored Mar 11, 2009

While we should definitely return socket errors to the task that is
currently trying to send data, there is no need to propagate the same error
to all the other tasks on xprt->pending. Doing so actually slows down
recovery, since it causes more than one task to attempt socket recovery.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
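
The intended split can be pictured with a toy model: the sending task sees the real socket error, while every task waiting on the pending queue is woken with -EAGAIN and simply retries. The struct and function below are hypothetical and bear no relation to the actual rpc_task layout or wake-up code.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for an RPC task: only the wake-up status is modelled. */
struct sketch_task {
    int status;
};

/* Hypothetical helper: deliver @sock_err to the sender only; waiters
 * get -EAGAIN so that just one task attempts socket recovery. */
void sketch_report_error(struct sketch_task *sender,
                         struct sketch_task *pending, size_t npending,
                         int sock_err)
{
    size_t i;

    sender->status = sock_err;
    for (i = 0; i < npending; i++)
        pending[i].status = -EAGAIN;
}
```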

2a491991

SUNRPC: Handle socket errors correctly · 482f32e6

Trond Myklebust authored Mar 11, 2009

Ensure that we pick up and handle socket errors as they occur.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

482f32e6

SUNRPC: Handle ECONNREFUSED correctly in xprt_transmit() · c8485e4d

Trond Myklebust authored Mar 11, 2009

If we get an ECONNREFUSED error, we currently go to sleep on the
'xprt->sending' wait queue. The problem is that no timeout is set there,
and there is nothing else that will wake the task up later.

We should deal with ECONNREFUSED in call_status, given that is where we
also deal with -EHOSTDOWN, and friends.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

c8485e4d

SUNRPC: Don't disconnect if a connection is still in progress. · 40d2549d

Trond Myklebust authored Mar 11, 2009

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

40d2549d

SUNRPC: Ensure we set XPRT_CLOSING only after we've sent a tcp FIN... · 670f9457

Trond Myklebust authored Mar 11, 2009

...so that we can distinguish between when we need to shutdown and when we
don't. Also remove the call to xs_tcp_shutdown() from xs_tcp_connect(),
since xprt_connect() makes the same test.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

670f9457

SUNRPC: Avoid an unnecessary task reschedule on ENOTCONN · 15f081ca

Trond Myklebust authored Mar 11, 2009

If the socket is unconnected, and xprt_transmit() returns ENOTCONN, we
currently give up the lock on the transport channel. Doing so means that
the lock automatically gets assigned to the next task in the xprt->sending
queue, and so that task needs to be woken up to do the actual connect.

The following patch aims to avoid that unnecessary task switch.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

15f081ca