- 04 Apr, 2009 21 commits
-
-
Benny Halevy authored
Implement the destory_session operation confoming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 [use sessionid_lock spin lock] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
A session inactivity time compound (lease renewal) or a compound where the sequence operation has sa_cachethis set to FALSE do not require any pages to be held in the v4.1 DRC. This is because struct nfsd4_slot is already caching the session information. Add logic to the nfs41 server to not cache response pages for solo sequence responses. Return nfserr_replay_uncached_rep on the operation following the sequence operation when sa_cachethis is FALSE. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use cstate session in nfsd4_replay_cache_entry] [nfsd41: rename nfsd4_no_page_in_cache] [nfsd41 rename nfsd4_enc_no_page_replay] [nfsd41 nfsd4_is_solo_sequence] [nfsd41 change nfsd4_not_cached return] Signed-off-by: Andy Adamson <andros@netapp.com> [changed return type to bool] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41 drop parens in nfsd4_is_solo_sequence call] Signed-off-by: Andy Adamson <andros@netapp.com> [changed "== 0" to "!"] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
Replace the nfs4_client cl_seqid field with a single struct nfs41_slot used for the create session replay cache. The CREATE_SESSION slot sets the sl_session pointer to NULL. Otherwise, the slot and it's replay cache are used just like the session slots. Fix unconfirmed create_session replay response by initializing the create_session slot sequence id to 0. A future patch will set the CREATE_SESSION cache when a SEQUENCE operation preceeds the CREATE_SESSION operation. This compound is currently only cached in the session slot table. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: revert portion of nfsd4_set_cache_entry] Signed-off-by: Andy Adamson <andros@netpp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
Implement the create_session operation confoming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 Look up the client id (generated by the server on exchange_id, given by the client on create_session). If neither a confirmed or unconfirmed client is found then the client id is stale If a confirmed cilent is found (i.e. we already received create_session for it) then compare the sequence id to determine if it's a replay or possibly a mis-ordered rpc. If the seqid is in order, update the confirmed client seqid and procedd with updating the session parameters. If an unconfirmed client_id is found then verify the creds and seqid. If both match move the client id to confirmed state and proceed with processing the create_session. Currently, we do not support persistent sessions, and RDMA. alloc_init_session generates a new sessionid and creates a session structure. NFSD_PAGES_PER_SLOT is used for the max response cached calculation, and for the counting of DRC pages using the hard limits set in struct srv_serv. A note on NFSD_PAGES_PER_SLOT: Other patches in this series allow for NFSD_PAGES_PER_SLOT + 1 pages to be cached in a DRC slot when the response size is less than NFSD_PAGES_PER_SLOT * PAGE_SIZE but xdr_buf pages are used. e.g. a READDIR operation will encode a small amount of data in the xdr_buf head, and then the READDIR in the xdr_buf pages. So, the hard limit calculation use of pages by a session is underestimated by the number of cached operations using the xdr_buf pages. Yet another patch caches no pages for the solo sequence operation, or any compound where cache_this is False. So the hard limit calculation use of pages by a session is overestimated by the number of these operations in the cache. TODO: improve resource pre-allocation and negotiate session parameters accordingly. Respect and possibly adjust backchannel attributes. Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> [nfsd41: remove headerpadsz from channel attributes] Our client and server only support a headerpadsz of 0. [nfsd41: use DRC limits in fore channel init] [nfsd41: do not change CREATE_SESSION back channel attrs] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [use sessionid_lock spin lock] [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41 remove sl_session from alloc_init_session] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [simplify nfsd4_encode_create_session error handling] [nfsd41: fix comment style in init_forechannel_attrs] [nfsd41: allocate struct nfsd4_session and slot table in one piece] [nfsd41: no need to INIT_LIST_HEAD in alloc_init_session just prior to list_add] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
Replay a request in nfsd4_sequence. Add a minorversion to struct nfsd4_compound_state. Pass the current slot to nfs4svc_encode_compound res via struct nfsd4_compoundres to set an NFSv4.1 DRC entry. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use cstate session in nfs4svc_encode_compoundres] [nfsd41 replace nfsd4_set_cache_entry] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
Use no more than 1/128th of the number of free pages at nfsd startup for the v4.1 DRC. This is an arbitrary default which should probably end up under the control of an administrator. Signed-off-by: Andy Adamson <andros@netapp.com> [moved added fields in struct svc_serv under CONFIG_NFSD_V4_1] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [fix set_max_drc calculation of sv_drc_max_pages] [moved NFSD_DRC_SIZE_SHIFT's declaration up in header file] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
Cache all the result pages, including the rpc header in rq_respages[0], for a request in the slot table cache entry. Cache the statp pointer from nfsd_dispatch which points into rq_respages[0] just past the rpc header. When setting a cache entry, calculate and save the length of the nfs data minus the rpc header for rq_respages[0]. When replaying a cache entry, replace the cached rpc header with the replayed request rpc result header, unless there is not enough room in the cached results first page. In that case, use the cached rpc header. The sessions fore channel maxresponse size cached is set to NFSD_PAGES_PER_SLOT * PAGE_SIZE. For compounds we are cacheing with operations such as READDIR that use the xdr_buf->pages to hold data, we choose to cache the extra page of data rather than copying data from xdr_buf->pages into the xdr_buf->head page. [nfsd41: limit cache to maxresponsesize_cached] [nfsd41: mv nfsd4_set_statp under CONFIG_NFSD_V4_1] [nfsd41: rename nfsd4_move_pages] [nfsd41: rename page_no variable] [nfsd41: rename nfsd4_set_cache_entry] [nfsd41: fix nfsd41_copy_replay_data comment] [nfsd41: add to nfsd4_set_cache_entry] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
Signed-off-by: Andy Adamson<andros@netapp.com> [nfsd41: do not verify nfserr_sequence_pos for minorversion 0] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Benny Halevy authored
Implement the sequence operation conforming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26 Check for stale clientid (as derived from the sessionid). Enforce slotid range and exactly-once semantics using the slotid and seqid. If everything went well renew the client lease and mark the slot INPROGRESS. Add a struct nfsd4_slot pointer to struct nfsd4_compound_state. To be used for sessions DRC replay. [nfsd41: rename sequence catchthis to cachethis] Signed-off-by: Andy Adamson<andros@netapp.com> [pulled some code to set cstate->slot from "nfsd DRC logic"] [use sessionid_lock spin lock] [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd: add a struct nfsd4_slot pointer to struct nfsd4_compound_state] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: add nfsd4_session pointer to nfsd4_compound_state] [nfsd41: set cstate session] [nfsd41: use cstate session in nfsd4_sequence] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [simplify nfsd4_encode_sequence error handling] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
We need to distinguish between client names provided by NFSv4.0 clients SETCLIENTID and those provided by NFSv4.1 via EXCHANGE_ID when looking up the clientid by string. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Andy Adamson <andros@netapp.com> [nfsd41: use boolean values for use_exchange_id argument] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: simplify match_clientid_establishment logic] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
Implement the exchange_id operation confoming to http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-28 Based on the client provided name, hash a client id. If a confirmed one is found, compare the op's creds and verifier. If the creds match and the verifier is different then expire the old client (client re-incarnated), otherwise, if both match, assume it's a replay and ignore it. If an unconfirmed client is found, then copy the new creds and verifer if need update, otherwise assume replay. The client is moved to a confirmed state on create_session. In the nfs41 branch set the exchange_id flags to EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_SUPP_MOVED_REFER (pNFS is not supported, Referrals are supported, Migration is not.). Address various scenarios from section 18.35 of the spec: 1. Check for EXCHGID4_FLAG_UPD_CONFIRMED_REC_A and set EXCHGID4_FLAG_CONFIRMED_R as appropriate. 2. Return error codes per 18.35.4 scenarios. 3. Update client records or generate new client ids depending on scenario. Note: 18.35.4 case 3 probably still needs revisiting. The handling seems not quite right. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Andy Adamosn <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use utsname for major_id (and copy to server_scope)] [nfsd41: fix handling of various exchange id scenarios] Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: reverse use of EXCHGID4_INVAL_FLAG_MASK_A] [simplify nfsd4_encode_exchange_id error handling] [nfsd41: embed an xdr_netobj in nfsd4_exchange_id] [nfsd41: return nfserr_serverfault for spa_how == SP4_MACH_CRED] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
Define nfsd41_dec_ops vector and add it to nfsd4_minorversion for minorversion 1. Note: nfsd4_enc_ops vector is shared for v4.0 and v4.1 since we don't need to filter out obsolete ops as this is done in the decoding phase. exchange_id, create_session, destroy_session, and sequence ops are implemented as stubs returning nfserr_opnotsupp at this stage. [was nfsd41: xdr stubs] [get rid of CONFIG_NFSD_V4_1] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Marc Eshel authored
Simple sessionid hashing using its monotonically increasing sequence number. Locking considerations: sessionid_hashtbl access is controlled by the sessionid_lock spin lock. It must be taken for insert, delete, and lookup. nfsd4_sequence looks up the session id and if the session is found, it calls nfsd4_get_session (still under the sessionid_lock). nfsd4_destroy_session calls nfsd4_put_session after unhashing it, so when the session's kref reaches zero it's going to get freed. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [we don't use a prime for sessionid hash table size] [use sessionid_lock spin lock] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Marc Eshel authored
Signed-off-by: Benny Halevy <bhalevy@panasas.com> [add CONFIG_NFSD_V4_1 to fix v4.0 regression bug] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Marc Eshel authored
[get rid of CONFIG_NFSD_V4_1] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
This patch provides basic data structures representing the nfs41 sessions and slots, plus helpers for keeping a reference count on the session and freeing it. Note that our server only support a headerpadsz of 0 and it ignores backchannel attributes at the moment. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: remove headerpadsz from channel attributes] [nfsd41: embed nfsd4_channel in nfsd4_session] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41 remove sl_session from nfsd4_slot] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Marc Eshel authored
Define all error code present in http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: clean up error code definitions] [nfsd41: change NFSERR_REPLAY_ME] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Benny Halevy authored
Define all NFSv4.1 common operation and error code constants. Note that some of the definitions are used by both the nfs41 client and the server code. This patch is duplicated in the nfs41 and nfsd41 sessions patchset. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: add exchange id flags] Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [removed server-only hunk changing NFSERR_REPLAY_ME] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: add SEQ4_XX to nfs41-common-protocol] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfs41: generic error code update] [nfs41: reverse EXCHGID4_INVAL_FLAG_MASK_{A,R}] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Andy Adamson authored
On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be returned. It is up to the NFSv4.1 client to resend only the operations that have not been processed. Initialize rq_usedeferral to 1 in svc_process(). It sill be turned off in nfsd4_proc_compound() only when NFSv4.1 Sessions are used. Note: this isn't an adequate solution on its own. It's acceptable as a way to get some minimal 4.1 up and working, but we're going to have to find a way to avoid returning DELAY in all common cases before 4.1 can really be considered ready. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: reverse rq_nodeferral negative logic] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [sunrpc: initialize rq_usedeferral] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
- 30 Mar, 2009 1 commit
-
-
Benny Halevy authored
There's no need for it. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
- 29 Mar, 2009 1 commit
-
-
Andy Adamson authored
Remove the allocation of struct nfsd4_compound_state. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
- 27 Mar, 2009 3 commits
-
-
Greg Banks authored
Fix a build warning about leaking CONFIG_NFSD to userspace. The nfsd_stats data structure does not need to be available to userspace; no kernel interface uses it. So move it inside #ifdef __KERNEL__ and the warning goes away. Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Greg Banks authored
Document the format and semantics of the /proc/fs/nfsd/pool_stats file. Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
ideawu authored
There is no need to set rqstp->rq_server to serv, while serv is initialized as rqstp->rq_server at previous line. And between these two lines, there is no change to rqstp->rq_server. Signed-off-by: ideawu <ideawu@163.com> Reviewed-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
- 18 Mar, 2009 14 commits
-
-
Chuck Lever authored
Clean up: Enable the use of const arguments in higher level svc_ APIs by adding const to the arguments of the helper functions in svc_xprt.h Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Sachin S. Prabhu authored
There is an inconsistency seen in the behaviour of nfs compared to other local filesystems on linux when changing owner or group of a directory. If the directory has SUID/SGID flags set, on changing owner or group on the directory, the flags are stripped off on nfs. These flags are maintained on other filesystems such as ext3. To reproduce on a nfs share or local filesystem, run the following commands mkdir test; chmod +s+g test; chown user1 test; ls -ld test On the nfs share, the flags are stripped and the output seen is drwxr-xr-x 2 user1 root 4096 Feb 23 2009 test On other local filesystems(ex: ext3), the flags are not stripped and the output seen is drwsr-sr-x 2 user1 root 4096 Feb 23 13:57 test chown_common() called from sys_chown() will only strip the flags if the inode is not a directory. static int chown_common(struct dentry * dentry, uid_t user, gid_t group) { .. if (!S_ISDIR(inode->i_mode)) newattrs.ia_valid |= ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV; .. } See: http://www.opengroup.org/onlinepubs/7990989775/xsh/chown.html "If the path argument refers to a regular file, the set-user-ID (S_ISUID) and set-group-ID (S_ISGID) bits of the file mode are cleared upon successful return from chown(), unless the call is made by a process with appropriate privileges, in which case it is implementation-dependent whether these bits are altered. If chown() is successfully invoked on a file that is not a regular file, these bits may be cleared. These bits are defined in <sys/stat.h>." The behaviour as it stands does not appear to violate POSIX. However the actions performed are inconsistent when comparing ext3 and nfs. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Olga Kornievskaia authored
Allow the NFSv4 server to make use of TCP autotuning behaviour, which was previously disabled by setting the sk_userlocks variable. Set the receive buffers to be big enough to receive the whole RPC request, and set this for the listening socket, not the accept socket. Remove the code that readjusts the receive/send buffer sizes for the accepted socket. Previously this code was used to influence the TCP window management behaviour, which is no longer needed when autotuning is enabled. This can improve IO bandwidth on networks with high bandwidth-delay products, where a large tcp window is required. It also simplifies performance tuning, since getting adequate tcp buffers previously required increasing the number of nfsd threads. Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu> Cc: Jim Rees <rees@umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
The spec allows clients to change ip address, so we shouldn't be requiring that setclientid always come from the same address. For example, a client could reboot and get a new dhcpd address, but still present the same clientid to the server. In that case the server should revoke the client's previous state and allow it to continue, instead of (as it currently does) returning a CLID_INUSE error. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Greg Banks authored
Add /proc/fs/nfsd/pool_stats to export to userspace various statistics about the operation of rpc server thread pools. This patch is based on a forward-ported version of knfsd-add-pool-thread-stats which has been shipping in the SGI "Enhanced NFS" product since 2006 and which was previously posted: http://article.gmane.org/gmane.linux.nfs/10375 It has also been updated thus: * moved EXPORT_SYMBOL() to near the function it exports * made the new struct struct seq_operations const * used SEQ_START_TOKEN instead of ((void *)1) * merged fix from SGI PV 990526 "sunrpc: use dprintk instead of printk in svc_pool_stats_*()" by Harshula Jayasuriya. * merged fix from SGI PV 964001 "Crash reading pool_stats before nfsds are started". Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: Harshula Jayasuriya <harshula@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Greg Banks authored
Avoid overloading the CPU scheduler with enormous load averages when handling high call-rate NFS loads. When the knfsd bottom half is made aware of an incoming call by the socket layer, it tries to choose an nfsd thread and wake it up. As long as there are idle threads, one will be woken up. If there are lot of nfsd threads (a sensible configuration when the server is disk-bound or is running an HSM), there will be many more nfsd threads than CPUs to run them. Under a high call-rate low service-time workload, the result is that almost every nfsd is runnable, but only a handful are actually able to run. This situation causes two significant problems: 1. The CPU scheduler takes over 10% of each CPU, which is robbing the nfsd threads of valuable CPU time. 2. At a high enough load, the nfsd threads starve userspace threads of CPU time, to the point where daemons like portmap and rpc.mountd do not schedule for tens of seconds at a time. Clients attempting to mount an NFS filesystem timeout at the very first step (opening a TCP connection to portmap) because portmap cannot wake up from select() and call accept() in time. Disclaimer: these effects were observed on a SLES9 kernel, modern kernels' schedulers may behave more gracefully. The solution is simple: keep in each svc_pool a counter of the number of threads which have been woken but have not yet run, and do not wake any more if that count reaches an arbitrary small threshold. Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16 synthetic client threads simulating an rsync (i.e. recursive directory listing) workload reading from an i386 RH9 install image (161480 regular files in 10841 directories) on the server. That tree is small enough to fill in the server's RAM so no disk traffic was involved. This setup gives a sustained call rate in excess of 60000 calls/sec before being CPU-bound on the server. The server was running 128 nfsds. Profiling showed schedule() taking 6.7% of every CPU, and __wake_up() taking 5.2%. This patch drops those contributions to 3.0% and 2.2%. Load average was over 120 before the patch, and 20.9 after. This patch is a forward-ported version of knfsd-avoid-nfsd-overload which has been shipping in the SGI "Enhanced NFS" product since 2006. It has been posted before: http://article.gmane.org/gmane.linux.nfs/10374Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Greg Banks authored
Stop gathering the data that feeds the 'th' line in /proc/net/rpc/nfsd because the questionable data provided is not worth the scalability impact of calculating it. Instead, always report zeroes. The current approach suffers from three major issues: 1. update_thread_usage() increments buckets by call service time or call arrival time...in jiffies. On lightly loaded machines, call service times are usually < 1 jiffy; on heavily loaded machines call arrival times will be << 1 jiffy. So a large portion of the updates to the buckets are rounded down to zero, and the histogram is undercounting. 2. As seen previously on the nfs mailing list, the format in which the histogram is presented is cryptic, difficult to explain, and difficult to use. 3. Updating the histogram requires taking a global spinlock and dirtying the global variables nfsd_last_call, nfsd_busy, and nfsdstats *twice* on every RPC call, which is a significant scaling limitation. Testing on a 4 CPU 4 NIC Altix using 4 IRIX clients each doing 1K streaming reads at full line rate, shows the stats update code (inlined into nfsd()) takes about 1.7% of each CPU. This patch drops the contribution from nfsd() into the profile noise. This patch is a forward-ported version of knfsd-remove-nfsd-threadstats which has been shipping in the SGI "Enhanced NFS" product since 2006. In that time, exactly one customer has noticed that the threadstats were missing. It has been previously posted: http://article.gmane.org/gmane.linux.nfs/10376 and more recently requested to be posted again. Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
Note that we already checked for this invalid case at the top of this function. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
The main nfsd code was recently modified to no longer do lookups from withing the readdir callback, to avoid locking problems on certain filesystems. This (rather hacky, and overdue for replacement) NFSv4 recovery code has the same problem. Fix it to build up a list of names (instead of dentries) and do the lookups afterwards. Reported symptoms were a deadlock in the xfs code (called from nfsd4_recdir_load), with /var/lib/nfs on xfs. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reported-by: David Warren <warren@atmos.washington.edu>
-
J. Bruce Fields authored
Currently putpubfh returns NFSERR_OPNOTSUPP, which isn't actually allowed for v4. The right error is probably NFSERR_NOTSUPP. But let's just implement it; though rarely seen, it can be used by Solaris (with a special mount option), is mandated by the rfc, and is trivial for us to support. Thanks to Yang Hongyang for pointing out the original problem, and to Mike Eisler, Tom Talpey, Trond Myklebust, and Dave Noveck for further argument.... Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
David Shaw authored
If a filesystem being written to via NFS returns a short write count (as opposed to an error) to nfsd, nfsd treats that as a success for the entire write, rather than the short count that actually succeeded. For example, given a 8192 byte write, if the underlying filesystem only writes 4096 bytes, nfsd will ack back to the nfs client that all 8192 bytes were written. The nfs client does have retry logic for short writes, but this is never called as the client is told the complete write succeeded. There are probably other ways it could happen, but in my case it happened with a fuse (filesystem in userspace) filesystem which can rather easily have a partial write. Here is a patch to properly return the short write count to the client. Signed-off-by: David Shaw <dshaw@jabberwocky.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
Benny Halevy authored
Thanks for Bill Baker at sun.com for catching this at Connectathon 2009. This bug was introduced in 2.6.27 Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
J. Bruce Fields authored
The errors returned aren't used. Just return 0 and make them available to a dprintk(). Also, consistently use -ERRNO errors instead of nfs errors. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reviewed-by: Benny Halevy <bhalevy@panasas.com>
-