- 04 Sep, 2009 40 commits
-
-
Vlad Yasevich authored
SCTP will delay the last part of a large write due to NAGLE, if that part is smaller then MTU. Since we are doing large writes, we might as well send the last portion now instead of waiting untill the next large write happens. The small portion will be sent as is regardless, so it's better to not delay it. This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com> and Doug Graham <dgraham@nortel.com>. Many thanks go out to them. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Vlad Yasevich authored
The decision to delay due to Nagle should be based on the path mtu and future packet size. We currently incorrectly base it on 'frag_point' which is the SCTP DATA segment size, and also we do not count DATA chunk header overhead in the computation. This actuall allows situations where a user can set low 'frag_point', and then send small messages without delay. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Vlad Yasevich authored
We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN. This results in an hung association if the remote only uses SHUTDOWNs (which it's allowed to do) to acknowlege DATA when closing. The reason for that is that we simply honor the a_rwnd from the sack, but since we faked it to be 0, we enter 0-window probing. The fix is to use the peers old rwnd and add our flight size to it. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Vlad Yasevich authored
SCTP has a problem that when small chunks are used, it is possible to exhaust the receiver buffer without fully closing receive window. This happens due to all overhead that we have account for with small messages. To fix this, when receive buffer is exceeded, we'll drop the window to 0 and save the 'drop' portion. When application starts reading data and freeing up recevie buffer space, we'll wait until we've reached the 'drop' window and then add back this 'drop' one mtu at a time. This worked well in testing and under stress produced rather even recovery. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Vlad Yasevich authored
If T3 timer expires, we are retransmitting data due to timeout any any fast recovery is null and void. We can clear the fast recovery flag. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Vlad Yasevich authored
SCTP RFC 4960 states that unacknowledged HEARTBEATS count as errors agains a given transport or endpoint. As such, we should increment the error counts for only for unacknowledged HB, otherwise we detect failure too soon. This goes for both the overall error count and the path error count. Now, there is a difference in how the detection is done between the two. The path error detection is done after the increment, so to detect it properly, we actually need to exceed the path threshold. The overall error detection is done _BEFORE_ the increment. Thus to detect the failure, it's enough for the error count to match the threshold. This is why all the state functions use '>=' to detect failure, while path detection uses '>'. Thanks goes to Chunbo Luo <chunbo.luo@windriver.com> who first proposed patches to fix this issue and made me re-read the spec and the code to figure out how this cruft really works. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Alexey Dobriyan authored
create_proc_entry() is deprecated (not formally, though). Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Wei Yongjun authored
The receiver of the HEARTBEAT should respond with a HEARTBEAT ACK that contains the Heartbeat Information field copied from the received HEARTBEAT chunk. So the received HEARTBEAT-ACK chunk must have a length of: sizeof(sctp_chunkhdr_t) + sizeof(sctp_sender_hb_info_t) A badly formatted HB-ACK chunk, it is possible that we may access invalid memory. We should really make sure that the chunk format is what we expect, before attempting to touch the data. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Wei Yongjun authored
If Cumulative TSN Ack field of SHUTDOWN chunk is less than the Cumulative TSN Ack Point then drop the SHUTDOWN chunk. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Vlad Yasevich authored
Currenlty, sctp breaks up user messages into fragments and sends each fragment to the lower layer by itself. This means that for each fragment we go all the way down the stack and back up. This also discourages bundling of multiple fragments when they can fit into a sigle packet (ex: due to user setting a low fragmentation threashold). We introduce a new command SCTP_CMD_SND_MSG and hand the whole message down state machine. The state machine and the side-effect parser will cork the queue, add all chunks from the message to the queue, and then un-cork the queue thus causing the chunks to get transmitted. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Vlad Yasevich authored
If the association has a SACK timer pending and now DATA queued to be send, we'll try to bundle the SACK with the next application send. As such, try encourage bundling by accounting for SACK in the size of the first chunk fragment. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Vlad Yasevich authored
We are now trying to bundle SACKs when we have outbound DATA to send. However, there are situations where this outbound DATA will not be sent (due to congestion or available window). In such cases it's ok to wait for the timer to expire. This patch refactors the sending code so that betfore attempting to bundle the SACK we check to see if the DATA will actually be transmitted. Based on eirlier works for Doug Graham <dgraham@nortel.com> and Wei Youngjun <yjwei@cn.fujitsu.com>. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Vlad Yasevich authored
Since an application may specify the maximum SCTP fragment size that all data should be fragmented to, we need to fix how we do segmentation. Right now, if a user specifies a small fragment size, the segment size can go negative in the presence of AUTH or COOKIE_ECHO bundling. What we need to do is track the largest possbile DATA chunk that can fit into the mtu. Then if the fragment size specified is bigger then this maximum length, we'll shrink it down. Otherwise, we just use the smaller segment size without changing it further. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Vlad Yasevich authored
If a socket has a lot of association that are in the process of of being closed/aborted, it is possible for a remote to establish new associations during the time period that the old ones are shutting down. If this was a result of a close() call, there will be no socket and will cause a memory leak. We'll prevent this by setting the socket state to CLOSING and disallow new associations when in this state. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Doug Graham authored
This patch corrects the conditions under which a SACK will be piggybacked on a DATA packet. The previous condition was incorrect due to a misinterpretation of RFC 4960 and/or RFC 2960. Specifically, the following paragraph from section 6.2 had not been implemented correctly: Before an endpoint transmits a DATA chunk, if any received DATA chunks have not been acknowledged (e.g., due to delayed ack), the sender should create a SACK and bundle it with the outbound DATA chunk, as long as the size of the final SCTP packet does not exceed the current MTU. See Section 6.2. When about to send a DATA chunk, the code now checks to see if the SACK timer is running. If it is, we know we have a SACK to send to the peer, so we append the SACK (assuming available space in the packet) and turn off the timer. For a simple request-response scenario, this will result in the SACK being bundled with the response, meaning the the SACK is received quickly by the client, and also meaning that no separate SACK packet needs to be sent by the server to acknowledge the request. Prior to this patch, a separate SACK packet would have been sent by the server SCTP only after its delayed-ACK timer had expired (usually 200ms). This is wasteful of bandwidth, and can also have a major negative impact on performance due the interaction of delayed ACKs with the Nagle algorithm. Signed-off-by: Doug Graham <dgraham@nortel.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Rami Rosen authored
This patch removes an unused union definition (sctp_cmsg_data_t) from include/net/sctp/user.h. Signed-off-by: Rami Rosen <rosenrami@gmail.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Vlad Yasevich authored
When the sctp transport is marked down, we can release the cached route and force a new lookup when attempting to use this transport for anything. This way, if a better route or source address is available, we'll try to use it. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Wei Yongjun authored
Update the route and saddr entries for the non-active transports as some of the added addresses can be used as better source addresses, or may be there is a better route. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Wei Yongjun authored
This patch fix to check the unrecognized ASCONF parameter before access it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Wei Yongjun authored
The return value of sctp_process_asconf_ack() may be overwritten while process parameters with no error. This patch fixed the problem. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
-
Sachin Sant authored
Signed-off-by: Sachin Sant <sachinp@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wolfgang Grandegger authored
This patch adds support for legacy SJA1000 CAN controllers on the ISA or PC-104 bus. The I/O port or memory address and the IRQ number must be specified via module parameters: insmod sja1000_isa.ko port=0x310,0x380 irq=7,11 for ISA devices using I/O ports or: insmod sja1000_isa.ko mem=0xd1000,0xd1000 irq=7,11 for memory mapped ISA devices. Indirect access via address and data port is supported as well: insmod sja1000_isa.ko port=0x310,0x380 indirect=1 irq=7,11 Here is a full list of the supported module parameters: port:I/O port number (array of ulong) mem:I/O memory address (array of ulong) indirect:Indirect access via address and data port (array of byte) irq:IRQ number (array of int) clk:External oscillator clock frequency (default=16000000 [16 MHz]) (array of int) cdr:Clock divider register (default=0x48 [CDR_CBP | CDR_CLK_OFF]) (array of byte) ocr:Output clock register (default=0x18 [OCR_TX0_PUSHPULL]) (array of byte) Note: for clk, cdr, ocr, the first argument re-defines the default for all other devices, e.g.: insmod sja1000_isa.ko mem=0xd1000,0xd1000 irq=7,11 clk=24000000 is equivalent to insmod sja1000_isa.ko mem=0xd1000,0xd1000 irq=7,11 \ clk=24000000,24000000 Signed-off-by: Wolfgang Grandegger <wg@grandegger.com> Tested-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wolfgang Grandegger authored
The member "tx_bytes" of "struct net_device_stats" should be incremented when the interrupt is done and an "arbitration lost error" is a TX error and the statistics should be updated accordingly. Signed-off-by: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wolfgang Grandegger authored
This patch adds the function can_free_echo_skb to the CAN device interface to allow upcoming drivers to release echo skb's in case of error. Signed-off-by: Wolfgang Grandegger <wg@grandegger.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Noticed by Stephen Rothwell: Today's linux-next build (x86_64 allmodconfig gcc-4.4.0) produced this warning: drivers/net/wan/dscc4.c: In function 'dscc4_rx_skb': drivers/net/wan/dscc4.c:670: warning: suggest parentheses around comparison in operand of '|' which actually points out a bug, I think. It is doing (x & (y | z)) != y | z when it probably means (x & (y | z)) != (y | z) Introduced by commit 5de3fcab ("WAN: bit and/or confusion"). Signed-off-by: David S. Miller <davem@davemloft.net>
-
Cosmin Ratiu authored
Here is a patch which fixes an issue observed when using TCP over IPv6 and AH from IPsec. When a connection gets closed the 4-way method and the last ACK from the server gets dropped, the subsequent FINs from the client do not get ACKed because tcp_v6_send_response does not set the transport header pointer. This causes ah6_output to try to allocate a lot of memory, which typically fails, so the ACKs never make it out of the stack. I have reproduced the problem on kernel 2.6.7, but after looking at the latest kernel it seems the problem is still there. Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
To unclutter probe() a little bit, put all device initialization code in one spot and device deinit code in another spot. Also remove unused rq->buf_index variable/func. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
Nic firmware can return zero for port MTU, so check for non-zero value before checking for change in port MTU. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
Deprecate some old APIa; change arguments to stats dump all API; add new interrupt assert API Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
Bug fix: enable VLAN filtering Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
Provision for multiple Rx/Tx queues. Max of 8 WQs and 8 RQs. Max for completion queue is 8+8=16 and max for interrupt resources is 8+8+2. Add driver/firmware interface for setting up RSS secret key and indirection table. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
Bug fix: included MAC drops in rx_dropped netstat. Also track Rx trunctations stat at the MAC Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
Some driver -> nic firmware calls weren't guarded with a spinlock, exposing the call i/f to a race between two threads Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
Use netdev_alloc_skb rather than dev_alloc_skb Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
enic WQ desc supports a maximum 16K buf size, so split any send fragments larger than 16K into several descs. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
A0 revision ASIC has an erratum on the RQ desc cache on chip where the cache can become corrupted causing pkt buf writes to wrong locations. The s/w workaround is to post a dummy RQ desc in the ring every 32 descs, causing a flush of the cache. A0 parts are not production, but there are enough of these parts in the wild in test setups to warrant including workaround. A1 revision ASIC parts fix erratum. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Scott Feldman authored
Nic firmware can place resources (queues, intrs, etc) on multiple BARs, so allow driver to discover/map resources beyond BAR0. Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Its hard to tell if vlans are dropping frames, since every frame given to vlan_???_start_xmit() functions is accounted as fully transmitted by lower device. We can test dev_queue_xmit() return values to properly account for dropped frames. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
macvlan devices are currently not multi-queue capable. We can do that defining rtnl_link_ops method, get_tx_queues(), called from rtnl_create_link() This new method gets num_tx_queues/real_num_tx_queues from lower device. macvlan_get_tx_queues() is a copy of vlan_get_tx_queues(). Because macvlan_start_xmit() has to update netdev_queue stats only (and not dev->stats), I chose to change tx_errors/tx_aborted_errors accounting to tx_dropped, since netdev_queue structure doesnt define tx_errors / tx_aborted_errors. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Ben Hutchings authored
A few drivers still access the arguments to MDIO ioctls as an array of u16. Convert them to use struct mii_ioctl_data. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-