- 14 Dec, 2007 40 commits
-
-
Patrick McHardy authored
[NETFILTER]: xt_TCPMSS: remove network triggerable WARN_ON [ Upstream commit: 9dc0564e ] ipv6_skip_exthdr() returns -1 for invalid packets. don't WARN_ON that. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Patrick McHardy authored
[XFRM]: Fix leak of expired xfrm_states [ Upstream commit: 5dba4797 ] The xfrm_timer calls __xfrm_state_delete, which drops the final reference manually without triggering destruction of the state. Change it to use xfrm_state_put to add the state to the gc list when we're dropping the last reference. The timer function may still continue to use the state safely since the final destruction does a del_timer_sync(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Tejun Heo authored
patch 459ad688 in mainline. Spurious NCQ completion detection implemented in ahci was incorrect. On AHCI receving and processing FISes and raising interrupts are not interlocked and spurious interrupts are expected. For example, if an interrupt occurs while interrupt handler is running and the running interrupt handler handles the event the new IRQ indicated, after IRQ handler finishes, it will be executed again because IRQ pending bit is set by the new interrupt but there won't be anything to process. Please read the following message for more information. http://article.gmane.org/gmane.linux.ide/26012 This patch... * Removes all spurious IRQ whining from ahci. Spurious NCQ completion detection was completely wrong. Spurious D2H Register FIS taught us that some early drives send spurious D2H Register FIS with I bit set while NCQ commands are in progress but none of recent drives does that and even the ones which show such behavior can do NCQ fine. * Kills all NCQ blacklist entries which were added because of spurious NCQ completions. I tracked down each commit and verified all removed ones are actually added because of spurious completions. WD740ADFD-00NLR1 wasn't deleted but moved upward because the drive not only had spurious NCQ completions but also is slow on sequential data transfers if NCQ is enabled. Maxtor 7V300F0 was added by 0e3dbc01 from Alan Cox. I can only find evidences that the drive only had troubles with spuruious completions by searching the mailing list. This entry needs to be verified and removed if it doesn't have other NCQ related problems. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Jeff Garzik <jeff@garzik.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Jan Engelhardt authored
[NETFILTER]: fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK [ Upstream commit: 67b4af29 ] Fix forgotten module release in xt_CONNMARK and xt_CONNSECMARK When xt_CONNMARK is used outside the mangle table and the user specified "--restore-mark", the connmark_tg_check() function will (correctly) error out, but (incorrectly) forgets to release the L3 conntrack module. Same for xt_CONNSECMARK. Fix is to move the call to acquire the L3 module after the basic constraint checks. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Florian Zumbiehl authored
[UNIX]: EOF on non-blocking SOCK_SEQPACKET [ Upstream commit: 0a112258 ] I am not absolutely sure whether this actually is a bug (as in: I've got no clue what the standards say or what other implementations do), but at least I was pretty surprised when I noticed that a recv() on a non-blocking unix domain socket of type SOCK_SEQPACKET (which is connection oriented, after all) where the remote end has closed the connection returned -1 (EAGAIN) rather than 0 to indicate end of file. This is a test case: | #include <sys/types.h> | #include <unistd.h> | #include <sys/socket.h> | #include <sys/un.h> | #include <fcntl.h> | #include <string.h> | #include <stdlib.h> | | int main(){ | int sock; | struct sockaddr_un addr; | char buf[4096]; | int pfds[2]; | | pipe(pfds); | sock=socket(PF_UNIX,SOCK_SEQPACKET,0); | addr.sun_family=AF_UNIX; | strcpy(addr.sun_path,"/tmp/foobar_testsock"); | bind(sock,(struct sockaddr *)&addr,sizeof(addr)); | listen(sock,1); | if(fork()){ | close(sock); | sock=socket(PF_UNIX,SOCK_SEQPACKET,0); | connect(sock,(struct sockaddr *)&addr,sizeof(addr)); | fcntl(sock,F_SETFL,fcntl(sock,F_GETFL)|O_NONBLOCK); | close(pfds[1]); | read(pfds[0],buf,sizeof(buf)); | recv(sock,buf,sizeof(buf),0); // <-- this one | }else accept(sock,NULL,NULL); | exit(0); | } If you try it, make sure /tmp/foobar_testsock doesn't exist. The marked recv() returns -1 (EAGAIN) on 2.6.23.9. Below you find a patch that fixes that. Signed-off-by: Florian Zumbiehl <florz@florz.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Stephen Hemminger authored
[TCP] illinois: Incorrect beta usage [ Upstream commit: a357dde9 ] Lachlan Andrew observed that my TCP-Illinois implementation uses the beta value incorrectly: The parameter beta in the paper specifies the amount to decrease *by*: that is, on loss, W <- W - beta*W but in tcp_illinois_ssthresh() uses beta as the amount to decrease *to*: W <- beta*W This bug makes the Linux TCP-Illinois get less-aggressive on uncongested network, hurting performance. Note: since the base beta value is .5, it has no impact on a congested network. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Evgeniy Polyakov authored
[IPV6]: Restore IPv6 when MTU is big enough [ Upstream commit: d31c7b8f ] Avaid provided test application, so bug got fixed. IPv6 addrconf removes ipv6 inner device from netdev each time cmu changes and new value is less than IPV6_MIN_MTU (1280 bytes). When mtu is changed and new value is greater than IPV6_MIN_MTU, it does not add ipv6 addresses and inner device bac. This patch fixes that. Tested with Avaid's application, which works ok now. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Pavel Emelyanov authored
[DECNET]: dn_nl_deladdr() almost always returns no error [ Upstream commit: 3ccd8624 ] As far as I see from the err variable initialization the dn_nl_deladdr() routine was designed to report errors like "EADDRNOTAVAIL" and probaby "ENODEV". But the code sets this err to 0 after the first nlmsg_parse and goes on, returning this 0 in any case. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Joonwoo Park authored
[VLAN]: Fix nested VLAN transmit bug [ Upstream commit: 6ab3b487 ] Fix misbehavior of vlan_dev_hard_start_xmit() for recursive encapsulations. Signed-off-by: Joonwoo Park <joonwpark81@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Pablo Neira Ayuso authored
[TEXTSEARCH]: Do not allow zero length patterns in the textsearch infrastructure [ Upstream commit: e03ba84a ] If a zero length pattern is passed then return EINVAL. Avoids infinite loops (bm) or invalid memory accesses (kmp). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
David Howells authored
[RXRPC]: Add missing select on CRYPTO [ Upstream commit: d5a784b3 ] AF_RXRPC uses the crypto services, so should depend on or select CRYPTO. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Pavel Emelyanov authored
[BRIDGE]: Lost call to br_fdb_fini() in br_init() error path [ Upstream commit: 17efdd45 ] In case the br_netfilter_init() (or any subsequent call) fails, the br_fdb_fini() must be called to free the allocated in br_fdb_init() br_fdb_cache kmem cache. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Charles Hardin authored
[PFKEY]: Sending an SADB_GET responds with an SADB_GET [ Upstream commit: 435000be ] Kernel needs to respond to an SADB_GET with the same message type to conform to the RFC 2367 Section 3.1.5 Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Ilpo Järvinen authored
[TCP] MTUprobe: fix potential sk_send_head corruption [ Upstream commit: 6e421410 ] When the abstraction functions got added, conversion here was made incorrectly. As a result, the skb may end up pointing to skb which got included to the probe skb and then was freed. For it to trigger, however, skb_transmit must fail sending as well. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Herbert Xu authored
[TCP]: Fix TCP header misalignment [ Upstream commit: 21df56c6 ] Indeed my previous change to alloc_pskb has made it possible for the TCP header to be misaligned iff the MTU is not a multiple of 4 (and less than a page). So I suspect the optimised IPsec MTU calculation is giving you just such an MTU :) This patch fixes it by changing alloc_pskb to make sure that the size is at least 32-bit aligned. This does not cause the problem fixed by the previous patch because max_header is always 32-bit aligned which means that in the SG/NOTSO case this will be a no-op. I thought about putting this in the callers but all the current callers are from TCP. If and when we get a non-TCP caller we can always create a TCP wrapper for this function and move the alignment over there. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Herbert Xu authored
[CRYPTO] api: Fix potential race in crypto_remove_spawn [ Upstream commit: 38cb2419 ] As it is crypto_remove_spawn may try to unregister an instance which is yet to be registered. This patch fixes this by checking whether the instance has been registered before attempting to remove it. It also removes a bogus cra_destroy check in crypto_register_instance as 1) it's outside the mutex; 2) we have a check in __crypto_register_alg already. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Sam Jansen authored
[TCP]: Problem bug with sysctl_tcp_congestion_control function [ Upstream commit: 5487796f ] sysctl_tcp_congestion_control seems to have a bug that prevents it from actually calling the tcp_set_default_congestion_control function. This is not so apparent because it does not return an error and generally the /proc interface is used to configure the default TCP congestion control algorithm. This is present in 2.6.18 onwards and probably earlier, though I have not inspected 2.6.15--2.6.17. sysctl_tcp_congestion_control calls sysctl_string and expects a successful return code of 0. In such a case it actually sets the congestion control algorithm with tcp_set_default_congestion_control. Otherwise, it returns the value returned by sysctl_string. This was correct in 2.6.14, as sysctl_string returned 0 on success. However, sysctl_string was updated to return 1 on success around about 2.6.15 and sysctl_tcp_congestion_control was not updated. Even though sysctl_tcp_congestion_control returns 1, do_sysctl_strategy converts this return code to '0', so the caller never notices the error. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
chas williams authored
[ATM]: [he] initialize lock and tasklet earlier [ Upstream commit: 8a8037ac ] if you are lucky (unlucky?) enough to have shared interrupts, the interrupt handler can be called before the tasklet and lock are ready for use. Signed-off-by: chas williams <chas@cmf.nrl.navy.mil> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Adrian Bunk authored
[IPV4]: Remove bogus ifdef mess in arp_process [ Upstream commit: 3660019e ] The #ifdef's in arp_process() were not only a mess, they were also wrong in the CONFIG_NET_ETHERNET=n and (CONFIG_NETDEV_1000=y or CONFIG_NETDEV_10000=y) cases. Since they are not required this patch removes them. Also removed are some #ifdef's around #include's that caused compile errors after this change. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Eric Dumazet authored
[NET]: Corrects a bug in ip_rt_acct_read() [ Upstream commit: 483b23ff ] It seems that stats of cpu 0 are counted twice, since for_each_possible_cpu() is looping on all possible cpus, including 0 Before percpu conversion of ip_rt_acct, we should also remove the assumption that CPU 0 is online (or even possible) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Daniel Drake authored
patch dec13c15 in mainline. The CONFIG_SUSPEND changes in 2.6.23 caused a regression under certain configuration conditions (SUSPEND=n, USB_AUTOSUSPEND=y) where all USB device attributes in sysfs (idVendor, idProduct, ...) silently disappeared, causing udev breakage and more. The cause of this is that the /sys/.../power subdirectory is now only created when CONFIG_PM_SLEEP is set, however, it should be created whenever CONFIG_PM is set to handle the above situation. The following patch fixes the regression. Signed-off-by: Daniel Drake <dsd@gentoo.org> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Kay Sievers <kay.sievers@vrfy.org> Cc: stable <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Evgeniy Polyakov authored
This patch fixes a NAT regression in 2.6.23, resulting in a crash when a connection is NATed and matches a conntrack helper after NAT. Please apply, thanks. [NETFILTER]: Fix kernel panic with REDIRECT target. Upstream commit 1f305323 When connection tracking entry (nf_conn) is about to copy itself it can have some of its extension users (like nat) as being already freed and thus not required to be copied. Actually looking at this function I suspect it was copied from nf_nat_setup_info() and thus bug was introduced. Report and testing from David <david@unsolicited.net>. [ Patrick McHardy states: I now understand whats happening: - new connection is allocated without helper - connection is REDIRECTed to localhost - nf_nat_setup_info adds NAT extension, but doesn't initialize it yet - nf_conntrack_alter_reply performs a helper lookup based on the new tuple, finds the SIP helper and allocates a helper extension, causing reallocation because of too little space - nf_nat_move_storage is called with the uninitialized nat extension So your fix is entirely correct, thanks a lot :) ] Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Li Zefan authored
This patch fixes an incorrect memset in the NAT code, causing misbehaviour when unloading and reloading the NAT module. Applies to stable-2.6.22 and stable-2.6.23. Please apply, thanks. [NETFILTER]: nf_nat: fix memset error Upstream commit e0bf9cf1 The size passing to memset is the size of a pointer. Fixes misbehaviour when unloading and reloading the NAT module. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Maciej W. Rozycki authored
patch 522939d4 in mainline. The esp_reset_cleanup() function is called with the host lock held and invokes starget_for_each_device() which wants to take it too. Here is a fix along the lines of shost_for_each_device()/__shost_for_each_device() adding a __starget_for_each_device() counterpart which assumes the lock has already been taken. Eventually, I think the driver should get modified so that more work is done as a softirq rather than in the interrupt context, but for now it fixes a bug that causes the spinlock debugger to fire. While at it, it fixes a small number of cosmetic problems with starget_for_each_device() too. Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Acked-by: David S. Miller <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Andrew Morton authored
patch 24601bbc in mainline. revert commit 55d9fcf5 Author: Matthew Wilcox <matthew@wil.cx> Date: Mon Jul 30 15:19:18 2007 -0600 [SCSI] dpt_i2o: convert to SCSI hotplug model - Delete refereces to HOSTS_C - Switch to module_init/module_exit instead of detect/release - Don't pass around the host template and rename it to adpt_template - Switch from scsi_register/scsi_unregister to scsi_host_alloc, scsi_add_host, scsi_scan_host and scsi_host_put. Because it caused (for unknown reasons) Andres' all-data-reads-as-zeroes problem, reported at http://groups.google.com/group/fa.linux.kernel/msg/083a9acff0330234 Cc: Matthew Wilcox <matthew@wil.cx> Cc: Mark Salyzyn <mark_salyzyn@adaptec.com> Cc: James Bottomley <James.Bottomley@SteelEye.com> Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Anders Henke <anders.henke@1und1.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Jean Delvare authored
patch b64d7082 in mainline. The code in fb_ddc_read() is said to be based on the implementation of the radeon driver: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=fc5891c8a3ba284f13994d7bc1f1bfa8283982de However, comparing the old radeon driver code with the new fb_ddc code reveals some differences. Most notably, the I2C bus lines are held at the end of the function, while the original code was releasing them (as the comment above correctly says.) There are a few other differences, which appear to be responsible for read failures on my system. While tracing low-level I2C code in i2c-algo-bit, I noticed that the initial attempt to read the EDID always failed. It takes one retry for the read to succeed. As we are about to remove this automatic retry property from i2c-algo-bit, reading the EDID would really fail. As a summary, the I2C lines quirk which is supposedly needed to read EDID on some older monitors is currently breaking the (first) read on all other monitors (and might not even work with older ones - did anyone try since October 2006?) After applying the patch below, which makes the code in fb_ddc_read() really similar to what the radeon driver used to have, the first EDID read succeeds again. On top of that, as it appears that this code has been broken for one year now and nobody seems to have complained, I'm curious if it makes sense to keep this quirk in place. It makes the code more complex and slower just for the sake of monitors which I guess nobody uses anymore. Can't we just get rid of it? Signed-off-by: Jean Delvare <khali@linux-fr.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Tested-by: Roger Leigh <rleigh@whinlatter.ukfsn.org> Tested-by: Michael Buesch <mb@bu3sch.de> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Scott James Remnant authored
patch e6ceb32a in mainline. In wait_task_stopped() exit_code already contains the right value for the si_status member of siginfo, and this is simply set in the non WNOWAIT case. If you call waitid() with a stopped or traced process, you'll get the signal in siginfo.si_status as expected -- however if you call waitid(WNOWAIT) at the same time, you'll get the signal << 8 | 0x7f Pass it unchanged to wait_noreap_copyout(); we would only need to shift it and add 0x7f if we were returning it in the user status field and that isn't used for any function that permits WNOWAIT. Signed-off-by: Scott James Remnant <scott@ubuntu.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Zhao Yakui authored
patch a7839e96 in mainline. On some systems the number of resources(IO,MEM) returnedy by PNP device is greater than the PNP constant, for example motherboard devices. It brings that some resources can't be reserved and resource confilicts. This will cause PCI resources are assigned wrongly in some systems, and cause hang. This is a regression since we deleted ACPI motherboard driver and use PNP system driver. [akpm@linux-foundation.org: fix text and coding-style a bit] Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Thomas Renninger <trenn@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Rafael J. Wysocki authored
patch cb43c54c in mainline. The APM emulation is currently broken as a result of commit 83144186 "Freezer: make kernel threads nonfreezable by default" that removed the PF_NOFREEZE annotations from apm_ioctl() without adding the appropriate freezer hooks. Fix it and remove the unnecessary variable flags from apm_ioctl(). Special thanks to Franck Bui-Huu <vagabon.xyz@gmail.com> for pointing out the problem. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Cc: Franck Bui-Huu <vagabon.xyz@gmail.com> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Steven Rostedt authored
From Steven Rostedt <srostedt@redhat.com> patch ce6bd420 in mainline. David Holmes found a bug in the -rt tree with respect to pthread_cond_timedwait. After trying his test program on the latest git from mainline, I found the bug was there too. The bug he was seeing that his test program showed, was that if one were to do a "Ctrl-Z" on a process that was in the pthread_cond_timedwait, and then did a "bg" on that process, it would return with a "-ETIMEDOUT" but early. That is, the timer would go off early. Looking into this, I found the source of the problem. And it is a rather nasty bug at that. Here's the relevant code from kernel/futex.c: (not in order in the file) [...] smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val, struct timespec __user *utime, u32 __user *uaddr2, u32 val3) { struct timespec ts; ktime_t t, *tp = NULL; u32 val2 = 0; int cmd = op & FUTEX_CMD_MASK; if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) { if (copy_from_user(&ts, utime, sizeof(ts)) != 0) return -EFAULT; if (!timespec_valid(&ts)) return -EINVAL; t = timespec_to_ktime(ts); if (cmd == FUTEX_WAIT) t = ktime_add(ktime_get(), t); tp = &t; } [...] return do_futex(uaddr, op, val, tp, uaddr2, val2, val3); } [...] long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout, u32 __user *uaddr2, u32 val2, u32 val3) { int ret; int cmd = op & FUTEX_CMD_MASK; struct rw_semaphore *fshared = NULL; if (!(op & FUTEX_PRIVATE_FLAG)) fshared = ¤t->mm->mmap_sem; switch (cmd) { case FUTEX_WAIT: ret = futex_wait(uaddr, fshared, val, timeout); [...] static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared, u32 val, ktime_t *abs_time) { [...] struct restart_block *restart; restart = ¤t_thread_info()->restart_block; restart->fn = futex_wait_restart; restart->arg0 = (unsigned long)uaddr; restart->arg1 = (unsigned long)val; restart->arg2 = (unsigned long)abs_time; restart->arg3 = 0; if (fshared) restart->arg3 |= ARG3_SHARED; return -ERESTART_RESTARTBLOCK; [...] static long futex_wait_restart(struct restart_block *restart) { u32 __user *uaddr = (u32 __user *)restart->arg0; u32 val = (u32)restart->arg1; ktime_t *abs_time = (ktime_t *)restart->arg2; struct rw_semaphore *fshared = NULL; restart->fn = do_no_restart_syscall; if (restart->arg3 & ARG3_SHARED) fshared = ¤t->mm->mmap_sem; return (long)futex_wait(uaddr, fshared, val, abs_time); } So when the futex_wait is interrupt by a signal we break out of the hrtimer code and set up or return from signal. This code does not return back to userspace, so we set up a RESTARTBLOCK. The bug here is that we save the "abs_time" which is a pointer to the stack variable "ktime_t t" from sys_futex. This returns and unwinds the stack before we get to call our signal. On return from the signal we go to futex_wait_restart, where we update all the parameters for futex_wait and call it. But here we have a problem where abs_time is no longer valid. I verified this with print statements, and sure enough, what abs_time was set to ends up being garbage when we get to futex_wait_restart. The solution I did to solve this (with input from Linus Torvalds) was to add unions to the restart_block to allow system calls to use the restart with specific parameters. This way the futex code now saves the time in a 64bit value in the restart block instead of storing it on the stack. Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64 in thread_info.h, when there's a #ifdef __KERNEL__ just below that. Not sure what that is there for. If this turns out to be a problem, I've tested this with using "unsigned int" for u32 and "unsigned long long" for u64 and it worked just the same. I'm using u32 and u64 just to be consistent with what the futex code uses. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Karsten Keil authored
patch 0f13864e in mainline. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9416Signed-off-by: Karsten Keil <kkeil@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
H. Peter Anvin authored
patch 7ed19290 in mainline. The 386 and 486 needs a jump immediately after setting %cr0 in order to serialize the pipeline. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Eddie Dong authored
patch 8668a3c4 in mainline. Resetting an SMP guest will force AP enter real mode (RESET) with paging enabled in protected mode. While current enter_rmode() can only handle mode switch from nonpaging mode to real mode which leads to SMP reboot failure. Fix by reloading the mmu context on entering real mode. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Avi Kivity authored
patch 78f78268 in mainline. When resetting from userspace, we need to handle the flags being cleared even after we are in real mode. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Avi Kivity authored
patch 0967b7bf in mainline. If we defer updating rip until pio instructions are executed, we have a problem with reset: a pio reset updates rip, and when the instruction completes we skip the emulated instruction, pointing rip somewhere completely unrelated. Fix by updating rip when we see decode the instruction, not after emulation. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Amit Shah authored
patch 404fb881 in mainline. The clts code didn't use set_cr0 properly, so our lazy FPU processing wasn't being done by the clts instruction at all. (this isn't called on Intel as the hardware does the decode for us) Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Marko Kohtala authored
This is not in mainline, as it was fixed differently in that tree. first_cpu(cpus) returns the only CPU when NR_CPUS is 1 regardless of the cpus mask. Therefore we avoid a kernel hang in KVM_SET_MEMORY_REGION ioctl on uniprocessor by not entering the loop at all. Signed-off-by: Marko Kohtala <marko.kohtala@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Amit Shah authored
patch 00b2ef47 in mainline. emulator_write_std() is not implemented, and calling write_emulated should work just as well in place of write_std. Fixes emulator failures with the push r/m instruction. Signed-off-by: Amit Shah <amit.shah@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Avi Kivity authored
patch cf5a94d1 in mainline. 'invd' can destroy host data, and 'wbinvd' allows the guest to induce long (milliseconds) latencies. Noted by Ben Serebrin. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-
Avi Kivity authored
patch 651a3e29 in mainline. Emulate the 'invd' instruction (opcode 0f 08). Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
-