- 30 Sep, 2009 1 commit
-
-
Nils Carlson authored
places, this is simply a cleanup of the patch. Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se> Signed-off-by: Doug Thompson <dougthompson@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 13 Nov, 2009 2 commits
-
-
Karel Zak authored
allocated for the header. The HeaderSize field must be greater than 92 (= sizeof(struct gpt_header) and must be less than or equal to the logical block size. It means we have to read whole sector with the header, because the header crc32 checksum is calculated according to HeaderSize. For more details see UEFI standard (version 2.3, May 2009): - 5.3.1 GUID Format overview, page 93 - Table 13. GUID Partition Table Header, page 96 Signed-off-by: Karel Zak <kzak@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Karel Zak authored
That's wrong. UEFI standard (version 2.3, May 2009, 5.3.1 GUID Format overview, page 95) defines that LBA is always based on the logical block size. It means bdev_logical_block_size() (aka BLKSSZGET) for Linux. This patch removes static sector size from EFI GPT parser. The problem is reproducible with the latest GNU Parted: # modprobe scsi_debug dev_size_mb=50 sector_size=4096 # ./parted /dev/sdb print Model: Linux scsi_debug (scsi) Disk /dev/sdb: 52.4MB Sector size (logical/physical): 4096B/4096B Partition Table: gpt Number Start End Size File system Name Flags 1 24.6kB 3002kB 2978kB primary 2 3002kB 6001kB 2998kB primary 3 6001kB 9003kB 3002kB primary # blockdev --rereadpt /dev/sdb # dmesg | tail -1 sdb: unknown partition table <---- !!! with this patch: # blockdev --rereadpt /dev/sdb # dmesg | tail -1 sdb: sdb1 sdb2 sdb3 Signed-off-by: Karel Zak <kzak@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 12 Nov, 2009 1 commit
-
-
Rakib Mullick authored
!MODULE. So, #ifdef MODULE becomes obsolete. Also move the declaration moxa_board_conf at the start of the function, since we were hit by the following warning. drivers/char/moxa.c: In function `moxa_init': drivers/char/moxa.c:1040: warning: ISO C90 forbids mixed declarations and code Signed-off-by: Rakib Mullick<rakib.mullick@gmail.com> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 16 Oct, 2009 1 commit
-
-
Roel Kluin authored
Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Oct, 2009 1 commit
-
-
Joe Perches authored
Convert printks to pr_<level> Convert some embedded function names to %s...__func__ Remove a period after exclamation points. Remove #define pr_dbg which could be used by future kernel.h includes Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 30 Sep, 2009 1 commit
-
-
Jiri Slaby authored
tty is dereferenced earlier, the test is superfluous. Remove it. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Greg KH <greg@kroah.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 24 Aug, 2009 1 commit
-
-
Bartlomiej Zolnierkiewicz authored
Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Nov, 2009 1 commit
-
-
Christoph Hellwig authored
USE_ELF_CORE_DUMP. The microblaze omission seems like an error to me, so let's kill this ifdef and make sure we are the same everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: <linux-arch@vger.kernel.org> Cc: Michal Simek <michal.simek@petalogix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 29 Sep, 2009 1 commit
-
-
Julia Lawall authored
constants ending in _STATE are compared to the state variable. Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Nov, 2009 7 commits
-
-
Manfred Spraul authored
current code scans all decrement operations, even if the semaphore value is already 0. The patch optimizes that: if the semaphore value is 0, then there is no need to scan the q->alter entries. Note that this is a common case: It happens if 100 decrements by one are pending and now an increment by one increases the semaphore value from 0 to 1. Without this patch, all 100 entries are scanned. With the patch, only one entry is scanned, then woken up. Then the new rule triggers and the scanning is aborted, without looking at the remaining 99 tasks. With this patch, single sop increment/decrement by 1 are now O(1). (same as with Nick's patch) Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Manfred Spraul authored
semaphores. Atomic operations that affect multiple semaphores are supported. The patch optimizes single semaphore operation calls that affect only one semaphore: It's not necessary to scan all pending operations, it is sufficient to scan the per-semaphore list. The idea is from Nick Piggin version of an ipc sem improvement, the implementation is different: The code tries to keep as much common code as possible. As the result, the patch is simpler, but optimizes fewer cases. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Manfred Spraul authored
sysv sem has the concept of semaphore arrays that consist out of multiple semaphores. Atomic operations that affect multiple semaphores are supported. The patch is the first step for optimizing simple, single semaphore operations: In addition to the global list of all pending operations, a 2nd, per-semaphore list with the simple operations is added. Note: this patch does not make sense by itself, the new list is used nowhere. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Manfred Spraul authored
If try_atomic_semop failed, then no changes were applied. Thus no need to restart. Additionally, this patch correct an incorrect comment: It's possible to wait for arbitrary semaphore values (do a dec by <x>, wait-for-zero, inc by <x> in one atomic operation) Both changes are from Nick Piggin, the patch is the result of a different split of the individual changes. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Nick Piggin authored
involved, which could deadlock if preemption is enabled during the "lock". It is an implementation detail (due to a spinlock being held) that this is actually the case. However if "spinlocks" are made preemptible, or if the sem lock is changed to a sleeping lock for example, then the wakeup would become buggy. So this might be a bugfix for -rt kernels. Imagine waker being preempted by wakee and never clearing IN_WAKEUP -- if wakee has higher RT priority then there is a priority inversion deadlock. Even if there is not a priority inversion to cause a deadlock, then there is still time wasted spinning. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Nick Piggin authored
list_for_each_entry macros. list_for_each_entry_safe() must be used, because list entries can disappear immediately uppon the wakeup event. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Nick Piggin authored
sysv sem algorithm: Most (at least: some important) users only use simple semaphore operations, therefore it's worthwile to optimize this use case. This patch: Move last looked up sem_undo struct to the head of the task's undo list. Attempt to move common entries to the front of the list so search time is reduced. This reduces lookup_undo on oprofile of problematic SAP workload by 30% (see patch 4 for a description of SAP workload). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 13 Oct, 2009 1 commit
-
-
Serge E. Hallyn authored
7ca7e564 "ipc: store ipcs into IDRs" in 2007. The idr of which 3 exist for each ipc namespace is never freed. This patch simply frees them when the ipcns is freed. I don't believe any idr_remove() are done from rcu (and could therefore be delayed until after this idr_destroy()), so the patch should be safe. Some quick testing showed no harm, and the memory leak fixed. Caught by kmemleak. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Nov, 2009 1 commit
-
-
Oleg Nesterov authored
Currently we add sub-threads to ->real_parent->children list. This buys nothing but slows down do_wait(). With this patch ->children contains only main threads (group leaders). The only complication is that forget_original_parent() should iterate over sub-threads by hand, and de_thread() needs another list_replace() when it changes ->group_leader. Henceforth do_wait_thread() can never see task_detached() && !EXIT_DEAD tasks, we can remove this check (and we can unify do_wait_thread() and ptrace_do_wait()). This change can confuse the optimistic search in mm_update_next_owner(), but this is fixable and minor. Perhaps badness() and oom_kill_process() should be updated, but they should be fixed in any case. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ratan Nalumasu <rnalumasu@gmail.com> Cc: Vitaly Mayatskikh <vmayatsk@redhat.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 30 Oct, 2009 1 commit
-
-
Oleg Nesterov authored
->group_stop_count condition visible to tracers before do_signal_stop() will participate in this group-stop. Currently the patch has no effect, tracehook_get_signal() always returns 0. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 16 Oct, 2009 3 commits
-
-
Oleg Nesterov authored
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Oleg Nesterov authored
This is a bit confusing, we don't know the source of this signal. But we don't care, and "info->si_code = 0" is imho worse. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Oleg Nesterov authored
triggers the "from_ancestor_ns" check. This fixes reparent_thread()->group_send_sig_info(pdeath_signal) behaviour, before this patch send_signal() does not detect the cross-namespace case when the child of the dying parent belongs to the sub-namespace. This patch can affect the behaviour of send_sig(), kill_pgrp() and kill_pid() when the caller sends the signal to the sub-namespace with "priv == 0" but surprisingly all callers seem to use them correctly, including disassociate_ctty(on_exit). Except: drivers/staging/comedi/drivers/addi-data/*.c incorrectly use send_sig(priv => 0). But his is minor and should be fixed anyway. Reported-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Reviewed-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 11 Nov, 2009 1 commit
-
-
Oleg Nesterov authored
and changes check_kill_permission() to use this helper. The real effect of this patch is that from now we "officially" consider SEND_SIG_NOINFO signal as "from user-space" signals. This is already true if we look at the code which uses SEND_SIG_NOINFO, except __send_signal() has another opinion - see the next patch. The naming of these special SEND_SIG_XXX siginfo's is really bad imho. From __send_signal()'s pov they mean SEND_SIG_NOINFO from user SEND_SIG_PRIV from kernel SEND_SIG_FORCED no info Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Reviewed-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 12 Nov, 2009 5 commits
-
-
Oleg Nesterov authored
Unlike powepc, x86 always calls tracehook_report_syscall_exit(step) with step = 0, and sends the trap by hand. This results in unnecessary SIGTRAP when PTRACE_SINGLESTEP follows the syscall-exit stop. Change syscall_trace_leave() to pass the correct "step" argument to tracehook and remove the send_sigtrap() logic. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Oleg Nesterov authored
Implement user_single_step_siginfo() for x86. Extract this code from send_sigtrap(). Since x86 calls tracehook_report_syscall_exit(step => 0) the new helper is not used yet. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Oleg Nesterov authored
Change tracehook_report_syscall_exit() to look at step flag and send the trap signal if needed. This change affects ia64, microblaze, parisc, powerpc, sh. They pass nonzero "step" argument to tracehook but since it was ignored the tracee reports via ptrace_notify(), this is not right and not consistent. - PTRACE_SETSIGINFO doesn't work - if the tracer resumes the tracee with signr != 0 the new signal is generated rather than delivering it - If PT_TRACESYSGOOD is set the tracee reports the wrong exit_code I don't have a powerpc machine, but I think this test-case should see the difference: #include <unistd.h> #include <sys/ptrace.h> #include <sys/wait.h> #include <assert.h> #include <stdio.h> int main(void) { int pid, status; if (!(pid = fork())) { assert(ptrace(PTRACE_TRACEME) == 0); kill(getpid(), SIGSTOP); getppid(); return 0; } assert(pid == wait(&status)); assert(ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD) == 0); assert(ptrace(PTRACE_SYSCALL, pid, 0,0) == 0); assert(pid == wait(&status)); assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0); assert(pid == wait(&status)); if (status == 0x57F) return 0; printf("kernel bug: status=%X shouldn't have 0x80\n", status); return 1; } Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Oleg Nesterov authored
Implement user_single_step_siginfo() for powerpc. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Oleg Nesterov authored
Currently there is no way to synthesize a single-stepping trap in the arch-independent manner. This patch adds the default helper which fills siginfo_t, arch/ can can override it. Architetures which implement user_enable_single_step() should add user_single_step_siginfo() also. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Cc: <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 11 Nov, 2009 1 commit
-
-
Oleg Nesterov authored
starts with TIF_SINGLESTEP/X86_EFLAGS_TF bits copied from ptraced parent. This is not right, especially when the new child is not auto-attaced: in this case it is killed by SIGTRAP. Change copy_process() to call user_disable_single_step(). Tested on x86. Test-case: #include <stdio.h> #include <unistd.h> #include <signal.h> #include <sys/ptrace.h> #include <sys/wait.h> #include <assert.h> int main(void) { int pid, status; if (!(pid = fork())) { assert(ptrace(PTRACE_TRACEME) == 0); kill(getpid(), SIGSTOP); if (!fork()) { /* kernel bug: this child will be killed by SIGTRAP */ printf("Hello world\n"); return 43; } wait(&status); return WEXITSTATUS(status); } for (;;) { assert(pid == wait(&status)); if (WIFEXITED(status)) break; assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0); } assert(WEXITSTATUS(status) == 43); return 0; } Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 30 Oct, 2009 1 commit
-
-
Oleg Nesterov authored
ptrace_init_task() looks confusing, as if we always auto-attach when "bool ptrace" argument is true, while in fact we attach only if current is traced. Make the code more explicit and kill now unused ptrace_link(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 12 Nov, 2009 3 commits
-
-
Daisuke Nishimura authored
caused by race between oom and cpuset_attach) instead of cgroup_mutex to fix a deadlock problem. The cgroup_mutex, which was removed by the commit, in mem_cgroup_out_of_memory() was originally introduced at commit c7ba5c9e (Memory controller: OOM handling). IIUC, the intention of this cgroup_mutex was to prevent task move during select_bad_process() so that situations like below can be avoided. Assume cgroup "foo" has exceeded its limit and is about to trigger oom. 1. Process A, which has been in cgroup "baa" and uses large memory, is just moved to cgroup "foo". Process A can be the candidates for being killed. 2. Process B, which has been in cgroup "foo" and uses large memory, is just moved from cgroup "foo". Process B can be excluded from the candidates for being killed. But these race window exists anyway even if we hold a lock, because __mem_cgroup_try_charge() decides wether it should trigger oom or not outside of the lock. So the original cgroup_mutex in mem_cgroup_out_of_memory and thus current memcg_tasklist has no use. And IMHO, those races are not so critical for users. This patch removes it and make codes simpler. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Daisuke Nishimura authored
failure. IMHO, charge/uncharge(especially charge) is high cost operation, so we should avoid it as far as possible. This patch tries to delay try_charge in mem_cgroup_move_parent() by re-ordering checks it does. And this patch renames mem_cgroup_move_account() to __mem_cgroup_move_account(), changes the return value of __mem_cgroup_move_account() from int to void, and adds a new wrapper(mem_cgroup_move_account()), which checks whether a @pc is valid for moving account and calls __mem_cgroup_move_account(). This patch removes the last caller of trylock_page_cgroup(), so removes its definition too. Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
Daisuke Nishimura authored
cancel the charge and the refcnt we have got by mem_cgroup_tyr_charge(). This patch introduces mem_cgroup_cancel_charge() and call it in those places. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reviewed-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Nov, 2009 1 commit
-
-
KAMEZAWA Hiroyuki authored
grep difficult. Replace memcg's MAPPED_FILE with FILE_MAPPED And in global VM, mapped shared memory is accounted into FILE_MAPPED. But memcg doesn't. fix it. Note: page_is_file_cache() just checks SwapBacked or not. So, we need to check PageAnon. Cc: Balbir Singh <balbir@in.ibm.com> Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 16 Oct, 2009 1 commit
-
-
Daisuke Nishimura authored
actually lead to a BUG. Just do it once in initialization. Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 10 Oct, 2009 2 commits
-
-
Andrew Morton authored
Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
KAMEZAWA Hiroyuki authored
caching. At charge, memcg charges 64pages and remember it in percpu cache. Because it's cache, drain/flush if necessary. This version uses public percpu area. 2 benefits for using public percpu area. 1. Sum of stocked charge in the system is limited to # of cpus not to the number of memcg. This shows better synchonization. 2. drain code for flush/cpuhotplug is very easy (and quick) The most important point of this patch is that we never touch res_counter in fast path. The res_counter is system-wide shared counter which is modified very frequently. We shouldn't touch it as far as we can for avoiding false sharing. On x86-64 8cpu server, I tested overheads of memcg at page fault by running a program which does map/fault/unmap in a loop. Running a task per a cpu by taskset and see sum of the number of page faults in 60secs. [without memcg config] 40156968 page-faults # 0.085 M/sec ( +- 0.046% ) 27.67 cache-miss/faults [root cgroup] 36659599 page-faults # 0.077 M/sec ( +- 0.247% ) 31.58 cache miss/faults [in a child cgroup] 18444157 page-faults # 0.039 M/sec ( +- 0.133% ) 69.96 cache miss/faults [ + coalescing uncharge patch] 27133719 page-faults # 0.057 M/sec ( +- 0.155% ) 47.16 cache miss/faults [ + coalescing uncharge patch + this patch ] 34224709 page-faults # 0.072 M/sec ( +- 0.173% ) 34.69 cache miss/faults Changelog (since Oct/2): - updated comments - replaced get_cpu_var() with __get_cpu_var() if possible. - removed mutex for system-wide drain. adds a counter instead of it. - removed CONFIG_HOTPLUG_CPU Changelog (old): - rebased onto the latest mmotm - moved charge size check before __GFP_WAIT check for avoiding unnecesary - added asynchronous flush routine. - fixed bugs pointed out by Nishimura-san. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 09 Nov, 2009 1 commit
-
-
KAMEZAWA Hiroyuki authored
bottleneck. One strong techinque to reduce lock contention is reducing calls by coalescing some amount of calls into one. Considering charge/uncharge chatacteristic, - charge is done one by one via demand-paging. - uncharge is done by - in chunk at munmap, truncate, exit, execve... - one by one via vmscan/paging. It seems we have a chance to coalesce uncharges for improving scalability at unmap/truncation. This patch is a for coalescing uncharge. For avoiding scattering memcg's structure to functions under /mm, this patch adds memcg batch uncharge information to the task. A reason for per-task batching is for making use of caller's context information. We do batched uncharge (deleyed uncharge) when truncation/unmap occurs but do direct uncharge when uncharge is called by memory reclaim (vmscan.c). The degree of coalescing depends on callers - at invalidate/trucate... pagevec size - at unmap ....ZAP_BLOCK_SIZE (memory itself will be freed in this degree.) Then, we'll not coalescing too much. On x86-64 8cpu server, I tested overheads of memcg at page fault by running a program which does map/fault/unmap in a loop. Running a task per a cpu by taskset and see sum of the number of page faults in 60secs. [without memcg config] 40156968 page-faults # 0.085 M/sec ( +- 0.046% ) 27.67 cache-miss/faults [root cgroup] 36659599 page-faults # 0.077 M/sec ( +- 0.247% ) 31.58 miss/faults [in a child cgroup] 18444157 page-faults # 0.039 M/sec ( +- 0.133% ) 69.96 miss/faults [child with this patch] 27133719 page-faults # 0.057 M/sec ( +- 0.155% ) 47.16 miss/faults We can see some amounts of improvement. (root cgroup doesn't affected by this patch) Another patch for "charge" will follow this and above will be improved more. Changelog(since 2009/10/02): - renamed filed of memcg_batch (as pages to bytes, memsw to memsw_bytes) - some clean up and commentary/description updates. - added initialize code to copy_process(). (possible bug fix) Changelog(old): - fixed !CONFIG_MEM_CGROUP case. - rebased onto the latest mmotm + softlimit fix patches. - unified patch for callers - added commetns. - make ->do_batch as bool. - removed css_get() at el. We don't need it. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- 10 Nov, 2009 1 commit
-
-
Randy Dunlap authored
should also be updated. Remove reference to the OSDL PLM build farm. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-