1. 06 Oct, 2009 4 commits
  2. 13 Oct, 2009 1 commit
  3. 24 Sep, 2009 1 commit
    • Oleg Nesterov's avatar
      Thanks to Roland who pointed out de_thread() issues. · 7b9f1838
      Oleg Nesterov authored
      Currently we add sub-threads to ->real_parent->children list.  This buys
      nothing but slows down do_wait().
      
      With this patch ->children contains only main threads (group leaders). 
      The only complication is that forget_original_parent() should iterate over
      sub-threads by hand, and de_thread() needs another list_replace() when it
      changes ->group_leader.
      
      Henceforth do_wait_thread() can never see task_detached() && !EXIT_DEAD
      tasks, we can remove this check (and we can unify do_wait_thread() and
      ptrace_do_wait()).
      
      This change can confuse the optimistic search in mm_update_next_owner(),
      but this is fixable and minor.
      
      Perhaps badness() and oom_kill_process() should be updated, but they
      should be fixed in any case.
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ratan Nalumasu <rnalumasu@gmail.com>
      Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      7b9f1838
  4. 25 Sep, 2009 1 commit
    • Roland McGrath's avatar
      This adds the utrace facility, a new modular interface in the kernel for · 582156fe
      Roland McGrath authored
      implementing user thread tracing and debugging.  This fits on top of the
      tracehook_* layer, so the new code is well-isolated.
      
      The new interface is in <linux/utrace.h> and the DocBook utrace book
      describes it.  It allows for multiple separate tracing engines to work in
      parallel without interfering with each other.  Higher-level tracing
      facilities can be implemented as loadable kernel modules using this layer.
      
      The new facility is made optional under CONFIG_UTRACE.  When this is not
      enabled, no new code is added.  It can only be enabled on machines that
      have all the prerequisites and select CONFIG_HAVE_ARCH_TRACEHOOK.
      
      In this initial version, utrace and ptrace do not play together at all. 
      If ptrace is attached to a thread, the attach calls in the utrace kernel
      API return -EBUSY.  If utrace is attached to a thread, the PTRACE_ATTACH
      or PTRACE_TRACEME request will return EBUSY to userland.  The old ptrace
      code is otherwise unchanged and nothing using ptrace should be affected by
      this patch as long as utrace is not used at the same time.  In the future
      we can clean up the ptrace implementation and rework it to use the utrace
      API.
      
      [oleg@redhat.com: kill exclude_xtrace logic]
      Signed-off-by: default avatarRoland McGrath <roland@redhat.com>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      582156fe
  5. 30 Oct, 2009 1 commit
  6. 16 Oct, 2009 4 commits
  7. 30 Oct, 2009 1 commit
  8. 28 Oct, 2009 1 commit
  9. 16 Oct, 2009 1 commit
  10. 10 Oct, 2009 3 commits
    • Andrew Morton's avatar
      tweak comments · 9c63e80b
      Andrew Morton authored
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      9c63e80b
    • KAMEZAWA Hiroyuki's avatar
      This is a patch for coalescing access to res_counter at charging by percpu · 8c20eaa6
      KAMEZAWA Hiroyuki authored
      caching.  At charge, memcg charges 64pages and remember it in percpu
      cache.  Because it's cache, drain/flush if necessary.
      
      This version uses public percpu area.
       2 benefits for using public percpu area.
       1. Sum of stocked charge in the system is limited to # of cpus
          not to the number of memcg. This shows better synchonization.
       2. drain code for flush/cpuhotplug is very easy (and quick)
      
      The most important point of this patch is that we never touch res_counter
      in fast path. The res_counter is system-wide shared counter which is modified
      very frequently. We shouldn't touch it as far as we can for avoiding
      false sharing.
      
      On x86-64 8cpu server, I tested overheads of memcg at page fault by
      running a program which does map/fault/unmap in a loop. Running
      a task per a cpu by taskset and see sum of the number of page faults
      in 60secs.
      
      [without memcg config]
        40156968  page-faults              #      0.085 M/sec   ( +-   0.046% )
        27.67 cache-miss/faults
      
      [root cgroup]
        36659599  page-faults              #      0.077 M/sec   ( +-   0.247% )
        31.58 cache miss/faults
      
      [in a child cgroup]
        18444157  page-faults              #      0.039 M/sec   ( +-   0.133% )
        69.96 cache miss/faults
      
      [ + coalescing uncharge patch]
        27133719  page-faults              #      0.057 M/sec   ( +-   0.155% )
        47.16 cache miss/faults
      
      [ + coalescing uncharge patch + this patch ]
        34224709  page-faults              #      0.072 M/sec   ( +-   0.173% )
        34.69 cache miss/faults
      
      Changelog (since Oct/2):
        - updated comments
        - replaced get_cpu_var() with __get_cpu_var() if possible.
        - removed mutex for system-wide drain. adds a counter instead of it.
        - removed CONFIG_HOTPLUG_CPU
      
      Changelog (old):
        - rebased onto the latest mmotm
        - moved charge size check before __GFP_WAIT check for avoiding unnecesary
        - added asynchronous flush routine.
        - fixed bugs pointed out by Nishimura-san.
      Signed-off-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      8c20eaa6
    • KAMEZAWA Hiroyuki's avatar
      In massive parallel enviroment, res_counter can be a performance · ec1d6cb0
      KAMEZAWA Hiroyuki authored
      bottleneck.  One strong techinque to reduce lock contention is reducing
      calls by coalescing some amount of calls into one.
      
      Considering charge/uncharge chatacteristic,
      	- charge is done one by one via demand-paging.
      	- uncharge is done by
      		- in chunk at munmap, truncate, exit, execve...
      		- one by one via vmscan/paging.
      
      It seems we have a chance to coalesce uncharges for improving scalability
      at unmap/truncation.
      
      This patch is a for coalescing uncharge.  For avoiding scattering memcg's
      structure to functions under /mm, this patch adds memcg batch uncharge
      information to the task.  A reason for per-task batching is for making use
      of caller's context information.  We do batched uncharge (deleyed
      uncharge) when truncation/unmap occurs but do direct uncharge when
      uncharge is called by memory reclaim (vmscan.c).
      
      The degree of coalescing depends on callers
        - at invalidate/trucate... pagevec size
        - at unmap ....ZAP_BLOCK_SIZE
      (memory itself will be freed in this degree.)
      Then, we'll not coalescing too much.
      
      On x86-64 8cpu server, I tested overheads of memcg at page fault by
      running a program which does map/fault/unmap in a loop. Running
      a task per a cpu by taskset and see sum of the number of page faults
      in 60secs.
      
      [without memcg config]
        40156968  page-faults              #      0.085 M/sec   ( +-   0.046% )
        27.67 cache-miss/faults
      [root cgroup]
        36659599  page-faults              #      0.077 M/sec   ( +-   0.247% )
        31.58 miss/faults
      [in a child cgroup]
        18444157  page-faults              #      0.039 M/sec   ( +-   0.133% )
        69.96 miss/faults
      [child with this patch]
        27133719  page-faults              #      0.057 M/sec   ( +-   0.155% )
        47.16 miss/faults
      
      We can see some amounts of improvement.
      (root cgroup doesn't affected by this patch)
      Another patch for "charge" will follow this and above will be improved more.
      
      Changelog(since 2009/10/02):
       - renamed filed of memcg_batch (as pages to bytes, memsw to memsw_bytes)
       - some clean up and commentary/description updates.
       - added initialize code to copy_process(). (possible bug fix)
      
      Changelog(old):
       - fixed !CONFIG_MEM_CGROUP case.
       - rebased onto the latest mmotm + softlimit fix patches.
       - unified patch for callers
       - added commetns.
       - make ->do_batch as bool.
       - removed css_get() at el. We don't need it.
      Signed-off-by: default avatarKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      ec1d6cb0
  11. 25 Sep, 2009 1 commit
  12. 09 Oct, 2009 2 commits
  13. 25 Sep, 2009 2 commits
  14. 21 Nov, 2008 1 commit
    • Warren Turkal's avatar
      This is a patchset to change the way that the HFS+ filesystem detects · 4668b93f
      Warren Turkal authored
      whether a volume has a journal or not.
      
      The code currently mounts an HFS+ volume read-only by default when a
      journal is detected.  One can force a read/write mount by giving the
      "force" mount option.  The current code has this behavior since there is
      no support for the HFS+ journal.
      
      My problem is that the detection of the journal could be better.  The
      current code tests the attribute bit in the volume header that indicates
      there is a journal.  If that bit is set, the code assumes that there is a
      journal.
      
      Unfortunately, this is not enough to really determine if there is a
      journal or not.  When that bit is set, one must also examine the journal
      info block field of the volume header.  If this field is 0, there is no
      journal, and the volume can be mounted read/write.
      
      
      This patch:
      
      The HFS+ support in the kernel currently will mount an HFS+ volume
      read-only if the volume header has the attribute bit set that indicates
      there is a journal.  The kernel does this because there is no support for
      a journalled HFS+ volume.
      
      The problem is that this is only half of what needs to be checked to see
      if there really is a journal.  There is also an entry in the volume header
      that tells you where to find the journal info block.  In the kernel
      version of the kernel, this 4 byte block is labeled reserved.  This patch
      identifies the journal info block in the header.
      Signed-off-by: default avatarWarren Turkal <wt@penguintechs.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      4668b93f
  15. 16 Oct, 2009 1 commit
  16. 13 Jul, 2009 1 commit
  17. 14 Oct, 2009 1 commit
  18. 24 Sep, 2009 1 commit
  19. 14 Oct, 2009 2 commits
  20. 24 Sep, 2009 1 commit
  21. 25 Sep, 2009 1 commit
  22. 14 Oct, 2009 1 commit
  23. 16 Oct, 2009 1 commit
  24. 27 Aug, 2009 1 commit
    • Anton Vorontsov's avatar
      RTC core won't allow wakeup alarms to be set if RTC devices' parent (i.e. · b031ad60
      Anton Vorontsov authored
      i2c_client or spi_device) isn't wakeup capable.
      
      For I2C devices there is I2C_CLIENT_WAKE flag exists that we can pass via
      board info, and if set, I2C core will initialize wakeup capability.  For
      SPI devices there is no such flag at all.
      
      I believe that it's not platform code responsibility to allow or disallow
      wakeups, instead, drivers themselves should set the capability if a device
      can trigger wakeups.
      
      That's what drivers/base/power/sysfs.c says:
      
       * It is the responsibility of device drivers to enable (or disable)
       * wakeup signaling as part of changing device power states, respecting
       * the policy choices provided through the driver model.
      
      I2C and SPI RTC devices send wakeup events via interrupt lines, so we
      should set the wakeup capability if IRQ is routed.
      
      Ideally we should also check irq for wakeup capability before setting
      device's capability, i.e.
      
      	if (can_irq_wake(irq))
      		device_set_wakeup_capable(&client->dev, 1);
      
      But there is no can_irq_wake() call exist, and it is not that trivial to
      implement it for all interrupts controllers and complex/cascaded setups.
      
      drivers/base/power/sysfs.c also covers these cases:
      
       * Devices may not be able to generate wakeup events from all power
       * states.  Also, the events may be ignored in some configurations;
       * for example, they might need help from other devices that aren't
       * active
      
      So there is no guarantee that wakeup will actually work, and so I think
      there is no point in being pedantic wrt checking IRQ wakeup capability.
      Signed-off-by: default avatarAnton Vorontsov <avorontsov@ru.mvista.com>
      Cc: David Brownell <dbrownell@users.sourceforge.net>
      Cc: Ben Dooks <ben-linux@fluff.org>
      Cc: Jean Delvare <khali@linux-fr.org>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      b031ad60
  25. 16 Oct, 2009 2 commits
    • Atsushi Nemoto's avatar
      The to_platform_device macro itself uses container_of macro. Nested use · f5298547
      Atsushi Nemoto authored
      of container_of macro causes following sparse warnings:
      
      rtc-ds1553.c:259:3: warning: symbol '__mptr' shadows an earlier one
      rtc-ds1553.c:259:3: originally declared here
      Signed-off-by: default avatarAtsushi Nemoto <anemo@mba.ocn.ne.jp>
      Cc: Alessandro Zummo <alessandro.zummo@towertech.it>
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Andrew Sharp <andy.sharp@lsi.com>
      Cc: Thomas Hommel <thomas.hommel@gefanuc.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      f5298547
    • Atsushi Nemoto's avatar
      - Call dev_set_drvdata before rtc device creation. · 0dc9e7c9
      Atsushi Nemoto authored
      - Use its own spinlock instead of rtc->irq_lock.  Because pdata->rtc
        must be initialized to use the irq_lock (pdata->rtc->irq_lock).  There
        is a small window which rtc methods can be called before pdata->rtc is
        initialized.
      
        And there is no need use the irq_lock to protect hardware registers.
        The driver's own spinlock shoule be enough.
      
      - Check pdata->rtc before calling rtc_update_irq.
      
      - Use alarm_irq_enable and remove ioctl routine.
      
      - Use devres APIs and simplify error/remove path.
      
      These fixes are ported from ds1553 driver and just compile-tested only.
      Signed-off-by: default avatarAtsushi Nemoto <anemo@mba.ocn.ne.jp>
      Cc: Alessandro Zummo <alessandro.zummo@towertech.it>
      Cc: Thomas Hommel <thomas.hommel@gefanuc.com>
      Cc: David Brownell <david-b@pacbell.net>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      0dc9e7c9
  26. 12 Oct, 2009 3 commits