- 09 Oct, 2008 33 commits
-
-
Tejun Heo authored
This patch makes the following misc updates in preparation for the disk->part dereference fix and extended block devt support.

* implement part_to_disk()
* fix the comment about gendisk->part indexing
* rename get_part() to disk_map_sector()
* don't use n, which is always zero, when printing disk information in diskstats_show()

Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
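A minimal sketch of what part_to_disk() would look like, assuming the partition's embedded struct device has the disk's device as its parent (field names here are assumptions, not necessarily the exact ones in genhd.h):

    /* Sketch: recover the owning gendisk from a partition by walking
     * the embedded device hierarchy. Hedged; field names assumed. */
    static inline struct gendisk *part_to_disk(struct hd_struct *part)
    {
            if (likely(part))
                    return container_of(part->dev.parent,
                                        struct gendisk, dev);
            return NULL;
    }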
-
Tejun Heo authored
d805dda4 tried to fix error case handling in add_partition() but had a few problems.

* The disk->part[] entry is set early and left dangling if the operation fails.
* Once the device is initialized, the last put_device() is responsible for freeing all the resources. The failure path freed part_stats and p regardless of put_device(), causing a double free.
* The holders subdir holds a reference to the disk device, so the failure path should remove it to release resources properly, which was missing.

This patch fixes the above problems, and while at it moves the partition slot busy check into add_partition() for completeness and inlines holders subdirectory creation. Using a separate function for it just obfuscates the code.

Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Abdel Benamrouche <draconux@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
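The tail of the fixed function plausibly unwinds along these lines (a sketch only; label names are illustrative, not necessarily those in fs/partitions/check.c):

    /* Before device_add() succeeds, free manually; afterwards the
     * final put_device() release callback owns freeing p and its
     * per-partition stats. */
    out_free:
            kfree(p);
            return;
    out_del:
            kobject_put(p->holder_dir);     /* drop the holders/ ref */
            device_del(pdev);
    out_put:
            put_device(pdev);               /* release() frees p */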
-
Tejun Heo authored
delete_partition() was a noop for zero-length partitions. As the addition code allows creating zero-length partitions and deletion is assumed to always succeed, this caused a memory leak for zero-length partitions. Allow zero-length partitions to end their meaningless lives. While at it, allow deleting zero-length partitions via the BLKPG_DEL_PARTITION ioctl too. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Tejun Heo authored
Recent block_class iteration updates 5c6f35c5..27f30251 converted all class device iteration to class_for_each_device() and class_find_device(), which are correct but a pain in the ass to use. This patch converts them to the newly introduced class_dev_iterator so that they can use more natural control structures instead of separate callbacks and structs to pass parameters to them. This results in smaller and easier code. This patch also restores the original behavior of not printing the header in /proc/partitions if there's no partition to print. This is trivial but still user-visible behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Tejun Heo authored
block_class_lock protects the major_names array and bdev_map and doesn't have anything to do with block class devices. Don't grab it while iterating over block class devices. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Tejun Heo authored
Recent block_class iteration updates 5c6f35c5..27f30251 broke partition info printouts.

* printk_all_partitions(): partition printout stops when it meets a partition hole. The inner partition-printing loop should continue instead of exiting on an empty partition slot.
* /proc/partitions and /proc/diskstats: if all the information can't be read in a single read(), the information is truncated. This is because find_start() doesn't actually update the counter containing the initial seek; it runs to the end and ends up always reporting EOF on the second read.

This patch fixes both problems.

Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Tejun Heo authored
Iterating over entries using a callback usually isn't much fun, especially when the entry being iterated over can't be manipulated freely. This patch converts class->p->class_devices to a klist and implements a class device iterator so that users can freely build their own control structures. The users are also free to call back into class code without worrying about locking. class_for_each_device() and class_find_device() are converted to use the new iterators, so their users don't have to worry about locking anymore either. Note: this depends on the klist-dont-iterate-over-deleted-entries patch, because class_intf->add/remove_dev() depends on proper synchronization with device removal. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
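Usage then collapses into an ordinary loop, roughly (assuming the class_dev_iter_init/next/exit entry points this patch introduces):

    struct class_dev_iter iter;
    struct device *dev;

    /* iterate block class devices without a callback */
    class_dev_iter_init(&iter, &block_class, NULL, &disk_type);
    while ((dev = class_dev_iter_next(&iter))) {
            struct gendisk *disk = dev_to_disk(dev);
            /* inspect disk; calling back into class code is safe here */
    }
    class_dev_iter_exit(&iter);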
-
Tejun Heo authored
A klist entry is kept on the list till all its current iterations are finished; however, a new iteration after deletion also iterates over deleted entries as long as their reference count stays above zero. This causes problems for cases where there are users which iterate over the list while synchronized against list manipulations and naturally expect already deleted entries not to show up during iteration. This patch implements a dead flag which gets set on deletion, so that iteration can skip already deleted entries. The dead flag piggybacks on the lowest bit of knode->n_klist and is only visible to the klist implementation proper. While at it, drop klist_iter->i_head as it's redundant and doesn't offer anything in semantics or performance, since klist_iter->i_klist is dereferenced on every iteration anyway. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
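The bit-packing might look roughly like this (a sketch; helper names are assumptions, not necessarily those in lib/klist.c):

    #define KNODE_DEAD              1LU
    #define KNODE_KLIST_MASK        ~KNODE_DEAD

    /* strip the dead bit to recover the owning klist */
    static struct klist *knode_klist(struct klist_node *knode)
    {
            return (struct klist *)
                    ((unsigned long)knode->n_klist & KNODE_KLIST_MASK);
    }

    /* test the dead bit, which deletion sets under the klist lock */
    static bool knode_dead(struct klist_node *knode)
    {
            return (unsigned long)knode->n_klist & KNODE_DEAD;
    }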
-
Randy Dunlap authored
Add some block/ source files to the kernel-api docbook. Fix kernel-doc notation in them as needed. Fix changed function parameter names. Fix typos/spellos. In comments, change REQ_SPECIAL to REQ_TYPE_SPECIAL and REQ_BLOCK_PC to REQ_TYPE_BLOCK_PC. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
raid5 can overflow with more than 255 stripes, and we can increase it to an int for free on both 32 and 64-bit archs due to the padding. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Mikulas Patocka authored
Remove hw_segments field from struct bio and struct request. Without virtual merge accounting they have no purpose. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Mikulas Patocka authored
Remove virtual merge accounting. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Aaron Carroll authored
Update the description of fifo_batch to match the current implementation, and include a description of how to tune it. Signed-off-by: Aaron Carroll <aaronc@gelato.unsw.edu.au> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Aaron Carroll authored
* convert goto to simpler while loop
* use rq_end_sector() instead of computing manually
* fix false comments
* remove spurious whitespace
* convert rq_rb_root macro to an inline function

Signed-off-by: Aaron Carroll <aaronc@gelato.unsw.edu.au> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
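For reference, rq_end_sector() is just the obvious helper (a sketch of the elevator.h definition):

    /* first sector past the request; replaces open-coded arithmetic */
    #define rq_end_sector(rq)       ((rq)->sector + (rq)->nr_sectors)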
-
Aaron Carroll authored
Deadline currently only batches sector-contiguous requests, so except for a few circumstances (e.g. requests in a single direction), it is essentially first come first served. This is bad for throughput, so change it to CSCAN, which means requests in a batch do not need to be sequential and are issued in increasing sector order. Signed-off-by: Aaron Carroll <aaronc@gelato.unsw.edu.au> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
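Conceptually the batch selection becomes the following (an illustrative sketch, not the actual deadline code, which keeps per-direction rbtrees rather than a flat list):

    /* CSCAN sketch: pick the pending request with the lowest sector
     * at or beyond the last issued one; wrap to the lowest overall
     * when the end is reached. Linear scan for clarity only. */
    static struct request *cscan_pick(struct list_head *pending,
                                      sector_t last)
    {
            struct request *rq, *best = NULL, *lowest = NULL;

            list_for_each_entry(rq, pending, queuelist) {
                    if (!lowest || rq->sector < lowest->sector)
                            lowest = rq;
                    if (rq->sector >= last &&
                        (!best || rq->sector < best->sector))
                            best = rq;
            }
            return best ? best : lowest;    /* wrap around */
    }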
-
Fernando Luis Vázquez Cao authored
struct request has an ioprio member, but it is never updated because currently bios do not hold io context information. The implication of this is that virtio_blk ends up passing useless information to the backend driver. That said, some IO schedulers such as CFQ do store io context information in struct request, but use private members for that, which means that the information cannot be directly accessed in an IO scheduler-independent way. This patch adds a function to obtain the ioprio of a request. We should avoid accessing ioprio directly and use this function instead, so that its users do not have to care about future changes in block layer structures or what the currently active IO controller is. This patch does not introduce any functional changes, but paves the way for future clean-ups and enhancements. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
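The accessor is presumably a thin wrapper of this shape (the name is taken from later kernels; treat it as an assumption here):

    static inline unsigned short req_get_ioprio(struct request *req)
    {
            return req->ioprio;
    }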
-
David Woodhouse authored
It was only used by ps3disk, and it should probably have been REQ_TYPE_LINUX_BLOCK + REQ_LB_OP_FLUSH. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
David Woodhouse authored
But blkdev_issue_discard() still emits requests which are interpreted as soft barriers, because naïve callers might otherwise issue subsequent writes to those same sectors, which might cross on the queue (if they're reallocated quickly enough). Callers still _can_ issue non-barrier discard requests, but they have to take care of queue ordering for themselves. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
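The two request flavors can be expressed as flag combinations along these lines (a sketch matching the description above; macro names assumed):

    /* a discard that may be reordered freely vs. one that acts as a
     * soft barrier against later writes to the same sectors */
    #define DISCARD_NOBARRIER       (WRITE | (1 << BIO_RW_DISCARD))
    #define DISCARD_BARRIER         (DISCARD_NOBARRIER | \
                                     (1 << BIO_RW_BARRIER))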
-
David Woodhouse authored
We may well want mkfs tools to use this to mark the whole device as unwanted before they format it, for example. The ioctl takes a pair of uint64_ts, which are start offset and length in _bytes_. Although at the moment it might make sense for them both to be in 512-byte sectors, I don't want to limit the ABI to that. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
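From userspace the ioctl would be driven roughly like this (a hedged sketch; error handling kept minimal):

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>           /* BLKDISCARD */

    /* discard 'len' bytes starting at byte 'start' of an open
     * block device fd; both values are in bytes, per the ABI */
    static int discard_range(int fd, uint64_t start, uint64_t len)
    {
            uint64_t range[2] = { start, len };

            if (ioctl(fd, BLKDISCARD, &range) < 0) {
                    perror("BLKDISCARD");
                    return -1;
            }
            return 0;
    }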
-
OGAWA Hirofumi authored
Barriers should be submitted with the WRITE flag set. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
David Woodhouse authored
Let the compiler see what's going on, and it can all get a lot simpler. On PPC64 this reduces the size of the code calculating these bits by about 60%. On x86_64 it's less of a win -- only 40%. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
David Woodhouse authored
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
David Woodhouse authored
We can benefit from knowing that the file system no longer cares about the contents of certain sectors, by throwing them away immediately and then never having to garbage collect them, and using the extra free space to make our operations more efficient. Do so. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
David Woodhouse authored
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
David Woodhouse authored
[hirofumi@mail.parknet.co.jp: discard _after_ checking for corrupt chains] Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
David Woodhouse authored
Some block devices benefit from a hint that they can forget the contents of certain sectors. Add basic support for this to the block core, along with a 'blkdev_issue_discard()' helper function which issues such requests. The caller doesn't get to provide an end_io function, since blkdev_issue_discard() will automatically split the request up into multiple bios if appropriate. Neither does the function wait for completion -- it's expected that callers won't care about when, or even _if_, the request completes. It's only a hint to the device anyway. By definition, the file system doesn't _care_ about these sectors any more. [With feedback from OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> and Jens Axboe <jens.axboe@oracle.com>] Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
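The helper's shape, as described, is roughly (a sketch, not the verbatim prototype):

    /* split [sector, sector + nr_sects) into one or more discard
     * bios and submit them without waiting for completion */
    int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
                             sector_t nr_sects, gfp_t gfp_mask);

    /* e.g. a filesystem hinting that an extent is now unused */
    blkdev_issue_discard(sb->s_bdev, start, nr_sects, GFP_KERNEL);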
-
David Woodhouse authored
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
Jens Axboe authored
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
xiphmont@xiph.org authored
I have another request for the block filter SG_IO command whitelist, specifically the MMC streaming command set SET READ AHEAD command. The command applies only to MMC CDROM/DVDROM drives with the optional streaming feature set. The command is useful to cdparanoia in that it allows explicit cache control side effects that are, on many drives, cdparanoia's most efficient way to flush/disable the media cache on cdrom drives. I am aware of no reason why it should not be accessible from userspace. Also note that the command is already fully accessible through the SCSI-native version of the SG_IO ioctl as well as the traditional SG interface; the command is only being refused on block devices. That means that on a typical stock distro, the command is available through /dev/sg* but not /dev/scd*, although both are typically available and accessible. Filtering the command is not providing any protection, only a confusing inconsistency. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
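In the command filter of that era, the change is presumably a one-liner marking the opcode read-safe (the filter structure and field names are assumptions):

    /* whitelist the MMC streaming SET READ AHEAD opcode for
     * unprivileged SG_IO on block devices; sketch only */
    __set_bit(GPCMD_SET_READ_AHEAD, filter->read_ok);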
-
- 08 Oct, 2008 3 commits
-
-
Linus Torvalds authored
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  [MIPS] Sibyte: Register PIO PATA device only for Swarm and Little Sur
-
Linus Torvalds authored
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  tcp: Fix tcp_hybla zero congestion window growth with small rho and large cwnd.
  net: Fix netdev_run_todo dead-lock
  tcp: Fix possible double-ack w/ user dma
  net: only invoke dev->change_rx_flags when device is UP
  netrom: Fix sock_orphan() use in nr_release
  ax25: Quick fix for making sure unaccepted sockets get destroyed.
  Revert "ax25: Fix std timer socket destroy handling."
  [Bluetooth] Add reset quirk for A-Link BlueUSB21 dongle
  [Bluetooth] Add reset quirk for new Targus and Belkin dongles
  [Bluetooth] Fix double frees on error paths of btusb and bpa10x drivers
-
Ralf Baechle authored
Symbol name spaghetti which is too complicated to clean up at this stage of the release cycle breaks the build on BCM1480 platforms. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
-
- 07 Oct, 2008 4 commits
-
-
Daniele Lacamera authored
Because of rounding, in certain conditions, i.e. when in the congestion avoidance state rho is smaller than 1/128 of the current cwnd, TCP Hybla congestion control starves and the cwnd is kept constant forever. This patch forces an increment by one segment after #send_cwnd calls without increments (NewReno behavior). Signed-off-by: Daniele Lacamera <root@danielinux.net> Signed-off-by: David S. Miller <davem@davemloft.net>
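The fallback can be sketched as follows (illustrative only, not the actual tcp_hybla.c hunk; 'increment' stands for the rho-scaled growth that rounds to zero):

    /* NewReno-style fallback: when Hybla's scaled increment rounds
     * to zero, still grow cwnd by one segment per cwnd worth of acks */
    if (increment == 0) {
            if (++tp->snd_cwnd_cnt >= tp->snd_cwnd) {
                    tp->snd_cwnd++;
                    tp->snd_cwnd_cnt = 0;
            }
    }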
-
Herbert Xu authored
Benjamin Thery tracked down a bug that explains many instances of the error "unregister_netdevice: waiting for %s to become free. Usage count = %d". It turns out that netdev_run_todo can deadlock with itself if a second instance of it is run in a thread that will then free a reference to the device waited on by the first instance. The problem is really quite silly. We were trying to create parallelism where none was required. As netdev_run_todo always follows an RTNL section, and todo tasks can only be added with the RTNL held, by definition you should only need to wait for the very ones that you've added and be done with it. There is no need for a second mutex or spinlock. This is exactly what the following patch does. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
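The fix, then, amounts to snapshotting the todo list while still under RTNL and draining only that snapshot (a sketch of the shape, not the verbatim patch):

    void netdev_run_todo(void)
    {
            struct list_head list;

            /* snapshot under RTNL: only entries we queued ourselves
             * can be present, so no extra mutex is needed */
            list_replace_init(&net_todo_list, &list);
            __rtnl_unlock();

            while (!list_empty(&list)) {
                    struct net_device *dev = list_first_entry(&list,
                                    struct net_device, todo_list);
                    list_del(&dev->todo_list);
                    /* ... wait for refs to drop, then free dev ... */
            }
    }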
-
Ali Saidi authored
From: Ali Saidi <saidi@engin.umich.edu> When TCP receive copy offload is enabled, it's possible that tcp_rcv_established() will cause two acks to be sent for a single packet. In the case that a tcp_dma_early_copy() is successful, copied_early is set to true, which causes tcp_cleanup_rbuf() to be called early, which can send an ack. Further along in tcp_rcv_established(), __tcp_ack_snd_check() is called and will schedule a delayed ACK. If no packets are processed before the delayed ack timer expires, the packet will be acked twice. Signed-off-by: David S. Miller <davem@davemloft.net>
-