Commits · 9ca170a3c0cbb0d5251cef6f5a3300fa436ba8ec · linux / linux-davinci

10 Dec, 2009 40 commits

dm kcopyd: accept zero size jobs · 9ca170a3

Mikulas Patocka authored Dec 10, 2009

dm-kcopyd: accept zero-size jobs

This patch changes dm-kcopyd so that it accepts zero-size jobs and completes
them immediatelly via its completion thread.

It is needed for multisnapshots snapshot resizing. When we are writing to
a chunk beyond origin end, no copying is done. To simplify the code, we submit
an empty request to kcopyd and let kcopyd complete it. If we didn't submit
a request to kcopyd and called the completion routine immediatelly, it would
violate the principle that completion is called only from one thread and
it would need additional locking.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

9ca170a3

dm snapshot: track suspended state in target · c26655ca

Mike Snitzer authored Dec 10, 2009

Keep track of whether or not the device is suspended within the snapshot
target module, the same as we do in dm-raid1.

We will use this later to enforce the correct sequence of ioctls to
transfer the in-core exceptions from a snapshot target instance in
one table to a replacement one capable of merging them back
into the origin.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

c26655ca

dm snapshot: move cow ref from exception store to snap core · fc56f6fb

Mike Snitzer authored Dec 10, 2009

Store the reference to the snapshot cow device in the core snapshot
code instead of each exception store.  It can be accessed through the
new function dm_snap_cow().  Exception stores should each now maintain a
reference to their parent snapshot struct.

This is cleaner and makes part of the forthcoming snapshot merge code simpler.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>

fc56f6fb

dm snapshot: add allocated metadata to snapshot status · 985903bb

Mike Snitzer authored Dec 10, 2009

Add number of sectors used by metadata to the end of the snapshot's status
line.

Renamed dm_exception_store_type's 'fraction_full' to 'usage'.  Renamed
arguments to be clearer about what is being returned.  Also added
'metadata_sectors'.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

985903bb

dm snapshot: rename exception functions · 3510cb94

Jon Brassow authored Dec 10, 2009

Rename exception functions.  Preparing to pull them out of
dm-snap.c for broader use.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

3510cb94

dm snapshot: rename exception_table to dm_exception_table · 191437a5

Jon Brassow authored Dec 10, 2009

Rename exception_table for broader use outside dm-snap.c
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

191437a5

dm snapshot: rename dm_snap_exception to dm_exception · 1d4989c8

Jon Brassow authored Dec 10, 2009

The exception structure is not necessarily just a snapshot
element (especially after we pull it out of dm-snap.c).

Renaming appropriately.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

1d4989c8

dm snapshot: consolidate insert exception functions · d32a6ea6

Jon Brassow authored Dec 10, 2009

Consolidate the insert_*exception functions.  'insert_completed_exception'
already contains all the logic to handle 'insert_exception' (via
check for a hash_shift of 0), so remove redundant function.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

d32a6ea6

dm snapshot: abstract minimum_chunk_size fn · 7e201b35

Mikulas Patocka authored Dec 10, 2009

The origin needs to find minimum chunksize of all snapshots.  This logic is
moved to a separate function because it will be used at another place in
the snapshot merge patches.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

7e201b35

dm snapshot: simplify sector_to_chunk expression · 102c6ddb

Mikulas Patocka authored Dec 10, 2009

Removed unnecessary 'and' masking: The right shift discards the lower
bits so there is no need to clear them.

(A later patch needs this change to support a 32-bit chunk_mask.)
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Jonathan Brassow <jbrassow@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

102c6ddb

dm snapshot: avoid else clause in persistent_read_metadata · f5acc834

Jon Brassow authored Dec 10, 2009

Minor code touch-up.  We don't need the 'else'.
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

f5acc834

dm ioctl: prefer strlcpy over strncpy · a518b86d

Roel Kluin authored Dec 10, 2009

strlcpy() will always null terminate the string.

    The code should already guarantee this as the last bytes are already
    NULs and the string lengths were restricted before being stored in
    hc.  Removing the '-1' becomes necessary so strlcpy() doesn't
    lose the last character of a maximum-length string.
	- agk
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

a518b86d

dm raid1: explicitly initialise bio_lists · 5339fc2d

Mikulas Patocka authored Dec 10, 2009

Explicitly initialize bio lists instead of relying on kzalloc.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

5339fc2d

dm raid1: hold all write bios when leg fails · 929be8fc

Mikulas Patocka authored Dec 10, 2009

Hold all write bios when leg fails and errors are handled

When using a userspace daemon such as dmeventd to handle errors, we must
delay completing  bios until it has done its job.
This patch prevents the following race:
  - primary leg fails
  - write "1" fail, the write is held, secondary leg is set default
  - write "2" goes straight to the secondary leg
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

929be8fc

dm raid1: hold write bios when errors are handled · 60f355ea

Mikulas Patocka authored Dec 10, 2009

Hold all write bios when errors are handled.

Previously the failures list was used only when handling errors with
a userspace daemon such as dmeventd.  Now, it is always used for all bios.
The regions where some writes failed must be marked as nosync. This can only
be done in process context (i.e. in raid1 workqueue), not in the
write_callback function.

Previously the write would succeed if writing to at least one leg
succeeded.  This is wrong because data from the failed leg may be
replicated to the correct leg.  Now, if using a userspace daemon, the
write with some failures will be held until the daemon has done its job
and reconfigured the array.  If not using a daemon, the write still
succeeds if at least one leg succeeds. This is bad, but it is consistent
with current behavior.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

60f355ea

dm raid1: remove bio_endio from dm_rh_mark_nosync · c58098be

Mikulas Patocka authored Dec 10, 2009

Move bio completion out of dm_rh_mark_nosync in preparation for the
next patch.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

c58098be

dm raid1: abstract get_valid_mirror function · 87968ddd

Mikulas Patocka authored Dec 10, 2009

Move the logic to get a valid mirror leg into a function for re-use
in a later patch.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

87968ddd

dm raid1: use hold framework in do_failures · 0f398a84

Mikulas Patocka authored Dec 10, 2009

Use the hold framework in do_failures.

This patch doesn't change the bio processing logic, it just simplifies
failure handling and avoids periodically polling the failures list.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

0f398a84

dm raid1: add framework to hold bios during suspend · 04788507

Mikulas Patocka authored Dec 10, 2009

Add framework to delay bios until a suspend and then resubmit them with
either DM_ENDIO_REQUEUE (if the suspend was noflush) or complete them
with -EIO.  I/O barrier support will use this.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Takahiro Yasui <tyasui@redhat.com>
Tested-by: Takahiro Yasui <tyasui@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

04788507

dm raid1: report flush errors separately in status · 64b30c46

Mikulas Patocka authored Dec 10, 2009

Report flush errors as 'F' instead of 'D' for log and mirror devices.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

64b30c46

dm raid1: implement mirror_flush · c0da3748

Mikulas Patocka authored Dec 10, 2009

Implement flush callee. It uses dm_io to send zero-size barrier synchronously
and concurrently to all the mirror legs.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

c0da3748

dm log: use flush callback fn · 076010e2

Mikulas Patocka authored Dec 10, 2009

Call the flush callback from the log.

If flush failed, we have no alternative but to mark the whole log as dirty.
Also we set the variable flush_failed to prevent any bits ever being marked as
clean again.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

076010e2

dm log: add flush callback fn · 87a8f240

Mikulas Patocka authored Dec 10, 2009

Introduce a callback pointer from the log to dm-raid1 layer.

Before some region is set as "in-sync", we need to flush hardware cache on
all the disks. But the log module doesn't have access to the mirror_set
structure. So it will use this callback.

So far the callback is unused, it will be used in further patches.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

87a8f240

dm log: introduce flush_failed variable · 5adc78d0

Mikulas Patocka authored Dec 10, 2009

Introduce "flush failed" variable.  When a flush before clearing a bit
in the log fails, we don't know anything about which which regions are
in-sync and which not.

So we need to set all regions as not-in-sync and set the variable
"flush_failed" to prevent setting the in-sync bit in the future.

A target reload is the only way to get out of this situation.

The variable will be set in following patches.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

5adc78d0

dm log: add flush_header function · 20a34a8e

Mikulas Patocka authored Dec 10, 2009

Introduce flush_header and use it to flush the log device.

Note that we don't have to flush if all the regions transition
from "dirty" to "clean" state.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

20a34a8e

dm raid1: split touched state into two · b09acf1a

Mikulas Patocka authored Dec 10, 2009

Split the variable "touched" into two, "touched_dirtied" and
"touched_cleaned", set when some region was dirtied or cleaned.

This will be used to optimize flushes.

After a transition from "dirty" to "clean" state we don't have flush hardware
cache on the log device. After a transition from "clean" to "dirty" the cache
must be flushed.

Before a transition from "clean" to "dirty" state we don't have to flush all
the raid legs. Before a transition from "dirty" to "clean" we must flush all
the legs to make sure that they are really in sync.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

b09acf1a

dm raid1: support flush · 4184153f

Mikulas Patocka authored Dec 10, 2009

Flush support for dm-raid1.

When it receives an empty barrier, submit it to all the devices via dm-io.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

4184153f

dm io: remove extra bi_io_vec region hack · f1e53987

Mikulas Patocka authored Dec 10, 2009

Remove the hack where we allocate an extra bi_io_vec to store additional
private data. This hack prevents us from supporting barriers in
dm-raid1 without first making another little block layer change.
Instead of doing that, this patch eliminates the bi_io_vec abuse by
storing the region number directly in the low bits of bi_private.

We need to store two things for each bio, the pointer to the main io
structure and, if parallel writes were requested, an index indicating
which of these writes this bio belongs to. There can be at most
BITS_PER_LONG regions - 32 or 64.

The index (region number) was stored in the last (hidden) bio vector and
the pointer to struct io was stored in bi_private.

This patch now aligns "struct io" on BITS_PER_LONG bytes and stores the
region number in the low BITS_PER_LONG bits of bi_private.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

f1e53987

dm io: use slab for struct io · 952b3557

Mikulas Patocka authored Dec 10, 2009

Allocate "struct io" from a slab.

This patch changes dm-io, so that "struct io" is allocated from a slab cache.
It used to be allocated with kmalloc. Allocating from a slab will be needed
for the next patch, because it requires a special alignment of "struct io"
and kmalloc cannot meet this alignment.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

952b3557

dm crypt: make wipe message also wipe essiv key · 542da317

Milan Broz authored Dec 10, 2009

The "wipe key" message is used to wipe the volume key from memory
temporarily, for example when suspending to RAM.

But the initialisation vector in ESSIV mode is calculated from the
hashed volume key, so the wipe message should wipe this IV key too and
reinitialise it when the volume key is reinstated.

This patch adds an IV wipe method called from a wipe message callback.
ESSIV is then reinitialised using the init function added by the
last patch.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

542da317

dm crypt: separate essiv allocation from initialisation · b95bf2d3

Milan Broz authored Dec 10, 2009

This patch separates the construction of IV from its initialisation.
(For ESSIV it is a hash calculation based on volume key.)

Constructor code now preallocates hash tfm and salt array
and saves it in a private IV structure.

The next patch requires this to reinitialise the wiped IV
without reallocating memory when resuming a suspended device.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

b95bf2d3

dm crypt: restructure essiv error path · 5861f1be

Milan Broz authored Dec 10, 2009

Use kzfree for salt deallocation because it is derived from the volume
key.  Use a common error path in ESSIV constructor.

Required by a later patch which fixes the way key material is wiped
from memory.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

5861f1be

dm crypt: move private iv fields to structs · 60473592

Milan Broz authored Dec 10, 2009

Define private structures for IV so it's easy to add further attributes
in a following patch which fixes the way key material is wiped from
memory.  Also move ESSIV destructor and remove unnecessary 'status'
operation.

There are no functional changes in this patch.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

60473592

dm crypt: make wipe message also wipe tfm key · 0b430958

Milan Broz authored Dec 10, 2009

The "wipe key" message is used to wipe a volume key from memory
temporarily, for example when suspending to RAM.

There are two instances of the key in memory (inside crypto tfm)
but only one got wiped.  This patch wipes them both.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

0b430958

dm snapshot: cope with chunk size larger than origin · 8e87b9b8

Mikulas Patocka authored Dec 10, 2009

Under some special conditions the snapshot hash_size is calculated as zero.
This patch instead sets a minimum value of 64, the same as for the
pending exception table.

rounddown_pow_of_two(0) is an undefined operation (it expands to shift
by -1).  init_exception_table with an argument of 0 would fail with -ENOMEM.

The way to trigger the problem is to create a snapshot with a chunk size
that is larger than the origin device.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

8e87b9b8

dm snapshot: only take lock for statustype info not table · 94e76572

Mikulas Patocka authored Dec 10, 2009

Take snapshot lock only for STATUSTYPE_INFO, not STATUSTYPE_TABLE.

Commit 4c6fff44
(dm-snapshot-lock-snapshot-while-supplying-status.patch)
introduced this use of the lock, but userspace applications using
libdevmapper have been found to request STATUSTYPE_TABLE while the device
is suspended and the lock is already held, leading to deadlock.  Since
the lock is not necessary in this case, don't try to take it.

Cc: stable@kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

94e76572

dm: sysfs add empty release function to avoid debug warning · d2bb7df8

Milan Broz authored Dec 10, 2009

This patch just removes an unnecessary warning:
 kobject: 'dm': does not have a release() function,
 it is broken and must be fixed.

The kobject is embedded in mapped device struct, so
code does not need to release memory explicitly here.

Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

d2bb7df8

dm exception store: free tmp_store on persistent flag error · 613978f8

Julia Lawall authored Dec 10, 2009

Error handling code following a kmalloc should free the allocated data.

Cc: stable@kernel.org
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

613978f8

dm: avoid _hash_lock deadlock · 6076905b

Mikulas Patocka authored Dec 10, 2009

Fix a reported deadlock if there are still unprocessed multipath events
on a device that is being removed.

_hash_lock is held during dev_remove while trying to send the
outstanding events.  Sending the events requests the _hash_lock
again in dm_copy_name_and_uuid.

This patch introduces a separate lock around regions that modify the
link to the hash table (dm_set_mdptr) or the name or uuid so that
dm_copy_name_and_uuid no longer needs _hash_lock.

Additionally, dm_copy_name_and_uuid can only be called if md exists
so we can drop the dm_get() and dm_put() which can lead to a BUG()
while md is being freed.

The deadlock:
 #0 [ffff8106298dfb48] schedule at ffffffff80063035
 #1 [ffff8106298dfc20] __down_read at ffffffff8006475d
 #2 [ffff8106298dfc60] dm_copy_name_and_uuid at ffffffff8824f740
 #3 [ffff8106298dfc90] dm_send_uevents at ffffffff88252685
 #4 [ffff8106298dfcd0] event_callback at ffffffff8824c678
 #5 [ffff8106298dfd00] dm_table_event at ffffffff8824dd01
 #6 [ffff8106298dfd10] __hash_remove at ffffffff882507ad
 #7 [ffff8106298dfd30] dev_remove at ffffffff88250865
 #8 [ffff8106298dfd60] ctl_ioctl at ffffffff88250d80
 #9 [ffff8106298dfee0] do_ioctl at ffffffff800418c4
#10 [ffff8106298dff00] vfs_ioctl at ffffffff8002fab9
#11 [ffff8106298dff40] sys_ioctl at ffffffff8004bdaf
#12 [ffff8106298dff80] tracesys at ffffffff8005d28d (via system_call)

Cc: stable@kernel.org
Reported-by: guy keren <choo@actcom.co.il>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

6076905b

Merge branch 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 · 3067e02f

Linus Torvalds authored Dec 09, 2009

* 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPICA: Update version to 20091112.
  ACPICA: Add additional module-level code support
  ACPICA: Deploy new create integer interface where appropriate
  ACPICA: New internal utility function to create Integer objects
  ACPICA: Add repair for predefined methods that must return sorted lists
  ACPICA: Fix possible fault if return Package objects contain NULL elements
  ACPICA: Add post-order callback to acpi_walk_namespace
  ACPICA: Change package length error message to an info message
  ACPICA: Reduce severity of predefined repair messages, Warning to Info
  ACPICA: Update version to 20091013
  ACPICA: Fix possible memory leak for Scope ASL operator
  ACPICA: Remove possibility of executing _REG methods twice
  ACPICA: Add repair for bad _MAT buffers
  ACPICA: Add repair for bad _BIF/_BIX packages

3067e02f