Commit 859d7182 authored by Vlad Apostolov, committed by Tim Shimmin

[XFS] get_bulkall() could return incorrect inode state

In the following scenario xfs_bulkstat() returns incorrect stale inode
state:

1. File_A is created and its inode synced to disk.
2. File_A is unlinked and doesn't exist anymore.
3. Filesystem sync is invoked.
4. File_B is created. File_B happens to reclaim File_A's inode.
5. xfs_bulkstat() is called and detects File_B but reports the
   incorrect File_A inode state.

The explanation for the incorrect inode state is that inodes are not
immediately synced on file create for performance reasons. This leaves the
on-disk inode buffer uninitialized (or with old state from a previous
generation inode), and this is what xfs_bulkstat() would report.
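
The scenario above can be replayed in a small toy model. This is a sketch
only, not XFS code: the struct and function names (toy_slot, toy_create and
so on) are hypothetical, the filesystem sync in step 3 is not modeled, and
the model simply assumes the buffer still holds File_A's state afterwards,
as the description above says.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model, not XFS code: one inode slot that keeps
 * separate in-core and on-disk (buffer) copies of its state. */
struct toy_inode_state {
	uint32_t gen;	/* inode generation */
	uint16_t nlink;	/* link count */
};

struct toy_slot {
	struct toy_inode_state incore;	/* what the filesystem knows */
	struct toy_inode_state ondisk;	/* what the inode buffer holds */
};

/* Create a file in this slot. Only the in-core copy is initialized;
 * the buffer is left alone until a flush. */
static void toy_create(struct toy_slot *s)
{
	s->incore.gen++;
	s->incore.nlink = 1;
}

/* Flush the in-core state into the on-disk buffer. */
static void toy_flush(struct toy_slot *s)
{
	s->ondisk = s->incore;
}

/* Unlink as it behaved before the patch: only the in-core link
 * count is dropped, the buffer keeps the old state. */
static void toy_unlink_prepatch(struct toy_slot *s)
{
	s->incore.nlink = 0;
}

/* Bulkstat before the patch: report whatever the buffer holds. */
static struct toy_inode_state toy_bulkstat_prepatch(const struct toy_slot *s)
{
	return s->ondisk;
}

/* Replay steps 1-5. Returns 1 if bulkstat reports a generation
 * that differs from the live in-core one (stale state). */
static int toy_replay_scenario(void)
{
	struct toy_slot s = { { 0, 0 }, { 0, 0 } };
	struct toy_inode_state seen;

	toy_create(&s);			/* 1. File_A created ... */
	toy_flush(&s);			/*    ... and synced to disk */
	assert(s.ondisk.gen == s.incore.gen);
	toy_unlink_prepatch(&s);	/* 2. File_A unlinked */
					/* 3. sync: not modeled (assumption) */
	toy_create(&s);			/* 4. File_B reclaims the slot */
	seen = toy_bulkstat_prepatch(&s); /* 5. bulkstat reads the buffer */

	return seen.gen != s.incore.gen;
}
```

In this model toy_replay_scenario() reports the stale case: the buffer
still carries File_A's generation while the in-core inode already belongs
to File_B.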

The patch marks the on-disk inode buffer "dirty" on unlink. When the inode
is reclaimed (by a new file create), xfs_bulkstat() would filter this
inode by the "dirty" mark. Once the inode is flushed to disk, the on-disk
buffer "dirty" mark is automatically removed and a following
xfs_bulkstat() would return the correct inode state.
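
A minimal sketch of the idea behind the patch, again as a toy model rather
than the kernel code: unlink also zeroes the link count in the buffer copy,
and the bulkstat check skips any buffer whose link count is 0. All names
here (sketch_inode, sketch_bulkstat_use_buf and so on) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, not XFS code. */
struct sketch_dinode {
	uint32_t gen;	/* inode generation */
	uint16_t nlink;	/* link count */
};

struct sketch_inode {
	struct sketch_dinode incore;	/* in-core state */
	struct sketch_dinode buf;	/* on-disk inode buffer contents */
};

/* Create: initializes the in-core state only, no flush. */
static void sketch_create(struct sketch_inode *ip)
{
	ip->incore.gen++;
	ip->incore.nlink = 1;
}

/* Flush: the buffer now mirrors the in-core state, which also
 * clears the nlink == 0 mark for a live inode. */
static void sketch_flush(struct sketch_inode *ip)
{
	ip->buf = ip->incore;
}

/* Unlink with the patch applied: the buffer's link count is zeroed
 * in advance, marking the buffer as not yet trustworthy. */
static void sketch_unlink(struct sketch_inode *ip)
{
	ip->incore.nlink = 0;
	ip->buf.nlink = 0;
}

/* Bulkstat-side check: returns false (skip this inode) while the
 * buffer is marked, true with the buffer state copied to *out
 * otherwise. */
static bool sketch_bulkstat_use_buf(const struct sketch_inode *ip,
				    struct sketch_dinode *out)
{
	assert(out != NULL);
	if (ip->buf.nlink == 0)
		return false;
	*out = ip->buf;
	return true;
}
```

In this model, a reclaimed inode is skipped until sketch_flush() runs;
after the flush the check passes and returns the new file's state.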

Marking the on-disk inode buffer "dirty" on unlink is achieved by setting
the on-disk di_nlink field to 0. Note that the in-core di_nlink has
already been set to 0 and a corresponding transaction logged by
xfs_droplink(). This is an exception to the rule that any on-disk inode
buffer change has to be followed by a disk write (inode flush).
Synchronizing the in-core to on-disk di_nlink values in advance (before
the actual inode flush to disk) should be fine in this case because the
inode is already unlinked and it would never change its di_nlink again for
this inode generation.

SGI-PV: 970842
SGI-Modid: xfs-linux-melb:xfs-kern:29757a
Signed-off-by: Vlad Apostolov <vapo@sgi.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Mark Goodwin <markgw@sgi.com>
Signed-off-by: Tim Shimmin <tes@sgi.com>
parent ba532a98
@@ -1931,9 +1931,9 @@ xfs_iunlink(
 	 */
 	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, agdaddr,
 				   XFS_FSS_TO_BB(mp, 1), 0, &agibp);
-	if (error) {
+	if (error)
 		return error;
-	}
+
 	/*
 	 * Validate the magic number of the agi block.
 	 */
@@ -1957,6 +1957,24 @@ xfs_iunlink(
 	ASSERT(agi->agi_unlinked[bucket_index]);
 	ASSERT(be32_to_cpu(agi->agi_unlinked[bucket_index]) != agino);
 
+	error = xfs_itobp(mp, tp, ip, &dip, &ibp, 0, 0);
+	if (error)
+		return error;
+
+	/*
+	 * Clear the on-disk di_nlink. This is to prevent xfs_bulkstat
+	 * from picking up this inode when it is reclaimed (its incore state
+	 * initialzed but not flushed to disk yet). The in-core di_nlink is
+	 * already cleared in xfs_droplink() and a corresponding transaction
+	 * logged. The hack here just synchronizes the in-core to on-disk
+	 * di_nlink value in advance before the actual inode sync to disk.
+	 * This is OK because the inode is already unlinked and would never
+	 * change its di_nlink again for this inode generation.
+	 * This is a temporary hack that would require a proper fix
+	 * in the future.
+	 */
+	dip->di_core.di_nlink = 0;
+
 	if (be32_to_cpu(agi->agi_unlinked[bucket_index]) != NULLAGINO) {
 		/*
 		 * There is already another inode in the bucket we need
@@ -1964,10 +1982,6 @@ xfs_iunlink(
 		 * Here we put the head pointer into our next pointer,
 		 * and then we fall through to point the head at us.
 		 */
-		error = xfs_itobp(mp, tp, ip, &dip, &ibp, 0, 0);
-		if (error) {
-			return error;
-		}
 		ASSERT(be32_to_cpu(dip->di_next_unlinked) == NULLAGINO);
 		/* both on-disk, don't endian flip twice */
 		dip->di_next_unlinked = agi->agi_unlinked[bucket_index];
@@ -290,8 +290,16 @@ xfs_bulkstat_use_dinode(
 		return 1;
 	dip = (xfs_dinode_t *)
 			xfs_buf_offset(bp, clustidx << mp->m_sb.sb_inodelog);
+	/*
+	 * Check the buffer containing the on-disk inode for di_nlink == 0.
+	 * This is to prevent xfs_bulkstat from picking up just reclaimed
+	 * inodes that have their in-core state initialized but not flushed
+	 * to disk yet. This is a temporary hack that would require a proper
+	 * fix in the future.
+	 */
 	if (be16_to_cpu(dip->di_core.di_magic) != XFS_DINODE_MAGIC ||
-	    !XFS_DINODE_GOOD_VERSION(dip->di_core.di_version))
+	    !XFS_DINODE_GOOD_VERSION(dip->di_core.di_version) ||
+	    !dip->di_core.di_nlink)
 		return 0;
 	if (flags & BULKSTAT_FG_QUICK) {
 		*dipp = dip;