Commit 96303081 authored by Josef Bacik's avatar Josef Bacik Committed by Chris Mason

Btrfs: use hybrid extents+bitmap rb tree for free space

Currently btrfs has a problem where it can use a ridiculous amount of RAM simply
tracking free space.  As free space gets fragmented, we end up with thousands of
entries on an rb-tree per block group, which usually spans 1 gig of area.  Since
we currently don't ever flush free space cache back to disk this gets to be a
bit unweildly on large fs's with lots of fragmentation.

This patch solves this problem by using PAGE_SIZE bitmaps for parts of the free
space cache.  Initially we calculate a threshold of extent entries we can
handle, which is however many extent entries we can cram into 16k of ram.  The
maximum amount of RAM that should ever be used to track 1 gigabyte of diskspace
will be 32k of RAM, which scales much better than we did before.

Once we pass the extent threshold, we start adding bitmaps and using those
instead for tracking the free space.  This patch also makes it so that any free
space thats less than 4 * sectorsize we go ahead and put into a bitmap.  This is
nice since we try and allocate out of the front of a block group, so if the
front of a block group is heavily fragmented and then has a huge chunk of free
space at the end, we go ahead and add the fragmented areas to bitmaps and use a
normal extent entry to track the big chunk at the back of the block group.

I've also taken the opportunity to revamp how we search for free space.
Previously we indexed free space via an offset indexed rb tree and a bytes
indexed rb tree.  I've dropped the bytes indexed rb tree and use only the offset
indexed rb tree.  This cuts the number of tree operations we were doing
previously down by half, and gives us a little bit of a better allocation
pattern since we will always start from a specific offset and search forward
from there, instead of searching for the size we need and try and get it as
close as possible to the offset we want.

I've given this a healthy amount of testing pre-new format stuff, as well as
post-new format stuff.  I've booted up my fedora box which is installed on btrfs
with this patch and ran with it for a few days without issues.  I've not seen
any performance regressions in any of my tests.

Since the last patch Yan Zheng fixed a problem where we could have overlapping
entries, so updating their offset inline would cause problems.  Thanks,
Signed-off-by: default avatarJosef Bacik <jbacik@redhat.com>
Signed-off-by: default avatarChris Mason <chris.mason@oracle.com>
parent 83121942
......@@ -709,6 +709,9 @@ struct btrfs_free_cluster {
/* first extent starting offset */
u64 window_start;
/* if this cluster simply points at a bitmap in the block group */
bool points_to_bitmap;
struct btrfs_block_group_cache *block_group;
/*
* when a cluster is allocated from a block group, we put the
......@@ -726,6 +729,10 @@ struct btrfs_block_group_cache {
u64 pinned;
u64 reserved;
u64 flags;
u64 sectorsize;
int extents_thresh;
int free_extents;
int total_bitmaps;
int cached;
int ro;
int dirty;
......@@ -734,7 +741,6 @@ struct btrfs_block_group_cache {
/* free space cache stuff */
spinlock_t tree_lock;
struct rb_root free_space_bytes;
struct rb_root free_space_offset;
/* block group cache stuff */
......
......@@ -3649,7 +3649,6 @@ refill_cluster:
goto loop;
checks:
search_start = stripe_align(root, offset);
/* move on to the next group */
if (search_start + num_bytes >= search_end) {
btrfs_add_free_space(block_group, offset, num_bytes);
......@@ -7040,6 +7039,16 @@ int btrfs_read_block_groups(struct btrfs_root *root)
mutex_init(&cache->cache_mutex);
INIT_LIST_HEAD(&cache->list);
INIT_LIST_HEAD(&cache->cluster_list);
cache->sectorsize = root->sectorsize;
/*
* we only want to have 32k of ram per block group for keeping
* track of free space, and if we pass 1/2 of that we want to
* start converting things over to using bitmaps
*/
cache->extents_thresh = ((1024 * 32) / 2) /
sizeof(struct btrfs_free_space);
read_extent_buffer(leaf, &cache->item,
btrfs_item_ptr_offset(leaf, path->slots[0]),
sizeof(cache->item));
......@@ -7091,6 +7100,15 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
cache->key.objectid = chunk_offset;
cache->key.offset = size;
cache->key.type = BTRFS_BLOCK_GROUP_ITEM_KEY;
cache->sectorsize = root->sectorsize;
/*
* we only want to have 32k of ram per block group for keeping track
* of free space, and if we pass 1/2 of that we want to start
* converting things over to using bitmaps
*/
cache->extents_thresh = ((1024 * 32) / 2) /
sizeof(struct btrfs_free_space);
atomic_set(&cache->count, 1);
spin_lock_init(&cache->lock);
spin_lock_init(&cache->tree_lock);
......@@ -7103,6 +7121,11 @@ int btrfs_make_block_group(struct btrfs_trans_handle *trans,
cache->flags = type;
btrfs_set_block_group_flags(&cache->item, type);
cache->cached = 1;
ret = btrfs_add_free_space(cache, chunk_offset, size);
BUG_ON(ret);
remove_sb_from_cache(root, cache);
ret = update_space_info(root->fs_info, cache->flags, size, bytes_used,
&cache->space_info);
BUG_ON(ret);
......
This diff is collapsed.
......@@ -19,6 +19,14 @@
#ifndef __BTRFS_FREE_SPACE_CACHE
#define __BTRFS_FREE_SPACE_CACHE
struct btrfs_free_space {
struct rb_node offset_index;
u64 offset;
u64 bytes;
unsigned long *bitmap;
struct list_head list;
};
int btrfs_add_free_space(struct btrfs_block_group_cache *block_group,
u64 bytenr, u64 size);
int btrfs_remove_free_space(struct btrfs_block_group_cache *block_group,
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment