Commit 29d249ed authored by Greg Kroah-Hartman

Staging: dst: remove from the tree

DST is dead: no one is using it and upstream
has abandoned it, so remove it from the tree,
as it is not going anywhere.
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
parent d7edf479
@@ -87,8 +87,6 @@ source "drivers/staging/frontier/Kconfig"
source "drivers/staging/dream/Kconfig"
source "drivers/staging/dst/Kconfig"
source "drivers/staging/pohmelfs/Kconfig"
source "drivers/staging/b3dfg/Kconfig"
@@ -26,7 +26,6 @@ obj-$(CONFIG_RTL8192E) += rtl8192e/
obj-$(CONFIG_INPUT_MIMIO) += mimio/
obj-$(CONFIG_TRANZPORT) += frontier/
obj-$(CONFIG_DREAM) += dream/
obj-$(CONFIG_DST) += dst/
obj-$(CONFIG_POHMELFS) += pohmelfs/
obj-$(CONFIG_B3DFG) += b3dfg/
obj-$(CONFIG_IDE_PHISON) += phison/
config DST
tristate "Distributed storage"
depends on NET && CRYPTO && SYSFS && BLK_DEV
select CONNECTOR
---help---
DST is network block device storage, which organizes storage
exported by remote nodes into a local block device.

DST works on top of any network media and protocol; it is just a
matter of the configuration utility understanding the correct
addresses. The most common example is TCP over IP, which makes it
possible to pass through firewalls and create remote backup storage
in a different datacenter. DST requires a single open port on the
exporting node and outgoing connections on the local node.

DST works with an in-kernel client and server, which improves
performance by eliminating unneeded data copies and by not depending
on the version of the external IO components. It does, however,
require a userspace configuration utility.

DST uses a transaction model, in which each write has to be
explicitly acked by the remote node to be considered successfully
written. There may be lots of in-flight transactions. When the
remote host does not ack a transaction, it is resent a predefined
number of times with specified timeouts between resends. All those
parameters are configurable. Transactions are marked as failed after
all resends complete unsuccessfully; a long enough resend timeout
and/or a large number of resends avoids returning an error to the
higher layer (usually a filesystem) during short network problems or
remote node outages. In a network RAID setup this means that the
storage will not degrade until transactions are marked as failed,
and thus will not force checksum recalculation and data rebuild.
In case of connection failure DST tries to reconnect to the remote
node automatically. DST sends ping commands at idle time to detect
whether the remote node is alive.

Because of the transactional model it is possible to use zero-copy
sending without fear of data corruption (which would in any case be
detected by the strong checksums).

DST can fully encrypt the data channel when the link is untrusted
and maintain strong checksums of the transferred data. Algorithms
and crypto keys are configurable; they must match on both sides of
the network channel. Crypto processing does not introduce a
noticeable performance overhead, since DST uses a configurable pool
of threads to perform crypto processing.

DST uses memory pools for all of its transaction allocations (the
only additional allocation on the client) and server allocations
(bio pools, while pages are allocated from the slab).

At startup DST performs a simple negotiation with the export node to
determine access permissions and the size of the exported storage.
The negotiation can be extended if new parameters need to be
autonegotiated.

DST carries block IO flags in the protocol, which makes it possible
to implement barriers and sync/flush operations transparently. Those
flags are applied on the export node where IO against the local
storage is performed, so a sync write is also sync on the remote
node, which improves data integrity and resistance to errors and
data corruption during power outages or storage damage.
Homepage: http://www.ioremap.net/projects/dst
Userspace configuration utility and the latest releases: http://www.ioremap.net/archive/dst/
config DST_DEBUG
bool "DST debug"
depends on DST
---help---
This option will enable HEAVY debugging of DST.
Turn it on ONLY if you have to debug some really obscure problem.
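For reference, a hypothetical .config fragment (not part of this commit) for a node that built DST as a module with debugging enabled would have looked like this; NET, CRYPTO, SYSFS and BLK_DEV must already be enabled for the options to be selectable, and CONNECTOR is selected automatically:

CONFIG_DST=m
CONFIG_DST_DEBUG=y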
obj-$(CONFIG_DST) += nst.o
nst-y := dcore.o state.o export.o thread_pool.o crypto.o trans.o
/*
* 2007+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
* All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*/
#include <linux/kernel.h>
#include <linux/dst.h>
#include <linux/kthread.h>
#include <linux/slab.h>
/*
 * The thread pool abstraction allows work to be scheduled for execution
 * on behalf of a kernel thread. Users do not operate on threads directly;
 * instead, they provide setup and cleanup callbacks for the thread pool
 * itself, and action and cleanup callbacks for each submitted work item.
 *
 * Each worker has private data initialized at creation time, plus data
 * provided by the user at scheduling time.
 *
 * While an action is being performed, the thread cannot be used by other
 * users; they sleep until a free thread is available to pick up their work.
 */
struct thread_pool_worker {
	struct list_head worker_entry;	/* entry in pool's ready/active list */
	struct task_struct *thread;	/* kernel thread backing this worker */
	struct thread_pool *pool;
	int error;
	int has_data;			/* set when work has been assigned */
	int need_exit;			/* set when the worker must terminate */
	unsigned int id;
	wait_queue_head_t wait;		/* worker sleeps here awaiting work */
	void *private;			/* per-worker data from init() */
	void *schedule_data;		/* per-work data from the scheduler */
	int (*action)(void *private, void *schedule_data);
	void (*cleanup)(void *private);
};
static void thread_pool_exit_worker(struct thread_pool_worker *w)
{
kthread_stop(w->thread);
w->cleanup(w->private);
kfree(w);
}
/*
 * Called to mark a thread as ready and allow users to schedule new work.
 */
static void thread_pool_worker_make_ready(struct thread_pool_worker *w)
{
struct thread_pool *p = w->pool;
mutex_lock(&p->thread_lock);
if (!w->need_exit) {
list_move_tail(&w->worker_entry, &p->ready_list);
w->has_data = 0;
mutex_unlock(&p->thread_lock);
wake_up(&p->wait);
} else {
p->thread_num--;
list_del(&w->worker_entry);
mutex_unlock(&p->thread_lock);
thread_pool_exit_worker(w);
}
}
/*
* Thread action loop: waits until there is new work.
*/
static int thread_pool_worker_func(void *data)
{
struct thread_pool_worker *w = data;
while (!kthread_should_stop()) {
wait_event_interruptible(w->wait,
kthread_should_stop() || w->has_data);
if (kthread_should_stop())
break;
if (!w->has_data)
continue;
w->action(w->private, w->schedule_data);
thread_pool_worker_make_ready(w);
}
return 0;
}
/*
 * Remove a single worker without specifying which one.
 */
void thread_pool_del_worker(struct thread_pool *p)
{
struct thread_pool_worker *w = NULL;
while (!w && p->thread_num) {
wait_event(p->wait, !list_empty(&p->ready_list) ||
!p->thread_num);
dprintk("%s: locking list_empty: %d, thread_num: %d.\n",
__func__, list_empty(&p->ready_list),
p->thread_num);
mutex_lock(&p->thread_lock);
if (!list_empty(&p->ready_list)) {
w = list_first_entry(&p->ready_list,
struct thread_pool_worker,
worker_entry);
dprintk("%s: deleting w: %p, thread_num: %d, "
"list: %p [%p.%p].\n", __func__,
w, p->thread_num, &p->ready_list,
p->ready_list.prev, p->ready_list.next);
p->thread_num--;
list_del(&w->worker_entry);
}
mutex_unlock(&p->thread_lock);
}
if (w)
thread_pool_exit_worker(w);
dprintk("%s: deleted w: %p, thread_num: %d.\n",
__func__, w, p->thread_num);
}
/*
 * Remove the worker with the given ID.
 */
void thread_pool_del_worker_id(struct thread_pool *p, unsigned int id)
{
struct thread_pool_worker *w;
int found = 0;
mutex_lock(&p->thread_lock);
list_for_each_entry(w, &p->ready_list, worker_entry) {
if (w->id == id) {
found = 1;
p->thread_num--;
list_del(&w->worker_entry);
break;
}
}
if (!found) {
list_for_each_entry(w, &p->active_list, worker_entry) {
if (w->id == id) {
w->need_exit = 1;
break;
}
}
}
mutex_unlock(&p->thread_lock);
if (found)
thread_pool_exit_worker(w);
}
/*
 * Add a new worker thread with the given parameters.
 * If the initialization callback fails, an error is returned.
 */
int thread_pool_add_worker(struct thread_pool *p,
char *name,
unsigned int id,
void *(*init)(void *private),
void (*cleanup)(void *private),
void *private)
{
struct thread_pool_worker *w;
int err = -ENOMEM;
w = kzalloc(sizeof(struct thread_pool_worker), GFP_KERNEL);
if (!w)
goto err_out_exit;
w->pool = p;
init_waitqueue_head(&w->wait);
w->cleanup = cleanup;
w->id = id;
w->thread = kthread_run(thread_pool_worker_func, w, "%s", name);
if (IS_ERR(w->thread)) {
err = PTR_ERR(w->thread);
goto err_out_free;
}
w->private = init(private);
if (IS_ERR(w->private)) {
err = PTR_ERR(w->private);
goto err_out_stop_thread;
}
mutex_lock(&p->thread_lock);
list_add_tail(&w->worker_entry, &p->ready_list);
p->thread_num++;
mutex_unlock(&p->thread_lock);
return 0;
err_out_stop_thread:
kthread_stop(w->thread);
err_out_free:
kfree(w);
err_out_exit:
return err;
}
/*
* Destroy the whole pool.
*/
void thread_pool_destroy(struct thread_pool *p)
{
while (p->thread_num) {
dprintk("%s: num: %d.\n", __func__, p->thread_num);
thread_pool_del_worker(p);
}
kfree(p);
}
/*
 * Create a pool with the given number of threads.
 * They are assigned sequential IDs starting from zero.
 */
struct thread_pool *thread_pool_create(int num, char *name,
void *(*init)(void *private),
void (*cleanup)(void *private),
void *private)
{
struct thread_pool_worker *w, *tmp;
struct thread_pool *p;
int err = -ENOMEM;
int i;
p = kzalloc(sizeof(struct thread_pool), GFP_KERNEL);
if (!p)
goto err_out_exit;
init_waitqueue_head(&p->wait);
mutex_init(&p->thread_lock);
INIT_LIST_HEAD(&p->ready_list);
INIT_LIST_HEAD(&p->active_list);
p->thread_num = 0;
for (i = 0; i < num; ++i) {
err = thread_pool_add_worker(p, name, i, init,
cleanup, private);
if (err)
goto err_out_free_all;
}
return p;
err_out_free_all:
list_for_each_entry_safe(w, tmp, &p->ready_list, worker_entry) {
list_del(&w->worker_entry);
thread_pool_exit_worker(w);
}
kfree(p);
err_out_exit:
return ERR_PTR(err);
}
/*
 * Schedule execution of the action on a given thread. If an ID
 * pointer is provided, it has to match the worker's previously
 * stored private data.
 */
int thread_pool_schedule_private(struct thread_pool *p,
int (*setup)(void *private, void *data),
int (*action)(void *private, void *data),
void *data, long timeout, void *id)
{
struct thread_pool_worker *w, *tmp, *worker = NULL;
int err = 0;
while (!worker && !err) {
timeout = wait_event_interruptible_timeout(p->wait,
!list_empty(&p->ready_list),
timeout);
if (!timeout) {
err = -ETIMEDOUT;
break;
}
worker = NULL;
mutex_lock(&p->thread_lock);
list_for_each_entry_safe(w, tmp, &p->ready_list, worker_entry) {
if (id && id != w->private)
continue;
worker = w;
list_move_tail(&w->worker_entry, &p->active_list);
err = setup(w->private, data);
if (!err) {
w->schedule_data = data;
w->action = action;
w->has_data = 1;
wake_up(&w->wait);
} else {
list_move_tail(&w->worker_entry,
&p->ready_list);
}
break;
}
mutex_unlock(&p->thread_lock);
}
return err;
}
/*
* Schedule execution on arbitrary thread from the pool.
*/
int thread_pool_schedule(struct thread_pool *p,
int (*setup)(void *private, void *data),
int (*action)(void *private, void *data),
void *data, long timeout)
{
return thread_pool_schedule_private(p, setup,
action, data, timeout, NULL);
}
/*
* 2007+ Copyright (c) Evgeniy Polyakov <zbr@ioremap.net>
* All rights reserved.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*/
#include <linux/bio.h>
#include <linux/dst.h>
#include <linux/slab.h>
#include <linux/mempool.h>
/*
* Transaction memory pool size.
*/
static int dst_mempool_num = 32;
module_param(dst_mempool_num, int, 0644);
/*
* Transaction tree management.
*/
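/*
 * rb-tree comparator: returns 1 when the stored generation is older
 * (smaller) than the one being looked up, -1 when it is newer, and 0
 * on an exact match; the tree is thus ordered by generation number.
 */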
static inline int dst_trans_cmp(dst_gen_t gen, dst_gen_t new)
{
if (gen < new)
return 1;
if (gen > new)
return -1;
return 0;
}
struct dst_trans *dst_trans_search(struct dst_node *node, dst_gen_t gen)
{
struct rb_root *root = &node->trans_root;
struct rb_node *n = root->rb_node;
struct dst_trans *t, *ret = NULL;
int cmp;
while (n) {
t = rb_entry(n, struct dst_trans, trans_entry);
cmp = dst_trans_cmp(t->gen, gen);
if (cmp < 0)
n = n->rb_left;
else if (cmp > 0)
n = n->rb_right;
else {
ret = t;
break;
}
}
dprintk("%s: %s transaction: id: %llu.\n", __func__,
(ret) ? "found" : "not found", gen);
return ret;
}
static int dst_trans_insert(struct dst_trans *new)
{
struct rb_root *root = &new->n->trans_root;
struct rb_node **n = &root->rb_node, *parent = NULL;
struct dst_trans *ret = NULL, *t;
int cmp;
while (*n) {
parent = *n;
t = rb_entry(parent, struct dst_trans, trans_entry);
cmp = dst_trans_cmp(t->gen, new->gen);
if (cmp < 0)
n = &parent->rb_left;
else if (cmp > 0)
n = &parent->rb_right;
else {
ret = t;
break;
}
}
new->send_time = jiffies;
if (ret) {
printk(KERN_DEBUG "%s: exist: old: gen: %llu, bio: %llu/%u, "
"send_time: %lu, new: gen: %llu, bio: %llu/%u, "
"send_time: %lu.\n", __func__,
ret->gen, (u64)ret->bio->bi_sector,
ret->bio->bi_size, ret->send_time,
new->gen, (u64)new->bio->bi_sector,
new->bio->bi_size, new->send_time);
return -EEXIST;
}
rb_link_node(&new->trans_entry, parent, n);
rb_insert_color(&new->trans_entry, root);
dprintk("%s: inserted: gen: %llu, bio: %llu/%u, send_time: %lu.\n",
__func__, new->gen, (u64)new->bio->bi_sector,
new->bio->bi_size, new->send_time);
return 0;
}
int dst_trans_remove_nolock(struct dst_trans *t)
{
struct dst_node *n = t->n;
if (t->trans_entry.rb_parent_color) {
rb_erase(&t->trans_entry, &n->trans_root);
t->trans_entry.rb_parent_color = 0;
}
return 0;
}
int dst_trans_remove(struct dst_trans *t)
{
int ret;
struct dst_node *n = t->n;
mutex_lock(&n->trans_lock);
ret = dst_trans_remove_nolock(t);
mutex_unlock(&n->trans_lock);
return ret;
}
/*
 * When the transaction is completed and there are no more users,
 * we complete the appropriate block IO request with the given error status.
 */
void dst_trans_put(struct dst_trans *t)
{
if (atomic_dec_and_test(&t->refcnt)) {
struct bio *bio = t->bio;
dprintk("%s: completed t: %p, gen: %llu, bio: %p.\n",
__func__, t, t->gen, bio);
bio_endio(bio, t->error);
bio_put(bio);
dst_node_put(t->n);
mempool_free(t, t->n->trans_pool);
}
}
/*
 * Process the given block IO request: allocate a transaction, insert it
 * into the tree, and either send it or schedule crypto processing.
 */
int dst_process_bio(struct dst_node *n, struct bio *bio)
{
struct dst_trans *t;
int err = -ENOMEM;
t = mempool_alloc(n->trans_pool, GFP_NOFS);
if (!t)
goto err_out_exit;
t->n = dst_node_get(n);
t->bio = bio;
t->error = 0;
t->retries = 0;
atomic_set(&t->refcnt, 1);
t->gen = atomic_long_inc_return(&n->gen);
t->enc = bio_data_dir(bio);
dst_bio_to_cmd(bio, &t->cmd, DST_IO, t->gen);
mutex_lock(&n->trans_lock);
err = dst_trans_insert(t);
mutex_unlock(&n->trans_lock);
if (err)
goto err_out_free;
dprintk("%s: gen: %llu, bio: %llu/%u, dir/enc: %d, need_crypto: %d.\n",
__func__, t->gen, (u64)bio->bi_sector,
bio->bi_size, t->enc, dst_need_crypto(n));
if (dst_need_crypto(n) && t->enc)
dst_trans_crypto(t);
else
dst_trans_send(t);
return 0;
err_out_free:
dst_node_put(n);
mempool_free(t, n->trans_pool);
err_out_exit:
bio_endio(bio, err);
bio_put(bio);
return err;
}
/*
 * Scan for timed-out/stale transactions.
 * Each transaction is resent multiple times before being completed with an error.
 */
static void dst_trans_scan(struct work_struct *work)
{
struct dst_node *n = container_of(work, struct dst_node,
trans_work.work);
struct rb_node *rb_node;
struct dst_trans *t;
unsigned long timeout = n->trans_scan_timeout;
int num = 10 * n->trans_max_retries;
mutex_lock(&n->trans_lock);
for (rb_node = rb_first(&n->trans_root); rb_node; ) {
t = rb_entry(rb_node, struct dst_trans, trans_entry);
if (timeout && time_after(t->send_time + timeout, jiffies)
&& t->retries == 0)
break;
#if 0
dprintk("%s: t: %p, gen: %llu, n: %s, retries: %u, max: %u.\n",
__func__, t, t->gen, n->name,
t->retries, n->trans_max_retries);
#endif
if (--num == 0)
break;
dst_trans_get(t);
rb_node = rb_next(rb_node);
if (timeout && (++t->retries < n->trans_max_retries)) {
dst_trans_send(t);
} else {
t->error = -ETIMEDOUT;
dst_trans_remove_nolock(t);
dst_trans_put(t);
}
dst_trans_put(t);
}
mutex_unlock(&n->trans_lock);
/*
 * If no timeout is specified, the system is in the middle of exiting,
 * so there is no need to reschedule the scanning work.
 */
if (timeout) {
if (!num)
timeout = HZ;
schedule_delayed_work(&n->trans_work, timeout);
}
}
/*
* Flush all transactions and mark them as timed out.
* Destroy transaction pools.
*/
void dst_node_trans_exit(struct dst_node *n)
{
struct dst_trans *t;
struct rb_node *rb_node;
if (!n->trans_cache)
return;
dprintk("%s: n: %p, cancelling the work.\n", __func__, n);
cancel_delayed_work_sync(&n->trans_work);
flush_scheduled_work();
dprintk("%s: n: %p, work has been cancelled.\n", __func__, n);
for (rb_node = rb_first(&n->trans_root); rb_node; ) {
t = rb_entry(rb_node, struct dst_trans, trans_entry);
dprintk("%s: t: %p, gen: %llu, n: %s.\n",
__func__, t, t->gen, n->name);
rb_node = rb_next(rb_node);
t->error = -ETIMEDOUT;
dst_trans_remove_nolock(t);
dst_trans_put(t);
}
mempool_destroy(n->trans_pool);
kmem_cache_destroy(n->trans_cache);
}
/*
 * Initialize transaction storage for the given node.
 * A transaction stores not only control information, but also the
 * network command and crypto data (if needed), to reduce the number
 * of allocations. Thus the transaction size differs from node to node.
 */
int dst_node_trans_init(struct dst_node *n, unsigned int size)
{
/*
 * We need this since a node with a given name can be dropped from the
 * hash table yet still be alive, so a subsequent creation of a node
 * with the same name may collide with the existing cache name.
 */
snprintf(n->cache_name, sizeof(n->cache_name), "%s-%p", n->name, n);
n->trans_cache = kmem_cache_create(n->cache_name,
size + n->crypto.crypto_attached_size,
0, 0, NULL);
if (!n->trans_cache)
goto err_out_exit;
n->trans_pool = mempool_create_slab_pool(dst_mempool_num,
n->trans_cache);
if (!n->trans_pool)
goto err_out_cache_destroy;
mutex_init(&n->trans_lock);
n->trans_root = RB_ROOT;
INIT_DELAYED_WORK(&n->trans_work, dst_trans_scan);
schedule_delayed_work(&n->trans_work, n->trans_scan_timeout);
dprintk("%s: n: %p, size: %u, crypto: %u.\n",
__func__, n, size, n->crypto.crypto_attached_size);
return 0;
err_out_cache_destroy:
kmem_cache_destroy(n->trans_cache);
err_out_exit:
return -ENOMEM;
}
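To tie the pieces together, here is a rough sketch of how a node could initialize its transaction storage and feed block IO into it. The demo_* wrappers and the chosen timeout/retry values are invented for illustration; only the dst_* calls come from the code above:

static int demo_node_setup(struct dst_node *n)
{
	/* Rescan and resend stale transactions every 5 seconds, 10 tries max. */
	n->trans_scan_timeout = 5 * HZ;
	n->trans_max_retries = 10;

	/*
	 * The size argument covers the control structure plus whatever the
	 * network command needs; crypto_attached_size is added inside
	 * dst_node_trans_init() itself.
	 */
	return dst_node_trans_init(n, sizeof(struct dst_trans));
}

/* Hypothetical block IO entry point: every bio becomes a transaction. */
static int demo_submit(struct dst_node *n, struct bio *bio)
{
	/*
	 * dst_process_bio() allocates a transaction from the mempool,
	 * inserts it into the rb-tree keyed by generation number, and
	 * either sends it directly or routes writes through crypto
	 * processing first. On error it completes the bio itself.
	 */
	return dst_process_bio(n, bio);
}

static void demo_node_teardown(struct dst_node *n)
{
	/* Cancels the scan work and fails all pending transactions. */
	dst_node_trans_exit(n);
}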