• Martin Peschke's avatar
    [SCSI] zfcp: fix cleanup of dismissed error recovery actions · 86e8dfc5
    Martin Peschke authored
    Calling zfcp_erp_strategy_check_action() after zfcp_erp_action_to_running()
    in zfcp_erp_strategy() might cause an unbalanced up() for erp_ready_sem,
    which makes the zfcp recovery fail somewhere along the way:
    
    erp thread processing erp_action:
    |
    |	someone waking up erp thread for erp_action
    |	|
    |	|		someone else dismissing erp_action:
    |	|		|
    V	V		V
    
    	write_lock_irqsave(&adapter->erp_lock, flags);
    	...
    	if (zfcp_erp_action_exists(erp_action) == ZFCP_ERP_ACTION_RUNNING) {
    		zfcp_erp_action_to_ready(erp_action);
    		up(&adapter->erp_ready_sem);	/* first up() for erp_action */
    	}
    	write_unlock_irqrestore(&adapter->erp_lock, flags);
    
    write_lock_irqsave(&adapter->erp_lock, flags);
    ...
    zfcp_erp_action_to_running(erp_action);
    write_unlock_restore(&adapter->erp_lock, flags);
    /* processing erp_action */
    
    			write_lock_irqsave(&adapter->erp_lock, flags);
    			...
    			erp_action->status |= ZFCP_STATUS_ERP_DISMISSED;
    			if (zfcp_erp_action_exists(erp_action) ==
    						ZFCP_ERP_ACTION_RUNNING) {
    				zfcp_erp_action_to_ready(erp_action);
    				up(&adapter->erp_ready_sem);
    				/* second, unbalanced up() for erp_action */
    			}
    			...
    			write_unlock_restore(&adapter->erp_lock, flags);
    
    write_lock_irqsave(&adapter->erp_lock, flags);
    if (erp_action->status & ZFCP_STATUS_ERP_DISMISSED) {
    	zfcp_erp_action_dequeue(erp_action);
    	retval = ZFCP_ERP_DISMISSED;
    }
    ...
    write_unlock_restore(&adapter->erp_lock, flags);
    down(&adapter->erp_ready_sem);
    /* this down() is meant to balance the first up() */
    
    The erp thread must not dismiss an erp_action after moving that action to
    erp_running_head. Instead it should just go through the down() operation,
    which balances the first up(), and run through zfcp_erp_strategy one more
    time for the second up(), which eventually cleans up erp_action. Which
    is similar to the normal processing of an event for erp_action doing
    something asynchronously (e.g. waiting for the completion of an fsf_req).
    
    This only works if we make sure that a dismissed erp_action is passed to
    zfcp_erp_strategy() prior to the other action, which caused actions to be
    dismissed. Therefore the patch implements this rule: running actions go to
    the head of the ready list; new actions go to the tail of the ready list;
    the erp thread picks actions to be processed from the ready list's head.
    Signed-off-by: default avatarMartin Peschke <mp3@de.ibm.com>
    Acked-by: default avatarSwen Schillig <swen@vnet.ibm.com>
    Signed-off-by: default avatarJames Bottomley <James.Bottomley@HansenPartnership.com>
    86e8dfc5
zfcp_erp.c 97 KB