BIO and Request Processing Flow

Background

The article 宋宝华: 文件读写(BIO)波澜壮阔的一生 ("The Eventful Life of File I/O (BIO)") describes BIO and the block layer from a high vantage point in accessible language, and it gives a very clear overall picture of the block layer. Reading it against the actual implementation, however, still leaves many things unclear. The block-layer code is exquisitely written yet hard to follow: it must be both generic and efficient, so many functions can be reached through multiple execution paths and contexts, and the complexity seems to have no upper bound. To implement the XXX feature, the block-layer code has to be modified, which first requires familiarity with its code flow. This article extends the content of Song's article and follows the same principle of simplification: it focuses on code flow rather than theory, traces the execution path of one concrete example, avoids excessive call branches, and skips some error-handling paths.

The code discussed here is from Linux 4.19; the block device is UFS and the filesystem is F2FS. BIOs and Requests are ultimately translated into UFS commands that execute inside the UFS device; when a command finishes, the device notifies the host with an interrupt, and all data is transferred by DMA. The UFS device supports a command queue, i.e. it processes multiple host commands concurrently and asynchronously, raising an interrupt as each command completes, so the completion order is not guaranteed to match the order in which commands were received.

This article divides BIO/Request processing into the following two phases:

  • Submission: from the creation of the BIO/Request until the UFS command is sent to the device

  • Completion (callback): from the completion of the UFS command, through the execution of the Request and BIO callbacks, until their resources are destroyed

Submission Phase

The submission phase covers the three stages that Song's article calls plugging in place (原地蓄势), elevator sorting (电梯排序), and dispatch/execution (分发执行). f2fs_sync_meta_pages serves as the example of submitting several bios back to back; see Figure 1, the BIO/Request submission flow chart.

Example Code


long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
                                long nr_to_write, enum iostat_type io_type)
{
        struct address_space *mapping = META_MAPPING(sbi);
        pgoff_t index = 0, prev = ULONG_MAX;
        struct pagevec pvec;
        long nwritten = 0;
        int nr_pages;
        struct writeback_control wbc = {
                .for_reclaim = 0,
        };
        /* a blk_plug variable on the caller's stack */
        struct blk_plug plug;

        pagevec_init(&pvec);
        /* install &plug into current->plug */
        blk_start_plug(&plug);

        while ((nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
                                PAGECACHE_TAG_DIRTY))) {
                int i;

                for (i = 0; i < nr_pages; i++) {
                        struct page *page = pvec.pages[i];

                        if (prev == ULONG_MAX)
                                prev = page->index - 1;
                        if (nr_to_write != LONG_MAX && page->index != prev + 1) {
                                pagevec_release(&pvec);
                                goto stop;
                        }

                        lock_page(page);

                        if (unlikely(page->mapping != mapping)) {
continue_unlock:
                                unlock_page(page);
                                continue;
                        }
                        if (!PageDirty(page)) {
                                /* someone wrote it for us */
                                goto continue_unlock;
                        }

                        f2fs_wait_on_page_writeback(page, META, true, true);

                        if (!clear_page_dirty_for_io(page))
                                goto continue_unlock;
                        /* allocates a BIO, sets it up, then calls submit_bio */
                        if (__f2fs_write_meta_page(page, &wbc, io_type)) {
                                unlock_page(page);
                                break;
                        }
                        nwritten++;
                        prev = page->index;
                        if (unlikely(nwritten >= nr_to_write))
                                break;
                }
                pagevec_release(&pvec);
                cond_resched();
        }
stop:
        if (nwritten)
                f2fs_submit_merged_write(sbi, type);
        /* clear current->plug and flush any plugged requests */
        blk_finish_plug(&plug);

        return nwritten;
}

Call Stack

@k_stack[ufshcd_queuecommand,
    ufshcd_queuecommand+0
    scsi_request_fn+664
    __blk_run_queue+76
    queue_unplugged+224
    blk_flush_plug_list+524
    blk_queue_bio+632
    generic_make_request+372
    submit_bio+292
    __submit_merged_bio+1192
    f2fs_submit_page_write+544
    f2fs_do_write_meta_page+192
    __f2fs_write_meta_page+292
    f2fs_sync_meta_pages+356
    f2fs_write_checkpoint+1164
    f2fs_sync_fs+300
    f2fs_do_sync_file+1392
    f2fs_sync_file+84
    vfs_fsync_range+104
    do_fsync+68
    __arm64_sys_fsync+32
    el0_svc_common+164
    el0_svc_handler+124
    el0_svc+8
]: 16

Call Flow Chart

[bio_req_send.jpg]

Figure 1: BIO/Request submission flow chart

Completion Phase

In the completion phase, the UFS device notifies the host with an interrupt once it has finished a command; the host then runs the BIO and Request callbacks from the interrupt handler and a softirq. The processing is split between hard-interrupt context and softirq context; see Figure 2, the BIO/Request completion flow chart.

Interrupt Call Stack

The call stack captured inside the ISR:

@k_stack[__blk_complete_request,
    __blk_complete_request+0
    scsi_done+184
    __ufshcd_transfer_req_compl+2800
    ufshcd_transfer_req_compl+112
    ufshcd_intr+1528
    __handle_irq_event_percpu+376
    handle_irq_event_percpu+52
    handle_irq_event+72
    handle_fasteoi_irq+168
    generic_handle_irq+48
    __handle_domain_irq+112
    gic_handle_irq+396

Softirq Call Stack

The call stack captured inside the softirq handler:

@k_stack[blk_finish_request,
    blk_finish_request+0
    scsi_io_completion+144
    scsi_finish_command+200
    scsi_softirq_done+248
    blk_done_softirq+124
    __do_softirq+488

Completion Flow Chart

After ufshcd_send_command has issued the UFS command, the device processes it and raises an interrupt on completion; the interrupt handler (ISR) and the softirq handler then invoke the callbacks of each layer (cmd->scsi_done, req->end_io, bio->bi_end_io).

Because the UFS device processes commands in parallel, completion order is not guaranteed to match arrival order, so interrupts do not necessarily arrive in the order the commands were sent.

[bio_req_callback.jpg]

Figure 2: BIO/Request completion flow chart

Key Functions in the Submission Phase

blk_start_plug

blk_start_plug is optional: it is called only when several BIOs are about to be submitted in a row, to let them accumulate in place, and it must be paired with blk_finish_plug. Between blk_start_plug and blk_finish_plug there are multiple submit_bio calls, and BIOs can be merged during this plugged stage. blk_start_plug simply initializes a few list heads and stores the struct blk_plug pointer into current->plug; this avoids passing the pointer around, since submit_bio and blk_finish_plug can both reach the list directly. A distilled usage sketch follows the kernel source below.


/**
 * blk_start_plug - initialize blk_plug and track it inside the task_struct
 * @plug:        The &struct blk_plug that needs to be initialized
 *
 * Description:
 *   Tracking blk_plug inside the task_struct will help with auto-flushing the
 *   pending I/O should the task end up blocking between blk_start_plug() and
 *   blk_finish_plug(). This is important from a performance perspective, but
 *   also ensures that we don't deadlock. For instance, if the task is blocking
 *   for a memory allocation, memory reclaim could end up wanting to free a
 *   page belonging to that request that is currently residing in our private
 *   plug. By flushing the pending I/O when the process goes to sleep, we avoid
 *   this kind of deadlock.
 */
void blk_start_plug(struct blk_plug *plug)
{
        struct task_struct *tsk = current;

        /*
         * If this is a nested plug, don't actually assign it.
         */
        if (tsk->plug)
                return;

        INIT_LIST_HEAD(&plug->list);
        INIT_LIST_HEAD(&plug->mq_list);
        INIT_LIST_HEAD(&plug->cb_list);
        /*
         * Store ordering should not be needed here, since a potential
         * preempt will imply a full memory barrier
         */
        tsk->plug = plug;
}
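
For reference, the usage pattern distilled from f2fs_sync_meta_pages above looks like the following minimal sketch (not kernel source; write_one_bio is a hypothetical helper that ends in submit_bio):


#include <linux/blkdev.h>

void write_one_bio(struct page *page);  /* hypothetical: builds a bio, calls submit_bio */

static void submit_many_bios(struct page **pages, int nr)
{
        struct blk_plug plug;   /* lives on the stack for the plug's lifetime */
        int i;

        blk_start_plug(&plug);  /* current->plug = &plug */

        for (i = 0; i < nr; i++)
                write_one_bio(pages[i]);        /* each call ends in submit_bio() */

        /* requests accumulated (and merged) on the plug list are flushed here */
        blk_finish_plug(&plug); /* current->plug = NULL */
}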

submit_bio

submit_bio is the top-level interface for submitting a bio. Above the block layer, i.e. in the filesystem layer, there are many scenario-specific wrappers around submit_bio; submit_bio itself mainly calls generic_make_request to turn the bio into a request.


/**
 * submit_bio - submit a bio to the block device layer for I/O
 * @bio: The &struct bio which describes the I/O
 *
 * submit_bio() is very similar in purpose to generic_make_request(), and
 * uses that function to do most of the work. Both are fairly rough
 * interfaces; @bio must be presetup and ready for I/O.
 *
 */
blk_qc_t submit_bio(struct bio *bio)
{
        ...
        return generic_make_request(bio);
}
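
As the kernel comment says, the bio must be fully set up before submit_bio is called. The following minimal sketch shows what that setup typically looks like on 4.19 (an illustrative sketch, not from this call path; my_end_io and write_page_sketch are hypothetical, and error handling is omitted):


#include <linux/bio.h>
#include <linux/blkdev.h>

/* hypothetical completion callback: runs asynchronously, possibly on another CPU */
static void my_end_io(struct bio *bio)
{
        struct page *page = bio_first_page_all(bio);

        end_page_writeback(page);
        bio_put(bio);           /* bio_endio() does not free the bio; we must */
}

/* hypothetical: write one page at the given sector */
static void write_page_sketch(struct block_device *bdev, struct page *page,
                              sector_t sector)
{
        struct bio *bio = bio_alloc(GFP_NOIO, 1);

        bio_set_dev(bio, bdev);                 /* target device */
        bio->bi_iter.bi_sector = sector;        /* device address */
        bio->bi_opf = REQ_OP_WRITE;             /* operation and flags */
        bio->bi_end_io = my_end_io;             /* completion notification */
        bio_add_page(bio, page, PAGE_SIZE, 0);  /* memory buffer */

        submit_bio(bio);        /* asynchronous: returns before the I/O completes */
}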

generic_make_request

This function is hard to follow; for our purposes it matters only that it calls q->make_request_fn, which in this example is blk_queue_bio.


/**
 * generic_make_request - hand a buffer to its device driver for I/O
 * @bio:  The bio describing the location in memory and on the device.
 *
 * generic_make_request() is used to make I/O requests of block
 * devices. It is passed a &struct bio, which describes the I/O that needs
 * to be done.
 *
 * generic_make_request() does not return any status.  The
 * success/failure status of the request, along with notification of
 * completion, is delivered asynchronously through the bio->bi_end_io
 * function described (one day) else where.
 *
 * The caller of generic_make_request must make sure that bi_io_vec
 * are set to describe the memory buffer, and that bi_dev and bi_sector are
 * set to describe the device address, and the
 * bi_end_io and optionally bi_private are set to describe how
 * completion notification should be signaled.
 *
 * generic_make_request and the drivers it calls may use bi_next if this
 * bio happens to be merged with someone else, and may resubmit the bio to
 * a lower device by calling into generic_make_request recursively, which
 * means the bio should NOT be touched after the call to ->make_request_fn.
 */
blk_qc_t generic_make_request(struct bio *bio)
{
    ...
    ret = q->make_request_fn(q, bio);
    ...
}
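
How does make_request_fn come to be blk_queue_bio? For a legacy single-queue device in 4.19 it is installed during queue initialization; an abridged sketch of block/blk-core.c:blk_init_allocated_queue:


int blk_init_allocated_queue(struct request_queue *q)
{
        ...
        /* every bio submitted to this queue is handled by blk_queue_bio */
        blk_queue_make_request(q, blk_queue_bio);
        ...
}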

blk_queue_bio

Before creating a Request, the current BIO is first examined, e.g. to see whether it needs to be split into multiple BIOs, and the block layer then checks whether the addresses the BIO touches allow it to be merged into a Request already waiting in the elevator. If the BIO cannot be merged, a new request is allocated and initialized from the bio. If plugging is enabled, the request is appended to the tail of the plug list; otherwise __blk_run_queue is called to process it. After this function, the bio has become a request, and the request's bio field points to the bio.


static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
{
        ...
        req = get_request(q, bio->bi_opf, bio, 0, GFP_NOIO);
        ...
        blk_init_request_from_bio(req, bio);
        ...
        plug = current->plug;
        if (plug) {
                ...
                list_add_tail(&req->queuelist, &plug->list);
                ...
        } else {
                add_acct_request(q, req, where);
                __blk_run_queue(q);
                ...
        }
        ...
}

blk_finish_plug


void blk_finish_plug(struct blk_plug *plug)
{
        if (plug != current->plug)
                return;
        blk_flush_plug_list(plug, false);

        current->plug = NULL;
}

blk_flush_plug_list


void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
{
        ...
        /* run all the callbacks on blk_plug->cb_list */
        flush_plug_callbacks(plug, from_schedule);
        ...
        while (!list_empty(&list)) {
                ...
                /*
                 * rq is already accounted, so use raw insert
                 */
                /*
                 * Insert the request into the request_queue; depending on
                 * rq->cmd_flags some requests are dispatched immediately,
                 * others wait to be dispatched in batch by queue_unplugged
                 * below.
                 */
                if (op_is_flush(rq->cmd_flags))
                        __elv_add_request(q, rq, ELEVATOR_INSERT_FLUSH);
                else
                        __elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE);

        }

        /*
         * This drops the queue lock
         */
        if (q)
                /* unplug: open the floodgates and dispatch requests in batch */
                queue_unplugged(q, depth, from_schedule);
}
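
The cb_list drained by flush_plug_callbacks is how other subsystems hook the unplug event itself. As an illustration (a hypothetical sketch following the pattern used by md/raid1, not part of this call path), a driver can register such a callback with blk_check_plugged:


#include <linux/blkdev.h>
#include <linux/slab.h>

/* hypothetical per-driver batch state; blk_plug_cb must be the first member */
struct my_plug_cb {
        struct blk_plug_cb cb;
        int queued;
};

/* invoked from flush_plug_callbacks() when the task unplugs or schedules */
static void my_unplug(struct blk_plug_cb *cb, bool from_schedule)
{
        struct my_plug_cb *mcb = container_of(cb, struct my_plug_cb, cb);

        pr_info("flushing %d batched items\n", mcb->queued);
        kfree(cb);      /* the callback owns the allocation */
}

static void my_submit(void)
{
        /* finds an existing callback on current->plug, or allocates one */
        struct blk_plug_cb *cb = blk_check_plugged(my_unplug, NULL,
                                                   sizeof(struct my_plug_cb));

        if (cb) {
                struct my_plug_cb *mcb = container_of(cb, struct my_plug_cb, cb);
                mcb->queued++;  /* batch work; my_unplug() will flush it */
        }
}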

queue_unplugged


/*
 * If 'from_schedule' is true, then postpone the dispatch of requests
 * until a safe kblockd context. We due this to avoid accidental big
 * additional stack usage in driver dispatch, in places where the originally
 * plugger did not intend it.
 */
static void queue_unplugged(struct request_queue *q, unsigned int depth,
                            bool from_schedule)
        __releases(q->queue_lock)
{
        ...
        if (from_schedule)
                blk_run_queue_async(q);
        else
                __blk_run_queue(q);
}

__blk_run_queue


/**
 * __blk_run_queue - run a single device queue
 * @q:        The queue to run
 *
 * Description:
 *    Invoke request handling on this queue, if it has pending work to do.
 *    May be used to restart queueing when a request has completed.
 */
void __blk_run_queue(struct request_queue *q)
{
    ...
    __blk_run_queue_uncond(q);
}

__blk_run_queue_uncond

In this example the function called is scsi_request_fn.


/**
 * __blk_run_queue_uncond - run a queue whether or not it has been stopped
 * @q:        The queue to run
 *
 * Description:
 *    Invoke request handling on a queue if there are any pending requests.
 *    May be used to restart request handling after a request has completed.
 *    This variant runs the queue whether or not the queue has been
 *    stopped. Must be called with the queue lock held and interrupts
 *    disabled. See also @blk_run_queue.
 */
inline void __blk_run_queue_uncond(struct request_queue *q)
{
        ...
        q->request_fn_active++;
        q->request_fn(q); 
        q->request_fn_active--;
}

scsi_request_fn

The top-level SCSI processing flow: it loops, and on each iteration it

  • takes a request, already sorted by the elevator, off the request_queue

  • starts a timer to handle the request's timeout

  • obtains the struct scsi_cmnd instance for the request

  • dispatches the SCSI command


/*
 * Function:    scsi_request_fn()
 *
 * Purpose:     Main strategy routine for SCSI.
 *
 * Arguments:   q       - Pointer to actual queue.
 *
 * Returns:     Nothing
 *
 * Lock status: request queue lock assumed to be held when called.
 *
 * Note: See sd_zbc.c sd_zbc_write_lock_zone() for write order
 * protection for ZBC disks.
 */
static void scsi_request_fn(struct request_queue *q)
        __releases(q->queue_lock)
        __acquires(q->queue_lock)
{
        ...
        for (;;) {
                /*
                 * Take a request, sorted by the elevator, off the queue;
                 * this ends up calling the elevator_dispatch_fn hook.
                 */
                req = blk_peek_request(q);
                ...

                /* start a timer to handle the request's timeout */
                if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
                        blk_start_request(req);

                ...
                /* get the SCSI command embedded in the request */
                cmd = blk_mq_rq_to_pdu(req);

                ...
                /* dispatch the SCSI command */
                cmd->scsi_done = scsi_done;
                rtn = scsi_dispatch_cmd(cmd);
        }
}
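
Where are scsi_request_fn and the other hooks used in this walkthrough installed? For the legacy (non-blk-mq) path in 4.19 they are wired up when the SCSI device's request_queue is allocated. An abridged sketch of drivers/scsi/scsi_lib.c:scsi_old_alloc_queue (trimmed; error handling simplified):


static struct request_queue *scsi_old_alloc_queue(struct scsi_device *sdev)
{
        struct Scsi_Host *shost = sdev->host;
        struct request_queue *q;

        q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE, NULL);
        if (!q)
                return NULL;
        q->cmd_size = sizeof(struct scsi_cmnd) + shost->hostt->cmd_size;
        q->request_fn = scsi_request_fn;        /* run by __blk_run_queue_uncond */
        ...
        /* also installs q->make_request_fn = blk_queue_bio (see above) */
        if (blk_init_allocated_queue(q) < 0) {
                blk_cleanup_queue(q);
                return NULL;
        }

        __scsi_init_queue(shost, q);
        ...
        blk_queue_softirq_done(q, scsi_softirq_done);   /* q->softirq_done_fn */
        blk_queue_rq_timed_out(q, scsi_times_out);      /* q->rq_timed_out_fn */
        blk_queue_rq_timeout(q, SCSI_DEFAULT_TIMEOUT);
        return q;
}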

blk_start_request


/**
 * blk_start_request - start request processing on the driver
 * @req: request to dequeue
 *
 * Description:
 *     Dequeue @req and start timeout timer on it.  This hands off the
 *     request to the driver.
 */
void blk_start_request(struct request *req)
{
        ...
        /* take the request off the request_queue */
        blk_dequeue_request(req);
        ...
        /* start a timer whose handler is blk_rq_timed_out_timer */
        blk_add_timer(req);
}

blk_rq_timed_out_timer

static void blk_rq_timed_out_timer(struct timer_list *t)
{
        struct request_queue *q = from_timer(q, t, timeout);

        /* on timeout, defer to a work item whose handler is blk_timeout_work */
        kblockd_schedule_work(&q->timeout_work);
}

int kblockd_schedule_work(struct work_struct *work)
{
        /* run the work on kblockd_workqueue */
        return queue_work(kblockd_workqueue, work);
}

blk_timeout_work

void blk_timeout_work(struct work_struct *work)
{
        ...
        /* check every pending request on the timeout_list */
        list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list)
                blk_rq_check_expired(rq, &next, &next_set);
        ...
}

blk_rq_check_expired


static void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout,
                          unsigned int *next_set)
{
        ...
        blk_rq_timed_out(rq);
        ...
}

blk_rq_timed_out

static void blk_rq_timed_out(struct request *req)
{
        ...
        if (q->rq_timed_out_fn)
                ret = q->rq_timed_out_fn(req);  /* calls scsi_times_out */
        ...
}

scsi_times_out

Generic timeout handling for SCSI commands.

enum blk_eh_timer_return scsi_times_out(struct request *req)
{
        ...
        if (host->hostt->eh_timed_out)
                rtn = host->hostt->eh_timed_out(scmd);  /* ufshcd_eh_timed_out here */
        ...
}

scsi_dispatch_cmd

Hands the SCSI command down to the low-level driver; this is where the UFS entry point ufshcd_queuecommand gets called, completing the transition from the SCSI layer to UFS.


/**
 * scsi_dispatch_command - Dispatch a command to the low-level driver.
 * @cmd: command block we are dispatching.
 *
 * Return: nonzero return request was rejected and device's queue needs to be
 * plugged.
 */
static int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
{
        ...
        rtn = host->hostt->queuecommand(host, cmd);     /* calls ufshcd_queuecommand */
        ...
}
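
host->hostt->queuecommand reaches ufshcd_queuecommand through the UFS driver's scsi_host_template. An abridged sketch of the relevant fields (based on drivers/scsi/ufs/ufshcd.c; the exact field set varies between kernel trees, and the eh_timed_out entry is inferred from the scsi_times_out path above):


static struct scsi_host_template ufshcd_driver_template = {
        .module         = THIS_MODULE,
        .name           = UFSHCD,
        .queuecommand   = ufshcd_queuecommand,  /* host->hostt->queuecommand */
        .eh_timed_out   = ufshcd_eh_timed_out,  /* host->hostt->eh_timed_out */
        ...
};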

ufshcd_queuecommand

ufshcd_queuecommand is the main entry from SCSI into UFS; after a series of checks it finally calls ufshcd_send_command to send the command to the UFS hardware.


/**
 * ufshcd_queuecommand - main entry point for SCSI requests
 * @host: SCSI host pointer
 * @cmd: command from SCSI Midlayer
 *
 * Returns 0 for success, non-zero in case of failure
 */
static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
{
    ...
    err = ufshcd_comp_scsi_upiu(hba, lrbp);
    ...
    err = ufshcd_map_sg(hba, lrbp);
    ...
    err = ufshcd_send_command(hba, tag);
    ...
}

Key Functions in the Completion Phase

Key Interrupt-Handling Functions

These are the main functions that run in ISR context; only the main flow is shown, ignoring error handling and branches not taken in this example.

ufshcd_intr

The top-level UFS hardware interrupt entry: it reads the interrupt status register, performs some generic interrupt-response work, and then calls ufshcd_sl_intr.


/**
 * ufshcd_intr - Main interrupt service routine
 * @irq: irq number
 * @__hba: pointer to adapter instance
 *
 * Returns IRQ_HANDLED - If interrupt is valid
 *                IRQ_NONE - If invalid interrupt
 */
static irqreturn_t ufshcd_intr(int irq, void *__hba)
{
    ...
    ufshcd_sl_intr(hba, enabled_intr_status);
    ...
}

ufshcd_sl_intr

Depending on the interrupt status bits (UFSHCD_UIC_MASK, UTP_TASK_REQ_COMPL, UTP_TRANSFER_REQ_COMPL), different handlers are called; in this example the completed SCSI command triggers ufshcd_transfer_req_compl.


/**
 * ufshcd_sl_intr - Interrupt service routine
 * @hba: per adapter instance
 * @intr_status: contains interrupts generated by the controller
 *
 * Returns
 *  IRQ_HANDLED - If interrupt is valid
 *  IRQ_NONE    - If invalid interrupt
 */
static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
{
    ...
    if (intr_status & UFSHCD_UIC_MASK)
            retval |= ufshcd_uic_cmd_compl(hba, intr_status);

    if (intr_status & UTP_TASK_REQ_COMPL)
            retval |= ufshcd_tmc_handler(hba);

    if (intr_status & UTP_TRANSFER_REQ_COMPL)
            retval |= ufshcd_transfer_req_compl(hba);
    ...
}

ufshcd_transfer_req_compl

The handler for the UTP_TRANSFER_REQ_COMPL interrupt; it calls __ufshcd_transfer_req_compl.


/**
 * ufshcd_transfer_req_compl - handle SCSI and query command completion
 * @hba: per adapter instance
 *
 * Returns
 *  IRQ_HANDLED - If interrupt is valid
 *  IRQ_NONE    - If invalid interrupt
 */
static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
{
    ...
    __ufshcd_transfer_req_compl(hba, completed_reqs);
}

__ufshcd_transfer_req_compl

Updates some state in the struct ufshcd_lrb *lrbp and then calls scsi_done.


/**
 * __ufshcd_transfer_req_compl - handle SCSI and query command completion
 * @hba: per adapter instance
 * @completed_reqs: requests to complete
 */
static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
                                        unsigned long completed_reqs)
{
    ...
    /* Do not touch lrbp after scsi done */
    cmd->scsi_done(cmd);

}

scsi_done

UFS-layer processing is done and control returns to the SCSI layer, which calls blk_complete_request.


/**
 * scsi_done - Invoke completion on finished SCSI command.
 * @cmd: The SCSI Command for which a low-level device driver (LLDD) gives
 * ownership back to SCSI Core -- i.e. the LLDD has finished with it.
 *
 * Description: This function is the mid-level's (SCSI Core) interrupt routine,
 * which regains ownership of the SCSI command (de facto) from a LLDD, and
 * calls blk_complete_request() for further processing.
 *
 * This function is interrupt context safe.
 */
static void scsi_done(struct scsi_cmnd *cmd)
{
        trace_scsi_dispatch_cmd_done(cmd);
        blk_complete_request(cmd->request);
}

blk_complete_request

Ends all I/O on the request.


/**
 * blk_complete_request - end I/O on a request
 * @req:      the request being processed
 *
 * Description:
 *     Ends all I/O on a request. It does not handle partial completions,
 *     unless the driver actually implements this in its completion callback
 *     through requeueing. The actual completion happens out-of-order,
 *     through a softirq handler. The user must have registered a completion
 *     callback through blk_queue_softirq_done().
 **/
void blk_complete_request(struct request *req)
{
    ...
    if (!blk_mark_rq_complete(req))
            __blk_complete_request(req);
}

__blk_complete_request

Finishes the interrupt-context part of completion: the request is queued on a per-CPU completion list and the BLOCK_SOFTIRQ software interrupt is raised, so the remaining work continues in softirq context.


void __blk_complete_request(struct request *req)
{
        ...
        BUG_ON(!q->softirq_done_fn);
        /* leave hard-irq processing; hand off to the softirq */
        raise_softirq_irqoff(BLOCK_SOFTIRQ);
}
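
The part elided above is what hands the request to the softirq: the request is appended to this CPU's blk_cpu_done list, which blk_done_softirq later drains. A simplified sketch of the core logic (abridged from block/blk-softirq.c, ignoring the completion-CPU/IPI handling):


void __blk_complete_request(struct request *req)
{
        struct list_head *list;
        unsigned long flags;
        ...
        local_irq_save(flags);

        /* queue the request on this CPU's completion list */
        list = this_cpu_ptr(&blk_cpu_done);
        list_add_tail(&req->ipi_list, list);

        /* leave hard-irq processing; continue in softirq context */
        raise_softirq_irqoff(BLOCK_SOFTIRQ);

        local_irq_restore(flags);
}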

Key Softirq-Handling Functions

Block devices interrupt at a very high rate, and spending a long time in the interrupt handler is inappropriate, so Linux defines the BLOCK_SOFTIRQ software interrupt specifically for block devices.

blk_softirq_init

blk_softirq_init registers the softirq; the handler is blk_done_softirq.

static __init int blk_softirq_init(void)
{
        int i;

        for_each_possible_cpu(i)
                INIT_LIST_HEAD(&per_cpu(blk_cpu_done, i));

        open_softirq(BLOCK_SOFTIRQ, blk_done_softirq);
        cpuhp_setup_state_nocalls(CPUHP_BLOCK_SOFTIRQ_DEAD,
                                  "block/softirq:dead", NULL,
                                  blk_softirq_cpu_dead);
        return 0;
}
subsys_initcall(blk_softirq_init);

blk_done_softirq

blk_done_softirq drains this CPU's list of completed requests and invokes each request's request_queue softirq_done_fn callback, which in this example is scsi_softirq_done. Note that softirqs can run on multiple cores at the same time, so callbacks must be written with that in mind.


/*
 * Softirq action handler - move entries to local list and loop over them
 * while passing them to the queue registered handler.
 */
static __latent_entropy void blk_done_softirq(struct softirq_action *h)
{
        struct list_head *cpu_list, local_list;

        local_irq_disable();
        cpu_list = this_cpu_ptr(&blk_cpu_done);
        list_replace_init(cpu_list, &local_list);
        local_irq_enable();

        while (!list_empty(&local_list)) {
                struct request *rq;

                rq = list_entry(local_list.next, struct request, ipi_list);
                list_del_init(&rq->ipi_list);
                rq->q->softirq_done_fn(rq);     /* scsi_softirq_done in this example */
        }
}

scsi_softirq_done

This is the function q->softirq_done_fn points to: the entry point of the SCSI softirq, which processes the result of the SCSI command. If the command succeeded, it calls scsi_finish_command.


static void scsi_softirq_done(struct request *rq)
{
    ...
        switch (disposition) {
                case SUCCESS:
                        scsi_finish_command(cmd);
                        break;
                case NEEDS_RETRY:
                        scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY);
                        break;
                case ADD_TO_MLQUEUE:
                        scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
                        break;
                default:
                        scsi_eh_scmd_add(cmd);
                        break;
        }
}

scsi_finish_command

Handles a lot of status bookkeeping and then calls scsi_io_completion.


/**
 * scsi_finish_command - cleanup and pass command back to upper layer
 * @cmd: the command
 *
 * Description: Pass command off to upper layer for finishing of I/O
 *              request, waking processes that are waiting on results,
 *              etc.
 */
void scsi_finish_command(struct scsi_cmnd *cmd)
{
        ...
        scsi_io_completion(cmd, good_bytes);
}

scsi_io_completion

Calls scsi_end_request; there are several call sites, and only the normal success branch is shown here.


/*
 * Function:    scsi_io_completion()
 *
 * Purpose:     Completion processing for block device I/O requests.
 *
 * Arguments:   cmd   - command that is finished.
 *
 * Lock status: Assumed that no lock is held upon entry.
 *
 * Returns:     Nothing
 *
 * Notes:       We will finish off the specified number of sectors.  If we
 *                are done, the command block will be released and the queue
 *                function will be goosed.  If we are not done then we have to
 *                figure out what to do next:
 *
 *                a) We can call scsi_requeue_command().  The request
 *                   will be unprepared and put back on the queue.  Then
 *                   a new command will be created for it.  This should
 *                   be used if we made forward progress, or if we want
 *                   to switch from READ(10) to READ(6) for example.
 *
 *                b) We can call __scsi_queue_insert().  The request will
 *                   be put back on the queue and retried using the same
 *                   command as before, possibly after a delay.
 *
 *                c) We can call scsi_end_request() with blk_stat other than
 *                   BLK_STS_OK, to fail the remainder of the request.
 */
void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
{
        ...
        /*
         * Next deal with any sectors which we were able to correctly
         * handle. Failed, zero length commands always need to drop down
         * to retry code. Fast path should return in this block.
         */
        if (likely(blk_rq_bytes(req) > 0 || blk_stat == BLK_STS_OK)) {
                /* the common case takes this branch */
                if (likely(!scsi_end_request(req, blk_stat, good_bytes, 0)))
                        return; /* no bytes remaining */
        }
        ...
}

scsi_end_request

scsi_end_request first calls blk_update_request, which completes the bios; once blk_update_request has run, request->bio is already NULL, and blk_finish_request is then called to finish the request.


/* Returns false when no more bytes to process, true if there are more */
static bool scsi_end_request(struct request *req, blk_status_t error,
                unsigned int bytes, unsigned int bidi_bytes)
{
        ...
        if (blk_update_request(req, error, bytes))
                return true;
        ...
                spin_lock_irqsave(q->queue_lock, flags);
                blk_finish_request(req, error);
                spin_unlock_irqrestore(q->queue_lock, flags);
        ...
}

blk_update_request

Updates the request's state; this is where the BIO callback bi_end_io is reached for every BIO attached to the request, and when this function returns, the request's bio list is NULL.


/**
 * blk_update_request - Special helper function for request stacking drivers
 * @req:      the request being processed
 * @error:    block status code
 * @nr_bytes: number of bytes to complete @req
 *
 * Description:
 *     Ends I/O on a number of bytes attached to @req, but doesn't complete
 *     the request structure even if @req doesn't have leftover.
 *     If @req has leftover, sets it up for the next range of segments.
 *
 *     This special helper function is only for request stacking drivers
 *     (e.g. request-based dm) so that they can handle partial completion.
 *     Actual device drivers should use blk_end_request instead.
 *
 *     Passing the result of blk_rq_bytes() as @nr_bytes guarantees
 *     %false return from this function.
 *
 * Note:
 *        The RQF_SPECIAL_PAYLOAD flag is ignored on purpose in both
 *        blk_rq_bytes() and in blk_update_request().
 *
 * Return:
 *     %false - this request doesn't have any more data
 *     %true  - this request has more data
 **/
bool blk_update_request(struct request *req, blk_status_t error,
                unsigned int nr_bytes)
{
        ...
         while (req->bio) {
                struct bio *bio = req->bio;
                unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes);

                if (bio_bytes == bio->bi_iter.bi_size)
                        req->bio = bio->bi_next;

                /* Completion has already been traced */
                bio_clear_flag(bio, BIO_TRACE_COMPLETION);
                req_bio_endio(req, bio, bio_bytes, error);

                total_bytes += bio_bytes;
                nr_bytes -= bio_bytes;

                if (!nr_bytes)
                        break;
        }
        ...
}

req_bio_endio

Updates some BIO state and calls bio_endio to complete the bio.


static void req_bio_endio(struct request *rq, struct bio *bio,
                          unsigned int nbytes, blk_status_t error)
{
        if (error)
                bio->bi_status = error;

        if (unlikely(rq->rq_flags & RQF_QUIET))
                bio_set_flag(bio, BIO_QUIET);

        bio_advance(bio, nbytes);

        /* don't actually finish bio if it's part of flush sequence */
        if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
                bio_endio(bio);
}

bio_endio

Completes the BIO, calling the bi_end_io callback if one was set. In practice a bi_end_io callback is almost always registered, because submit_bio is asynchronous and bi_end_io is the only way the block layer notifies the upper layer; when writing this callback, remember that it may run on multiple cores simultaneously. Because the callback still needs the bio, bio_endio does not free it.


/**
 * bio_endio - end I/O on a bio
 * @bio:        bio
 *
 * Description:
 *   bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
 *   way to end I/O on a bio. No one should call bi_end_io() directly on a
 *   bio unless they own it and thus know that it has an end_io function.
 *
 *   bio_endio() can be called several times on a bio that has been chained
 *   using bio_chain().  The ->bi_end_io() function will only be called the
 *   last time.  At this point the BLK_TA_COMPLETE tracing event will be
 *   generated if BIO_TRACE_COMPLETION is set.
 **/
void bio_endio(struct bio *bio)
{
    ...
        if (bio->bi_end_io)
                bio->bi_end_io(bio);
    ...
}

f2fs_write_end_io

f2fs_write_end_io is the function assigned to bi_end_io in this example. Since bio_endio does not free the bio, f2fs_write_end_io, besides its F2FS-specific work, must end by calling bio_put(bio) to release the bio's resources; with that, the BIO's illustrious life is over.


static void f2fs_write_end_io(struct bio *bio)
{
        ...
        bio_put(bio);
}

blk_finish_request

scsi_end_request calls blk_update_request to complete the BIOs attached to the Request, then calls blk_finish_request to end the request's life cycle. If an end_io callback was registered, it is invoked, and the callback is responsible for calling __blk_put_request to release the request's resources; otherwise __blk_put_request is called here directly. With that, the Request is fully processed and the softirq flow ends.


void blk_finish_request(struct request *req, blk_status_t error)
{
        ...
        if (req->end_io) {
                rq_qos_done(q, req);
                req->end_io(req, error);
        } else {
                if (blk_bidi_rq(req))
                        __blk_put_request(req->next_rq->q, req->next_rq);

                __blk_put_request(q, req);
        }
}

References

宋宝华: 文件读写(BIO)波澜壮阔的一生, Linux阅码场, CSDN blog.
