BIO and Request Processing Flow
Background
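Song Baohua's article "文件读写(BIO)波澜壮阔的一生" (roughly, "The Epic Life of a File Read/Write (BIO)") gives a high-level, accessible description of BIO and the block layer, and it leaves the reader with a clear overall picture of the block layer. Matching that picture against the code, however, still leaves many open questions. The block layer code is elegant but hard to read: it has to be both generic and efficient, so many functions have several execution paths and calling contexts. To implement feature XXX I need to modify block layer code, which means first getting familiar with its code flow. This article extends Song's description and follows his principle of keeping things as simple as possible: it focuses on code flow rather than theory, traces the execution path of one concrete example, skips most alternative call branches, and ignores some error handling paths.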
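The code discussed here is Linux 4.19, the block device is UFS, and the file system is F2FS. BIOs and Requests are eventually converted into UFS commands that execute inside the UFS device. When a command finishes, the device notifies the host with an interrupt, and all data is transferred by DMA. The UFS device supports a command queue: it processes multiple commands from the host asynchronously and concurrently, raises an interrupt to the host when a command completes, and does not guarantee that commands complete in the order they were received.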
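This article splits BIO and Request processing into two stages:
- Submission: from the creation of the BIO and Request to the point where the UFS command is sent to the device.
- Completion: from the point where the UFS command finishes to the point where the Request and BIO callbacks have run and their resources are freed.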
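Submission Path
The submission stage covers the three phases from Song's article: plugging (原地蓄势), elevator sorting (电梯排序), and dispatch (分发执行). f2fs_sync_meta_pages serves as the example of how several bios submitted back to back are processed; see Figure 1, the BIO and Request submission flow.
Example code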
long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
long nr_to_write, enum iostat_type io_type)
{
struct address_space *mapping = META_MAPPING(sbi);
pgoff_t index = 0, prev = ULONG_MAX;
struct pagevec pvec;
long nwritten = 0;
int nr_pages;
struct writeback_control wbc = {
.for_reclaim = 0,
};
/* Declare a blk_plug variable named plug. */
struct blk_plug plug;
pagevec_init(&pvec);
/* Store the address of plug in current->plug. */
blk_start_plug(&plug);
while ((nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
PAGECACHE_TAG_DIRTY))) {
int i;
for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
if (prev == ULONG_MAX)
prev = page->index - 1;
if (nr_to_write != LONG_MAX && page->index != prev + 1) {
pagevec_release(&pvec);
goto stop;
}
lock_page(page);
if (unlikely(page->mapping != mapping)) {
continue_unlock:
unlock_page(page);
continue;
}
if (!PageDirty(page)) {
/* someone wrote it for us */
goto continue_unlock;
}
f2fs_wait_on_page_writeback(page, META, true, true);
if (!clear_page_dirty_for_io(page))
goto continue_unlock;
/* This function allocates a BIO, sets its attributes, and ends up calling submit_bio. */
if (__f2fs_write_meta_page(page, &wbc, io_type)) {
unlock_page(page);
break;
}
nwritten++;
prev = page->index;
if (unlikely(nwritten >= nr_to_write))
break;
}
pagevec_release(&pvec);
cond_resched();
}
stop:
if (nwritten)
f2fs_submit_merged_write(sbi, type);
/* Flush the plug and clear current->plug. */
blk_finish_plug(&plug);
return nwritten;
}
Call stack
@k_stack[ufshcd_queuecommand,
ufshcd_queuecommand+0
scsi_request_fn+664
__blk_run_queue+76
queue_unplugged+224
blk_flush_plug_list+524
blk_queue_bio+632
generic_make_request+372
submit_bio+292
__submit_merged_bio+1192
f2fs_submit_page_write+544
f2fs_do_write_meta_page+192
__f2fs_write_meta_page+292
f2fs_sync_meta_pages+356
f2fs_write_checkpoint+1164
f2fs_sync_fs+300
f2fs_do_sync_file+1392
f2fs_sync_file+84
vfs_fsync_range+104
do_fsync+68
__arm64_sys_fsync+32
el0_svc_common+164
el0_svc_handler+124
el0_svc+8
]: 16
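Call flow diagram
Figure 1: BIO and Request submission flow
Completion Path
On the completion path, the UFS device raises an interrupt to the host once it has finished a command, and the host runs the BIO and Request callbacks from the interrupt handler and a softirq. Processing is therefore split between hard interrupt context and softirq context; see Figure 2, the BIO and Request completion flow.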
Interrupt call stack
This is the call stack observed inside the ISR.
@k_stack[__blk_complete_request,
__blk_complete_request+0
scsi_done+184
__ufshcd_transfer_req_compl+2800
ufshcd_transfer_req_compl+112
ufshcd_intr+1528
__handle_irq_event_percpu+376
handle_irq_event_percpu+52
handle_irq_event+72
handle_fasteoi_irq+168
generic_handle_irq+48
__handle_domain_irq+112
gic_handle_irq+396
Softirq call stack
This is the call stack observed inside the softirq.
@k_stack[blk_finish_request,
blk_finish_request+0
scsi_io_completion+144
scsi_finish_command+200
scsi_softirq_done+248
blk_done_softirq+124
__do_softirq+488
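Completion flow diagram
After ufshcd_send_command has sent the UFS command, the UFS device starts processing it and raises an interrupt when it is done. The interrupt service routine (ISR) and the softirq handler then invoke the callbacks of each layer (cmd->scsi_done, req->end_io, bio->bi_end_io).
Because the UFS device processes commands in parallel, commands may complete in a different order from the one in which they arrived, so interrupts may arrive in a different order from the one in which the commands were sent.
Figure 2: BIO and Request completion flow
Key Functions on the Submission Path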
blk_start_plug
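blk_start_plug is optional. It is called only when several BIOs are submitted back to back and should be plugged, and it must be paired with blk_finish_plug. Between blk_start_plug and blk_finish_plug there are multiple submit_bio calls, and BIOs can be merged during this plugging phase. blk_start_plug simply initializes the plug's lists and stores the struct blk_plug pointer in current->plug. That avoids passing the plug as a parameter: submit_bio and blk_finish_plug can both reach the lists directly.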
/**
* blk_start_plug - initialize blk_plug and track it inside the task_struct
* @plug: The &struct blk_plug that needs to be initialized
*
* Description:
* Tracking blk_plug inside the task_struct will help with auto-flushing the
* pending I/O should the task end up blocking between blk_start_plug() and
* blk_finish_plug(). This is important from a performance perspective, but
* also ensures that we don't deadlock. For instance, if the task is blocking
* for a memory allocation, memory reclaim could end up wanting to free a
* page belonging to that request that is currently residing in our private
* plug. By flushing the pending I/O when the process goes to sleep, we avoid
* this kind of deadlock.
*/
void blk_start_plug(struct blk_plug *plug)
{
struct task_struct *tsk = current;
/*
* If this is a nested plug, don't actually assign it.
*/
if (tsk->plug)
return;
INIT_LIST_HEAD(&plug->list);
INIT_LIST_HEAD(&plug->mq_list);
INIT_LIST_HEAD(&plug->cb_list);
/*
* Store ordering should not be needed here, since a potential
* preempt will imply a full memory barrier
*/
tsk->plug = plug;
}
submit_bio
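submit_bio is the general entry point for submitting a bio. Above the block layer, file systems wrap submit_bio in many different ways depending on the use case. submit_bio mainly calls generic_make_request, which starts the conversion of the bio into a request.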
/**
* submit_bio - submit a bio to the block device layer for I/O
* @bio: The &struct bio which describes the I/O
*
* submit_bio() is very similar in purpose to generic_make_request(), and
* uses that function to do most of the work. Both are fairly rough
* interfaces; @bio must be presetup and ready for I/O.
*
*/
blk_qc_t submit_bio(struct bio *bio)
{
。。。
return generic_make_request(bio);
}
generic_make_request
This function is hard to follow. The only point that matters here is that it goes on to call q->make_request_fn, which in this example is blk_queue_bio.
/**
* generic_make_request - hand a buffer to its device driver for I/O
* @bio: The bio describing the location in memory and on the device.
*
* generic_make_request() is used to make I/O requests of block
* devices. It is passed a &struct bio, which describes the I/O that needs
* to be done.
*
* generic_make_request() does not return any status. The
* success/failure status of the request, along with notification of
* completion, is delivered asynchronously through the bio->bi_end_io
* function described (one day) else where.
*
* The caller of generic_make_request must make sure that bi_io_vec
* are set to describe the memory buffer, and that bi_dev and bi_sector are
* set to describe the device address, and the
* bi_end_io and optionally bi_private are set to describe how
* completion notification should be signaled.
*
* generic_make_request and the drivers it calls may use bi_next if this
* bio happens to be merged with someone else, and may resubmit the bio to
* a lower device by calling into generic_make_request recursively, which
* means the bio should NOT be touched after the call to ->make_request_fn.
*/
blk_qc_t generic_make_request(struct bio *bio)
{
。。。
ret = q->make_request_fn(q, bio);
。。。
}
blk_queue_bio
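Before creating a Request, the function first processes the BIO, for example checking whether it must be split into several BIOs, and then checks whether the BIO's target address allows it to be merged into a Request already queued in the elevator. If the BIO cannot be merged, the function gets a new request and initializes it from the bio. If plugging is active, the request is appended to the tail of the plug list; otherwise __blk_run_queue is called to process the request. After this function, the bio has been converted into a request, and the bio field of struct request points to the bio.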
static blk_qc_t blk_queue_bio(struct request_queue *q, struct bio *bio)
{
。。。
req = get_request(q, bio->bi_opf, bio, 0, GFP_NOIO);
。。。
blk_init_request_from_bio(req, bio);
。。。
plug = current->plug;
if (plug) {
。。。
list_add_tail(&req->queuelist, &plug->list);
。。。
} else {
add_acct_request(q, req, where);
__blk_run_queue(q);
。。。
}
。。。
}
blk_finish_plug
void blk_finish_plug(struct blk_plug *plug)
{
if (plug != current->plug)
return;
blk_flush_plug_list(plug, false);
current->plug = NULL;
}
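The plug mechanism can be easier to see in a small user-space model. The sketch below is not kernel code: toy_plug, toy_queue, toy_current_plug, and all toy_* functions are hypothetical names invented for illustration, and the global toy_current_plug merely stands in for current->plug. It only shows that requests submitted while a plug is active are parked and then dispatched in one batch by the finish call, whereas a submission without a plug runs the queue immediately.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real types. */
#define TOY_MAX_REQS 16

struct toy_plug {
    int pending[TOY_MAX_REQS];    /* request ids parked on the plug */
    size_t count;
};

struct toy_queue {
    int dispatched[TOY_MAX_REQS]; /* ids in the order the driver saw them */
    size_t count;
    unsigned int runs;            /* how many times the queue was run */
};

static struct toy_plug *toy_current_plug; /* models current->plug */

static void toy_start_plug(struct toy_plug *plug)
{
    if (toy_current_plug)         /* nested plug: keep the outer one */
        return;
    plug->count = 0;
    toy_current_plug = plug;
}

/* Hand n request ids to the "driver" in a single queue run. */
static void toy_run_queue(struct toy_queue *q, const int *ids, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(q->count < TOY_MAX_REQS);
        q->dispatched[q->count++] = ids[i];
    }
    q->runs++;
}

static void toy_submit(struct toy_queue *q, int id)
{
    struct toy_plug *plug = toy_current_plug;

    if (plug) {
        assert(plug->count < TOY_MAX_REQS);
        plug->pending[plug->count++] = id; /* park it, no dispatch yet */
    } else {
        toy_run_queue(q, &id, 1);          /* no plug: dispatch at once */
    }
}

static void toy_finish_plug(struct toy_queue *q, struct toy_plug *plug)
{
    if (plug != toy_current_plug) /* only the plug's owner flushes */
        return;
    if (plug->count)
        toy_run_queue(q, plug->pending, plug->count);
    plug->count = 0;
    toy_current_plug = NULL;
}
```

Calling toy_start_plug, three toy_submit calls, and toy_finish_plug results in a single queue run carrying all three ids, which is the batching effect the real plug is after.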
blk_flush_plug_list
void blk_flush_plug_list(struct blk_plug *plug, bool from_schedule)
{
。。。
/* Run every callback on blk_plug->cb_list. */
flush_plug_callbacks(plug, from_schedule);
。。。
while (!list_empty(&list)) {
。。。
/*
* rq is already accounted, so use raw insert
*/
/*
 * Add the Request to the request_queue. Depending on rq->cmd_flags, some
 * requests are executed right away; the rest wait and are run together by
 * queue_unplugged below.
 */
if (op_is_flush(rq->cmd_flags))
__elv_add_request(q, rq, ELEVATOR_INSERT_FLUSH);
else
__elv_add_request(q, rq, ELEVATOR_INSERT_SORT_MERGE);
}
/*
* This drops the queue lock
*/
if (q)
/* Unplug: open the floodgates and process the requests as a batch. */
queue_unplugged(q, depth, from_schedule);
}
queue_unplugged
/*
* If 'from_schedule' is true, then postpone the dispatch of requests
* until a safe kblockd context. We due this to avoid accidental big
* additional stack usage in driver dispatch, in places where the originally
* plugger did not intend it.
*/
static void queue_unplugged(struct request_queue *q, unsigned int depth,
bool from_schedule)
__releases(q->queue_lock)
{
。。。
if (from_schedule)
blk_run_queue_async(q);
else
__blk_run_queue(q);
}
__blk_run_queue
/**
* __blk_run_queue - run a single device queue
* @q: The queue to run
*
* Description:
* Invoke request handling on this queue, if it has pending work to do.
* May be used to restart queueing when a request has completed.
*/
void __blk_run_queue(struct request_queue *q)
{
。。。
__blk_run_queue_uncond(q);
}
__blk_run_queue_uncond
In this example, q->request_fn is scsi_request_fn.
/**
* __blk_run_queue_uncond - run a queue whether or not it has been stopped
* @q: The queue to run
*
* Description:
* Invoke request handling on a queue if there are any pending requests.
* May be used to restart request handling after a request has completed.
* This variant runs the queue whether or not the queue has been
* stopped. Must be called with the queue lock held and interrupts
* disabled. See also @blk_run_queue.
*/
inline void __blk_run_queue_uncond(struct request_queue *q)
{
。。。
q->request_fn_active++;
q->request_fn(q);
q->request_fn_active--;
}
scsi_request_fn
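The overall SCSI layer flow, which loops over the following steps:
- Take a request, already sorted by the elevator, from the request_queue.
- Start a timer that handles the request's timeout.
- Get the struct scsi_cmnd instance that corresponds to the request.
- Process the SCSI command.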
/*
* Function: scsi_request_fn()
*
* Purpose: Main strategy routine for SCSI.
*
* Arguments: q - Pointer to actual queue.
*
* Returns: Nothing
*
* Lock status: request queue lock assumed to be held when called.
*
* Note: See sd_zbc.c sd_zbc_write_lock_zone() for write order
* protection for ZBC disks.
*/
static void scsi_request_fn(struct request_queue *q)
__releases(q->queue_lock)
__acquires(q->queue_lock)
{
。。。
for (;;) {
/* Fetch a request sorted by the elevator; this ends up calling elevator_dispatch_fn. */
req = blk_peek_request(q);
。。。
/* Start a timer that handles the request's timeout. */
if (!(blk_queue_tagged(q) && !blk_queue_start_tag(q, req)))
blk_start_request(req);
。。。
/* Get the SCSI command. */
cmd = blk_mq_rq_to_pdu(req);
。。。
/* Process the SCSI command. */
cmd->scsi_done = scsi_done;
rtn = scsi_dispatch_cmd(cmd);
}
}
blk_start_request
/**
* blk_start_request - start request processing on the driver
* @req: request to dequeue
*
* Description:
* Dequeue @req and start timeout timer on it. This hands off the
* request to the driver.
*/
void blk_start_request(struct request *req)
{
。。。
/* Remove the request from the request_queue. */
blk_dequeue_request(req);
。。。
/* Start a timer whose handler is blk_rq_timed_out_timer. */
blk_add_timer(req);
}
blk_rq_timed_out_timer
static void blk_rq_timed_out_timer(struct timer_list *t)
{
struct request_queue *q = from_timer(q, t, timeout);
/* On timeout, schedule a work item whose handler is blk_timeout_work. */
kblockd_schedule_work(&q->timeout_work);
}
int kblockd_schedule_work(struct work_struct *work)
{
/* Run the work item on kblockd_workqueue. */
return queue_work(kblockd_workqueue, work);
}
blk_timeout_work
void blk_timeout_work(struct work_struct *work)
{
。。。
/* Check every pending request on timeout_list. */
list_for_each_entry_safe(rq, tmp, &q->timeout_list, timeout_list)
blk_rq_check_expired(rq, &next, &next_set);
。。。
}
blk_rq_check_expired
static void blk_rq_check_expired(struct request *rq, unsigned long *next_timeout,
unsigned int *next_set)
{
。。。
blk_rq_timed_out(rq);
。。。
}
blk_rq_timed_out
static void blk_rq_timed_out(struct request *req)
{
。。。
if (q->rq_timed_out_fn)
ret = q->rq_timed_out_fn(req); /* calls scsi_times_out */
。。。
}
scsi_times_out
Generic timeout handling for SCSI commands.
enum blk_eh_timer_return scsi_times_out(struct request *req)
{
。。。
if (host->hostt->eh_timed_out)
rtn = host->hostt->eh_timed_out(scmd); /* calls ufshcd_eh_timed_out here */
。。。
}
scsi_dispatch_cmd
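Passes the SCSI command down to the low-level driver. This is where the UFS function ufshcd_queuecommand gets called, which completes the transition from the SCSI layer to UFS.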
/**
* scsi_dispatch_command - Dispatch a command to the low-level driver.
* @cmd: command block we are dispatching.
*
* Return: nonzero return request was rejected and device's queue needs to be
* plugged.
*/
static int scsi_dispatch_cmd(struct scsi_cmnd *cmd)
{
。。。
rtn = host->hostt->queuecommand(host, cmd); /* calls ufshcd_queuecommand */
。。。
}
ufshcd_queuecommand
ufshcd_queuecommand is the main entry point from SCSI into UFS. After a series of checks it calls ufshcd_send_command to send the command to the UFS hardware.
/**
* ufshcd_queuecommand - main entry point for SCSI requests
* @host: SCSI host pointer
* @cmd: command from SCSI Midlayer
*
* Returns 0 for success, non-zero in case of failure
*/
static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
{
。。。
err = ufshcd_comp_scsi_upiu(hba, lrbp);
。。。
err = ufshcd_map_sg(hba, lrbp);
。。。
err = ufshcd_send_command(hba, tag);
。。。
}
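Key Functions on the Completion Path
Key Functions in Interrupt Handling
This section walks through the main functions that run in ISR context. It follows only the main flow and ignores some error handling and the branches this example does not take.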
ufshcd_intr
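The main entry point for UFS hardware interrupts. It reads the interrupt status register, performs some generic interrupt acknowledgement work, and then calls ufshcd_sl_intr.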
/**
* ufshcd_intr - Main interrupt service routine
* @irq: irq number
* @__hba: pointer to adapter instance
*
* Returns IRQ_HANDLED - If interrupt is valid
* IRQ_NONE - If invalid interrupt
*/
static irqreturn_t ufshcd_intr(int irq, void *__hba)
{
。。。
ufshcd_sl_intr(hba, enabled_intr_status);
。。。
}
ufshcd_sl_intr
Calls a different handler for each interrupt flag (UFSHCD_UIC_MASK, UTP_TASK_REQ_COMPL, UTP_TRANSFER_REQ_COMPL). In this example, a SCSI command triggers ufshcd_transfer_req_compl.
/**
* ufshcd_sl_intr - Interrupt service routine
* @hba: per adapter instance
* @intr_status: contains interrupts generated by the controller
*
* Returns
* IRQ_HANDLED - If interrupt is valid
* IRQ_NONE - If invalid interrupt
*/
static irqreturn_t ufshcd_sl_intr(struct ufs_hba *hba, u32 intr_status)
{
。。。
if (intr_status & UFSHCD_UIC_MASK)
retval |= ufshcd_uic_cmd_compl(hba, intr_status);
if (intr_status & UTP_TASK_REQ_COMPL)
retval |= ufshcd_tmc_handler(hba);
if (intr_status & UTP_TRANSFER_REQ_COMPL)
retval |= ufshcd_transfer_req_compl(hba);
。。。
}
ufshcd_transfer_req_compl
The handler for the UTP_TRANSFER_REQ_COMPL interrupt; it calls __ufshcd_transfer_req_compl.
/**
* ufshcd_transfer_req_compl - handle SCSI and query command completion
* @hba: per adapter instance
*
* Returns
* IRQ_HANDLED - If interrupt is valid
* IRQ_NONE - If invalid interrupt
*/
static irqreturn_t ufshcd_transfer_req_compl(struct ufs_hba *hba)
{
。。。
__ufshcd_transfer_req_compl(hba, completed_reqs);
}
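Since commands can finish in any order, the handler has to work out which command slots completed rather than assume the oldest one did. The excerpt above elides how completed_reqs is computed. As far as I recall, the 4.19 driver XORs the transfer request doorbell register with hba->outstanding_reqs, but treat that as an assumption and check it against your tree. The user-space sketch below models only that bitmap idea; toy_hba and the toy_* functions are hypothetical names, not driver APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per command slot (tag). */
struct toy_hba {
    uint32_t outstanding; /* slots issued by the host, not yet completed */
    uint32_t doorbell;    /* slots the device is still working on */
};

/* Host side: issuing a command sets both bits for its slot. */
static void toy_send(struct toy_hba *hba, unsigned int tag)
{
    hba->outstanding |= UINT32_C(1) << tag;
    hba->doorbell |= UINT32_C(1) << tag;
}

/* Device side: finishing a command clears its doorbell bit. */
static void toy_device_finish(struct toy_hba *hba, unsigned int tag)
{
    hba->doorbell &= ~(UINT32_C(1) << tag);
}

/*
 * "ISR" side: a slot is complete when the host still counts it as
 * outstanding but the device no longer has it in the doorbell.
 * Stores the completed tags in tags[] and returns how many there were.
 */
static unsigned int toy_complete(struct toy_hba *hba, unsigned int *tags,
                                 unsigned int max)
{
    uint32_t completed = hba->doorbell ^ hba->outstanding;
    unsigned int n = 0;

    for (unsigned int tag = 0; tag < 32 && n < max; tag++) {
        if (completed & (UINT32_C(1) << tag)) {
            tags[n++] = tag;
            hba->outstanding &= ~(UINT32_C(1) << tag);
        }
    }
    return n;
}
```

With slots 0, 1, and 2 issued and the device finishing 2 and then 0, one call to toy_complete reports slots 0 and 2 together and leaves slot 1 outstanding. The order in which the device finished them is not visible to the host, only the set.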
__ufshcd_transfer_req_compl
Updates some state in struct ufshcd_lrb *lrbp and then calls scsi_done.
/**
* __ufshcd_transfer_req_compl - handle SCSI and query command completion
* @hba: per adapter instance
* @completed_reqs: requests to complete
*/
static void __ufshcd_transfer_req_compl(struct ufs_hba *hba,
unsigned long completed_reqs)
{
。。。
/* Do not touch lrbp after scsi done */
cmd->scsi_done(cmd);
}
scsi_done
UFS-level processing is finished and control returns to the SCSI layer, which calls blk_complete_request.
/**
* scsi_done - Invoke completion on finished SCSI command.
* @cmd: The SCSI Command for which a low-level device driver (LLDD) gives
* ownership back to SCSI Core -- i.e. the LLDD has finished with it.
*
* Description: This function is the mid-level's (SCSI Core) interrupt routine,
* which regains ownership of the SCSI command (de facto) from a LLDD, and
* calls blk_complete_request() for further processing.
*
* This function is interrupt context safe.
*/
static void scsi_done(struct scsi_cmnd *cmd)
{
trace_scsi_dispatch_cmd_done(cmd);
blk_complete_request(cmd->request);
}
blk_complete_request
Ends all I/O on the request.
/**
* blk_complete_request - end I/O on a request
* @req: the request being processed
*
* Description:
* Ends all I/O on a request. It does not handle partial completions,
* unless the driver actually implements this in its completion callback
* through requeueing. The actual completion happens out-of-order,
* through a softirq handler. The user must have registered a completion
* callback through blk_queue_softirq_done().
**/
void blk_complete_request(struct request *req)
{
。。。
if (!blk_mark_rq_complete(req))
__blk_complete_request(req);
}
__blk_complete_request
Finishes the work done in interrupt context and raises the BLOCK_SOFTIRQ softirq, so that processing continues in softirq context.
void __blk_complete_request(struct request *req)
{
。。。
BUG_ON(!q->softirq_done_fn);
raise_softirq_irqoff(BLOCK_SOFTIRQ); /* raise the softirq; it runs once interrupt handling exits */
}
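Key Functions in Softirq Handling
Block devices generate interrupts at a very high rate, and spending a long time in the interrupt handler is not acceptable, so Linux defines a dedicated softirq for block devices, BLOCK_SOFTIRQ.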
blk_softirq_init
blk_softirq_init registers the softirq; its handler is blk_done_softirq.
static __init int blk_softirq_init(void)
{
int i;
for_each_possible_cpu(i)
INIT_LIST_HEAD(&per_cpu(blk_cpu_done, i));
open_softirq(BLOCK_SOFTIRQ, blk_done_softirq);
cpuhp_setup_state_nocalls(CPUHP_BLOCK_SOFTIRQ_DEAD,
"block/softirq:dead", NULL,
blk_softirq_cpu_dead);
return 0;
}
subsys_initcall(blk_softirq_init);
blk_done_softirq
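blk_done_softirq runs, for every request on the current CPU's list, the softirq_done_fn callback of that request's request_queue. In this example the callback is scsi_softirq_done. Note that softirqs can run on several CPUs at the same time, so callbacks added here must be written with that in mind.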
/*
* Softirq action handler - move entries to local list and loop over them
* while passing them to the queue registered handler.
*/
static __latent_entropy void blk_done_softirq(struct softirq_action *h)
{
struct list_head *cpu_list, local_list;
local_irq_disable();
cpu_list = this_cpu_ptr(&blk_cpu_done);
list_replace_init(cpu_list, &local_list);
local_irq_enable();
while (!list_empty(&local_list)) {
struct request *rq;
rq = list_entry(local_list.next, struct request, ipi_list);
list_del_init(&rq->ipi_list);
rq->q->softirq_done_fn(rq); /* calls scsi_softirq_done in this example */
}
}
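The "move entries to a local list, then loop" pattern in blk_done_softirq keeps the window in which interrupts are disabled very short. A minimal user-space sketch of the same idea follows. It uses a hand-rolled singly linked list instead of the kernel's list_head, models a single CPU, and cannot really disable interrupts, so toy_rq, toy_isr_complete, toy_done_softirq, and toy_record are all hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rq {
    int id;
    struct toy_rq *next;
};

/* Models the per-CPU blk_cpu_done list; there is a single "CPU" here. */
static struct toy_rq *toy_done_head;
static struct toy_rq **toy_done_tail = &toy_done_head;

/* "ISR" side: append a completed request to the shared list. */
static void toy_isr_complete(struct toy_rq *rq)
{
    rq->next = NULL;
    *toy_done_tail = rq;
    toy_done_tail = &rq->next;
}

/* Sample completion callback: records the order in which ids were seen. */
static int toy_sum;
static void toy_record(struct toy_rq *rq)
{
    toy_sum = toy_sum * 10 + rq->id;
}

/* "Softirq" side: detach the whole list first, then walk the private copy. */
static size_t toy_done_softirq(void (*done_fn)(struct toy_rq *))
{
    /* In the kernel, this detach step runs with local interrupts disabled. */
    struct toy_rq *local = toy_done_head;
    size_t handled = 0;

    toy_done_head = NULL;
    toy_done_tail = &toy_done_head;

    while (local) {
        struct toy_rq *rq = local;

        local = rq->next;
        rq->next = NULL;
        done_fn(rq); /* new completions go to the shared list, not 'local' */
        handled++;
    }
    return handled;
}
```

Because the handler only walks its private copy, completions queued while it is running are picked up by the next softirq run instead of being processed mid-walk.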
scsi_softirq_done
The function q->softirq_done_fn points to. It is the SCSI softirq entry point and handles the result of the SCSI command. If the command succeeded, it calls scsi_finish_command.
static void scsi_softirq_done(struct request *rq)
{
。。。
switch (disposition) {
case SUCCESS:
scsi_finish_command(cmd);
break;
case NEEDS_RETRY:
scsi_queue_insert(cmd, SCSI_MLQUEUE_EH_RETRY);
break;
case ADD_TO_MLQUEUE:
scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
break;
default:
scsi_eh_scmd_add(cmd);
break;
}
}
scsi_finish_command
Handles a good deal of state and control information and then calls scsi_io_completion.
/**
* scsi_finish_command - cleanup and pass command back to upper layer
* @cmd: the command
*
* Description: Pass command off to upper layer for finishing of I/O
* request, waking processes that are waiting on results,
* etc.
*/
void scsi_finish_command(struct scsi_cmnd *cmd)
{
。。。
scsi_io_completion(cmd, good_bytes);
}
scsi_io_completion
Calls scsi_end_request. There are several call sites; only the normal success branch is shown.
/*
* Function: scsi_io_completion()
*
* Purpose: Completion processing for block device I/O requests.
*
* Arguments: cmd - command that is finished.
*
* Lock status: Assumed that no lock is held upon entry.
*
* Returns: Nothing
*
* Notes: We will finish off the specified number of sectors. If we
* are done, the command block will be released and the queue
* function will be goosed. If we are not done then we have to
* figure out what to do next:
*
* a) We can call scsi_requeue_command(). The request
* will be unprepared and put back on the queue. Then
* a new command will be created for it. This should
* be used if we made forward progress, or if we want
* to switch from READ(10) to READ(6) for example.
*
* b) We can call __scsi_queue_insert(). The request will
* be put back on the queue and retried using the same
* command as before, possibly after a delay.
*
* c) We can call scsi_end_request() with blk_stat other than
* BLK_STS_OK, to fail the remainder of the request.
*/
void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
{
。。。
/*
* Next deal with any sectors which we were able to correctly
* handle. Failed, zero length commands always need to drop down
* to retry code. Fast path should return in this block.
*/
if (likely(blk_rq_bytes(req) > 0 || blk_stat == BLK_STS_OK)) {
if (likely(!scsi_end_request(req, blk_stat, good_bytes, 0))) /* most requests take this branch */
return; /* no bytes remaining */
}
。。。
}
scsi_end_request
scsi_end_request first calls blk_update_request, which ends the bios. Once blk_update_request has completed the whole request, request->bio is NULL, and scsi_end_request then calls blk_finish_request to end the request.
/* Returns false when no more bytes to process, true if there are more */
static bool scsi_end_request(struct request *req, blk_status_t error,
unsigned int bytes, unsigned int bidi_bytes)
{
。。。
if (blk_update_request(req, error, bytes))
return true;
。。。
spin_lock_irqsave(q->queue_lock, flags);
blk_finish_request(req, error);
spin_unlock_irqrestore(q->queue_lock, flags);
。。。
}
blk_update_request
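Updates the state of the request. It invokes the BIO callback bi_end_io and processes every BIO attached to the request; when the request has been completed in full, the request's bio list is NULL on return.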
/**
* blk_update_request - Special helper function for request stacking drivers
* @req: the request being processed
* @error: block status code
* @nr_bytes: number of bytes to complete @req
*
* Description:
* Ends I/O on a number of bytes attached to @req, but doesn't complete
* the request structure even if @req doesn't have leftover.
* If @req has leftover, sets it up for the next range of segments.
*
* This special helper function is only for request stacking drivers
* (e.g. request-based dm) so that they can handle partial completion.
* Actual device drivers should use blk_end_request instead.
*
* Passing the result of blk_rq_bytes() as @nr_bytes guarantees
* %false return from this function.
*
* Note:
* The RQF_SPECIAL_PAYLOAD flag is ignored on purpose in both
* blk_rq_bytes() and in blk_update_request().
*
* Return:
* %false - this request doesn't have any more data
* %true - this request has more data
**/
bool blk_update_request(struct request *req, blk_status_t error,
unsigned int nr_bytes)
{
。。。
while (req->bio) {
struct bio *bio = req->bio;
unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes);
if (bio_bytes == bio->bi_iter.bi_size)
req->bio = bio->bi_next;
/* Completion has already been traced */
bio_clear_flag(bio, BIO_TRACE_COMPLETION);
req_bio_endio(req, bio, bio_bytes, error);
total_bytes += bio_bytes;
nr_bytes -= bio_bytes;
if (!nr_bytes)
break;
}
。。。
}
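The loop above is compact, so a reduced model may help show what it does with a partial completion. The sketch below keeps only the byte accounting: toy_bio and toy_req are hypothetical structures, the remaining field stands in for bio->bi_iter.bi_size, and setting ended stands in for the bio_endio call made from req_bio_endio. Error status, flush sequences, and tracing are left out.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bio {
    unsigned int remaining; /* models bio->bi_iter.bi_size */
    int ended;              /* set where the kernel would call bio_endio() */
    struct toy_bio *next;   /* models bio->bi_next */
};

struct toy_req {
    struct toy_bio *bio;    /* head of the request's bio chain */
};

/* Returns 1 if the request still has data left, 0 if it is fully done. */
static int toy_update_request(struct toy_req *req, unsigned int nr_bytes)
{
    while (req->bio) {
        struct toy_bio *bio = req->bio;
        unsigned int bio_bytes =
            bio->remaining < nr_bytes ? bio->remaining : nr_bytes;

        if (bio_bytes == bio->remaining)
            req->bio = bio->next;    /* whole bio consumed: unlink it */

        bio->remaining -= bio_bytes; /* models bio_advance() */
        if (bio->remaining == 0)
            bio->ended = 1;          /* models bio_endio() */

        nr_bytes -= bio_bytes;
        if (!nr_bytes)
            break;
    }
    return req->bio != NULL;
}
```

For a request with two 4096-byte bios, completing 6144 bytes ends the first bio, leaves 2048 bytes on the second, and returns 1. Completing the remaining 2048 bytes ends the second bio, empties the chain, and returns 0, which corresponds to the case where scsi_end_request goes on to blk_finish_request.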
req_bio_endio
Updates some state of the BIO and calls bio_endio to end it.
static void req_bio_endio(struct request *rq, struct bio *bio,
unsigned int nbytes, blk_status_t error)
{
if (error)
bio->bi_status = error;
if (unlikely(rq->rq_flags & RQF_QUIET))
bio_set_flag(bio, BIO_QUIET);
bio_advance(bio, nbytes);
/* don't actually finish bio if it's part of flush sequence */
if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
bio_endio(bio);
}
bio_endio
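Ends the BIO and calls the bi_end_io callback if one is set. A bi_end_io callback is registered in most cases, because submit_bio is an asynchronous interface and bi_end_io is the only way for the block layer to notify the upper layer. When writing this callback, keep in mind that it can run on several CPUs at the same time. Because the callback still has to be invoked with the bio, bio_endio itself does not free the bio.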
/**
* bio_endio - end I/O on a bio
* @bio: bio
*
* Description:
* bio_endio() will end I/O on the whole bio. bio_endio() is the preferred
* way to end I/O on a bio. No one should call bi_end_io() directly on a
* bio unless they own it and thus know that it has an end_io function.
*
* bio_endio() can be called several times on a bio that has been chained
* using bio_chain(). The ->bi_end_io() function will only be called the
* last time. At this point the BLK_TA_COMPLETE tracing event will be
* generated if BIO_TRACE_COMPLETION is set.
**/
void bio_endio(struct bio *bio)
{
。。。
if (bio->bi_end_io)
bio->bi_end_io(bio);
。。。
}
f2fs_write_end_io
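f2fs_write_end_io is the function assigned to bi_end_io in this example. Because bio_endio does not free the bio, f2fs_write_end_io, besides performing some F2FS-specific work, must finally call bio_put(bio) to release the resources held by the bio structure. With that, a BIO has reached the end of its glorious life.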
static void f2fs_write_end_io(struct bio *bio)
{
。。。
bio_put(bio);
}
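The ownership rule described here, where the end function only notifies and the callback drops the last reference, can be illustrated with a small reference-counting model. This is a user-space sketch under simplifying assumptions: toy_bio, toy_bio_alloc, toy_bio_put, toy_bio_endio, and toy_write_end_io are hypothetical names, the completed pointer loosely plays the role of bi_private, and there is no locking or atomic counting as there would be in the kernel.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_bio {
    int refcount;
    void (*end_io)(struct toy_bio *); /* models bio->bi_end_io */
    int *completed;                   /* caller-owned flag, like bi_private */
};

static int toy_live_bios; /* number of bios allocated and not yet freed */

static struct toy_bio *toy_bio_alloc(void (*end_io)(struct toy_bio *),
                                     int *completed)
{
    struct toy_bio *bio = malloc(sizeof(*bio));

    if (!bio)
        return NULL;
    bio->refcount = 1; /* the submitter's reference */
    bio->end_io = end_io;
    bio->completed = completed;
    toy_live_bios++;
    return bio;
}

/* Drop one reference; free the bio when the last one goes away. */
static void toy_bio_put(struct toy_bio *bio)
{
    assert(bio->refcount > 0);
    if (--bio->refcount == 0) {
        toy_live_bios--;
        free(bio);
    }
}

/* Models bio_endio(): notify the owner, but do not free the bio here. */
static void toy_bio_endio(struct toy_bio *bio)
{
    if (bio->end_io)
        bio->end_io(bio);
}

/* Models a write completion callback: do the work, then release the bio. */
static void toy_write_end_io(struct toy_bio *bio)
{
    *bio->completed = 1;
    toy_bio_put(bio);
}
```

A caller that allocates a bio with toy_write_end_io and later triggers toy_bio_endio sees its completed flag set and the live-bio count return to zero. If the callback forgot the put, the count would stay at one, which is the leak the text warns about.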
blk_finish_request
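scsi_end_request calls blk_update_request to end the BIOs attached to the Request and then calls blk_finish_request to end the request's life cycle. If end_io is registered, it is invoked, and that callback is responsible for calling __blk_put_request to release the request. If end_io is not registered, __blk_put_request is called right here to release it. At this point the Request has been fully processed and the softirq handling is complete.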
void blk_finish_request(struct request *req, blk_status_t error)
{
。。。
if (req->end_io) {
rq_qos_done(q, req);
req->end_io(req, error);
} else {
if (blk_bidi_rq(req))
__blk_put_request(req->next_rq->q, req->next_rq);
__blk_put_request(q, req);
}
}