题解 | #解体解答 SQL140 思路清晰#
未完成率较高的50%用户近三个月答卷情况
https://www.nowcoder.com/practice/3e598a2dcd854db8b1a3c48e5904fe1c
(1) 涉及的主要知识点:
(1)percent_rank()
(2)sum()
(3)row_number()函数
基础的知识点可以这篇文章,关于窗口函数的:
Mysql窗口函数 (知识点梳理+题目解析+面试实战)(四万字长文 建议收藏后食用)
https://blog.csdn.net/weixin_45532984/article/details/127854668?spm=1001.2014.3001.5501
(2) 题目分析和要求梳理:
请统计SQL试卷上未完成率较高的50%用户中,6级和7级用户在有试卷作答记录的近三个月中,每个月的答卷数目和完成数目。按用户ID、月份升序排序。
1.SQL试卷
2.完成率较高的50%用户 按照未完成率排序区前一半
3.6级和7级用户
4.有试卷作答记录的近三个月中
5.每个月的答卷数目和完成数目
6.按用户ID、月份升序排序
本题的要求其实都容易理解,但是限制条件要注意顺序,在解释中可以看出坑。
(3) 解体解答:
第一步,求出每个uid的总的答题数目和未完成的数目
select distinct u1.uid uid, sum(if(score is null,1,0)) over(partition by u1.uid) incomplete_cnt, count(*) over(partition by u1.uid) total_cnt from user_info u1 join exam_record e1 on u1.uid = e1.uid join examination_info e2 on e1.exam_id = e2.exam_id where tag = 'SQL'
第二步,求出未完成率:
select uid,incomplete_cnt,total_cnt,incomplete_cnt/total_cnt incomplete_rate from( select distinct u1.uid uid, sum(if(score is null,1,0)) over(partition by u1.uid) incomplete_cnt, count(*) over(partition by u1.uid) total_cnt from user_info u1 join exam_record e1 on u1.uid = e1.uid join examination_info e2 on e1.exam_id = e2.exam_id where tag = 'SQL' )t1
第三步,求出分布率,此处用了present_rank()函数,刚开始使用的row_number函数然后排序取前一半,总是只能过2/3案例,最后排查是取前一半除了问题,在解释中明确提到了:1001、1002、1003分别排在1.0、0.5、0.0的位置,所以有意让我们使用present_rank()函数。
select uid,count(*) over() ct_all,percent_rank() over(order by incomplete_rate desc) rk from ( select uid,incomplete_cnt,total_cnt,incomplete_cnt/total_cnt incomplete_rate from( select distinct u1.uid uid, sum(if(score is null,1,0)) over(partition by u1.uid) incomplete_cnt, count(*) over(partition by u1.uid) total_cnt from user_info u1 join exam_record e1 on u1.uid = e1.uid join examination_info e2 on e1.exam_id = e2.exam_id where tag = 'SQL' )t1 )t2
第四步,选出满足条件的uid
select uid from ( select uid,count(*) over() ct_all,percent_rank() over(order by incomplete_rate desc) rk from ( select uid,incomplete_cnt,total_cnt,incomplete_cnt/total_cnt incomplete_rate from( select distinct u1.uid uid, sum(if(score is null,1,0)) over(partition by u1.uid) incomplete_cnt, count(*) over(partition by u1.uid) total_cnt from user_info u1 join exam_record e1 on u1.uid = e1.uid join examination_info e2 on e1.exam_id = e2.exam_id where tag = 'SQL' )t1 )t2 )t3 where rk <= 0.5
第五步,选出满足条件的uid的相应的每个月的完成试卷数和总的试卷数
with t4 as( select uid from ( select uid,count(*) over() ct_all,percent_rank() over(order by incomplete_rate desc) rk from ( select uid,incomplete_cnt,total_cnt,incomplete_cnt/total_cnt incomplete_rate from( select distinct u1.uid uid, sum(if(score is null,1,0)) over(partition by u1.uid) incomplete_cnt, count(*) over(partition by u1.uid) total_cnt from user_info u1 join exam_record e1 on u1.uid = e1.uid join examination_info e2 on e1.exam_id = e2.exam_id where tag = 'SQL' )t1 )t2 )t3 where rk <= 0.5 ) select u1.uid uid , date_format(start_time,'%Y%m') start_month, count(*) total_cnt, sum(if(score is not null,1,0)) complete_cnt from user_info u1 join exam_record e1 on u1.uid = e1.uid join examination_info e2 on e1.exam_id = e2.exam_id where u1.uid in (select uid from t4) && (level = 6 || level = 7) group by u1.uid, start_month order by u1.uid, start_month
第六步,利用row_number()对每个月进行排序
select uid,start_month,total_cnt,complete_cnt,row_number() over(partition by uid order by start_month desc) rk from( select u1.uid uid , date_format(start_time,'%Y%m') start_month, count(*) total_cnt, sum(if(score is not null,1,0)) complete_cnt from user_info u1 join exam_record e1 on u1.uid = e1.uid join examination_info e2 on e1.exam_id = e2.exam_id where u1.uid in (select uid from t4) && (level = 6 || level = 7) group by u1.uid, start_month order by u1.uid, start_month )t5
第七步,选取rk<=3的月份,也就是最近的三个月份,最后进行排序
select uid,start_month,total_cnt,complete_cnt from ( select uid,start_month,total_cnt,complete_cnt,row_number() over(partition by uid order by start_month desc) rk from( select u1.uid uid , date_format(start_time,'%Y%m') start_month, count(*) total_cnt, sum(if(score is not null,1,0)) complete_cnt from user_info u1 join exam_record e1 on u1.uid = e1.uid join examination_info e2 on e1.exam_id = e2.exam_id where u1.uid in (select uid from t4) && (level = 6 || level = 7) group by u1.uid, start_month order by u1.uid, start_month )t5 )t6 where rk <= 3 order by uid,start_month
(4) 答案总和:
with t4 as( select uid from ( select uid,count(*) over() ct_all,percent_rank() over(order by incomplete_rate desc) rk from ( select uid,incomplete_cnt,total_cnt,incomplete_cnt/total_cnt incomplete_rate from( select distinct u1.uid uid, sum(if(score is null,1,0)) over(partition by u1.uid) incomplete_cnt, count(*) over(partition by u1.uid) total_cnt from user_info u1 join exam_record e1 on u1.uid = e1.uid join examination_info e2 on e1.exam_id = e2.exam_id where tag = 'SQL' )t1 )t2 )t3 where rk <= 0.5 ) select uid,start_month,total_cnt,complete_cnt from ( select uid,start_month,total_cnt,complete_cnt,row_number() over(partition by uid order by start_month desc) rk from( select u1.uid uid , date_format(start_time,'%Y%m') start_month, count(*) total_cnt, sum(if(score is not null,1,0)) complete_cnt from user_info u1 join exam_record e1 on u1.uid = e1.uid join examination_info e2 on e1.exam_id = e2.exam_id where u1.uid in (select uid from t4) && (level = 6 || level = 7) group by u1.uid, start_month order by u1.uid, start_month )t5 )t6 where rk <= 3 order by uid,start_month
(5) 题目小结:
本题目要求清晰,不难解答,但是步骤繁琐,要注意在每一个表里的限制条件,是不是在应该的步骤中添加限制条件,比如:where tag = 'SQL' 和(level = 6 || level = 7)这两个限制条件选错位置那将得到不对的结果,还有就是percent_rank()函数的使用,一般用的少,刚开始想使用row_number排序来解决,但是始终不对,最后换成percent_rank()得以通过。