2022-08-10 10:15 已编辑青岛大学数据分析师

关注

SQL编程练习总结-进阶篇

SQL123 SQL类别高难度试卷得分的截断平均值

试卷信息表 examination_info：exam_id试卷ID, tag试卷类别, difficulty试卷难度, duration考试时长, release_time发布时间
考试记录表 exam_record：uid用户ID, exam_id试卷ID, start_time开始作答时间, submit_time交卷时间, score得分

问题需求：
查询tag，difficulty，clip_avg_score

问题拆解：
SQL类别——tag='SQL'
高难度试卷——difficulty=‘hard’
截断平均值——可采用两种办法：
1.先将所有分数加总，减去一个最高分数和一个最低分数，再除以（人数-2）；
2.使用窗口函数dense_rank. 当最大值和最小值不唯一时，用这种方法可将重复最值全部过滤.

细节处理：
1.使用round函数对计算结果四舍五入，保留一位小数;
2.窗口函数不会自动忽略空值，因此要手动过滤score为null的行.

方法一：

select tag,difficulty,
round((sum(e2.score)-max(e2.score)-min(e2.score))/(count(e2.score)-2),1)as clip_avg_score
from examination_info as ei,exam_record as er
where tag='SQL' and difficulty='hard' and ei.exam_id=er.exam_id;

方法二：

select tag,difficulty,round(avg(score),1) as clip_avg_score
from(
    select tag,difficulty,score,
    dense_rank() over(partition by er.exam_id order by score desc) as rank1,
    dense_rank() over(partition by er.exam_id order by score asc) as rank2
    from exam_record as er
    left join examination_info as ei
    on ei.exam_id=er.exam_id
    where tag='SQL' and difficulty='hard' and score is not null
) as t
where rank1 !=1 and rank2 !=1
group by tag;

SQL126 平均活跃天数和月活人数

考试记录表 exam_record

问题需求：
查询month，avg_active_days，mau

问题拆解：
活跃用户人均活跃几天——可采用两种办法：
1.使用count函数，对用户id去重，计算人均活跃天数；
2.使用子查询，先找出活跃用户和活跃天数，再计算人均活跃天数.
月活人数——使用distinct关键字对用户id去重
每个月——group by month

细节处理：
使用date_format函数修改日期格式；
使用round函数对计算结果四舍五入.

方法一：

select date_format(submit_time,'%y%m') as month,
round(count(distinct uid,day(submit_time))/count(distinct uid),2) as avg_active_days,
count(distinct uid) as mau
from exam_record
where year(start_time)=2021
and submit_time is not null
group by month;

两条记录的两个字段uid和date_format转换后的starttime相同时判定为重复的记录，count只会+1.

方法二：

 select substr(ymd,1,6) as month,
 round(count(1)/count(distinct uid),2) as avg_active_days,
 count(distinct uid) as mau
from (
    select distinct uid, date_format(submit_time, "%Y%m%d") as ymd
    from exam_record
    where submit_time is not NULL and year(submit_time)='2021'
) as t_active_day
group by month;

substr(string, start,length)：从string的start位置开始提取字符串;
count(1)与count(*)的作用相同.

SQL127 月总刷题数和日均刷题数

题目练习记录表 practice_record

问题需求：
查询submit_month，month_q_cnt，avg_day_q_cnt

问题拆解：
月总刷题数——count(score)；
日均刷题数——月总刷题数/每月天数，使用last_day函数得到每月最后一天的日期，再用day函数转换为天；
汇总——使用union all连接，字段名要与之前的一致.

select date_format(submit_time,'%Y%m')as submit_month,
count(score) as month_q_cnt,
round(count(score)/avg(day(last_day(submit_time))),3)as avg_day_q_cnt
from practice_record
where date_format(submit_time,'%Y')='2021'
group by submit_month
union all
select '2021汇总'as submit_month,count(score) as month_q_cnt,
round(count(score)/31,3) as avg_day_q_cnt
from practice_record
where date_format(submit_time,'%Y')='2021'
order by submit_month;

在测试用例时因不符合ONLY_FULL_GROUP_BY SQL模式而报错.ONLY_FULL_GROUP_BY的语义是确定select 中的所有列的值要么是来自于聚合函数（sum、avg、max等）的结果，要么是来自于group by list中的表达式的值.在day函数前添加聚合函数可解决该问题；另外MySQL提供any_value()函数抑制ONLY_FULL_GROUP_BY值被拒绝，因此也可将聚集函数改成any_value()函数.

SQL128 未完成试卷数大于1的有效用户

试卷作答记录表exam_record
试卷信息表examination_info

问题需求：
查询uid，incomplete_cnt，complete_cnt，detail

问题拆解：
未完成/已完成试卷数——count函数嵌套if函数；
作答过的试卷的信息——需要试卷tag和作答日期两个信息，用concat_ws函数连接；展示每个符合条件的用户的试卷信息，使用group_concat函数.
细节处理：
1.过滤分组要用having子句而不是where，并且group by子句要在having之前.
2.group_concat(distinct str1 order by str2 separator)将group by产生的同一个分组中的值连接起来，返回一个字符串结果.
3.concat_ws(separator,str1,str2,...)将多个字符串用指定的分隔符连接起来.

select uid,count(if(submit_time is null,1,null)) as incomplete_cnt,
count(if(submit_time is null,null,1)) as complete_cnt,
group_concat(distinct concat_ws(':',date(start_time),tag) separator ';') as detail
from exam_record er 
left join 
examination_info ei on er.exam_id = ei.exam_id where year(start_time)=2021
group by uid
having complete_cnt>=1 and incomplete_cnt<5 and incomplete_cnt>1
order by incomplete_cnt desc;

SQL129 月均完成试卷数不小于3的用户爱作答的类别

试卷作答记录表exam_record
试卷信息表examination_info

问题需求：
查询tag，tag_cnt

问题拆解：
先找出符合条件的用户：
月均完成试卷数——完成的试卷总数/有提交记录的月份总数,使用count函数；
再统计其爱作答的类别：count(tag).

细节处理：
1.使用in操作符来制定用户的范围；
2.过滤分组使用having子句.

select ei.tag,count(ei.tag) as tag_cnt
from exam_record as er
join examination_info as ei
on er.exam_id=ei.exam_id
where er.uid in (
select uid
from exam_record
group by uid
having count(submit_time)/count(distinct month(submit_time))>2
)
group by ei.tag
order by tag_cnt desc;

记录本题的一个错误答案：

select t.tag,count(t.uid)as tag_cnt
from (
      select uid,tag
      from exam_record
      join examination_info using(exam_id)
      group by uid
      having(count(submit_time)/count(distinct month(submit_time)))>2
) as t
group by tag
order by tag_cnt desc;

该答案试图创建一个新表，筛选出月均答题数大于等于3的用户及试卷tag，再在该表的基础上统计他们爱作答的试卷类别.但t表中对uid group by，只保留了他们作答试卷的tag而并未保留作答次数，因此无法得到正确答案.
alt

SQL133 分别满足两个活动的人

试卷作答记录表exam_record
试卷信息表examination_info

题目中要求将两类用户分别标记为activity1和activity2同时输出，考虑用union all连接两个查询.其中，activity1为“每次试卷得分都能到85分的人”，即考试成绩最低为85分；activity2为“至少有一次用了一半时间就完成高难度试卷且分数大于80的人”,完成时间可以使用timestampdiff函数，返回以天/小时/分/秒为单位的两个时间的间隔，比datediff函数更加灵活.

(select uid,'activity1' as activity
from exam_record as e1
where year(start_time)='2021'
group by uid
having min(score)>=85)
union all
(select uid,'activity2' as activity
from exam_record as e1 
left join 
examination_info as e2 on e1.exam_id=e2.exam_id
where e2.difficulty='hard' and score>=80 
and timestampdiff(second,e1.start_time,e1.submit_time)<= e2.duration*30
group by uid)
order by uid;

另外，“每次试卷得分都能到85分的人” 也可以采用这样的方法处理：只要某人某次考试得分在85分以下，就将其剔除.

select distinct uid, 'activity1' as activity
from exam_record
where uid not in (select uid
                  from exam_record
                  where score < 85)
order by uid, activity;

SQL134 满足条件的用户的试卷完成数和题目练习数

用户信息表user_info
试卷信息表examination_info
试卷作答记录表exam_record
题目练习记录表practice_record

问题需求：
查询uid，exam_cnt，question_cnt

问题拆解：
满足条件的用户：高难度SQL试卷得分平均值大于80并且用户等级是7级——此处的限定条件是针对用户，因此可使用in操作符，把符合上述条件的用户筛选出来；
2021年——year函数；
试卷总完成次数和题目总练习次数：count函数.

细节处理：
1.有些用户试卷和习题只做了其中之一，如果join exam_record和practice_record on uid，那么由于这个人的uid在某个表中不存在，uid就会被排除，因此联结使用left join，把user_info中所有uid都选出来；
2.因为left join导致多表联结后出现了出现了重复的部分，因此要用到distinct关键字.distinct不是对exam_id或者question_id去重，而是为了解决多出来的重复部分的问题.

alt

如图，左联结后统计1001和1002所做的试卷数时都会有重复.
3.同样因为有些用户试卷和习题只做了其中之一，筛选年份时要在联结表的同时进行，否则submit_time为null时会去掉一些uid导致结果变少.
参考

SQL135 每个6/7级用户活跃情况

用户信息表user_info
试卷作答记录表exam_record
题目练习记录表practice_record

问题需求：
查询uid，act_month_total，act_days_2021,act_days_2021_exam,act_days_2021_question

问题拆解：
act_days_2021_exam/act_days_2021_question——分别在两个表中用count函数；
act_month_total，act_days_2021——将exam_record表和practice_record表联结成一张表，分别计算月份数和天数；
2021年——year函数.

细节处理：
1.2021年活跃天数不等于2021年试卷作答活跃天数+2021年答题活跃天数，因为如果同一天完成试卷跟答题，活跃天数仍应该为1，但计算会得到2；
2.因为试卷和题目的活跃天数和月份数要合并统计，因此在联结表时，将两张表的活跃时间合并为同一列；同时因为还要分别统计各自的活跃天数，可将试卷或题目作为tag列保留这个信息；
3.使用union all联结的表要具有完全相同的列名.

select u.uid,
count(distinct date_format(act_time,'%Y%m')) as act_month_total,
count(distinct if(year(act_time)=2021,date(act_time),null)) as act_days_2021,
count(distinct if(year(act_time)=2021 and tag='exam',act_time,null)) as act_days_2021_exam,
count(distinct if(year(act_time)=2021 and tag='question',act_time,null)) as act_days_2021_question
from user_info as u
left join(
          select uid,date(start_time) as act_time,'exam' as tag
          from exam_record as er
          union all
          select uid,date(submit_time) as act_time,'question' as tag
          from practice_record as pr
)as t
on u.uid=t.uid
where level>5
group by u.uid
order by act_month_total desc,act_days_2021 desc;

SQL136 每类试卷得分前3名

试卷信息表examination_info
试卷作答记录表exam_record

问题需求：
查询tag,uid,ranking
问题拆解：
每类试卷——group by tag
得分的前3名——窗口函数
细节处理：
1.如果查询的其他部分（WHERE，GROUP BY，HAVING）需要窗口函数，要在子查询中使用窗口函数；
2.三种排序窗口函数的区别：
rank：当有重复排名，跳过之后的排名（1、1、3……）
dense_rank:当有重复排名，不跳过之后的排名（1、1、2……）
row_number:赋予唯一且连续的排名（1、2、3）

select*
from(
    select tag as tid,uid,
    row_number() over(partition by tag order by tag,max(score) desc,min(score) desc,uid desc)as ranking
    from examination_info as ei
    left join exam_record as er using(exam_id)
    group by tid,uid)as t
where ranking<4

SQL137 第二快/慢用时之差大于试卷时长一半的试卷

试卷信息表examination_info
试卷作答记录表exam_record

问题需求：
查询exam_id，duration，release_time
问题拆解：
第二快/慢用时——窗口函数，分别用升序和降序排序；此时得到的是用时排名，还要将该排名对应的用时取出来，因此要另外查询出每份试卷的用时——timestampdiffhans
细节处理：
1.注意分秒换算，用秒计算时间差得到准确值
方法一：

select t.exam_id,duration,release_time
from(
    select exam_id,
    row_number()over (partition by exam_id order by timestampdiff(second,start_time,submit_time))
    as rank_fast,
    row_number()over (partition by exam_id order by timestampdiff(second,start_time,submit_time)desc) 
    as rank_slow,
    timestampdiff(second,start_time,submit_time) as timex
    from exam_record
    where score is not null
)as t
join examination_info as ei using(exam_id)
group by t.exam_id
having (sum(case when rank_slow=2 then timex else 0 end)-sum(case when rank_fast=2 then timex else 0 end))> 0.5 * duration*60
order by t.exam_id desc

方法二：使用nth_value函数，直接得到对应排名的用时长度.

select distinct t.exam_id,duration,release_time 
from 
    (select er.exam_id, 
        nth_value(timestampdiff(second,start_time,submit_time),2) over (partition by exam_id order by timestampdiff(second,start_time,submit_time) desc ) as rank_slow,
        nth_value(timestampdiff(second,start_time,submit_time),2) over (partition by exam_id order by timestampdiff(second,start_time,submit_time) asc) as rank_fast,
        duration,release_time
    from exam_record as er 
    left join examination_info as ei using(exam_id)
    ) as t
where rank_slow-rank_fast>duration*60/2
order by t.exam_id desc;

SQL138 连续两次作答试卷的最大时间窗

试卷作答记录表exam_record

问题需求：
查询uid,days_window,avg_exam_cnt
问题拆解：
连续两次作答试卷——窗口函数lead，得到连续的下次作答时间
最大时间间隔——datediff函数，聚合函数max/min
至少有两天作答过试卷——做题天数间隔>1
细节处理：
1.datediff得到的是两天中间隔的天数，求时间窗时要在此基础上+1；
2.datediff(time1,tim2)得到的结果是time1-tim2，因此要注意顺序.

select uid,days_window,round(days_window*exam_cnt/day_diff,2)as avg_exam_cnt
from(
    select uid,max(datediff(next_time,start_time))+1 as days_window,
    datediff(max(start_time),min(start_time))+1 as day_diff,count(start_time)as exam_cnt
    from(
        select uid,start_time,
            lead(start_time) over(partition by uid order by start_time)as next_time
        from exam_record
        where year(start_time)='2021')as t1
    group by uid)as t2
where day_diff>1
order by days_window desc,avg_exam_cnt desc

SQL139 近三个月未完成试卷数为0的用户完成情况

试卷作答记录表exam_record

问题需求：
查询uid,exam_complete_cnt
问题拆解：
近三个月——dense_rank降序，令rank值<=3
未完成试卷数为0——count（开始时间）=count（提交时间）

select uid,count(score) as exam_complete_cnt
from(
    select uid,dense_rank()over(partition by uid order by date_format(start_time,'%Y%m') desc)as rank1,score,start_time,submit_time
    from exam_record
    )as t
where rank1<=3
group by uid
having count(start_time)=count(submit_time)
order by exam_complete_cnt desc,uid desc

SQL140 未完成率较高的50%用户近三个月答卷情况

用户信息表user_info
试卷信息表examination_info
试卷作答记录表exam_record

问题需求：:
查询uid,start_month,total_cnt,complete_cnt
问题拆解：
未完成率较高的50%用户——窗口函数percent_rank
近三个月——窗口函数dense_rank
嵌套顺序为：①找出SQL试卷未完成率较高的50%用户
②找出6，7级用户
③找出这些用户近三个月的答题情况
细节处理：
1.注意答题情况是针对所有类型的试卷，而不只是SQL试卷；
2.每个用户各自近三个月的答题情况都要显示，因此在对月份排序时要加上partition by；
3.percent_rank是对所有用户排序，无需分组，因此不使用partition by；但排序条件是每个用户各自的未完成率，因此最后要加group by;
4.使用窗口函数排序时注意升序还是降序.

select uid, 
       date_format(start_time, '%Y%m') as start_month,
       count(start_time) as tatol_cnt,
       count(score) as complete_cnt
from(
    select uid,start_time,score,
           dense_rank()over(partition by uid order by date_format(start_time,'%Y%m')desc) as rank1
    from exam_record
)as t1
where rank1<=3
and uid in(
     select uid
     from(
         select uid,percent_rank() over(order by count(if(submit_time is null,1,null))/count(start_time)desc)as rank2
         from exam_record
         group by uid
        ) as t2 
     right join user_info using(uid)
     where level>=6
     and rank2<=0.5
)
group by uid, start_month
order by uid, start_month

使用临时表，在本地通过测试的答案：

 with a as(select uid
          from(
                select uid,start_time,
                        percent_rank() over(order by count(if(submit_time is null,1,null))/count(start_time)desc)as rank1
                from exam_record as er 
                left join examination_info as ei using(exam_id)
                where tag='SQL'
                group by uid)as a1
          where rank1<=0.5
        )
select u.uid,start_month,total_cnt,complete_cnt
from(
    select a.uid,date_format(start_time,'%Y%m')as start_month,
        dense_rank()over(partition by uid order by date_format(start_time,'%Y%m')desc)as rank2,
        count(start_time) over(partition by uid,date_format(start_time,'%Y%m') order by uid)as total_cnt,
        count(submit_time) over(partition by uid,date_format(start_time,'%Y%m') order by uid)as complete_cnt
    from a left join exam_record using(uid)
    )as t
right join user_info as u using(uid)
where rank2<=3 and level>5
group by u.uid,start_month
order by u.uid,start_month;

SQL141 试卷完成数同比2020年的增长率及排名变化

试卷信息表examination_info
试卷作答记录表exam_record

问题需求：:
查询tag,exam_cnt_20,exam_cnt_21,growth_rate,exam_cnt_rank_20,exam_cnt_rank_21
问题拆解：
20和21年上半年的答题数——count函数，if函数
做完次数排名——rank窗口函数
细节处理：
1.增长率要按要求输出，因此在对计算完的数据处理之后，用concat函数将数值和百分号拼接起来；
2.排名变化有可能是负数，而窗口函数返回的排名数值是无符号整数，直接减会出错，要使用cast函数将其转换.

select tag,exam_cnt_20,exam_cnt_21,growth_rate,exam_cnt_rank_20,exam_cnt_rank_21,
        cast(exam_cnt_rank_21 as signed)-cast(exam_cnt_rank_20 as signed) as rank_delta
from(
    select tag,exam_cnt_20,exam_cnt_21,
            concat(round((exam_cnt_21-exam_cnt_20)*100/exam_cnt_20,1),'%') as growth_rate,
            rank()over(order by exam_cnt_20 desc) as exam_cnt_rank_20,
            rank()over(order by exam_cnt_21 desc) as exam_cnt_rank_21
    from(
        select tag,count(if(year(submit_time)=2020 and month(submit_time)<=6,1,null)) as exam_cnt_20,
                count(if(year(submit_time)=2021 and month(submit_time)<=6,1,null)) as exam_cnt_21
        from exam_record r left join examination_info i using(exam_id)
        group by tag
        ) a  
    )b
where exam_cnt_20!=0 and exam_cnt_21!=0
order by growth_rate desc,exam_cnt_rank_21 desc;

使用临时表，在本地通过测试的答案：

with a as(
    select tag,exam_cnt as exam_cnt_20,
    rank() over(order by exam_cnt desc)as exam_cnt_rank_20
    from(select tag,submit_time,
            count(submit_time) over(partition by tag,year(submit_time) order by year(submit_time))as exam_cnt
         from exam_record left join examination_info using(exam_id)
        where year(submit_time)=2020 and month(submit_time)<=6
         )as a1
    group by tag
),
b as(
    select tag,exam_cnt as exam_cnt_21,
            rank() over(order by exam_cnt desc)as exam_cnt_rank_21
    from(select tag,submit_time,
            count(submit_time) over(partition by tag,year(submit_time) order by year(submit_time))as exam_cnt
         from exam_record left join examination_info using(exam_id)
          where year(submit_time)=2021 and month(submit_time)<=6
        )as b1
    group by tag
)
select tag,exam_cnt_20,exam_cnt_21,
        concat(round((exam_cnt_21-exam_cnt_20)/exam_cnt_20*100,1),'%') as growth_rate,
        exam_cnt_rank_20,exam_cnt_rank_21,exam_cnt_rank_21-exam_cnt_rank_20 as rank_delta
from a join b using(tag)
group by tag
order by growth_rate desc,exam_cnt_rank_21 desc;

SQL142 对试卷得分做min-max归一化

用户信息表user_info
试卷作答记录表exam_record

问题需求：:
查询uid，exam_id，avg_new_score
问题拆解：
每份试卷的最高分和最低分——聚合窗口函数max，min
根据试卷作答记录得分数决定输出——if函数，注意当只有一个作答分数时，最高分=最低分
计算新得分——聚合函数avg
细节处理：
分数归一化后缩放到[0,100]区间，直接对原结果*100即可.

select uid,exam_id,round(avg(ns),0)as avg_new_score
from(
    select uid,exam_id,
        if(min_score!=max_score,(score-min_score)/(max_score-min_score)*100,score)as ns
    from(select er.uid,er.exam_id,score,
            max(score) over(partition by exam_id)as max_score,
            min(score) over(partition by exam_id)as min_score
        from exam_record as er
        left join examination_info as ei using(exam_id)
        where score is not null and difficulty='hard'
        ) as t1
    )as t2
group by uid,exam_id
order by exam_id,avg_new_score desc

SQL143 每份试卷每月作答数和截止当月的作答总数

试卷作答记录表exam_record

问题需求：
查询exam_id，start_month，month_cnt，cum_exam_cnt
问题拆解：
每份试卷每月作答数——聚合窗口函数count
截止当月的作答总数——聚合窗口函数count
细节处理：
当窗口函数中order by后面缺少窗口从句条件，默认取本行及之前所有行；当无order by时，默认取所有行。因此计算每月作答数时，无order by条件；计算截至当前月作答数时要加上order by条件.

select distinct exam_id,date_format(start_time,'%Y%m')as start_month,
count(start_time)over(partition by exam_id,date_format(start_time,'%Y%m'))as month_cnt,
count(start_time)over(partition by exam_id order by date_format(start_time,'%Y%m'))as cum_exam_cnt
from exam_record
order by exam_id,start_month

SQL144 每月及截止当月的答题情况

试卷作答记录表exam_record

问题需求：
查询start_month,mau,month_add_uv, max_month_add_uv,cum_sum_uv
问题拆解：
每月月活——count distinct uid
每月新增用户数——先找出每个用户最早的做题时间，若最早做题时间是在当月，则当月新用户+1.用到min函数，count函数，if函数
截止当月的单月最大新增用户数——max聚合窗口函数
截止当月的累积用户数——sum聚合窗口函数，将每个月的新增用户加起来
细节处理：
截至当月的答题情况不需要用到partition by分组，直接用order by排序即可.

select start_month,mau,month_add_uv,
max(month_add_uv)over(order by start_month) as max_month_add_uv,
sum(month_add_uv)over(order by start_month) as cum_sum_uv
from(
    select date_format(start_time,'%Y%m')as start_month,count(distinct uid)as mau,
            count(distinct if(date_format(start_time,'%Y%m')=first_month,t1.uid,null))as month_add_uv
    from(
        select uid,min(date_format(start_time,"%Y%m"))as first_month
        from exam_record
        group by uid
        )as t1
    left join exam_record using(uid)
    group by start_month
    )as t2
order by start_month

SQL148 筛选昵称规则和试卷规则的作答记录

用户信息表user_info
试卷信息表examination_info
试卷作答记录表exam_record

问题需求：
查询uid，exam_id，avg_score

问题拆解：
对于昵称的要求——"牛客"+纯数字+"号"：牛客%号；纯数字——正则表达式
对科目的要求——以大写或小写c开头——正则表达式.

select u.uid,e1.exam_id,round(avg(score)) as avg_score
from user_info as u
left join exam_record as e2 on u.uid=e2.uid
left join examination_info as e1 on e1.exam_id=e2.exam_id    
where (nick_name rlike '^[0-9]+$' or nick_name like '牛客%号') and tag  rlike '^(C|c).*'
and submit_time is not null
group by u.uid,e1.exam_id order by u.uid,avg_score;

正则表达式"^ [0-9]+$"的含义：

1.字符^
意义：表示匹配的字符必须在最前边
例如：^A不匹配“an A”中的‘A’，但匹配“An A”中最前面的‘A’.

2.字符图片说明
意义：匹配最末的字符.
例如：不匹配“eater”中的‘t’，但匹配“eat”中的‘t’.

3.字符[0-9]
意义：字符列表，匹配列出中的任一个字符.可以通过连字符-指出字符范围.
例如：[abc]跟[a-c]一样.它们匹配“brisket”中的‘b’和“ache”中的‘c’.

4.字符+
意义：匹配+号前面的字符1次及以上.等价于{1,}.
例如：a+匹配“candy”中的‘a’和“caaaaaaandy”中的所有‘a’.

来源：https://www.yulucn.com/question/4866556442

SQL149 根据指定记录是否存在输出不同情况

用户信息表user_info
试卷作答记录表exam_record

问题需求：
查询uid，incomplete_cnt，incomplete_rate
问题拆解：
筛选条件：任意一个0级用户未完成试卷数大于2——if函数
根据是否满足筛选条件输出不同的结果——exist关键字+嵌套
细节处理：
1.若不存在符合条件的用户，题目要求输出有作答记录的用户的数据，因此在if函数中的筛选条件应为start_time is not null and submit_time is null；在筛选符合条件用户同时计算做题数，最后要求做题数>0；
2.若存在符合条件的用户，题目中有未作答试卷的用户，要求将其未完成数和未完成率默认为0输出，也就是即使没作答过试卷也要将这个0级用户筛选出来，因此where后要再加一次level=0；
3.可以在开始先用with-as短语定义一个子片段，筛选出需要用到的数据，方面后续的操作.

with t as(
    select u.uid,level,count(if(start_time is not null and submit_time is null,1,null))as incomplete_cnt,
    round(count(if(start_time is not null and submit_time is null,1,null))/count(u.uid),3) as incomplete_rate,
    count(exam_id) as num
    from user_info as u
    left join exam_record as er using(uid)
    group by u.uid
)

select uid,incomplete_cnt,incomplete_rate
from t
where exists(
    select uid
    from t
    where level=0 and incomplete_cnt>2
)and level=0
union
select uid,incomplete_cnt,incomplete_rate
from t
where not exists(
    select uid
    from t
    where level=0 and incomplete_cnt>2
)
order by incomplete_rate

SQL150 各用户等级的不同得分表现占比

用户信息表user_info
试卷作答记录表exam_record

问题需求：
查询level，score_grade，ratio
问题拆解：
根据分数分成不同的等级——case-when函数；
根据不同的等级记录答题的人数——聚合窗口函数

with t as(
    select u.uid,exam_id,score,level,
    count(*)over(partition by level)as num,
    (case when score >= 90 then '优'
    when score >= 75 then '良'
    when score >= 60 then '中'
    else '差' end)as score_grade
    from user_info as u
    left join exam_record as er using(uid)
    where score is not null
)
select level,score_grade,
round(count(score_grade)/num,3)as ratio
from t
group by level,score_grade
order by level desc,ratio desc

SQL152 注册当天就完成了试卷的名单第三页

用户信息表user_info
试卷信息表examination_info
试卷作答记录表exam_record
问题需求：
查询uid，level，register_time，max_score
问题拆解：
注册当天——date函数
第三页的三条——limit子句，跳过前6条，选取3条

select u.uid,level, register_time,max(score) as max_score
from exam_record as er left join examination_info as ei
on er.exam_id=ei.exam_id
left join user_info as u
on er.uid=u.uid
where job = '算法'
and tag = '算法'
and date(register_time) = date(submit_time)
group by uid
order by max_score desc 
limit 6, 3

SQL155 大小写混乱时的筛选统计

试卷信息表examination_info
试卷作答记录表exam_record

问题需求：
查询tag，answer_cnt
问题拆解：
将tag转换为大写——upper函数
对应的原本试卷作答数——将表进行自联结，联结条件为tag不同，转换为大写后相同

with a as(
    select tag,count(start_time) as answer_cnt
    from exam_record er
    join examination_info ei
    on er.exam_id=ei.exam_id
    group by tag)
select a.tag,b.answer_cnt
from a join a as b
on upper(a.tag)=b.tag 
and a.tag!=b.tag
where a.answer_cnt<3;