题解 | #连续两次作答试卷的最大时间窗#
连续两次作答试卷的最大时间窗
https://www.nowcoder.com/practice/9dcc0eebb8394e79ada1d4d4e979d73c
本题我的思路是把它拆分为两部分,一部分是计算每个用户连续作答的最大时间窗口,另一部分是计算每个用户平均每天做多少套试题。
关于第一部分时间窗口的计算,有几个要点:
- 必须是连续两次作答的时间窗;
- 时间限制在2021年;
- 每套试题有作答记录即可,并没说一定要完成。
在此我的思路是先对源数据按用户分区并按作答开始时间升序排列,这时会得到用户作答信息以及第一个分组排名rk1,将该表称之为t1;这时为得到连续两次的时间窗,我们对rk1进行滞后一名处理,也即查询一张新表,这张新表的信息除了排名字段外其他均与t1保持一致,同时同样采用按用户分区并按用户开始作答时间升序排序,不同的是,这里需要将得到的排名减1,得到排序字段rk2,这张新表作为t2。将表t1和t2按用户字段和排名字段进行连接后即可将某用户的当期数据和其之后连续一期的数据放在同一行中,这样即能得到计算连续时间窗的数据源;后续在计算连续时间窗时还会碰到的一个问题是,如果直接使用timestampdiff()函数计算两个开始作答时点相隔的天数,很可能出现相隔天数不准确的情况,因此我在此的处理是先用date_format()函数将时点标准化。
关于第二部分计算每个用户平均每天做多少套试题的思路则较为简单,先分别得到每个用户在时间窗口内完成了多少套题目,再计算总的时间窗口,相除后即可得到平均每天完成的试卷套数。
select s1.uid, days_window, round(days_window * ag, 2) avg_exam_cnt from ( select t1.uid uid, max( timestampdiff( day, date_format(t1.start_time, "%Y%m%d"), date_format(t2.start_time, '%Y%m%d') )) + 1 days_window from ( select *, row_number() over( partition by uid order by start_time asc ) rk1 from exam_record where year(start_time) = 2021 and uid in ( select uid from exam_record group by uid having count(distinct date_format(start_time, "%Y%m%d")) >= 2 )) t1 join ( select *, row_number() over( partition by uid order by start_time asc) -1 rk2 from exam_record where year(start_time) = 2021 and uid in ( select uid from exam_record group by uid having count(distinct date_format(start_time, "%Y%m%d")) >= 2 )) t2 on t1.uid = t2.uid and t1.rk1 = t2.rk2 group by t1.uid ) s1 join ( select a1.uid uid, num / ( timestampdiff( day, date_format(mins, "%Y%m%d"), date_format(maxs, '%Y%m%d')) + 1) ag from ( ( select uid, min(start_time) mins, count(start_time) as num from exam_record where year(start_time) = 2021 group by uid ) a1 join ( select uid, max(start_time) maxs from exam_record where year(start_time) = 2021 group by uid ) a2 on a1.uid = a2.uid ) ) s2 on s1.uid = s2.uid order by days_window desc, avg_exam_cnt desc