解题步骤 part1:题目基本信息 1)统计2021年11月每天新用户的次日留存率(保留2位小数) date_format(时间,‘%Y%m’)=‘202111’ 2)次日留存率为当天新增的用户数中第二天又活跃了的用户数占比。 换成容易理解的话就是,新用户的次留率。 3)如果in_time-进入时间和out_time-离开时间跨天了,在两天里都记为该用户活跃过,结果按日期升序,没有新用户,不用输出。 这里可以理解为把out_time也算作是活跃日。 part2:解法1 基本的思路是:用union把in_time和out_time并联起来,对uid和date去重活获得一张用户活跃表。之后找出次活用户和活跃用户,计算留存率即可。 不要忘了几个基本条件,没有新用户不用输出,留存率保留2位小数,结果按日期升序,时间范围为2021年11月。 1)关联in_time和out_time字段,建立用户活跃表 SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_logUNIONSELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log 2)找出新用户和次活新用户,并用‘1’来定义。 找出次活新用户 CASE WHEN (uid,dt)IN (SELECT uid,date_add(dt,INTERVAL -1 DAY) FROM t1) AND (uid,dt)IN (SELECT uid,MIN(dt)FROM t1 GROUP BY uid)THEN 1 ELSE 0 END AS is_next_day#当用户满足条件“连续两天登录(找次活用户),且第一个登录日期为首次登录日(找新用户)”,则整个用户为“次活新用户” 找出新用户 CASE WHEN (uid,dt) IN (SELECT uid,MIN(dt)FROM t1 GROUP BY uid)THEN 1 ELSE 0 END is_new_day#找出新用户 构建一张判断次活用户和新用户的表 WITH t1 AS(SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_logUNIONSELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log)SELECT dt, CASE WHEN (uid,dt)IN (SELECT uid,date_add(dt,INTERVAL -1 DAY) FROM t1) AND (uid,dt)IN (SELECT uid,MIN(dt)FROM t1 GROUP BY uid) THEN 1 ELSE 0 END AS is_next_day ,#次活新用户 CASE WHEN (uid,dt) IN (SELECT uid,MIN(dt)FROM t1 GROUP BY uid)#次活用户 THEN 1 ELSE 0 END is_new_dayFROM t1WHERE DATE_FORMAT(dt,'%Y%m')='202111';#取2021年11月的数据3)计算每日的次活用户数和新用户数,并求新用户次日留存率。 每日次活用户数:SUM(is_next_day) 每日新用户数:SUM(is_new_day) 次日留存率,保留2位小数ROUND(SUM(is_next_day)/SUM(is_new_day),2) 把is_next_day和 is_new_day的定义放进上一个代码结果如下 WITH t1 AS(SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_logUNIONSELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log)#用户活跃表SELECT dt, ROUND(SUM(CASE WHEN (uid,dt)IN (SELECT uid,date_add(dt,INTERVAL -1 DAY) FROM t1) AND (uid,dt)IN (SELECT uid,MIN(dt) FROM t1 GROUP BY uid) THEN 1 ELSE 0 END) / SUM(CASE WHEN (uid,dt) IN (SELECT uid,MIN(dt)FROM t1 GROUP BY uid) THEN 1 ELSE 0 END),2) AS uv_left_rateFROM t1WHERE DATE_FORMAT(dt,'%Y%m')='202111'GROUP BY dtHAVING uv_left_rate IS NOT NULLORDER BY dt; PART3:解法2窗口函数解法 1)照例找出用户活跃表 SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_logUNIONSELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log 2)找出次活日期和首次登录日期,并判断新用户和次留新用户。 通过位移窗口函数对每个用户的活跃日向上迁移一行 SELECT uid,dt, MIN(dt) OVER (PARTITION BY uid) AS new_dt, LEAD(dt,1) OVER (PARTITION BY uid ORDER BY dt) AS next_dt FROM( SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log)act_tableWHERE DATE_FORMAT(dt,'%Y%m')='202111' 如果dt=new_dt那这个用户为新用户,如果dt=new_dt且next_dt和new_dt的日期差为1则这个用户为次留新用户 WITH t1 AS(SELECT uid,dt, MIN(dt) OVER (PARTITION BY uid) AS new_dt, LEAD(dt,1) OVER (PARTITION BY uid ORDER BY dt) AS next_dt FROM( SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log) act_tableWHERE DATE_FORMAT(dt,'%Y%m')='202111')SELECT dt, CASE WHEN dt=new_dt AND datediff(next_dt,dt)=1 THEN 1 ELSE 0 END is_next_day, CASE WHEN dt=new_dt THEN 1 ELSE 0 END is_new_dayFROM t1; 3)计算新用户次留率 WITH t1 AS(SELECT uid,dt, MIN(dt) OVER (PARTITION BY uid) AS new_dt, LEAD(dt,1) OVER (PARTITION BY uid ORDER BY dt) AS next_dt FROM( SELECT DISTINCT uid,DATE(in_time) AS dt FROM tb_user_log UNION SELECT DISTINCT uid,DATE(out_time) AS dt FROM tb_user_log) act_tableWHERE DATE_FORMAT(dt,'%Y-%m')='2021-11')SELECT dt, ROUND(SUM(CASE WHEN dt=new_dt AND datediff(next_dt,dt)=1 THEN 1 ELSE 0 END) / SUM(CASE WHEN dt=new_dt THEN 1 ELSE 0 END),2) uv_left_rateFROM t1GROUP BY dtHAVING uv_left_rate IS NOT NULLORDER BY dt;