【场景】:近几天活跃【分类】:条件函数、日期函数、datediff分析思路难点:1.这个题有一个坑题目中没有说清楚,沉睡用户(近7天未活跃但更早前活跃过),根据定义它是包含流失用户(近30天未活跃但更早前活跃过),也就是说只要是流失用户就是沉睡用户!实际上答案并不包括。所以沉睡用户应该这样定义:近7天未活跃但更早前活跃过且非流失用户。2.case when 判断使每个类型的用户不会有重合。而且先判断哪个后判断哪个尤其重要,顺序不能换!新学到:1.用户最近一次活跃时间:max(date);一直没有想到,其实在之前是知道求用户第一次活跃时间是用:min(date)。如果使用这个会省很多力气。因为最后一次活跃的时间可以用来确定 近几天未活跃但更早前活跃过2.select里面可以套用select,尤其是从另外一个表用group by查询,因为group by又不好放在总的框架里面。(1)统计用户第一次活跃的时间;用户最后一次活跃的时间如果in_time-进入时间和out_time-离开时间跨天了,在两天里都记为该用户活跃过[使用]:min()确定用户第一次活跃时间,max()确定用户最后一次活跃时间(2)近7天和近30天,涉及两个不同的活跃条件一种方法是求出第7天是什么时候:日期减去天数得到日期;另一种是日期减去日期得到天数[使用]:date_sub()、datediff('2021-11-04',dt_max) <=6、timestampdiff(day,expr1,expr2)都可以(3)对用户分类且求比例case when 判断就使每个类型的用户不会有重合,所以‘新晋用户’在’忠实用户‘的前面,’流失用户‘在’沉睡用户‘的前面。[使用]:case when求解代码方法一:with子句 + 使用 union 合并几种分类情况#依次去求每个类型的uid有哪些with main as( #用户第一次活跃的时间 select uid, min(date(in_time)) as min_date from tb_user_log group by uid ), attr as( #近7天和近30天,两个不同条件的活跃记录表 #先求出今日是哪一天,往前七天是那一天,往前30天是那一天 select max(date(in_time)) as now_date, date_sub(max(date(in_time)),interval 6 day) as 7_date, date_sub(max(date(in_time)),interval 29 day) as 30_date from tb_user_log ) ,temp as( #新晋用户(近7天新增) select distinct uid as new_u from main,attr where min_date >= 7_date )#输出:105、102 ,temp1 as( #忠实用户(近7天活跃过且非新晋用户) #when date(in_time)>= 7_date and not in 新晋用户 select distinct uid as zs_u from tb_user_log,attr where date(in_time)>= 7_date and uid not in( select uid from main,attr where min_date >= 7_date ) )#输出:109、108、104 ,temp2 as( #流失用户(近30天未活跃但更早前活跃过) #when date(in_time) < 30_date and not in 近30天活跃过的用户 select distinct uid as ls_u from tb_user_log,attr where date(in_time)< 30_date and uid not in( select uid from tb_user_log,attr where date(in_time) >= 30_date ) )#输出:101 ,temp3 as( #沉睡用户(近7天未活跃但更早前活跃过) #when date(in_time) < 7_date and not in 近7天活跃过的用户 select distinct uid as cs_u from tb_user_log,attr where date(in_time) < 7_date and uid not in( select uid from tb_user_log,attr where date(in_time) >= 7_date ) )#输出:103、101(select '新晋用户' as user_grade, count(*) as ratiofrom tempgroup by user_grade)union(select '忠实用户' as user_grade, count(*) as ratiofrom temp1group by user_grade)union(select '流失用户' as user_grade, count(*) as ratiofrom temp2group by user_grade)union(select '沉睡用户' as user_grade, count(*) as ratiofrom temp3group by user_grade)方法二:with 子句 + case whenwith main as( #用户第一次活跃的时间,用户最后活跃的时间 select uid, min(date(in_time)) as min_date, max(date(in_time)) as max_date from tb_user_log group by uid ), attr as( #近7天和近30天,涉及两个不同的活跃条件 #先求出今日是哪一天,往前七天是哪一天,往前30天是哪一天 select max(date(in_time)) as now_date, date_sub(max(date(in_time)),interval 6 day) as 7_date, date_sub(max(date(in_time)),interval 29 day) as 30_date from tb_user_log )#case when 判断就使每个类型的用户不会有重合。select (case when min_date >= 7_date then '新晋用户' when max_date >= 7_date then '忠实用户' when max_date <= 30_date then '流失用户' else '沉睡用户' end) as user_grade, round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratiofrom main,attrgroup by user_gradeorder by ratio desc方法三:case when + 日期函数:date_sub()#case when 判断就使每个类型的用户不会有重合。select (case when min_date >= 7_date then '新晋用户' when max_date >= 7_date then '忠实用户' when max_date <= 30_date then '流失用户' else '沉睡用户' end) as user_grade, round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratiofrom( #用户第一次活跃的时间,用户最后活跃的时间 select uid, min(date(in_time)) as min_date, max(date(in_time)) as max_date from tb_user_log group by uid) main,( #近7天和近30天,涉及两个不同的活跃条件 #先求出今日是哪一天,往前七天是哪一天,往前30天是哪一天 select max(date(in_time)) as now_date, date_sub(max(date(in_time)),interval 6 day) as 7_date, date_sub(max(date(in_time)),interval 29 day) as 30_date from tb_user_log) attrgroup by user_gradeorder by ratio desc方法四:case when + 日期函数: timestampdiff(day,expr1,expr2)#case when 判断就使每个类型的用户不会有重合。select (case when timestampdiff(day,min_date,'2021-11-04') <= 6 then '新晋用户' when timestampdiff(day,max_date,'2021-11-04') <= 6 then '忠实用户' when timestampdiff(day,max_date,'2021-11-04') >= 30 then '流失用户' else '沉睡用户' end) as user_grade, round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratiofrom( #用户第一次活跃的时间,用户最后活跃的时间 select uid, min(date(in_time)) as min_date, max(date(in_time)) as max_date from tb_user_log group by uid ) maingroup by user_gradeorder by ratio desc方法五:日期函数:datediff(expr1,expr2) <=6#case when 判断就使每个类型的用户不会有重合。select (case when datediff('2021-11-04',min_date) <= 6 then '新晋用户' when datediff('2021-11-04',max_date) <= 6 then '忠实用户' when datediff('2021-11-04',max_date) >= 30 then '流失用户' else '沉睡用户' end) as user_grade, round(count(distinct uid)/(select count(distinct uid) from tb_user_log),2) as ratiofrom( #用户第一次活跃的时间,用户最后活跃的时间 select uid, min(date(in_time)) as min_date, max(date(in_time)) as max_date from tb_user_log group by uid ) maingroup by user_gradeorder by ratio desc