SQL连续N天活跃用户
题目描述:查询连续登陆不少于3天的新注册用户
某产品在2022年2月8日各端口用户注册信息及后几日登录信息如下:
该题有两种解题方法,使用row_number排序或者lead窗口函数
方法1(row_number):
1.先筛选出login_tb中的user_id在register_tb的user_id,确保都是新注册用户,并考虑对login_tb进行去重,作为t1
2.对t1使用窗口函数row_number排序,作为t2,通过date_sub生成辅助时间列result,作为t3
3.对t3按照user_id和result进行group by,通过having筛选出数量大于等于3,最后加上排序
易错点:
1.忘记筛选新注册用户
2.最后外层表需要groupby到辅助时间列result层
3.date_sub里面使用的是interval rn day
WITH
reg_login AS (
SELECT DISTINCT
l.user_id,
DATE(l.log_time) AS login_date
FROM
login_tb l
WHERE
l.user_id IN (
SELECT
user_id
FROM
register_tb
)
)
select
user_id
from
(
select
user_id,
date_sub(login_date, interval rn day) result
from
(
select
user_id,
login_date,
row_number() over (
partition by
user_id
order by
login_date
) rn
from
reg_login
) t1
) t3
group by
user_id,
result
having
count(*) >= 3
order by
user_id
方法2(lead):
1.先筛选出login_tb中的user_id在register_tb的user_id,确保都是新注册用户,并考虑对login_tb进行去重,作为t1
2.对t1使用窗口函数lead(login_date, n),n为连续的天数,作为t2
3.对t2使用where判断datediff相减是否为n-1,并且对user_id进行去重,因为没有group by
易错点:
1.忘记筛选新注册用户
2.lead中最关键的是order by log_date
3.最后一步是去重distinct user_id
with
login_tb_fliter as (
select
user_id,
date(log_time) log_date
from
login_tb
where
user_id in (
select
user_id
from
register_tb
)
)
select
distinct user_id
from
(
select
user_id,
log_date,
lead(log_date, 2) over (
partition by
user_id
order by
log_date
) lead_date
from
login_tb_fliter
) t1
where datediff(lead_date,log_date) = 2
order by user_id