题解2021年11月每天新用户的次日留存率 为何不先过滤?

2021年11月每天新用户的次日留存率

https://www.nowcoder.com/practice/1fc0e75f07434ef5ba4f1fb2aa83a450

1.错误解法

select dt,round(avg(if(timestampdiff(day,in_time,lead_time)=1,1,0)),2)
from(
    select uid, date_format(in_time,'%Y-%m-%d') dt, in_time,out_time, lead(in_time) over(partition by uid order by in_time) lead_time,row_number() over(partition by uid order by in_time) rk
    from tb_user_log
    where date_format(in_time,'%Y-%m') = '2021-11'
)t1
where rk = 1
group by dt
order by dt

2.正确解法

select dt,round(sum(if(timestampdiff(day,dt,lead_time)=1,1,0))/count(*),2) from(
    select *, row_number() over(partition by uid order by dt) rk,
    lead(dt,1) over(partition by uid order by dt) lead_time
    from(
        select uid,date(in_time) as dt from tb_user_log
        union 
        select uid,date(out_time) as dt from tb_user_log
    )t1
)t2
where rk = 1
group by dt
having date_format(dt,'%Y-%m') = '2021-11'
order by dt

3.为何如此

这里引用Cole4Youreyez在题解中的原话:

“该题有一个易错点,那就是题目中第一句话就说明了在十一月份的,可能做题人会考虑在建立表t时就直接where语句选出11月的,这样是错误的。 原因在于:在建立表t是where会先运行,也就是选择了11月的信息再进行排序。那么如果一个用户10月活跃过,在11月中,11月2号和3号活跃了,这样的操作会使得该用户11-02的排名rk=1,也会被我们认为是新用户,但是显然并非如此。”

使用having就能避免先where过滤掉的情况,即该用户在10月活跃过,时间来到11月,他不是新用户。

4.解体解答

4.1 首先求出所有的用户的情况

(因为题目要求如果in_time-进入时间和out_time-离开时间跨天了,在两天里都记为该用户活跃过)这一点比较棘手,需要用union过滤,

        select uid,date(in_time) as dt from tb_user_log
        union 
        select uid,date(out_time) as dt from tb_user_log

4.2 进行排序,并且用lead函数提出下一次用户出现的时间

    select *, row_number() over(partition by uid order by dt) rk,
    lead(dt,1) over(partition by uid order by dt) lead_time
    from(
        select uid,date(in_time) as dt from tb_user_log
        union 
        select uid,date(out_time) as dt from tb_user_log
    )t1

4.3 求得留存率

select dt,round(sum(if(timestampdiff(day,dt,lead_time)=1,1,0))/count(*),2) from(
    select *, row_number() over(partition by uid order by dt) rk,
    lead(dt,1) over(partition by uid order by dt) lead_time
    from(
        select uid,date(in_time) as dt from tb_user_log
        union 
        select uid,date(out_time) as dt from tb_user_log
    )t1
)t2
where rk = 1
group by dt
having date_format(dt,'%Y-%m') = '2021-11'
order by dt

rk为1证明第一次出现,having在最后使用避免出现用户非新用户的情况

全部评论

相关推荐

野猪不是猪🐗:我assume that你must技术aspect是solid的,temperament也挺good的,however面试不太serious,generally会feel style上不够sharp
点赞 评论 收藏
分享
评论
点赞
收藏
分享

创作者周榜

更多
牛客网
牛客企业服务