最全的两种解法 | 最长连续登录天数
最长连续登录天数
https://www.nowcoder.com/practice/cb8bc687046e4d32ad38de62c48ad79b
本题两种解法
1.使用LAG()OVER和SUM()OVER()两个窗口函数
使用LAG用于找出连续天的节点,如果和前一个日期相减不为0,则为新的开始
SELECT
user_id,
fdate,
CASE
WHEN prev_date IS NULL THEN 1
WHEN DATEDIFF(fdate, prev_date) = 1 THEN 0
ELSE 1
END AS is_new_group
FROM
(
SELECT
user_id,
fdate,
LAG(fdate) OVER (
PARTITION BY
user_id
ORDER BY
fdate
) AS prev_date
FROM
tb_dau
WHERE
fdate BETWEEN '2023-01-01' AND '2023-01-31'
) t1
随后使用SUM()OVER()函数对其进行累加,由于中间天数为0,起始点为1,故只要发生数字的变化,就是新的连续开始
例如1,1,2,2,2,3,3
SELECT
user_id,
fdate,
SUM(is_new_group) OVER (
PARTITION BY
user_id
ORDER BY
fdate
) AS grp
最后只需使用GROUP BY 对分好类的数字统计,选取数量最多的就是正解
SELECT
user_id,
MAX(consec_days) AS max_consec_days
总代码为
SELECT
user_id,
MAX(consec_days) AS max_consec_days
FROM
(
SELECT
user_id,
grp,
COUNT(*) AS consec_days
FROM
(
SELECT
user_id,
fdate,
SUM(is_new_group) OVER (
PARTITION BY
user_id
ORDER BY
fdate
) AS grp
FROM
(
SELECT
user_id,
fdate,
CASE
WHEN prev_date IS NULL THEN 1
WHEN DATEDIFF(fdate, prev_date) = 1 THEN 0
ELSE 1
END AS is_new_group
FROM
(
SELECT
user_id,
fdate,
LAG(fdate) OVER (
PARTITION BY
user_id
ORDER BY
fdate
) AS prev_date
FROM
tb_dau
WHERE
fdate BETWEEN '2023-01-01' AND '2023-01-31'
) t1
) t2
) t3
GROUP BY
user_id,
grp
) t4
GROUP BY
user_id;
2.使用日期-行号,也就是标准答案中的做法
首先使用ROW_NUMBER()OVER()输出行号,随后用日期减去行号,会得到一个新的日期,但这个日期我们可以不当作日期来看待,其实就是方法一中最后输出的那个常数1,2,3,最后一步只需对其进行COUNT并找出最大的那个即可
WITH logs AS (
-- 1. 先按天去重(一天多次登录算一天)
SELECT DISTINCT user_id, fdate
FROM tb_dau
WHERE fdate BETWEEN '2023-01-01' AND '2023-01-31'
),
rn AS (
-- 2. 对每个用户按日期排序并编号
SELECT
user_id,
fdate,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY fdate) AS date_order
FROM logs
),
grp AS (
-- 3. 用 “日期 - 行号” 当分组 key
SELECT
user_id,
DATE_SUB(fdate, INTERVAL date_order DAY) AS grp_key
FROM rn
),
streak AS (
-- 4. 每个连续区间的长度
SELECT
user_id,
grp_key,
COUNT(*) AS consec_day
FROM grp
GROUP BY user_id, grp_key
)
-- 5. 每个用户的最长连续天数
SELECT
user_id,
MAX(consec_day) AS max_consec_days
FROM streak
GROUP BY user_id;
查看29道真题和解析