SQL Notes | BiliBili Internship - 下架视频评论弹幕增量占比
数据来源表:b_dim.dim_ctnt_arch_business_tag_info_d
表格性质:全量表
需求背景:需要拉取手机游戏行业20250501-20250513时间段内下架视频的评论/弹幕增量占所有视频的评论弹幕增量的占比
整体逻辑:
- 拉取时间周期开始、时间周期结束时的所有视频评论、弹幕数、状态明细表
- 算出所有视频评论增量、弹幕增量、状态明细表
- 将2中的子表按不同状态sum
- 将3中的字表中的数据除一下得到占比
WITH
today_data AS ( -- 拉取至今日的所有视频的评论弹幕
SELECT
avid,
COALESCE(interact_map_acc['评论'], 0) AS comment_today,
COALESCE(interact_map_acc['弹幕'], 0) AS danmu_today,
state AS state_today, -- 视频下架 state != 0; 视频正常 state = 0
pubtime
FROM
b_dim.dim_ctnt_arch_business_tag_info_d
WHERE
log_date = '20250513'
),
former_data AS ( -- 拉取至月初的所有视频的评论弹幕
SELECT
avid,
COALESCE(interact_map_acc['评论'], 0) AS comment_former,
COALESCE(interact_map_acc['弹幕'], 0) AS danmu_former,
state as state_former
FROM
b_dim.dim_ctnt_arch_business_tag_info_d
WHERE
log_date = '20250501'
)
SELECT
SUM(
CASE
WHEN state_today != 0
AND state_former = 0 then diff_comment
ELSE 0
END
) * 1.0 / (
SUM(
CASE
WHEN state_today = 0
AND state_former = 0 then diff_comment
ELSE 0
END
) * 1.0
) AS comment_ratio, -- 下架视频增量评论占比
SUM(
CASE
WHEN state_today != 0
AND state_former = 0 then diff_comment
ELSE 0
END
) * 1.0, -- 下架视频增量评论条数
SUM(
CASE
WHEN state_today = 0
AND state_former = 0 then diff_comment
ELSE 0
END
) * 1.0, -- 所有增量评论条数
SUM(
CASE
WHEN state_today != 0
AND state_former = 0 then diff_danmu
ELSE 0
END
) * 1.0 / SUM(
CASE
WHEN state_today = 0
AND state_former = 0 then diff_danmu
ELSE 0
END
) * 1.0 AS danmu_ratio, -- 下架视频增量弹幕占比
SUM(
CASE
WHEN state_today != 0
AND state_former = 0 then diff_danmu
ELSE 0
END
) * 1.0, -- 下架视频增量弹幕条数
SUM(
CASE
WHEN state_today = 0
AND state_former = 0 then diff_danmu
ELSE 0
END
) * 1.0 -- 全部增量弹幕条数
FROM
(
SELECT
touch.avid,
td.comment_today, -- 至今日评论数
COALESCE(fd.comment_former, 0) as comment_former, -- 至月初评论数
td.danmu_today, -- 至今日弹幕数
COALESCE(fd.danmu_former, 0) as danmu_former, -- 至月初弹幕数
td.state_today, -- 今日视频状态
fd.state_former, -- 月初视频状态
td.comment_today - COALESCE(fd.comment_former, 0) as diff_comment,
-- 月初至今日视频评论增量
td.danmu_today - COALESCE(fd.danmu_former, 0) as diff_danmu
-- 月初至今日视频弹幕增量
FROM
(
SELECT DISTINCT
avid
FROM
bili_sycpb.dwd_bdata_common_arch_touch_a_d
WHERE
log_date = '20250513'
AND bdata_industry_name = '手机游戏'
) touch -- 特定日期和行业下的视频id去重明细表
LEFT JOIN today_data td ON touch.avid = td.avid
LEFT JOIN former_data fd ON touch.avid = fd.avid
)
-- WHERE
-- state_today != 0
-- AND state_former = 0
结果:
- comment: 1.67%
- 弹幕: 0.56%
- 好像写的有问题啊,我回顾的时候发现。我的分母里放的是“周期开始:状态正常;周期结束:状态正常”的聚合数,但是是不是应该放“周期开始:状态正常;周期结束:状态任意”的聚合数,就是 WHEN state_today = 0 AND state_former = 0 改为 WHEN state_today = 0 就可以了。离职了才发现,我晕。
- 还有好像还可以简洁一点,不用写touch这个子表,直接在today_data, former_data这两个初始明细表里边限制行业不就行了。

查看10道真题和解析