SQL Notes | BiliBili Internship - 下架视频评论弹幕增量占比
数据来源表:b_dim.dim_ctnt_arch_business_tag_info_d
表格性质:全量表
需求背景:需要拉取手机游戏行业20250501-20250513时间段内下架视频的评论/弹幕增量占所有视频的评论弹幕增量的占比
整体逻辑:
- 拉取时间周期开始、时间周期结束时的所有视频评论、弹幕数、状态明细表
- 算出所有视频评论增量、弹幕增量、状态明细表
- 将2中的子表按不同状态sum
- 将3中的字表中的数据除一下得到占比
WITH today_data AS ( -- 拉取至今日的所有视频的评论弹幕 SELECT avid, COALESCE(interact_map_acc['评论'], 0) AS comment_today, COALESCE(interact_map_acc['弹幕'], 0) AS danmu_today, state AS state_today, -- 视频下架 state != 0; 视频正常 state = 0 pubtime FROM b_dim.dim_ctnt_arch_business_tag_info_d WHERE log_date = '20250513' ), former_data AS ( -- 拉取至月初的所有视频的评论弹幕 SELECT avid, COALESCE(interact_map_acc['评论'], 0) AS comment_former, COALESCE(interact_map_acc['弹幕'], 0) AS danmu_former, state as state_former FROM b_dim.dim_ctnt_arch_business_tag_info_d WHERE log_date = '20250501' ) SELECT SUM( CASE WHEN state_today != 0 AND state_former = 0 then diff_comment ELSE 0 END ) * 1.0 / ( SUM( CASE WHEN state_today = 0 AND state_former = 0 then diff_comment ELSE 0 END ) * 1.0 ) AS comment_ratio, -- 下架视频增量评论占比 SUM( CASE WHEN state_today != 0 AND state_former = 0 then diff_comment ELSE 0 END ) * 1.0, -- 下架视频增量评论条数 SUM( CASE WHEN state_today = 0 AND state_former = 0 then diff_comment ELSE 0 END ) * 1.0, -- 所有增量评论条数 SUM( CASE WHEN state_today != 0 AND state_former = 0 then diff_danmu ELSE 0 END ) * 1.0 / SUM( CASE WHEN state_today = 0 AND state_former = 0 then diff_danmu ELSE 0 END ) * 1.0 AS danmu_ratio, -- 下架视频增量弹幕占比 SUM( CASE WHEN state_today != 0 AND state_former = 0 then diff_danmu ELSE 0 END ) * 1.0, -- 下架视频增量弹幕条数 SUM( CASE WHEN state_today = 0 AND state_former = 0 then diff_danmu ELSE 0 END ) * 1.0 -- 全部增量弹幕条数 FROM ( SELECT touch.avid, td.comment_today, -- 至今日评论数 COALESCE(fd.comment_former, 0) as comment_former, -- 至月初评论数 td.danmu_today, -- 至今日弹幕数 COALESCE(fd.danmu_former, 0) as danmu_former, -- 至月初弹幕数 td.state_today, -- 今日视频状态 fd.state_former, -- 月初视频状态 td.comment_today - COALESCE(fd.comment_former, 0) as diff_comment, -- 月初至今日视频评论增量 td.danmu_today - COALESCE(fd.danmu_former, 0) as diff_danmu -- 月初至今日视频弹幕增量 FROM ( SELECT DISTINCT avid FROM bili_sycpb.dwd_bdata_common_arch_touch_a_d WHERE log_date = '20250513' AND bdata_industry_name = '手机游戏' ) touch -- 特定日期和行业下的视频id去重明细表 LEFT JOIN today_data td ON touch.avid = td.avid LEFT JOIN former_data fd ON touch.avid = fd.avid ) -- WHERE -- state_today != 0 -- AND state_former = 0
结果:
- comment: 1.67%
- 弹幕: 0.56%
- 好像写的有问题啊,我回顾的时候发现。我的分母里放的是“周期开始:状态正常;周期结束:状态正常”的聚合数,但是是不是应该放“周期开始:状态正常;周期结束:状态任意”的聚合数,就是 WHEN state_today = 0 AND state_former = 0 改为 WHEN state_today = 0 就可以了。离职了才发现,我晕。
- 还有好像还可以简洁一点,不用写touch这个子表,直接在today_data, former_data这两个初始明细表里边限制行业不就行了。