
DRN: A deep reinforcement learning framework for news recommendation
Problems identified: existing methods fail to capture the dynamic nature of news recommendation.
First, they only try to model the current reward (e.g., clicks) and ignore longer-term returns.
Second, very few studies consider using user feedback other than click/no-click labels (e.g., how frequently a user returns) to help improve recommendations.
Third, these methods tend to keep recommending similar news to users, which may cause users to get bored.

ε-greedy problem: it may recommend items completely unrelated to the user's interests.

Value-estimate-based exploration (e.g., UCB) needs many trials before the estimates become accurate.
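For contrast, a minimal ε-greedy sketch, showing why purely random exploration can surface arbitrary, unrelated candidates (the function name and ε value are illustrative, not from the paper):

```python
import random

def epsilon_greedy_recommend(q_values, epsilon=0.1):
    """Pick one item index: with probability epsilon choose uniformly at
    random from ALL candidates (possibly totally unrelated ones),
    otherwise choose the item with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))  # unrelated items possible
    return max(range(len(q_values)), key=lambda i: q_values[i])
```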
*Contributions:*
1. A deep reinforcement learning framework for online news recommendation.
2. User activeness as an additional feedback signal, which works much better than click/no-click labels alone.
3. Dueling Bandit Gradient Descent for more effective exploration.
4. Experiments confirm the approach performs well in practice.

method:
We use a continuous state feature representation of users and a continuous action feature representation of items as inputs to the DQN.
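DRN scores each candidate with a dueling structure, Q(s, a) = V(s) + A(s, a), where the value stream sees only user/context features and the advantage stream sees user and news features together. A minimal linear sketch of that scoring (the weights, dimensions, and candidate counts are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
USER_DIM, NEWS_DIM = 8, 6

# Hypothetical linear weights for the two streams of a dueling Q-network.
W_v = rng.normal(size=USER_DIM)              # value stream: user features only
W_a = rng.normal(size=USER_DIM + NEWS_DIM)   # advantage stream: user + news

def q_value(user_feat, news_feat):
    """Q(s, a) = V(s) + A(s, a); linear streams for illustration only."""
    v = user_feat @ W_v                               # state value V(s)
    a = np.concatenate([user_feat, news_feat]) @ W_a  # advantage A(s, a)
    return float(v + a)

# Score 5 candidate news items for one user and take the top 3.
user = rng.normal(size=USER_DIM)
candidates = [rng.normal(size=NEWS_DIM) for _ in range(5)]
scores = [q_value(user, n) for n in candidates]
top_k = sorted(range(len(candidates)), key=lambda i: -scores[i])[:3]
```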
model framework:

  1. Push: when a user sends a news request to the system, the recommendation agent G takes the feature representations of the current user and the news candidates as input, and generates a top-k list L of news to recommend. L is generated by combining exploitation of the current model with exploration of novel news items.
  2. Feedback: user u, who has received the recommended news list L, gives feedback B through clicks on this set of news.
  3. Minor update: after each timestamp, using the feature representation of the previous user u, the news list L, and the feedback B, G compares the two DQNs, the exploitation Q network and the exploration Q network, to see which performs better; if the latter does, the current model is updated a small step toward the exploration network.
  4. Major update: experience replay. The agent keeps the recent historical clicks and user activeness records in a memory, and periodically samples from it to update the network.
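The minor-update duel in step 3 can be sketched as follows, in the spirit of Dueling Bandit Gradient Descent. Here `feedback_fn` is a hypothetical stand-in for the interleaved user feedback, and `alpha`/`eta` are illustrative exploration and update coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)

def minor_update(W, feedback_fn, alpha=0.1, eta=0.05):
    """One minor-update step: perturb the current weights to build an
    exploration network, compare both networks' recommendations via
    feedback, and nudge the weights toward the explorer only if it wins."""
    delta = alpha * rng.uniform(-1, 1, size=W.shape) * W  # random perturbation
    W_explore = W + delta
    if feedback_fn(W_explore) > feedback_fn(W):  # exploration network wins
        W = W + eta * delta                      # small step toward it
    return W
```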

User Activeness

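A minimal sketch of the user-activeness idea: a survival-style score that decays over time and jumps each time the user returns, capped at a maximum. The constants `S0`, `Sa`, `S_MAX`, and the decay rate `ALPHA` are illustrative values, not the paper's:

```python
import math

S0, Sa, S_MAX = 0.5, 0.32, 1.0   # initial level, per-return boost, cap
ALPHA = 1.2e-5                    # decay rate per second (assumed value)

def activeness(t_now, return_times):
    """User activeness at time t_now: the initial level and every past
    return's boost decay exponentially; the total is capped at S_MAX."""
    s = S0 * math.exp(-ALPHA * t_now)
    for t in return_times:
        if t <= t_now:
            s += Sa * math.exp(-ALPHA * (t_now - t))
    return min(s, S_MAX)
```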
