sabo - Personal Feed - Nowcoder


2020-02-27 07:24

Tencent, Data Platform Department, Algorithm Engineer

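Lexical analysis - syntax analysis - intermediate code generation - optimization - target code generation. Target code takes three forms:
1. Assembly code: must be assembled first.
2. Absolute (directly executable) machine code: can be run directly.
3. Relocatable machine code: must be linked first.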
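A toy illustration of the lexing, parsing and code-generation phases, assuming a tiny language of integer `+`/`*` expressions. The function names (`tokenize`, `parse`, `codegen`, `run`) and the stack-machine instruction set are hypothetical, and the optimization phase is omitted. The output is assembly-like text (the first target form above), so `run` stands in for the assemble-and-execute step.

```python
# Toy compiler front end for integer expressions with + and *; all names are hypothetical.

def tokenize(src):
    """Lexical analysis: source text -> list of (kind, value) tokens."""
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch.isdigit():
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(("NUM", int(src[i:j])))
            i = j
        elif ch in "+*()":
            tokens.append(("OP", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens


def parse(tokens):
    """Syntax analysis: recursive descent where '*' binds tighter than '+'; returns a tuple AST."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() == ("OP", "+"):
            advance()
            node = ("+", node, term())
        return node

    def term():
        node = factor()
        while peek() == ("OP", "*"):
            advance()
            node = ("*", node, factor())
        return node

    def factor():
        kind, val = advance()
        if kind == "NUM":
            return ("num", val)
        if (kind, val) == ("OP", "("):
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {val!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree


def codegen(node):
    """Target code generation: post-order walk emitting stack-machine 'assembly' text."""
    if node[0] == "num":
        return [f"PUSH {node[1]}"]
    op, left, right = node
    return codegen(left) + codegen(right) + ["ADD" if op == "+" else "MUL"]


def run(code):
    """Stand-in for assembling and executing the emitted instructions."""
    stack = []
    for ins in code:
        if ins.startswith("PUSH"):
            stack.append(int(ins.split()[1]))
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if ins == "ADD" else a * b)
    return stack.pop()
```

For example, `codegen(parse(tokenize("2 + 3 * (4 + 1)")))` yields `PUSH 2, PUSH 3, PUSH 4, PUSH 1, ADD, MUL, ADD`, which `run` evaluates to 17.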


2020-02-26 09:02


Policy gradient

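Optimizing the average reward objective.
Policy gradient theorem: for a given state, sum the gradients over all actions (each weighted by its action value), then accumulate over states weighted by the state distribution μ: ∇r(π) = Σ_s μ(s) Σ_a ∇π(a|s) q_π(s,a).
Gaussian policies for continuous actions.
All the algorithms covered in the course.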
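For Gaussian policies, one common parameterization (assumed here) is a linear mean μ(s) = θ_μ·x(s) with a single log standard deviation parameter. The score function the policy gradient needs is then available in closed form: ∇_{θ_μ} log π(a|s) = (a − μ)/σ² · x(s) and ∂ log π/∂(log σ) = (a − μ)²/σ² − 1. The sketch computes both, and `reinforce_step` shows a REINFORCE-style update as one illustrative use; all names are hypothetical.

```python
import math


def log_prob(a, x, theta_mu, log_sigma):
    """log N(a; mu, sigma^2) with mu = theta_mu . x and sigma = exp(log_sigma)."""
    mu = sum(t * xi for t, xi in zip(theta_mu, x))
    sigma = math.exp(log_sigma)
    return -math.log(sigma) - 0.5 * math.log(2 * math.pi) - (a - mu) ** 2 / (2 * sigma ** 2)


def score(a, x, theta_mu, log_sigma):
    """Analytic gradient of log_prob with respect to (theta_mu, log_sigma)."""
    mu = sum(t * xi for t, xi in zip(theta_mu, x))
    var = math.exp(2 * log_sigma)
    grad_mu = [(a - mu) / var * xi for xi in x]
    grad_log_sigma = (a - mu) ** 2 / var - 1.0
    return grad_mu, grad_log_sigma


def reinforce_step(theta_mu, log_sigma, x, a, ret, alpha):
    """One REINFORCE-style update: theta <- theta + alpha * G * grad log pi(a|s)."""
    g_mu, g_ls = score(a, x, theta_mu, log_sigma)
    new_theta = [t + alpha * ret * g for t, g in zip(theta_mu, g_mu)]
    return new_theta, log_sigma + alpha * ret * g_ls
```

With a positive return, an action sampled above the current mean pulls the mean toward it. When (a − μ)² < σ², the standard deviation shrinks.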


2020-02-26 06:29


Coarse coding and function approximation

Coarse coding: coarse coding affects both generalization and discrimination. The size, number and shape of the features affect generalization. The way the feature shapes interact affects the ability to discriminate.
Exploration in function approximation: optimistic initial values should be localized when using function approximation. Imagine there were only a single feature: then no matter what, it would be...
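A minimal 1-D coarse-coding sketch, assuming binary features that are overlapping intervals of fixed `width` centered on a grid (the centers, width and step size are illustrative). A state activates every interval covering it. A weight update at one state therefore also changes the value of nearby states that share features (generalization), while distant states are unaffected. Wider intervals spread the update further and make nearby states harder to tell apart.

```python
def coarse_features(s, centers, width):
    """Binary feature vector: feature i is active if s lies within width/2 of centers[i]."""
    return [1.0 if abs(s - c) <= width / 2 else 0.0 for c in centers]


def value(w, s, centers, width):
    """Linear value estimate over the active coarse features."""
    return sum(wi * fi for wi, fi in zip(w, coarse_features(s, centers, width)))


def update(w, s, target, centers, width, alpha):
    """One gradient step on squared error; only weights of active features change."""
    x = coarse_features(s, centers, width)
    err = target - value(w, s, centers, width)
    return [wi + alpha * err * xi for wi, xi in zip(w, x)]
```

After one update toward 1.0 at s = 0.5 (centers every 0.1, width 0.3), the value at 0.52 becomes positive because it shares all three active intervals, while the value at 0.95 stays at 0.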


2020-02-25 14:14


Function approximation

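Generalization and discrimination: generalization means reusing what was learned about similar states; discrimination means telling different states apart.
Reinforcement learning in the online setting: reinforcement learning algorithms learn online, so not every supervised learning algorithm can be used in reinforcement learning. In supervised learning the target is fixed and does not change over time; in reinforcement learning the target changes as the agent's own value estimates change.
Gradient Mo...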
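The "moving target" point can be made concrete with a linear value function v̂(s, w) = w·x(s). Gradient Monte Carlo regresses toward the observed return G, a fixed label much like supervised learning. Semi-gradient TD(0) instead regresses toward r + γ w·x(s′), which depends on the current weights, and the gradient is taken only through v̂(s, w) (hence "semi-gradient"). This is a sketch of the standard updates, not code from the course; the function names are illustrative.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def gradient_mc_update(w, x, G, alpha):
    """Gradient MC: w <- w + alpha * (G - w.x) * x, with a fixed target G."""
    delta = G - dot(w, x)
    return [wi + alpha * delta * xi for wi, xi in zip(w, x)]


def td0_update(w, x, r, x_next, alpha, gamma, terminal=False):
    """Semi-gradient TD(0): the target r + gamma * w.x' itself depends on w."""
    v_next = 0.0 if terminal else dot(w, x_next)
    delta = r + gamma * v_next - dot(w, x)
    return [wi + alpha * delta * xi for wi, xi in zip(w, x)]
```

Running the same TD transition twice with different weights gives different targets, which is exactly what a fixed-label supervised method does not expect.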


2020-02-25 08:03


Dyna: Framework for reinforcement learning

Dyna: two types of experience.
Direct learning from experience generated by the actual environment.
Simulated experience from models, used for planning.
Dyna combines direct RL and planning. Planning methods (such as DP) and learning methods (such as MC and TD) share the same core: both compute value-function estimates with backup update rules. The difference is that the experience used in planning is model-generated simulated expe...
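A compact tabular Dyna-Q sketch, assuming a deterministic environment. Because outcomes are deterministic, the model simply memorizes the last observed (r, s′) for each (s, a), and behavior is ε-greedy. Each real step does one direct Q-learning update, one model update, and `n_planning` Q-learning updates on simulated transitions replayed from the model. The corridor environment and all parameter values are hypothetical.

```python
import random

N_STATES = 5  # hypothetical corridor: states 0..4, reaching state 4 ends the episode with reward 1


def env_step(s, a):
    """Deterministic corridor: action 0 moves left (clipped at 0), action 1 moves right."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    done = s2 == N_STATES - 1
    return (1.0 if done else 0.0), s2, done


def dyna_q(episodes=30, n_planning=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {}      # (state, action) -> estimated action value, default 0
    model = {}  # (state, action) -> (reward, next_state, done), last observed outcome

    def q(s, a):
        return Q.get((s, a), 0.0)

    def greedy(s):
        best = max(q(s, 0), q(s, 1))
        return rng.choice([a for a in (0, 1) if q(s, a) == best])  # random tie-breaking

    def q_update(s, a, r, s2, done):
        target = r if done else r + gamma * max(q(s2, 0), q(s2, 1))
        Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.randrange(2) if rng.random() < eps else greedy(s)
            r, s2, done = env_step(s, a)
            q_update(s, a, r, s2, done)       # direct RL from real experience
            model[(s, a)] = (r, s2, done)     # model learning
            for _ in range(n_planning):       # planning from simulated experience
                ps, pa = rng.choice(list(model))
                q_update(ps, pa, *model[(ps, pa)])
            s = s2
    return Q
```

Setting `n_planning=0` reduces this to plain one-step Q-learning, which makes the effect of planning easy to compare on the same seed.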


2020-02-24 02:56


Reinforcement learning for recommendation system

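I looked through Craig Boutilier's ICML 2019 material, which gives a formal framework for RL in recommender systems, so here is an excerpt.
State: user features, user history, context features.
Action: the recommended candidates (recommendation slate).
Reward: interaction behavior (immediate engagement).
Item interaction problem: the value of a slate depends on the user choice model.
Joint optimization with the user choice model.
Tractable slate optimization.
Decomposed Sarsa/TD.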
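The decomposition behind "Decomposed Sarsa/TD" (as in the SlateQ line of work) writes a slate's value as a choice-probability-weighted sum of item-level values: Q(s, A) = Σ_{i∈A} P(i | s, A) · Q̄(s, i). A minimal sketch, assuming a conditional-logit user choice model with a "no click" option. The scores, item names, `null_score` and `null_q` are hypothetical, and learning Q̄ itself (by SARSA/TD on the consumed item) is omitted. The brute-force optimizer is only for tiny candidate sets.

```python
import math
from itertools import combinations


def choice_probs(slate, scores, null_score=0.0):
    """Conditional-logit choice model: P(i|s,A) proportional to exp(score_i); the rest is 'no click'."""
    weights = {i: math.exp(scores[i]) for i in slate}
    z = sum(weights.values()) + math.exp(null_score)
    return {i: w / z for i, w in weights.items()}


def slate_value(slate, scores, item_q, null_score=0.0, null_q=0.0):
    """Decomposed slate value: sum_i P(i|s,A) * Qbar(s,i) plus the no-click term."""
    probs = choice_probs(slate, scores, null_score)
    p_null = 1.0 - sum(probs.values())
    return sum(p * item_q[i] for i, p in probs.items()) + p_null * null_q


def best_slate(items, k, scores, item_q):
    """Brute-force slate optimization over all size-k subsets (tiny candidate sets only)."""
    return max(combinations(items, k), key=lambda A: slate_value(A, scores, item_q))
```

Because items compete for the user's single choice, adding a low-value item can lower a slate's value. This is why slate optimization has to account for the choice model rather than simply summing item values.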


2020-02-24 15:02


Reinforcement learning: Alberta

Optimistic initial values: set an initial value larger than the maximum possible value.
This heuristic only drives early exploration.
It is not suited to non-stationary problems.
In practice we do not know the maximum value of each bandit arm.
Still, it is a good heuristic to combine with other methods.
explore...
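A small sketch of optimistic initial values on a stationary Bernoulli bandit, assuming a purely greedy agent with a constant step size α. The arm means, `q0 = 5.0`, α = 0.1 and the seed are illustrative. Every initial estimate exceeds any possible reward, so each pull "disappoints" and the agent moves on to other arms early. Once the estimates fall to realistic levels the exploration stops, which is why the trick only helps early on.

```python
import random


def optimistic_greedy(means, steps=1000, q0=5.0, alpha=0.1, seed=0):
    """Greedy bandit agent with initial estimates q0 and constant-step-size updates."""
    rng = random.Random(seed)
    q = [q0] * len(means)
    counts = [0] * len(means)
    for _ in range(steps):
        best = max(q)
        a = min(i for i, v in enumerate(q) if v == best)  # greedy, ties go to the lowest index
        r = 1.0 if rng.random() < means[a] else 0.0      # Bernoulli reward
        q[a] += alpha * (r - q[a])
        counts[a] += 1
    return q, counts
```

With `q0=5.0` every arm is tried within the first three steps. With `q0=0.0` the same greedy agent keeps pulling arm 0 for all 1000 steps, since its estimate never drops below the untouched zeros of the other arms.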


2020-02-22 08:01


Steiner tree template

Problem link: https://vjudge.net/problem/HDU-3311
Solution (template):
#include <bits/stdc++.h>
using namespace std;
int n,m,p;
const int N=1009,M=1e4+5+N*2,INF=1e9;
int idx= 0;
int cost[N],f[(1<<5)+1][N],e[M];
bool inq[N];
int h[N];
int w[M];
int ne[M];
int c;
queue<int> q;
void addEdge(int u,int v,in...
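The core of the standard Steiner tree DP, sketched separately in Python rather than by completing the truncated C++ above. With k terminals, f[S][v] is the minimum cost of a tree that connects terminal subset S and contains vertex v. Two transitions alternate for each S. First, merge two subtrees at the same vertex: f[S][v] = min over T ⊂ S of f[T][v] + f[S∖T][v]. Second, relax along edges with a shortest-path pass; Dijkstra is used here, while the template above appears to use a queue-based (SPFA-style) relaxation. This generic edge-weighted version does not model HDU-3311's per-node costs. Handling those, for example with an extra virtual node, is left as an assumption. The running time is O(3^k·n + 2^k·m log m).

```python
import heapq


def steiner_tree(n, edges, terminals):
    """Minimum Steiner tree cost on an undirected graph with vertices 0..n-1."""
    INF = float("inf")
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    k = len(terminals)
    f = [[INF] * n for _ in range(1 << k)]
    for i, t in enumerate(terminals):
        f[1 << i][t] = 0
    for S in range(1, 1 << k):
        # 1) merge two disjoint sub-trees that meet at the same vertex v
        sub = (S - 1) & S
        while sub:
            rest = S ^ sub
            for v in range(n):
                cand = f[sub][v] + f[rest][v]
                if cand < f[S][v]:
                    f[S][v] = cand
            sub = (sub - 1) & S
        # 2) Dijkstra relaxation: f[S][u] = min(f[S][u], f[S][v] + w(v, u))
        dist = f[S]
        pq = [(d, v) for v, d in enumerate(dist) if d < INF]
        heapq.heapify(pq)
        while pq:
            d, v = heapq.heappop(pq)
            if d > dist[v]:
                continue
            for u, w in adj[v]:
                if d + w < dist[u]:
                    dist[u] = d + w
                    heapq.heappush(pq, (dist[u], u))
    return min(f[(1 << k) - 1])
```

On a star where a hub connects three terminals with weight-1 edges and the terminals are pairwise joined by weight-3 edges, the answer is 3 (through the hub) rather than 6.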


2020-02-16 10:59


stochastic multi-armed bandits, regret minimization

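Chernoff bounds: apply Markov's inequality to e^{λX}, which gives P(X ≥ a) ≤ e^{-λa} E[e^{λX}] for every λ > 0; optimizing over λ then yields the bound.
Hoeffding's inequality.
Stochastic multi-armed bandits.
Time for a break.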
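Hoeffding's inequality bounds how far an empirical mean can stray from the true mean. For n i.i.d. samples in [0,1], P(μ̂_n − μ ≥ ε) ≤ exp(−2nε²). This is what justifies the confidence bonus in UCB-style bandit algorithms. A sketch of UCB1 with index μ̂_i + sqrt(2 ln t / n_i) on Bernoulli arms; the arm means, horizon and seed are illustrative. It is one standard algorithm, not necessarily the one in the notes. The returned pseudo-regret is Σ_i n_i Δ_i.

```python
import math
import random


def ucb1(means, horizon=2000, seed=0):
    """UCB1 on Bernoulli arms; returns pull counts and pseudo-regret sum_i n_i * gap_i."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            a = t - 1  # pull every arm once so each empirical mean is defined
        else:
            a = max(range(k), key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))  # Hoeffding-style confidence bonus
        r = 1.0 if rng.random() < means[a] else 0.0
        counts[a] += 1
        sums[a] += r
    pseudo_regret = horizon * max(means) - sum(c * m for c, m in zip(counts, means))
    return counts, pseudo_regret
```

Suboptimal arms are pulled only O(log T / Δ²) times in expectation, so most of the 2000 pulls go to the best arm.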


2020-02-16 08:01


Introduction to online optimization: online gradient descent

Online gradient descent.
Theorem: for any closed convex action set A with bounded norm and any subdifferentiable loss with bounded subgradient, the OGD strategy with a suitably chosen step size and starting point has regret that grows as the square root of the number of rounds.
Strongly convex loss with bounded subgradient.
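A projected online gradient descent sketch on a Euclidean ball of radius R, assuming the textbook update a_{t+1} = Π_A(a_t − η g_t) with g_t a subgradient of ℓ_t at a_t. One standard version of the bound reads as follows. If ‖a‖ ≤ R on A and every subgradient has norm at most G, then starting from a_1 = 0 with η = R/(G√n) gives regret at most R·G·√n; constants differ slightly between texts. The 1-D absolute loss |a − 0.5| used in the check is hypothetical.

```python
import math


def project_ball(a, R):
    """Euclidean projection onto {a : ||a|| <= R}."""
    norm = math.sqrt(sum(x * x for x in a))
    return a if norm <= R else [x * R / norm for x in a]


def ogd(subgrad, n, dim, R, G):
    """Projected OGD with a_1 = 0 and fixed step eta = R / (G * sqrt(n)); returns all plays."""
    eta = R / (G * math.sqrt(n))
    a = [0.0] * dim
    plays = []
    for t in range(n):
        plays.append(a)
        g = subgrad(a, t)  # subgradient of the round-t loss at the current play
        a = project_ball([x - eta * gx for x, gx in zip(a, g)], R)
    return plays
```

For the strongly convex case, the usual change is a decreasing step η_t = 1/(αt) for an α-strongly convex loss, which brings the regret down to logarithmic in n.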


2020-02-16 06:48


Introduction to online optimization: continuous exp strategy

Objective: extend the Exp strategy to the continuous Exp strategy.
Convex and bounded: for any convex loss taking values in [0,1], the continuous Exp strategy satisfies a regret bound for every positive parameter value. (Typing the proof out helps understanding, but it is too long.)
Exp-concave: for any exp-concave loss, the continuous Exp strategy with a matching parameter satisfies a regret bound.
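The continuous Exp strategy maintains a density proportional to exp(−η Σ_{s<t} ℓ(a, z_s)) over the convex action set and, for convex losses, can play its mean. The rough sketch below approximates that mean by assuming a 1-D action set [0,1], squared loss ℓ(a, z) = (a − z)², and a uniform grid. The grid is a discretization shortcut, not the continuous strategy itself, and `eta` and `grid_size` are illustrative.

```python
import math


def continuous_exp_mean(past_targets, eta, grid_size=1001):
    """Mean of the density proportional to exp(-eta * sum_s (a - z_s)^2), approximated on a grid over [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    cum = [sum((a - z) ** 2 for z in past_targets) for a in grid]  # cumulative loss of each action
    m = min(cum)  # subtract the minimum so exp() cannot underflow to all zeros
    w = [math.exp(-eta * (c - m)) for c in cum]
    total = sum(w)
    return sum(a * wi for a, wi in zip(grid, w)) / total
```

With no history the density is uniform and the play is 0.5. After many rounds with targets near 0.2, the density concentrates there and the play moves close to 0.2.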


2020-02-16 06:03


Checked in on Nowcoder for 4 days; working hard again today!


2020-02-16 05:33


Introduction to online optimization: introduction

Online learning protocol. Characteristic: limited feedback.
Exponentially weighted average forecaster.
Bounded convex loss and expert regret: Hoeffding's inequality (Lemma 2.1). Theorem: for any convex loss taking values in [0,1], the Exp strategy satisfies an expert-regret bound.
Exp-concave loss and expert regret.
Lower b...
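A minimal exponentially weighted average forecaster, assuming N experts, absolute loss |p − y| with predictions and outcomes in [0,1], and a fixed learning rate η. The Hoeffding-lemma argument gives expert regret at most (ln N)/η + nη/8. Tuning η = sqrt(8 ln N / n) turns this into sqrt((n/2) ln N). The expert predictions and outcomes in the check are hypothetical.

```python
import math


def exp_forecaster(expert_preds, outcomes, eta):
    """expert_preds[t][i] is expert i's prediction at round t; returns (forecaster loss, best expert loss)."""
    n_experts = len(expert_preds[0])
    cum_loss = [0.0] * n_experts
    total = 0.0
    for preds, y in zip(expert_preds, outcomes):
        m = min(cum_loss)  # shift by the minimum for numerical stability
        w = [math.exp(-eta * (L - m)) for L in cum_loss]
        p = sum(wi * xi for wi, xi in zip(w, preds)) / sum(w)  # weighted average prediction
        total += abs(p - y)
        for i, x in enumerate(preds):
            cum_loss[i] += abs(x - y)
    return total, min(cum_loss)
```

Because the absolute loss is convex in the prediction, playing the weighted average is never worse than the weighted average of the experts' losses. The theorem relies on this step.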


2020-02-12 06:47


WARP loss is a pairwise ranking loss used in recommender systems. The Paddle TagSpace implementation includes an implementation of the WARP loss:
def network(vocab_text_size, vocab_tag_size, emb_dim=10, hid_dim=1000, win_size=5, margin=0.1, neg_size=5):
    """ network definition """
    text = io.data(name="text", shape=[1], lod_level=1,...
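A scalar sketch of the WARP (Weighted Approximate-Rank Pairwise) idea, written independently of the truncated Paddle code. For a positive item, negatives are drawn until one violates the margin. If the first violator appears on draw N out of Y items in total, the positive's rank is estimated as ⌊(Y−1)/N⌋, and the hinge is weighted by L(k) = Σ_{j=1}^{k} 1/j. Negatives are passed in sampling order so the example is deterministic. `margin` plays the same role as the `margin` argument above; the other names are illustrative.

```python
def rank_weight(k):
    """L(k) = sum_{j=1..k} 1/j: errors near the top of the ranking cost more."""
    return sum(1.0 / j for j in range(1, k + 1))


def warp_loss(pos_score, sampled_neg_scores, num_items, margin=1.0):
    """WARP loss for one positive; sampled_neg_scores are negatives in the order they were drawn."""
    for n_trials, neg in enumerate(sampled_neg_scores, start=1):
        violation = margin - pos_score + neg
        if violation > 0:
            est_rank = (num_items - 1) // n_trials  # few draws needed -> positive is ranked low
            return rank_weight(est_rank) * violation
    return 0.0  # no violating negative within the sampling budget
```

A violator found on the first draw implies many items outrank the positive, so the same hinge gets a larger weight than one found after several draws.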


2020-02-12 03:06


What problem does a GNN solve: it targets non-Euclidean data.
Limitations: shallow architectures; extending semantic structure; feature fusion; exploring non-fixed-point methods.
Graph Types
Directed Graphs. Methods: use two kinds of weight matrices to incorporate more precise structural information.
Heterogeneous Graphs. The simplest way to process a heterogeneous graph is to convert the type of each node to a one-hot feature vect...
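A toy illustration of the simplest heterogeneous-graph approach. Each node's features are concatenated with a one-hot encoding of its type, followed by a single mean-aggregation propagation step. Real GNN layers add trainable weight matrices and nonlinearities, which are omitted here. The node types, features and graph are hypothetical.

```python
def one_hot(t, types):
    """One-hot encoding of node type t over the ordered list of types."""
    return [1.0 if t == ty else 0.0 for ty in types]


def add_type_features(features, node_types, types):
    """Concatenate each node's feature list with its one-hot type vector."""
    return {v: features[v] + one_hot(node_types[v], types) for v in features}


def mean_aggregate(h, adj):
    """One propagation step: new h_v = element-wise mean over v and its neighbors."""
    out = {}
    for v, hv in h.items():
        group = [hv] + [h[u] for u in adj[v]]
        out[v] = [sum(col) / len(group) for col in zip(*group)]
    return out
```

After one step, a user node connected to an item node carries information about both node types in its representation.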
