Reinforcement learning for recommendation systems