#reinforcement-learning
3 articles
Advanced
Online Learning and Cost Optimization: Routers Need to Evolve Too
#model-routing
#bandit
#reinforcement-learning
#pareto
#cost-optimization
Intermediate
Reinforcement Learning Foundations: From Agent to Bellman Equation
#reinforcement-learning
#mdp
#bellman-equation
#value-function
#q-learning
Intermediate
When RL Meets LLM: From Language Generation to Policy Optimization
#reinforcement-learning
#llm
#post-training
#rlhf
#policy-optimization
#alignment