Content on this site is AI-generated and may contain errors. If you find issues, please report them at GitHub Issues.
LLM Learning
#advantage
2 articles
Intermediate
Policy Gradient: Directly Optimizing the Policy
#policy-gradient
#reinforce
#baseline
#variance-reduction
#advantage
Advanced
Actor-Critic and PPO: Stable Policy Optimization
#actor-critic
#ppo
#gae
#advantage
#clipping
#trust-region