Content on this site is AI-generated and may contain errors. If you find issues, please report at GitHub Issues.
LLM Learning
#reinforce
1 article
Intermediate
Policy Gradient: Directly Optimizing the Policy
#policy-gradient
#reinforce
#baseline
#variance-reduction
#advantage