Reinforcement Learning With AI Feedback…

Sep 16, 2023

The problem with RLHF is that it does not scale. To do RLHF, you need human labelers to label the generations produced by your SFT (supervised fine-tuned) model.



Aziz et al. Paper Summaries
