The problem with RLHF is that it does not scale. To do RLHF, you need human labelers to label the generations produced by your SFT (supervised fine-tuned) model.