RLHFlow/ArmoRM-Llama3-8B-v0.1
Text Classification • 15.1k • 183
Reward models trained with the RLHFlow codebase (https://github.com/RLHFlow/RLHF-Reward-Modeling/)
Note: Bradley-Terry reward model trained with the RLHFlow codebase.
Note: Tech report covering the pairwise preference model.
Note: Tech report for ArmoRM.