Batched reward model inference and Best-of-N sampling

(raw.sh)

16 points | by rawsh 6 hours ago

No comments yet.