One of those rare papers where the code speaks for itself. They run a bunch of comparisons, but the most salient pits Karpathy's autoresearch loop (verbatim, as best I can tell) against standard HPO algorithms, and so far the Tree-structured Parzen estimator still wins out -- but just barely!
More interesting, though, is that the best results come from 'centaur' approaches, where an LLM is hooked up to a standard HPO algorithm. Somewhere around a 1:3 LLM:HPO control split seems to work best, with more LLM control degrading performance. Either way, this setup far outperforms both the naive autoresearch loop and the bare HPO approach (a rough sketch of the control split follows the quotes below).
> Centaur outperformed all methods including CMA-ES alone by using the LLM on only 30% of trials. The LLM receives CMA-ES's full internal state (mean vector, step-size, covariance matrix), the top-5 configurations, and the last 20 trials. A 0.8B LLM already suffices to outperform all classical and pure LLM methods. Scaling from 0.8B (0.9766) to 27B (0.9763) to Gemini Pro (0.9767) yields no improvement, suggesting a capability plateau [which Claude slightly beats]
> We ablate the LLM ratio: higher ratios degrade performance, confirming that CMA-ES should retain majority control.
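For concreteness, here's roughly what that control split looks like as a loop. This is a minimal sketch, assuming the pycma library; `llm_propose` is a hypothetical stand-in for the LLM call (its signature and the toy objective are mine, not the paper's), and the state it receives mirrors what the quote above describes.

```python
import random
import cma  # pycma: pip install cma

LLM_RATIO = 0.3  # the paper's sweet spot: LLM steers ~30% of trials
N_TRIALS = 100

def objective(x):
    """Stand-in for the real HPO objective (e.g. validation loss)."""
    return sum(xi ** 2 for xi in x)

def llm_propose(mean, sigma, cov, top5, recent):
    """Hypothetical LLM call. Per the paper, the prompt carries CMA-ES's
    full internal state (mean vector, step-size, covariance matrix),
    the top-5 configurations, and the last 20 trials; the reply is
    parsed back into candidate vectors. Implementation left open."""
    raise NotImplementedError

es = cma.CMAEvolutionStrategy(x0=[0.5] * 4, sigma0=0.3)
history = []  # (solution, value) pairs

for _ in range(N_TRIALS):
    if history and random.random() < LLM_RATIO:
        # LLM turn: feed it the optimizer's state and inject its
        # proposals, so CMA-ES folds them into its own update.
        proposals = llm_propose(
            es.mean, es.sigma, es.C,
            top5=sorted(history, key=lambda t: t[1])[:5],
            recent=history[-20:],
        )
        es.inject(proposals)
    # CMA-ES turn (and the carrier for any injected LLM proposals)
    solutions = es.ask()
    values = [objective(x) for x in solutions]
    es.tell(solutions, values)
    history.extend(zip(solutions, values))

print("best:", es.result.xbest, es.result.fbest)
```

The key design point is that the LLM never replaces the optimizer: its candidates are injected into CMA-ES's own ask/tell cycle, so the covariance update stays in charge, which is consistent with the ablation showing that giving the LLM majority control degrades performance.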