Not as many changes to the files under library as I expected to see. Most changes seemed to be under a single ‘add stuff’ commit. If some of the solvers are randomised, then repeatedly running and recording best solution found will continually improve over time and give the illusion of the agent making algorithmic advancements, won’t it?
Prof. Cunxi Yu and his students at UMD is working on this exact topic and published a paper on agents for improving SAT solvers [1].
I believe they are extending this idea to EDA / chip design tools and algorithms which are also computationally challenging to solve. They have an accepted paper on this for logic synthesis which will come out soon.
It should be noted that MaxSAT 2024 did not include z3, as with many competitions. It’s possible (I’d argue likely) that the agent picked up on techniques from Z3 or some other non-competing solver, rather than actually discovering some novel approach.
Funnily, this was precisely the question I had after posting this (and the topic of an LLM disagreement discussed in another thread). Turns out not, but sibling comment is another confounding factor.
as its from 2024 (MaxSAT was not held in 2025), its quite likely all the solvers are in the training data. so the interesting part here is the instances for which we actually got better costs that what is currently known (in the best-cost.csv) file.
Not as many changes to the files under library as I expected to see. Most changes seemed to be under a single ‘add stuff’ commit. If some of the solvers are randomised, then repeatedly running and recording best solution found will continually improve over time and give the illusion of the agent making algorithmic advancements, won’t it?
Prof. Cunxi Yu and his students at UMD is working on this exact topic and published a paper on agents for improving SAT solvers [1].
I believe they are extending this idea to EDA / chip design tools and algorithms which are also computationally challenging to solve. They have an accepted paper on this for logic synthesis which will come out soon.
[1] "Autonomous Code Evolution Meets NP-Completeness", https://arxiv.org/abs/2509.07367
nice. EDA indeed one of the top applications of SAT
It should be noted that MaxSAT 2024 did not include z3, as with many competitions. It’s possible (I’d argue likely) that the agent picked up on techniques from Z3 or some other non-competing solver, rather than actually discovering some novel approach.
Is z3 competitive in SAT competitions? My impression was that it is popular due to the theories, the python API, and the level of support from MSR.
Funnily, this was precisely the question I had after posting this (and the topic of an LLM disagreement discussed in another thread). Turns out not, but sibling comment is another confounding factor.
Or for that matter even from later versions of the same solvers that were in its training data!
True. I’d be curious whether a combination of matching comp/training cutoff and censoring web searches could yield a more precise evaluation.
as its from 2024 (MaxSAT was not held in 2025), its quite likely all the solvers are in the training data. so the interesting part here is the instances for which we actually got better costs that what is currently known (in the best-cost.csv) file.
sounds like AlphaDev [1] might be a better approach for a problem like this.
[1] https://github.com/google-deepmind/alphadev
What counts as “our cost”? How long it takes to find the MaxSAT?
the sum of the weights of the unsatistied clauses. we want to reduce this number
anyone else finding that agent architectures are way more expensive than expected?