When training lots of models with subtly different parameters like this, is there anything to be learned from the differences in logprobs between them for the same input? Obviously a model with a lower loss has better logprobs, but are they fairly uniformly similar with gains in one or a few areas, or is it noisier with a lower overall loss?
> are they fairly uniformly similar with gains in one or a few areas, or is it noisier with a lower overall loss?
It seems like you want to know what the median, 5-95, or 1-99 percentile differences might be? I also wonder what the "residual" plot looks like... If there are too many residual data points for a scatter plot, then a histogram might be useful to visualize the modes. I suspect that as loss decreases, multiple modes should condense or altogether collapse into one.
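A rough sketch of how one might compute those summaries, assuming you already have per-token logprobs from two runs on the same input. The function name `summarize_logprob_diffs` and the synthetic arrays are made up for illustration; nothing here comes from the project's actual code.

```python
import numpy as np

def summarize_logprob_diffs(logprobs_a, logprobs_b, bins=20):
    """Summarize per-token logprob differences (model B minus model A).

    Hypothetical helper: both inputs are assumed to be per-token logprobs
    for the same tokens of the same input, in the same order.
    """
    a = np.asarray(logprobs_a, dtype=float)
    b = np.asarray(logprobs_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both models must be scored on the same tokens")

    diff = b - a  # positive: model B gave this token a higher logprob
    p1, p5, p50, p95, p99 = np.percentile(diff, [1, 5, 50, 95, 99])
    counts, edges = np.histogram(diff, bins=bins)  # residual histogram

    return {
        "mean": float(diff.mean()),
        "median": float(p50),
        "p5_p95": (float(p5), float(p95)),
        "p1_p99": (float(p1), float(p99)),
        "hist_counts": counts,
        "hist_edges": edges,
    }

# Synthetic stand-in data (assumption): real logprobs would come from the runs.
rng = np.random.default_rng(0)
model_a = rng.normal(-2.0, 1.0, size=10_000)
model_b = model_a + rng.normal(0.05, 0.3, size=10_000)

summary = summarize_logprob_diffs(model_a, model_b)
print(summary["median"], summary["p5_p95"], summary["p1_p99"])
```

A narrow 5-95 band around a small positive median would suggest a fairly uniform gain, while a wide band or several peaks in `hist_counts` would point to the noisier or concentrated-gain cases asked about above.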
First time I am seeing this or autoresearch in general. Incredibly cool. I can think of plenty of use cases this can apply to (e.g., drug research, trading).
Yeah, the obvious workloads are for training. I think I want to point this at RL next, but drug research is a really strong common-good target too. We were heavily inspired by Folding@home and BOINC.
The agents also monitor and follow research strategies regardless of performance baseline, so anything in the knowledge base, including local minima, is considered during strategy ideation. In theory you could use a Mac mini, for instance, and still produce results that help the aggregate.
Cool! However, when I click the commit_url links I get a 404 page on GitHub.
We thought about storing all of the commits on Ensue too, but we wanted to match the spirit of Andrej's original design, which leans heavily on GitHub. Curious what you were looking for when trying to inspect the code?
Could the website also make it clearer that you need a GPU to contribute?
I know it's a bit of a barrier... but I set one up on vast.ai really quickly and ran it for a day for the price of lunch. One of our teammates ran it from their old gaming PC too, and it still found novel strategies.
fwiw the agents just drop their whole solutions