This seems like a near-perfect use of coding LLMs and a useful way to implement reinforcement learning.
“Add a major bug to this file that is not covered by existing tests” vs “Find the bug in this file” vs “Write a sensible test in this file that protects against this type of bug”
Somehow that reminds me of how diffusion models are trained.
I'm pretty sure that's the premise for GANs (generative adversarial networks) rather than diffusion. Diffusion is more about noise reduction than pitting models against each other.
This is called mutation testing and is very common in formal silicon verification.
https://en.wikipedia.org/wiki/Mutation_testing
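To make that concrete, here's a minimal sketch of what a single mutant looks like. The function and the particular mutation are made up for illustration, and aren't necessarily what cargo-mutants generates:

    // Original function under test:
    fn total(prices: &[u32]) -> u32 {
        prices.iter().sum()
    }

    // ...and one possible mutant of it (replacing the original):
    //
    //     fn total(_prices: &[u32]) -> u32 {
    //         0
    //     }
    //
    // If the whole suite still passes with the mutant in place,
    // no test actually checks total()'s return value.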
A downside of naively changing source code is that you have to recompile the code for every mutant, which can be very slow (especially for Rust!). Obviously the right thing to do is to decide at runtime whether to insert a bug or not for each mutation point.
I had a brief skim through the help for cargo-mutants and it looks like it takes the naive approach, which is rather unfortunate.
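For illustration, here's a minimal sketch of the runtime approach (sometimes called mutant schemata). The MUTANT environment variable and the mutation IDs are made up for this example; this is not how cargo-mutants works today:

    use std::env;
    use std::sync::OnceLock;

    // Which mutant (if any) is active for this run, read once from
    // an environment variable chosen by the test driver.
    fn active_mutant() -> u32 {
        static MUTANT: OnceLock<u32> = OnceLock::new();
        *MUTANT.get_or_init(|| {
            env::var("MUTANT")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(0)
        })
    }

    fn total(prices: &[u32]) -> u32 {
        // Mutation point #1: the bug is compiled in once and selected
        // at runtime, so there's a single build for all mutants.
        if active_mutant() == 1 {
            0 // mutant: ignore the input entirely
        } else {
            prices.iter().sum()
        }
    }

The driver then runs the suite once per mutant (MUTANT=1 cargo test, MUTANT=2 cargo test, ...), paying the compile cost only once.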
Here's a list of mutation testing tools for various languages: https://github.com/theofidry/awesome-mutation-testing
If you're looking for an option for Go, https://github.com/gtramontina/ooze can be of help. It was heavily inspired by https://github.com/zimmski/go-mutesting
At RustConf 2024 in Montreal, cargo-mutants creator Martin Pool gave an excellent presentation. It was one of the best sessions of the conference.
https://www.youtube.com/watch?v=PjDHe-PkOy8
This is a cool project. Related fun anecdote: I once found an application at work where just about the ENTIRE test suite was a no-op, because the author (and subsequent copy-pasters) misunderstood a GTest feature. Yep, dozens of unit tests which did not actually test anything. Fortunately the application mostly worked and wasn't critical.

Experiences like these make me want to write "negative tests" that verify the tests themselves can fail.
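A minimal sketch of one such negative test in Rust (the add function is a made-up stand-in): feed the assertion a deliberately wrong expectation and require that it actually fires.

    fn add(a: i32, b: i32) -> i32 {
        a + b
    }

    // Negative test: prove the assertion machinery can fail at all.
    // If assert_eq! were somehow a no-op (as in the GTest anecdote),
    // this test would not panic and #[should_panic] would flag it.
    #[test]
    #[should_panic]
    fn test_harness_can_actually_fail() {
        assert_eq!(add(2, 2), 5);
    }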
That's why I practice "red, green, refactor": I must see a new test fail once before I believe that its passing means anything.
This generalizes into the strategy that learning and programming are both skimming the edge of an envelope, alternating between things that work and things that almost work.
Neat.
Is this to be used in addition to the tools mentioned in this talk? https://youtube.com/watch?v=qfknfCsICUM
Really cool! I wish there was such a thing for JavaScript.
I say this as a so-so software engineer. I badly wish there was more emphasis on increasing software quality. There is so much the industry could do to radically improve quality, such as tools like this.
I know the incentives just aren't there, but still.
There is such a thing. Basically every popular language has a mutation-testing framework; it's pretty common in large-scale projects.
Good news: it is a thing in JS!
https://stryker-mutator.io/
There was some hype about it some years ago.
I wonder if there's something similar to run on a C codebase.
There's mull[1], which is based on the LLVM toolchain and can accept any language that compiles to LLVM bitcode. SQLite[2] also does mutation testing, by compiling and then mutating the generated assembly code.

[1]: https://mull.readthedocs.io/en/0.26.0/
[2]: https://sqlite.org/testing.html#mutation_testing
There are a few. See: https://github.com/theofidry/awesome-mutation-testing
Need to replace the :zombie: with a
Hacker News blocks emojis
:V