49 comments

  • logannyeMD 16 hours ago

    Hey guys, this is my GitHub repo. Glad it's received some interest - I figured HN might be the culprit when it suddenly jumped ~100 stars despite my not working on the code base since last year. I prototyped this out of personal curiosity last year and moved on abruptly, so there are a lot of gaps I still need to close and knobs that need to be optimized. But if people genuinely find the "deterministic genomics workloads on edge devices" proposal useful, I'll begin refining the code tonight and try to make it as useful as possible. If you have any particular bioinformatics tasks or use cases that you want to be feasible on edge devices, lmk and I'll work on integrating new capabilities. Always happy to be helpful.

  • devlovstad 5 hours ago

    I work with genomics pipelines in my day job. This repo does not seem quite ready for serious use until it is compared against existing tools such as Bowtie 2/samtools/Strelka or similar. For cancer genomes, it's also a bit limiting that it only calls SNVs/indels and not structural variants.

  • mriet 12 hours ago

    Realistically, without data from a large testset that compares this thoroughly to Samtools (and others?), I wouldn't touch this.

    Note to the OP: could you specify a focus, please? Short, long, or mega-long reads? Bacterial, human, small-plant, or large-plant genomes? Alignment heuristics and performance differ significantly along those axes.

  • a_bonobo 12 hours ago

    There has been a bit of a 'trend' to rewrite common bioinformatics/comp-bio tools in faster languages (Rust) via LLMs; OP's repo seems to be an early example.

    Seqera Labs has a bit of a manifesto: https://rewrites.bio/

    Heng Li has an overview here too: https://lh3.github.io/2026/04/17/the-ai-rewrite-dilemma

    IMHO it's... OK? Bioinformatics code quality is generally poor: untrained biologists write functioning code that is poorly scoped, but works. (Unguided) LLMs write at that level, too, so not much harm done.

  • p4ul 20 hours ago

    This is interesting; thanks for sharing! I have been curious about the adoption of Rust in computational biology. I know that the folks at Saint Jude's [1] are also using Rust for their 'omics research.

    [1] https://github.com/stjude-rust-labs

  • samuell 4 hours ago

    I shared this because it seems to address a niche similar to one I have hoped to develop one day, based on FlowBase [1]: a library of streaming processing components built from basic operations, which can easily be stitched together into larger pipelines in a compiled language that can also run on smaller hardware.

    Neither FlowBase nor I had many ideas about how to keep data structures compact, as the linked library does; I was mostly aiming to make it really easy to build streaming pipelines.

    I haven't yet got my head around what the composability story is in rosalind, though, so I would be interested in any pointers or examples of how this would be done with it.

    [1] https://github.com/flowbase/flowbase
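
    For readers wondering what a pipeline of small streaming components might look like in a compiled language, here is a minimal sketch using plain Rust iterator adapters. All names (`Record`, `fasta_records`, `gc_fraction`, `high_gc_ids`) are hypothetical and for illustration only; this is not FlowBase's or rosalind's API. Because iterator adapters are lazy, each stage pulls one record at a time, so memory stays bounded by the largest single record rather than the whole file.

```rust
use std::iter::Peekable;

/// A minimal FASTA record (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Record {
    id: String,
    seq: String,
}

/// Streaming FASTA parser: pulls lines lazily and yields one record at a time.
struct FastaReader<I: Iterator<Item = String>> {
    lines: Peekable<I>,
}

impl<I: Iterator<Item = String>> Iterator for FastaReader<I> {
    type Item = Record;

    fn next(&mut self) -> Option<Record> {
        // Skip anything before the next '>' header line; stop at end of input.
        let id = loop {
            let line = self.lines.next()?;
            if let Some(rest) = line.strip_prefix('>') {
                break rest.trim().to_string();
            }
        };
        // Concatenate sequence lines until the next header (or end of input).
        let mut seq = String::new();
        while let Some(line) = self.lines.next_if(|l| !l.starts_with('>')) {
            seq.push_str(line.trim());
        }
        Some(Record { id, seq })
    }
}

/// Source component: wrap any line iterator (file, stdin, in-memory) as records.
fn fasta_records<I: Iterator<Item = String>>(lines: I) -> FastaReader<I> {
    FastaReader { lines: lines.peekable() }
}

/// Basic operation: fraction of G/C bases (case-insensitive); 0.0 for empty input.
fn gc_fraction(seq: &str) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let gc = seq
        .bytes()
        .filter(|b| matches!(b.to_ascii_uppercase(), b'G' | b'C'))
        .count();
    gc as f64 / seq.len() as f64
}

/// A larger pipeline stitched together from the small components above:
/// parse -> filter by length -> filter by GC content -> keep the IDs.
fn high_gc_ids<I: Iterator<Item = String>>(
    lines: I,
    min_len: usize,
    min_gc: f64,
) -> impl Iterator<Item = String> {
    fasta_records(lines)
        .filter(move |r| r.seq.len() >= min_len)
        .filter(move |r| gc_fraction(&r.seq) >= min_gc)
        .map(|r| r.id)
}

fn main() {
    let input = ">r1\nACGT\nGG\n>r2\nATAT\n>r3\nGGCC\n";
    let ids: Vec<String> = high_gc_ids(input.lines().map(String::from), 4, 0.5).collect();
    println!("{:?}", ids); // prints ["r1", "r3"]
}
```

    With a real file you would feed the pipeline something like `std::io::stdin().lock().lines().map_while(Result::ok)` (with `std::io::BufRead` in scope) instead of an in-memory string; each new step is just another adapter in the chain.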

  • vfalbor an hour ago

    Have you compared it with other similar software such as BLAST, which is the most common?

  • croemer 16 hours ago

    Those are all the tests for alignment. They don't even check the alignment is correct. Just that there are no errors. This is a joke: https://github.com/logannye/rosalind/blob/main/tests/alignme...

    Looks like total slop to me. All code in one commit, then a bunch of commits polishing the Readme.

    No release, no updates in half a year.

  • Jerry2 an hour ago

    Awesome piece of software! Quick side question... does anyone have a recommendation for a DNA genotyping service that prioritizes privacy? I'm looking for a company that provides private results and doesn't add them to any sort of database (dystopian or otherwise). I'd love to get my DNA profile, but I'm concerned about privacy issues. :\

  • danborn26 4 hours ago

    Rust is a great fit for genomics. Processing whole genomes locally on a laptop is a huge step up from typical Python pipelines.

  • vatsachak 17 hours ago

    Looking at the commenting pattern, it unfortunately seems like AI.

    • jghn 17 hours ago

      The OP? They're not AI, they've been active on X and bsky for years.

      • vatsachak 16 hours ago

        Sorry, I meant the code in the repo

  • semiinfinitely 18 hours ago

    bioinformaticians have been making these useless bioinformatic-toolkit-in-my-favorite-programming-language repos for years

    • maxall4 18 hours ago

      Well, what else are we going to do while waiting for the bench scientists to finish collecting data?

    • asdff 16 hours ago

      Dissertationware is common in a lot of fields, honestly.

    • gilleain 18 hours ago

      Hate to agree, but it is true. For a while, I think, the main sequencing framework was in Perl (BioPerl). Not sure what was best for structures - possibly BioJava?

      It is very tempting, though - 'just' make a nice, clean API in your favourite language (eg Haskell, Ruby, ...) and everyone will flock to use it! Maybe.

      • alice-fishr 6 hours ago

        Why don't you mention Biopython? BioPerl is already too old and not very up to date with the newest data.

        • flobosg 5 hours ago

          He’s talking about the past (“For a while, …”). Up to early 2010s, I would say.

  • boron1006 18 hours ago

    Lots of bad smells in this repo.

    • the__alchemist 17 hours ago

      Do you have some examples to look at? I am curious.

      • boron1006 17 hours ago

        Well, the √t stuff looks like nonsense or way overblown, existing tools already do similar things, there’s pretty much a single commit with no follow-up commits, etc., etc.

  • shauniel 18 hours ago

    I would love to hear about what the sacrifices are, but this project really looks amazing.

  • Rijanhastwoears 18 hours ago

    > A deterministic genomics engine with a compact memory footprint.

    Uhh... are there stochastic genomics pipelines?

  • peterfirefly 19 hours ago

    Should have called it Raymond.

    • flobosg 19 hours ago

      • cmpb 19 hours ago

        I'm not familiar with Margaret Oakley Dayhoff, but I am aware that Rosalind Franklin [1] was extremely important for our understanding of DNA, comparable to Watson/Crick, with whom she co-discovered the structure of DNA. So it seems "Rosalind" is at least very appropriate as a name for a genomics tool such as this.

        Not to say the other names mentioned aren't also deserving of similar honors

        [1] https://en.wikipedia.org/wiki/Rosalind_Franklin

        • samuell an hour ago

          > So it seems "Rosalind" is at least very appropriate as a name for a genomics tool such as this.

          Indeed. The only argument against it might be that Rosalind is already a pretty well-known website for doing bioinformatics exercises and having them automatically graded:

          https://rosalind.info

        • philipallstar 16 hours ago

          Rosalind Franklin was the team lead of the research team that photographed DNA.

          The actual team member that took the key photo[0] was Raymond Gosling.

          That team didn't interpret the double helix structure of DNA that the photograph had captured - that was Watson and Crick working it out from the photograph.

          [0] https://en.wikipedia.org/wiki/Photo_51

          • groby_b 15 hours ago

            It's not quite that clear-cut. Franklin was pretty clear on the helical structure in both research notes and papers, but she didn't quite nail the overall structure (2 strands with opposing winding, complementary bases).

            Fundamentally, she suffered the curse of the experimental scientist - waiting for actual data before being willing to build a model. Watson & Crick postulated ahead based on partial data.

            • dnautics 11 hours ago

              > Franklin was pretty clear on the helical structure

              the type of diffraction her lab was doing only makes sense on helical structures. it being helical was already kind of? established -- linus pauling was contemporaneously working on some sort of alpha-helix inspired triple-helix model.

              watson and crick immediately recognized the position of the diffraction spots fit the distances suggested by their chemical modeling of a, t, c, g, which franklin was not able to do since she hadn't made a structural prediction.

              > postulated ahead based on partial data

              not quite. if you know that a, t, c, and g are the chemical building blocks, you can make a (possibly even literal) model and say, "this ball and stick model predicts diffractions here".

              this is arguably better science than waiting for data and fitting a model to the data, falsifiability and all that.

        • flobosg 18 hours ago

          > I'm not familiar with Margaret Oakley Dayhoff

          Then you’re one of today’s lucky 10,000. Any time!

  • bonsai_spool 18 hours ago

    Didn't see a publication or preprint for this - is there one?