1 comment

  • noamteyssier 15 hours ago

    I was sitting around in meetings today and remembered an old shell script I had for counting the number of unique lines in a file. I gave it a shot in Rust, and with a little bit of (over-engineering)™ I managed to get 25x the throughput of the naive approach using coreutils, as well as improve on some existing tools.

    Some notes on the improvements:

    1. Using the csv crate (with serde) for writing the output leads to some big gains.

    2. Arena-allocating incoming keys and storing references in the hashmap, instead of owned values, heavily reduces the number of allocations and improves cache efficiency (I'm guessing; I did not measure).
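
    For anyone curious what point 2 looks like, here is a minimal sketch of the borrowed-key idea. It is not the tool's actual code: it assumes the whole input already sits in one buffer, which stands in for the arena, and the name `count_unique_lines` is made up for illustration. The hashmap stores `&str` slices into that buffer, so no per-key `String` is allocated.

```rust
use std::collections::HashMap;

/// Count how often each distinct line occurs in `input`.
/// Keys are `&str` slices borrowed from `input` (the stand-in "arena"),
/// so the map never allocates an owned copy of a line.
/// Hypothetical helper for illustration only.
fn count_unique_lines(input: &str) -> Vec<(&str, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for line in input.lines() {
        // `entry` hashes the borrowed slice; only the counter is stored by value.
        *counts.entry(line).or_insert(0) += 1;
    }

    let mut rows: Vec<(&str, usize)> = counts.into_iter().collect();
    // Sort by descending count, then by line, so the output is deterministic.
    rows.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    rows
}

fn main() {
    // Assumption: the input fits in memory as a single buffer.
    let input = "apple\nbanana\napple\ncherry\nbanana\napple\n";
    for (line, n) in count_unique_lines(input) {
        println!("{n}\t{line}");
    }
}
```

    With streaming input you can't borrow from a buffer that keeps growing, which is where a real arena comes in: each new key is copied once into arena-owned memory and the map keeps a reference to that copy.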

    There is also some regex functionality and table filtering built in.

    happy hacking