LLMs are very good at lossless compression when paired with arithmetic coding. But I didn't know it was possible to go in the reverse direction and do language modeling with a compressor. The quality isn't great, but I'm surprised it worked! Other compression algorithms (like PPMd) use variable-order n-gram context models under the hood and should do much better, although that's less interesting since they already contain a basic language model internally.
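For anyone wondering what "language modeling via a compressor" could look like, here is a minimal sketch. It is my own illustration, not necessarily how the linked project does it: for each candidate next character, compress the context plus that character and treat the extra compressed bits as a code length, then turn code lengths into probabilities with p ∝ 2^(-bits). The function names, the `alphabet` argument, and the `temperature` knob are all made up for this example, and zlib's byte-granular output makes the estimate very coarse.

```python
import zlib


def compressed_bits(data: bytes) -> int:
    """Size in bits of `data` after zlib compression at the highest level."""
    return 8 * len(zlib.compress(data, 9))


def next_symbol_distribution(context: str, alphabet: str, temperature: float = 1.0) -> dict:
    """Estimate a next-character distribution from compressed-size deltas.

    Hypothetical helper: a candidate that costs fewer extra bits to
    compress after `context` is treated as more likely.
    `alphabet` must be non-empty.
    """
    base = compressed_bits(context.encode("utf-8"))
    # Extra bits needed to encode each candidate after the context.
    costs = {
        ch: compressed_bits((context + ch).encode("utf-8")) - base
        for ch in alphabet
    }
    # Code length -> unnormalized probability, then normalize.
    weights = {ch: 2.0 ** (-cost / temperature) for ch, cost in costs.items()}
    total = sum(weights.values())
    return {ch: w / total for ch, w in weights.items()}


if __name__ == "__main__":
    dist = next_symbol_distribution("abcabcabcabcabcabcab", "abc")
    for ch, p in sorted(dist.items(), key=lambda kv: -kv[1]):
        print(repr(ch), round(p, 3))
```

Because zlib sizes only change in whole bytes, many candidates will tie. A compressor with an explicit probability model (PPMd, for example) would give much finer-grained estimates, which is the point made above.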