Show HN: Searchable compression for JSON – ~99% page skip and sub-ms lookups

(github.com)

13 points | by kodomonocch1 16 hours ago

6 comments

esafak 15 hours ago

It looks like you want to make money off this file format? That seems difficult. You would need to build a product around it first. I suppose some kind of a search or observability company could get funded if you have a demo. But be warned that running a company involves a lot more than developing a secret sauce.

The easiest thing is to popularize it and get a well-paying job from your fame. Make some friends and start your company together.

stuartjohnson12 15 hours ago

From OP's GitHub: "I am a 20-year-old university student living in Japan. Although I'm a liberal arts major, I aspire to become an engineer."

Just FYI - this is most likely vibe coding that a sycophantic AI has persuaded OP is cutting edge research.

zahlman 15 hours ago

It doesn't exactly inspire confidence observing that the .see "archive" included in the zip distribution apparently gets further compressed by more than 2:1 within the zip archive....
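For anyone who wants to reproduce this kind of check, here is a minimal sketch that measures how much further a blob shrinks under plain deflate. The helper name residual_ratio is made up for illustration, and the sample inputs are synthetic stand-ins, not the actual .see file. A ratio near 1.0 means little redundancy is left; a ratio around 2 or more suggests the format is leaving easy gains on the table.

```python
import os
import zlib


def residual_ratio(data: bytes, level: int = 9) -> float:
    """Return original size divided by deflate-compressed size.

    Hypothetical helper: values close to 1.0 mean the input is already
    dense; larger values mean a general-purpose codec still finds redundancy.
    """
    if not data:
        raise ValueError("need at least one byte of input")
    return len(data) / len(zlib.compress(data, level))


# Synthetic stand-ins (assumptions, not the real archive):
dense = os.urandom(1 << 16)                     # behaves like well-compressed output
redundant = b'{"level": "info", "id": 1}\n' * 4000  # repetitive JSON lines

dense_ratio = residual_ratio(dense)
redundant_ratio = residual_ratio(redundant)
```

To run it against a real file, read the bytes with open(path, "rb").read() and pass them in. Note that some leftover compressibility is expected in any format that keeps per-page directories or filters uncompressed, so the number is a hint rather than a verdict.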

kodomonocch1 16 hours ago

Happy to answer design details (page layout, Bloom tuning, codec selection, failure modes). Minimal Python examples for exists(key) and positions(key) are in the repo. If anyone needs deeper materials (reproducible FULL benches, wheel artifacts, and design notes) we have an NDA-gated VDR; I can share the form on request.

duanhjlt 16 hours ago

Congrats on the release. The SEE approach—schema-aware delta, dictionaries, PageDir, and tuned Bloom filters—seems thoughtfully engineered. The tradeoff versus pure zstd makes sense if selective probes dominate TCO. I’ll try the quick demo; curious about failure modes and Bloom tuning across varied schemas.
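For readers unfamiliar with the page-skip idea mentioned here, a rough sketch follows. It is not SEE's actual layout or API: the BloomFilter class, build_pages, the page size, and the filter parameters are all assumptions chosen for illustration, and zlib stands in for whatever codec the project uses. Each page carries a small Bloom filter over the keys it contains, so an exists(key) probe only decompresses pages whose filter says "maybe".

```python
import hashlib
import json
import zlib


class BloomFilter:
    """Tiny Bloom filter: k probes into an m-bit array (illustrative sizes)."""

    def __init__(self, m_bits=1024, k_hashes=4):
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _probes(self, item):
        # Double hashing: derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._probes(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._probes(item))


def build_pages(records, page_size):
    """Split records into compressed pages, each with a filter over its keys."""
    pages = []
    for start in range(0, len(records), page_size):
        chunk = records[start:start + page_size]
        bloom = BloomFilter()
        for rec in chunk:
            for key in rec:
                bloom.add(key)
        blob = zlib.compress(json.dumps(chunk).encode("utf-8"))
        pages.append((bloom, blob))
    return pages


def exists(pages, key):
    """Return (found, pages_decoded), skipping pages the filter rules out."""
    decoded = 0
    for bloom, blob in pages:
        if not bloom.might_contain(key):
            continue  # page skipped without decompression
        decoded += 1
        chunk = json.loads(zlib.decompress(blob))
        if any(key in rec for rec in chunk):
            return True, decoded
    return False, decoded


# Synthetic data: 100 records, one of which has a rare key.
records = [{"id": i, "level": "info"} for i in range(100)]
records[57]["trace_id"] = "abc"
pages = build_pages(records, page_size=10)

found_rare, decoded_rare = exists(pages, "trace_id")
found_missing, decoded_missing = exists(pages, "no_such_key")
```

The skip rate depends on how rare the probed key is and on the filter's false-positive rate, which is where tuning across varied schemas would matter: a key present in every record defeats the filter entirely, while a larger bit array per page trades index size for fewer wasted decompressions.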

throwuxiytayq 15 hours ago

“Millisecond lookups” sounds funny when you work in game dev. Anyway, interesting idea, thanks for sharing. Where the code at, though?