Show HN: Skim – 90% token reduction for LLM code analysis

(github.com)

2 points | by dean0x 9 hours ago

5 comments

onion2k 9 hours ago

One of the main reasons why code is often hard to understand is that it doesn't actually do what it appears to do unless you get in and read it in depth. It looks like Skim makes an assumption that would cause this to happen a lot: that the code doesn't rely on side effects (modifying a global, using exceptions for control flow, publishing an event on a bus for something else to pick up, etc.).

I imagine this is a really useful tool for a codebase built on pure functions, but it'll get very confused by legacy code that wasn't written with that goal.
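
To make that concrete, here is a small hypothetical Python example (the names `lookup_counts` and `get_user_name` are invented for illustration and don't come from Skim or any real codebase). A signatures-only view would presumably keep just the `def` line and the docstring, which say nothing about the global being mutated or the exception being raised:

```python
# Hypothetical module-level state; other code might read or depend on it.
lookup_counts = {}


def get_user_name(user_id: int) -> str:
    """Return the display name for a user."""
    # Hidden side effect: mutates a global that the signature never mentions.
    lookup_counts[user_id] = lookup_counts.get(user_id, 0) + 1
    if user_id < 0:
        # Exception used as control flow; also invisible from the signature.
        raise KeyError(user_id)
    return f"user-{user_id}"
```

From the signature and docstring alone this reads like a pure lookup, which is exactly the kind of mismatch I'd worry about.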

dean0x 9 hours ago

That's a great point, thank you. I use it mostly to get my agent oriented at the beginning of a task. I've been using it for the last couple of weeks and I seem to get better results: less code duplication and better-integrated features. I still need to reference the specific files I want the agent to work on.

onion2k 9 hours ago

I suspect, but will probably never find the time to try, that you could build an AST from the source, discard the bits that are just the internals of functions and methods, and then turn what remains back into something an LLM could use.
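
A minimal sketch of that idea, assuming Python source and using only the standard library `ast` module (Python 3.9+ for `ast.unparse`). This is not how Skim itself works as far as I know; `skim_source` is a name made up for this example:

```python
import ast


def skim_source(source: str) -> str:
    """Return `source` with function bodies replaced by `...`, keeping docstrings."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            new_body = []
            if ast.get_docstring(node, clean=False) is not None:
                # The docstring is always the first statement; keep it as-is.
                new_body.append(node.body[0])
            # Replace everything else with a bare `...` placeholder.
            new_body.append(ast.Expr(value=ast.Constant(value=Ellipsis)))
            node.body = new_body
    # Turn the trimmed tree back into source text.
    return ast.unparse(ast.fix_missing_locations(tree))


example = '''
def add(a: int, b: int) -> int:
    """Add two numbers."""
    total = a + b
    return total
'''

print(skim_source(example))
```

Note that `ast` drops comments, so this keeps signatures and docstrings only; anything that has to preserve comments would need a concrete syntax tree parser instead.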

dean0x 8 hours ago

Just plug this into your CLAUDE.md or AGENTS.md ;)

  ## Codebase Analysis

  **Before analyzing unfamiliar codebases**, use skim for efficient context:

  ```bash
  # Get architectural overview (60% reduction)
  skim src/ --mode structure

  # Get API surface (88% reduction)
  skim src/ --mode signatures

  # Get type system (91% reduction)
  skim src/ --mode types
  ```

  When to use:
  - First time exploring a repository
  - Understanding service architecture
  - Mapping API boundaries
  - Analyzing type relationships

  Install: npm install -g rskim or cargo install rskim

dean0x 9 hours ago

I built Skim to solve a specific problem: coding agents hitting context limits when analyzing codebases.

  The insight: humans don't read every line of code. We skim structure,
  signatures, comments. Agents should do the same.

  What it does:
  - Walks your repository
  - Removes function bodies and implementation details
  - Keeps structure, signatures, docstrings
  - Result: ~90% token reduction without semantic loss

  Built in Rust for performance on large repos.

  Works with most major languages (also markdown). Designed for Claude Code, GitHub Copilot, Cursor,
  or any LLM that analyzes code.

  GitHub: https://github.com/dean0x/skim

  MIT licensed. Looking for feedback on the approach and edge cases I might
  have missed.