$ sem impact authenticateUser
⊕ function authenticateUser (src/auth/login.ts:26)
→ depends on: db.findUser, rateLimiter.check
← used by: loginRoute, authMiddleware
! 42 entities transitively affected
ᛋ 7 tests affected
Okay that is pretty cool. I appreciate this information as a human also.
I got about halfway through reinventing something like this last year (minus the git part). I was trying to make a graph of dependencies in the codebase. (I actually got pretty far with a regex!)
Ha, the regex approach is honestly how a lot of people start with this problem and you can get surprisingly far with it until you hit the edge cases around aliased imports, re-exports, and nested scopes where things start falling apart. That's basically why we went with tree-sitter under the hood it gives you the actual parse tree so you don't have to keep patching regex patterns for every new language construct.
I doubt if this actually solves a real problem for humans or agents, especially in complex projects. It might help if the examples show scenarios where this tool and its commands could make a difference.
I am interested in subtle ways in which we can change how we write software to get better outcomes out of harnesses (model + tools + skills). I'm imagining that use of Sem will be more effective on code written in some shapes than others.
Can you describe what ways this might be beyond just breaking up code into smaller functions?
An example of this is that Models tend to create unit tests that are mostly just mock + reimplementations of imperative code in the functions they test. If you could force behavioral testing by only allowing test creation agents to accessing the function docstring, name/args/types, branch statements and log events, you could potentially avoid these classes of weak tests being created. But that would mean that your code has to optimize to providing signal via those elements.
This is just an example I'm not sure that would actually work.
Also I keep seeing solutions in this space that are doing inheritance and call stack dependency linkage, but I haven't seen the same level of exploration into data lifetime dependency. Not lifetime in the way it exists in Rust (to my knowledge), but like including when you copy data and transform that copy. The motivation is "if I change this variable, enumerate all the areas that change would propagate to". The idea is similar, to evaluate blast radius of modifications. Ideally something like this could make refactoring more token efficient and consistent as well.
I don't know if you can reliably do that with static analysis tho. I would be interested in some sort of debug attachment like process that does a code coverage type evaluation. If you can't tell this is at least on the edge of (if not past) my depth of expertise
This is a really interesting direction, you're essentially talking about data flow or taint analysis, where you track how a value propagates through copies and transformations rather than just following call edges. Honestly pure static analysis gets you partway there but it hits real limits once you run into dynamic dispatch, runtime branching, or serialization boundaries where data gets written somewhere and read back in a completely different part of the codebase.
We're on the structural side right now with call graphs and dependency edges, but a hybrid approach that combines the static graph with runtime instrumentation to fill in the gaps is definitely something I'd love to explore. Thanks for the feedback.
What I've been more interested in lately is structural intelligence as a field in whole.
Things with LLMs break because our infra was always designed for analyzing lines(tools like grep fuzzy matching) and working on quite small sections of code. LLMs struggle with this in cases when they have to analyze different parts of a codebase they either get too much context where you're throwing whole files at them, or too little where they only see the function in isolation, with no real understanding of how the pieces actually connect to each other.
That's really the gap sem is trying to fill. With sem impact you can give an agent the precise blast radius of a change instead of guessing which files matter, and sem diff --patch lets you enforce that a change only touches specific functions and reject anything that bleeds outside that boundary something that's really hard to do with line-level diffs.
Your testing idea is actually closer than you might think. sem already extracts entity signatures, dependencies, and call graphs, so you could build a harness that gives the test-writing agent only the function signature with its dependency graph and behavioral contract, while withholding the implementation entirely. That would force the agent toward behavioral tests because it literally can't see the internals to mock them. I haven't built this harness myself yet but sem graph and sem inspect expose everything you'd need.
The general principle is that sem gives you a structural map of the codebase to both constrain and validate what the model produces, rather than treating code as flat text and hoping the model figures out the relationships on its own.
Another usecase can be about figuring out dead code present in the codebase.
Edit: Also one last thing because I started working on this while solving the fundamental issue of why merge conflicts were occuring with git, so you might also like the merge drive I open sourced on the same Github org - Weave
The benchmarks aren't great, they're super specific to sem's output: why would I ask Claude how many "entities" were modified by a commit and do I need a tool specifically for this request ? Note that an "entity" is a sem-specific concept...
Thanks for pointing it out. I agree with you here, my testing process was quite specific to sem's output but also would love any suggestion from you of how you would design the whole testing process for this kind of tool?
I can also give my thought process, because I was more interested in figuring out the model's inherent search results and understanding without sem.
I really like this idea and have been experimenting with it over a week or so.
I think there’s an opportunity to use an AST diff system for code forges where you don’t present the user with line diffs in the UI — or at least not as the first diff the user sees.
I firmly believe code review should happen in your editor.
Really glad you've been using it, and yeah that's exactly the direction I've been thinking about. The line diff as the default view in code forges has always felt like an accident of history it definitely was easy to compute, but not what's actually useful for understanding what changed.
It can do that, but that's a small slice of what it does. sem parses your codebase into entities (functions, classes, methods) and builds a dependency graph across files.
So instead of line level analysis the whole granularity of seeing changes and tracking thing shifts to entities. It helps in attention mapping of your agent and lets you track the changes faster.
LSPs have been doing it for quite long but using treesitters is faster even tho type awareness is not great with this approach but overall working across multiple languages with a single tool can be quite helpful.
I got about halfway through reinventing something like this last year (minus the git part). I was trying to make a graph of dependencies in the codebase. (I actually got pretty far with a regex!)
Ha, the regex approach is honestly how a lot of people start with this problem and you can get surprisingly far with it until you hit the edge cases around aliased imports, re-exports, and nested scopes where things start falling apart. That's basically why we went with tree-sitter under the hood it gives you the actual parse tree so you don't have to keep patching regex patterns for every new language construct.
I doubt if this actually solves a real problem for humans or agents, especially in complex projects. It might help if the examples show scenarios where this tool and its commands could make a difference.
I am interested in subtle ways in which we can change how we write software to get better outcomes out of harnesses (model + tools + skills). I'm imagining that use of Sem will be more effective on code written in some shapes than others.
Can you describe what ways this might be beyond just breaking up code into smaller functions?
An example of this is that Models tend to create unit tests that are mostly just mock + reimplementations of imperative code in the functions they test. If you could force behavioral testing by only allowing test creation agents to accessing the function docstring, name/args/types, branch statements and log events, you could potentially avoid these classes of weak tests being created. But that would mean that your code has to optimize to providing signal via those elements.
This is just an example I'm not sure that would actually work.
Also I keep seeing solutions in this space that are doing inheritance and call stack dependency linkage, but I haven't seen the same level of exploration into data lifetime dependency. Not lifetime in the way it exists in Rust (to my knowledge), but like including when you copy data and transform that copy. The motivation is "if I change this variable, enumerate all the areas that change would propagate to". The idea is similar, to evaluate blast radius of modifications. Ideally something like this could make refactoring more token efficient and consistent as well.
I don't know if you can reliably do that with static analysis tho. I would be interested in some sort of debug attachment like process that does a code coverage type evaluation. If you can't tell this is at least on the edge of (if not past) my depth of expertise
This is a really interesting direction, you're essentially talking about data flow or taint analysis, where you track how a value propagates through copies and transformations rather than just following call edges. Honestly pure static analysis gets you partway there but it hits real limits once you run into dynamic dispatch, runtime branching, or serialization boundaries where data gets written somewhere and read back in a completely different part of the codebase.
We're on the structural side right now with call graphs and dependency edges, but a hybrid approach that combines the static graph with runtime instrumentation to fill in the gaps is definitely something I'd love to explore. Thanks for the feedback.
What I've been more interested in lately is structural intelligence as a field in whole.
Things with LLMs break because our infra was always designed for analyzing lines(tools like grep fuzzy matching) and working on quite small sections of code. LLMs struggle with this in cases when they have to analyze different parts of a codebase they either get too much context where you're throwing whole files at them, or too little where they only see the function in isolation, with no real understanding of how the pieces actually connect to each other.
That's really the gap sem is trying to fill. With sem impact you can give an agent the precise blast radius of a change instead of guessing which files matter, and sem diff --patch lets you enforce that a change only touches specific functions and reject anything that bleeds outside that boundary something that's really hard to do with line-level diffs.
Your testing idea is actually closer than you might think. sem already extracts entity signatures, dependencies, and call graphs, so you could build a harness that gives the test-writing agent only the function signature with its dependency graph and behavioral contract, while withholding the implementation entirely. That would force the agent toward behavioral tests because it literally can't see the internals to mock them. I haven't built this harness myself yet but sem graph and sem inspect expose everything you'd need.
The general principle is that sem gives you a structural map of the codebase to both constrain and validate what the model produces, rather than treating code as flat text and hoping the model figures out the relationships on its own.
Another usecase can be about figuring out dead code present in the codebase.
Edit: Also one last thing because I started working on this while solving the fundamental issue of why merge conflicts were occuring with git, so you might also like the merge drive I open sourced on the same Github org - Weave
The benchmarks aren't great, they're super specific to sem's output: why would I ask Claude how many "entities" were modified by a commit and do I need a tool specifically for this request ? Note that an "entity" is a sem-specific concept...
Thanks for pointing it out. I agree with you here, my testing process was quite specific to sem's output but also would love any suggestion from you of how you would design the whole testing process for this kind of tool?
I can also give my thought process, because I was more interested in figuring out the model's inherent search results and understanding without sem.
I really like this idea and have been experimenting with it over a week or so.
I think there’s an opportunity to use an AST diff system for code forges where you don’t present the user with line diffs in the UI — or at least not as the first diff the user sees.
I firmly believe code review should happen in your editor.
Really glad you've been using it, and yeah that's exactly the direction I've been thinking about. The line diff as the default view in code forges has always felt like an accident of history it definitely was easy to compute, but not what's actually useful for understanding what changed.
Is this for checking what Claude Code just did to your repo?
It can do that, but that's a small slice of what it does. sem parses your codebase into entities (functions, classes, methods) and builds a dependency graph across files.
So instead of line level analysis the whole granularity of seeing changes and tracking thing shifts to entities. It helps in attention mapping of your agent and lets you track the changes faster.
LSPs have been doing it for quite long but using treesitters is faster even tho type awareness is not great with this approach but overall working across multiple languages with a single tool can be quite helpful.