Why does it need a new name?
I had a good idea of what it did before reading the article, it is a long name but not Java-long, and none of the suggestions so far are clear to me, even after reading the article.
The only somewhat confusing part is the "twice", because it can be more than twice. But if you think about it, if it has been changed more than twice, it had to be changed twice at some point, so it is not totally wrong.
At the time I started writing the article, the utility was called `analyze-commits`. Hard to think of a worse name than that!
By the time I finished writing it I had come up with a less crappy name, but I thought I'd leave the question in the post anyway.
Tools like this are also useful if you need to cherry pick a patch onto a release branch and want to know potential dependencies:
↑ newer
D* fixes bug in crypto.py
C
B* rewrites crypto.sh in Python
A
0 last month’s release
↓ older
In this example, if the release needs the fix in D you’ll also need to cherry pick the rewrite in B.
You get false positives and false negatives: if B fixed a comment typo for example it’s not really a dependency, and if C updated a module imported in the new code in D you’d miss it. (For the latter, in Python at least, you can build an import DAG with ast. It’s a really useful module and is incredibly fast!)
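A rough sketch of the import-DAG idea from the parenthetical above, using only the standard-library `ast` module. The function names (`imported_modules`, `import_graph`) and the sample module names are made up for illustration, and relative imports are skipped to keep it short:

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the names of modules imported anywhere in `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a, b.c` yields "a" and "b.c"
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # `from a.b import c` yields "a.b"; relative imports are ignored here
            modules.add(node.module)
    return modules


def import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each module in `sources` to the other modules in `sources` it imports."""
    known = set(sources)
    return {
        name: (imported_modules(text) & known) - {name}
        for name, text in sources.items()
    }


# Hypothetical two-module project: crypto imports util.
graph = import_graph({
    "crypto": "import hashlib\nfrom util import helper\n",
    "util": "import os\n",
})
print(graph)  # {'crypto': {'util'}, 'util': set()}
```

With a graph like this, a commit that touches `util` could be flagged as a possible dependency of a later commit that touches `crypto`, even though the two commits share no files.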
So I would say the author’s tool is really multiple tools:
1/ build a dependency graph between commits based on file changes in a range of commits;
2/ automate the reordering and squashing of dependent commits on a private dev branch;
3/ automate cherry-picking commits onto a proposed release branch (which is basically the same as git-rebase -i); and
4/ build a dependency graph based on external analysis (in my example, Python module imports) rather than / as well as file changes.
Their use case is (1) and (2); (3) is a similar but slightly different tool to (2); and (4) is a language-specific nicety that goes beyond the scope of simple git changes for, arguably, diminished returns.
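To make (1) concrete, here is a minimal sketch of a file-based dependency graph. It assumes the history is already available as a list of `(commit, files)` pairs ordered oldest to newest. The function names are invented, and the files for A and C are invented too, since the example above only says what B and D touch:

```python
from collections import defaultdict


def commits_by_file(history):
    """Map each path to the commits that touched it, oldest first."""
    touched = defaultdict(list)
    for commit, files in history:
        for path in files:
            touched[path].append(commit)
    return touched


def changed_more_than_once(history):
    """Keep only the files touched by two or more commits."""
    return {
        path: commits
        for path, commits in commits_by_file(history).items()
        if len(commits) > 1
    }


def file_dependencies(history):
    """Map each commit to the earlier commits that touched any of the same files."""
    deps = defaultdict(set)
    for commits in commits_by_file(history).values():
        for i, later in enumerate(commits):
            deps[later].update(commits[:i])
    return dict(deps)


# The A..D example from above; file lists for A and C are hypothetical.
history = [
    ("A", ["README"]),
    ("B", ["crypto.sh", "crypto.py"]),
    ("C", ["util.py"]),
    ("D", ["crypto.py"]),
]
print(changed_more_than_once(history))  # {'crypto.py': ['B', 'D']}
print(file_dependencies(history)["D"])  # {'B'}
```

This is purely textual: it reports that D depends on B, and it has the same false positives and false negatives described above.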
"what-changed-twice" tells me exactly what the command does. "squash-what" tells me nothing, why is the program name asking me what to squash, and then why does it not squash? The only inaccuracy I can think of in the name is that it's technically "what-changed-more-than-once." But if something has changed thrice, by definition it's also been changed twice.
> "squash-what" tells me nothing, why is the program name asking me what to squash, and then why does it not squash?
‘squash-candidates’ would address all of that.
I suggest group-commits-by-file, group-commits, or group-by-file, depending on whether you want it to make sense out of context and whether you ever group commits differently. You might then feel compelled to add a final line like “… and 12 files with 1 commit each”, or even to enumerate them, which sounds like it’d be useful anyway. “what” isn’t doing any work, there’s already an implicit “what” in the call-response paradigm. “Changed” implies you’re detecting changes, but you’re not, you’re operating on a data structure that happens to represent changes.
Jujutsu has a command which is helpful for this sort of workflow called absorb which pushes all changes from the current commit into the most recent commit which modified that file. (Each file may be merged into a different commit).
This seems very similar to how I work by default. I sort of think in terms of "keyframes" and "frames", or "commits" and "fixes to commits."
Whenever I sit down to code with a purpose, I'll make a branch for that purpose:
git checkout -b wip/[desc]
When I make changes that I think will be a "keyframe" commit, I use:
git add .
git commit -m "wip: desc of chunk" (like maybe "wip: readme")
If I make refinements, I'll do:
git add .
git commit --amend
and when I make a new "keyframe" commit:
git commit -m "wip: [desc 2]"
and still amend fixes.
Occasionally I'll make a change that I know fixes something earlier (i.e. an earlier "keyframe" commit) but I won't remember it. I'll commit and then do:
git add .
git commit -m "fixup: wip desc, enough to describe which keyframe commit should be amended"
At the end I'll do a git rebase -i main and see something like:
123 wip: add readme (it's already had a number of amends made to it)
456 wip: add Makefile (also has had amendments)
789 wip: add server (ditto)
876 fixup: readme stuff
098 fixup: more readme
543 fixup: makefile
and I'll use git rebase -i to change it to reword for the good commits, and put the fixups right under the ones they edit. Then I'll have a nice history to fast forward into main.
git-absorb (https://github.com/tummychow/git-absorb) does a bit more, figuring out the exact changes that should be fixed up.
Yes, totally useful compared to default git base commands.
And also - melding the "changed twice" (or thrice...) mutations into a single commit is a brilliant isolation of a subtle common pattern.
> There's bonus information too. If a commit is not mentioned in the report, then it only changed files that didn't change in any other commit. That means that in a rebase, I can move that commit literally anywhere else in the sequence without creating a conflict. Only the commits in the report can cause conflicts if they are reordered.
This is only true at the textual level.
Semantically, reshuffling commits like this can still cause conflicts, i.e. it can break your tests. Not at the end, but for the intermediate commits.
This is why I no longer do atomic commits. I've just never had it be a benefit to walk through and guarantee that each commit tests and builds successfully. I so rarely back out changes that when I do, I test then that everything is working (and let's be honest, I usually back out at the PR level, not the commit).
I am familiar with an algorithm that stably brings a disjoint selection of items together around a specified point. Sounds similar to this case, where the disjoint selection are changes that happened to a given file.
The name of the algorithm is “gather”, by Sean Parent and Marshall Clow.
https://github.com/stlab/adobe_source_libraries/blob/7659244...
https://listarchives.boost.org/Archives/boost/2013/01/200366...
I gotta say, I don't see the greatness any more than most of the repliers in that Boost thread — it's just two stable_partitions in a row.
"[...] Or is there some optimization that gather provides over (stable_)partition? —— Nope. [...]"
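For anyone who hasn't seen it, the "two stable partitions" formulation looks roughly like this in Python. It is a sketch on plain lists rather than the in-place iterator version in the linked libraries, and the names `stable_partition` and `gather` here are just local helpers:

```python
def stable_partition(items, pred):
    """Items satisfying `pred` first, then the rest, each group in original order."""
    return [x for x in items if pred(x)] + [x for x in items if not pred(x)]


def gather(items, pos, pred):
    """Bring every item satisfying `pred` together around index `pos`.

    Before `pos`, selected items are pushed to the right end; from `pos`
    onward, selected items are pulled to the left end. Relative order is
    preserved within the selected and the unselected items.
    """
    before, after = items[:pos], items[pos:]
    left = stable_partition(before, lambda x: not pred(x))
    right = stable_partition(after, pred)
    return left + right


# Gather the even numbers around index 5.
print(gather(list(range(10)), 5, lambda x: x % 2 == 0))
# [1, 3, 0, 2, 4, 6, 8, 5, 7, 9]
```

Applied to a commit list, `pred` would be "this commit touched the given file". As noted elsewhere in the thread, this only describes the textual reordering and says nothing about whether the reordered commits still apply cleanly.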
"muster" comes to mind and is different than "gather".
Suggestion: `git squash-report`. (Or `git rebase-report`, except I wouldn't call it that because it would interfere with my tab-completion of, and/or muscle memory of, `git rebase -i`.)
Why did you opt for "highly-abbreviated commit IDs"?
Instead of:
```
calendar/seasons.blog 196 40 d1
```
The tool should simply display:
```
calendar/seasons.blog 196e749 40c52f4 d142598
```
That's it!
The second table only complicates the output.
PS:
`what-changed-twice` is a good name.
When I make Bash aliases or functions for Git functionality, I always name them as `git-something-or-other`. That way they're namespaced in a way that I find pleasant both for tab completion and for ease of memory. I think that should apply to more complex utilities, too.
By my usual naming conventions, this one would be `git-repeatedly-changed`.
Last but decidedly not least: if you have `git-foo` on the PATH, you can do `git foo` and it will automatically pick up your program.
If I remember early git days correctly, that's how git was implemented: a bunch of separate utilities working together on the database which is the .git folder.
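As an illustration of the `git-foo` on the PATH trick, here is a sketch of what a hypothetical `git-repeatedly-changed` script could look like in Python. This is not the article's tool. The `@@` marker, the default `main..HEAD` range, and the output layout are all assumptions; the git options used (`--reverse`, `--name-only`, `--format`) are standard `git log` options.

```python
#!/usr/bin/env python3
import subprocess
from collections import defaultdict

MARKER = "@@"  # arbitrary prefix so commit lines can be told apart from file names


def parse_log(text):
    """Parse `git log --name-only --format=@@%h` output.

    Returns {path: [commit, ...]} for paths touched by more than one commit,
    in the order the commits appear in `text`.
    """
    files = defaultdict(list)
    commit = None
    for line in text.splitlines():
        if line.startswith(MARKER):
            commit = line[len(MARKER):]
        elif line and commit is not None:
            files[line].append(commit)
    return {path: ids for path, ids in files.items() if len(ids) > 1}


def main(rev_range="main..HEAD"):
    """Print each repeatedly changed file followed by the commits that touched it."""
    out = subprocess.run(
        ["git", "log", "--reverse", "--name-only", f"--format={MARKER}%h", rev_range],
        check=True, capture_output=True, text=True,
    ).stdout
    for path, ids in parse_log(out).items():
        print(path, *ids)
```

To try it, add a call to `main()` at the bottom, save the file as `git-repeatedly-changed` somewhere on the PATH, and make it executable. After that, `git repeatedly-changed` should find it.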
git-delta -n <times>
i.e. git-delta -n 2 = 'what changed twice'
or if it's just what changed twice in every case then just 'git-delta-delta'
Double Jeopardy?
Oidia- Oops I did it again
You know, I find myself partially agreeing that a number of utilities for git could be done quite nicely in perl.
Git's repository includes quite a bit of Perl, but they want to get rid of it.
No it does not.
change-cluster?
FlipFlopStop?
FFS for short, which has suitably disgruntled other exclamatory meanings.