The Art of Manually Editing Hunks

(kennyballou.com)

25 points | by alraj 8 months ago

19 comments

Sounds like a problem for rediff (https://github.com/twaugh/patchutils).

Instead of directly editing the unified diff file, make a copy and edit the copy. Once you’re done, call rediff original-patch edited-patch to get a patchfile with fixed offsets.

Note that rediff only fixes your offsets, and doesn’t remove empty hunks. patch is not happy about empty hunks existing, so if you happen to revert all changes in a hunk, remember to delete the hunk after rediff. Dropping empty hunks automatically is probably something that ought to be contributed upstream.
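
The cleanup step described above can be sketched in Python. This is a minimal illustration, not part of patchutils: the name drop_empty_hunks is hypothetical, and the sketch assumes the line counts in each @@ header are already correct (for example, after running rediff). It leaves the ---/+++ file header in place even if every hunk of that file is dropped.

```python
import re

# Matches "@@ -start[,count] +start[,count] @@"; an omitted count means 1.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def drop_empty_hunks(patch_text):
    """Return patch_text without hunks that contain no '+' or '-' lines."""
    lines = patch_text.splitlines(keepends=True)
    out = []
    i = 0
    while i < len(lines):
        m = HUNK_HEADER.match(lines[i])
        if not m:
            out.append(lines[i])  # file headers and other text pass through
            i += 1
            continue
        old_left = int(m.group(1)) if m.group(1) is not None else 1
        new_left = int(m.group(2)) if m.group(2) is not None else 1
        j = i + 1
        # Use the header counts to find where this hunk's body ends.
        while j < len(lines) and (old_left > 0 or new_left > 0):
            tag = lines[j][:1]
            if tag == "-":
                old_left -= 1
            elif tag == "+":
                new_left -= 1
            elif tag == "\\":
                pass  # "\ No newline at end of file" marker, counts for neither side
            else:
                old_left -= 1  # context line
                new_left -= 1
            j += 1
        if j < len(lines) and lines[j].startswith("\\"):
            j += 1  # trailing no-newline marker belongs to this hunk
        body = lines[i + 1:j]
        if any(line.startswith(("+", "-")) for line in body):
            out.extend(lines[i:j])
        i = j
    return "".join(out)
```

Because a hunk with no changes has equal old and new line counts, dropping it should not shift the offsets of the hunks that follow.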

lionelw 8 months ago

If this is about staging partial file changes, then I think this is where git GUIs especially excel. It's typically a matter of selecting and deselecting individual lines of change. The concept and term "hunk" (to mean a group of lines, and honestly "chunk" would've been a better choice) never shows up. Pretty intuitive and foolproof.

I've used Git Cola on Linux and GitHub Desktop on Windows. Both free.

your_fin 8 months ago

For the TUI inclined, lazygit [1] and magit (emacs) [2] both have quick and intuitive ways of handling this. They're also both wonderful companions to the git cli for day to day version control.

[1]: https://github.com/jesseduffield/lazygit

[2]: https://magit.vc/

ris 8 months ago

Further to this, don't underestimate how powerful git diff | sed | git apply can be for mass edits of files.

impure 8 months ago

Yeah, you can do this to make your commits cleaner, but I find it easier to just manually edit the file and then undo after you add the patch.

recursive 8 months ago

I've never felt the urge to commit half of a file, but if I did, surely I would just copy the whole file to a scratch buffer somewhere, edit the real file to the first state I want, and then commit that.

Some people say they have to re-learn regular expressions each time they use them. I think this is mostly because they don't use them much. For me, I have to re-learn fancy git stuff like hunk staging each time I use it. Because I never use it.

mananaysiempre 8 months ago

I don’t know, I feel that having to go through each change individually before committing is useful discipline. Nothing to learn: just tell yourself to only ever use `git add -p` and the rest will come naturally. Your first edited hunk, in particular, will come the first time you want to stage your actual change without having to delete that debug print that you figure might still come in handy. (Also useful but admittedly situational are git checkout -p, git reset -p, and git stash -p; and editing hunks within those is a bit of a rarity.)

I guess the difference might be that my workflow is less edit → stage → commit and more edit → (stage things I’m sure about → edit/debug things I’m not →)* commit. For example, “the staged version is what I have in the debugger, modulo debugging code” can be a useful invariant to maintain. (Now that I’m thinking about it, having the build system automatically save a snapshot of the source it used somewhere in the VCS—and stamp it in the executable—sounds tremendously useful. Surely somebody’s already built that?..)

recursive 8 months ago

I do review all my changes before committing. In my workflow, that's the same time I remove the debugging instrumentation. I can't imagine a scenario where there's debug code that I don't want to commit, but I still have a use for it after committing.

If there's still a known bug I'm trying to solve in this area of code, I'll include that also. If this is a long-standing mystery, then the logging code probably needs to get committed anyway.

PittleyDunkin 8 months ago

I use hunk editing quite a bit to organize long chains of commits into more logical order. This often involves merging and then re-splitting commits, but sometimes interactive rebasing is sufficient.

self_awareness 8 months ago

I originally thought this was about editing AmigaOS executable files. Those contain HUNKs.

kelnos 8 months ago

The article mentions this, but suggests that doing it that way is error-prone, and if you accidentally make changes after your undo, but before your redo, you can lose the changes you want to save.

I do it your way often too, but I'm going to give hunk editing a try; assuming it isn't too difficult to get right, it feels safer and ultimately easier.

rapidlua 8 months ago

Is it just me, or does the piece not explain how to make edits without messing up the hunk?

mananaysiempre 8 months ago

If you were actually editing a patch file, that would be a concern; but in Git, you only edit one hunk at a time, so all’s fine as long as you don’t mess up the context lines—Git will ignore the source and destination line numbers and counts that your editing has likely rendered incorrect. (An easy way to mess up the context lines is to have your editor strip trailing whitespace on save, as the unified diff syntax for an empty context line is a single space. If anybody asks, in no way did this cause me to be puzzled for literal weeks by hunk edits failing seemingly at random.)
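
The whitespace pitfall mentioned above can be undone mechanically for the empty-line case. The following Python sketch is an illustration only: restore_blank_context is a hypothetical helper, and it assumes the input is the text of a single hunk (header plus body) in which a completely empty line can only be a context line whose single leading space was stripped.

```python
def restore_blank_context(hunk_text):
    """Put back the single space that marks an empty context line."""
    fixed = []
    for line in hunk_text.splitlines(keepends=True):
        if line in ("\n", "\r\n"):
            # An empty line inside a hunk body was " " before the editor stripped it.
            line = " " + line
        fixed.append(line)
    return "".join(fixed)
```

This only helps with lines that were empty in the file. If the editor also stripped trailing whitespace from non-empty context lines, those lines no longer match the file and have to be restored by hand.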

rapidlua 8 months ago

I do occasionally attempt to edit patch files produced by git-format-patch. Frequently I end up with a corrupt patch. Still curious how to fix those numbers.

mananaysiempre 8 months ago

With git format-patch I’d say take a worktree, git apply the patch to its intended base, commit, rebase, and git format-patch again :)

Otherwise, well, the numbers are -(old start line number),(old line count) +(new start line number),(new line count) for the entire hunk introduced by @@ (whether it contains one group of changed lines or more). I'm sure you see how to fix them up, but accumulating a line number shift as you go through the file sounds very fiddly and error-prone. It also sounds like something the computer should be able to do for you (given the old patch and the new messed-up patch whose hunks correspond one-to-one to the old).

ETA: I seem to have been nerd-sniped. Mind the bugs, etc etc:

  #!/usr/bin/awk -f
  # usage: awk -f fix-patch.awk ORIGINAL EDITED > FIXED
  # assumes hunks in ORIGINAL are grouped by file and sorted by line number
  # assumes every unchanged or deleted line in ORIGINAL remains in EDITED
  # can get confused by lines starting with @@ before start of diff
  
  function flush() {
      coline += odelta; odelta = (coline + cosize) - (oline[n] + osize[n])
      cnline += ndelta; ndelta = (cnline + cnsize) - (nline[n] + nsize[n])
      if (hunk) printf "@@ -%d,%d +%d,%d %s", coline, cosize, cnline, cnsize, hunk
      hunk = ""; coline = cosize = cnline = cnsize = 0
  }
  
  BEGIN { FS = "[-+, ]+" }
  FNR == 1 { n = 0 }
  /^@@ / { n++; coline = $2; cnline = $4 }
  coline { cosize += /^[- ]/; cnsize += /^[+ ]/ }
  /^@@ / && FNR == NR { oline[n] = $2; osize[n] = $3; nline[n] = $4; nsize[n] = $5 }
  cosize == osize[n] && !/^\+/ { flush() } END { flush() }
  FNR == NR { next }
  !hunk && /^\+\+\+/ { odelta = ndelta = 0 }
  /^@@ / { sub(/^@@ [-0-9,]+ [+0-9,]+ /, "") }
  coline { hunk = hunk $0 "\n"; next }
  { print }
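
For readers who find the awk dense, the header arithmetic from the explanation above can be sketched in Python. This is not a port of the awk script, only an illustration under similar assumptions: the hunks belong to one file, are sorted by line number, keep their original old-side start lines, and each has at least one line on both the old and new side (hunks with a zero count use a different start-line convention). The name fix_headers and the input shape are hypothetical.

```python
def fix_headers(hunks):
    """Recompute @@ headers for one file's hunks.

    hunks: list of (old_start, body_lines) sorted by old_start, where each
    body line begins with ' ' (context), '-' (removed) or '+' (added).
    """
    shift = 0  # accumulated (new - old) line count of all earlier hunks
    headers = []
    for old_start, body in hunks:
        old_count = sum(1 for line in body if line[:1] in (" ", "-"))
        new_count = sum(1 for line in body if line[:1] in (" ", "+"))
        new_start = old_start + shift
        headers.append(f"@@ -{old_start},{old_count} +{new_start},{new_count} @@")
        shift += new_count - old_count
    return headers


if __name__ == "__main__":
    edited = [
        (1, [" a", "+b", " c"]),           # adds one line
        (10, [" x", "-y", "-z", " w"]),    # removes two lines
    ]
    for header in fix_headers(edited):
        print(header)
    # prints:
    # @@ -1,2 +1,3 @@
    # @@ -10,4 +11,2 @@
```

The second hunk starts at new line 11 because the first hunk added one line ahead of it, which is the accumulated shift the comment calls fiddly to track by hand.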

kelnos 8 months ago

Yeah, I was hoping for some simple rubric/trick/rules to help when needing to edit a diff file manually. Since 'git add -p' handles updating the hunk metadata for you, it handles what I'd consider the hard part.

Optimal_Persona 8 months ago

I honestly thought this would be about retouching photos of buff men!

jccc 8 months ago

“Stage this hunk!”

(As a gay man, I appreciate an extended and detailed discussion of hunks finally appearing on HN.)
