The Art of Manually Editing Hunks

(kennyballou.com)

24 points | by alraj 10 days ago

19 comments

  • qazxcvbnm 2 days ago

    Sounds like a problem for rediff (https://github.com/twaugh/patchutils).

    Instead of directly editing the unified diff file, make a copy and edit the copy. Once you’re done, call rediff original-patch edited-patch to get a patchfile with fixed offsets.

    Note that rediff only fixes your offsets, and doesn’t remove empty hunks. patch is not happy about empty hunks existing, so if you happen to revert all changes in a hunk, remember to delete the hunk after rediff. Probably this ought to be contributed back upstream.
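
    A rough sketch of that last cleanup step, in Python rather than anything patchutils provides. The function name drop_empty_hunks is made up for illustration, and it assumes the hunk headers already carry correct counts (for instance because rediff has just fixed them), since it uses those counts to find where each hunk ends:

```python
import re

# Matches "@@ -start[,count] +start[,count] @@"; a missing count means 1.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def drop_empty_hunks(patch_text):
    """Return patch_text without hunks that contain no '+' or '-' lines.

    Hypothetical helper: assumes the counts in each @@ header are correct.
    """
    lines = patch_text.splitlines(keepends=True)
    out = []
    i = 0
    while i < len(lines):
        m = HUNK_HEADER.match(lines[i])
        if m is None:
            # File headers, commit message text, etc. pass through untouched.
            out.append(lines[i])
            i += 1
            continue
        old_left = int(m.group(2) or 1)
        new_left = int(m.group(4) or 1)
        j = i + 1
        changed = False
        # Consume body lines until both counts from the header are used up.
        while j < len(lines) and (old_left > 0 or new_left > 0):
            tag = lines[j][:1]
            if tag == " ":
                old_left -= 1
                new_left -= 1
            elif tag == "-":
                old_left -= 1
                changed = True
            elif tag == "+":
                new_left -= 1
                changed = True
            elif tag != "\\":
                break  # malformed body: stop treating lines as part of the hunk
            j += 1
        # Keep a trailing "\ No newline at end of file" marker with its hunk.
        while j < len(lines) and lines[j].startswith("\\"):
            j += 1
        if changed:
            out.extend(lines[i:j])
        i = j
    return "".join(out)
```

    An empty hunk has equal old and new counts, so dropping it does not shift the offsets of later hunks. The sketch leaves file headers in place even if every hunk of that file was dropped, so that case still needs a manual look.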

  • lionelw 2 days ago

    If this is about staging partial file changes, then I think this is where git GUIs especially excel. It's typically a matter of selecting and deselecting individual lines of change. The concept and term "hunk" (to mean a group of lines, and honestly "chunk" would've been a better choice) never shows up. Pretty intuitive and foolproof.

    I've used Git Cola on Linux and GitHub Desktop on Windows. Both free.

  • ris 2 days ago

    Further to this, don't underestimate how powerful git diff | sed | git apply can be for mass edits of files.

  • impure 2 days ago

    Yeah, you can do this to make your commits cleaner, but I find it easier to just manually edit the file and then undo after you add the patch.

    • recursive 2 days ago

      I've never felt the urge to commit half of a file, but if I did, surely I would just copy the whole file to a scratch buffer somewhere, edit the real file to the first state I want, and then commit that.

      Some people say they have to re-learn regular expressions each time they use them. I think this is mostly because they don't use them much. For me, I have to re-learn fancy git stuff like hunk staging each time I use it. Because I never use it.

      • mananaysiempre 2 days ago

        I don’t know, I feel that having to go through each change individually before committing is useful discipline. Nothing to learn: just tell yourself to only ever use `git add -p` and the rest will come naturally. Your first edited hunk, in particular, will come the first time you want to stage your actual change without having to delete that debug print that you figure might still come in handy. (Also useful but admittedly situational are git checkout -p, git reset -p, and git stash -p; and editing hunks within those is a bit of a rarity.)

        I guess the difference might be that my workflow is less edit → stage → commit and more edit → (stage things I’m sure about → edit/debug things I’m not →)* commit. For example, “the staged version is what I have in the debugger, modulo debugging code” can be a useful invariant to maintain. (Now that I’m thinking about it, having the build system automatically save a snapshot of the source it used somewhere in the VCS—and stamp it in the executable—sounds tremendously useful. Surely somebody’s already built that?..)

        • recursive 2 days ago

          I do review all my changes before committing. In my workflow, that's the same time I remove the debugging instrumentation. I can't imagine a scenario where there's debug code that I don't want to commit, but I still have a use for it after committing.

          If there's still a known bug I'm trying to solve in this area of code, I'll include that also. If this is a long-standing mystery, then the logging code probably needs to get committed anyway.

      • PittleyDunkin 2 days ago

        I use hunk editing quite a bit to organize long chains of commits into more logical order. This often involves merging and then re-splitting commits, but sometimes interactive rebasing is sufficient.

      • self_awareness 2 days ago

        I originally thought this was about editing AmigaOS executable files. Those contain HUNKs.

    • kelnos 2 days ago

      The article mentions this, but suggests that doing it that way is error-prone, and if you accidentally make changes after your undo, but before your redo, you can lose the changes you want to save.

      I do it your way often too, but I'm going to give hunk editing a try; assuming it isn't too difficult to get right, it feels safer and ultimately easier.

  • rapidlua 2 days ago

    Is it just me, or does the piece not explain how to make edits without messing up the hunk?

    • mananaysiempre 2 days ago

      If you were actually editing a patch file, that would be a concern; but in Git, you only edit one hunk at a time, so all’s fine as long as you don’t mess up the context lines—Git will ignore the source and destination line numbers and counts that your editing has likely rendered incorrect. (An easy way to mess up the context lines is to have your editor strip trailing whitespace on save, as the unified diff syntax for an empty context line is a single space. If anybody asks, in no way did this cause me to be puzzled for literal weeks by hunk edits failing seemingly at random.)
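
      For the whitespace-stripping trap, a minimal Python sketch of the repair: put the single space back on hunk lines that were left completely empty. The name restore_blank_context is hypothetical, and the sketch assumes its input is the text of one edited hunk. It cannot recover trailing whitespace that was stripped from non-empty lines.

```python
import re

# A unified-diff hunk header, e.g. "@@ -1,3 +1,4 @@ optional section name".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")


def restore_blank_context(hunk_text):
    """Turn completely empty lines after an @@ header back into ' ' lines.

    Hypothetical helper: an empty context line is written as a single space,
    which a strip-trailing-whitespace-on-save editor setting removes.
    """
    fixed = []
    in_hunk = False
    for line in hunk_text.splitlines():
        if HUNK_HEADER.match(line):
            in_hunk = True
        elif in_hunk and line == "":
            line = " "  # restore the context marker for an empty source line
        fixed.append(line)
    return "\n".join(fixed) + "\n"
```

      Empty lines that appear before the first @@ header are left alone, since they are not part of a hunk body.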

      • rapidlua 2 days ago

        I do occasionally attempt to edit patch files produced by git-format-patch. Frequently I end up with a corrupt patch. Still curious how to fix those numbers.

        • mananaysiempre 2 days ago

          With git format-patch I’d say take a worktree, git apply the patch to its intended base, commit, rebase, and git format-patch again :)

          Otherwise, well, the numbers are -(old start line number),(old line count) +(new start line number),(new line count) for the entire hunk introduced by @@ (whether it contains one group of changed lines or more). I'm sure you see how to fix them up, but accumulating a line number shift as you go through the file sounds very fiddly and error-prone. It also sounds like something the computer should be able to do for you (given the old patch and the new messed-up patch whose hunks correspond one-to-one to the old).

          ETA: I seem to have been nerd-sniped. Mind the bugs, etc etc:

            #!/usr/bin/awk -f
            # usage: awk -f fix-patch.awk ORIGINAL EDITED > FIXED
            # assumes hunks in ORIGINAL are grouped by file and sorted by line number
            # assumes every unchanged or deleted line in ORIGINAL remains in EDITED
            # can get confused by lines starting with @@ before start of diff
            
            function flush() {
                coline += odelta; odelta = (coline + cosize) - (oline[n] + osize[n])
                cnline += ndelta; ndelta = (cnline + cnsize) - (nline[n] + nsize[n])
                if (hunk) printf "@@ -%d,%d +%d,%d %s", coline, cosize, cnline, cnsize, hunk
                hunk = ""; coline = cosize = cnline = cnsize = 0
            }
            
            BEGIN { FS = "[-+, ]+" }
            FNR == 1 { n = 0 }
            /^@@ / { n++; coline = $2; cnline = $4 }
            coline { cosize += /^[- ]/; cnsize += /^[+ ]/ }
            /^@@ / && FNR == NR { oline[n] = $2; osize[n] = $3; nline[n] = $4; nsize[n] = $5 }
            cosize == osize[n] && !/^\+/ { flush() } END { flush() }
            FNR == NR { next }
            !hunk && /^\+\+\+/ { odelta = ndelta = 0 }
            /^@@ / { sub(/^@@ [-0-9,]+ [+0-9,]+ /, "") }
            coline { hunk = hunk $0 "\n"; next }
            { print }
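
          For comparison with the awk above, the counting rule on its own can be sketched in a few lines of Python. This is only an illustration of the header format described earlier, not a port of the script: the name recount_hunk is made up, it handles one hunk at a time, and it keeps the start line numbers exactly as found.

```python
import re

# Captures the two start line numbers and whatever follows the closing "@@".
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@(.*)$")


def recount_hunk(hunk_lines):
    """Return hunk_lines with the counts in the @@ header recomputed.

    hunk_lines is a list of lines without trailing newlines; the first
    entry is the header. Hypothetical helper: start numbers are not touched.
    """
    m = HUNK_HEADER.match(hunk_lines[0])
    if m is None:
        raise ValueError("not a unified-diff hunk header: %r" % hunk_lines[0])
    old_start, new_start, trailer = m.group(1), m.group(2), m.group(3)
    body = hunk_lines[1:]
    # Context lines count on both sides; '-' only on the old, '+' only on the new.
    old_count = sum(1 for line in body if line[:1] in (" ", "-"))
    new_count = sum(1 for line in body if line[:1] in (" ", "+"))
    header = "@@ -%s,%d +%s,%d @@%s" % (
        old_start, old_count, new_start, new_count, trailer)
    return [header] + body
```

          If an edit changes a hunk's new-side count, the new-file start numbers of later hunks in the same file also have to shift by the accumulated difference, which this sketch does not attempt. When the patch is going to be applied with Git anyway, git apply --recount asks Git to infer the counts from the hunk bodies instead of trusting the headers.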

    • kelnos 2 days ago

      Yeah, I was hoping for some simple rubric/trick/rules to help when needing to edit a diff file manually. Since 'git add -p' handles updating the hunk metadata for you, it handles what I'd consider the hard part.

  • Optimal_Persona 2 days ago

    I honestly thought this would be about retouching photos of buff men!

    • jccc 2 days ago

      “Stage this hunk!”

      (As a gay man, I appreciate an extended and detailed discussion of hunks finally appearing on HN.)
