Converting a Git repo from tabs to spaces (2016)

(eev.ee)

40 points | by keybored 15 hours ago

88 comments

  • mmastrac 15 hours ago

    I wish Git had a way to "skip" a commit for blame for mechanical changes like this. It's the one big shortcoming I keep running into. A commit should be able to be marked as "blame-free" and git blame should then walk up to the parent commit.

    It might be expensive to compute but man it would be so useful.

    Edit: TIL about .git-blame-ignore-revs. I am the 1 in 10000 for this one today, thanks.

    • js2 15 hours ago

      It does. See `--ignore-revs-file`:

      https://git-scm.com/docs/git-blame

      You can configure a default:

        git config blame.ignoreRevsFile .git-blame-ignore-revs
      
      GitHub supports it too:

      https://docs.github.com/en/repositories/working-with-files/u...

      I'm really curious though. This is a feature you've wished for: have you never bothered to run `man git-blame`, `git blame --help`, or Google for it? Git has supported it for ages and it's a trivially easy feature to find. Using your own description:

      https://www.google.com/search?hl=en&q=git%20skip%20commit%20...
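
For readers who have not seen one, the ignore file is plain text: one full commit hash per line, with `#` starting a comment. Below is a minimal Python sketch that renders such a file. The helper name `render_ignore_revs` and the hash are made up for illustration, and it assumes a SHA-1 repository (40-character hashes).

```python
import re

# Assumption: SHA-1 object names, i.e. exactly 40 lowercase hex characters.
FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def render_ignore_revs(entries):
    """Render (description, full_hash) pairs as ignore-revs file text."""
    lines = []
    for description, commit in entries:
        if not FULL_SHA1.fullmatch(commit):
            # The file format expects unabbreviated object names.
            raise ValueError(f"not a full 40-character hash: {commit!r}")
        lines.append(f"# {description}")
        lines.append(commit)
    return "\n".join(lines) + "\n"


# Hypothetical hash, for illustration only.
text = render_ignore_revs(
    [("Convert tabs to spaces", "0123456789abcdef0123456789abcdef01234567")]
)
print(text, end="")
```

Saving that output as `.git-blame-ignore-revs` in the repository root and running the `git config` line above should make plain `git blame` skip the listed commits.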

      • McP 7 hours ago

        Nice to see ignore-revs getting some love :)

        I originally wrote it because I wanted to do a mass-refactoring to llvm-project to change its weird naming convention and "it will mess up git blame" was an objection that was raised. Getting ignore-revs landed took many iterations over several months (thanks Barret!) and at the end of it I felt so drained that I didn't have the energy to do the mass refactoring I originally planned. Oh well. Maybe someday.

        • mabster 6 hours ago

          A big thank you! Blame history being correct is something I care quite a bit about and I always add one of these files when I do formatting changes. I think I'm probably the only developer on my teams with this configured on though haha!

      • jayd16 10 hours ago

        The annoying thing about git is that you can't really set this kind of stuff up globally for a project w/o digging into some custom hook solutions. They should really have some kind of default config file with all these things. I really don't understand why everything needs to be per user settings ONLY.

      • IshKebab 14 hours ago

        It would be a lot more usable if you could put that info in the commit.

        • prepend 9 hours ago

          No. I don’t want the author to make that decision for me. I’d rather git record everything and then I can choose how to view or render it.

          Different people have different view preferences.

          • jayd16 2 hours ago

            Ok but why not both checked in and user settings with the user overriding the repo?

        • OptionOfT 14 hours ago

          That on its own is a security risk, as it would introduce means to hide a commit in the commit itself.

          At least with the . file you have to make 2 separate transactions.

          • IshKebab 11 hours ago

            No it wouldn't. You would still be able to see the commit in logs and file histories and if you ran blame without the skipping option.

      • mmastrac 15 hours ago

        I've been using Git since the early 2010s and this feature was released in Aug 2019 (https://github.blog/open-source/git/highlights-from-git-2-23...).

        You don't think I looked for it for the first 7-8 years of using Git at least a few times and came up empty? Seems a little uncharitable. Hacker News is a place to learn about stuff, not be chided for missing a point note in a release.

        Come on man, you've been using HN for almost as long as I have. Be curious, treat people's comments with charity, continue the life-long learning tradition.

        Obligatory XKCD lucky 10,000 link: https://xkcd.com/1053/

        • js2 14 hours ago

          You're right. My apologies. It wasn't meant as a critique. I've been using git even longer and my memory was that the feature had been there way before 2019. Time flies. Relevant commit:

          https://github.com/git/git/commit/ae3f36dea16e51041c56ba9ed6...

          • xiaoyu2006 13 hours ago

            I like how the length of the commit message is of the same order of magnitude as the commit itself.

          • keybored 14 hours ago

            Thanks. Git (proj.) commits can be an enjoyable read.

    • hannasm 42 minutes ago

      Everybody else here has a fantastic solution to your complaint but wouldn't it be even better to think big here, and wish that stupid whitespace formatting issues weren't something that git was tokenizing to begin with?

    • joshstrange 15 hours ago

      `.git-blame-ignore-revs` is probably what you are looking for

      Example: https://gist.github.com/kateinoigakukun/b0bc920e587851bfffa9...

      • y-curious 15 hours ago

        My one gripe with this is that devs need to point their IDE to the file in the IDE settings. When I implemented .git-blame-ignore-revs, I got a lot of people complaining about blame disappearing completely and I had to point them all to editing IDE settings

    • braiamp 15 hours ago

      `blame -w` ignores the ones that are described in the article.

    • PhilipRoman 14 hours ago

      git-blame-ignore-revs is great, but ultimately a half measure. Replace blame with log -L

    • patrickthebold 15 hours ago

      Is .git-blame-ignore-revs what you are looking for?

    • kwk1 10 hours ago

      See also `git blame -w`

  • joshstrange 15 hours ago

    > Then, commit! As per Yelp tradition when rewriting every single file in the whole codebase, I attributed the commit to Yelp’s lovable mascot Darwin. It stands out better in git blame, and it preserved the extremely critical integrity of my commit stats.

    Interesting, I fully expected this blog post to touch on `.git-blame-ignore-revs` as a way to not "pollute" the git history but I'm not sure when that "came out". I found a Github issue from 2021 asking for support to be added to Github so it may just be newer.

    How do other people feel about this? Massive code changes across the codebase? Where I work some people are (understandably) concerned about it "ruining" `git blame` or IDE tools to blame. It's not useful to see "Converting to spaces!" on every line you want more context on. Yes, you can step further back but that's always been a little awkward for me (at least in IntelliJ) but maybe I'm missing something. I just find it incredibly helpful to understand the context of why a line was last changed and I'd want to skip over any edits like tabs->spaces.

    • johnmaguire 14 hours ago
    • zahlman 14 hours ago

      > I fully expected this blog post to touch on `.git-blame-ignore-revs` as a way to not "pollute" the git history but I'm not sure when that "came out".

      Per https://news.ycombinator.com/item?id=43869828, it appeared August 2019 - so, indeed too late for OP.

      e: Also, FTA:

      > Blame is not, in fact, permanently ruined. git blame -w ignores whitespace-only changes.

    • matsemann 15 hours ago

      What if one instead rewrote the last commit for each line to use spaces for that line? Or just rewrite the whole history to have used spaces. Might break something in the history if one were to check out an old commit, though. And makes it hard to revert if something breaks due to changing to spaces (impossible to find the offender in the diff).

      • _Algernon_ 15 hours ago

        >Or just rewrite the whole history to have used spaces.

        Ah, yes. The 1984 approach to coding

    • woodrowbarlow 15 hours ago

      `git blame -w` ignores whitespace-only changes, for what it's worth.

  • Alifatisk 15 hours ago

    Why would you want to convert from tabs to spaces?

    • diggan 15 hours ago

      > their mostly-Python codebase had always been indented with tabs

      Tabs VS spaces isn't usually very important, but what's more important is that all the stuff is the same way. So if all the other codebases (in the same language) are using tabs, then make everything (in that language) use tabs. Consistency basically :)

      • gwbas1c 10 hours ago

        I used to agree with that, until I read this article. I would always use the IDE's default and "not care" as long as the code was consistent.

        The problem with tabs is that they render as different widths in different contexts. For example, Visual Studio shows them as 5 spaces, but Github shows them as 8.

        Puts me firmly in the spaces camp now.

        • InsideOutSanta 9 hours ago

          > The problem with tabs is that they render as different widths in different contexts

          The funny thing is that this is why I prefer them. It means I control how indents render rather than the person who wrote the code.

          • mabster 6 hours ago

            I would agree, except that only deals with the left hand side of the code. We are also making decisions on the right hand side of the code to deal with line width as well which only really works if all developers have the same tab size.

            Nowadays I just chuck format on save on all the code I deal with so I don't have to deal with any of this stuff anymore.

            If we take this to its longer conclusion though, it would be pretty good if our tooling supported a difference between the view (using your own preferences) and storage (consistent code for committing to git or whatever).

        • diggan 9 hours ago

          > I would always use the IDE's default and "not care" as long as the code was consistent.

          I mean, "just use the IDE's default" isn't really agreeing, unless that's what your entire organization does too, and you all use the same IDE :)

      • Alifatisk 15 hours ago

        I agree.

    • Defletter 8 hours ago

      I've been firmly pro-spaces ever since I discovered there was an everlasting war over this, and it came about primarily over documentation. Say you're writing documentation within a /***/ block, so each line is prefixed by three characters. Now say your documentation includes a code snippet. Or let's say that particular sections of the documentation (such as JavaDoc's @see) are indented so each line always starts after the @see. You end up with documentation indented with spaces because it's the only way to ensure consistency. And if you're doing it with your documentation... why not your code too?

      However, my conviction has since been tested by Dart which opinionatedly forces you to use two-space indentation. There's no way to disable this and its IDE plugins enforce the style. I just find it so difficult to read, even with Rainbow Brackets. I yearn for Dart to use tabs just so I can configure the tabs to appear as four-space indentation. Or better yet, stop trying to coerce how people write their own code.

    • mcdonje 14 hours ago

      Because they're deranged control freaks who need to convert a single character that is semantically a tab into multiple characters that are an opinionated representation of a tab.

      Devs: We need to separate concerns and split the view from the model.

      Also devs: Someone might view the code differently!!1!

      • maw 11 hours ago

        A codebase that's formatted notgivingashittily is an accessibility issue. It's not just deranged control freakism.

        Maybe Yelp's codebase was otherwise clean, but aside from golang projects (and the Linux kernel) I've come to associate tabs with unreadable slop code. Maybe your experience is different.

        • smrq 9 hours ago

          Forcing a single opinionated tab width is an accessibility issue -- a real one, not a weird heuristic that boils down to "tab fans can't format". I've read multiple accounts from people who need either very small tab widths (to accommodate unusually large font sizes for eyesight reasons without cascading off the side of the screen), or very large tab widths (to accommodate difficulty in seeing indentation differences, again for eyesight reasons).

        • umbra07 3 hours ago

          I'm confused. how does handing control of the reading experience over to the reader = accessibility issue? isn't it the other way around? accessibility issues come in many different forms, and you can't accommodate them all yourself.

    • david2ndaccount 8 hours ago

      Tabs are a control character and have no business being in a text file. Do you use ascii record separator characters too?

      • imiric 5 hours ago

        No, but I do use Line Feed and Carriage Return.

      • yjftsjthsd-h 5 hours ago

        > Do you use ascii record separator characters too?

        The only reason I don't use them is because nothing supports/expects/shows them. The alternate history where we properly use them is a world where CSV isn't needed and we're better off for it.

      • IvyMike 7 hours ago

        Galaxy brain: indent using U+001F Unit Separator

    • ooterness 15 hours ago
      • zahlman 14 hours ago

        My best guess: using spaces selects for developers who understand how their editor works (which correlates with higher overall cluefulness), because they'd go insane otherwise.

    • mmastrac 15 hours ago

      Because it's the one true way, and tabs are WRONG.

      Also Vim > Emacs, the new BSG was better than the old BSG, TNG is the best Trek, and all the other hashed-out flamewars of the 90's and 2000's. :)

      • evbogue 15 hours ago

        There's a debate about new BSG being better than old BSG?

        • mmastrac 15 hours ago

          I posit:

          For every topic of A vs B where A and B are related in some way, no matter how small, there exists an argument C where two people take increasingly opposed positions about which is better.

        • HideousKojima 14 hours ago

          I actually love the original BSG. And the new one started out strong but the writers clearly didn't have a plan for where they wanted things to go despite the opening credits insisting the Cylons have a plan.

          • mixmastamyk 8 hours ago

            Agreed. Not to mention the original BSG was strangled in its crib for costing too much. Something a production in the aughts didn't have to worry much about.

    • rascul 14 hours ago

      This is the discussion I came here for.

    • y-curious 15 hours ago
    • daneel_w 15 hours ago

      Because the latter is universal, and it can always align perfectly.

          # using tabs with tabsize 4
          
          some_func( eyesore,
                      blah );
          
          some_func(
                  eyesore,
                  blah );
          
          some_func(
                      eyesore,
                      blah
                  );
      • DaiPlusPlus 14 hours ago

        > and it can always align perfectly.

        I'm firmly in Team Tab, and I want to arrest any misconception that us Tabbers would do anything as nonsensical as using our precious variable-width tab-stop chars for anything like column-aligning identifiers: we don't.

        My very hard and fast rule is that tabs are for only indenting at the block level, while spaces are used for alignment after the initial tab chars; tabs must never be used on a line if preceded by any non-tab char.

        Whereas I can't stand always-using-only-spaces-for-indenting-and-alignment - especially because when you're drag-selecting text most editors won't snap your selection to the indent level, so you get RSI in your wrist from having to make micro-movements to make sure you don't select more - or less - spaces than the intended indent. ...or worse: when moving the caret via the keyboard and having to tap your arrow-keys 4 or 8 times per indent instead of just once.

        You spaces-only people are totally spaced out, man.

        • Ferret7446 7 hours ago

          > I want to arrest any misconception that us Tabbers would do anything as nonsensical as using our precious variable-width tab-stop chars for anything like column-aligning identifiers: we don't.

          The irony is that this is exactly what tab characters are used for. Have you wondered why they're called tabs? Because they're used for tabulation, making tables. They are intended for aligning columns in a table. Not for indentation.

          • imiric 5 hours ago

            I'm so happy that languages like Go have formatting tools that sidestep all these pointless discussions. :)

          • DaiPlusPlus 7 hours ago

            We aren't using typewriters anymore.

            • layer8 an hour ago

              Word processors still use tabs for that purpose.

        • Cieric 13 hours ago

          I personally agree with this, but a lot of the tools out there break this easily. I'm curious if you have any tools that handle the formatting like this properly. I've written my own tool that will report invalid whitespace when following this, but it can't fix any of it automatically. The commonly used clang-format also messes up this scheme as it will convert alignment space to tabs.

          • DaiPlusPlus 13 hours ago

            I'll admit that I spend most of my time in Visual Studio which supports my preferences very well, including my .editorconfig (which is now the .editorconfig for my entire org... it's almost as if good ideas have a following ;)

            I do understand the appeal and advantages of having automated+opinionated re-formatting as part of a gated check-in process, because it's about having a normalized and consistent representation in the canonical repo; the idea being that you'd have a git-hook that would apply your own preferred formatting style on checkout which would be undone on commit; alas, we're not quite there yet.

            ...but having a single, normalized format (even if everyone hates it for different reasons) is the reason why gofmt and clang-format stick to spaces. I remember (back in 2017) being forced to submit to gofmt's dominion over my code and it ruining my beautifully aligned mass-assignments - and in my frustration I complained about this on StackOverflow and almost immediately someone replied with a working solution: use C-style comments to "protect" whitespace from being mangled by gofmt, see here: https://stackoverflow.com/questions/46940772/how-can-i-use-g...

            Also, apparently clang-format now supports tabs with some hoops: https://stackoverflow.com/questions/69135590/how-make-clang-... - does that work for you?

            • Cieric 11 hours ago

              I'm mainly in Visual Studio at my job as well, I was more asking for my personal projects since at work the issue has been "solved." Sadly the clang-format stuff doesn't work, while it looks like it supports tabs on the surface all those settings do (at least last I used it a year or 2 ago) is convert all of the tabs to spaces, do all the formatting it typically does, and then convert all x number of spaces back to tabs if they're at the beginning of the line. Effectively converting all the alignment spaces to tabs (leaving a few spaces at the end if it's not an even multiple.)

              My tool at this point basically just has a bunch of rules like,

                1) if tab indentation changes, spaces for alignment aren't allowed
                2) tab indentation can never be off by more than 1 of the prior line
              
              Also flags cases of trailing whitespace and I believe tabs not at the beginning of a line. Still debating how I'd like to handle fully spaced files as my current program reports no errors in that case, maybe just throw a warning somewhere that the file looks suspicious.
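
The rules described above could be sketched roughly as follows in Python. This is one reading of the two numbered rules plus the trailing-whitespace and stray-tab checks, not the commenter's actual tool. The name `lint_whitespace` and the message strings are invented for illustration, and "off by more than 1" is interpreted here as applying in both directions.

```python
import re

# Leading tabs (indentation) followed by leading spaces (alignment).
LEADING = re.compile(r"^(\t*)( *)")


def lint_whitespace(text):
    """Return (line_number, message) pairs for lines that break the rules."""
    problems = []
    prev_depth = None  # tab depth of the previous non-blank line
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            if line:  # whitespace-only line
                problems.append((number, "trailing whitespace"))
            continue
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        match = LEADING.match(line)
        depth = len(match.group(1))
        align = len(match.group(2))
        rest = line[match.end():].rstrip()
        if "\t" in rest:
            problems.append((number, "tab after a non-tab character"))
        if prev_depth is not None:
            # Rule 1: no alignment spaces on a line that changes tab depth.
            if depth != prev_depth and align:
                problems.append(
                    (number, "alignment spaces on a line that changes tab depth")
                )
            # Rule 2: tab depth may differ from the prior line by at most one.
            if abs(depth - prev_depth) > 1:
                problems.append((number, "tab depth changes by more than one level"))
        prev_depth = depth
    return problems


sample = (
    "def f():\n"
    "\tx = call(a,\n"
    "\t         b)\n"
    "\t\t\ty = 1 \n"
    "\tz =\t2\n"
)
for problem in lint_whitespace(sample):
    print(problem)
```

Like the tool described, this sketch reports nothing for a file indented purely with spaces, since the tab depth never changes.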
        • daneel_w 14 hours ago

          Sir, you are way out of alignment. Detabulate immediately.

      • zahlman 14 hours ago

        My style using spaces is

          some_func(
              eyesore,
              blah
          )
        
        which would work just as well with tabs.

        Many years ago, I used tabs, and set them to two-space indent. The former because the entire point is that tabs carry different semantic information - this is a level of indentation, not just making things align vertically - and allow each developer to set the indentation width to their preference. (The other comment from DaiPlusPlus explains the proper use of tabs, just as I did it.)

        The latter because that makes them more square. Aesthetics matter.

        I switched mostly out of peer pressure. But one argument I did find convincing is that setting some specific limit on line length - whether it's 72 or 78 or 80 or 100 or anything else - makes sense, and letting people change the amount of indentation defeats that purpose. That is: the guy who likes 8-space indents can't actually have them, because it produces a horizontal scroll for code that "conformed to the style guidelines" when written by the 2-space guy.

        But now I alias names, break up complex subexpressions etc. to avoid questions of how to split code across multiple lines - and most lines in my code are nowhere near any such length limit. And I write short functions, so there aren't enough levels of indentation to matter.

        And I use 4-space indents, because standards have value after all.

      • Asooka 15 hours ago

        We use 4-wide tabs and in our code style it would be

            some_func(
                verylongarg0,
                verylongarg1
            );
        
        Which I feel is the most readable option. If you have to break the args into a vertical list, you want to use the least amount of whitespace before each arg. It's also a bit easier to read with every term starting at a tab break.
        • daneel_w 14 hours ago

          With a large set of arguments broken down to multiple lines I prefer to keep them clear of the function name.

              with_long_func_names(
                  this_scheme,
                  looks_muddled
              );
              
              long_func_name(
                              tidier,
                              scheme
                            );
          
          But my main gripe with tabs is that no one agrees on the width.
          • layer8 an hour ago

            That’s not invariant under refactor-renaming the function name.

            An invariant way of keeping the arguments clear of the function name would be:

                with_long_func_names
                (
                    this_scheme,
                    looks_muddled
                );
            
            Though it would take some getting used to intuitively recognizing that as a function call.
          • Vendan 14 hours ago

            that's the entire point of tabs, they can be customized to what the person reading them wants. It's an accessibility issue (https://www.reddit.com/r/javascript/comments/c8drjo/nobody_t...).

            • layer8 an hour ago

              Both tabs and spaces can in principle behave any way a user would want in editors, at least for whitespace at the start of a line. In particular, an editor could have (suitable ranges of) spaces behave like tabs, or vice versa. However, almost no editor provides the full range of desired behaviors for both. For example, one thing I don’t like about tabs is that cursoring over them makes the cursor jump, and not proceed at constant speed horizontally. An editor could in principle have an option to not do those jumps, but barely any editor has.

          • mcdonje 14 hours ago

            Not agreeing on width is an argument in favor of tabs.

          • creeble 14 hours ago

            The beauty is you don’t have to.

          • jdrek1 14 hours ago

            > But my main gripe with tabs is that no one agrees on the width.

            That's the entire point of tabs. One tab means one indentation level and you as the user can decide how that's displayed. Spaces forces everyone to see the code exactly as whoever decided on his favourite width and that is in the best case "only" annoying to people with different preferences and in the worst case actively hurtful to people with disabilities.

            The only argument spaces people ever have is "some of my colleagues are too stupid to properly indent with tabs and align with spaces" and that is trivially fixed by either of those:

            - don't use alignment, it's useless anyway

            - get better coworkers

            - educate your coworkers

            - use commit hooks to check wrong usage

            So basically there is no argument left on the spaces side at all^[1]. Meanwhile tabs semantically mean "one indentation level", take up fewer bytes, and most importantly allow everyone to have their own preferences without affecting other people. And honestly I am insanely baffled by how many people don't get the importance of that last part. Accessibility like that costs you nothing but means the world to other people, similar to how we have ramps at public buildings for the elderly, wheelchair users, strollers, and so on. And not to mention the fact that there are a lot of autistic people in programming, who often have a harder time dealing with things not being as they want them to be. Is there any reason to choose an objectively inferior method and force that onto those demographics just because "muh alignment"?

            [1] Okay fine, there is one: "Tools I don't own don't display tabs as I want them, for example GitHub with their retarded default of 8". But first of all you can change that if you're logged in and second you're supposed to use your IDE and not a web interface...

            • Asraelite 11 hours ago

              I would agree that there aren't any arguments for spaces and would be 100% on the side of tabs, except for one problem: variable width means you can't enforce a maximum column limit.

              Some people don't care about column limits, but they're important to me because I like to tile multiple editor panes side-by-side with no space wasted.

              The entire debate is stupid anyway and should already be a solved problem. If we used tooling that operates on syntax trees instead of source text, then every developer could have exactly the formatting they want without conflicts. I don't know why that isn't more widespread; the only language I know of to do it is Unison.

              • aeonik 7 hours ago

                Why can't you just have a linter or a hook check that (tabs*2 + chars) < $defined_width

                • layer8 an hour ago

                  Because a coworker might want (tabs*4 + chars) < $defined_width. Whatever multiplier you pick, it will either mean a too short or too long limit, and inconsistent limits between lines depending on how much indented a line is, if everyone uses a different tab size.
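
To make the disagreement concrete, here is a small Python sketch of the check being discussed. It assumes tabs appear only as leading indentation, so the `tabs * width + chars` formula from the parent comment applies. `rendered_width` and `over_limit` are hypothetical helper names, and the 80-column limit is an arbitrary example.

```python
def rendered_width(line, tab_width):
    """Columns occupied when each leading tab renders as tab_width columns."""
    stripped = line.lstrip("\t")
    tabs = len(line) - len(stripped)
    return tabs * tab_width + len(stripped)


def over_limit(line, limit, tab_width):
    """True if the line exceeds the column limit at the given tab width."""
    return rendered_width(line, tab_width) > limit


# Three levels of indentation followed by 70 characters of code.
line = "\t\t\t" + "x" * 70
for tab_width in (2, 4, 8):
    print(tab_width, rendered_width(line, tab_width), over_limit(line, 80, tab_width))
```

The same line fits in 80 columns for a reader with 2-wide tabs and overflows for readers with 4-wide or 8-wide tabs, so a linter has to pick one multiplier and whatever it picks is only accurate for people who use that width.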

      • SoftTalker 15 hours ago

        Well, obviously tabs should always be 8 spaces.

        • daneel_w 14 hours ago

          Not sure if you're joking since 8 makes the whole problem even worse :)

  • zzo38computer 4 hours ago

    Note that some files will need tabs such as Gopher menus and Makefile.

  • gwbas1c 14 hours ago

    FYI: If you're in the .net ecosystem, you can choose your tabbing style (tabs or spaces) with an .editorconfig file. Then running "dotnet format" will change everything for you. (And, if you use github, you can create actions to assert that the .editorconfig is followed.)

    • diggan 11 hours ago

      FWIW: EditorConfig isn't a ".net ecosystem" thing but works across a ton of languages, editors and IDEs: https://editorconfig.org/

      Also, rather than using GitHub Actions to validate if it was followed (after branch was pushed/PR was opened), add it as a Git hook (https://git-scm.com/docs/githooks) to run right before commit, so every commit will be valid and the iteration<>feedback loop gets like 400% faster as you don't have to wait for Actions to finish.

      • gwbas1c 10 hours ago

        Git hooks require environment-specific configuration. CI enforcement makes sure that everyone follows the rules, even if they "forget" to set up the git hook.

        Also: dotnet format is kind of slow, which is why they aren't used where I work.

        • diggan 9 hours ago

          > CI enforcement makes sure that everyone follows the rules, even if they "forget" to set up the git hook.

          Yeah, my wording was a bit poor (shouldn't have said "rather"), both are needed, one just helps you fix stuff faster :)

          And if you write your hook in a language that can cross-compile and can easily deal with multiple platforms (Go, Rust, NodeJS, many options [probably .net too?]), it's really easy. Just need to make the setup of them part of the onboarding.

  • s09dfhks 6 hours ago

    what is this furry tomfoolery

  • baobun 7 hours ago

    I would just approach this like text. Something like:

        find -type f -name '*.py' -exec sed -i -E -e ':a' -e 's/^(( {4})*)\t/\1    /' -e 'ta' {} \+
    
    , until you don't see a diff

    Seems simpler to adjust that general approach to whatever codebase and replacement.
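
A rough Python take on the same text-level approach, converting every leading tab on a line in a single call. This is a sketch: `expand_leading_tabs` is a made-up name, the 4-space width is an assumption, and it only handles lines whose indentation starts with tabs (leading whitespace that begins with spaces is left alone).

```python
import re

# A run of one or more tabs at the start of any line.
LEADING_TABS = re.compile(r"^\t+", re.MULTILINE)


def expand_leading_tabs(text, width=4):
    """Replace each run of leading tabs with `width` spaces per tab."""
    return LEADING_TABS.sub(lambda m: " " * (width * len(m.group())), text)


source = 'def f():\n\tif True:\n\t\treturn "a\tb"\n'
converted = expand_leading_tabs(source)
print(converted, end="")
```

Because only tabs at the start of a line are touched, a tab inside a string literal on a code line survives. Tabs at the start of a line inside a multi-line string would still be rewritten, so the resulting diff is worth reviewing.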

  • gwbas1c 10 hours ago

    One funny anecdote: I once did a similar cleanup on a codebase that was mostly spaces, but a few tabs slipped in. (I just did a find and replace on \t -> " ")

    Suddenly, one unit test broke. On closer inspection, whoever wrote it put a tab character into a string. I changed the test to use \t.

  • kgwxd 12 hours ago

    I never understood why programmers universally like fixed width fonts, but then about half want just 1 of those characters to be batshit crazy.

  • user9999999999 6 hours ago

    whitespace is a terrible block scope definition, it's literally using 'invisible' characters to determine block scope! just use semi colons. LONG LIVE SEMI COLONS ;;;;;;;;;;;

  • gwbas1c 13 hours ago

    > One way or another, you must get this block in your devs’ Git configuration

    Uhm, things like this should be enforced in CI. IE, as a rule that must pass in order for a pull request to be merged.