LoC Is a Dumb Metric for Functions

(theaxolot.wordpress.com)

36 points | by Axol 8 hours ago

32 comments

A personal guideline for a lot of stuff is that a function may be too long when people add comments to mark what sections of it do (ofc not really a hard rule). I just think it's easier to see "oh, this is calling the load_some_stuff function, which I can easily see returns some data from a file," rather than <100 lines of stuff, inlined in a big function, that you have to scan through to realize it loads some stuff and/or find the comment saying it loads some stuff>. That is to say, descriptive function names are easier to read than large chunks of code!

smaller functions are also usually easier to test :shrug:
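
A minimal sketch of the contrast described above, in Python. The names `load_some_stuff`, `summarize`, and `report`, and the one-integer-per-line input format, are hypothetical and chosen only for illustration.

```python
import io
from typing import List, TextIO


def load_some_stuff(source: TextIO) -> List[int]:
    """Return the integers found in source, one per non-blank line."""
    return [int(line) for line in source if line.strip()]


def summarize(values: List[int]) -> str:
    """Format the count and total of values as a single line."""
    return f"{len(values)} values, total {sum(values)}"


def report(source: TextIO) -> str:
    # The caller reads as a list of named steps instead of commented sections.
    return summarize(load_some_stuff(source))


if __name__ == "__main__":
    print(report(io.StringIO("1\n2\n\n3\n")))
```

Each helper can also be tested on its own with an in-memory file, which is the testing point made above.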

DarkNova6 17 minutes ago

> Locality is the design principle of grouping related things together. It’s a mental shortcut or heuristic that allows the mind to assume a higher degree of independence from the code block being read, easing the mental burden. Spreading code across your codebase diminishes locality.

> Linearity is the idea of having instructions be read one after the other, from top to bottom, just like how code is executed. Having to jump back and forth from function call to function call reduces linearity.

1000 times yes.

mirekrusin 2 hours ago

Worth noting that "cognitive complexity" may mean the SonarQube metric, which is not widely recognized by the industry and was created by a SonarQube employee as an (imho failed) attempt to address "issues" with "cyclomatic complexity" (a principled, industry-recognized metric). It'll count things like nullish coalescing and optional chaining as +1 on your complexity, which makes it unusable with JSX or TS code.

With low thresholds ("clean code" like low), both LoC and "cognitive complexity" (as in SQ) are bad measures that light up in red a large percentage of otherwise correct code in complex projects. The way people usually "solve" them is through naive copy-pasting, which doesn't reduce any cognitive load; it just scatters complexity around, making the code harder to reason about and modify in the future.

al_borland 8 hours ago

LOC is often a rough approximation for complexity. We once had an intern who made some useful things, but he didn’t know how to break anything down. One of them was a 1,000-line Perl script, all as one function. I asked if it could be broken down into something more maintainable and he said no. There were several projects like this.

Knowing at a high level what needed to happen to accomplish what the code did, I know for a fact it could have and should have been broken down more. However, because of the complexity of what he wrote, no one had the patience to go through it. When he left, the code got thrown away.

While 10 LOC vs 50 LOC doesn’t matter, when commas enter the number, it’s a safe bet that things have gone off the rails.

jcranmer 5 hours ago

> While 10 LOC vs 50 LOC doesn’t matter, when commas enter the number, it’s a safe bet that things have gone off the rails.

There are times when even a 1,000 LOC function is the better solution. The best example I can think of involves a switch statement with a very high branch factor (e.g., 100), where the individual cases are too simple to break out into separate functions. Something like the core loop of an interpreter will easily become a several-thousand-line function.
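
A scaled-down sketch of that shape, in Python. The opcode names and the `(op, arg)` instruction format are hypothetical; a real interpreter would have many more cases, which is where the line count comes from.

```python
from typing import List, Tuple

Instruction = Tuple[str, int]


def run(program: List[Instruction]) -> List[int]:
    """Execute a tiny, hypothetical stack-machine program and return the stack."""
    stack: List[int] = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        # Each case is a line or two; splitting them into functions adds little.
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "neg":
            stack.append(-stack.pop())
        elif op == "dup":
            stack.append(stack[-1])
        elif op == "jump_if_zero":
            if stack.pop() == 0:
                pc = arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack
```

The function grows by one short branch per opcode, so its length tracks the size of the instruction set rather than the difficulty of any single case.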

nake89 15 minutes ago

John Carmack wrote about inlining code: http://number-none.com/blow/john_carmack_on_inlined_code.htm...

imoverclocked 5 hours ago

Cyclomatic complexity is not equal to LOC and cyclomatic complexity of a switch can be seen as adding just 1 to its enclosing function. Either way, LOC is still not a great metric while cyclomatic complexity approaches a better one.

In my experience, there are very few places where something can't be broken up to reduce cognitive overhead or improve maintainability. When a problem is solved in a way that makes this difficult, maybe it's better to approach the problem differently.

zdragnar 4 hours ago

Higher cyclomatic complexity requires more lines (barring bad code style like writing an entire program on a single line).

The inverse is not always true, but often is.

imoverclocked 2 hours ago

> Higher cyclomatic complexity requires more lines

Often true; that's why cognitive complexity wins over cyclomatic complexity.

eg: Embedded logic doesn't add linearly.

fn() { // 1

  if(...) { // +1
    if(...) { // +2
      if(...) { // +3
         ...
      }
    }
  }

} // cognitive complexity of 7 vs cyclomatic complexity of 4

It's been a while since I've implemented anything around this and was remembering cognitive complexity while writing cyclomatic complexity in the previous response. They both have their place but limiting cognitive complexity is vastly more helpful, IMHO. eg: The above code might be better written as guard-clauses to reduce cognitive complexity to 4.
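
A rough sketch of that comparison in Python, using the standard `ast` module. The scoring here is a deliberate simplification and an assumption for illustration, not SonarQube's actual rule set: only `if` statements are counted, `elif` is not treated specially, and both scores start at 1 for the function, as in the numbers above. `NESTED` and `GUARDED` are hypothetical sample functions.

```python
import ast
import textwrap
from typing import Tuple

NESTED = """
def f(a, b, c):
    if a:
        if b:
            if c:
                return "work"
    return None
"""

GUARDED = """
def f(a, b, c):
    if not a:
        return None
    if not b:
        return None
    if not c:
        return None
    return "work"
"""


def scores(source: str) -> Tuple[int, int]:
    """Return (flat_score, nesting_weighted_score) for the `if` statements."""
    tree = ast.parse(textwrap.dedent(source))
    flat = 1       # every `if` adds 1, regardless of depth
    weighted = 1   # every `if` adds 1 plus its nesting depth

    def visit(node: ast.AST, depth: int) -> None:
        nonlocal flat, weighted
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.If):
                flat += 1
                weighted += depth + 1
                visit(child, depth + 1)
            else:
                visit(child, depth)

    visit(tree, 0)
    return flat, weighted
```

Under these assumptions `scores(NESTED)` gives `(4, 7)` and `scores(GUARDED)` gives `(4, 4)`: the flat count cannot tell the two versions apart, while the nesting-weighted count can.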

kqr 4 hours ago

I would argue an interpreter that needs 1,000 lines for its core loop is probably a complex piece of software, comparable to other 1000-line projects I've made.

morshu9001 5 hours ago

That's a special case

manwe150 6 hours ago

Seems like the article missed an opportunity to talk about testing and MC/DC coverage. I don’t care if the PR is long or short, just show me that you have meaningfully tested how each branch can be reached with the full range of values and behaves correctly. Unit testing is easier with well chosen interfaces and worse without them, regardless of LOC.
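
A small sketch of what an MC/DC-style check looks for, in Python. The decision `may_ship` and its parameters are hypothetical, and the checker implements only the unique-cause form: for each condition, some pair of tests must differ in that condition alone and flip the outcome.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[bool, ...]


def may_ship(paid: bool, in_stock: bool, backorder_ok: bool) -> bool:
    """Hypothetical decision under test."""
    return paid and (in_stock or backorder_ok)


def independence_shown(
    decision: Callable[..., bool], tests: Sequence[Vector]
) -> List[bool]:
    """For each condition, report whether the tests show its independent effect."""
    n = len(tests[0])
    shown = [False] * n
    for t1, t2 in combinations(tests, 2):
        differing = [i for i in range(n) if t1[i] != t2[i]]
        # A valid pair changes exactly one condition and changes the outcome.
        if len(differing) == 1 and decision(*t1) != decision(*t2):
            shown[differing[0]] = True
    return shown


# Four tests are enough for three conditions here.
MCDC_TESTS = [
    (True, True, False),
    (False, True, False),
    (True, False, False),
    (True, False, True),
]

# Reaches both outcomes, but says nothing about individual conditions.
BRANCH_ONLY_TESTS = [(True, True, True), (False, False, False)]
```

With `MCDC_TESTS` every condition is shown to matter on its own; with `BRANCH_ONLY_TESTS` none is, even though both outcomes of the decision are exercised.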

hu3 4 hours ago

> I asked if it could be broken down into something more maintainable and he said no.

This is something AI is good at. It could have shown the intern that it is indeed possible to break up such a function without wasting an hour of a senior's time.

mrheosuper 6 hours ago

>When he left, the code got thrown away.

Why even merge his code in the first place? He was an intern, so I assume whatever he was working on was not mission critical.

al_borland 5 hours ago

They were one-off projects where he was the sole dev, not part of a larger project. While not mission critical, the 1k line Perl script was helpful at the time. When we ran into a specific issue it allowed for recovery in minutes vs hours. We eventually added safeguards which prevented the issue from happening in the first place.

bsder 3 hours ago

> LOC is often a rough approximation for complexity.

I would argue that the word you are looking for is "containment".

Is there any real difference between calling a function and creating a "const varname = { -> insert lots of computation <-; break retval;}" inline?

If you never do that computation a second time anywhere else, I would argue that a new function is worse because you can't just scan in it quickly top to bottom. It also ossifies the interface if you have the function as you have to rearrange the function signature to pass in/out new things. Premature optimization ... mumble ... mumble.

Even Carmack mentioned this almost 20 years ago: http://number-none.com/blow/john_carmack_on_inlined_code.htm...

Given that we've shrugged off "object-itis" which really demands small functions and given modern IDEs and languages, that kind of inline-style seems a lot more useful than it used to be.

RHSeeger 3 hours ago

> If you never do that computation a second time anywhere else, I would argue that a new function is worse because you can't just scan in it quickly top to bottom.

I feel exactly the opposite. By moving "self contained" code out of a method, and replacing it with a call, you make it easier to see what the calling location is doing. Ie, you can have a method that says very clearly and concisely what it does... vs making the reader scan through dozens or hundreds of lines of code to understand it all.

Chapter 12 of Kent Beck's book Tidy First? is about exactly this.

bsder an hour ago

Do not cite Kent Beck as a software authority. Consider this your semi-regular reminder that Kent Beck is part of the posse (along with Ron Jeffries and Martin Fowler) responsible for the software disaster that was the "Chrysler Comprehensive Compensation System". They were also proponents of Smalltalk--which leads to object-itis and the concomitant tiny functions.

As for your "function documentation", the primary difference between that assignment and a function is solely ossified arguments. And ossified function arguments are something that I would put under "premature abstraction".

At some point you can pull that out into a separate function. But you should have something that causes you to do that rather than it being a default.

000ooo000 6 hours ago

>Use your judgment, and don’t be bullied by people who prescribe specific line numbers.

Ironic way to end an article that repeatedly belittles the target audience.

klysm 4 hours ago

Use your judgement, don’t be bullied by people who prescribe using your judgment.

Kimitri 2 hours ago

It's not the LoC I care about, it's the logical separation of concerns and testability. Large functions usually do many things which makes them really hard to test. Also, just mashing all the things in a single function is indicative of the author not having a clear picture of what problem he or she is dealing with.

I have been programming for 30 years and, while I don't consider myself a great programmer, I don't like large functions. In my experience they usually fail at clearly expressing intent and therefore make the code a lot harder to reason about. There are, of course, exceptions to this.

grandfugue an hour ago

It's about structures. A long function is unfortunately "flat" at the first glance. Even if there are inherent structures, it usually burns a lot of brain power for a human to abstract structures out of "flat". Being flat is a more acute issue for LLM to understand. I think in the LLM era it's more important than ever to keep things short and structured.

DanHulton 2 hours ago

It may be a bad metric, if you're being prescriptive, but it's a great heuristic. If I see a 500-line function where I'm not expecting one, I'm going to pay a lot more attention to it in the PR to try to figure out just why it isn't shorter.

jandrewrogers 4 hours ago

Sometimes I write long functions because I do not have time to write a shorter function. Eventually, if I have time, I go back and refactor a much shorter and tidier version. While later versions may be somewhat more efficient computationally, it is mostly a no-op beyond the aesthetics.

What is the LoC for that function? The first implementation or the final rewrite? They express the same thing.

morshu9001 5 hours ago

The whole point of a rule-of-thumb is to be dumb. There are very few good reasons for a 500 loc func, and the code reviewer won't want to verify it's as simple as you claim. You could still have a 50 loc func that's overly complex, and they'll complain about complexity instead of length.

One consideration is that about 40 loc will comfortably fit in one code split, and it's nice if one func will fit in 1-2 splits. Past that length, no matter how simple it is, the reader will have to start jumping around for declarations. That might be ok at first, but at some point the ratio of offscreen to onscreen code gets too big.

fragmede 5 hours ago

I wonder if there is a way to see how physical monitor quality and size improvements have led to more complicated code, never mind moving off of punch cards.

morshu9001 5 hours ago

The type of monitor doesn't matter a whole lot because it's really limited by human eyesight. A high-res monitor will enable rendering things tiny if you disable hi-dpi, but then it's unreadable. If you use a big 8K TV to display everything larger, you have to sit further away to comfortably view it. If you add more monitors, at some point it becomes too hard to look at so many things at once.

Personally, my setup has shown the same number of vim rows/cols on my 4K monitor vs my 1080p one (or whatever you call the 16:10 equivalents).

nine_k 4 hours ago

The effective resolution of the eye should be an angular value. I'd say that the smallest comfortable size of text characters for me is about 1/8° wide. I see the width of my 14" FHD laptop screen at about 30°, so the full width contains about 240 characters. This is a large upgrade from 80 columns of VT220 or VGA, and almost twice as much as the densest VT220 mode of 132 columns.

My 28" 4K monitor is exactly like four 14" FHD screens put together. It offers me 480 columns (a bit fewer, because of the window borders and scroll bars of multiple windows).

So indeed, better screens allow me to see much more text than poorer screens of old. There is a limit, but definitely it was not reached 30-40 years ago yet.
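
The arithmetic above can be checked with a few lines of Python. The function name `columns` is hypothetical, and treating the 28" monitor as twice the viewing angle (60°) follows the comment's own approximation rather than exact geometry.

```python
def columns(screen_width_deg: float, char_width_deg: float) -> int:
    """Number of characters that fit across a screen of the given angular width."""
    return round(screen_width_deg / char_width_deg)


CHAR_WIDTH_DEG = 1 / 8  # smallest comfortable character width from the comment

laptop_columns = columns(30, CHAR_WIDTH_DEG)    # 14" FHD laptop seen at about 30 degrees
monitor_columns = columns(60, CHAR_WIDTH_DEG)   # 28" 4K monitor, assumed twice as wide
ratio_to_vt220 = laptop_columns / 132           # versus the 132-column VT220 mode

if __name__ == "__main__":
    print(laptop_columns, monitor_columns, round(ratio_to_vt220, 2))
```

This reproduces the 240 and 480 column figures, and puts the laptop at roughly 1.8 times the 132-column mode, consistent with "almost twice as much".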

adrianN 2 hours ago

I can read small things much better on a high res monitor.

WillAdams 5 hours ago

For a longer, more detailed take on this see:

https://www.youtube.com/watch?v=bmSAYlu0NcY

which is for the book:

https://www.goodreads.com/book/show/39996759-a-philosophy-of...

schrodinger 4 hours ago

This needs more than just an upvote — that's one of the best software engineering books I've ever read!

imiric 3 hours ago

What an arrogant, wordy, and partly wrong take on this matter.

No, LoC is not a dumb metric. In the same way that cognitive complexity, code coverage, and similar metrics aren't. CC is definitely not "superior" to LoC, it's just different.

These are all signals that point to a piece of code that might need further attention. It is then up to the developer to decide whether to do something about it, or not.

You know what's dumb? Being dogmatic, one way or the other. There are very few hard rules in software development—they're mostly tradeoffs. So whenever I hear someone claiming that X is always bad or always good, I try to poke holes in their argument, or just steer clear from their opinion.