Testing is better than data structures and algorithms

(nedbatchelder.com)

177 points | by rsyring 2 days ago

165 comments

  • IvyMike a day ago

    Always gonna have to side with Peter Norvig on this one: https://pindancing.blogspot.com/2009/09/sudoku-in-coders-at-...

    > They said, “Look at the contrast—here’s Norvig’s Sudoku thing and then there’s this other guy, whose name I’ve forgotten, one of these test-driven design gurus. He starts off and he says, “Well, I’m going to do Sudoku and I’m going to have this class and first thing I’m going to do is write a bunch of tests.” But then he never got anywhere. He had five different blog posts and in each one he wrote a little bit more and wrote lots of tests but he never got anything working because he didn’t know how to solve the problem. I actually knew—from AI—that, well, there’s this field of constraint propagation—I know how that works. There’s this field of recursive search—I know how that works. And I could see, right from the start, you put these two together, and you could solve this Sudoku thing. He didn’t know that so he was sort of blundering in the dark even though all his code “worked” because he had all these test cases.

    • pncnmnp a day ago

      I love what Norvig said. I can relate to it. As far as data structures are concerned, I think it's worth playing smart with your tests - focus on the "invariants" and ensure their integrity.

      A classic example of an invariant I can think of is the min-heap's heap property: the value of node N is less than or equal to the values of its children.

      Five years from now, you might forget the operations and the nuanced design principles, but the invariants might stay well in your memory.
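
      A minimal sketch of testing that invariant directly in Python (heapq is the stdlib min-heap; the operation mix is arbitrary): after every operation, every node must be <= its children.

          import heapq
          import random

          def heap_invariant_holds(heap):
              return all(
                  heap[i] <= heap[child]
                  for i in range(len(heap))
                  for child in (2 * i + 1, 2 * i + 2)
                  if child < len(heap)
              )

          def test_heap_property_survives_mixed_operations():
              heap = []
              for _ in range(1000):
                  if heap and random.random() < 0.3:
                      heapq.heappop(heap)
                  else:
                      heapq.heappush(heap, random.randint(-100, 100))
                  assert heap_invariant_holds(heap)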

    • runeblaze a day ago

      That story reads like what happens when the average senior engineer tries a hardish USACO problem; it turns out algorithm engineering is different from your average enterprise engineering, and there are people in both camps.

    • MrJohz a day ago

      I think the point Norvig is making there broadly agrees with this post though. In the Sudoku affair, Norvig had the DSA knowledge there, sure, but his point is more that you need to be willing to look up other people's answers, rather than assuming you have enough knowledge or that you can slowly iterate towards a correct answer. You can't expect to solve every problem yourself with the right application of DSA/TDD/whatever.

      That's the same as the blog post: you need to know enough DSA to be able to understand how to look for the right solution if presented with a problem. But Batchelder's point is that, beyond that knowledge, learning testing as a skill will be more valuable to you than learning a whole bunch of individual DSA tricks.

    • IvyMike a day ago

      More context, from an earlier HN comment: https://news.ycombinator.com/item?id=3033446

    • roxolotl a day ago

      This totally misses the point of the article. The article agrees that knowing when a problem is a data structure and algo problem is a key strength. The article also isn’t saying that all development should be done TDD.

      The point of the article is that knowing how to test well is more useful than memorizing solutions to algo problems. You can always look those up.

    • quotemstr a day ago

      > “Well, I’m going to do Sudoku and I’m going to have this class and first thing I’m going to do is write a bunch of tests.” But then he never got anywhere

      There's a blog post I read once and that I've since been unable to locate anywhere, even with AI deep research. It was a blow-by-blow record of an attempt to build a simple game --- checkers, maybe? I can't recall --- using pure and dogmatic test driven development. No changes at all without tests first. It was a disaster, and a hilarious one at that.

      Ring a bell for anyone?

    • EVa5I7bHFq9mnYK 17 hours ago

      If you write all the tests, I'm sure the LLM can figure out the implementation.

    • casey2 21 hours ago

      To be fair to the other guy, a Sudoku solver is easier to bang out than a tiny distributed operating system environment that happens to solve Sudoku, even if your language does help you.

  • JackSlateur a day ago

    I think the author is mistaken.

    Let's grab a simple use case: some basic CRUD HTTP API. Easy, you say, no need to know fancy stuff! Just test it and that's all.

    You do your tests, all good, you can roll it into production!

    But sadly, in production, you have multiple users (what an idea..). Suddenly, your CRUD API has become a concurrent system. Suddenly, you have data corruption, because you never thought about any of that, and "your tests were green".
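
    A toy sketch of that failure mode, with plain Python threads standing in for concurrent requests and a dict standing in for the database row (names are illustrative): each "request" passes in isolation, yet interleaved read-modify-writes lose updates.

        import threading
        import time

        counter = {"value": 0}               # stand-in for a row in the database

        def handle_request():
            for _ in range(1_000):
                current = counter["value"]       # read
                time.sleep(0)                    # another "request" runs in this gap
                counter["value"] = current + 1   # write based on a stale read

        threads = [threading.Thread(target=handle_request) for _ in range(2)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # A single-user test expects 2_000; under concurrency the total usually
        # comes up short because increments are silently overwritten.
        print(counter["value"])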

    Algorithms are the backbone tools of programming. Knowing them helps us; ignoring them burdens us.

    • themafia a day ago

      Marco Pierre White says "perfection is lots of little things done well."

      Which is something I've always agreed with, so, I never understand articles that seek to eschew an important part of releasing software because they believe their approach elsewhere is enough to overcome these intentionally suboptimal choices.

      • yakshaving_jgt a day ago

        Almost all of professional software should be intentionally suboptimal.

        This is what we mean when we say that premature optimisation is the root of all evil.

        • FridgeSeal 20 hours ago

          We ought to ban people saying that quote, due to the way it has been abused to avoid considering performance _at all_.

          “Intentionally suboptimal” is also a strange way of phrasing it, as it makes it sound a bit like you’re intentionally building something bad, as opposed to “only as good as it needs to be”.

          • yakshaving_jgt 19 hours ago

            In general I avoid considering performance at all. Instead, I focus on adding testing and instrumentation. When the telemetry tells me I have a performance problem, then I can solve that part while being confident that the performance improvement doesn't change the functionality because I first invested in tests.

    • hinkley a day ago

      Mediocre testing can also lead to a situation where there is friction for improvement because the tests are brittle and coupled (with each other and the misfeature you’re interested in fixing).

      I like a more uniform distribution in my testing efforts. Start earlier, end later than most, and it’s experiences like this that inform that preference. And also production bugs in code with supposed 100% test coverage.

      • sarchertech a day ago

        > Mediocre testing can also lead to a situation where there is friction for improvement because the tests are brittle and coupled

        This is very very common among inexperienced devs and in immature organizations that think that more tests necessarily means better.

        • hinkley a day ago

          And devs who think they are experienced but disappear when the testing gets tough.

    • wubrr a day ago

      I mean, in your example you just have an incomplete test suite. (Though writing a complete one is often unrealistic)

      While understanding algorithms and data structures is important, the only way you really know how well it works, and how well it's implemented, is by thoroughly testing it. There are an infinite number of clever algorithms out there with terrible implementations.

      You need both.

      • JackSlateur a day ago

        Testing concurrency is extremely hard

        For instance, take SQL queries: you ran them, and you have no issue. Is your code sane? Or is it because one query ran 10ms earlier and you thus avoided the issue?

        I truly wonder if there are real-world tests around this; I bet there are only algorithms and fuzzing.

        • hxtk a day ago

          I’ve long wished for an SQL error model: given a schema, query, and transaction isolation mode, what errors are theoretically possible?

          I have a hard time answering this for Postgres, which disappoints me because I don't see why it couldn't be answered - it seems like there could be an extension to EXPLAIN that would dry-run the query and list all the reachable error states.

          • jiggawatts 21 hours ago

            Computer Science is incredibly immature. Many of its "founding fathers" are still alive!

            Something akin to this that blew my mind recently was an IDE for a functional language that used typed holes, the programming equivalent of a semiconductor electron "vacancy", a quasi-particle with real properties that is actually just the lack of a particle. The concept was that if you delete (or haven't yet typed) some small segment out of an otherwise well-typed program, the compiler can figure out the type that the missing part must have. This can be used for rapid development because many such "holes" can only have one possible type.

            This kind of mechanistic development with tool assistance is woefully under-developed.

        • toolslive a day ago

          There are "lightweight formal methods". Most problems can be produced via small models. Tools like alloy are built around this idea. (IIRC alloy was used to show that a famous DHT had issues with the churn protocol)

          https://en.wikipedia.org/wiki/Alloy_(specification_language)

        • terpimost 13 hours ago

          https://antithesis.com/ was made to deal with this. You can think of it as fuzzing, but it has overall determinism for the whole system, so there is time travel and interactive debugging.

        • wubrr a day ago

          > Testing concurrency is extremely hard

          Writing a non-trivial concurrent system based on your understanding of the 'algorithm', without relying on testing, is much harder.

          > I truly wonder if there is real world tests around this

          Of course there are. There are many tools, methods, and test suites out there for concurrency testing, for almost any major language out there. Of course, understanding your algorithm, and the systems involved is required to write a proper test suite.

          > For instance, get sql queries; You ran them, and you have no issue; Is your code sane ?

          Take those queries and run them 1000x+ times concurrently in a loop. That will catch most common issues. If you want to go a step further you can build a setup to execute your queries in any desired order.
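
          A hedged sketch of that approach in Python (the transfer function is a stand-in for whatever unlocked "SELECT then UPDATE" queries the real system runs): hammer the operation from a thread pool, then assert an invariant that any interleaving must preserve.

              from concurrent.futures import ThreadPoolExecutor

              def transfer(accounts, src, dst, amount):
                  # stand-in for read-then-write queries with no locking
                  accounts[src] -= amount
                  accounts[dst] += amount

              def test_concurrent_transfers_preserve_total():
                  accounts = {"a": 10_000, "b": 10_000}
                  expected_total = sum(accounts.values())
                  with ThreadPoolExecutor(max_workers=32) as pool:
                      futures = [pool.submit(transfer, accounts, "a", "b", 1)
                                 for _ in range(5_000)]
                      for f in futures:
                          f.result()   # surface exceptions from the workers
                  # Money must neither appear nor vanish, whatever the interleaving.
                  assert sum(accounts.values()) == expected_total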

          • sarchertech a day ago

            I’ve never worked somewhere (in 20 years from big tech companies to small startups) that was generally and reliably testing for concurrency bugs.

            And I’ve seen dozens of bugs caused by people assuming that transactions (with the default isolation level) protect against race conditions.

            • wubrr a day ago
              • sarchertech a day ago

                > Every place I worked at, that had any kind of reliable, high-throughput concurrent system

                Pretty much anyone with high throughput is running a high throughput concurrent system, and very few companies have an extensive suite of concurrency tests unless you just mean load tests (that aren’t setup to catch race conditions).

                The “reliable” part of that statement might be doing a lot of heavy lifting depending on what exactly you mean by that.

                • wubrr a day ago

                  I gave you several concrete examples. Your claims of 'very few companies have...' aren't very convincing, and the apparent popularity of concurrency testing isn't really a strong argument for or against its effectiveness or doability.

                  • sarchertech a day ago

                    Did you Google “concurrency testing” and send me the top 5 results?

                    • fn-mote a day ago

                      Kind of looks like it … the supporting evidence includes work from Microsoft: learning how to write concurrent programs. Surely not evidence that Microsoft is testing for concurrency bugs (of course they are).

                      • ongy a day ago

                        > In Go 1.24, we are introducing a new, experimental testing/synctest package

                        Clearly a mature mechanism we'd see in large companies...

                    • wubrr a day ago

                      Maybe you should have googled 'concurrency testing' before telling me a story about how you worked at every tech company for 76000 years and never saw any concurrency testing lmao.

                      • hluska a day ago

                        They said twenty, not 76000. What a crock.

                        • jamespo a day ago

                          Perhaps he had a concurrency bug writing the sentence and didn't lock the value

                        • gubicle a day ago

                          same thing

            • nijave 16 hours ago

              Not sure about other languages but I believe the stock test tooling for `go` generates a random sorting seed and has a flag to run tests concurrently. You can also manually pass the seed to simulate a certain ordering.

              While not perfect, our e2e tests caught a couple bugs running them with concurrency.

          • pixl97 17 hours ago

            I mean if you're talking SQL on a large real database vs a small test db you can get some pretty big differences in performance and behavior. Of course query planning is something that should be monitored as an app is deployed and used, but testing never does seem to catch the edge cases.

          • JackSlateur a day ago

            Running 1000x queries in a loop is called luck.

            • wubrr a day ago

              No, it's called testing many concurrent operations.

              Implementing a complex concurrent algorithm based on your understanding of it, without proper testing is called luck, and often called delusion.

              • teraflop a day ago

                You can't easily, automatically test concurrent code for correctness without testing all possible interleavings of instructions, and that state space is usually galactically huge.

                It is very easy to write multithreaded code that is incorrect (buggy), but where the window of time for the incorrectness to manifest is only a few CPU instructions at a time, sprinkled occasionally throughout the flow of execution.

                Such a bug is unlikely to be found by test cases in a short period of time, even if you have 1000 concurrent threads running. And yet it'll show up in production eventually if you keep running the code long enough. And of course, when it does show up, you won't be able to reproduce it.

                That is, I think, what the parent commenter means by "luck".

                This is similar to the problem you'll run into when testing code that explicitly uses randomness. If you have a program that calls rand(), and it works perfectly almost all the time but fails when rand() returns the specific number 12345678, and you don't know ahead of time to test that value, then your automated test suite is unlikely to ever catch the problem. And testing all possible return values of rand() is usually impractical.
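
                A tiny concrete version of that point in Python (the magic value and the bug are invented for illustration): the defect fires on one input out of 10^8, so a random test suite will almost never see it.

                    import random

                    def process(x):
                        if x == 12345678:   # the one input that trips the bug
                            raise RuntimeError("corrupted state")
                        return x * 2

                    def test_process_with_random_inputs():
                        for _ in range(100_000):
                            x = random.randrange(100_000_000)
                            # passes on essentially every run
                            assert process(x) == x * 2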

                • nick__m a day ago

                  There is a cost benefit ratio and context that matters.

                  The "repeat the concurrent operations 1000 times" technique is adequate for a CRUD API but it's woefully inadequate for a database engine or garbage collector.

                • nijave 16 hours ago

                  There's still value in eliminating ways your program can go wrong even if you can't eliminate all of them.

                  Using your logic, why bother testing at all.

                • wubrr a day ago

                  It will obviously not catch all bugs. Nothing will. But it is a relatively easy and reliable way to catch many of them. It works.

                • afiori a day ago

                  and if you have too many threads running you could have slowdowns in the system that prevent the race condition from happening

              • JackSlateur a day ago

                What algorithm ? The whole idea is that algorithms are useless, and you should just write a bunch of tests and go with it

                Yes, if I write stuff with locks, I shall ensure that my code acquires and releases locks correctly

                This is completely off-topic from the original post.

                Also, you cannot prove something by tests. Just because you found 100000 cases where your code works does not mean there is not a case where it does not (just as you cannot prove that unicorns do not exist) :)

                • sarchertech a day ago

                  > Also, you cannot prove something by tests; Just because you found 100000 cases where your code works does not mean there is not a case where is does not (just as you cannot prove that unicorn does not exist) :)

                  That’s exactly it. For any non trivial program, there exists an infinite number of ways your program can be wrong and still pass all your tests.

                  Unless you can literally test every possible input and every bit of state this holds true.

                  • wubrr a day ago

                    For any 'non trivial' program there exists an infinite number of ways your program can be wrong but you still believe it's right.

                    Testing is not a perfect solution to catch all bugs. It's a relatively easy, efficient and reliable way to catch many common bugs though.

                  • pfdietz a day ago

                    And yet, (1) testing finds bugs in any nontrivial program that hasn't been tested, and (2) test long enough and with enough variety and you can make programs significantly more reliable.

                    Perfect is the enemy of good, and absent academic fantasies of verified software, testing is essential (even then, it's still essential, since you are unlikely to have verified every component of your system.)

                • wubrr a day ago

                  It's not about making sure your system is 100% perfect. You cannot do that on any real sufficiently complex system. It's about testing the core functionality in a relatively straightforward and reliable way (including concurrency testing), to catch many common bugs.

                  • JackSlateur 15 hours ago

                    My shit is the backbone of a multibillion-dollar company.

                    Common bugs are not enough, uncommon bugs are just too expensive

  • matheusmoreira a day ago

    > Of course some engineers need to implement hash tables, or sorting algorithms or whatever.

    > We love those engineers: they write libraries we can use off the shelf so we don’t have to implement them ourselves.

    The world needs to love "infrastructure developers" more. To me it seems only the killer app writing crowd is valued. Nobody really thinks about the work that goes into programming languages, libraries and tools. It's invisible work, taken for granted, often open source, not rarely unpaid.

    > It wasn’t opening a textbook to find the famous algorithm that would solve my problem.

    I had that exact experience. I'm working on my own programming language. After weeks of trying to figure something out by myself, someone told me to read Structure and Interpretation of Computer Programs. It literally had the exact algorithm I wanted.

    • chamomeal a day ago

      Dang I got this book a few weeks ago and still haven’t cracked it open. Maybe today is the day

      • matheusmoreira a day ago

        The algorithm I'm talking about is at the very end of the book. If you start reading it from start to finish you might stop before you reach it. Certainly happened to me. Someone had to point it out for me to realize SICP had the answer all along.

        https://eng.libretexts.org/Bookshelves/Computer_Science/Prog...

        The explicit control evaluator. It's a register and stack machine which evaluates lisp expressions without transforming them into bytecode.
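
        For a flavor of the idea, here is a toy analogue in Python (not SICP's register machine; the primitives and structure are made up for illustration): expressions stay as plain nested lists and are evaluated with an explicit work stack, no compilation to bytecode and no recursion in the host language.

            PRIMS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

            def evaluate(expr, env):
                tasks = [("eval", expr)]     # explicit control: a stack of work items
                values = []                  # evaluated results accumulate here
                while tasks:
                    tag, payload = tasks.pop()
                    if tag == "eval":
                        if isinstance(payload, (int, float)):
                            values.append(payload)
                        elif isinstance(payload, str):       # variable reference
                            values.append(env[payload])
                        else:                                # (op arg1 arg2 ...)
                            op, *args = payload
                            tasks.append(("apply", (op, len(args))))
                            for a in reversed(args):         # arguments go first
                                tasks.append(("eval", a))
                    else:                                    # "apply"
                        op, argc = payload
                        args = [values.pop() for _ in range(argc)][::-1]
                        values.append(PRIMS[op](*args))
                return values.pop()

            print(evaluate(["+", 1, ["*", 2, 3]], {}))       # 7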

        • hinkley a day ago

          A common story with JIT languages is to go back and forth between having a bytecode interpreter and not.

          The paradox is that when the interpreter is fast enough then you delay JIT because it takes longer for the amortized cost to be justified. But that also means the reasons for that high amortization cost don’t get prioritized because they don’t really show up as a priority.

          Eventually the evidence piles so high nobody can ignore it and the code gets rearranged to create a clearer task list. And when that list runs out they rearrange again because now that other part is 2x too slow.

          Personally I’d love to see a JIT that was less just in time. Queuing functions for optimization that only get worked on when there are idle processors. So there’s a threshold where the JIT pre-empts, and one where it only offers best effort.

          • matheusmoreira a day ago

            I wanted to preserve the "code is just lists" property of lisps. Compiling the lists away means that property is lost: the code becomes bytecode or native code instead, sacrificing lisp's soul in exchange for performance.

            I want to implement a partial evaluator one day. That should go a long way to improving performance by precomputing and inlining things as much as possible.

            • hinkley a day ago

              Sometimes people avoid that by making a stupidly cheap code generator that goes straight from the input file format to unoptimized machine code. Because you only have to reach a fraction of what the optimized code would achieve for throughput.

  • jerf 2 days ago

    This is one of the things I'd tune in the current curriculum.

    When I went to college in the late 1990s, we were right on the verge of a major transition from DSAs being something every programmer would implement themselves to something that you just pick up out of your libraries. So it makes sense that we would have some pretty heavy-duty labs on implementing very basic data structures.

    That said, I escaped into the dynamic-language world for the next 15 years or so, so I almost never actually did anything of significance with this. And now even in the static world, I almost never do anything with this stuff directly because it's all libraries for them now too. Even a lot of modern data structures work is just using associative maps and arrays together properly.

    So I would agree that we could A: spend somewhat less time on this in the curriculum and B: tune it to more about how to use arrays and maps and less about how to bit bang efficient hash tables.

    People always get frosty about trying to remove or even "tune down" the amount of time spent in a curriculum, but consider the number of things you want to add and consider that curricula are essentially zero-sum games; you can't add to them without removing something. If we phrase this in terms of "what else could we be teaching other than a fifth week on pointer-based data structures" I imagine it'll sound less horrifying to tweak this.

    Not that it'll be tweaked, of course. But it'd be nice to imagine that I could live in a world where we could have reasonable discussions about what should be in them.

    • jkhdigital a day ago

      About 20 years ago I failed out of the undergrad CS program at UIUC because I thought I was smart enough to skip most lectures. I did manage to get an A in the C++ Data Structures course because the lectures were recorded and I just binged them all the night before each test.

      Anyways, now I’m a full-time lecturer teaching undergraduate CS courses (long story) and I’m actually shaping curriculum. As soon as I read this article I thought “I need to tell my data structures students to read this” because it echoes a lot of what I’ve been saying in class.

      Case in point: right after two lectures covering the ArrayList versus LinkedList implementations of the Java List interface, I spent an entire lecture on JUnit and live-coded a performance test suite that produced actual data to back up our discussions of big-O complexity. The best part of all? They learned about JIT compilation in the JVM firsthand because it completely blew apart the expected test results.
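
      Not the lecturer's JUnit suite, but a minimal Python analogue of the same kind of test: time two implementations and assert the expected big-O relationship rather than exact numbers, since JIT warm-up and caching can easily skew absolute timings.

          import time
          from collections import deque

          def time_inserts(insert_front, n=20_000):
              start = time.perf_counter()
              for i in range(n):
                  insert_front(i)
              return time.perf_counter() - start

          def test_array_backed_list_pays_for_head_inserts():
              arr, linked = [], deque()
              arr_time = time_inserts(lambda i: arr.insert(0, i))         # O(n) per insert
              linked_time = time_inserts(lambda i: linked.appendleft(i))  # O(1) per insert
              # Assert the shape of the result, not exact timings.
              assert arr_time > 2 * linked_time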

      • MrJohz a day ago

        > Anyways, now I’m a full-time lecturer teaching undergraduate CS courses (long story)

        I would love to hear that story if you're willing to tell it.

        It sounds like you're a great lecturer, though, giving the students exactly the sort of stuff they need. I remember a university lecturer explaining to us that "JIT" just meant that Java loaded the class files when it needed them, rather than loading them all at the start, so your lesson sounds like a far cry from those days!

    • philwelch a day ago

      I don’t think the primary value in learning data structures and algorithms is the ability to implement them yourself. It’s more of a way to get repetitions in on basic programming skills while learning about the tools that are available to you. Later in a CS curriculum you might learn how to write an operating system or a compiler, not because you’re necessarily going to ever actually do it again but because it’s a way of learning how those systems work as well as getting repetitions building larger projects.

  • mekoka a day ago

    Many programmers think that the way DS&A work is that one beautiful spring morning, some requirements fall onto your lap where the words "nodes" and "edges" are neatly circled, thus marking a key moment in your life where you can finally put to use that old dusty algorithm book. Meanwhile, trees and graphs have been flying up to your face, slapping you left and right, every single day of your career and you kept repeating to yourself "see, you don't need all that DS&A stuff".

    I think the article is unnecessarily trying to contrast two orthogonal and separately useful software development concerns. DS&A are about having the most efficient tools for the specific job. Testing is about ensuring that results match expectations. But I'd be surprised by anyone who only learns DS&A in theory and also naturally develops the related, but subtler skill of recognizing classes of problems that match up to particular algorithms. It's an almost tacit skill, which many programmers don't have despite knowing some of the classic DS&A, because it requires to linger a bit longer in the material. And that's the real gap leetcode type challenges try to bridge. The more you do them, the more you reinforce an intuitive understanding of involved structures and techniques, including some of the subtler properties and oft undocumented corner cases. What looks like "memorization" to some is actually more of an almost indelible grokking, which you carry with you in professional programming.

    Admittedly, people who only do leetcode type challenges and never write software in the real world don't know which parts of these challenges are useless (in my opinion, the puzzles and lateral thinking parts). But people who never do these challenges also know squat about how useful they can be at honing one's "real life" problem solving and coding skills.

    • specialist a day ago

      > ...unnecessarily trying to contrast two orthogonal and separately useful software development concerns.

      > ...only do leetcode type challenges and never write software in the real world...

      Respectfully, I agree with OC's point about testing vs implementation. (Both are programming.) It's been a (long) while since I've had peers who regularly, usefully tested their own work.

      Like this OC, I too have struggled to articulate why leetcode has tiny IRL relevance.

      How's this:

      A useful distinction may be implementation vs usage.

      Of course, a tiny handful of people are entrusted with implementing libraries of algorithms, so should be properly vetted for that work. Ditto crypto, parsing, and other misc arcane arts.

      OC's point remains that leetcode hazing ignores the majority of actual work as a programmer. Such as fixing bugs, testing, modeling, documenting, enduring meetings, etc.

      --

      IIRC, I haven't implemented a sort algorithm, either in anger or for fun, since my BASIC & Pascal days. Why would I?

      But I have recast real world problems as Traveling Salesperson problems a handful of times.

      As you well know, modeling using well known data structures, to enable using stock algorithms, is a big part of the job.

      I'd rather interviews verified applicable skills. Such as data modeling, sequence diagrams, etc. And maybe some lightweight arch like (from the hip) caching, serialization, indexing, queuing and backpressure, locks, and validation (and whatever other bog standard stuff is immediately relevant).

      --

      (Oops: I did recently implement topo sort, just to understand it better, even though I was using a ready-baked solution. Just like 25+ years ago I implemented (poorly) tabu search, simulated annealing, etc. during my brief optimization kick.)

  • fastaguy88 a day ago

    It really depends. Working on genome analysis, I once encountered/interrupted (by rebooting after a software update) a student who had been running an analysis for more than a week, because they had not pre-sorted the data. With pre-sorted data, it took a few minutes.

    Not everyone works on web sites using well-optimized libraries; some people need to know about N and Nlog(N) vs N^2.

    • matheusmoreira a day ago

      > some people need to know about N and Nlog(N) vs N^2.

      Every programmer should know enough to at least avoid accidentally making things quadratic.

      https://news.ycombinator.com/item?id=26296339
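
      A hedged illustration of the trap (both functions below are invented examples, not from the linked thread): they return the same result, but the first is accidentally quadratic because the membership test scans a list on every iteration.

          def dedup_quadratic(items):
              seen, out = [], []
              for x in items:
                  if x not in seen:        # O(n) membership test on a list
                      seen.append(x)
                      out.append(x)
              return out

          def dedup_linear(items):
              seen, out = set(), []
              for x in items:
                  if x not in seen:        # O(1) average membership test on a set
                      seen.add(x)
                      out.append(x)
              return out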

      • hetman a day ago

        Indeed. As an anecdote, I've come across a self-professed frontend UI guru writing quadratic code that worked fine in testing, because it only had to display a few tens of items there, but who was at a complete loss as to why it was unusable in production.

  • hvb2 2 days ago

    This feels backwards. When you have a good understanding of data structures you have the luxury of testing.

    If you focus on testing over data structures, you might end up testing something that you didn't need to test because you used the wrong data structures.

    IMHO too often people don't consider big O because it works fine with their 10 row test case.... And then it grinds to a halt when given a real problem

    • cogman10 2 days ago

      That wasn't the thrust of the article.

      The article is saying that it's more important to write tests than it is to learn how to write data structures. It specifically says you should learn which data structures you should use, but don't focus on knowing how to implement all them.

      It calls out, specifically, that you should know that `sort` exists but you really don't need to know how to implement quicksort vs selection sort.

      • hvb2 2 days ago

        No, it says learn data structures first, then focus on testing.

        You don't have to go super deep on all the sort algorithms, sure. That's like saying that learning testing implies writing a mocking library

        • MrJohz a day ago

          I think the issue is that most DSA curriculums do go super deep on things like sort algorithms or linked lists or whatever else. Whereas testing is usually barely taught in universities or colleges, and when it is taught, it's usually very lightweight.

    • jancsika a day ago

      > IMHO too often people don't consider big O because it works fine with their 10 row test case.... And then it grinds to a halt when given a real problem

      Not if the user can, say, farm 1,000,000 different rows 100 times over an hour and a half while gossiping with their office mates. I offer Excel as Exhibit A.

    • matheusmoreira a day ago

      > too often people don't consider big O because it works fine with their 10 row test case.... And then it grinds to a halt when given a real problem

      The reverse also happens frustratingly often. One could spend a lot of time obsessing over theoretical complexity only for it to amount to nothing. One might carefully choose a data structure and algorithm based on these theoretical properties and discover that in practice they get smoked by dumb contiguous arrays just because they fit in caches.

      The sad fact is the sheer brute force of modern processors is often enough in the vast majority of cases so long as people avoid accidentally making things quadratic.

      Sometimes people don't even do that and we get things such as the GTA5 dumpster fire.

      https://news.ycombinator.com/item?id=26296339

      • janalsncm a day ago

        Slightly related, in ML I write a lot of code which will be executed exactly once. Data analysis, creating a visualization, one-off ETL tasks.

        There are a lot of times where I could spend mental energy writing “correct” code which trades off space for time etc. Sometimes, it’s worth it, sometimes not. But it’s better to spend an extra 30 seconds of CPU time running the code than an extra 10 minutes carefully crafting a function no one will see later, or that someone will see but is harder to understand. Simpler is better sometimes.

        What Big O gives you is an ability to assess the tradeoffs. Computers are fast so a lot of times quadratic time doesn’t matter for small N. And you can always optimize later.

  • Arun2009 a day ago

    I have always thought of DSA as a proxy for a subset of general software development skills: the ability to translate a problem into computer science or programming terms, implement it in code, and argue that the implementation is correct and efficient. Skill in solving DSA problems can signal both an aptitude for absorbing computer science knowledge in general and a capacity for solving problems through programming. It's not the whole thing, but it's certainly an important component.

    It’s not unlike a research mathematician being expected to solve quadratic equations. He may not need them in his day-to-day work, but with a little preparation he should be able to handle them. If he struggles with quadratic equations in an interview where such knowledge is expected, that would raise a red flag about his training.

    • solumos a day ago

      > It’s not unlike a research mathematician being expected to solve quadratic equations. He may not need them in his day-to-day work, but with a little preparation he should be able to handle them. If he struggles with quadratic equations in an interview where such knowledge is expected, that would raise a red flag about his training.

      The absurdity of this indicates a big part of what’s wrong with modern SWE interviews.

      You wouldn’t ask a research mathematician to derive the quadratic formula, and you wouldn’t give a writer a spelling test — passing or not isn’t related to their aptitude or proficiency.

      Additionally, imagine interviewing a complex analyst and quizzing them only on calc 1 — sure, it may be a course that they were expected to take at some point, but it really has little to do with what their actual work entails, and so half of the interview they’d be trying to slot what the proper layer of abstraction is in the limited context of the problem.

    • pezgrande a day ago

      Maybe in some companies. At "Enterprise" level your control-alt-f skills are more valuable (for your average IC) based on my experience.

  • teo_zero a day ago

    As usual, the title doesn't reflect the message of the article. While critical against learning tons of algorithms and structures by heart, the author advocates understanding what they do and why you would want to use them. This is what it means to master DSA.

    The last part points out that testing is not taught enough, and that learning it may be beneficial for your career. It's never said that testing is better than DSA, just that we need more of it.

  • JoeAltmaier a day ago

    We wrote a conferencing app and server (years before Zoom). Tested the server by having automated headless apps run in gangs, a hundred at a time, hopping from conversation to conversation, turning mic and camera on and off, logging out and logging back in. Used it for years, the Bot Army we called it. Responsible for our rock-solid quality reputation. Not API design or test classes or constraints or anything. Just, trying the damn thing, in large cases, for a long time.

    When it ran an hour, we celebrated. When it ran overnight, we celebrated. When it ran a week we celebrated, and called that good enough.

    • rossant a day ago

      How much work was it to go from 1 hour to 1 week? How many issues have you discovered, what were they? Genuinely interested.

      • JoeAltmaier a day ago

        It took a big fraction of our energy to get to each new stage. Always it was something new and unexpected. It was quite a while ago, but let's see what I remember.

        Some good fraction were mismatches between the app (bot) state and the server state. A bot would be expecting a message and stall. The server thought it had said enough.

        The app side used a lot of libraries, which it turns out are never as robust as advertised. They leak, race, are very particular about call order. Have no sense of humor if they're still connecting and a disconnect call is made, for instance.

        The open source server components were fragile. In one instance, the database consistency library had an update where, for performance, a success message was returned before the operation upstream was complete. Which broke, utterly, the consistency promise that was the entire point of using that product.

        A popular message library created a timer on each instantiation. Cancelled it, but in typical Java fashion didn't unlink it. So, leak. Tiny, but you do it enough times, even the biggest server instance runs out of memory.

        We ran bots on Windows, Linux, even a Mac. Their network libraries had wildly different socket support. We'd run out of sockets! They got garbage collected after a time, but the timer could be enormous (minutes).

        Our server used a message-distribution component to 'shard' messages. It had a hard limit on message dispatching per second. I had to aggregate the client app messages (we used UDP and a proprietary signaling protocol) to drop the message rate (ethernet packet rate) by an order of magnitude. Added a millisecond of latency, which was actually important and another problem.

        Add the usual Java null pointers, order-dependent service termination rules (never documented), object lifetime surprises. It went on and on.

        Each doubling of survival-time the issues got more arcane and more interesting. Sometimes took a new tool or technique to ferret out the problem.

        To be honest, I was in hog heaven. Kept my brain plastic for a long time.

        • rossant 19 hours ago

          Wow, really interesting write-up, thank you! It really proves the immense value of this kind of automated, realistic stress test.

    • yakshaving_jgt a day ago

      As effective as that sounds, having that integrated test suite didn’t preclude you from also having more granular isolated tests.

      • JoeAltmaier a day ago

        Sure, in a generous world with lots of resources. Given the startup environment and the overworked team, it's a choice how to spend limited time and energy.

  • glitchc 2 days ago

    The article fails to demonstrate how code-tests result in objectively better code. Many comp sci programs have courses on testing that cover TDD, unit testing and fuzzing, among other topics.

    Yet much of the safety critical code we rely on for critical infrastructure (nuclear reactors, aircraft, drones, etc) is not tested in-situ. It is tested via simulation, but there's minimal testing in the operating environment which can be quite complex. Instead the code follows carefully chosen design patterns, data structures and algorithms, to ensure that the code is hazard-free, fault-tolerant and capable of graceful degradation.

    So, testing has its place, but testing is really no better than simulation. And in simulation, the outputs are only as good as the inputs. It cannot guarantee code safety and is not a substitute for good software design (read: structures and algorithms).

    Having said that, fuzzing is a great way to find bugs in your code, and highly recommended for any software that exposes an API to other systems.

    • MoreQARespect 2 days ago

      >fails to demonstrate how code-tests result in objectively better code.

      Tests give the freedom to refactor which results in better code.

      >So, testing has its place, but testing is really no better than simulation

      Testing IS simulation and simulation IS testing.

      >And in simulation, the outputs are only as good as the inputs. It cannot guarantee code safety

      Only juniors think that you can get guarantees of code safety. Seniors look for ways to de-risk code, knowing that you're always trending towards a minima.

      One of the key skills in testing is defining good, realistic inputs.

    • azeirah a day ago

      I don't understand what the difference between a simulation and a test is?

      • qayxc a day ago

        Mostly just semantics.

      • glitchc a day ago

        There is none, and that's my point. Simulations themselves are contrived scenarios that are not representative of production environments.

  • danielmarkbruce 2 days ago

    This will annoy a lot of folks, but:

    1 - If you work on large scale software systems, especially infrastructure software of most types then you need to know and understand DSA and feel it in your bones.

    2 - Most people work on crud apps or similar and don't really need to know this stuff. Many people in this camp don't realize that people working on 1 really do need to know this stuff.

    What someone says on this topic says more about what things they have worked on in their life than anything else.

    • ecshafer 2 days ago

      > What someone says on this topic says more about what things they have worked on in their life than anything else.

      This is the crux of the debate. If you work on CRUD apps, you basically need to know hash maps and lists, but getting better at SQL and writing clean code is good. But there are many areas where writing the right code vs the wrong code really matters. I was writing something the other day where one small in-loop operation was the difference between a method running in milliseconds and minutes. Or choosing the right data structure can simplify a feature into 1/10th the code and make it run 100x better than the wrong one.

      • MoreQARespect a day ago

        This happens to me too; it just happens roughly 100x less often than needing to know how to test properly.

        It's never "the other day"; it's 10x a day, every day.

        So, OP is still correct.

    • mamcx a day ago

      > Most people work on crud apps or similar

      CRUD apps are the ones that become more complex, not less. The idea that a "CRUD app" is the poster child of simplicity is mega-misleading.

      Building an ERP or similar will eat you alive in ways that making a whole OS from scratch, with all the features of Linux and more, will not. (Probably the only OS part that is as hard as "CRUD apps" is the drivers, and that is because there you see what kind of madness it is to interface with other people's code.)

      • danielmarkbruce a day ago

        I didn't mention the word complex nor imply it. Complexity of an application and scale aren't the same thing.

    • mlinhares a day ago

      That's going to be true in all fields, people think their experiences are the only valid experiences and everyone else must think and work on what they think is important, otherwise they're wrong.

      • fuzztester a day ago

        Right.

        It is a very basic flaw in their logical thinking ability.

        I never cease to be amazed by the number of HN people who display this flaw via their comments.

        Or rather, I have ceased to be amazed, because I have seen it so many times by now here, and got resigned to the fact that it's gonna continue.

    • arvinsim a day ago

      In the end it doesn't really matter.

      In software development hiring, everyone tests for DSA whether it is useful or not in the actual job description.

    • evrydayhustling 2 days ago

      This is so true. When you get DSA wrong, you end up needing insanely complex system designs to compensate -- and being great at Testing just can't keep up with the curse of dimensionality from having more moving parts.

    • 4ndrewl a day ago

      I'm not sure the article disagrees on that point. As you say, for most people, testing is better than dsa.

      (Alternatively you could just argue it's a false dichotomy)

    • uncivilized 2 days ago

      I already know the answer to this, but did you read the article? Ned addresses your concerns.

      • danielmarkbruce a day ago

        No, he doesn't. He doesn't discuss the gigantic dividing line between the two different types of systems I categorize above. He also doesn't cover the "feel it in your bones" required in the type 1 systems. Spend a minute reading or listening to Jeff Dean talk, and you'll see what is required to build those types of systems. Spend some time somewhere working on those systems and you'll come across some folks who just have this ready to go and can apply it at the drop of a hat.

    • hatthew a day ago

      My work involves petabyte scale data, and the algorithms are very straightforward:

      - What you want to do is probably trivially O(kn).

      - There isn't a <O(kn) algorithm, so try to reduce overhead in k.

      - Cache results when they are O(1), don't when they are O(n).

      - If you want to do something >O(kn), don't.

      - If you really need to so something >O(kn), do it in SQL and then go do something else while it's running.

      None of that requires any DSA knowledge beyond what you learn in the first weeks of CS101. Instead, what's useful is knowing how to profile to optimize k, knowing how SQL works, and being able to write high quality maintainable code. Any smart algorithms that have a large time complexity improvement will probably be practically difficult to create and test even if you are very comfortable with the underlying theoretical algorithm. And the storage required for an <O(n) algorithm is probably at least as expensive as the compute required for the naive O(n) algorithm.

      My general impression is that for small-scale problems, a trustworthy and easy algorithm is fine, even if it's inefficient ($100 of compute < $1000 of labor). For large-scale problems, domain knowledge and data engineering trumps clever DSA skills. The space between small- and large-scale problems is generally either nonexistent or already has premade solutions. The only people who make those "premade solutions" obviously need to feel it in their bones the way you describe, but they're a very very small portion of the total software population, and are not the target audience of this article.

      • fuzztester a day ago

        >My work involves

        As the GP said:

        >>What someone says on this topic says more about what things they have worked on in their life than anything else.

        • hatthew a day ago

          TFA isn't saying "DSA is useless", it's saying "intermediate/advanced DSA is not useful for most people". It's obvious that it's useful for some people, but I think even most people working on "large scale systems" should probably value general software engineering skills over DSA skills. The very few people who actually need DSA skills already know that the advice "you don't need DSA" doesn't apply to them.

          • danielmarkbruce a day ago

            > The very few people who actually need DSA skills already know that the advice "you don't need DSA" doesn't apply to them.

            This is right. And most of those people know a lot of their job is very far removed from many other software engineers. But the prevalence of the idea "you don't really use DSA in practice" does suggest many people building applications where DSA isn't as applicable seem to misunderstand the situation. It matters in some sense - it explains why interviews at google are the way they are, why universities teach what they teach, what one should do if they really like such things.

  • ChrisMarshallNY 2 days ago

    I agree with the article, but I'll bet a lot of others don't. Discussions on Code Quality don't fare well here. Wouldn't surprise me if the article already has flags.

    Of course, "testing," is in the eye of the beholder.

    Some folks are completely into TDD, and insist that you need to have 100% code coverage tests, before writing one line of application code, and some folks think that 100% code coverage unit tests, means that the system is fully tested.

    I've learned that it's a bit more nuanced than this[0].

    [0] https://littlegreenviper.com/testing-harness-vs-unit/

    • general1465 a day ago

      Testing, especially with vstest.console.exe in Visual Studio, has carried my business really far. I have accumulated thousands of tests on my codebase, usually based on customer requirements or on past bugs which I have been trying to replicate.

      I think that a lot of people dislike testing because a lot of tests can run for hours. In my case it is almost 6 hours from start to finish. However, as a software developer I have accumulated a lot of computers which are kind of good and I don't want to throw them out yet, but they are not really usable for current development - i.e. 8GB of RAM, 256GB SSD, i5 CPU from 2014 - it would be a punishment to use one of those with Visual Studio today. But it is a perfect machine for compiling in the console, i.e. dotnet build or msbuild, and running tests via vstest, glued together with a PowerShell script. So this dedicated testing machine runs on changes overnight and I will see if everything passed or not, and if not, fix the tests that did not pass.

      This setup may feel clunky, but it allows me to make sweeping changes in a codebase and be confident enough that, if the tests pass, it will very likely work for the customer too. The most obvious example where tests were carrying me around has been moving to .NET8 from .NET Framework 4.8. I went from a 90% failure rate on tests to all tests passing in like 3-4 iterations.

      • ChrisMarshallNY a day ago

        I have not done it, myself, but I think that Xcode, for Apple stuff, can parallelize tests, across multiple machines (maybe VMs?).

        I would assume that Microsoft systems could do the same.

        • general1465 a day ago

          A lot of tests are sharing one resource (USB device) which can't be accessed in parallel. So that's my constraint which I need to live with and the main reason why I can't parallelize or offload testing to cloud.

          Otherwise yes, you can run tests in parallel in vstest. That's completely possible.

          • ChrisMarshallNY 19 hours ago

            Oh yeah. I did a lot of hardware stuff.

            Quite familiar with the drill. Carry on...

      • Izikiel43 a day ago

        You could have a pipeline in some cloud provider to run the tests, and distribute the load across machines if tests are independent, to reduce the time, if that's more important. If it's ok to just run them overnight, keep it as is.

  • cogman10 2 days ago

    I agree.

    The main benefit of being familiar with how data structures and algorithms work is that you become familiar with their runtime characteristics and thus can know when to reach for them in a real problem.

    The author is correct here. You'll almost never need to implement a B-Tree. What's important is knowing that B-Trees have log n insertion times with good memory locality making them faster than simple binary trees. Knowing how the B-Tree works could help you in tuning it correctly, but otherwise just knowing the insertion/lookup efficiencies is enough.
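
    A small sketch of that "know when to reach for it" point in Python (sortedcontainers is a third-party package, assumed installed; there is no stdlib B-tree): you get ordered insertion and log-time-ish lookup without writing a tree yourself, and knowing the complexity guarantees is what tells you it's the right tool.

        from sortedcontainers import SortedList

        prices = SortedList()
        for p in [30, 10, 20, 40]:
            prices.add(p)                    # stays ordered as it grows

        assert list(prices) == [10, 20, 30, 40]
        assert prices.bisect_left(25) == 2   # where 25 would slot in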

  • gdubs 12 hours ago

    I find that test-driven advocates border on a religious way of thinking about software development, and I also find it sucks joy out of coding.

    Does that mean I don't think tests are valuable – no, that's not what I'm saying. There are critical pieces of code where testing will bring you more joy because you'll catch very bad bugs that would have really ruined your day.

    But as a way of constructing an application, it just has always felt really dull and uninspiring.

    I know people will vehemently disagree with this. But, sometimes you gotta write the novel once to know what story you're telling. Then you can go back, edit, put tests in the critical places.

  • jillesvangurp a day ago

    Well, you need both. You are going to suck at implementing algorithms if you don't learn how to test that they actually work as intended.

    Most algorithms are used in library form. Unless you are writing those libraries, you probably should not be reinventing a lot of wheels. So there's a valid argument there that most of the stuff you learn as part of your computer science courses, you will not be implementing over and over again (if ever). So, you could argue that testing is a more universal skill that you need either way.

    But algorithms can come up once in a while. And it helps if you can guesstimate complexity of various algorithms and make some trade offs over picking one or the other. The skill you learn in college is not any particular algorithm but a broad knowledge of which mainstream ones are there, how they work, their tradeoffs, and the skill of implementing those or similar algorithms.

    You gain the skill of good judgment, being able to figure out how stuff works, and general intuition of how things are done at a high level. I've never developed a file system. But I know tree data structures such as b trees and red black trees have something to do with it. It's been decades since I looked at that stuff. But I could read up in an afternoon or so if it comes up. That doesn't qualify to start working on a file system. But, I don't think that's going to come up anyway. I have plenty of other things to do.

    I do dabble a bit with search algorithms once in a while. More of a hobby than work related. But there's some room in that space for being able to do some basic things with algorithms instead of using some prefab search product.

  • wjrb 2 days ago

    Are there any resources out there that anyone can recommend for learning testing in the way the author describes?

    In-the-trenches experience (especially "good" or "doing it right" experience) can be hard to come by; and why not stand on the shoulders of giants when learning it the first time?

    • Jtsummers 2 days ago

      Working Effectively with Legacy Code by Michael Feathers. It spends a lot of time on how to introduce testability into existing software systems that were not designed for testing.

      Property-Based Testing with PropEr, Erlang, and Elixir by Fred Hebert. While a book about a particular tool (PropEr) and pair of languages (Erlang and Elixir), it's a solid introduction to property-based testing. The techniques described transfer well to other PBT systems and other languages.

      Test-Driven Development by Kent Beck.

      https://www.fuzzingbook.org/ by Zeller et al. and https://www.debuggingbook.org/ by Andreas Zeller. The latter is technically about debugging, but it has some specific techniques that you can incorporate into how you test software. Like Delta Debugging, also described in a paper by Zeller et al. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=988....

      I'm not sure of other books I can recommend, the rest I know is from learning on the job or studying specific tooling and techniques.
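
      As a taste of the property-based style, here's a minimal sketch in Python using the hypothesis library (a rough analogue of PropEr; my_sort is a stand-in for the code under test): instead of hand-picking cases, you state properties that must hold for all generated inputs.

          from collections import Counter
          from hypothesis import given, strategies as st

          def my_sort(xs):
              return sorted(xs)

          @given(st.lists(st.integers()))
          def test_sort_output_is_ordered_and_a_permutation(xs):
              out = my_sort(xs)
              assert all(a <= b for a, b in zip(out, out[1:]))   # ordered
              assert Counter(out) == Counter(xs)                 # same elements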

      • GrumpyYoungMan a day ago

        TDD is a development methodology, not a testing methodology. The main thing it does is check whether the developer implemented what they thought they should be implementing, which is not necessarily what the spec actually says to implement or what the end user expects.

        • Jtsummers a day ago

          It's still a useful technique and way to apply testing to development. But yes, it's not the best resource in telling you what tests to write, more about how they can be applied effectively. Which is a skill that seems absent in many professionals.

    • cogman10 2 days ago

      Resources, none that I'm aware of. I generally think this is an OK way to look at testing [1], though I think it goes too far if you completely adopt their framework.

      To boil down what I like to see in tests: structure them with "Given/when/then" statements. You don't need a framework for this; just make method calls with whatever unit test framework you are using (a rough sketch follows the footnote below). Keep the methods small, and don't do a whole lot of "then"s; split those into multiple tests. Structure your code so that you aren't testing too deep. Ideally, you don't need to stand up your entire environment to run a test. But do write some of those tests too; they are important for catching issues that can hide between unit tests.

      [1] https://cucumber.io/docs/bdd/
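
      The sketch mentioned above, as a hypothetical Python/pytest example (no BDD framework, just a plain test laid out as Given/when/then):

          class Cart:
              """Toy shopping cart, used only for this example."""
              def __init__(self):
                  self.lines = []

              def add(self, name, price):
                  self.lines.append((name, price))

              def total(self):
                  return sum(price for _, price in self.lines)

          def test_adding_same_item_twice_accumulates_total():
              # Given: a cart that already contains one book
              cart = Cart()
              cart.add("book", price=10)

              # When: the same item is added again
              cart.add("book", price=10)

              # Then: the total reflects both copies
              assert cart.total() == 20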

    • fuzztester a day ago

      The Art of Software Testing. New York: Wiley, 1979

      The Art of Software Testing, Second Edition. with Tom Badgett and Todd M. Thomas, New York: Wiley, 2004.

      It is by Glenford Myers (and others).

      https://en.m.wikipedia.org/wiki/Glenford_Myers

      From the top of that page:

      [ Glenford Myers (born December 12, 1946) is an American computer scientist, entrepreneur, and author. He founded two successful high-tech companies (RadiSys and IP Fabrics), authored eight textbooks in the computer sciences, and made important contributions in microprocessor architecture. He holds a number of patents, including the original patent on "register scoreboarding" in microprocessor chips.[1] He has a BS in electrical engineering from Clarkson University, an MS in computer science from Syracuse University, and a PhD in computer science from the Polytechnic Institute of New York University. ]

      I got to read it early in my career, and applied it some, in commercial software projects I was a part of, or led, when I could.

      Very good book, IMO.

      There is a nice small testing-related question at the start of the book that many people don't answer well or fully.

      • pfdietz a day ago

        As I recall this was a book that included the orthodoxy at the time that random testing was the worst kind of testing, to be avoided if possible.

        That turned out to be bullshit. Today, with computers many orders of magnitude faster, using randomly generated tests is a very cost-effective way of testing compared to carefully handcrafted tests. Use extremely cheap machine cycles to save increasingly expensive human time.

        • fuzztester a day ago

          Interesting. Don't remember that from the book, but then, I read it long ago.

          I agree that random testing can be useful. For example, one kind of fuzzing is using tons of randomly generated test data against a program to try to find unexpected bugs.

          But I think both kinds have their place.

          Also, I think the author might have meant that random testing is bad when used with only a small amount of test data, in which case I'd agree with him, because then an equally small amount of carefully crafted test data would be the better option, e.g. some test data from each equivalence class of the input.
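
          For instance, a tiny sketch in Python with pytest (the shipping_cost function and its classes are made up for illustration), with one representative input per equivalence class:

              import pytest

              def shipping_cost(weight_kg):
                  """Hypothetical: free under 1 kg, flat fee under 10 kg, per-kg above."""
                  if weight_kg <= 0:
                      raise ValueError("weight must be positive")
                  if weight_kg < 1:
                      return 0
                  if weight_kg < 10:
                      return 5
                  return 5 + (weight_kg - 10) * 2

              # One representative from each "valid" class...
              @pytest.mark.parametrize("weight,expected", [(0.5, 0), (5, 5), (20, 25)])
              def test_each_class(weight, expected):
                  assert shipping_cost(weight) == expected

              # ...and one from the "invalid" class.
              def test_invalid_weight_rejected():
                  with pytest.raises(ValueError):
                      shipping_cost(-1)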

          • pfdietz 18 hours ago

            Here is the quote (from the 3rd ed., page 41):

            "In general, the least effective methodology of all is random-input testing—the process of testing a program by selecting, at random, some subset of all possible input values. In terms of the likelihood of detecting the most errors, a randomly selected collection of test cases has little chance of being an optimal, or even close to optimal, subset. Therefore, in this chapter, we want to develop a set of thought processes that enable you to select test data more intelligently."

            You can immediately see the problem here. It's optimizing for the number of tests run, not for the overall cost of creating and running the tests. It's an attitude suited to an era when running a program was an expensive use of precious resources. It was very wrong in 2012 when this edition came out and even more wrong today.

          • pfdietz 20 hours ago

            I'd say in any sufficiently complex program, random testing is not only useful, it's essential, in that it will quickly find bugs no other approach would.

            Even better, it subsumes many other testing paradigms. For example, there was all sorts of talk about things like "pairwise testing": be sure to test all pairwise combinations of features. Well, randomly generated tests will do that automatically.

            I view random testing as another example of the Bitter Lesson, that raw compute dominates manually curated knowledge.
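
            As an illustration (my own toy example, not from any particular source): a random tester that throws thousands of generated inputs at the function under test and compares it against a slow but obviously correct oracle.

                import random

                def dedupe_keep_order(items):
                    """Function under test: drop duplicates, keep first-seen order."""
                    seen = set()
                    return [x for x in items if not (x in seen or seen.add(x))]

                def oracle(items):
                    """Slow but obviously correct reference implementation."""
                    out = []
                    for x in items:
                        if x not in out:
                            out.append(x)
                    return out

                def test_random_inputs():
                    rng = random.Random(1234)  # fixed seed so failures are reproducible
                    for _ in range(10_000):
                        data = [rng.randint(0, 20) for _ in range(rng.randint(0, 50))]
                        assert dedupe_keep_order(data) == oracle(data)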

  • rr808 a day ago

    Do the people who love testing run JS or Python or something that compiles really quickly? I've worked on some big projects where just making a change, compiling, and running a unit test often takes 5-15 minutes. TDD works great if you have a trivial library in an interpreted language.

    • markmark a day ago

      Testing doesn't imply TDD.

    • yakshaving_jgt a day ago

      Understanding how to write tests in an economical fashion is a skill in itself. With care, you can write fast (and effective) tests in Haskell (and, I’m sure, every language ever).

  • atmavatar 2 days ago

    The title is unfortunately more than a little irresponsible, considering it's the norm for many (most?) to read only the title.

    There is no dichotomy here: you need to know testing as well as data structures and algorithms.

    However, the thrust of the article itself I largely agree with -- that it's less important to have such in-depth knowledge about data structures and algorithms that you can implement them from scratch and from memory. Nearly any modern language you'll program in includes a standard library robust enough that you'll almost never have to implement many of the most well-known data structures and algorithms yourself. The caveat: you still need to know enough about how they work to be capable of selecting which to use.

    In the off-chance you do have to implement something yourself, there's no shortage of reference material available.

  • jsd1982 a day ago

    Are we assuming that "testing" is limited to only exercising the single-threaded behavior of a function? I'm curious how others approach effective testing of multi-threaded behavior.

  • marcosdumay 2 days ago

    When testing job candidates, sure, no doubt about that.

    But for learning, no, it's not. You should not spend as much time learning testing as you spend learning data structures.

    • Jtsummers 2 days ago

      I feel like this mischaracterizes the blog. You seem to be taking this:

      > People should spend less time learning DSA, more time learning testing.

      And reading it as "More total time should be spent on learning testing than the total time spent learning DSA". That's one reading, another is that people are studying DSA too much, and testing too little. The ratio of total time can still be in favor of studying DSA more, but maybe instead of 10:1 it should be more like 8:1 or 5:1.

      • marcosdumay 2 days ago

        That's a fair point. But then the author makes a blatant and unrealistic generalization about how much time people spend on each of those. Between CS undergrads and introduction to programming bootcamps, the variance on that number is extreme.

  • vahid4m a day ago

    I think there's enough unnecessary content in universities that there should be room for both DSA and testing.

  • sakesun a day ago

    Just realised that I have been reading his blog for two decades already.

    • rossant a day ago

      Same. Love his blog.

  • varjag a day ago

    Sudokugate flashbacks.

  • karmakaze 2 days ago

    The context is what you should spend time learning when starting out. TL;DR:

    > Here is what I think in-the-trenches software engineers should know about data structures and algorithms: [...]

    > If you want to prepare yourself for a career, and also stand out in job interviews, learn how to write tests: [...]

    I feel like I keep writing these little context comments to fix the problem of clickbait titles or those lacking context. It helps to frame the rest of the comments which might be coming at it from different angles.

  • Izikiel43 a day ago

    Testing is the case where I've found AI actually useful. Write one good test, maybe the happy path, and then tell your AI to test for scenario XYZ, how it should fail, etc.

    The generated code is in general 90% there.

    This allows me to write many more tests than before to try to catch all scenarios.

  • nice_byte 2 days ago

    ignore this advice.

    spend plenty of time studying data structures and algorithms as well as computer architecture. these are actually difficult things that take a long time to understand and will have a positive impact on your career.

    study the underlying disciplines of your preferred domain.

    in general, focus on more fundamental things and limit the amount of time you spend on stupid shit like frameworks, build systems, quirks of an editor or a programming language. all these things will find a way to steal your time _anyway_, and your time is extremely precious.

    "testing" is not fundamental. there is no real skill to be learned there, it's just one of those things that will find a way to steal your time anyway so there is no point in focusing actively on it.

    put it this way: you will NEVER get the extra time to study fundamental theory. you will ALWAYS be forced to spend time to write tests.

    if you somehow find the time, spend it on things that are worth it.

    • KevinMS a day ago

      > "testing" is not fundamental. there is no real skill to be learned there, it's just one of those things that will find a way to steal your time anyway so there is no point in focusing actively on it.

      that's an edgy take and a red flag

      • nice_byte a day ago

        it is not edgy whatsoever. it reflects the actual reality on the ground.

        nobody goes to school to learn how to use git or how to write unit tests. it's not something that needs to be actively "learned", you'll just absorb it eventually because you can't escape it.

        The more interesting and important things you will never "just absorb", you actually have to make a conscious effort to engage with them.

        • KevinMS a day ago

          I'm replying to statements like this

          > "testing" is not fundamental.

          and

          > there is no real skill to be learned there

          one of the biggest problems that has plagued software is failed projects. There have been a lot of them, and it's probably cost hundreds of billions of dollars.

          I can guarantee not one of those projects failed because somebody had to take the time to look up the best data structure. But I'll bet a lot of them failed because they didn't follow smart testing practices and collapsed under their own weight of complexity, untestability and inflexibility.

          • nice_byte a day ago

            Citation needed.

            I've seen projects fail for a multitude of reasons, by far the most common are boring political ones, like the leadership not understanding what it is that they want to build.

            Hiring people who think bloom filters are "exotic" to work on a distributed system could certainly doom that project to failure regardless of how diligently tested it is.

            I assure you that if you have enough competence to actually go through with designing and building a thing, you certainly have more than enough competence to test it. It is not a fundamental discipline that needs to be studied, much less at the expense of fundamental knowledge.

            Edit: to reframe it a bit differently: you can always add more tests. you can't fix the problems you don't even know you have due to lack of thorough understanding of the problem domain.

    • blind_tomato a day ago

      I'll ignore your advice. It's one-sided and misleading, lacking the nuances the OP had.

      • nice_byte a day ago

        feel free to. it's your career and your own precious time.

  • pshirshov 2 days ago

    Pure bullshit and incompetence.

    > esoteric things like Bloom filters, so you can find them later in the unlikely case you need them.

    They are not esoteric, they are trivial and extremely useful in many cases.

    > Less DSA, more testing.

    Testing can't cover all the cases by definition, so why not property testing? Why not formal proofs?

    Plus, these days it's easy to delegate test-case writing to LLMs, while they literally cannot invent new useful algorithms and data structures.

    • cogman10 2 days ago

      > extremely useful in many cases.

      I've not run into a case where I could apply a Bloom filter. I keep looking because it always seems like it would be useful. The problem I have is that a Bloom filter has practically the reverse characteristics from what I want: it gives false positives and true negatives, whereas I most often want true positives and false negatives.

      • guiand an hour ago

        > true positives and false negatives

        That would be a simple cache in most instances.

      • burch45 a day ago

        Its entire purpose is an optimization. You have an expensive operation, and a Bloom filter can tell you that you definitely don't need to do that operation. So rather than wasting a lot of time doing it unnecessarily, you get the cheap Bloom filter check most of the time and only occasionally hit the false positive where you do the expensive thing when it turns out you didn't need to. That, as far as I am aware, is the only use case for a Bloom filter. That said, I have used it for that purpose effectively several times in my career.
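
        Roughly this shape, as a toy Python sketch (the sizes, hash choice, and expensive_lookup are all made up for illustration):

            import hashlib

            class BloomFilter:
                def __init__(self, size_bits=1 << 20, num_hashes=5):
                    self.size = size_bits
                    self.num_hashes = num_hashes
                    self.bits = bytearray(size_bits // 8)

                def _positions(self, key):
                    # num_hashes pseudo-independent bit positions derived from the key
                    for i in range(self.num_hashes):
                        h = hashlib.sha256(f"{i}:{key}".encode()).digest()
                        yield int.from_bytes(h[:8], "big") % self.size

                def add(self, key):
                    for pos in self._positions(key):
                        self.bits[pos // 8] |= 1 << (pos % 8)

                def might_contain(self, key):
                    # False means "definitely absent"; True means "maybe present".
                    return all(self.bits[pos // 8] & (1 << (pos % 8))
                               for pos in self._positions(key))

            def lookup(key, bloom, expensive_lookup):
                # "Definitely not there" lets us skip the expensive call entirely;
                # a "maybe" (including the rare false positive) falls through to it.
                if not bloom.might_contain(key):
                    return None
                return expensive_lookup(key)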

      • pshirshov 2 days ago

        Assume you need to build a large-scale search or analytics tool, for example. Sketch data structures (like cuckoo filters and especially HyperMinHash) are extremely useful in those scenarios.

    • burnt-resistor 2 days ago

      It's a strawman, besmirching niche knowledge in favor of methodology. The two aren't mutually exclusive and shouldn't be competitors. Bloom filters are trivial to implement, are great examples of time/space trade-offs, and are useful mostly for checking that a key definitely isn't present, so an otherwise expensive lookup operation can be skipped early.

      What's more concerning is "engineers" who are incurious about how lower levels of the stack work, or who aren't interested in learning breadth, depth, or new things.

  • quotemstr a day ago

    > I see new learners asking about “DSA” a lot.

    I've noticed this "DSA" acronym appearing overnight. I can't recall people using it this much (at all actually) even six months ago. Where did it come from? Why do we suddenly need a term to talk about the concept?

    • burch45 a day ago

      That is the standard acronym for the course in American universities and has been for many decades.

      • quotemstr a day ago

        Maybe so, but that doesn't explain why people have suddenly started using it more.

  • ngcc_hk a day ago

    Can a program function without data, algorithms, testing, users, or programmers? Now which one is better... sorry, what rubbish question is that? All are important, and with limited time you still have to consider all of them... nothing is "better".

  • lunias 17 hours ago

    "If my Grandmother had wheels, she would have been a bike."

  • 29athrowaway a day ago

    Learning both is not mutually exclusive.

  • zeroCalories a day ago

    The reason you study DS&A is because it's a difficult skill. Testing is incredibly straightforward and easy to learn. The reason tests suck is that the infra sucks, the engineer is lazy, or both.

  • burnt-resistor 2 days ago

    Sigh. Monochromatic myopia denying the need for holistic quality and mastery in multiple arenas and methodologies. Belts and suspenders, not just elastic waistbands.

    • CyberDildonics 2 days ago

      Oh if the holistic myopia is multiples of monochromatic does it really need elastic mastery? Sigh.

  • lolive a day ago

    Data structures are overrated.

    One day, a trainee told me that maybe a Set would be better than an array in one of my codebases. I fired this ignoramus immediately.

    #noOverengineering #arrayIsGoodEnough

  • curtisszmania a day ago

    [dead]