Everything I know about good system design

(seangoedecke.com)

232 points | by dondraper36 5 hours ago

107 comments

  • motorest 30 minutes ago

    What a great article. It's always a treat to read this sort of take.

    I have some remarks though. Taken from the article:

    > Avoid having five different services all write to the same table. Instead, have four of them send API requests (or emit events) to the first service, and keep the writing logic in that one service.

    This is not so cut-and-dried. The trade-offs are neither obvious nor obviously acceptable.

    If the five services access the database directly, you are designing a distributed system where the interface being consumed is the database itself. That interface is one you do not need to design or implement: it already supports authorization and access controls out of the box, along with transactions and custom queries. On the other hand, if you designate one service as a high-level interface over the database, then you need to implement and manage your own custom interface, with your own custom access controls and constraints, and you need to design and implement yourself how to handle transactions and compensation strategies.

    And what exactly does that buy you? More failure modes and a higher microservices tax?

    Additionally, having five services access the same database is a code smell. Odds are that database fuses together two or three separate databases. This happens a lot: most services grow by accretion, and adding one more table to an existing database meets far less resistance than proposing an entirely new persistence service. And is it possible that those five separate services are actually just one or two services?

    • paffdragon 13 minutes ago

      > the interface being consumed is the database, which you do not need to design or implement

      You absolutely should design and implement it, exactly because it is now your interface. In fact, it will add more constraints to your design, because you now have different consumers, and potentially writers, all competing for the same resource with potentially different access patterns. Add to that the maintenance overhead that migrations of such shared tables come with. And eventually you might have data in this table that is only needed by some of the services, so now you need to implement views and access controls at the DB level.
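
      For illustration, a minimal Postgres sketch of that kind of DB-level scoping (the table, view, and role names here are hypothetical):

          -- expose only the columns one consumer needs
          CREATE VIEW billing_orders AS
              SELECT id, customer_id, total_cents
              FROM orders;

          -- and lock the underlying table down
          REVOKE ALL ON orders FROM billing_service;
          GRANT SELECT ON billing_orders TO billing_service;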

      Ideally, if you have a chance to implement it, an API is cleaner and more flexible. The problem in most cases is simply the business pushing for faster features, which often leads to quick hacks, including just giving another service direct access to some DB table, because the alternative would take more time, and we don't have time, we want features, now.

      But I agree with your thoughts in the last paragraph. It happens very often that people don't want to undertake the effort of a whole new design or redesign to match the evolving requirements and just patch it by adding a new table to an existing DB, then another,...

    • bubblebeard 17 minutes ago

      I think the author meant, in a general way, it’s better to avoid simultaneous writes from different services, because this is an easy way to introduce race conditions.

  • bambax 4 hours ago

    > When querying the database, query the database. It’s almost always more efficient to get the database to do the work than to do it yourself. For instance, if you need data from multiple tables, JOIN them instead of making separate queries and stitching them together in-memory.

    Oh yes! Never do a join in application code! But also: use views! (And stored procedures if you can.) A view is an abstraction over the underlying data: it's functional by nature, unlikely to break for random reasons in the future, and if done well the underlying SQL is surprisingly readable and easy to reason about.
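
    A minimal sketch of such a view (schema hypothetical): the join lives in one place, and callers see one stable relation:

        CREATE VIEW order_summary AS
            SELECT o.id, o.placed_at, c.name AS customer_name, o.total_cents
            FROM orders o
            JOIN customers c ON c.id = o.customer_id;

        SELECT * FROM order_summary WHERE placed_at >= '2025-01-01';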

    • bob1029 3 hours ago

      This is a big part of what makes ORMs a problem.

      Writing raw SQL views/queries per MVC view in SSR arrangements is one of the most elegant and performant ways to build complex web products. Let the RDBMS do the heavy lifting with the data. There are optimizations in play you can't even recall (because there are so many) if you're using something old and enterprisey like MSSQL or Oracle. The web server should be able to directly interpolate SQL result sets into corresponding <table>s, etc., without having to round-trip for each row or perform additional in-memory join operations.

      The typical ORM implementation is the exact opposite of this - one strict object model that must be used everywhere. It's about as inflexible as you can get.

      • richardlblair 32 minutes ago

        If your ORM is going to the DB per row, you're using it wrong. N+1 queries are a performance killer, and they are easy to spot in any modern APM.

        Rails makes this easy to avoid. Using `find_each` batches the queries (1,000 records at a time by default).

        Reading through the comment section on this has been interesting. Either lots of people are using half-baked ORMs, or they have little experience with an ORM, or both.

      • hk1337 2 hours ago

        Even in the article, the solution wasn't to abandon the ORM in favor of raw SQL, but to know how to write the code so it doesn't run 100 extra queries when it doesn't need to.

        > Particularly if you’re using an ORM, beware accidentally making queries in an inner loop. That’s an easy way to turn a select id, name from table to a select id from table and a hundred select name from table where id = ?.
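
        In SQL terms, the fix is to hoist the lookup out of the inner loop into one set-based query; a sketch against a hypothetical users table:

            -- the N+1 shape: issued once per row from the inner loop
            SELECT name FROM users WHERE id = $1;

            -- the set-based shape: one round trip for all rows
            SELECT id, name FROM users WHERE id = ANY($1);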

      • tossandthrow 3 hours ago

        Have you ever built a complex app like this?

        In particular, have you had to handle testing, security (e.g. row-level security), migrations, change management (e.g. for SOC2 or other security frameworks), cache offloads (Redis and friends), support for microservices, etc.?

        Comments like this give me a vibe of young developers trying out Supabase for the first time feeling like that approach can scale indefinitely.

        • Yokohiii 12 minutes ago

          I don't understand why all these problems should be easier to handle with an ORM than with raw SQL.

        • rbees 2 hours ago

          > Comments like this give me a vibe of young developers

          I don't think so. The context is about avoiding in-memory joins, which are fairly awful to do in an application and should be avoided, along with uninformed use of ORMs, which often just adds a layer of unwarranted complexity, leading to things like the dreaded N+1 problem most inexperienced Rails developers hit with ActiveRecord.

          If anything, what you’re talking about sounds like development hell. I can understand a database developer having to bake in support for that level of security, but developing an app that actually uses it gets you so far in the weeds that you can barely make progress trying to do normal development.

          A developer with several years of experience, or equivalent, will take pride in building complexity and using cool features that make them feel important.

          After a developer has maybe twice that many years of experience, or equivalent, they may develop frameworks with the intent of making code easier to develop and manage.

          And beyond that level of experience, developers just want code that’s easy to maintain and doesn’t make stupid decisions like excessive complexity. But, they know they have to let the younger devs make mistakes, because they don’t listen, so there is no choice but to watch hell burn.

          Then you retire or get a different job.

          • tossandthrow 26 minutes ago

            I don't know which part of what I'm talking about sounds like hell.

            I am merely talking about properties of developing complex web applications that have traditionally not been easy to work with in SQL.

            I am in particular not proposing any frameworks.

            How can that sound like hell?

        • lurking_swe 2 hours ago

          Not the person you replied to, but I have! A Java project I worked on a couple of years ago used a thin persistence layer called JOOQ (a Java library). It basically helps you safely write SQL in Java, without ORM abstractions. It worked just fine for our complex enterprise app.

          SQL migrations? This is a solved problem: https://github.com/flyway/flyway

          What about microservices? You write some Terraform to provision a SQL database (e.g. AWS Aurora) just like you would with DynamoDB or similar. What does that have to do with ORMs?

          What about Redis? Suddenly we need an ORM to query Redis, to check if a key exists in the cache before hitting our DB? That's difficult code to write?

          I’m confused reading your comment. It has “you don’t do things my way so you must be dumb and playing with toy projects” vibes.

          • tossandthrow 20 minutes ago

            From what I can see, jOOQ is only really type-safe with POJO mappings, at which point it is an ORM with an expressive query DSL.

            Alternatively, you use record-style outputs, but those are prone to errors if positions are changed.

            Regardless, even with jOOQ you still accept that there is a sizable application layer taking responsibility for the requirements I listed.

          • __MatrixMan__ 16 minutes ago

            As a previous user of alembic, I was surprised that flyway's migrations only go forward by default and that reversing them is a premium feature. That's like having the luxury trim be the one with seatbelts.

        • cpursley 2 hours ago

          Guessing you are a Rails dev?

      • mattmanser 2 hours ago

        Most ORMs will happily let you map stored procedures and views to a class, and you can have as many models as you want. So your point doesn't really make sense.

        The author said nothing about ORMs. It feels like you're airing a personal beef with ORMs that runs entirely against the "pragmatic" software design the author's espousing. Using an ORM to massively reduce your boilerplate CRUD code, then using raw SQL (or raw SQL + the ORM doing the column mapping) for everything else, is a pragmatic design choice.

        You might not like them, but using ORMs for CRUD saves a ton of boilerplate, error-prone code. Yes, you can footgun yourself. But that's what being a senior developer is all about: using the tools you have pragmatically and not footgunning yourself.

        And it's just a matter of looking for the patterns: if you see a massive ORM query, you're probably looking at a code smell - a query that should be in raw SQL.

        • dondraper36 an hour ago

          In Go, for example, there is a mixed approach of pgx + sqlc, which is basically a combo of the best Postgres driver and a type-safe code generator based on raw SQL.

          https://brandur.org/sqlc

          Even though I often use pgx alone, for a new project I would use the approach above.
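
          For context, sqlc's input is plain SQL annotated with comments, from which it generates typed Go functions; a minimal sketch (query names and schema hypothetical):

              -- name: GetUser :one
              SELECT id, name FROM users WHERE id = $1;

              -- name: ListUsers :many
              SELECT id, name FROM users ORDER BY id;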

    • tialaramex 23 minutes ago

      Stored procedures seem like a win, but the big problem is this: while I could write the rest of the software in a very nice modern language like Rust - or, more practically, in C#, since my team all know C# - if I write a stored procedure it will be in Transact-SQL, because that's the only choice.

      T-SQL was not a good programming language last century when it was vaguely current, so no, I do not want to write any significant amount of code in T-SQL. For my sins I maintain a piece of software with huge T-SQL procedures (multi-page elaborations by somebody who really, really liked this stuff) and they're a nightmare. The tooling doesn't really believe in version control, and the diagnostics when you make a mistake are either non-existent or C++-style useless spew.

      We hire a lot of very junior developers - people who still need to be told not to comment out code in release, that variable names are for humans to read, not machines, that sort of thing. We're not quite hiring physicists to write software (I have done that at a startup) but it's close. However, none of the poor "my first program" code I see in a merge request by a new hire is anywhere close to as unreadable as the T-SQL we already own and maintain.

    • mdavid626 2 hours ago

      I disagree. In modern, highly scalable architectures I'd prefer doing joins in the layer in front of the database (the backend).

      The "backend" scales much more easily than the database. Loading data by simple indexes, e.g. user_id, and joining it on the backend keeps the DB fast. Spinning up another backend instance is easy - unlike another DB instance.

      If you think your joins must happen in the DB because the data is too big to load into backend memory, restructure it so that it's possible.

      Bonus points for moving joins to the frontend. This makes the data highly cacheable and fast to load, since you need to transfer less data, and it frees up resources on the server side.

      • riv991 an hour ago

        "High scale" is so subjective here. I'd hazard a guess that 99% of businesses are not at a scale where they need to worry about outgrowing what a single Postgres or MySQL instance can handle.

        • Tade0 an hour ago

          In one project I was on, the issue was the ORM generating queries that Postgres deemed too large to execute in memory, so it fell back to performing them on disk.

          Interestingly it didn't even use JOIN everywhere it could because, according to the documentation, not all databases had the necessary features.

          A hard lesson in the caveats of outsourcing work to ORMs.

          • richardlblair 42 minutes ago

            I've worked both with ORMs and without. As a general rule, if the ORM is telling you there is something wrong with your query / tables it is probably right.

            The only time I've seen this in my career was a project that was an absolute pile of waste. The "CTO" was self-taught, and all the tables were far too wide, with a ton of null values. The company did very well financially, but the tech was so damn terrible. It was such a liability.

        • mdavid626 an hour ago

          Scalability is not the keyword here.

          The same principle applies to small applications too.

          If you apply it correctly, the application is never going to be slow due to slow DB queries, and you won't have to optimize complex queries at all.

          Plus, if you want to split part of an app out into its own service, it will be easily possible.

          • nicoburns 22 minutes ago

            One of the last companies I worked at had very fast queries and response times while doing all the joins in the database. And that was with the database on a small machine with 8GB of RAM. That leaves a vast amount of room for vertical scaling before we would have started hitting limits.

        • dondraper36 an hour ago

          Vertical scaling is criminally underrated, unfortunately. Maybe it's because horizontal scaling looks so much better on LinkedIn.

          • mdavid626 an hour ago

            Sooner or later even small apps reach hardware limits.

            My proposed design doesn’t bring many hard disadvantages.

            But it allows you to avoid vertical hardware scaling.

            Saves money and development time.

            • dondraper36 39 minutes ago

              Not really disagreeing with you here, but that "later" never comes for most companies.

      • AdrianB1 an hour ago

        My manufacturing data is hundreds of GB to a few TB per instance, and I am talking about hot data that is actively queried. It is not possible to restructure, and it would be a terrible idea to do joins in the frontend. Not every app is tiny.

        • mdavid626 an hour ago

          In some cases, it’s true.

          But your thinking is rather limited. Even such data can be organized in a way that doesn't require the joins to happen in the DB.

          This kind of design always "starts" on the frontend - by choosing how and what data will be visible, e.g. in a table view.

          Many people think showing all the data, all the time, is the only way.

          • AdrianB1 an hour ago

            The SQL database has more than a dozen semi-independent applications that handle different aspects of the manufacturing process - from recipes and batches to maintenance, scrap management and raw material inventory. The data is interlocked; the apps are independent, as people in very different roles use them. No, it never starts in the frontend: it started as a system and evolved by adding more data and more apps. Think of SAP as another such example.

            • mdavid626 an hour ago

              This is an "old-school" design. Nowadays I wouldn't let apps meet in the database.

              A simple service-oriented architecture is much preferred: each app with its own data.

              Then such problems can be easily avoided.

              • dakiol 25 minutes ago

                It's not old school, it's actually solid design. I too have worked with people who think the frontend, or even the services, should guide the design/architecture of the whole thing. It seems tempting and gives the initial impression of working, but long term it's just bad design. Keeping data structures (and mainly this means database structures) stable is key to long-term maintenance.

        • mdavid626 an hour ago

          A good, simple solution could be data duplication, e.g. storing some properties from the joined tables directly in the main table.

          I know, for many, this is one of the deadly sins, but I think it can work out very well.
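
          A sketch of what that duplication could look like (schema hypothetical); the copied column then has to be kept in sync on writes:

              -- copy the customer's name onto each order so listing orders needs no join
              ALTER TABLE orders ADD COLUMN customer_name text;

              UPDATE orders o
              SET customer_name = c.name
              FROM customers c
              WHERE c.id = o.customer_id;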

    • torginus an hour ago

      Are you sure about this?

      Let's say you run a webshop and have two tables: one for orders, with 5 fields, and one for customers, with 20 fields.

      Let's say you have 10k customers, and 1m orders.

      A query performing a full join on this and fetching all the data would transmit 25 million fields, while 2 separate queries and a client-side manual join would transmit just 5 million fields for orders and 200k for customers.

      • jameshart 32 minutes ago

        If you need all the orders and all the customers, sure.

        But usually you need some of the orders and you need the customer info associated with them. Often the set of orders you’re interested in might even be filtered by attributes of the customers they belong to.

        The decision of whether to normalize the results of a database query into separate sets of orders and customers, or to return a single joined dataset of orders with customer data attached, is completely orthogonal to the decision of whether to join the data in the database.

      • nicoburns 18 minutes ago

        These days you can use JSON aggregation in the database to avoid returning duplicate data in what would otherwise be large joins.

      • valiant55 18 minutes ago

        My rule of thumb is if it's a 1:1 relationship, use a join. If it's 1:M, separate queries.

      • dondraper36 an hour ago

        What I particularly like about the comments in this thread is how it proves that everything is a trade-off :)

      • digitalPhonix an hour ago

        What sort of application is regularly doing a query for “all data”?

    • quietbritishjim 3 hours ago

      I think it's ok to have this rule as a first approximation, but like all design rules you should understand it well enough to know when to break it.

      I worked on an application which joined across lots of tables, which made a few dozen records balloon into many thousands of result rows, with huge redundancy in the results. Think of something like a single conceptual result having details A, B, C from one table, X, Y from another table, and 1, 2, 3 from another table. Instead of having 8 result rows (or 9 if you include the top-level one from the main table) you have 18 (AX1, AX2, AX3, AY1, ...). It gets exponentially worse with more tables.

      We moved to separate queries for the different tables. Importantly, we were able to filter them all on the same condition, so we were not making multiple queries to child tables when there were lots of top-level results.

      The result was much faster because the extra network overhead was overshadowed by the saving in query processing and quantity of data returned. And the application code was actually simpler, because it was a pain to pick out unique child results from the big JOIN. It was literally a win in every respect with no downsides.

      (Later, we just stuffed all the data into a single JSONB column in a single table, which was even better. But even that is an example of breaking the old normalisation rule.)
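
      A sketch of the separate-queries shape described above (schema hypothetical): each child table is fetched on its own, but filtered by the same top-level condition, so sibling tables never multiply each other:

          -- instead of one join that fans out to AX1, AX2, ..., three narrow queries:
          SELECT d.* FROM details_a d JOIN results r ON r.id = d.result_id WHERE r.batch_id = $1;
          SELECT d.* FROM details_b d JOIN results r ON r.id = d.result_id WHERE r.batch_id = $1;
          SELECT d.* FROM details_c d JOIN results r ON r.id = d.result_id WHERE r.batch_id = $1;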

      • nicoburns 16 minutes ago

        If you use CTEs and json_agg then you can combine your separate queries into one query without redundant data.
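
        A sketch of that approach in Postgres (schema hypothetical): each parent row comes back once, with its children as a JSON array:

            WITH a AS (
                SELECT result_id, json_agg(details_a) AS items
                FROM details_a
                GROUP BY result_id
            )
            SELECT r.id, a.items
            FROM results r
            LEFT JOIN a ON a.result_id = r.id
            WHERE r.batch_id = $1;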

      • magicalhippo an hour ago

        I think it's more like: avoid doing a "limiting" join in the application, i.e. where the join is used to limit the output to a subset or similar.

        As a somewhat contrived example (since I just got out of bed): if your software has a function that needs all the invoice items from this year's invoices whose invoice address country is a given value, use a join rather than loading all invoices, invoice addresses and invoice items and performing the filtering on the client side.

        Though as you point out, if you just need to load a given record along with details, prefer fetching detail rows independently instead of making a Cartesian behemoth.
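
        For illustration, that contrived example as the single query you'd hand to the database (schema hypothetical):

            SELECT ii.*
            FROM invoice_items ii
            JOIN invoices i ON i.id = ii.invoice_id
            JOIN addresses a ON a.id = i.invoice_address_id
            WHERE i.issued_at >= date_trunc('year', now())
              AND a.country = $1;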

      • wongarsu 2 hours ago

        That reminds me of the many cases of people adhering to database normalisation rules even in views and queries, even in cases where they should be broken. Aggregation functions like Postgres's array_agg and jsonb_agg are incredibly powerful at preventing the number of rows from ballooning in situations like those.

      • 9rx 3 hours ago

        > which made a few dozen records balloon to many thousands of result rows

        That doesn't really sound like a place where data is actually conceptually joined. I expect, as it is something commonly attempted, that you were abusing joins to try and work around the n+1 problem. As a corollary to the above, you also shouldn't de-join in application code.

        • kccqzy 2 hours ago

          It's a join. A join without any ON or USING clause or any filtering is a Cartesian product, which is what's happening here.

    • luckylion 13 minutes ago

      I feel like the biggest question to ask is: how expensive is it exactly, how often do you need to do it, and how important is the speed of it?

      If you have some complex queries on every page load with a huge number of users, put it in the DB as much as possible.

      If you need to iterate over a bunch of records and do something based on some combination of values, and it's for a weekly reporting thing, I'd much rather see 3 nested foreach loops with a lot of early exits to skip the things you don't care about than a multi-KB SQL statement that took two days to develop and that nobody ever dares touch again because it's so hard to handle.

    • hk1337 2 hours ago

      You should be careful with how much you lean into "doing it in the database", as well as with how you implement it, lest you end up in a situation where your application inserts one value and it gets saved as something completely different.

      • mattmanser an hour ago

        I'm not sure if this is what you mean, but I think a big thing missing from the article is how you should isolate your business logic.

        A great software design will separate all business logic into its own layer. That might be a distinct project, module, or namespace, depending on what your language supports. Keep business logic out of SQL and out of web server code (controllers, web helpers, middleware, etc.).

        Then you're treating SQL as the data store it is designed to be. When you embed application logic in SQL, you're hiding core functionality in a place where most developers won't expect to find it. This approach also creates tight coupling between your application and your database provider, making it hard to switch as needs change/the application grows.

        • dondraper36 an hour ago

          That also depends on what you would consider "business logic in the database".

          What would you say about CHECK constraints, though? I don't think they surprise many developers, and having these checks is very convenient.

          I know that there are even opponents of foreign keys (which makes sense sometimes), but in general, I don't understand why I would ever throw away the nice features of Postgres that can enforce correctness.
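
          For example, constraints of the kind meant here (schema hypothetical):

              CREATE TABLE orders (
                  id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
                  total_cents integer NOT NULL CHECK (total_cents >= 0),
                  status      text    NOT NULL CHECK (status IN ('pending', 'paid', 'cancelled'))
              );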

    • tossandthrow 3 hours ago

      Views make good sense when you can check them in - and DB migrations are a poor way of doing it due to their immutable nature.

      Depending on the ecosystem the code base adopts, a good ORM might be a better choice for doing joins.

    • CafeRacer 3 hours ago

      I came here to say exactly the opposite. There were a few instances where a relatively heavy join would not perform well no matter what I tried, and it was faster to load and stitch the data together with goroutines. So I just opted to do it that way.

      Also, SQL is easy, but figuring out what's up with the indexes and the planner is not.

    • nurettin 3 hours ago

      For me, it is ORM -> schema-bound views -> views -> table functions -> stored procedures as a last resort (hopefully it doesn't come to that).

  • nasretdinov 10 minutes ago

    I agree with most of the stuff written in the article (quite a rare thing, I must admit :)). But one thing I'd say is a bit outdated: in general, whether or not to read from a replica is the same kind of decision as whether or not to use caching: it's a (pretty significant) tradeoff. Previously you didn't have much of a choice, hardware being quite limited. Now, however, you can have literally hundreds of CPU cores, so all those CPUs can very much be kept busy doing reads. Writes obviously do have an overhead, _but_ note that all writes are eventually serialised, _and_ the replica needs to handle them as well anyway.

  • nvarsj an hour ago

    > Paradoxically, good design is self-effacing: bad design is often more impressive than good.

    Rings very true. Engineers are rated based on the "complexity" of the work they do. This system seems to encourage over-engineered solutions to all problems.

    I don't think there is enough appreciation for KISS - which I first learned about as an undergrad 20 years ago.

  • KronisLV 3 hours ago

    > Schema design should be flexible, because once you have thousands or millions of records, it can be an enormous pain to change the schema. However, if you make it too flexible (e.g. by sticking everything in a “value” JSON column, or using “keys” and “values” tables to track arbitrary data) you load a ton of complexity into the application code (and likely buy some very awkward performance constraints). Drawing the line here is a judgment call and depends on specifics, but in general I aim to have my tables be human-readable: you should be able to go through the database schema and get a rough idea of what the application is storing and why.

    I’m surprised that the drawbacks of EAV or just using JSON in your relational database don’t get called out more.

    I'd very much rather have, like, 20 tables with a clear purpose than discover that colleagues have once more created a "classifier" mechanism using polymorphic links (no actual foreign keys, just columns like "section" and "entity_id") and are treating it as a grab bag of stuff - one you need to read a bunch of application code to even hope to understand.
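
    For reference, a sketch of that grab-bag pattern next to an explicit alternative (all names hypothetical):

        -- the grab bag: nothing enforces that entity_id points anywhere real
        CREATE TABLE classifier (
            section   text,    -- 'invoice', 'user', 'shipment', ...
            entity_id bigint,  -- no foreign key possible
            key       text,
            value     text
        );

        -- versus an explicit table with a real constraint
        CREATE TABLE invoice_flags (
            invoice_id bigint NOT NULL REFERENCES invoices (id),
            flag       text   NOT NULL
        );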

    Whenever I see that, I want to change careers. I get that EAV has its use cases, but in most other cases fuck EAV.

    It's right up there with N+1 issues; complex dynamically generated SQL when views would suffice; storing audit data in the same DB, where functionality inevitably gets written against it and your audit data becomes part of the business logic; shared database instances without the ability to easily bootstrap your own; working with Oracle in general; and putting things in the DB that would be better off in the app, and vice versa.

    There are so many ways to decrease your quality of life when it comes to storing and accessing data.

    • dondraper36 3 hours ago

      There's a great book, SQL Antipatterns by Bill Karwin, where this specific antipattern is discussed and criticized.

      That said, sometimes when I realize there's no way for me to come up with even a rough schema (say, some settings object that is returned to the frontend), I use JSONB columns in Postgres. As a rule of thumb, however, if something can be normalized, it should be: after all, it's still a relational database, despite all the JSON(B) conveniences and optimizations in Postgres.

    • quibono an hour ago

      > storing audit data in the same DB and it inevitably having functionality written against it, your audit data becoming a part of the business logic

      What's the "proper" way to do this? Separate DB? Separate data store?

      • KronisLV an hour ago

        Typically you want your audit/log data to be immutable and kept in an append-only data store.

        Whether that's a typical relational DB or something more specialized (like a log-shipping solution) is up to you, but usually it would be separate from the main DB.

        If you need some functionality that depends on events that have taken place, you probably want to store information about those events in the main data store (but only what's needed for that functionality, not a list of all mutations done to a table like audit data might include).

        In general, it's nice to have such a clear boundary of where the business domain ends and where the aux. stuff to help you keep it running goes - your logs and audit data, analytics and metrics, tracing spans and so on.

  • thisbeensaid an hour ago

    Since the author praises proper use of databases and talks about event buses, background jobs and caching, I highly recommend checking out https://dbos.dev if you have Python or TypeScript backends. DBOS nicely solves common challenges in simple and complex systems and can eliminate the need to run separate services such as Kafka, Redis or Celery. Best of all: DBOS can be used as a dependency and doesn't require deploying a separate service.

    Discussed here a week ago: https://news.ycombinator.com/item?id=44840693

  • ZYbCRq22HbJ2y7 4 hours ago

    > You’re supposed to store timestamps instead, and treat the presence of a timestamp as true. I do this sometimes but not always - in my view there’s some value in keeping a database schema immediately-readable.

    Seems overly negative about broad advice on a good pattern?

        is_on => true
        on_at => 1023030
    
    Sure, that makes sense.

         is_a_bear => true
         a_bear_at => 12312231231
    
    Not so much, as most bears do not become bears at some point after not being a bear.

    • oftenwrong 31 minutes ago

      A boolean is smaller, which is a relevant consideration for some workloads. For example, you may be pre-aggregating a large amount of data to serve a set of analytical queries which do not care about the associated timestamp. The smaller data type is more efficient both in storage and in query execution.

      Additionally, there are situations where it is logical to store a boolean. For example, if the boolean denotes an outcome:

          process_executed_at timestamp not null
          process_succeeded boolean not null

    • grey-area 3 hours ago

      I'd see booleans as a bad thing in almost all cases: instead of a boolean you can have a timestamp or an integer field (which can expand later).

      In the is_a case, a type or kind is almost always better, as you'll rarely have just bears even if you start with only bears - just as you rarely have just two states for a status field (say, on or off); these often expand in use to include things like suspended, deleted and asleep.

      So generally I'd avoid booleans, as they tend to multiply and increase complexity, particularly when they cover mutually exclusive states like live, deleted and suspended. I have seen is_visible, is_deleted and is_suspended all on the same table (without a status) and the resulting code and queries are not pretty.

      I’d use an integer rather than a timestamp to replace them though.
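
      A sketch of that replacement (names hypothetical): one status column instead of three interacting booleans:

          -- instead of is_visible, is_deleted, is_suspended:
          ALTER TABLE accounts ADD COLUMN status integer NOT NULL DEFAULT 0;
          -- 0 = live, 1 = suspended, 2 = deleted; room for more states later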

      • ZYbCRq22HbJ2y7 3 hours ago

        Yeah, I mean, an integer can definitely hold more data than a boolean.

        If your data was simple enough, you could have an integer hold the entire meaning of a table's row, if every client understood how it was interpreted. You could do bitwise manipulations, encodings, and so on.

        Sometimes it is nice to understand what the data means in the schema alone. You can do that with enums, etc.

          ate_an_apple_in_may_2024
          saw_an_eclipse_before_30
          
        
        These are more the sort of things I don't see needing enums, timestamps, integers...

    • setr 3 hours ago

      If you take the statement at face value - essentially, that ever storing booleans in the DB is a bad smell - then he's correct.

      Although I'm not even sure it's broadly a good principle, even in the on_at case; if you actually care about this kind of thing, you should be storing it properly in some kind of audit table. Switching a bool to a timestamp is more of a weird, lazy hack that probably won't be all that useful in practice, because only a random subset of the data is being tracked like that (the boolean data type definitely isn't the deciding factor in whether something is important enough to track update time on).

      The main reason it's even suggested is probably just that it's "free" - you can smuggle the timestamp into your bool without an extra column - and that it has probably saved some effort accidentally; not because it's a broadly complete solution to the set of problems it tries to solve.

      I've got the same suspicion about soft deletes - I'm fairly positive they're useless in practice, and just a mentally lazy substitute for proper auditing. You definitely can't just undelete, and it doesn't solve for update history, so all you're really protecting against is an accidental bulk delete caught immediately - which is half the point of your backup.

      • maxbond 3 hours ago

        Audit tables are a big ask, both in terms of the programming effort to design and support them, and in terms of the performance hit due to write amplification (every insert and update causes an additional write to an audit table). Whereas making a bool into a timestamp is free. Timestamps on rows (including created_at and updated_at) are real bacon savers when you've deployed a bug, corrupted some rows, and need to e.g. refund orders created in a certain window.

        • mrkeen an hour ago

          Audit tables are a dumb concept because they imply bolting on an actual source of truth in addition to the regular, not-so-source-of-truth tables - and only if the programmer gets around to it (like documentation or logging or whatever else falls by the wayside).

        • valenterry 2 hours ago

          This. The mere fact that it's much easier to find deleted/impacted entities is worth it.

      • moebrowne 3 hours ago

        It's well documented that soft delete is more of a headache than it's worth

        https://brandur.org/soft-deletion

    • spiddy 3 hours ago

      Though why treat booleans as a special case and keep timestamps for them, when you don't for integers with this pattern:

          isDarkTheme: {timestamped}
          paginationItems: 50

      I can see when dark theme was activated but not when pagination was set to 50.

      Also, I can't see when dark theme was deactivated either.

      Seems like a poor man's changelog. There may be use cases for it, but I can't think of any, tbh.

    • seafoamteal 3 hours ago

      I think in that situation, you could have an enum value that contains Bear and whatever other categories you are looking at.

      • ZYbCRq22HbJ2y7 3 hours ago

        Sure, but this was for demonstration purposes, showing that some data carries meaning that doesn't have an instantiation state dependent on time.

    • Lionga 3 hours ago

      All this general advice is quite useless and needs millions of asterisks.

      Good system design is designing a system that works best for the problem at hand.

      • FrankChalmers 2 hours ago

        That's even more general and requires another million asterisks.

      • urquhartfe 2 hours ago

        This is an utterly fatuous statement

  • mgaunard 11 minutes ago

    Seems biased towards websites, which are mostly easy CRUD.

  • bubblebeard 44 minutes ago

    Very good article, right on point!

    I do wonder why the author left out testing, documentation and QA tooling, though. To my mind, writing a proper phpcs config or whatever, to ensure everyone on the team writes code in a consistent way, is crucial. Without documentation we end up forgetting why we did certain things. And without tests, refactors are a nightmare.

    • dondraper36 40 minutes ago

      Especially given that generating documentation and tests (with manual review, of course) is so much faster with, say, Claude Code.

  • tetha 3 hours ago

    The distinction between stateful and stateless is one of the main criteria for how we divide responsibilities between platform/infra and development.

    I know it's a bit of an oversimplification, but you can't do that many things wrong with a stateless application running in a container. Often the answer is "kill it and deploy it again". As long as you don't shred your dataset with a bad migration or some bad database code, most bad things at this level can be fixed in a few minutes with a few redeployments.

    I'm fine having a larger number of people, with varying degrees of experience, time, care and diligence, working here.

    With a persistence layer like a database or a file store, you need some degree of experience with what has to happen around the system so it doesn't become a business risk. Put plainly, a database can be a massive business risk even if it is working perfectly... because no one set up backups.

    That's why our storage systems are run by dedicated people who have been doing this for years and years. A bad database loss easily sinks ships.

    • mrkeen an hour ago

      > but you can't do that many things wrong with a stateless application running in a container

      > As long as you don't shred your dataset with a bad migration or some bad database code, most bad things at this level can be fixed in a few minutes with a few redeployments.

      At some point between these statements you switched from stateless to stateful and I can't follow the rest of the argument.

  • com 4 hours ago

    The advice about logging and metrics was good.

    I had been nodding away about state and push/pull, but this section grabbed my attention, since I've never seen it so clearly articulated before.

    • bravesoul2 4 hours ago

      Yes. Everyone should spend the small amount of time it takes to get some logging/metrics going. It's like tests: getting from 0 to 1 test is psychologically hard in an org, but 1 to 1000 then becomes "how did I live without this". Grafana has a decent free tier, or you can self-host.

    • dondraper36 4 hours ago

      The logging part is spot on. It has happened so many times: you face an issue or even an incident, think "oh, I wish I had logged this", and then introduce those logs anyway.

      • bravesoul2 4 hours ago

        It is a balance. Too many logs cost money and slow down log searches, both for the search itself and for the human seeing 100 things on the same trace.

        • jillesvangurp 4 hours ago

          The trick here is to log aggressively and then filter aggressively. Logs only get costly if you keep them endlessly. Receiving them isn't that expensive. And keeping them for a short while won't break the bank either. But having logs pile up by the tens of GB every day gets costly pretty quickly. Having aggressive filtering means you don't have that problem. And when you need the logs, temporarily changing the filters is a lot easier than adding a lot of ad hoc logging back into the system and deploying that.

          Same with metrics. Mostly they don't matter. But when they do, it's nice if it's there.

          Basically, logging is the easy and cheap part of observability; it's the ability to filter and search that makes it useful. A lot of systems get that wrong.

          • bravesoul2 4 hours ago

            Nice. I'm going to read up more about filtering.

        • dondraper36 4 hours ago

          Yeah, absolutely. But the author's idea of logging all major business logic decisions (that users might question later) sounds reasonable.

          • bravesoul2 4 hours ago

            Yes. I like the idea of assertions too. Log when an assertion fails. Then get notified to investigate.

  • gethly an hour ago

    Actually, event sourcing solves most of these pains - events, schema, push/pull, caching, distribution... whatever. The downside is that it is definitely not suitable for small projects, and the overhead is substantial (especially during the development stage, when you want to ship the product as soon as possible). On the other hand, once you get it going, it's an unstoppable beast.

  • bravesoul2 4 hours ago

    He doesn't seem to mention Conway's law or team topology, which are an important part of system design too.

    • dondraper36 4 hours ago

      Well, as sad as it is, such advice is often applicable only to new projects, when you still have runway for your own decisions.

      For mostly political reasons, if you are onboarded to a team with a billion microservices and a lot of fanciness, it's unlikely that you will ever get the approval or time to introduce simplicity. Or maybe I have just been corrupted by the reality I have to work in now.

      • bravesoul2 4 hours ago

        There is definitely a wood-for-the-trees issue at bigger companies. I doubt there is an architect who understands the full system well enough to see how to simplify it. It's hard to even know what "simpler" looks like.

    • jillesvangurp 3 hours ago

      You should adapt your team to the architecture, not the other way around.

      My former Ph.D. supervisor who moonlights as a consultant on this topic uses a nice acronym to capture this: BAPO. Business, Architecture, Process, and Organization. The idea is to end up with optimal business, an optimal architecture & design for that business, the minimum of manual processes that are necessitated by that architecture, and an organization that is efficiently executing those processes. So, you should design and engineer in that order.

      Most companies do this in reverse, and end up limiting their business with an architecture that matches whatever processes their org chart necessitated years ago, in a way that doesn't make any logical sense whatsoever except in the historical context of the org chart. If you come in as a consultant to fix such a situation, it helps to understand that whatever you are going to find is probably wrong for this reason. I've been in the situation where I came in to fix a technical issue and immediately saw that the only reason the problem existed was that the org chart was bullshit. That can be a bit awkward, but lucrative if you deal with it correctly. It helps to ask the right questions before you get started.

      Turning that around means you start from the business end (where's the money coming from? what value can we create? etc.), find a solution that delivers that, and then figure out the processes and organizational needs. Many companies start out fairly optimal; then stuff around them changes and they forget to adapt.

      Having microservices because you have a certain team structure is the classic mistake here: you have just codified your organizational inefficiency, before you even delivered any business value. And now your organizational latency has network latency to match, probably for no better reason than that team A can't be trusted to work with team B. And even if it's optimal now, is it going to stay optimal?

      If you are going to break stuff into (micro)services, do so for valid business/technical reasons. E.g. processing close to your data is cheaper, caching for efficiency makes things faster and cheaper, physically locating chunks of your system close to the customer means less latency, etc. But introducing network latency just because team A can't work with team B is fundamentally stupid. Why do you even have those teams? What are those people doing? Why?

      • ChrisMarshallNY 3 hours ago

        A lot of what I have done is design subsystems - components meant to be integrated into larger structures. I tend to take a modular approach (not microservices - modules).

        The acronym I use is “S.Q.U.I.D”[0] (Simplicity, Quality, Unambiguity, Integrity, Documentation).

        But most of the stuff I've done is different from what he's written about, so it's probably not relevant here.

        [0] https://littlegreenviper.com/itcb-04/#more-4074

  • magnio 4 hours ago

    I think it's a very good article. Even if you disagree with some of the individual points in it, the advice given is very concrete, pragmatic, and IMO tunable to the specifics of each project.

    On state: in my current project, it is not statefulness that causes trouble, but the need to synchronize two stateful systems. Every time there's bidirectional information flow, it's going to be a headache. The solution is of course to maintain a single source of truth, but with UI applications this is sometimes quite tricky.

    • klabb3 an hour ago

      Yes, it's total madness to synchronize and replicate complex state that logically belongs together. This is why microservices are such hot garbage - well, how people tend to use them, anyway.

      Monotonic state is also better than mutable state. If you must distribute state, think about ownership: who owns it? E.g. there's nothing necessarily wrong with having state owned by, say, a mobile client, where it can be adjusted by the user. Then you can sync it to the backend if you want, but the backend is only a reader/listener and should never try to control it directly.

  • pelagicAustral 2 hours ago

    I can definitely feel the "underwhelming" factor. I've been working on government software for 10+ years and I really know what an underwhelming codebase looks like. First off, it has my fucking name on it.

  • mattlondon an hour ago

    There was an article here recently about how to write good design docs: the TL;DR for that was basically your design doc should make your design seem obvious. I think that is the same conclusion here - good design is simple, straightforward design with no real surprises.

    Wholly agree.

  • StevenWaterman an hour ago

    What do you call system design when it refers to the design of systems in general, and not just computer services?

    As in:

    - writing a constitution

    - designing API for good DX

    - improving corporate culture

    I intuitively want to call all of those system design, because they're all systems in the literal sense. But it seems like everyone else uses "system design" to mean distributed computer service design.

    Any ideas what word or phrase I could use to mean "applying systems thinking to systems that include humans"?

  • dennisy 3 hours ago

    This post has some good concepts, but I do not feel it helps you design good systems. It enumerates options and primitives, but good design lies in when and how you apply them, which the post does not cover.

    • dondraper36 3 hours ago

      But isn't that type of advice the best we can have? Having read Designing Data Intensive Applications (DDIA) and some system design interview-focused books (like those from Alex Xu), I have noticed two types of resources:

      * Fundamental books/courses on distributed systems that will help you understand the internals of most distributed systems and algorithms (DDIA is here, even though it's not even the most theoretical treatment)

      * Hand-wavy cookbooks that tend to oversimplify things and (I am intentionally exaggerating here) teach you to reason like "I have assumed a billion users, let's use Cassandra"

      I liked the article for its focus on real systems and the sensible rules of thumb instead of another reformulation of the gossip protocol that very few engineers will ever need to apply in practice themselves.

  • usernamed7 3 hours ago

    One thing I would add is that a well-designed system is often one that is optimized for change. It is rare that a service remains static and unchanging; browsers and libraries are regularly updated, after all. Thus, if and when a developer takes on a feature ticket to add or change XYZ, it should be easy to reason about how that change will impact the system, the side effects should be predictable, and ideally the change itself should be easy to make.

    • bbkane 5 minutes ago

      "optimized for change" really only works well if you can predict the incoming changes.

      Common tools used for this "optimization" often raise the complexity and lower the performance of the system.

      For example, a db with a single table with just a key and a value is very flexible and "optimized for change" but it offers lower performance (in most cases) and is harder to reason about.

      I also frequently see people (me too) prematurely make abstractions (interfaces, extra tables, etc) because they're "optimizing for change". Then that part never changes OR it changes in a way that their abstraction doesn't abstract over OR they figure out a better abstraction later on when the app has matured a bit. Then that part of the code is at best wasted space (usually it needs to be rewritten yet no one gets time to do that).

      Of course, it's also foolish to say "never abstract". I almost always find it worth it to abstract over I/O, just so I can easily add logging, dual writes, or mock it. And when a change is obviously coming down the line it makes sense to plan for it.

      But usually I'm served best by trying to keep most of my computation in pure functions (easy to test), doing as little as possible in the I/O path (it should just persist or print stuff, so I can mock it), and otherwise writing obvious, "deletable" code that does one thing, so I can debug it and, only if necessary, replace it with a better abstraction.

    • mrkeen an hour ago

      And you get this part somewhat for free if you're actually testing as you're going.

      When my service wants to store and retrieve as part of its behaviour, of course I'm going to back it with a hashmap first.

      Once I know it fulfills its business logic I'll start fiddling with hard-to-change stuff like DB schemas and migrations.

      And having finished and tested the logic, I'll have a much better idea of the actual access patterns so I can design good tables & indexes.

    • setnone an hour ago

      Some systems are designed to last vs. designed to adapt

  • whodidntante 30 minutes ago

    Never write an article about good system design.

    In all seriousness, this is an extraordinarily subtle and complex area, and there are few rules.

    For example, "if you need data from multiple tables, JOIN them instead of making separate queries and stitching them together in-memory" may be useful in certain circumstances. For highly scalable consumer systems, the rule of "avoid joins as much as possible" can work a lot better.

    There is also no mention of how important it is to understand the business - usage patterns, the customers, the data, the scale of data, the scale of usage, security, uptime and reliability requirements, reporting requirements, etc.