Eventually consistent plain text accounting

(tylercipriani.com)

71 points | by thcipriani 2 days ago ago

29 comments

  • teo_zero a day ago

    I've been doing manual accounting with spreadsheets for years, while looking for a smarter system. I wonder if your solution gives satisfactory answers to the issues I struggle with:

    1. Entries in one month's CSV file may be repeated in the previous or following month's CSV file, typically movements at the beginning or end of the month that the bank takes some time to record, or adjusts later. Is this familiar with you and how do you address it? Is your system robust against it?

    2. Credit card: the total amount for a month is one single entry in the bank's CSV file, while a separate CSV file contains all the details. Do you rely on accounts to handle this indirect flow, e.g. one transaction of 1000$ from checking to cc, based on the single entry of the bank's CSV, then several transactions from cc to the various expense categories, based on the details CSV, and checking that the cc account has a zero balance?

    3. Some utility companies bill every 2 or 3 months. This makes monthly stats meaningless (why is September so high compared to August? did I spent too much or is it the effect of the phone bill of the previous quarter?). Do you make any effort at trying to allocate such expenses to the month(s) when the cost originated, rather when it's billed?

    • simonmic 15 hours ago

      Another answer:

      > 1. Entries in one month's CSV file may be repeated in the previous or following ... Is your system robust against it?

      People use various strategies for this. You can treat the data as partitioned (eg in months), or as a continuous stream. In general you can't rely on unique ids, including ones generated from the data (because identical transaction records are common).

      hledger has a simple built in method, which works pretty well for most people: it remembers the latest transaction date processed (for each input source). https://hledger.org/hledger.html#date-skipping

      > 2. Credit card: the total amount for a month is one single entry in the bank's CSV file, while a separate CSV file contains all the details. Do you rely on accounts to handle this indirect flow, e.g. one transaction of 1000$ from checking to cc, based on the single entry of the bank's CSV, then several transactions from cc to the various expense categories, based on the details CSV, and checking that the cc account has a zero balance?

      Yes, purchases made with a credit card and paying off the credit card balance are all separate transactions, so the most standard way is record them as such - either manually, or by downloading/converting/importing the data from both accounts. And for extra assurance, assert that the credit card's balance is zero after payoff.

      When downloading from both accounts, you get the payoff transaction in both downloads; one of them can be ignored by rule or manually, or both can be recorded with a dummy transfer account keeping things balanced.

      If you don't care enough about the detailed credit card purchases, you'd just record the checking data.

      > 3. Some utility companies bill every 2 or 3 months. This makes monthly stats meaningless...

      One workaround is to split the expense posting into several expense postings, each with its own date (still keeping them within the one journal entry). A more strictly correct one is to add transactions and accounts like in evrimoztamur's answer. Most often I'll just use hledger's -MA flags (--monthly --average).

      • teo_zero 10 hours ago

        > One workaround is to split the expense posting into several expense postings, each with its own date.

        Can a rule expeess such calculated fields like "set virtual date at 3 month before transaction date"?

        > A more strictly correct one is to add transactions and accounts like in evrimoztamur's answer.

        That's good for forecasting, but I don't see how to write automatic rules that ensure the accrued amount = billed amount.

        • simonmic 10 hours ago

          > Can a rule expeess such calculated fields like "set virtual date at 3 month before transaction date"?

          hledger's CSV rules can't do this, no. You'd need to either do that manually, or write a custom conversion script (or a custom preprocessor that modifies the CSV before hledger sees it).

          > That's good for forecasting, but I don't see how to write automatic rules that ensure the accrued amount = billed amount.

          Similar answer, if you wanted to generate those verbose entries from a single CSV record, it would need some manual entry or a custom preprocessor script. hledger's CSV rules will only generate 0 or 1 transactions per CSV record.

          What about the --average option ?

    • thcipriani a day ago

      Great questions.

      1. This is a familiar problem and it is where I spend the majority of time. If a bank provides a unique transaction ID the it would be a simple matter of code; however, none of mine do. What I do when I spot problems is run `hledger areg <account> --begin=<last month> --end=<this month>` and compare with the CSV. Opening the output of that command + this months CSV in vim using vertical split (`vsp`) and scrollbind (`scrollbind`) so each line scrolls together works well and is pretty quick. It's still a problem and a pain.

      2. This is solved in my system in a robust way (I believe) with a "virtual account." I do most of my spending on my credit card. Then every month I pay off the card from my checking. To represent this I need to see:

      - liabilities:credit with any balance I carry

      - expenses should show a transfer from liabilities:credit to the correct expense category/account

      - I should avoid being "double charged"; i.e., when I pay off liabilities:credit it should NOT be an expense.

      I've set it up like this:

      - rules/debit/journal.rules: if (payment to credit) then account2 assets:transfer:credit

      - rules/credit/journal.rules: if (payment from debit) then account2 assets:transfer:credit

      This becomes a way to spot problems. If assets:transfer:credit ever has a balance then something may be awry and you should spend some time digging. I do the same thing for Amazon. Each Amazon order is for multiple expenses. I have assets:transfer:amazon. This is how I learned the the $0.29 Colorado retail delivery fee is not universally applied to orders from Amazon—that was a frustrating journey!

      3. I haven't worried about this. Looks like other good responses in this thread.

      • RaftPeople a day ago

        > 1. This is a familiar problem and it is where I spend the majority of time. If a bank provides a unique transaction ID the it would be a simple matter of code; however, none of mine do.

        This is an annoying problem I've had to deal with also (banks have some unique ID somewhere, why not provide it?).

        I create my own unique ID to the extent that it's possible. Some txns have the exact same info but are 2 sep txns, requires a bit of messy logic.

    • evrimoztamur a day ago

      With 3, this is called an accrued expense. You book the estimated expenses of that specific month not against your accounts payable (to later reconcile against the invoice, also in accounts receivable) but against an accrued expense account. Once you receive the invoice, you also book it against the accrued expense account to reconcile it there. This way, your bookings might look like:

      1 May 2025 Debit 100.00 Utilities Credit 100.00 Accrued (utility) expenses

      1 May 2025 Pre-booking May usage Debit 100.00 Utilities Credit 100.00 Accrued (utility) expenses

      1 Jun 2025 Pre-booking July usage Debit 100.00 Utilities Credit 100.00 Accrued (utility) expenses

      30 Jun 2025 Received the invoice for May/Jun 2025 Credit 200.00 Accounts payable Debit 200.00 Accrued (utility) expenses

      6 Jul 2026 Paying off the invoice by deducting cash from the bank account and removing the payable liability Credit 200.00 Cash Debit 200.00 Accounts payable

      More info at https://en.m.wikipedia.org/wiki/Accrual#Accruals_in_accounti...

      • simonmic 14 hours ago

        And here's that (approximately) in h/ledger-ese:

          2025-05-01 Pre-booking May usage
              Expenses:Utilities                           100.00
              Liabilities:Accrued (utility) expenses      -100.00
          
          2025-06-01 Pre-booking June usage
              Expenses:Utilities                           100.00
              Liabilities:Accrued (utility) expenses      -100.00
          
          2025-06-30 Received the invoice for May/Jun 2025
              Liabilities:Accrued (utility) expenses       200.00
              Liabilities:Accounts payable                -200.00
          
          2025-07-06 Paying off the invoice by deducting cash from the bank account and removing the payable liability
              Liabilities:Accounts payable                 200.00
              Assets:Cash                                 -200.00
    • eterps a day ago

      > Entries in one month's CSV file may be repeated in the previous or following month's CSV file

      I am aware of this problem. The approach in the blog post definitely helps with that because the data is imported fresh every time again. So you could export your bank data with overlapping dates and do something like this prior to importing:

          cat *.csv | sort | uniq >clean.csv
    • NoboruWataya a day ago

      > 2. Credit card: the total amount for a month is one single entry in the bank's CSV file, while a separate CSV file contains all the details. Do you rely on accounts to handle this indirect flow, e.g. one transaction of 1000$ from checking to cc, based on the single entry of the bank's CSV, then several transactions from cc to the various expense categories, based on the details CSV, and checking that the cc account has a zero balance?

      Not OP but this is certainly how I've always done it. A bank account and a credit card are very much two separate accounts, they may even be with different institutions and one represents an asset whereas the other generally represents a liability. So in an accounting context it makes sense to treat them separately.

      • piva00 a day ago

        Same approach for me and I think I learnt it from using YNAB, I struggled a bit with the mental model at first but after it clicked it made a lot more sense to treat a credit card as a separate account.

    • jbarberu a day ago

      My approach for number 3 is to look at previous months and escrow money for it every month, and then pay it off once the bill comes due. From the accounting side you only see the monthly cost of the bill, even though it looks big on the bank statement.

    • eterps a day ago

      > Some utility companies bill every 2 or 3 months. This makes monthly stats meaningless (why is September so high compared to August? did I spent too much or is it the effect of the phone bill of the previous quarter?).

      According to Claude 3.5, ledger/hledger handles that situation like this:

          ; Automatically allocate $60 quarterly phone bill across 3 months
          ~ monthly
            Expenses:Phone          $20
            Liabilities:Accrued    $-20
          
          ; When actual bill comes
          2023-09-15 Phone Company
            Liabilities:Accrued     $60
            Assets:Bank            $-60
      
      This approach:

      1. Accrues monthly portions ($20) to a liability account

      2. When the real bill arrives, clears the accrued amount

      • taviso a day ago

        I think that's a miss for Claude, this doesn't look right at all. The accrual account is an okay solution, but the syntax is wrong! That syntax is only used for budgeting and forecasting.

        I think the solution is effective dates, there is an example pretty close to this scenario in the manual:

        https://ledger-cli.org/doc/ledger3.html#Effective-Dates

        • simonmic 14 hours ago

          I don't think it's terribly wrong. It will generate the desired entries (internally) and proper reports, if you add the appropriate option (--forecast=... for hledger, something else for Ledger). Or you could use it once to generate permanent journal entries, in your main journal or a forecast journal:

            hledger print --forecast='2024-11-01..+3 months' >>$LEDGER_FILE
          
          Effective dates are another solution, but IMHO a misfeature, I mention this for newcomers: https://hledger.org/hledger.html#secondary-dates
        • eterps 20 hours ago

          Thank you for pointing this out, will check your link instead.

  • ashish01 a day ago

    https://github.com/egh/ledger-autosync is really useful for integrating banks OFX files into an existing ledger file. Over time, it learns common patterns, automatically assigning transactions to the correct accounts.

  • simonmic 16 hours ago

    G'day all.. checking in with my usual 2 (or 20..) cents. I'm loving this article, and discussion, I always learn lots from these.

    Eventual consistency is a nice phrase to highlight. I feel (https://fosstodon.org/@simonmic/113444957161144239) the same way. For me it's about not feeling you're on a time-sensitive treadmill, where if (when) you fall behind, you might risk losing data or never catching up. (Though book-keeping regularly is still advantageous because it's less work.) I believe the "pure function" style of many PTA tools, where all inputs are known and the output is deterministic, combined with version control, are a big part of this.

    For newcomers, I like to make clear there's multiple styles of doing plain text accounting, including:

    1. Entering all your own data by hand (optionally assisted by data entry tools).

    2. Downloading data periodically from banks (as CSV, OFX, PDF, or whatever you can get) and importing it once to your journal file.

    3. Downloading data periodically from banks, keeping that as your master data, and frequently regenerating journal files from it.

    Each of these has its pros and cons, and there are people successfully using each of them. hledger's own docs focus on methods 1 and 2. Several third-party workflows using method 3 have been built on top of hledger; Full-Fledged Hledger that OP is using is the best known.

    1 requires nothing but hledger.

    2 adds: either manual bank downloading, or setting up one or more bank download tools/APIs; plus maintenance of conversion rules/scripts.

    The workflows implementing 3 add: a recommended file layout, a recommended process, and tools/scripts which depend on these. They bring some benefits (a well documented featureful workflow) and some costs (added complexity). This isn't justified for all users, so don't think you're required to use these workflows.

    More details: https://hledger.org/workflows.html

    • simonmic 13 hours ago

      4. Downloading CSV (TSV, SSV, *SV) data periodically from banks, and using it directly as hledger's data files, without converting to journal format at all. (Optionally adding a journal file just for configuration directives.)

      5. Generating one of hledger's input formats from somewhere and piping it into hledger's stdin, running it without saving any data files.

      • thcipriani 2 hours ago

        Thank you for laying these out. And thank you for all the work you've done to make my tiny niche workflow possible.

  • s0fasurfa a day ago

    My major gripe with hledger and other plaintext accounting systems: setting up the correct rules (regex for your expenses) takes too much time as it involves:

    a) defining the regexes/categories) for dozens to hundreds of expense descriptions (e.g. walmart, gas station xyz)

    b) "recompiling" your hledger journals after every rule change

    c) checking for negative and positive errors (expense description was not matched OR too many different expenses were matched)

    This process is much faster in a spreadsheet application. IMHO the journal format of hledger and other plaintext accounts apps is too verbose.

    it would be a great relief if someone started a Github repository with expense texts used when you e.g. pay at Carrefour in Italy or Walmart in the USA with your debit card. People could submit those descriptions from the CSV exports of their bank accounts.

    Another annoyance in Europe is that there is no API connection for open source apps to bank account statements, you always rely on manual CSV exports.

    • simonmic 14 hours ago

      > My major gripe with hledger and other plaintext accounting systems: setting up the correct rules (regex for your expenses) takes too much time as it involves: > a) defining the regexes/categories) for dozens to hundreds of expense descriptions (e.g. walmart, gas station xyz)

      We don't usually do it all up front; it's easier to build up your rules over time, improving them a little each time you download.

      > b) "recompiling" your hledger journals after every rule change

      It's not required (see my other comment).

      > c) checking for negative and positive errors (expense description was not matched OR too many different expenses were matched)

      For the not matched case: hledger's fallback category is "expenses:unknown" (and inflows are "income:unknown"). I usually watch for those, with something like

        watchexec -- 'hledger import *.csv --dry-run | tail +2 | hledger -f- -I print unknown'
      
      and tweak rules until there are no unknowns, before importing.

      For the other - I guess I have a similar answer - if you're not certain of your rules, preview the conversion and tweak them, before finalising it.

      > This process is much faster in a spreadsheet application. IMHO the journal format of hledger and other plaintext accounts apps is too verbose.

      You can keep your data in CSV/TSV instead of journal. This could be better for spreadsheet interop.

      I think a more compact, but still readable journal format would be interesting (something between CSV and current journal format).

      I agree some things are much easier in a spreadsheet (and also, vice versa).

      > it would be a great relief if someone started a Github repository with expense texts used when you e.g. pay at Carrefour in Italy or Walmart in the USA with your debit card. People could submit those descriptions from the CSV exports of their bank accounts.

      Please do, I would love to see it! However, wouldn't it be a really huge list of text patterns/regexps, with 0.1% of them useful to any one person ? Perhaps region-specific collections could work.

      https://github.com/simonmichael/hledger/tree/master/examples... is hledger's collection of CSV rules; these cover the basics of CSV conversion, but not the expense categorising.

      > Another annoyance in Europe is that there is no API connection for open source apps to bank account statements, you always rely on manual CSV exports.

      Ha! It's not better in the US at least (except, by giving up your bank credentials to third parties). We are usually envying your APIs over there.

  • davidhunter a day ago

    Has anyone else gone on the following journey:

    1. Use excel

    2. See ledger/hledger. Think this must be 'the way'. Go all in.

    3. Constantly wrestle with ledger/hledger because you only do your accounting once per month/quarter which is not enough frequency to really grok it.

    4. Use excel with a new sense of calm that you're not missing out on something better

    • jsdwarf 20 hours ago

      Been there. hedger/ledger look tempting, but if you follow the approach of the OP, you need to build your own toolchain around it so that it recompiles your journals on each rule change.

      I've found it easier to use a spreadsheet. Sheet 1 contains the csv export to which i constantly append, sheet 2 the hierarchical acoount structure: an account name like "expenses: groceries:walmart" is followed by a regex that matches the expense description on sheet 1 for that account (walmart). On sheet 1 I have an "detected Account" column, into which a formula outputs the detected account based in the regexes.

      Sumif formulas sum up the totals per account. Since no compilation step is necessary, rule changes are picked up much faster

    • nulbyte a day ago

      Not Excel, but LibreOffice Calc, here. It's just a biweekly budget, helps with making sure Ive acheduled my bills for the month and tracks savings.

      Toward the end of every year, I go through the ritual of preparing to move back to ledger and getting files ready. And shortly after the start of the year, I abandon it. Maybe one year I will stick with it. Maybe.

  • kreyenborgi a day ago

    There was some interesting discussion on https://www.reddit.com/r/plaintextaccounting/comments/16xhjm... about GUI's for plain text accounting; anyone tried Paisa? Looks nice for those who don't want to learn gnuplot

    • simonmic 14 hours ago

      It's great, and uniquely works with Ledger, hledger or beancount. But probably best if you're starting from scratch; making it work with existing data is harder.

      Fava is also very good (for Beancount; hledger can export to it).

      hledger-web is much simpler than these and can be good for less technical users.

      More UIs: starting at https://plaintextaccounting.org/#ui-console

  • a day ago
    [deleted]
  • daviddavid11 a day ago

    [dead]