This kind of thing beats me. Why should a "Large Language Model" be expected to act as a calculator? Clue one is in the name; clue two is that it is based on statistics, so it is not the deterministic tool you need.
Did exactly that for the actual filing — Python, mentioned in the post. The 23 numbers were a probe, not the goal: I wanted to understand how it works.
Though it’s obvious, nice write up. This is the kinda rabbit hole I enjoy going through and reading.
Have you thought of using a calculator for this task?