FORTH? Really!?

(rescrv.net)

22 points | by rescrv 11 hours ago

8 comments

  • jandrewrogers 36 minutes ago

    The observation that concatenative programming languages have nearly ideal properties for efficient universal learning on silicon is very old. You can show that the resource footprint required for these algorithms to effectively learn a programming language is much lower than other common types of programming models. There is a natural mechanical sympathy with the theory around universal learning. It was my main motivation to learn concatenative languages in the 1990s.

    This doesn't mean you should write AI in these languages, just that it is unusually cheap and easy for AI to reason about code written in these languages on silicon.

  • d3nit 18 minutes ago

    From the title alone I thought it would be another FORTH interpreter implementation article, but I was happy to see someone actually using it for something besides proving their interpreter with a Fibonacci calculation.

  • rescrv 11 hours ago

    Looking to discuss whether LLMs would do better if the language had properties similar to postfix notation.

    • cameldrv 5 minutes ago

      Even though I really like postfix from an elegance standpoint, and I use an RPN calculator, IMO it's harder to reason about subexpressions with postfix. Being able to decompose an expression into independent parts is what allows us to understand it. If you scan a complex expression in infix and see parentheses or a +, you know that what's outside the parentheses, or on the other side of the +, can't affect the part you're looking at.

      If you're executing the operations interactively, you're seeing what's happening on the stack, and so it's easy to keep track of where you are, but if you're reading postfix expressions, it's significantly harder.
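A tiny sketch (in Python rather than Forth, purely for illustration) of the point above: evaluating the postfix form of (3 + 4) * (2 - 5) requires tracking implicit stack depth, whereas the infix form marks its subexpression boundaries with parentheses.

```python
# Minimal RPN (postfix) evaluator: a sketch, not production code.
# Each operator pops two operands; subexpression boundaries are
# implicit in the stack depth, not visible in the source text.
def eval_rpn(tokens):
    stack = []
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
    }
    for tok in tokens:
        if tok in ops:
            b = stack.pop()  # right operand
            a = stack.pop()  # left operand
            stack.append(ops[tok](a, b))
        else:
            stack.append(int(tok))
    return stack[0]

# Postfix form of the infix expression (3 + 4) * (2 - 5):
print(eval_rpn("3 4 + 2 5 - *".split()))  # -21
```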

    • crq-yml 37 minutes ago

      I have just spent a month writing about 2000 lines of Forth. My answer is no, at least with respect to generating something that looks like the by-hand code I wrote. LLMs coast by on being able to reproduce idiomatic syntax and on having other forms of tooling (type checkers, linters, unit tests, etc.) back them up.

      But Forth taken holistically is a do-anything-anytime imperative language, not just "concatenative" or "postfix". It has a stack, but the stack is an implementation detail, not a robust abstraction. If you want to do larger-scale things you don't pile more things on the stack; you start doing loads and stores and random access, inventing idioms as you go along to load more and store more. This breaks all kinds of tooling models that rely on robust abstractions with compiler-enforced boundaries. I briefly tested to see what LLMs would do with it and gave up quickly because it was a complete rewrite every single time.

      Now, if we were talking about a simplistic stack machine it might be more relevant, but that wouldn't be the same model of computation.
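To make the stack-versus-memory distinction concrete, here is a sketch (Python, with hypothetical opcode names) of a simplistic stack machine extended with Forth-style fetch and store, in the spirit of Forth's @ and ! words: once code reads and writes memory cells directly, the stack alone no longer describes program state, which is the abstraction-breaking behavior described above.

```python
# Toy stack machine with Forth-style memory access (a sketch).
# "fetch"/"store" mimic Forth's @ and ! : once these appear, the
# stack stops being a complete description of program state.
def run(program, mem_size=16):
    stack, mem = [], [0] * mem_size
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "store":              # ( value addr -- )
            addr = stack.pop()
            mem[addr] = stack.pop()
        elif op == "fetch":              # ( addr -- value )
            stack.append(mem[stack.pop()])
    return stack, mem

# Accumulate into memory cell 0 instead of keeping a running
# total on the stack:
prog = [
    ("push", 5), ("push", 0), ("store",),   # mem[0] = 5
    ("push", 0), ("fetch",),                # stack: [5]
    ("push", 7), ("add",),                  # stack: [12]
    ("push", 0), ("store",),                # mem[0] = 12
]
stack, mem = run(prog)
print(mem[0])  # 12
```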

    • shakna 2 hours ago

      Most models are multi-paradigm, and so they get... fixated on procedural language design. Concepts like the stack, backtracking, etc. violate the logic they've absorbed, leading to... burning tokens while they correct themselves.

      This won't show up in a smaller benchmark, because the clutching at straws tends to happen nearer the edge of the context window: the place where you can get it to give up on obvious things that don't work and actually try the problem space you've given it.

      • rescrv 39 minutes ago

        I haven’t tried the extremes. Context rot says it’ll likely degrade there anyway.

        What I’m investigating is whether more compact languages work for querying data.

        What makes you think it’s going to clutch at straws more? What makes you think it won’t do better with a more compact, localized representation?