Why I love Rust for tokenising and parsing

(xnacly.me)

133 points | by thunderbong 6 hours ago ago

48 comments

  • noelwelsh 2 hours ago

    This is, to me, an odd way to approach parsing. I get the impression the author is relatively inexperienced with Rust and the PL ideas it builds on.

    A few notes:

    * The AST would, I believe, be much simpler defined as algebraic data types. It's not like the sqlite grammar is going to randomly grow new nodes that require the extensibility their convoluted encoding provides. The encoding they use looks like what someone familiar with OO, but not with algebraic data types, would come up with. (A minimal sketch of the ADT approach follows these notes.)

    * "Macros work different in most languages. However they are used for mostly the same reasons: code deduplication and less repetition." That could be said for any abstraction mechanism. E.g. functions. The defining features of macros is they run at compile-time.

    * The work on parser combinators would be a good place to start to see how to structure parsing in a clean way.
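
    Here is that sketch: a hypothetical, stripped-down expression AST in Rust (not the article's actual node set, just the shape of the idea):

        // Hypothetical mini expression AST; a real SQL grammar has many more
        // node kinds, but each one is just another enum variant.
        enum Expr {
            Literal(Literal),
            Column(String),
            Unary { op: UnaryOp, operand: Box<Expr> },
            Binary { op: BinaryOp, lhs: Box<Expr>, rhs: Box<Expr> },
        }

        enum Literal {
            Null,
            Integer(i64),
            Text(String),
        }

        enum UnaryOp { Not, Negate }
        enum BinaryOp { Add, Sub, Eq, And, Or }

        // Consumers just pattern match; no visitors, trait objects, or downcasts.
        fn is_constant(expr: &Expr) -> bool {
            match expr {
                Expr::Literal(_) => true,
                Expr::Column(_) => false,
                Expr::Unary { operand, .. } => is_constant(operand),
                Expr::Binary { lhs, rhs, .. } => is_constant(lhs) && is_constant(rhs),
            }
        }

    The compiler's exhaustiveness checking then points at every match that needs updating if a node kind is ever added.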

    • huijzer 25 minutes ago

      > I get the impression the author is relatively inexperienced

      The author never claimed to be an experienced programmer. The title of the blog is "Why I love ...". Your notes look fair to me, but calling out inexperience is unnecessary IMO. I love it if someone loves programming. I think that's great. Experience will come.

      • guappa 2 minutes ago

        If someone didn't study the state of the art of tokenising and parsing and still wants to write about it, it's absolutely ok to call it out as being written by someone who has only a vague idea of what they're talking about.

  • gritzko an hour ago

    I have experience writing parsers (lexers) in Ragel, using Go, Java, C++, and C. I must say, once you have some boilerplate generator in place, raw C is as good as the Rust code the author describes. Maybe even better, because of its simplicity. For example, this is most of the code necessary for a JSON parser: https://github.com/gritzko/librdx/blob/master/JSON.lex

    In fact, that eBNF only produces the lexer. The parser part is not that impressive either, 120 LoC and quite repetitive https://github.com/gritzko/librdx/blob/master/JSON.c

    So, I believe, a parser infrastructure evolves till it only needs eBNF to make a parser. That is the saturation point.

    • dvdkon 36 minutes ago

      That repetitiveness can be seen as a downside, not a virtue. And I feel that Rust's ADTs make working with the resulting syntax tree much easier.

      Though I agree that a little code generation and/or macro magic can make C significantly more workable.

  • ryandv 4 hours ago

    I don't know. Having written a small parser [0] for Forsyth-Edwards chess notation [1], I find Haskell takes the cake here in terms of simplicity and legibility; it reads almost as clearly as BNF, and there is very little technical ceremony involved, letting you focus on the actual grammar of whatever it is you are trying to parse.

    [0] https://github.com/ryandv/chesskell/blob/master/src/Chess/Fa...

    [1] https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notati...

    • PittleyDunkin 3 hours ago

      Haskell definitely takes the cake in terms of leveraging parser combinators, but you’re still stuck with Haskell to deal with the result.

      • wesselbindt an hour ago

        That's what they call a "win-win".

      • pjmlp an hour ago

        For some of us, "being stuck with Haskell" isn't a problem.

        • anilakar 23 minutes ago

          For the rest, being stuck with real-world problems instead of self-inflicted ones is preferable :-)

      • fuzztester an hour ago

        $ echo "Haskell" | sed 's/ke/-ki/'

        Has-kill

        $

        • fuzztester an hour ago

          | sed 's/k/sk/'

          Has-skill

          $

          • orf 22 minutes ago

            Write the full transform in Haskell?

      • instig007 2 hours ago

        Don't make it sound as if it's bad; it's actually superb on all these levels: the type level, the SMP runtime, and throughput.

    • nine_k 3 hours ago

      But this is not unaided Haskell, it's a parser combinator library, isn't it?

      Do you see an obvious reason why a similar approach won't work in Rust? E.g. winnow [1] seems to offer declarative enough style, and there are several more parser combinator libraries in Rust.

      [1]: https://docs.rs/winnow/latest/winnow/

      • codebje 2 hours ago

            data Color = Color
                { r :: Word8
                , g :: Word8
                , b :: Word8
                } deriving Show
        
            hex_primary :: Parser Word8
            hex_primary = toWord8 <$> sat isHexDigit <*> sat isHexDigit
                where toWord8 a b = read ['0', 'x', a, b]
        
            hex_color :: Parser Color
            hex_color = do
                _ <- char '#'
                Color <$> hex_primary <*> hex_primary <*> hex_primary
        
        Sure, it works in Rust, but it's a pretty far cry from being as simple or legible - there's a lot of extra boilerplate in the Rust.

        • jeroenhd 43 minutes ago

          I think it's a stretch to call parser combinator code in Haskell simple or legible. Most Haskell code is simple and legible if you know enough Haskell to read it, but Haskell isn't exactly a simple or legible language.

          Haskell demonstrates the use of parser combinators very well, but I'd still use parser combinators in another language. Parser combinators are implemented in plenty of languages, including Rust, and actually doing anything with the parsed output becomes a lot easier once you leave the Haskell domain.

        • orf 20 minutes ago

          The nom crate has an RGB parser example: https://docs.rs/nom/latest/nom/#example

          It’s slightly longer, but more legible.
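
          From memory, that example looks roughly like the following (paraphrased, so treat the linked docs as canonical):

              use nom::{
                  bytes::complete::{tag, take_while_m_n},
                  combinator::map_res,
                  sequence::tuple,
                  IResult,
              };

              #[derive(Debug, PartialEq)]
              pub struct Color {
                  pub red: u8,
                  pub green: u8,
                  pub blue: u8,
              }

              fn from_hex(input: &str) -> Result<u8, std::num::ParseIntError> {
                  u8::from_str_radix(input, 16)
              }

              fn is_hex_digit(c: char) -> bool {
                  c.is_ascii_hexdigit()
              }

              // Exactly two hex digits, converted to a byte.
              fn hex_primary(input: &str) -> IResult<&str, u8> {
                  map_res(take_while_m_n(2, 2, is_hex_digit), from_hex)(input)
              }

              fn hex_color(input: &str) -> IResult<&str, Color> {
                  let (input, _) = tag("#")(input)?;
                  let (input, (red, green, blue)) =
                      tuple((hex_primary, hex_primary, hex_primary))(input)?;
                  Ok((input, Color { red, green, blue }))
              }

          Longer than the Haskell version, but each step is explicit and the IResult plumbing is uniform throughout.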

      • mrkeen an hour ago

        But it doesn't take much to go from 0 to a parser combinator library. I roll my own each year for advent of code. It starts at like 100 lines of code (which practically writes itself - very hard to stray outside of what the types enforce) and I grow it a bit over the month when I find missing niceties.
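
        As a rough illustration of how little is needed, a minimal sketch of the same idea in Rust (hypothetical, not any particular library):

            // A parser is just a function: input -> Option<(value, remaining input)>.
            type ParseResult<'a, T> = Option<(T, &'a str)>;

            // Consume one character that satisfies a predicate.
            fn satisfy<'a>(pred: impl Fn(char) -> bool) -> impl Fn(&'a str) -> ParseResult<'a, char> {
                move |input| {
                    let mut chars = input.chars();
                    match chars.next() {
                        Some(c) if pred(c) => Some((c, chars.as_str())),
                        _ => None,
                    }
                }
            }

            // Apply a parser zero or more times, collecting the results.
            fn many<'a, T>(
                p: impl Fn(&'a str) -> ParseResult<'a, T>,
            ) -> impl Fn(&'a str) -> ParseResult<'a, Vec<T>> {
                move |mut input| {
                    let mut out = Vec::new();
                    while let Some((v, rest)) = p(input) {
                        out.push(v);
                        input = rest;
                    }
                    Some((out, input))
                }
            }

            fn main() {
                let digits = many(satisfy(|c| c.is_ascii_digit()));
                assert_eq!(digits("123abc"), Some((vec!['1', '2', '3'], "abc")));
            }

        Alternation, sequencing, and mapping combinators follow the same pattern.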

    • lynx23 2 hours ago

      I wouldn't consider FEN a great parsing example, simply because it can be implemented in a simple function with a single loop.

      Just a few days ago, I wrote a FEN "parser" for an experimental quad-bitboard implementation. It almost wrote itself.

      P.S.: I am the author of chessIO on Hackage

  • tptacek 5 hours ago

    So, just to kick this off: I wrote an eBPF disassembler and (half-hearted) emulator in Rust and I also found it a pleasant language to do parsing-type stuff in. But: I think the author cuts against their argument when they manage to necessitate a macro less than 1/6th of the way into their case study. A macro isn't quite code-gen, but it also doesn't quite feel like working idiomatically within the language, either.

    Again: not throwing shade. I think this is a place where Rust is genuinely quite strong.

    • thesz 4 hours ago

      How can one define an infinite grammar in Rust?

      E.g., a context-free rule S ::= abc|aabbcc|aaabbbccc|... with infinitely many alternatives can effectively parse a^N b^N c^N, which is a classic example of a context-sensitive language.

      This is a simple example, but something like that can be seen in practice. One example is when a language allows the definition of operators.

      So, how does Rust handle that?

      • jeroenhd 12 minutes ago

        Using parser combinator library "nom", this should probably do what you'd want:

            use nom::{character::complete::char, multi::many_m_n, sequence::tuple, IResult};

            fn parse_abc(input: &str, n: usize) -> IResult<&str, (Vec<char>, Vec<char>, Vec<char>)> {
                // Exactly n 'a's, then n 'b's, then n 'c's.
                let (input, result) = tuple((
                    many_m_n(n, n, char('a')),
                    many_m_n(n, n, char('b')),
                    many_m_n(n, n, char('c')),
                ))(input)?;
                Ok((input, result))
            }

        
        It parses (the beginning of) the input, ensuring `n` repetitions of 'a', 'b', and 'c'. Parse errors are reported through the return type, and the remaining characters are returned for the application to deal with as it sees fit.

        https://play.rust-lang.org/?version=stable&mode=debug&editio...

      • ryandv 3 hours ago

        In Haskell I think it's something like:

            {-# LANGUAGE OverloadedStrings #-}
            import Data.Attoparsec.Text
            import qualified Data.Text as T
        
            type ParseError = String
        
            csgParse :: T.Text -> Either ParseError Int
            csgParse = eitherResult . parse parser where
                parser = do
                    as <- many' $ char 'a'
                    let n = length as
                    count n $ char 'b'
                    count n $ char 'c'
                    char '\n'
                    return n
        
            ghci> csgParse "aaabbbccc\n"
            Right 3

    • jamra 5 hours ago

      Link us your eBPF disassembler if you can. Sounds cool.

      • tptacek 5 hours ago

        It's not. If you wrote one, it'd be more interesting than mine.

  • brundolf 3 hours ago

    Something that was hard when I wrote a full AST parser in Rust was representing a hierarchy of concrete AST types, with upcasting and downcasting. I was able to figure out a way, but it required some really weird type shenanigans (e.g. PhantomData) and some macros. Looks like they had to do crazy macros here too.

    Curious what the rest of the prior art looks like.

    • elcritch 3 hours ago

      Hmmm, yeah, Rust's ADTs and match syntax would be great, until you got to the up/down casting. I'm not experienced enough in Rust to know if there are good ways to handle it. Dynamic traits (trait objects), maybe?
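
      For what it's worth, one way trait objects can give you downcasting is to route it through std::any::Any; a rough sketch with hypothetical node types (not the parent comment's code):

          use std::any::Any;

          // Every node exposes itself as &dyn Any so callers can try concrete types.
          trait Node: Any {
              fn as_any(&self) -> &dyn Any;
          }

          struct Ident { name: String }
          struct IntLit { value: i64 }

          impl Node for Ident {
              fn as_any(&self) -> &dyn Any { self }
          }
          impl Node for IntLit {
              fn as_any(&self) -> &dyn Any { self }
          }

          // "Downcast" from the abstract node back to a concrete type.
          fn describe(node: &dyn Node) -> String {
              if let Some(ident) = node.as_any().downcast_ref::<Ident>() {
                  format!("identifier {}", ident.name)
              } else if let Some(lit) = node.as_any().downcast_ref::<IntLit>() {
                  format!("integer literal {}", lit.value)
              } else {
                  "unknown node".to_string()
              }
          }

          fn main() {
              let nodes: Vec<Box<dyn Node>> = vec![
                  Box::new(Ident { name: "x".into() }),
                  Box::new(IntLit { value: 42 }),
              ];
              for node in &nodes {
                  println!("{}", describe(node.as_ref()));
              }
          }

      An enum per syntactic category usually ends up nicer in Rust, but this is roughly what "dynamic traits" buys you.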

    • ainiriand 2 hours ago

      Sorry to bother you, but would that be open-source by any chance? Is there any public repo available? Thank you.

  • hu3 5 hours ago

    Related: I love Rob Pike's talk "Lexical Scanning in Go" (2011).

    Educational and elegant approach.

    https://www.youtube.com/watch?v=HxaD_trXwRE

    • emmanueloga_ 4 hours ago

      That talk is great, but I remember some discussion later about Go actually NOT using this technique because of goroutine scheduling overhead and/or inefficient memory allocation patterns? The best discussion I could find is [1].

      Another great talk about making efficient lexers and parsers is Andrew Kelley's "Practical Data Oriented Design" [2]. Summary: it explains various strategies one can use to reduce the memory footprint of programs while also making them cache friendly, which increases throughput.

      --

      1: https://news.ycombinator.com/item?id=31649617

      2: https://www.youtube.com/watch?v=IroPQ150F6c

      • chubot 3 hours ago

        Yeah, I actually remember that too; this article mentions it:

        Coroutines for Go - https://research.swtch.com/coro

        The parallelism provided by the goroutines caused races and eventually led to abandoning the design in favor of the lexer storing state in an object, which was a more faithful simulation of a coroutine. Proper coroutines would have avoided the races and been more efficient than goroutines.

    • tptacek 3 hours ago

      I feel like that talk has more to do with expressing concurrency, in problems where concurrency is a natural thing to think about, than it does with lexing.

  • ketzo 4 hours ago

    So how do you debug code written with macros like this, or come into it as a new user of the codebase?

    I’m imagining seeing the node! macro used, and seeing the macro definition, but still having a tough time knowing exactly what code is produced.

    Do I just use the Example and see what type hints I get from it? Can I hover over it in my IDE and see an expanded version? Do I need to reference the compiled code to be sure?

    (I do all my work in JS/TS so I don’t touch any macros; just curious about the workflow here!)

    • schneems 4 hours ago

      Run:

          $ cargo expand
      
      And you’ll see the resulting code.

      Rust is really several languages: "vanilla" Rust, declarative macros, and proc macros. Each has a slightly different capability set and dialect. You get used to working with each in turn over time.

      Also, unit tests are generally a good playground for understanding the impact of modifying a macro.
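
      For example, a toy declarative macro plus a test makes a convenient sandbox (a hypothetical macro, not the article's node! macro):

          // Hypothetical toy macro: generates a plain struct from a field list.
          macro_rules! ast_struct {
              ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
                  #[derive(Debug, PartialEq)]
                  pub struct $name {
                      $(pub $field: $ty,)*
                  }
              };
          }

          ast_struct!(Literal { value: i64 });

          #[cfg(test)]
          mod tests {
              use super::*;

              // Tweak the macro, then re-run `cargo test` or `cargo expand` to see the effect.
              #[test]
              fn generates_a_usable_struct() {
                  assert_eq!(Literal { value: 42 }, Literal { value: 42 });
              }
          }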

    • guitarbill 3 hours ago

      rust-analyzer, the Rust LSP used in e.g. VSCode, can expand declarative and proc macros recursively.

      it isn't too bad, although the fewer proc macros in a code base, the better. declarative macros are slightly easier to grok, but much easier to maintain and test. (i feel the same way about opaque codegen in other languages.)

  • WiSaGaN 4 hours ago

    I think that, except for macros, most of these features are ML-family language features as well. Rust stands out because it can implement them in an efficient, zero-overhead way.

  • kldx 4 hours ago

    I find Megaparsec in Haskell quite expressive, based on my limited experience using nom in Rust.

  • jurschreuder 2 hours ago

    I cannot agree less, C++ is the best and always will be. You youngsters made up this new dialect that can also compile with the C++ compiler. This is like people putting VS Code in dark mode thinking they're now also working in the Terminal like the Gods of Binary.

    • arlort 2 hours ago

      Rust being a dialect of c++ is certainly a novel take

  • sksxihve 5 hours ago

    I've found that the logos crate is really nice for writing lexers in Rust.

    https://docs.rs/logos/0.14.2/logos/
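
    A rough sketch of what a logos lexer looks like, from memory (check the linked docs for the exact attribute syntax):

        use logos::Logos;

        #[derive(Logos, Debug, PartialEq)]
        #[logos(skip r"[ \t\r\n]+")] // skip whitespace between tokens
        enum Token {
            #[token("(")]
            LParen,
            #[token(")")]
            RParen,
            // The callback turns the matched slice into the token's payload.
            #[regex(r"[0-9]+", |lex| lex.slice().parse::<i64>().ok())]
            Number(i64),
            #[regex(r"[A-Za-z_][A-Za-z0-9_]*")]
            Ident,
        }

        fn main() {
            let mut lexer = Token::lexer("(foo 42)");
            while let Some(token) = lexer.next() {
                println!("{:?} -> {:?}", lexer.slice(), token);
            }
        }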

  • omani 3 hours ago

    this is the third day in a row this article is being posted here.

    this time it got traction. funny how HN works.

    https://news.ycombinator.com/item?id=42055954

    https://news.ycombinator.com/item?id=42058920

  • jamra 5 hours ago

    Does anyone have a good EBNF notation for SQLite? I tried to make a tree-sitter grammar, which produces C code and great Rust bindings for it, but SQLite uses the Lemon parser generator and I'm not sure how to read the grammar from that.