Adding row polymorphism to Damas-Hindley-Milner

(bernsteinbear.com)

211 points | by surprisetalk 9 months ago ago

58 comments

rtpg 9 months ago

Row polymorphism is excellent. I was intro'd to it in Purescript, but I would like to say that Typescript gives you some things that rhyme with it through its combinations of records at the type level.

Highly recommend people mess around with Purescript, you can feel how much pressure is relieved thanks to the row polymorphism tooling almost instantly. Truly, all we wanted as an industry is an easy way to bundle together various tags into our types, and row polymorphism gets you there.

I think row polymorphism is a fairly straightforward thing compared to dependent types in general, but can let you crush a whole class of errors and pay almost nothing in terms of expression costs.

[-]

enugu 9 months ago

Somtime ago, there was a debate on the ability of a static type system to model an 'open-world situation' where fields are added to records as model changes. (based on a post[1] which responded to a Rich Hickey talk).

The crucial point was that structural typing on which row-polymorphism is based can model such open-world situations.

Also, having such a system can free you from having overly nested types.

It would be great if Purescript or row-polymorphism became more popular.

[1] https://news.ycombinator.com/item?id=22090700

[-]

arianvanp 9 months ago

This is also why I like interfaces in Golang (as a Haskell developer). You simply define a slice of the world you want to see and anything that fits the box can be passed.

It's just unfortunate golang interfaces dont support fields. Only methods. Typescript fares better with its interface type

[-]

diggan 9 months ago

> It's just unfortunate golang interfaces dont support fields. Only methods.

Why is that unfortunate? Usually when defining interfaces you care about the API surface without wanting to care about the internals of what will eventually implement that API. If you suddenly also spec fields with the interface, wouldn't be too easy to couple the internals to the API?

I can't say I've programmed in Go too much, so maybe I'm missing something very obvious.

[-]

n0w 9 months ago

The API surface of some struct is determined by visibility, not by whether a member of the struct is a method or a field.

I can't remember the specifics for why fields cannot be used within a Go interface but I do remember missing it a few times while writing Go code.

[-]

fwip 9 months ago

I think the reasoning is that interfaces are implemented by a dynamic lookup. Part of Go's philosophy is that things that could be expensive (function calls) should be visually distinct from cheap things.

Struct field access is cheap, hopping through a dynamic dispatch table is less cheap.

[-]

rtpg 9 months ago

Took me a second to grok why the field access would require dynamic dispatch and it's because you have to deal with differing layout between structs.

runeks 9 months ago

> Highly recommend people mess around with Purescript, you can feel how much pressure is relieved thanks to the row polymorphism tooling almost instantly.

> I think row polymorphism is a fairly straightforward thing compared to dependent types in general, but can let you crush a whole class of errors [...]

Would you care to provide a few examples? I don't have experience with row polymorphism so I'm genuinely curious.

[-]

tmountain 9 months ago

  greet :: forall r. { name :: String | r } -> String
  greet person = "Hello, " <> person.name <> "!"

  greetWithAge :: forall r. { name :: String, age :: Int | r } -> String
  greetWithAge person = 
    "Hello, " <> person.name <> "! You are " <> show person.age <> " years old."

  main :: Effect Unit
  main = do
    let person = { name: "Alice", age: 30, occupation: "Engineer" }

    -- greet can accept the person record even though it has more fields
    log (greet person)           -- Output: "Hello, Alice!"
  
    -- greetWithAge can also accept the person record
    log (greetWithAge person)

[-]

blue_pants 9 months ago

How does it differ from structural typing in TypeScript though?

[-]

tmountain 9 months ago

Structural typing relies on interface compatibility. Row polymorphism is a type-level feature in PureScript where record types are constructed with an explicit "row" of fields.

In Practice, row polymorphism is more granular, allowing you to explicitly allow certain fields while tracking all other fields via a ("rest") type variable.

Example: PureScript allows you to remove specific fields from a record type. This feature, is called record subtraction, and it allows more flexibility when transforming or narrowing down records.

You can also apply exact field constraints; meaning, you can constrain records to have exactly the fields you specify.

Lastly, PureScript allows you to abstract over rows using higher-kinded types. You can create polymorphic functions that accept any record with a flexible set of fields and can transform or manipulate those fields in various ways. This level of abstraction is not possible in TypeScript.

These are just a few examples. In the most general sense, you can think of row polymorphism as a really robust tool that gives you a ton of flexibility regarding strictness and validation.

[-]

blue_pants 9 months ago

> PureScript allows you to remove specific fields from a record type. This feature, is called record subtraction, and it allows more flexibility when transforming or narrowing down records.

TypeScript does allow you to remove specific fields, if I understand you right [0]:

    function removeField<T, K extends keyof T>(obj: T, field: K): Omit<T, K> {
        const { [field]: _, ...rest } = obj;
        return rest;
    }

    type Person = { name: string; age: number };
    declare const p: Person;
    const result = removeField(p, 'age'); // result is of type: Omit<Person, "age">

> PureScript allows you to abstract over rows using higher-kinded types. You can create polymorphic functions that accept any record with a flexible set of fields and can transform or manipulate those fields in various ways. This level of abstraction is not possible in TypeScript.

Again, if I understand you correctly, then TypeScript is able to do fancy manipulations of arbitrary records [1]:

    type StringToNumber<T> = {
        [K in keyof T]: T[K] extends string ? number : T[K]
    }

    function stringToLength<T extends Record<string, unknown>>(obj: T): StringToNumber<T> {
        const result: Record<string, unknown> = {};
        for (const key in obj) {
            result[key] = typeof obj[key] === 'string' ? obj[key].length : obj[key];
        }
        return result as StringToNumber<T>;
    }

    const data = {
        name: "Alice",
        age: 30,
        city: "New York"
    };

    const lengths = stringToLength(data);

    lengths.name // number
    lengths.age // number
    lengths.city // number

[0] https://www.typescriptlang.org/play/?#code/GYVwdgxgLglg9mABA...

[1] https://www.typescriptlang.org/play/?#code/C4TwDgpgBAysBOBLA...

edit: provided links to TS playground

[-]

jy14898 9 months ago

The tools typescript provides are a little pointless if it allows you to do stuff like this (imo):

const r1: { a: number; b: number } = { a: 10, b: 20 };

const r2: { a: number } = r1;

const r3: { a: number; b: string } = { b: "hello", ...r2 };

console.log(r3.b) // typescript thinks it's a string, but actually it's a number

[-]

blue_pants 9 months ago

Yeah, it's definitely not ideal, but even with its many flaws I prefer TS over plain JS.

The problem in question can be "fixed" like this

    const r1: { a: number; b: number } = { a: 10, b: 20 };

    const r2 = r1 satisfies { a: number };

    const r3: { a: number; b: string } = { b: "hello", ...r2 };

Now, TS would warn us that "'b' is specified more than once, so this usage will be overwritten". And if we remove b property -- "Type 'number' is not assignable to type 'string'"

Another "fix" would be to avoid using spread operator and specify every property manually .

Both of these solutions are far from ideal, I agree.

---

I don't advocate TS in this thread though; I genuinely want to understand what makes row polymorphism different, and after reading several articles and harassing Claude Sonnet about it, I still didn't grasp what row polymorphism allows over what TS has.

[-]

wk_end 9 months ago

As far as I understand it, row polymorphism wouldn’t allow the given example. Or to put it another way, the spread operator is impossible to type soundly in the presence of structural subtyping because the type system doesn’t capture the “openness” of the record’s type, the potential presence of additional fields. Whereas with row polymorphism, to some degree or another, you can.

runeks 9 months ago

Thank you, but I was looking for real world examples solved by row types as OP implied there were plenty of.

tmoertel 9 months ago

Can `greetWithAge` be implemented using `greet`?

rtpg 9 months ago

https://rtpg.co/2016/07/20/supercharged-types.html I wrote this example a long time ago, I think nowadays I could come up with some more interesting examples though.

As to the differences with TS... I think they're playing in similar spaces but the monadic do syntax with Purescript lets you use row polymorphism for effect tracking without having to play weird API tricks. In TS that's going to be more difficult.

(short version: in purescript I could write an API client that tracks its state in the types so that you can make sure you authorize before calling some mechanism. In TS you would need to design that API around that concept and do things like client = client.authorize(). Purescript you could just do "authorize" in a monadic context and have the "context" update accordingly)

tmountain 9 months ago

PureScript is such a cool language. I wish it would get more traction, but it feels like it's competing in the same space as TypeScript, and TypeScript seems to solve the category of problems it addresses well enough for most people.

kccqzy 9 months ago

It's easy enough to add row polymorphism if there is no subtyping involved. It is an entirely different matter otherwise. For example if a function needs to take a record containing x as a field with type int, it should also accept a record with y as a field that it does not use. This is just like in traditional OOP if a function takes a base class pointer, you can pass a subclass class pointer instead, just that these subtyping relationships are not explicitly defined by users but inferred by the record contents. There is width subtyping and there is also depth subtyping. And we also need awareness of contravariance and covariance.

I implemented the above as a toy type checker. I found the above combination of features too complicated and they end up being unintuitive for the user: the type errors are difficult to comprehend when type errors are found. My implementation is here: https://gist.github.com/kccqzy/d761b8adc840333af0303e1b822d7... and I mostly followed the paper but I cannot guarantee there aren't bugs.

[-]

paldepind2 9 months ago

> For example if a function needs to take a record containing x as a field with type int, it should also accept a record with y as a field that it does not use.

Not sure if I misunderstand what you mean, but this does not require subtyping. One of the key distinguishing features of row polymorphism is that exactly this can be achieved without subtyping. The extra unused fields (`y` in your example) are represented as a polymorphic type variable _instead_ of using subtyping. See for instance page 7 in these slides: https://www.cs.cmu.edu/~aldrich/courses/819/slides/rows.pdf

[-]

9 months ago

[deleted]

calebh 9 months ago

Functions that use a field called x but do not use a field called y can use the type {x=int, ... 'a}, right?

The main difficulty I see with row polymorphism is with field shadowing. For example if you have a record with type {a=bool, x=int, c=unit}, then set the x field with type string instead, the new type should be {a=bool, x=string, c=unit}.

I suppose if you only have syntax for creating a record with a literal, but do not have syntax for updating an existing record this is not a problem.

[-]

kccqzy 9 months ago

I don't exactly understand your concern, but yes the type {x=int, ... 'a} is valid in a language with row polymorphism but without subtyping. If you do have subtyping, dealing with rest (or spread) is unnecessary. But if you remove subtyping, the unification algorithm isn't powerful enough on its own for many intuitive use cases. The easiest example is if a function takes a list of records all of which need an x field of type int, then you cannot pass it a list of records where all contain the x field of int but some also contain an irrelevant y field and others contain an irrelevant z field.

[-]

a1369209993 9 months ago

> you cannot pass it a list of records where all contain the x field of int but some also contain an irrelevant y field and others contain an irrelevant z field.

Yes you can - that's just a existential type. I'm not sure what the syntax would be, but it could be somthing like:

  List (exists a : {x=int, ...'a})

(In practice (ie if your language doesn't support existential types) you might need to jump through hoops like:

  List ((forall a : {x=int, ...'a} -> b) -> b)

or whatever the language-appropriate equivalent is, but in that case your list will have been created with the same hoops, so it's a minor annoyance rather than a serious problem.)

hencq 9 months ago

> if a function takes a list of records all of which need an x field of type int, then you cannot pass it a list of records where all contain the x field of int but some also contain an irrelevant y field and others contain an irrelevant z field.

Can you explain that a little more? Intuitively I would imagine that those y and z fields would 'disappear' into the rest part.

[-]

kccqzy 9 months ago

With subtyping, the type checker would understand that the list type is covariant and would accept a record with more irrelevant fields, because that's a valid subtype.

Without subtyping, the rest part needs to be identical for each element of the list. In fact you cannot even express the concept of a list with different rest parts. The key thing to understand is that the rest part never really disappears. The type checker always deduces what the rest part should be in every case. In languages like Haskell you can work around this by using existential quantification but that's a whole different extension to the type system, and one that's certainly not as flexible as full subtyping.

ogogmad 9 months ago

Why have subtyping when you have row polymorphism? It seems that there's enough overlap that you should pick one or the other.

[-]

kccqzy 9 months ago

My whole point is that subtyping or row polymorphism alone is not enough to assign types to many valid programs that people intuitively write. My example about list of records shows you what happens when you only have row polymorphism without subtyping. See my other comment. (Nothing prevents you from designing such a type system that's slightly inconvenient though: it's really about whether users would want this type system. Haskell and OCaml both have enjoyed user acceptance without subtyping.)

If you were to go the other direction and choose only subtyping but not row polymorphism to implement records, then you end up co-opting things like intersection types inappropriately leading to unsoundness.

[-]

naasking 9 months ago

Your list of records doesn't require subtyping, it requires information hiding, eg. existential types. Subtyping is too powerful.

weinzierl 9 months ago

"A record is an unordered collection of name to value mappings"

If there is one thing that is for sure, it is that we have too many names for "collection of name to value mappings".

In my book the term record is not what first comes to mind when thinking of unordered mappings. All of the usages of the word I can think of imply the possibility of access by name while retaining order. Sometimes this allows easy indexed access (database rows used to be called records) sometimes it doesn't (C structs which also used to be called record types).

[-]

gpderetta 9 months ago

C structs are record types, but, while the standard does guarantees an in-memory ordering, in the general case you can't really iterate through members and always have to access by tag, so it might as well be unordered.

Also in C++:

   struct A { int x; int y; } a;
   struct B { int y; int x; } b;
   template<class C> concept has_x_y = requires(C c) {
     { c.x } -> std::convertible_to<int>; 
     { c.y } -> std::convertible_to<int>; 
    };

   int sum(has_x_y auto z) { return z.x + z.y; }
   ...
   sum(a);
   sum(b);

arianvanp 9 months ago

But for an open world model we need the unordered definition.

As two structs that have same names at different places in the struct still need to conform to the generic record type

RandomThoughts3 9 months ago

Yes, it is a very unusual definition of a record. As you rightfully noted, typically, a record is defined as a composite data structure mixing data types (indeed something more or less equivalent to a C struct basically).

Access by name is not even truly necessary and the difference between tuples and records is minimal (you could build something looking exactly like field access on top of tuples with functions and would get as a result something indistinguishable from an actual record).

The existence or not of an order is then totally accessory and it's generally straightforward to build an order on both provided the data types they contain is orderable by ordering the field and then using a lexicographic order.

[-]

fire_lake 9 months ago

Yes in Haskell the names are just sugar for index numbers in a tuple (effectively)

yazaddaruvala 9 months ago

I tend to agree with you. “Unordered” is very strange when clearly it is consistently “ordered” by some function(name, insertion_id) to facilitate low latency lookup.

The iteration order seems arbitrary to a human, but that is exclusively because function(name, insertion_id) is not optimized for a human. It seems strange to call the collection “unordered” because of how it appears to a human.

[-]

codeflo 9 months ago

Unordered simply means the order isn’t exposed, not that the bits have somehow literally fallen out of the computer and are scattered on the floor.

[-]

taeric 9 months ago

I remember trying to explain to someone that Java's List type was ordered, where they thought ordered meant sorted. Turns out, language is tough.

[-]

tialaramex 9 months ago

This often comes up for map types. C++ std::map and Python OrderedDict are both "ordered" but in very different senses of that word. People who wanted one but got the other will most likely be unhappy. Both do always work for "golden test" inputs, which will satisfy the sort of person who writes unit tests after finishing the software by picking inputs and then mutating the checked outputs until the test passes...

choeger 9 months ago

Well written!

I wonder if there's a way to efficiently implement it without resorting to monomorphization?

A function that's polymorphic can be transformed into a more primitive (say C or assembly) function that gets extra arguments that carry the "shape" of the type variable (think of sizes, pointers vs. values, etc.). Is there a similar strategy for these polymorphic records?

I see two issues:

1. The offset of any particular field in the record is unknown at compile time

2. The size of the record itself is unknown at compile time (but this should be trivial as an extra argument.)

[-]

pistoleer 9 months ago

Swift solves this by using witness tables. Witness table is a skeleton of the desired record shape. It records offsets of desired fields as found in the actual supplied record. Each function call the actual record is not passed, but its corresponding witness table instead.

For instance, if the "prototype" of the argument is {int foo, float bar}, and I supply {int foo, int baz, float bar}, the table will be {foo: base+0 bytes, bar: base+8 bytes}.

[-]

whatshisface 9 months ago

Why doesn't it limit what it pushes on the stack to (foo,bar) instead of pushing (foo,baz,bar) along with an offset table? If the offset table is possible to construct at the callsite, compiling it into the call is also possible.

[-]

Joker_vD 9 months ago

Because the polymorphic function called may also need to pass the whole record it was given to some other polymorphic function a la "void storeInGlobalStorage(key: string, value: T)".

marcosdumay 9 months ago

> I wonder if there's a way to efficiently implement it without resorting to monomorphization?

Depends on what you consider "efficient".

Monomorphization is necessary for the most efficient code. But you can have a vtable or a restricted type lookup.

[-]

Jaxan 9 months ago

> Monomorphization is necessary for the most efficient code.

That is not always true. Monomorphisation also leads to code size increase, because the function is compiled for each type. This may decrease cache efficiency.

[-]

marcosdumay 9 months ago

Right. Monomorphization is very often a requirement for getting the most performance.

But not always, and won't always increase your performance either.

Joker_vD 9 months ago

For another approach without monomorphization see also my comment at [0] although it's only really meant for the immutable records.

[0] https://osa1.net/posts/2023-01-23-fast-polymorphic-record-ac...

sirwhinesalot 9 months ago

What pistoleer said basically. Lets say a function expects a 2D point and you want to pass in a rectangle struct (which has x and y plus more info). You'd do something like:

void f({float x, float y} p);

Becomes

void f(void* p, size_t offsets[2]);

wbl 9 months ago

Careful: the function definition doesn't remove the records but the call site does. You could transform into a type erased "access at each offset dynamically" form.

abstractcontrol 9 months ago

I've thought about adding record row polymorphism to Spiral, but I am not familiar with it and couldn't figure out how to make it work well in the presence of generics.

[-]

mrkeen 9 months ago

Why is generics the tricky bit? Isn't that the bread-and-butter of this type system? You should just be able to substitute the term 'type variable' in the article for 'generics'.

news_to_me 9 months ago

I'm loving this series, it's one of the best I've found for explaining HM typing in an approachable way. Looking forward to more!

jdmoreira 9 months ago

Totally unrelated to the topic but for my fellow Portuguese geeks, `Damas` is the same Damas that wrote the famous (or infamous) Portuguese C book. Blew my mind when I found out.

https://www.wook.pt/livro/linguagem-c-luis-damas/99231

[-]

jjice 9 months ago

I'm not familiar with this book. Anything particular about it that would cause infamy?

[-]

jdmoreira 9 months ago

at least in my circles it was considered way inferior to K&R. I read both and enjoyed both for totally different reasons.

But some Portuguese might remember "Samad" a mock person used in some of the examples. Which of course is "Damas" spelled backwards.

Relevant reddit comment: https://www.reddit.com/r/devpt/comments/qujip3/comment/hlmb9...

ggleason 9 months ago

It would be extremely handy to have a document store / graph database which had row polymorphism. I started writing a prototype, but unfortunately never got there. I wonder if any such things already exist?

marksabanal1 9 months ago

[dead]

revskill 9 months ago

I do not think using theoretical concept would make the convoept easier to understand.