Smallest transformer that can add two 10-digit numbers

(github.com)

57 points | by ks2048 a day ago

10 comments

  • E-Reverance 41 minutes ago

    Not sure how much this fits into the rules, but I saw on Twitter that someone claimed 28 params: https://gist.github.com/SeuperHakkerJa/da3050739bea97aabd86e...

  • amelius 2 hours ago

    > In short: if you can swap in a different set of weights and use the exact same inference code for a different task, your setup is legitimate. If the inference code is inseparable from the algorithm, it's not.

    I wonder why they don't just write the code themselves, so by design the focus can be on the model.

  • ks2048 38 minutes ago

    So, hand-coded weights can do it with 36 params, while trained weights need 311. Did anyone try the former architecture, but starting from random weights and training?

  • medi8r an hour ago

    You can do that in a single matmul of course.

    • hyperhello an hour ago

      So can you take an arbitrary transformer and somehow turn it into a compact set of low-power fast gates by some algorithm?

      • measurablefunc an hour ago

        I think you're misunderstanding the joke.

        • medi8r 21 minutes ago

          Yes, the joke is:

              [A B]
          
          times

              [1]
              [1]
          
          is

              [A+B]
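
          The joke above, written out as a NumPy sketch (variable names are mine):

          ```python
          import numpy as np

          # Multiplying the 1x2 row vector [A B] by the 2x1 column
          # vector [1 1]^T collapses to the scalar A + B: addition of
          # two 10-digit numbers in a single matmul.
          A, B = 1234567890, 9876543210
          row = np.array([[A, B]])     # [A B]
          ones = np.array([[1], [1]])  # column of ones
          result = row @ ones          # [[A + B]]
          print(result[0, 0])          # 11111111100
          ```
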

          • hyperhello 11 minutes ago

            From context, then, I infer that a transformer is not composed of matrix multiplications, because otherwise it would simply be one that adds two 10-digit numbers.

            • medi8r 6 minutes ago

              A transformer tokenizes its input, then does a bunch of matmuls and ReLUs set up in a certain way. It doesn't get to see the raw number (just as you don't when you look at 1+1: your visual cortex etc. has to process it first).
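
              A minimal sketch of that point: the model never sees 1234567890 as a numeric value, only a sequence of token IDs. The character-level vocabulary here is hypothetical, not necessarily what the linked repo uses.

              ```python
              # Hypothetical digit-level vocabulary: '0'..'9' map to 0..9,
              # '+' to 10, '=' to 11.
              VOCAB = {ch: i for i, ch in enumerate("0123456789+=")}

              def tokenize(s: str) -> list[int]:
                  """Map each character of an arithmetic string to a token ID."""
                  return [VOCAB[ch] for ch in s]

              print(tokenize("12+34="))  # [1, 2, 10, 3, 4, 11]
              ```

              The transformer's job is then to learn the addition algorithm over these token sequences, rather than being handed the numbers themselves.
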

  • 1over137 4 minutes ago

    Now wrap it all in an Electron app!