Stupid question - how is it even possible given that you lose information on each layer? And how does one implement a non-linear activation function without an amplifier of some sort?
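To make the question concrete, here's a toy sketch (my own numbers, not from the article): passive per-layer loss compounds multiplicatively, and one commonly cited way to get a nonlinearity without an amplifier is an intensity-dependent element like a saturable absorber, modeled here with a hypothetical `I/(I + I_sat)` transmission curve.

```python
# Toy illustration of the two issues raised above. All values are made up.

# 1) Passive loss compounds geometrically: if each layer transmits 90% of
#    the light, after 20 layers only ~12% of the input power remains.
transmission_per_layer = 0.9
layers = 20
remaining = transmission_per_layer ** layers
print(f"power remaining after {layers} layers: {remaining:.3f}")

# 2) A passive nonlinearity is possible in principle without gain, e.g. a
#    saturable absorber whose transmission rises with intensity. This is a
#    deliberately simplified model, not a real device equation.
def saturable_transmission(intensity, i_sat=1.0):
    # Transmission grows from 0 toward 1 as intensity exceeds i_sat,
    # giving a sigmoid-like (nonlinear) response with no amplifier.
    return intensity / (intensity + i_sat)

for i in [0.0, 0.5, 1.0, 4.0]:
    print(f"I={i:.1f} -> T={saturable_transmission(i):.3f}")
```

Neither point settles the question, but it shows why, without some form of gain or regeneration, deep passive optical networks seem to fight a losing battle against attenuation.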
This is a neat idea, but it's extremely light (no pun intended) on real details. Translating a simulation into real hardware that can do real computation in a reliable manner is properly hard. As much as I'd love to be an optimist about this project, I have to say I'll believe it when I see it actually running on a workbench.
If it does work, I think one of the biggest challenges will be adding enough complexity for it to do real, useful computation. Running the equivalent of GPT-2 is a cool tech demo, but if there's no obvious path to scaling it up, it's a bit of a dead end.