I'm really glad that these HNet-inspired approaches are getting traction, I'm a big fan of that paper.
Though I wonder how much of the gains in this case are actually due to the 75% extra parameters relative to the baseline, even if the inference FLOPs are matched.
Can't help but see this as just a different twist on the parameter-sparsity idea leveraged by MoE models, since those also gain performance at constant forward-pass FLOPs because of the extra parameters.
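To make that comparison concrete, here's a rough back-of-the-envelope sketch (plain Python, made-up layer sizes, nothing taken from the paper) of how a top-1-routed MoE layer can hold many times more parameters than a dense FFN while keeping per-token forward-pass FLOPs roughly constant:

```python
# Rough, illustrative comparison of parameter count vs. per-token FLOPs
# for a dense FFN layer and a top-1-routed MoE layer of the same width.
# All numbers are made up for illustration; they are not from the paper.

d_model = 1024      # hidden size
d_ff = 4 * d_model  # FFN inner size
n_experts = 8       # MoE experts, top-1 routing (one expert active per token)

# Parameters of a single FFN (two weight matrices, ignoring biases).
ffn_params = 2 * d_model * d_ff

dense_params = ffn_params
moe_params = n_experts * ffn_params   # every expert exists as parameters

# Per-token FLOPs ~ 2 * (weights actually touched), one MAC per weight.
dense_flops_per_token = 2 * ffn_params
moe_flops_per_token = 2 * ffn_params  # only the routed expert runs

print(f"dense params: {dense_params:,}, MoE params: {moe_params:,}")
print(f"dense FLOPs/token: {dense_flops_per_token:,}, "
      f"MoE FLOPs/token: {moe_flops_per_token:,}")
# The MoE layer carries 8x the parameters at the same per-token FLOPs,
# which is the "more parameters at constant compute" effect mentioned above.
```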
Would this enable a model to learn concepts in one language and generate answers about them in another, as long as it learns a general translation between the two languages?
I don't think so for this approach, from the sound of it. What you're describing is related to the Large Concept Model: https://arxiv.org/abs/2412.08821, where the latent space is SONAR, which is designed for exactly this purpose. SONAR embeddings are learned so that every sentence with the same semantic meaning gets mapped to the same latent representation. So you can have, e.g., a French SONAR encoder and a Finnish SONAR encoder, trained separately on large-scale corpora of paired sentences with the same meaning (basically the same data you would use to train translation models directly, except that for SONAR you don't need to train a separate model per language pair). The LCM then works in this language-agnostic SONAR space, which means it does (in principle) learn concepts from text or speech in all supported languages.
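A minimal sketch of the alignment idea described above, using hypothetical toy encoders and a contrastive stand-in objective (this is not the actual SONAR code, API, or training recipe): two language-specific encoders are pulled toward the same latent vector for paired sentences that share a meaning.

```python
# Toy sketch of a SONAR-style setup: two separate per-language encoders are
# trained so that paired sentences with the same meaning map to (nearly) the
# same latent vector. Encoders, data, and hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 256

class ToySentenceEncoder(nn.Module):
    """Stand-in for a per-language sentence encoder (e.g. French or Finnish)."""
    def __init__(self, vocab_size: int):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, LATENT_DIM)  # mean-pooled bag of tokens
        self.proj = nn.Linear(LATENT_DIM, LATENT_DIM)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(self.embed(token_ids)), dim=-1)

encoder_fr = ToySentenceEncoder(vocab_size=10_000)
encoder_fi = ToySentenceEncoder(vocab_size=10_000)

optimizer = torch.optim.Adam(
    list(encoder_fr.parameters()) + list(encoder_fi.parameters()), lr=1e-3
)

# Fake "paired corpus": row i of fr_batch and fi_batch mean the same thing.
fr_batch = torch.randint(0, 10_000, (32, 20))
fi_batch = torch.randint(0, 10_000, (32, 20))

z_fr = encoder_fr(fr_batch)   # (32, LATENT_DIM)
z_fi = encoder_fi(fi_batch)   # (32, LATENT_DIM)

# Contrastive loss: each French sentence should be most similar to its own
# Finnish pair within the batch, pushing both languages into one shared,
# language-agnostic latent space a downstream model can operate in.
logits = z_fr @ z_fi.T / 0.07
targets = torch.arange(z_fr.size(0))
loss = F.cross_entropy(logits, targets)

loss.backward()
optimizer.step()
```

The point of the sketch is only the shape of the objective: once both encoders agree on the latent for same-meaning sentences, a model working purely in that latent space never sees which language the input came from.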
My educated guess:
Not more than any other LLM.
The text-latent encoder and latent-text decoder just find a more efficient representation of the tokens, but it's more of a compression than a transformation of words/sentences into abstract concepts.
There will still be residuals of the input language in there.
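One hedged way to test that claim (a hypothetical probing setup, not anything from the paper): fit a simple linear classifier on the latents and check whether the input language is still predictable from them. The latents below are random stand-ins; in practice you would take them from the model's text-latent encoder.

```python
# Hypothetical probe for the "language residue" claim above: if a linear
# classifier can predict the input language from the latent vectors, the
# latents are compressed tokens rather than language-agnostic concepts.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Fake latents for two languages; replace with real encoder outputs.
latents_en = rng.normal(loc=0.0, scale=1.0, size=(500, 256))
latents_fr = rng.normal(loc=0.1, scale=1.0, size=(500, 256))  # slight shift = "residue"

X = np.vstack([latents_en, latents_fr])
y = np.array([0] * 500 + [1] * 500)  # 0 = English, 1 = French

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy well above chance (0.5) would suggest language information
# survives in the latent space, supporting the "compression, not concepts" view.
print("probe accuracy:", probe.score(X_test, y_test))
```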
Broken citations. My inner reviewer gets sad. :(