26 points | by PaulHoule 3 hours ago
2 comments
I wish the authors had plotted model size (number of params) against the number of triples a model can hold before memory collapse happens.
It's hard to map the frequency of knowledge injection to a real-world understanding of how much knowledge a 4B-param model can hold.
I wonder if this depends on what is in the domain-specific data.
I’m happy to see ML papers on Hacker News.