Knowledge Infusion Scaling Law for Pre-Training Large Language Models

(arxiv.org)

26 points | by PaulHoule 3 hours ago

2 comments

adsharma an hour ago

I wish the authors had plotted model size (number of params) against the number of triples a model can hold before the memory collapse happens.

It's hard to map the frequency of knowledge injection to a real-world understanding of how much knowledge a 4B-param model can hold.

gdiamos 2 hours ago

I wonder if this depends on what is inside the domain specific data.

I’m happy to see ML papers on Hacker News.