Show HN: Visualizing 8k+ LLM papers with t-SNE (awesome-LLM-papers.github.io)


2 points | by sjm213 7 hours ago

1 comment

sjm213 7 hours ago

I’ve been working on a small project to map the evolution of large language model research.

I collected around 8,000 papers, embedded their abstracts, and plotted them using t-SNE to visualize clusters such as instruction-tuning, RAG, agents, and evaluation.
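The post doesn't say which embedding model or t-SNE settings were used. What follows is only a minimal sketch of the general "embed abstracts, then project with t-SNE" pipeline, built under stated assumptions. A bag-of-words vectorizer stands in for a real abstract-embedding model. A bare-bones exact t-SNE is written in plain Python so each step is visible: per-point perplexity calibration, symmetrized affinities, a Student-t kernel in 2-D, and gradient descent with early exaggeration and momentum. The helper names `embed`, `conditional_row` and `tsne`, the hyperparameters, and the sample abstracts are all hypothetical.

```python
import math
import random
import re
from collections import Counter


def embed(texts):
    """Hypothetical stand-in embedding: L2-normalized bag-of-words vectors.

    The post does not name its embedding model; replace this with a real one.
    """
    tokenized = [re.findall(r"[a-z]+", t.lower()) for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: k for k, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        v = [0.0] * len(vocab)
        for w, c in Counter(toks).items():
            v[index[w]] = float(c)
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def conditional_row(dists, target_entropy, tol=1e-5, iters=60):
    """Binary-search a Gaussian precision beta so the row entropy equals log(perplexity)."""
    lo, hi, beta = 0.0, math.inf, 1.0
    m = min(dists)  # shifting by the minimum avoids underflow and leaves p unchanged
    for _ in range(iters):
        w = [math.exp(-beta * (d - m)) for d in dists]
        s = sum(w)
        p = [x / s for x in w]
        h = -sum(x * math.log(x) for x in p if x > 0)
        if abs(h - target_entropy) < tol:
            break
        if h > target_entropy:  # distribution too flat: sharpen it
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:  # too peaked: flatten it
            hi = beta
            beta = (beta + lo) / 2
    return p


def tsne(X, perplexity=2.0, n_iter=500, lr=0.5, seed=0):
    """Bare-bones exact t-SNE to 2-D (O(n^2) per step; only for toy inputs)."""
    n = len(X)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        row = conditional_row([sq_dist(X[i], X[j]) for j in others], math.log(perplexity))
        for j, p in zip(others, row):
            P[i][j] = p
    # Symmetrize the conditionals into one joint distribution summing to 1.
    P = [[(P[i][j] + P[j][i]) / (2 * n) for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    Y = [[rng.gauss(0, 1e-2), rng.gauss(0, 1e-2)] for _ in range(n)]
    vel = [[0.0, 0.0] for _ in range(n)]
    for it in range(n_iter):
        exag = 4.0 if it < 100 else 1.0  # early exaggeration (illustrative schedule)
        momentum = 0.5 if it < 100 else 0.8
        # Student-t (Cauchy) kernel between map points; q_ij = num[i][j] / z.
        num = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
               for i in range(n)]
        z = sum(map(sum, num))
        for i in range(n):
            g = [0.0, 0.0]
            for j in range(n):
                # KL(P||Q) gradient term: 4 (p_ij - q_ij) (1 + |y_i - y_j|^2)^-1 (y_i - y_j)
                coeff = 4.0 * (exag * P[i][j] - num[i][j] / z) * num[i][j]
                g[0] += coeff * (Y[i][0] - Y[j][0])
                g[1] += coeff * (Y[i][1] - Y[j][1])
            vel[i] = [momentum * vel[i][d] - lr * g[d] for d in range(2)]
        # All gradients use the old positions; update every point at once.
        Y = [[Y[i][0] + vel[i][0], Y[i][1] + vel[i][1]] for i in range(n)]
    return Y


if __name__ == "__main__":
    abstracts = [  # made-up placeholder abstracts, not real papers
        "instruction tuning aligns language models with human instructions",
        "we study instruction tuning data for aligned language models",
        "retrieval augmented generation grounds answers in retrieved documents",
        "a benchmark for retrieval augmented generation over retrieved documents",
        "autonomous agents plan and call tools to finish tasks",
        "language model agents plan and use tools for multi step tasks",
    ]
    coords = tsne(embed(abstracts))
    for text, (x, y) in zip(abstracts, coords):
        print(f"{x:8.3f} {y:8.3f}  {text[:45]}")
```

Each iteration here costs O(n²). That is fine for a handful of toy abstracts but not for 8,000 papers. At that scale you would more likely pair a sentence-embedding model with a library implementation such as scikit-learn's `TSNE` or openTSNE, which rely on Barnes-Hut or FFT-based approximations.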

One interesting detail — the earliest “proto-LLM” paper that shows up is “Natural Language Processing (almost) From Scratch” (2011), which already had hints of joint representations and multitask learning.

Interactive version here: https://awesome-llm-papers.github.io/tsne.html

Would love feedback — especially on what other dimensions or embeddings might be interesting to explore (e.g., by year, model type, or dataset).