Show HN: Visualizing 8k+ LLM papers with t-SNE (awesome-LLM-papers.github.io)

2 points | by sjm213 7 hours ago

1 comment

  • sjm213 7 hours ago

    I’ve been working on a small project to map the evolution of large language model research.

    I collected around 8,000 papers, embedded their abstracts, and plotted them using t-SNE to visualize clusters such as instruction-tuning, RAG, agents, and evaluation.
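
    For anyone who wants to try something similar, here is a minimal sketch of this kind of pipeline. It is not the project's actual code: TF-IDF plus truncated SVD stands in for the embedding model (which isn't specified above), the toy abstracts are made up, and project_abstracts is a hypothetical helper name. It assumes scikit-learn is installed.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Made-up stand-ins for real paper abstracts (illustration only).
TOY_ABSTRACTS = [
    "Instruction tuning improves zero-shot generalization of language models.",
    "Fine-tuning on human instructions aligns model outputs with user intent.",
    "Retrieval augmented generation grounds answers in external documents.",
    "A dense retriever selects passages that condition the text generator.",
    "Language model agents plan and call tools to solve multi-step tasks.",
    "Tool-using agents interact with web environments through actions.",
    "A benchmark evaluates reasoning ability across many academic subjects.",
    "Evaluation metrics for generated text correlate poorly with human judgment.",
]


def project_abstracts(abstracts, perplexity=3.0, seed=0):
    """Embed abstracts and project them to 2-D with t-SNE.

    Assumption: TF-IDF + truncated SVD is a cheap substitute for a
    neural sentence-embedding model; swap in real embeddings as needed.
    """
    # Sparse bag-of-words features, one row per abstract.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(abstracts)

    # Reduce to a small dense space before t-SNE (capped for tiny inputs).
    n_dims = min(50, tfidf.shape[1] - 1, len(abstracts) - 1)
    dense = TruncatedSVD(n_components=n_dims, random_state=seed).fit_transform(tfidf)

    # Perplexity must be smaller than the number of samples.
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(dense)


coords = project_abstracts(TOY_ABSTRACTS)
for (x, y), text in zip(coords, TOY_ABSTRACTS):
    print(f"{x:9.2f} {y:9.2f}  {text[:40]}")
```

    With thousands of papers, a perplexity in the usual 5 to 50 range is a more typical starting point than the tiny value used for this toy set, and the layout changes with the random seed.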

    One interesting detail: the earliest “proto-LLM” paper that shows up is “Natural Language Processing (Almost) from Scratch” (2011), which already used multitask learning and had hints of joint representations.

    Interactive version here: https://awesome-llm-papers.github.io/tsne.html

    Would love feedback, especially on what other dimensions (e.g., year, model type, or dataset) or embeddings would be interesting to explore.