This is my favorite book on statistics. Full stop. The author, Andrew Gelman, created a whole new branch of Bayesian statistics, both through his theoretical work on hierarchical modeling and by publishing Stan to enable practical applications of hierarchical models.
It took me about a year to work through this book on the side (including the exercises) and it provided the foundation for years of fruitful research into hierarchical Bayesian models. It’s definitely not an introductory read, but for anyone looking to advance their statistical toolkit, I cannot recommend this book highly enough.
As a starting point, I’d strongly suggest the first 5 chapters for an excellent introduction to Gelman’s modeling philosophy, and then jumping around the table of contents to any topics that look interesting.
What is a book / course on statistics that I can go through before this so that I can understand this?
I don’t mean for the bar to sound too high. I think working through Khan Academy’s full probability, calculus and linear algebra courses would give you a strong foundation. I worked through this book having just completed the equivalent courses in college.
It’s just a relatively dense book. There are some other really good suggestions in this thread, most of which I’ve heard good things about. If you have a background in programming, I’d suggest Bayesian Methods for Hackers as a really good starting point. But you can also definitely tackle this book head on, and it will be very rewarding.
Bayesian Statistics the Fun Way is probably the best place to start if you're coming at this from 0. It covers the basics of most of the foundational math you'll need along the way and assumes basically no prerequisites.
After that, Statistical Rethinking will take you much deeper into more complex experiment design using linear models and beyond, as well as deepening your understanding of the other areas of math required.
Regression and Other Stories. It’s also co-authored by Gelman and it reads like an updated version of his previous book Data Analysis Using Regression and Multilevel/Hierarchical Models.
Statistical Rethinking is a good option too.
Can second Regression and Other Stories, it's freely available here: https://users.aalto.fi/~ave/ROS.pdf, and you can access additional information such as data and code (including Python and Julia ports) here: https://avehtari.github.io/ROS-Examples/index.html
Is there a good book that covers statistics as it is applied to testing - like for medical research or as optimization or manufacturing or whatever?
This book is very relevant to those fields. There is a common choice in statistics to either stratify or aggregate your dataset.
There is an example in his book discussing efficacy trials across seven hospitals. If you stratify the data, you lose a lot of confidence; if you aggregate the data, you end up just modeling the difference between hospitals.
Hierarchical modeling allows you to split your dataset under a single unified model. This is really powerful for extracting signal from noise because you can split your dataset according to potential confounding variables, e.g. the hospital from which the data was collected.
I am writing this on my phone so apologies for the lack of links, but in short the approach in this book is extremely relevant to medical testing.
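To make the stratify vs. aggregate trade-off concrete, here is a minimal sketch of partial pooling in plain Python. It is not the book's code. It assumes a normal-normal model with a flat prior on the population mean and, as a simplification, treats the between-hospital standard deviation `tau` as known, whereas a full hierarchical analysis would estimate it too. The function name `partial_pool` and the seven hospital numbers are made up for illustration.

```python
def partial_pool(estimates, std_errors, tau):
    """Return (mu_hat, shrunken_estimates) for a normal-normal hierarchical model.

    Assumes y_j ~ Normal(theta_j, se_j^2) and theta_j ~ Normal(mu, tau^2),
    with a flat prior on mu and tau treated as known (a simplification).
    """
    # Precision-weighted estimate of the population mean mu.
    weights = [1.0 / (se ** 2 + tau ** 2) for se in std_errors]
    mu_hat = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)

    shrunken = []
    for y, se in zip(estimates, std_errors):
        # Shrinkage factor: 1 means fully pooled, 0 means fully stratified.
        b = se ** 2 / (se ** 2 + tau ** 2)
        shrunken.append(b * mu_hat + (1.0 - b) * y)
    return mu_hat, shrunken


# Invented per-hospital effect estimates and standard errors (not from the book).
estimates = [2.1, -0.4, 1.3, 3.0, 0.2, 1.8, 0.9]
std_errors = [1.0, 1.5, 0.8, 2.0, 1.2, 0.9, 1.1]

for tau in (0.0, 1.0, 100.0):
    mu_hat, shrunken = partial_pool(estimates, std_errors, tau)
    print(f"tau={tau:>5}: mu={mu_hat:.2f}", [round(t, 2) for t in shrunken])
```

With `tau = 0` every hospital collapses onto the pooled mean (aggregating), with a very large `tau` each hospital keeps its own raw estimate (stratifying), and values in between pull noisy hospitals toward the group mean in proportion to how uncertain they are.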
The key insight to recognize is that within the Bayesian framework hypothesis testing is parameter estimation. Your certainty in the outcome of the test is your posterior probability over the test-relevant parameters.
Once you realize this you can easily develop very sophisticated testing models (if necessary) that are also easy to understand and reason about. This dramatically simplifies hypothesis testing.
If you're looking for a specific book recommendation Statistical Rethinking does a good job covering this at length and Bayesian Statistics the Fun Way is a more beginner friendly book that covers the basics of Bayesian hypothesis testing.
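A rough sketch of "hypothesis testing is parameter estimation", using only the Python standard library. It assumes a two-arm test with binary outcomes, a uniform Beta(1, 1) prior on each arm's success rate, and Monte Carlo draws from the resulting Beta posteriors. The function name `prob_b_beats_a` and the counts are hypothetical, not taken from any of the books mentioned.

```python
import random


def prob_b_beats_a(successes_a, trials_a, successes_b, trials_b,
                   draws=20000, seed=0):
    """Estimate P(rate_B > rate_A | data) by sampling both posteriors."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    wins = 0
    for _ in range(draws):
        # Beta(1 + successes, 1 + failures) is the posterior under a
        # uniform Beta(1, 1) prior with a binomial likelihood.
        p_a = rng.betavariate(1 + successes_a, 1 + trials_a - successes_a)
        p_b = rng.betavariate(1 + successes_b, 1 + trials_b - successes_b)
        if p_b > p_a:
            wins += 1
    return wins / draws


# Invented counts: 40/200 successes in arm A, 60/200 in arm B.
p = prob_b_beats_a(40, 200, 60, 200)
print(f"P(B better than A) is roughly {p:.3f}")
```

The "test result" here is just a posterior probability over the parameters you care about, so extending the test (more arms, a minimum effect size, a hierarchical prior across sites) means changing the model rather than looking up a different test.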
I might check out Statistical Rethinking given how frequently it is being recommended!
Edit: Haha I just found the textbook and I’m remembering now that I actually worked through sections of it back when I was working through BDA several years back.
I'm a fan of the stats blog hosted by Columbia that Gelman is the main contributor to: https://statmodeling.stat.columbia.edu
Thanks for sharing, any particular articles that had a lasting impact on you?
idk about impact, but here are a couple I liked:
- https://statmodeling.stat.columbia.edu/2025/08/25/what-writi...
- https://statmodeling.stat.columbia.edu/2025/09/04/assembling...
Here are the articles that were popular on HN over the years:
https://hn.algolia.com/?q=statmodeling.stat.columbia.edu
I am interested in this topic, but this textbook is too daunting for me. What I'd love is a crash course on Bayesian methods for the working systems performance engineer. If you, dear reader, happen to be familiar with both domains: what would you include in such a course, and can you recommend any existing resources for self-study?
My go-to for teaching statistics is Statistical Rethinking. It’s basically a course in how to actually think about modeling. What you’re really after is analyzing a hypothesis, and a model may be consistent with a number of hypotheses; figuring out which hypotheses any given model implies is the hard/fun part, and this book teaches you that. The only drawback is that it’s not free. (Although there are excellent lectures by the author available for free on YouTube. These are worth watching even if you don’t get the book.)
I also recommend Gelman’s (one of the authors of the linked book) Regression and Other Stories as a more approachable text for this content.
Think Bayes and Bayesian Methods for Hackers are introductory books for a beginner coming from a programming background.
If you want something more from the ML world that heavily emphasizes the benefits of probabilistic (Bayesian) methods, I highly recommend Kevin Murphy’s Probabilistic Machine Learning. I have only read the first edition before he split it into two volumes and expanded it, but I’ve only heard good things about the new volumes too.
Yep 100% came here to say the same. Helped me a lot during the PhD to get a better understanding of statistics.
https://github.com/CamDavidsonPilon/Probabilistic-Programmin...
https://www.oreilly.com/library/view/bayesian-methods-for/97...
Related course materials here: https://sites.stat.columbia.edu/gelman/book/
Looking for more self study statistics resources for someone with a CS degree, any other recs?
Start with Statistics by David Freedman. It is very approachable as an introduction, not too theory heavy, and you can get a handle on all of the "main" issues. Afterwards, you have 2 options:
1) Do you want "theoretical" knowledge (math background required)? If so, then you need to get a decent mathematical statistics book like Casella-Berger. I think a good US CS degree grad could handle it, but you might need to go a bit slow and google around / maybe fill in some gaps in probability/calculus.
2) Introduction to Statistical Learning is unironically a great intro to "applied" stats. You get most of the "vanilla" models/algorithms with some, but not too much, theoretical background behind each, you can follow along with the R version and see how stuff actually works, and there are exercises that vary in difficulty.
With regards to Gelman and Bayesian data analysis, I should note that in my experience the Bayesian approach is 1st year MS / 4th year of a Bachelor's material in the US. It's very useful to know and have in your toolbox but IMO it should be left aside until you are confident in the "frequentist" basics.
I think Statistical Rethinking [0] is a far more approachable first entry. The author posts his video lectures on YouTube, which are excellent and should be watched with the book. The book gets way less into the mathematical weeds than other texts, so a working statistician would require something deeper.
[0] https://en.wikipedia.org/wiki/Statistical_Rethinking
2024 videos / lectures on github here -> https://github.com/rmcelreath/stat_rethinking_2024
Start here:
https://www.inference.org.uk/itprnn/book.pdf
It's a little dated now but it connects Bayesian statistics with neural nets and information theory in an elegant way.
Probability Theory by Jaynes if you'd like more Bayes
Previously:
Bayesian Data Analysis, Third Edition [pdf] - https://news.ycombinator.com/item?id=23091359 - May 2020 (48 comments)