November 29

Free Must-Read Statistics Books for Aspiring Data Scientists

If you're going to be a data scientist, you're going to need to be really good at statistics. And we know a lot of our readers love getting FREE Statistics books to get started or brush up on their stats knowledge. 

Well, we've got you covered. We've put together a list of the best FREE Statistics books for Data Scientists in this post.


In this post we bring you all the FREE Statistics books written for Data Science that we've found, categorised by sub-topic so you can find what you're looking for easily.

We've recently updated this blog post and we'll be adding more FREE books on Statistics for anybody wanting to improve their statistical and data analysis skills and learn new concepts.

Bookmark this page, enjoy and don't forget to share!

Here are some of the best FREE ebooks on Statistics any Data Scientist needs to read.

To get the book you're interested in, click on the images of the books and you'll be taken to a page where you can read or download a copy of the book.

Since we know some of these books are must-haves and some of our Data Ninjas love having a paper copy of the books for their library, we've included links for those of you interested in having a hard copy. 

Disclosure: The FREE ebooks were free to download at the time of posting but other links in this post may contain affiliate links. As Amazon Associates we may earn from qualifying purchases.

You can find further details in our T&Cs.

FREE General Statistics Books

Probabilistic Programming and Bayesian Methods for Hackers

Cam Davidson-Pilon

The Bayesian method is the natural approach to inference, yet it is hidden from readers behind chapters of slow, mathematical analysis. The typical text on Bayesian inference involves two to three chapters on probability theory, then enters what Bayesian inference is. Unfortunately, due to mathematical intractability of most Bayesian models, the reader is only shown simple, artificial examples.

This can leave the user with a ‘so-what’ feeling about Bayesian inference. In fact, this was the author’s own prior opinion.

"After some recent success of Bayesian methods in machine-learning competitions, I decided to investigate the subject again. Even with my mathematical background, it took me three straight days of reading examples and trying to put the pieces together to understand the methods. There was simply not enough literature bridging theory to practice. The problem with my misunderstanding was the disconnect between Bayesian mathematics and probabilistic programming."

"That being said, I suffered then so the reader would not have to now. This book attempts to bridge the gap."

Computational And Inferential Thinking

Ani Adhikari and John DeNero

Computational and Inferential Thinking is an introductory text for data science that explores foundational concepts in data processing and statistics using modern programming tools. Ideas are illustrated by real-world data sets and examples.

While rigorous in presentation, this text does not expect prior experience in computing, calculus, or linear algebra.

Introduction to Probability

Joseph K. Blitzstein and Jessica Hwang

This book will give you a great introduction to probability and a strong foundation for understanding statistics, randomness and uncertainty. It does so by offering many intuitive explanations, diagrams and practice problems. 

At the end of each section the authors explain how to explore the ideas in the chapter using R.

Computer Age Statistical Inference

Bradley Efron and Trevor Hastie

This book takes us on a journey through the revolution in data analysis following the introduction of electronic computation in the 1950s. Beginning with classical inferential theories – Bayesian, frequentist, Fisherian – individual chapters take up a series of influential topics: survival analysis, logistic regression, empirical Bayes, the jackknife and bootstrap, random forests, neural networks, Markov chain Monte Carlo, inference after model selection, and dozens more. The book integrates methodology and algorithms with statistical inference, and ends with speculation on the future direction of statistics and data science.

Introduction to Probability

Charles M. Grinstead and J. Laurie Snell

This text is designed for an introductory probability course at the university level for undergraduates in mathematics, the physical and social sciences, engineering, and computer science.

It presents a thorough treatment of probability ideas and techniques necessary for a firm understanding of the subject. The text is also recommended for use in discrete probability courses. The material is organized so that the discrete and continuous probability discussions are presented in a separate, but parallel, manner.

This organization does not overemphasize a rigorous or formal view of probability and therefore offers some strong pedagogical value. Hence, the discrete discussions can sometimes serve to motivate the more abstract continuous probability discussions.

A First Course On Design And Analysis Of Experiments

Gary W. Oehlert

This text, for students needing to prepare and analyze experimental data, gives a balanced presentation of the design and analysis of experiments, teaching students when to use various designs, how to analyze the results, and how to recognize design options. The book is also fully oriented towards the use of statistical software in analyzing experiments, and the companion web site offers data sets for most of the exercises in the text.

Putting It All Together - Essays On Data Analysis

Roger D. Peng

What is a data analysis? What makes for a successful data analysis? These are difficult questions that even long-time practitioners have difficulty answering. The way that we have thought about data analysis to date has been focused on the data and the statistical tools that we employ to produce results. But data analysis is about more than those things, and developing an understanding of the things "outside" the data is critical to characterizing the actual process of data analysis, the process that data analysts go through every day.

This book attempts to draw a more complete picture of the data analysis process and presents a new view about what makes for a successful data analysis. It is presented in a completely non-technical and highly readable style that should be of interest to practitioners and managers in data analysis.

Collaborative Statistics

Barbara Illowsky and Susan Dean

This book is intended for introductory statistics courses being taken by students at two- and four-year colleges who are majoring in fields other than math or engineering.

Intermediate algebra is the only prerequisite. The book focuses on applications of statistical knowledge rather than the theory behind it. The text is named Collaborative Statistics because students learn best by doing. In fact, they learn best by working in small groups. The old saying “two heads are better than one” truly applies here.

FREE Statistics Books for Data Science

Theory and Applications For Advanced Text Mining

Edited by Shigeaki Sakurai

Due to the growth of computer technologies and web technologies, we can easily collect and store large amounts of text data in the belief that these data contain useful knowledge.

Text mining techniques have been studied intensively since the late 1990s in order to extract knowledge from these data. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to techniques for under-resourced or less-resourced languages.

This book will give new knowledge in the text mining field and help many readers open new research fields.

Advanced Statistical Computing

Roger D. Peng

The journey from statistical model to useful output has many steps, most of which are taught in other books and courses.

The purpose of this book is to focus on one particular aspect of this journey: the development and implementation of statistical algorithms. It's often nice to think about statistical models and various inferential philosophies and techniques, but when the rubber meets the road, we need an algorithm and a computer program implementation to get the results we need from a combination of our data and our models. This book is about how we fit models to data and the algorithms that we use to do so. Examples are given using the R programming language.

Advanced Linear Models For Data Science

Brian Caffo

Linear models are the cornerstone of statistical methodology. Perhaps more than any other tool, advanced students of statistics, biostatistics, machine learning, data science, econometrics, etcetera should spend time learning the finer grain details of this subject.

In this book, we give a brief but rigorous treatment of advanced linear models. It is advanced in the sense that it is at the level that an introductory PhD student in statistics or biostatistics would see. The material in this book is standard knowledge for any PhD in statistics or biostatistics.

Students will need a fair amount of mathematical prerequisites before trying to undertake this class. First are multivariate calculus and linear algebra, especially linear algebra, since much of the early part of linear models is a direct application of linear algebra results in a statistical context. Some basic proof-based mathematics is necessary to follow the proofs, and some background in regression models and mathematical statistics is also needed.

Modeling With Data

Ben Klemens

Modeling with Data fully explains how to execute computationally intensive analyses on very large data sets, showing readers how to determine the best methods for solving a variety of different problems, how to create and debug statistical models, and how to run an analysis and evaluate the results.

Ben Klemens introduces a set of open and unlimited tools, and uses them to demonstrate data management, analysis, and simulation techniques essential for dealing with large data sets and computationally intensive procedures.

He then demonstrates how to easily apply these tools to the many threads of statistical technique, including classical, Bayesian, maximum likelihood, and Monte Carlo methods.

Klemens's accessible survey describes these models in a unified and non-traditional manner, providing alternative ways of looking at statistical concepts that often befuddle students. The book includes nearly one hundred sample programs of all kinds.

FREE Books for Programming Statistics in R

A Little Book Of R For Time Series

Avril Coghlan

This is a simple introduction to time series analysis using the R statistics software (have you spotted the pattern yet?). It covers reading and plotting time series, time series decomposition, forecasting, and ARIMA models.

A Little Book Of R For Multivariate Analysis

Avril Coghlan

A Little Book of R for Multivariate Analysis is a simple introduction to multivariate analysis using the R statistics software.

It covers topics such as reading and plotting multivariate data, principal components analysis, and linear discriminant analysis.

It's only 49 pages long and you can read it online or download it as a PDF.

Practical Regression And Anova Using R

Julian J. Faraway

This book is not for beginners. It presumes some knowledge of basic statistical theory and practice, such as estimation, hypothesis testing and confidence intervals. A basic knowledge of data analysis is presumed. Some linear algebra and calculus are also required.

Introduction to Statistical Thought

Michael Lavine

The book is intended as an upper level undergraduate or introductory graduate textbook in statistical thinking with a likelihood emphasis for students with a good knowledge of calculus and the ability to think abstractly. "Statistical thinking" means a focus on ideas that statisticians care about as opposed to technical details of how to put those ideas into practice. The book does contain technical details, but they are not the focus. "Likelihood emphasis" means that the likelihood function and likelihood principle are unifying ideas throughout the text.

Another unusual aspect is the use of statistical software as a pedagogical tool. That is, instead of viewing the computer merely as a convenient and accurate calculating device, the book uses computer calculation and simulation as another way of explaining and helping readers understand the underlying concepts. The book is written with the statistical language R embedded throughout.

Forecasting Principles And Practice

Rob J. Hyndman and George Athanasopoulos

Forecasting is required in many situations. Deciding whether to build another power generation plant in the next five years requires forecasts of future demand. Scheduling staff in a call centre next week requires forecasts of call volumes. Stocking an inventory requires forecasts of stock requirements. Telecommunication routing requires traffic forecasts a few minutes ahead.

Whatever the circumstances or time horizons involved, forecasting is an important aid in effective and efficient planning. This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly. Examples use R with many data sets taken from the authors' own consulting experience.

From Algorithms to Z-Scores: Probabilistic And Statistical Modeling In Computer Science

Norm Matloff

The materials here form a textbook for a course in mathematical probability and statistics for computer science students.

Computer science examples are used throughout, in areas such as: computer networks; data and text mining; computer security; remote sensing; computer performance evaluation; software engineering; data management; etc.

The R statistical/data manipulation language is used throughout. Since this is a computer science audience, a greater sophistication in programming can be assumed. It is recommended that the R tutorial, R for Programmers, be used as a supplement.

Throughout the units, mathematical theory and applications are interwoven, with a strong emphasis on modelling.

An Introduction To Statistical Learning With Applications In R

Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

This book provides an introduction to statistical learning methods.

It is aimed at upper-level undergraduate students, master's students and Ph.D. students in the non-mathematical sciences.

The book also contains a number of R labs with detailed explanations on how to implement the various methods in real life settings, and should be a valuable resource for a practicing data scientist.

R Programming For Data Science

Roger D. Peng

This book brings the fundamentals of R programming to you, using the same material developed as part of the industry-leading Johns Hopkins Data Science Specialization. The skills taught in this book will lay the foundation for you to begin your journey learning data science.

Data Analysis And Graphics Using R

John Maindonald and W. John Braun

Introducing the R system, covering standard regression methods, then tackling more advanced topics, this book guides users through the practical, powerful tools that the R system provides. The emphasis is on hands-on analysis, graphical display, and interpretation of data.

The many worked examples, from real-world research, are accompanied by commentary on what is done and why.  

Assuming basic statistical knowledge and some experience with data analysis (but not R), the book is ideal for research scientists, final-year undergraduate or graduate-level students of applied statistics, and practicing statisticians. It is both for learning and for reference. This third edition expands upon topics such as Bayesian inference for regression, errors in variables, generalized linear mixed models, and random forests.

FREE Books for Statistics and Machine Learning

Statistical Learning With Sparsity

Trevor Hastie, Robert Tibshirani, Martin Wainwright

During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. This book describes the important ideas in these areas in a common conceptual framework.

Gaussian Processes For Machine Learning

C. E. Rasmussen and C. K. I. Williams

Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines through a systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. 

