As a programmer you're spoilt for choice when it comes to programming languages. You can choose from any of a dozen really powerful languages to get the job done.
When it comes to Data Science programming, though, there are really only 2 choices: Python and R.
Sure, there are lots of other languages that are useful and can be used to supplement your choice, but primarily you're going to be programming in Python or R.
I'm not going to go too much into the pros and cons of each in this post. Rather, if you've made your choice and decided you're going to go with the snake, then you can enjoy a number of benefits:
- Loads of Third-Party modules
- Extensive support libraries
- Open Source and community development
- Ease of learning and available support
- User-friendly data structures
- Productivity and speed
If you're not sure how to get started with Python, the 3 books in this blog post will help you make your first steps.
Disclosure: the three books in this post link you to the listed book at your local Amazon store. We may earn an affiliate commission for purchases you make when using the links to books on this page.
In this post, the 4th in a series of 8 in which we bring you 21 Inspirational Books for All Aspiring Data Scientists, we highlight 3 books to introduce you to the Python programming language and how it is being used in Data Science:
- Data Science from Scratch: First Principles with Python
- Python for Data Science For Dummies
- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 2nd Edition
They are all highly recommended reading and will get your Python skills from zero to hero in no time...
Data Science from Scratch: First Principles with Python
by Joel Grus
Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch.
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.
- Get a crash course in Python
- Learn the basics of linear algebra, statistics, and probability – and understand how and when they’re used in data science
- Collect, explore, clean, munge, and manipulate data
- Dive into the fundamentals of machine learning
- Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering
- Explore recommender systems, natural language processing, network analysis, MapReduce, and databases
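To give a flavour of what "from scratch" means in practice, here is a minimal sketch of a k-nearest neighbors classifier written with nothing but the Python standard library. It is not taken from the book: the function names `distance` and `knn_classify` and the toy data are made up purely for illustration.

```python
from collections import Counter
import math


def distance(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(k, labeled_points, new_point):
    """Predict a label by majority vote among the k nearest labeled points."""
    # Sort the (point, label) pairs from nearest to farthest.
    by_distance = sorted(
        labeled_points, key=lambda pair: distance(pair[0], new_point)
    )
    # Keep only the labels of the k closest points.
    k_nearest_labels = [label for _, label in by_distance[:k]]
    # Return the most common label among them.
    return Counter(k_nearest_labels).most_common(1)[0][0]


# Toy data invented for this example: (point, label) pairs.
training = [
    ((1.0, 1.0), "python"),
    ((1.5, 2.0), "python"),
    ((2.0, 1.0), "python"),
    ((6.0, 6.0), "r"),
    ((7.0, 5.5), "r"),
    ((6.5, 7.0), "r"),
]

prediction = knn_classify(3, training, (2.0, 2.0))
print(prediction)
```

In real projects you would normally reach for a library implementation instead, but writing the few lines yourself makes it obvious that the whole idea is "find the closest examples and let them vote".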
Python for Data Science For Dummies
by John Paul Mueller and Luca Massaron
Python is the preferred programming language for data scientists and combines the best features of Matlab, Mathematica, and R into libraries specific to data analysis and visualization. Python for Data Science For Dummies shows you how to take advantage of Python programming to acquire, organize, process, and analyze large amounts of information and use basic statistics concepts to identify trends and patterns. You’ll get familiar with the Python development environment, manipulate data, design compelling visualizations, and solve scientific computing challenges as you work your way through this user-friendly guide.
- Covers the fundamentals of Python data analysis programming and statistics to help you build a solid foundation in data science concepts like probability, random distributions, hypothesis testing, and regression models
- Explains objects, functions, modules, and libraries and their role in data analysis
- Walks you through some of the most widely used libraries, including NumPy, SciPy, BeautifulSoup, pandas, and Matplotlib
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 2nd Edition
by Wes McKinney
Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process.
Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.
- Use the IPython shell and Jupyter notebook for exploratory computing
- Learn basic and advanced features in NumPy (Numerical Python)
- Get started with data analysis tools in the pandas library
- Use flexible tools to load, clean, transform, merge, and reshape data
- Create informative visualizations with matplotlib
- Apply the pandas groupby facility to slice, dice, and summarize datasets
- Analyze and manipulate regular and irregular time series data
- Learn how to solve real-world data analysis problems with thorough, detailed examples
All 8 posts in the series:
- 3 Great Data Science Books for Aspiring Data Scientists
- 3 Must-Read Statistics Books for Aspiring Data Scientists
- 3 Essential Python Books for Aspiring Data Scientists
- 3 Books on R That all Aspiring Data Scientists Should Read
- 3 Inspirational Machine Learning Books for Aspiring Data Scientists
- 3 Essential Visualisation Books for Aspiring Data Scientists
- 3 Must-Read Books on Data Ethics for Aspiring Data Scientists