Every month we scour the internet seeking out free eBooks to help you on your educational journey, and this month has been no different.
I hope these books prove to be a valuable resource to you and that you will visit regularly (and invite your friends too).
If you haven't subscribed to our newsletter yet, why not sign up using the form on the right? You'll be the very first to know when new resources are published.
This month, we have Information Theory, Inference, and Learning Algorithms; Data Science in the Cloud with Microsoft Azure Machine Learning and Python; and Data-Intensive Text Processing with MapReduce. They're all FREE, so help yourselves.
By the way, the first one is written by David MacKay. You might not have heard of him, but when I was doing my PhD, his 1992 PhD thesis Bayesian Methods for Adaptive Models was my Bible. He doesn't know it, but David is my God! We are not worthy...
Information Theory, Inference, and Learning Algorithms
by David MacKay
Information theory and inference, often taught separately, are here united in one entertaining textbook.
These topics lie at the heart of many exciting areas of contemporary science and engineering - communication, signal processing, data mining, machine learning, pattern recognition, computational neuroscience, bioinformatics, and cryptography.
This textbook introduces theory in tandem with applications. Information theory is taught alongside practical communication systems, such as arithmetic coding for data compression and sparse-graph codes for error-correction.
A toolbox of inference techniques, including message-passing algorithms, Monte Carlo methods, and variational approximations, is developed alongside applications of these tools to clustering, convolutional codes, independent component analysis, and neural networks.
Data Science in the Cloud with Microsoft Azure Machine Learning and Python
by Stephen Elston
Take time to explore Microsoft’s Azure machine learning platform, Azure ML - a production environment that simplifies the development and deployment of machine learning models.
In this O’Reilly report, Stephen Elston from Quantia Analytics uses a complete data science example (forecasting hourly demand for a bicycle rental system) to show you how to manipulate data, construct models, and evaluate models with Azure ML.
The report walks you through key steps in the data science process from problem definition, data understanding, and feature engineering, through construction of a regression model and presentation of results. You’ll also learn how to extend Azure ML with Python.
Elston uses downloadable Python code and data to demonstrate how to perform data munging, data visualization, and in-depth evaluation of model performance. At the end, you’ll learn how to publish your trained models as web services in the Azure cloud.
Data-Intensive Text Processing with MapReduce
by Jimmy Lin and Chris Dyer
Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications.
Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever.
MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance.
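To make the programming model concrete, here is a minimal single-machine sketch of the classic word-count example in plain Python. It is not taken from the book and does not use Hadoop or any real MapReduce framework. The function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, chosen only for illustration, and the shuffle function stands in for the grouping step that a real execution framework would perform across a cluster.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key (a stand-in for the framework's shuffle-and-sort)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum all partial counts emitted for one word."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a dict of doc_id -> text."""
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_phase(doc_id, text))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    docs = {"d1": "the quick brown fox", "d2": "the lazy dog and the fox"}
    print(word_count(docs))
```

The programmer writes only the mapper and the reducer. In a real deployment, the framework would run many mappers and reducers in parallel and take care of the grouping, scheduling, and recovery from failures.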
This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains.
This book not only aims to help the reader "think in MapReduce", but also discusses the limitations of the programming model.