Now that Christmas and the New Year are behind us, the nights are becoming a little shorter with each passing day. Nevertheless, there are still loads of cold winter nights left to endure (unless you're in the Southern Hemisphere, in which case - throw me a shrimp on the barbie!).
It's time to dust off your New Year resolutions from last year (remember those?) and get ready to learn some new data skills.
Here are three free eBooks to help you on that journey and make those long nights just that bit shorter.
I hope these books prove to be a valuable resource to you and that you will visit regularly (and share with your friends on social media too).
If you haven't subscribed to our newsletter yet, why not subscribe using the form on the right - you'll be the very first to know when new resources are published.
This month we highlight 3 books:
- Data-Intensive Text Processing with MapReduce
- Programming Pig
- Test-Driven Development With Python
They're all FREE, so help yourselves...
Data-Intensive Text Processing with MapReduce, by Jimmy Lin and Chris Dyer
Our world is being revolutionized by data-driven methods: access to large amounts of data has generated new insights and opened exciting new opportunities in commerce, science, and computing applications. Processing the enormous quantities of data necessary for these advances requires large clusters, making distributed computing paradigms more crucial than ever.
MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. The programming model provides an easy-to-understand abstraction for designing scalable algorithms, while the execution framework transparently handles many system-level details, ranging from scheduling to synchronization to fault tolerance.
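To make the map and reduce idea a little more concrete, here is a minimal word-count sketch in plain Python. It is not taken from the book and does not use Hadoop: the names `map_words`, `reduce_counts` and `run_job` are made up for illustration, and `run_job` is a toy single-process stand-in for the work a real execution framework would spread across a cluster.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reducer: sum all the partial counts emitted for one word."""
    return word, sum(counts)


def run_job(documents):
    """Toy driver (an assumption for this sketch, not a real framework)."""
    # Shuffle phase: collect every intermediate value under its key.
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, count in map_words(doc_id, text):
            grouped[word].append(count)
    # Reduce phase: one reducer call per distinct key.
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


docs = {"d1": "the quick brown fox", "d2": "the lazy dog and the fox"}
word_counts = run_job(docs)
print(word_counts["the"], word_counts["fox"])
```

The point of the split is that the mapper and reducer only ever see one document or one key at a time, which is what lets a real framework run many of them in parallel and take care of scheduling and fault tolerance for you.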
This book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. We introduce the notion of MapReduce design patterns, which represent general reusable solutions to commonly occurring problems across a variety of problem domains.
This book not only aims to help the reader "think in MapReduce", but also discusses the limitations of the programming model.
Programming Pig, by Alan Gates
This guide is an ideal learning tool and reference for Apache Pig, the open source engine for executing parallel data flows on Hadoop. With Pig, you can batch-process data without having to create a full-fledged application—making it easy for you to experiment with new datasets.
Programming Pig introduces new users to Pig and provides experienced users with comprehensive coverage of key features such as the Pig Latin scripting language, the Grunt shell, and User Defined Functions (UDFs) for extending Pig. If you need to analyze terabytes of data, this book shows you how to do it efficiently with Pig.
Test-Driven Development With Python, by Harry Percival
By taking you through the development of a real web application from beginning to end, the second edition of this hands-on guide demonstrates the practical advantages of test-driven development (TDD) with Python.
You'll learn how to write and run tests before building each part of your app, and then develop the minimum amount of code required to pass those tests. The result? Clean code that works. In the process, you'll learn the basics of Django, Selenium, Git, jQuery, and Mock, along with current web development techniques.
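If you haven't seen the test-first rhythm before, the sketch below gives a rough flavour of it using Python's built-in `unittest` module. It is not an excerpt from the book: the `add_item` function and the `AddItemTest` class are hypothetical examples, and a real project would keep tests and application code in separate files.

```python
import unittest


# Step 1 (written first): the tests describe the behaviour we want.
class AddItemTest(unittest.TestCase):
    def test_new_item_is_stored(self):
        self.assertEqual(add_item([], "Buy milk"), ["Buy milk"])

    def test_blank_item_is_rejected(self):
        with self.assertRaises(ValueError):
            add_item([], "   ")


# Step 2 (written second): the minimum code that makes those tests pass.
def add_item(items, text):
    """Return a new list with `text` appended; reject blank text."""
    if not text.strip():
        raise ValueError("item text must not be empty")
    return items + [text]


# Run the tests programmatically and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddItemTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In practice you would run the tests once before writing `add_item`, watch them fail, and only then add just enough code to turn them green.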