Welcome to the Hadoop 2 Quick-Start Guide Resource Page, where you can ask questions about the book and examples.

Hadoop® 2 Quick-Start Guide

Learn the Essentials of Big Data Computing in the Apache Hadoop® 2 Ecosystem

With Hadoop 2+ and YARN, Hadoop moves beyond MapReduce to become practical for virtually any type of data processing. Hadoop 2+ and the Data Lake concept represent a radical shift away from conventional approaches to data usage and storage. Hadoop 2+ installations offer unmatched scalability and breakthrough extensibility that supports new and existing Big Data analytics processing methods and models.

Hadoop® 2 Quick-Start Guide is the first easy, accessible guide to Apache Hadoop 2+ (including Hadoop Version 3), YARN, and the modern Hadoop ecosystem. Building on his unsurpassed experience teaching Hadoop and Big Data, author Douglas Eadline covers all the basics you need to know to install and use Hadoop 2 on personal computers or servers, and to navigate the powerful technologies that complement it.

Eadline concisely introduces and explains every key Hadoop 2+ concept, tool, and service, illustrating each with a simple “beginning-to-end” example and identifying trustworthy, up-to-date resources for learning more. This guide is ideal if you want to learn about Hadoop 2+ without getting mired in technical details. Douglas Eadline will bring you up to speed quickly, whether you’re a user, admin, devops specialist, programmer, architect, analyst, or data scientist.

Additional Learning: Check out these online training classes to get up to speed quickly:

  1. Beginning Linux Command Line for Data Engineers and Analysts - Quickly learn the basics of using the Linux command line on Hadoop/Spark clusters. Learn command basics, run applications, edit files, and navigate the Linux command-line interface used on almost all modern analytics clusters. (3 hours - 1 day)
  2. Intermediate Linux Command Line for Data Engineers and Analysts - Continue learning the basics of using Linux, including how to move data to/from Linux and the Hadoop Distributed File System (HDFS), use basic Linux "analytics tools" like grep, sed, and gawk, and run Hadoop and Spark applications from the command line (see the short command sketch after this list). (3 hours - 1 day)
  3. Apache Hadoop, Spark and Kafka Foundations - A great introduction to the Hadoop Big Data ecosystem. A non-programming introduction to Hadoop, Spark, HDFS, MapReduce, and the Kafka message broker. (3 hours - 1 day)
  4. Hands-on Introduction to Apache Hadoop and Spark Programming - A hands-on introduction to using Hadoop, Pig, Hive, Sqoop, Flume, Spark, and Zeppelin notebooks. All examples are provided in the course notes. Students can download and run the examples on a "Hadoop Minimal" virtual machine designed to be used on a desktop or laptop. (6 hours - 2 days)
  5. Scalable Data Science with Hadoop and Spark - Learn how to apply the Hadoop and Spark tools from the previous classes to predict airline delays. All programming is done using Hadoop and Spark with the Zeppelin web notebook on a four-node cluster. The notebook is available for download so students can reproduce the examples. (3 hours - 1 day)
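
For a flavor of the HDFS command-line data movement covered in the Intermediate class, here is a minimal sketch using the standard hdfs dfs commands (the file and directory names are hypothetical examples):

    # copy a local file into HDFS (paths are hypothetical)
    hdfs dfs -mkdir -p /user/analyst/data
    hdfs dfs -put weblogs.txt /user/analyst/data/
    # list the HDFS directory and retrieve the file again
    hdfs dfs -ls /user/analyst/data
    hdfs dfs -get /user/analyst/data/weblogs.txt weblogs-copy.txt
    # a simple Linux "analytics tool" pipeline on the local copy
    grep "ERROR" weblogs-copy.txt | wc -l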

All classes provide ample time for interactive questions, and each course can be streamed at any time.

To learn more about data analytics with Hadoop, check out the book Practical Data Science with Hadoop and Spark: Designing and Building Effective Analytics at Scale.
