This is an old revision of the document!

Welcome to Scalable Analytics with Apache Hadoop and Spark

(The four essential courses on the path to scalable data science nirvana, or at least a good start)

Click on the course name for availability and further information. For best results, take the courses in the recommended order shown below. Courses 1 and 2 can be taken in either order. Course 3 builds on courses 1 and 2. Course 4 builds on courses 1, 2, and 3.

1 Apache Hadoop, Spark and Big Data Foundations - A great non-programming introduction to the Hadoop Big Data ecosystem, covering Hadoop, Spark, HDFS, and MapReduce. (3 hours - 1 day)

2 Practical Linux Command Line for Data Engineers and Analysts - Quickly learn the essentials of using the Linux command line on Hadoop/Spark clusters. Move files, run applications, write scripts, and navigate the Linux command line interface used on almost all modern analytics clusters. (3 hours - 1 day)

3 Hands-on Introduction to Apache Hadoop and Spark Programming - A hands-on introduction to using Hadoop, Pig, Hive, Sqoop, Spark and Zeppelin notebooks. Students can download and run examples on a “Hadoop Minimal” virtual machine. (6 hours - 2 days)

4 Scalable Data Science with Hadoop and Spark - Learn how to apply Hadoop and Spark tools to predict airline delays. All programming will be done using Hadoop and Spark with the Zeppelin web notebook on a four-node cluster. The notebook will be made available for download so students can reproduce the examples. (3 hours - 1 day)

Class Notes for Hands-on Introduction to Apache Hadoop and Spark Programming

(Updated 03-June-2019)

Class Notes for Practical Linux Command Line for Data Engineers and Analysts

(Updated 19-Mar-2019)

DOS to Linux and Hadoop HDFS Help:

Linux Hadoop Minimal Virtual Machine

(Current Version 0.42, 03-June-2019) Note: This VM can also be used for the Hadoop and Spark Fundamentals: LiveLessons video mentioned below.

Other Resources for all Classes


For further questions or help with the Linux Hadoop Minimal Virtual Machine please email

Unless otherwise noted, all course content, notes, and examples © Copyright Basement Supercomputing 2019, All rights reserved.

start.1560261481.txt.gz · Last modified: 2019/06/11 13:58 by deadline