(The four essential courses on the path to scalable data science nirvana, or at least a good start.)
Click on a course name for availability and further information. For best results, take the courses in the recommended order shown below; courses 1 and 2 may be taken in either order. Course 3 builds on courses 1 and 2, and course 4 builds on courses 1, 2, and 3.
1. Apache Hadoop, Spark and Big Data Foundations - A non-programming introduction to the Hadoop big data ecosystem: Hadoop, Spark, HDFS, and MapReduce. (3 hours to 1 day)
2. Practical Linux Command Line for Data Engineers and Analysts - Quickly learn the essentials of the Linux command line on Hadoop/Spark clusters: move files, run applications, write scripts, and navigate the command-line interface used on almost all modern analytics clusters. (3 hours to 1 day)
3. Hands-on Introduction to Apache Hadoop and Spark Programming - A hands-on introduction to using Hadoop, Pig, Hive, Sqoop, Spark, and Zeppelin notebooks. Students can download and run the examples on a "Hadoop Minimal" virtual machine. (6 hours to 2 days)
4. Scalable Data Science with Hadoop and Spark - Learn how to apply Hadoop and Spark tools to predict airline delays. All programming is done using Hadoop and Spark with the Zeppelin web notebook on a four-node cluster; the notebook is available for download so students can reproduce the examples. (3 hours to 1 day)
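For a quick feel of what courses 1 and 2 cover, the classic word-count pipeline is a standard way to illustrate the MapReduce idea (map, shuffle, reduce) using nothing but the Linux command line. This is a sketch for illustration only, not course material; the file name and sample data below are made up.

```shell
# Create a small sample file (made-up data for illustration).
printf 'spark hadoop spark\nhive hadoop\n' > words.txt

# "map":     put each word on its own line
# "shuffle": sort so identical words are adjacent
# "reduce":  count each run of identical words, largest counts first
tr ' ' '\n' < words.txt | sort | uniq -c | sort -rn
```

On a real cluster the same pattern runs at scale: the input lives in HDFS and the map, shuffle, and reduce steps are distributed across nodes by Hadoop or Spark, which is where courses 3 and 4 pick up.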
(Updated 03-June-2019)
(Linux Hadoop Minimal VM, current version 0.42, 03-June-2019.) Note: this VM can also be used for the Hadoop and Spark Fundamentals: LiveLessons video mentioned below.
For further questions or help with the Linux Hadoop Minimal Virtual Machine please email d...@b...g.com
Unless otherwise noted, all course content, notes, and examples © Copyright Basement Supercomputing 2019, All rights reserved.