Differences

This shows you the differences between two versions of the page.

--- start [2019/06/12 17:03]
deadline added mirror link
+++ start [2019/12/04 14:10]
deadline added Python Zeppelin notebook
@@ Line 6: / Line 6: @@
 ====Course Descriptions and Links====
-Click on the course name for availability and further information. For best results, courses should be taken in the recommended order (shown below).  Courses 1 and 2 can be taken out of order. Course 3 builds on course 1 and 2. Course 4 builds on course 3, 2, and 1.
+Click on the course name for availability and further information. For best results, courses should be taken in the recommended order (shown below).  Courses 1 and 2 can be taken out of order. Course 3 builds on courses 1 and 2. Course 4 builds on and assumes competence with topics in courses 3, 2, and 1.
+**NOTE:** If the link does not lead you to the class, it has not yet been scheduled. Check back at a future date. Also, two new courses in the series are coming in the new year (including Kafka coverage
+and Data Engineering).
 | 1 | [[https://www.safaribooksonline.com/search/?query=Apache%20Hadoop%2C%20Spark%20and%20Big%20Data%20Foundations&field=title|Apache Hadoop, Spark and Big Data Foundations]] - A great introduction to the Hadoop Big Data Ecosystem. A non-programming introduction to Hadoop, Spark, HDFS, and MapReduce. (3 hours-1 day)|{{wiki:foundations-course.png}}|
@@ Line 19: / Line 22: @@
   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hands_On_Hadoop_Spark-V1.5.tgz|Class Notes]] (tgz format)
   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hands_On_Hadoop_Spark-V1.5.zip|Class Notes]] (zip format)
+  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Using-Python-Zeppelin.json| Example Zeppelin notebook]]
 ===Class Notes for Practical Linux Command Line for Data Engineers and Analysts===
@@ Line 24: / Line 28: @@
   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Command-Line-V1.0.tgz|Class Notes]] (tgz format)
   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Command-Line-V1.0.zip|Class Notes]] (zip format)
+====Zeppelin Notebook for Scalable Data Science with Hadoop and Spark====
+(Updated 20-Aug-2019)
+  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Scalable-Analytics.json|Scalable-Analytics.json]]
 ----
@@ Line 30: / Line 39: @@
   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/DOS-Linux-HDFS-cheatsheet.pdf|DOS to Linux/HDFS Cheat-sheet]]
   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/ericg_vi-editor.bw.pdf|vi (visual editor) Cheat-sheet]]
+  * [[https://www.cs.colostate.edu/helpdocs/vi.html|Additional help with vi]]
 ----
-====Linux Hadoop Minimal Virtual Machine====
-(Current Version 0.42, 03-June-2019) Used for "Hands-on" and "Command line" courses above. Note: This VM can also be used for the //Hadoop and Spark Fundamentals: LiveLessons// video mentioned below.
+====Linux Hadoop Minimal (LHM) Virtual Machine Sandbox====
+(Current Version 0.42, 03-June-2019) **Not ready for Scalable Data Science with Hadoop and Spark (soon)**
+Used for //Hands-on//, //Command Line//, and //Scalable Data Science// courses above. Note: This VM can also be used for the //Hadoop and Spark Fundamentals: LiveLessons// video mentioned below.
   * [[Linux Hadoop Minimal Installation Instructions]] (Read First)
   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-0.42.MD5.txt|Linux Hadoop Minimal MD5]]
   * Linux Hadoop Minimal Virtual Machine OVA file [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-0.42.ova|US]] [[http://134.209.239.225/download/Linux-Hadoop-Minimal-0.42.ova|Europe]] (3.3G)
-  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/old|Old Versions]
+  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/old|Old Versions]]
 ----
+====Cloudera-Hortonworks HDP Sandbox====
+The Cloudera-Hortonworks HDP Sandbox is a full-featured Hadoop/Spark virtual machine that runs under Docker, VirtualBox, or VMware. Please see [[https://www.cloudera.com/downloads/hortonworks-sandbox.html|Cloudera/Hortonworks HDP Sandbox]] for more information. Because of the number of applications it includes, the HDP Sandbox can require substantial resources to run.
+----
+====Zeppelin Web Notebook====
+For those taking the //Scalable Data Science// course, a 30-day web-based Zeppelin Notebook is available from [[https://www.basement-supercomputing.com|Basement Supercomputing]]. Please use the [[Sign Up Form]] to get access to the notebook.
+----
 ====Other Resources for all Classes====
   * Book: [[https://www.clustermonkey.net/Hadoop2-Quick-Start-Guide/| Hadoop® 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop® 2 Ecosystem]]
   * Video Tutorial: [[https://www.safaribooksonline.com/library/view/hadoop-and-spark/9780134770871|Hadoop® and Spark Fundamentals: LiveLessons]]
   * Book: [[https://www.clustermonkey.net/Practical-Data-Science-with-Hadoop-and-Spark|Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale]]
 ----
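
The page lists an MD5 file alongside the 3.3G Linux Hadoop Minimal OVA download. One way to check a downloaded OVA against that value is sketched below in Python. This is only a sketch: the layout of the published MD5 text file is not shown on this page, so the expected digest is assumed to be copied out of that file by hand and passed in as a string. The function names ''md5_of_file'' and ''matches_expected'' are hypothetical helpers, not part of any course material.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reads keep memory use small for multi-gigabyte files
    such as a virtual machine image.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's MD5 with an expected digest string.

    The expected value is assumed to be copied by hand from the
    published MD5 text file; surrounding whitespace and letter
    case are ignored.
    """
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, calling ''matches_expected("Linux-Hadoop-Minimal-0.42.ova", digest)'' with the digest copied from the MD5 file should return ''True'' for an intact download, assuming the OVA was saved under that name in the current directory.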

Live On-Line Training: Scalable Data Pipelines with Hadoop, Spark, and Kafka
