start [2019/06/12 17:03] deadline added mirror link |
start [2019/12/04 14:10] deadline added Python Zeppelin notebook |
====Course Descriptions and Links==== | ====Course Descriptions and Links==== |
| |
Click on the course name for availability and further information. For best results, courses should be taken in the recommended order (shown below). Courses 1 and 2 can be taken out of order. Course 3 builds on course 1 and 2. Course 4 builds on course 3, 2, and 1. | Click on the course name for availability and further information. For best results, courses should be taken in the recommended order (shown below). Courses 1 and 2 can be taken out of order. Course 3 builds on courses 1 and 2. Course 4 builds on and assumes competence with topics in courses 3, 2, and 1. |
| |
| **NOTE:** If the link does not lead you to the class, it has not yet been scheduled. Check back at a future date. Also, two new courses in the series (including Kafka coverage and Data Engineering) are coming in the new year. |
| |
| 1 | [[https://www.safaribooksonline.com/search/?query=Apache%20Hadoop%2C%20Spark%20and%20Big%20Data%20Foundations&field=title|Apache Hadoop, Spark and Big Data Foundations]] - A great introduction to the Hadoop Big Data Ecosystem. A non-programming introduction to Hadoop, Spark, HDFS, and MapReduce. (3 hours-1 day)|{{wiki:foundations-course.png}}| | | 1 | [[https://www.safaribooksonline.com/search/?query=Apache%20Hadoop%2C%20Spark%20and%20Big%20Data%20Foundations&field=title|Apache Hadoop, Spark and Big Data Foundations]] - A great introduction to the Hadoop Big Data Ecosystem. A non-programming introduction to Hadoop, Spark, HDFS, and MapReduce. (3 hours-1 day)|{{wiki:foundations-course.png}}| |
* [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hands_On_Hadoop_Spark-V1.5.tgz|Class Notes]] (tgz format) | * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hands_On_Hadoop_Spark-V1.5.tgz|Class Notes]] (tgz format) |
* [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hands_On_Hadoop_Spark-V1.5.zip|Class Notes]] (zip format) | * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hands_On_Hadoop_Spark-V1.5.zip|Class Notes]] (zip format) |
| * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Using-Python-Zeppelin.json| Example Zeppelin notebook]] |
| |
===Class Notes for Practical Linux Command Line for Data Engineers and Analysts=== | ===Class Notes for Practical Linux Command Line for Data Engineers and Analysts=== |
* [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Command-Line-V1.0.tgz|Class Notes]] (tgz format) | * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Command-Line-V1.0.tgz|Class Notes]] (tgz format) |
* [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Command-Line-V1.0.zip|Class Notes]] (zip format) | * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Command-Line-V1.0.zip|Class Notes]] (zip format) |
| |
| ===Zeppelin Notebook for Scalable Data Science with Hadoop and Spark=== |
| (Updated 20-Aug-2019) |
| |
| * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Scalable-Analytics.json|Scalable-Analytics.json]] |
| |
---- | ---- |
* [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/DOS-Linux-HDFS-cheatsheet.pdf|DOS to Linux/HDFS Cheat-sheet]] | * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/DOS-Linux-HDFS-cheatsheet.pdf|DOS to Linux/HDFS Cheat-sheet]] |
* [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/ericg_vi-editor.bw.pdf|vi (visual editor) Cheat-sheet]] | * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/ericg_vi-editor.bw.pdf|vi (visual editor) Cheat-sheet]] |
| * [[https://www.cs.colostate.edu/helpdocs/vi.html|Additional help with vi]] |
| |
---- | ---- |
====Linux Hadoop Minimal Virtual Machine==== | |
| |
(Current Version 0.42, 03-June-2019) Used for "Hands-on" and "Command line" courses above. Note: This VM can also be used for the //Hadoop and Spark Fundamentals: LiveLessons// video mentioned below. | ====Linux Hadoop Minimal (LHM) Virtual Machine Sandbox==== |
| |
| (Current Version 0.42, 03-June-2019) **Not yet ready for Scalable Data Science with Hadoop and Spark (coming soon)** |
| |
| Used for //Hands-on//, //Command Line//, and //Scalable Data Science// courses above. Note: This VM can also be used for the //Hadoop and Spark Fundamentals: LiveLessons// video mentioned below. |
* [[Linux Hadoop Minimal Installation Instructions]] (Read First) | * [[Linux Hadoop Minimal Installation Instructions]] (Read First) |
* [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-0.42.MD5.txt|Linux Hadoop Minimal MD5]] | * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-0.42.MD5.txt|Linux Hadoop Minimal MD5]] |
* Linux Hadoop Minimal Virtual Machine OVA file [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-0.42.ova|US]] [[http://134.209.239.225/download/Linux-Hadoop-Minimal-0.42.ova|Europe]] (3.3G) | * Linux Hadoop Minimal Virtual Machine OVA file [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-0.42.ova|US]] [[http://134.209.239.225/download/Linux-Hadoop-Minimal-0.42.ova|Europe]] (3.3G) |
* [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/old|Old Versions] | * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/old|Old Versions]] |
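Because the OVA file is large (3.3G), it may be worth checking the download against the published MD5 before importing it. The following is a minimal sketch, not part of the course materials: it assumes the MD5 text file follows the common ''md5sum'' layout (hex digest first, then the file name), and the function names ''md5_of_file'', ''expected_md5'', and ''verify_download'' are hypothetical helpers chosen for illustration.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so a multi-gigabyte OVA never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_file):
    """Return the first whitespace-separated token of the checksum file.
    Assumption: the file uses the md5sum layout 'digest  filename'."""
    return Path(md5_file).read_text().split()[0].lower()


def verify_download(ova_path, md5_file):
    """True when the computed digest matches the published one."""
    return md5_of_file(ova_path) == expected_md5(md5_file)


# Example call (file names taken from the links above, saved locally):
# verify_download("Linux-Hadoop-Minimal-0.42.ova",
#                 "Linux-Hadoop-Minimal-0.42.MD5.txt")
```

If the check returns ''False'', the download is probably incomplete or corrupted and should be fetched again, possibly from the other mirror.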
---- | ---- |
| |
| ====Cloudera-Hortonworks HDP Sandbox==== |
| |
| The Cloudera-Hortonworks HDP Sandbox is a full-featured Hadoop/Spark virtual machine that runs under Docker, VirtualBox, or VMware. Please see [[https://www.cloudera.com/downloads/hortonworks-sandbox.html|Cloudera/Hortonworks HDP Sandbox]] for more information. Because of the number of applications it includes, the HDP Sandbox can require substantial resources to run. |
| |
| ---- |
| |
| ====Zeppelin Web Notebook==== |
| For those taking the //Scalable Data Science// course, a 30-day web-based Zeppelin Notebook is available from [[https://www.basement-supercomputing.com|Basement Supercomputing]]. Please use the [[Sign Up Form]] to get access to the notebook. |
| |
| ---- |
| |
====Other Resources for all Classes==== | ====Other Resources for all Classes==== |
* Book: [[https://www.clustermonkey.net/Hadoop2-Quick-Start-Guide/| Hadoop® 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop® 2 Ecosystem]] | * Book: [[https://www.clustermonkey.net/Hadoop2-Quick-Start-Guide/| Hadoop® 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop® 2 Ecosystem]] |
* Video Tutorial: [[https://www.safaribooksonline.com/library/view/hadoop-and-spark/9780134770871|Hadoop® and Spark Fundamentals: LiveLessons]] | * Video Tutorial: [[https://www.safaribooksonline.com/library/view/hadoop-and-spark/9780134770871|Hadoop® and Spark Fundamentals: LiveLessons]] |
* Book: [[https://www.clustermonkey.net/Practical-Data-Science-with-Hadoop-and-Spark|Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale]] | * Book: [[https://www.clustermonkey.net/Practical-Data-Science-with-Hadoop-and-Spark|Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale]] |
| |
| |
---- | ---- |