start

Differences

This shows you the differences between two versions of the page.

start [2019/06/11 20:23]
deadline tweaked content
start [2019/12/04 14:15] (current)
deadline removed notebook
Line 1: Line 1:
-=====Welcome to the Scalable Analytics with Apache Hadoop and Spark=====+=====Welcome to Scalable Analytics with Apache Hadoop and Spark=====
  
 **(The four essential courses on the path to **(The four essential courses on the path to
Line 6: Line 6:
 ====Course Descriptions and Links==== ====Course Descriptions and Links====
  
-Click on the course name for availability and further information. For best results, courses should be taken in the recommended order (shown below).  Courses 1 and 2 can be taken out of order. Course 3 builds on course 1 and 2. Course 4 builds on course 3, 2, and 1.  +Click on the course name for availability and further information. For best results, courses should be taken in the recommended order (shown below).  Courses 1 and 2 can be taken out of order. Course 3 builds on courses 1 and 2. Course 4 builds on and assumes competence with the topics in courses 3, 2, and 1.  
 + 
 +**NOTE:** If a link does not lead you to the class, it has not yet been scheduled. Check back at a future date. Also, two new courses in the series are coming in the new year (including Kafka coverage and Data Engineering).
  
 | 1 | [[https://​www.safaribooksonline.com/​search/?​query=Apache%20Hadoop%2C%20Spark%20and%20Big%20Data%20Foundations&​field=title|Apache Hadoop, Spark and Big Data Foundations]] - A great introduction to the Hadoop Big Data Ecosystem. A non-programming introduction to Hadoop, Spark, HDFS, and MapReduce. (3 hours-1 day)|{{wiki:​foundations-course.png}}| | 1 | [[https://​www.safaribooksonline.com/​search/?​query=Apache%20Hadoop%2C%20Spark%20and%20Big%20Data%20Foundations&​field=title|Apache Hadoop, Spark and Big Data Foundations]] - A great introduction to the Hadoop Big Data Ecosystem. A non-programming introduction to Hadoop, Spark, HDFS, and MapReduce. (3 hours-1 day)|{{wiki:​foundations-course.png}}|
Line 24: Line 27:
   * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​Linux-Command-Line-V1.0.tgz|Class Notes]] (tgz format)   * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​Linux-Command-Line-V1.0.tgz|Class Notes]] (tgz format)
   * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​Linux-Command-Line-V1.0.zip|Class Notes]] (zip format)   * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​Linux-Command-Line-V1.0.zip|Class Notes]] (zip format)
 +
 +====Zeppelin Notebook for Scalable Data Science with Hadoop and Spark====
 +(Updated 20-Aug-2019)
 +
 +  * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​Scalable-Analytics.json|Scalable-Analytics.json]]
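Once the notebook file has been downloaded, it can be inspected before importing it into Zeppelin. The sketch below is illustrative only: it assumes the file follows the usual Zeppelin note layout (a top-level ''name'' and a ''paragraphs'' list whose entries carry a ''text'' field), which has not been checked against this particular file. The function name ''summarize_note'' is made up for this example.

```python
import json


def summarize_note(note_text):
    """Return (note name, first line of each paragraph) from Zeppelin note JSON.

    Assumes the common Zeppelin note layout; missing fields fall back to "".
    """
    note = json.loads(note_text)
    first_lines = []
    for paragraph in note.get("paragraphs", []):
        # A paragraph's text may be absent or null, so normalize it to a string.
        text = (paragraph.get("text") or "").strip()
        first_lines.append(text.splitlines()[0] if text else "")
    return note.get("name", ""), first_lines
```

For example, ''summarize_note(open("Scalable-Analytics.json", encoding="utf-8").read())'' would list the first line of each paragraph, which in Zeppelin is typically the interpreter directive.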
  
 ---- ----
Line 30: Line 38:
   * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​DOS-Linux-HDFS-cheatsheet.pdf|DOS to Linux/HDFS Cheat-sheet]]   * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​DOS-Linux-HDFS-cheatsheet.pdf|DOS to Linux/HDFS Cheat-sheet]]
   * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​ericg_vi-editor.bw.pdf|vi (visual editor) Cheat-sheet]]   * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​ericg_vi-editor.bw.pdf|vi (visual editor) Cheat-sheet]]
 +  * [[https://​www.cs.colostate.edu/​helpdocs/​vi.html|Additional help with vi]]
  
 ---- ----
-====Linux Hadoop Minimal Virtual Machine==== 
  
-(Current Version 0.42, 03-June-2019) Used for "Hands-on" and "Command ​line" ​courses above. Note: This VM can also be used for the //Hadoop and Spark Fundamentals:​ LiveLessons//​ video mentioned below.+====Linux Hadoop Minimal (LHM) Virtual Machine Sandbox==== 
 + 
 +(Current Version 0.42, 03-June-2019) **Not yet ready for Scalable Data Science with Hadoop and Spark (coming soon)** 
 + 
 +Used for //Hands-on//, //Command ​Line//, and //Scalable Data Science// ​courses above. Note: This VM can also be used for the //Hadoop and Spark Fundamentals:​ LiveLessons//​ video mentioned below.
   * [[Linux Hadoop Minimal Installation Instructions]] (Read First) ​   * [[Linux Hadoop Minimal Installation Instructions]] (Read First) ​
   * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​Linux-Hadoop-Minimal-0.42.MD5.txt|Linux Hadoop Minimal MD5]]   * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​Linux-Hadoop-Minimal-0.42.MD5.txt|Linux Hadoop Minimal MD5]]
-  * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​Linux-Hadoop-Minimal-0.42.ova|Linux Hadoop Minimal ​Virtual Machine OVA file]] (3.3G in size)+  * Linux Hadoop Minimal Virtual Machine OVA file [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​Linux-Hadoop-Minimal-0.42.ova|US]] [[http://​134.209.239.225/​download/​Linux-Hadoop-Minimal-0.42.ova|Europe]] (3.3G)
   * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​old|Old Versions]]   * [[https://​www.clustermonkey.net/​download/​Hands-on_Hadoop_Spark/​old|Old Versions]]
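Because the OVA file is large (3.3G), it is worth checking the download against the published MD5 file before importing it. The following is a minimal sketch, not the course's official procedure. It assumes the MD5 text file contains the 32-character hex digest somewhere in its text (for example in ''md5sum'' style, digest followed by file name); the actual layout of that file has not been verified. The function names are made up for this example.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_text):
    """Return the first 32-character hex token found in the checksum text.

    Assumption: the published MD5 file contains the digest as a separate token.
    """
    hex_digits = set("0123456789abcdefABCDEF")
    for token in md5_text.split():
        if len(token) == 32 and set(token) <= hex_digits:
            return token.lower()
    raise ValueError("no MD5 digest found in checksum text")


def verify(ova_path, md5_path):
    """Return True when the file's MD5 digest matches the published one."""
    return md5_of_file(ova_path) == expected_md5(Path(md5_path).read_text())
```

With both files in the current directory, ''verify("Linux-Hadoop-Minimal-0.42.ova", "Linux-Hadoop-Minimal-0.42.MD5.txt")'' should return ''True'' for an intact download. MD5 only guards against corrupted or truncated transfers; it is not a defense against deliberate tampering.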
  
 ---- ----
 +
 +====Cloudera-Hortonworks HDP Sandbox====
 +
 +The Cloudera-Hortonworks HDP Sandbox is a full-featured Hadoop/Spark virtual machine that runs under Docker, VirtualBox, or VMware. Please see [[https://www.cloudera.com/downloads/hortonworks-sandbox.html|Cloudera/Hortonworks HDP Sandbox]] for more information. Because of the number of applications it includes, the HDP Sandbox can require substantial resources to run. 
 +
 +----
 +
 +====Zeppelin Web Notebook====
 +For those taking the //Scalable Data Science// course, a 30-day web-based Zeppelin Notebook is available from [[https://www.basement-supercomputing.com|Basement Supercomputing]]. Please use the [[Sign Up Form]] to get access to the notebook. 
 +
 +----
 +
 ====Other Resources for all Classes==== ====Other Resources for all Classes====
   * Book: [[https://​www.clustermonkey.net/​Hadoop2-Quick-Start-Guide/​| Hadoop® 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop® 2 Ecosystem]]   * Book: [[https://​www.clustermonkey.net/​Hadoop2-Quick-Start-Guide/​| Hadoop® 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop® 2 Ecosystem]]
   * Video Tutorial: [[https://​www.safaribooksonline.com/​library/​view/​hadoop-and-spark/​9780134770871|Hadoop® and Spark Fundamentals:​ LiveLessons]]   * Video Tutorial: [[https://​www.safaribooksonline.com/​library/​view/​hadoop-and-spark/​9780134770871|Hadoop® and Spark Fundamentals:​ LiveLessons]]
   * Book: [[https://​www.clustermonkey.net/​Practical-Data-Science-with-Hadoop-and-Spark|Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale]]   * Book: [[https://​www.clustermonkey.net/​Practical-Data-Science-with-Hadoop-and-Spark|Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale]]
- 
  
 ---- ----
start.1560284606.txt.gz · Last modified: 2019/06/11 20:23 by deadline