start [2019/06/11 19:20]
deadline
start [2019/12/04 14:10]
deadline added Python Zeppelin notebook
=====Welcome to Scalable Analytics with Apache Hadoop and Spark=====
  
**(The four essential courses on the path to
====Course Descriptions and Links====
  
Click on the course name for availability and further information. For best results, courses should be taken in the recommended order (shown below). Courses 1 and 2 can be taken in either order. Course 3 builds on courses 1 and 2. Course 4 builds on, and assumes competence with, topics in courses 1, 2, and 3.
  
**NOTE:** If a course link does not lead you to the class, the class has not yet been scheduled; check back at a later date. Two new courses in the series, including coverage of Kafka and Data Engineering, are coming in the new year.
  
| 1 | [[https://www.safaribooksonline.com/search/?query=Apache%20Hadoop%2C%20Spark%20and%20Big%20Data%20Foundations&field=title|Apache Hadoop, Spark and Big Data Foundations]] - A great introduction to the Hadoop Big Data ecosystem. A non-programming introduction to Hadoop, Spark, HDFS, and MapReduce. (3 hours, 1 day)|{{wiki:foundations-course.png}}|
| 2 | [[https://www.oreilly.com/search/?query=Practical%20Linux%20Command%20Line%20for%20Data%20Engineers%20and%20Analysts%20Eadline|Practical Linux Command Line for Data Engineers and Analysts]] - Quickly learn the essentials of using the Linux command line on Hadoop/Spark clusters. Move files, run applications, write scripts, and navigate the Linux command line interface used on almost all modern analytics clusters. Students can download and run examples on the "Linux Hadoop Minimal" virtual machine (see below). (3 hours, 1 day)|{{wiki: command-line-course.png}}|
| 3 | [[https://www.safaribooksonline.com/search/?query=Hands-on%20Introduction%20to%20Apache%20Hadoop%20and%20Spark%20Programming&field=title|Hands-on Introduction to Apache Hadoop and Spark Programming]] - A hands-on introduction to using Hadoop, Pig, Hive, Sqoop, Spark, and Zeppelin notebooks. Students can download and run examples on the "Linux Hadoop Minimal" virtual machine (see below). (6 hours, 2 days)|{{wiki: hands-on-course.png}}|
| 4 | [[https://www.oreilly.com/search/?query=Scalable%20Data%20Science%20with%20Hadoop%20and%20Spark%20Eadline|Scalable Data Science with Hadoop and Spark]] - Learn how to apply Hadoop and Spark tools to predict airline delays. All programming will be done using Hadoop and Spark with the Zeppelin web notebook on a four-node cluster. The notebook will be made available for download so students can reproduce the examples. (3 hours, 1 day)|{{wiki: scalable-DS-course.png}}|
  
----
  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hands_On_Hadoop_Spark-V1.5.tgz|Class Notes]] (tgz format)
  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hands_On_Hadoop_Spark-V1.5.zip|Class Notes]] (zip format)
  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Using-Python-Zeppelin.json|Example Zeppelin notebook]]
  
===Class Notes for Practical Linux Command Line for Data Engineers and Analysts===
  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Command-Line-V1.0.tgz|Class Notes]] (tgz format)
  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Command-Line-V1.0.zip|Class Notes]] (zip format)

===Zeppelin Notebook for Scalable Data Science with Hadoop and Spark===
(Updated 20-Aug-2019)

  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Scalable-Analytics.json|Scalable-Analytics.json]]
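
Before importing one of the Zeppelin notebook files above, it can help to see which interpreters it uses. The sketch below is a hypothetical helper, not part of the course materials; it assumes the ''.json'' file follows Zeppelin's usual note export layout: a top-level ''paragraphs'' list whose entries keep their source in a ''text'' field, optionally starting with an interpreter directive such as ''%pyspark'', ''%sh'', or ''%md''.

```python
import json
from pathlib import Path
from typing import List, Tuple


def list_paragraphs(note_path: Path) -> List[Tuple[str, str]]:
    """Return (interpreter directive, first line of code) for each
    non-empty paragraph in an exported Zeppelin note.

    ASSUMPTION: the note is JSON with a top-level "paragraphs" list whose
    entries keep their source in a "text" field; a leading "%name" token
    (e.g. %pyspark, %sh, %md) selects the interpreter.
    """
    note = json.loads(note_path.read_text(encoding="utf-8"))
    summary: List[Tuple[str, str]] = []
    for para in note.get("paragraphs", []):
        text = (para.get("text") or "").strip()
        if not text:
            continue  # skip empty paragraphs, e.g. a trailing blank one
        head, _, rest = text.partition("\n")
        if head.startswith("%"):
            # The directive may stand alone ("%pyspark") or share a line ("%md # Title").
            parts = head.split(None, 1)
            directive = parts[0]
            inline = parts[1] if len(parts) > 1 else ""
            body = (inline + "\n" + rest).strip()
        else:
            directive, body = "(default)", text
        preview = body.splitlines()[0] if body else ""
        summary.append((directive, preview))
    return summary
```

For example, ''list_paragraphs(Path("Scalable-Analytics.json"))'' would list each paragraph's interpreter and the first line of its code; paragraphs without a directive run under the note's default interpreter.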
  
----
  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/DOS-Linux-HDFS-cheatsheet.pdf|DOS to Linux/HDFS Cheat-sheet]]
  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/ericg_vi-editor.bw.pdf|vi (visual editor) Cheat-sheet]]
  * [[https://www.cs.colostate.edu/helpdocs/vi.html|Additional help with vi]]
  
----

====Linux Hadoop Minimal (LHM) Virtual Machine Sandbox====

(Current Version 0.42, 03-June-2019) **Not yet ready for Scalable Data Science with Hadoop and Spark (coming soon)**

Used with the //Hands-on// and //Command Line// courses above, and intended for the //Scalable Data Science// course once ready. Note: This VM can also be used with the //Hadoop and Spark Fundamentals: LiveLessons// video mentioned below.
  * [[Linux Hadoop Minimal Installation Instructions]] (Read First)
  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-0.42.MD5.txt|Linux Hadoop Minimal MD5 checksum]]
  * Linux Hadoop Minimal Virtual Machine OVA file (3.3G): [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-0.42.ova|US mirror]], [[http://134.209.239.225/download/Linux-Hadoop-Minimal-0.42.ova|Europe mirror]]
  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/old|Old Versions]]
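
The MD5 file lets you confirm that the 3.3G OVA download arrived intact before importing it into your virtualization software. The following is a minimal Python sketch of that check, assuming the checksum file holds a single 32-character hexadecimal MD5 digest (as written by ''md5sum'' or a BSD-style ''md5''); the helper names ''md5_of_file'', ''expected_md5'', and ''verify'' are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# Matches one 32-character hexadecimal MD5 digest anywhere in the text.
_MD5_RE = re.compile(r"\b[0-9a-fA-F]{32}\b")


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in 1 MiB chunks so a
    multi-gigabyte OVA never has to fit in memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_md5(md5_txt: Path) -> str:
    """Extract the published digest from the checksum file.

    ASSUMPTION: the file holds a single MD5 digest, e.g. in md5sum
    ("<hash>  <file>") or BSD ("MD5 (<file>) = <hash>") layout.
    """
    match = _MD5_RE.search(md5_txt.read_text())
    if match is None:
        raise ValueError(f"no MD5 digest found in {md5_txt}")
    return match.group(0).lower()


def verify(ova: Path, md5_txt: Path) -> bool:
    """True if the downloaded file matches the published checksum."""
    return md5_of_file(ova) == expected_md5(md5_txt)
```

For example, ''verify(Path("Linux-Hadoop-Minimal-0.42.ova"), Path("Linux-Hadoop-Minimal-0.42.MD5.txt"))'' should return ''True'' for a good download. On Linux, running ''md5sum'' on the OVA file produces the same digest from the command line.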
  
----

====Cloudera-Hortonworks HDP Sandbox====

The Cloudera-Hortonworks HDP Sandbox is a full-featured Hadoop/Spark virtual machine that runs under Docker, VirtualBox, or VMware. Please see [[https://www.cloudera.com/downloads/hortonworks-sandbox.html|Cloudera/Hortonworks HDP Sandbox]] for more information. Because it includes a large number of applications, the HDP Sandbox can require substantial resources to run.

----

====Zeppelin Web Notebook====
For those taking the //Scalable Data Science// course, a 30-day web-based Zeppelin notebook is available from [[https://www.basement-supercomputing.com|Basement Supercomputing]]. Please use the [[Sign Up Form]] to get access to the notebook.

----

====Other Resources for all Classes====
  * Book: [[https://www.clustermonkey.net/Hadoop2-Quick-Start-Guide/|Hadoop® 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop® 2 Ecosystem]]
  * Video Tutorial: [[https://www.safaribooksonline.com/library/view/hadoop-and-spark/9780134770871|Hadoop® and Spark Fundamentals: LiveLessons]]
  * Book: [[https://www.clustermonkey.net/Practical-Data-Science-with-Hadoop-and-Spark|Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale]]
  
----
start.txt · Last modified: 2024/01/29 21:19 by deadline