Differences

This shows you the differences between two versions of the page.

--- start [2024/01/07 23:17]
deadline [Class Notes for Scalable PySpark for Data Science] added notebook
+++ start [2026/02/09 21:32] (current)
deadline [About the Presenter]
@@ Line 29: / Line 29: @@
 Contact: ''deadline''(you know what goes here)''eadline''(and here)''org''\\
-Mast: @thedeadline@mast.hpc.social \\
-Twitter: @thedeadline
+  * Mast: @thedeadline@mast.hpc.social \\
+  * Twitter: @thedeadline
+  * Bluesky: @thedeadline.bsky.social
 ----
@@ Line 73: / Line 75: @@
   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Getting-Started-Kafka-V2.1.tgz|Class Notes]] (tgz format)
   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Getting-Started-Kafka-V2.1.zip|Class Notes]] (zip format)
+  * Additional [[https://www.clustermonkey.net/download/Eadline/Lehigh/Week-01/Install-KafkaEsque-Local-Mac-M.pdf|note]] for running KafkaEsque on Apple M-series systems (Linux virtual machines running under UTM)
 ====Class Notes for Kafka Methods and Administration ====
@@ Line 85: / Line 88: @@
   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Scalable-PySpark-v1.tgz|Class Notes]] (tgz format)
   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Scalable-PySpark-v1.zip|Class Notes]] (zip format)
-  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Scalable_PySpark_with_CSV_Files_and_Hive_Tables.json| PySpark for Data Science Zeppelin Notebook]] (Right Click, Save Link As ...)
+  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Zeppelin-Notebooks/Scalable_PySpark_with_CSV_Files_and_Hive_Tables.json| PySpark for Data Science Zeppelin Notebook]] (Right Click, Save Link As ...)
 === Old Notes ====
@@ Line 113: / Line 116: @@
 Used for //Hands-on//, //Command Line//, and //Scalable Data Science// trainings above. Note: This VM can also be used for the //Hadoop and Spark Fundamentals: LiveLessons// video mentioned below.
-===VERSION 2-beta8: (Current)===
+===VERSION 3.0-beta-2: (Current)===
-=== IMPORTANT: VirtualBox will not work on the new Apple M1 based systems ====
+(Updated Jan-25-2024)
-(Updated Aug-08-2022)
+[[Linux Hadoop Minimal Installation Instructions VERSION 3]]
-CentOS Linux 7.6, Anaconda 3:Python 3.7.4, R 3.6.0, Hadoop 3.3.0, Hive 3.1.2, Apache Spark 2.4.5, Derby 10.14.2.0, Zeppelin 0.8.2, Sqoop 1.4.7, Kafka 2.5.0, HBase 2.4.10, NiFi 1.17.0, KafkaEsque. **Used in all current trainings.**
+Contents: Rocky Linux 9.7, Python 3.9.25, R 4.5.2, Hadoop 3.3.6, Hive 4.0.1, Apache Spark 3.5.6, Derby 10.14.2.0, Zeppelin 0.11.2, Sqoop 1.4.7, Kafka 3.4.1, HBase 2.6.2, NiFi 1.17.0, KafkaEsque. **Used in all classes, trainings, and workshops after January 1, 2026.**
+===VERSION 2-8.1: (Previous, no longer supported)===
+(Updated Jan-25-2024)
+[[Linux Hadoop Minimal Installation Instructions VERSION 2]]
+Contents: CentOS Linux 7.6, Anaconda 3: Python 3.7.4, R 3.6.0, Hadoop 3.3.0, Hive 3.1.2, Apache Spark 2.4.5, Derby 10.14.2.0, Zeppelin 0.8.2, Sqoop 1.4.7, Kafka 2.5.0, HBase 2.4.10, NiFi 1.17.0, KafkaEsque. **Used in all classes, trainings, and workshops prior to January 1, 2026.**
-  * [[Linux Hadoop Minimal Installation Instructions VERSION 2]] (Read First)
-  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-V2.0-beta8.MD5.txt|Linux Hadoop Minimal V2.0-beta8 MD5]]
-  * Linux Hadoop Minimal Virtual Machine V2.0-beta8 OVA file [[http://161.35.229.207/download/Linux-Hadoop-Minimal-V2.0-beta8.ova|US]] [[http://134.209.239.225/download/Linux-Hadoop-Minimal-V2.0-beta8.ova|Europe]] (11.0G) **NOTE:** Chrome may prevent //http// downloads, right click the link, choose "Save Link As" then click "Keep" next to the blue discard box at the bottom of the browser.
-  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hadoop-Minimal-Install-Notes-V2-beta8.tgz|Hadoop Minimal Build Notes (tgz format)]]
-----
-====Cloudera-Hortonworks HDP Sandbox====
-The Cloudera-Hortonworks HDP Sandbox, a full featured Hadoop/Spark virtual machine that runs under Docker, VirtualBox, or VMWare. Please see [[https://www.cloudera.com/downloads/hortonworks-sandbox.html|Cloudera/Hortonworks HDP Sandbox]] for more information. Due to the number of applications the HDP Sandbox can require substantial resources to run.
-----
-/*
-====Zeppelin Web Notebook====
-For those taking the //Scalable Data Science// training a 30-day web-based Zeppelin Notebook is available from [[https://www.basement-supercomputing.com|Basement Supercomputing]]. Please use the [[Sign Up Form]] to get access to the notebook.
 ----
-*/
 ====Other Resources for all Classes====
   * Book: [[https://www.clustermonkey.net/Hadoop2-Quick-Start-Guide/| Hadoop® 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop® 2 Ecosystem]]
   * Book: [[https://www.clustermonkey.net/Practical-Data-Science-with-Hadoop-and-Spark|Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale]]
   *  Video Tutorial: [[https://www.oreilly.com/videos/data-engineering-foundations/9780137440580|Data Engineering Foundations Part 1: LiveLessons: Using Spark, Hive, and Hadoop® Tools]]
-  *  Video Tutorial (**NEW**): [[https://www.informit.com/store/data-engineering-foundations-part-2-building-data-pipelines-9780138086992|Data Engineering Foundations Part 2: Building Data Pipelines with Kafka and Nifi ]]
+  *  Video Tutorial: [[https://www.informit.com/store/data-engineering-foundations-part-2-building-data-pipelines-9780138086992|Data Engineering Foundations Part 2: Building Data Pipelines with Kafka and NiFi]]
+  * Video Tutorial (**NEW**): [[https://www.oreilly.com/library/view/kafka-essentials-livelessons/9780138176761/|Kafka Essentials LiveLessons: A Quick-Start for Building Effective Data Pipelines]]
 ----
@@ Line 151: / Line 154: @@
 ----
-**Unless otherwise noted, all training content, notes, and examples (c) Douglas Eadline 2019-2023 All rights reserved.**
+**Unless otherwise noted, all training content, notes, and examples (c) Douglas Eadline 2019-2024 All rights reserved.**

Live On-Line Training: Scalable Data Pipelines with Hadoop, Spark, and Kafka
