| start [2023/04/19 12:21] deadline [Class Notes for Linux Command Line Quick Start] removed | start [2025/07/11 13:34] (current) deadline [Other Resources for all Classes] added Kafka |
|   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Getting-Started-Kafka-V2.1.tgz|Class Notes]] (tgz format) |   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Getting-Started-Kafka-V2.1.tgz|Class Notes]] (tgz format) | 
|   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Getting-Started-Kafka-V2.1.zip|Class Notes]] (zip format) |   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Getting-Started-Kafka-V2.1.zip|Class Notes]] (zip format) | 
|   |   * Additional [[https://www.clustermonkey.net/download/Eadline/Lehigh/Week-01/Install-KafkaEsque-Local-Mac-M.pdf|note]] for running KafkaEsque on Apple M-based systems (Linux virtual machines running on UTM) | 
 |  | 
| ====Class Notes for Kafka Methods and Administration ==== | ====Class Notes for Kafka Methods and Administration ==== | 
|   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Kafka-Methods-and-Administration-V1.2.tgz|Class Notes]] (tgz format) |   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Kafka-Methods-and-Administration-V1.2.tgz|Class Notes]] (tgz format) | 
|   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Kafka-Methods-and-Administration-V1.2.zip|Class Notes]] (zip format) |   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Kafka-Methods-and-Administration-V1.2.zip|Class Notes]] (zip format) | 
|   |  | 
|   | ====Class Notes for Scalable PySpark for Data Science ==== | 
|   | (Updated **07-Jan-2024**)  | 
|   |   * [[First Steps for Scalable PySpark for Data Science]] | 
|   |   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Scalable-PySpark-v1.tgz|Class Notes]] (tgz format) | 
|   |   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Scalable-PySpark-v1.zip|Class Notes]] (zip format) | 
|   |   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Zeppelin-Notebooks/Scalable_PySpark_with_CSV_Files_and_Hive_Tables.json| PySpark for Data Science Zeppelin Notebook]] (Right Click, Save Link As ...) | 
 |  | 
| ==== Old Notes ==== | ==== Old Notes ==== | 
| Used for //Hands-on//, //Command Line//, and //Scalable Data Science// trainings above. Note: This VM can also be used for the //Hadoop and Spark Fundamentals: LiveLessons// video mentioned below. | Used for //Hands-on//, //Command Line//, and //Scalable Data Science// trainings above. Note: This VM can also be used for the //Hadoop and Spark Fundamentals: LiveLessons// video mentioned below. | 
 |  | 
| ===VERSION 2-beta8: (Current)===   | ===VERSION 2-8.1: (Current)===  | 
| === IMPORTANT: VirtualBox will not work on the new Apple M1-based systems === |  | 
 |  | 
| (Updated Aug-08-2022) |   | 
|   | (Updated Jan-25-2024) | 
| CentOS Linux 7.6, Anaconda 3:Python 3.7.4, R 3.6.0, Hadoop 3.3.0, Hive 3.1.2, Apache Spark 2.4.5, Derby 10.14.2.0, Zeppelin 0.8.2, Sqoop 1.4.7, Kafka 2.5.0, HBase 2.4.10, NiFi 1.17.0, KafkaEsque. **Used in all current trainings.** | CentOS Linux 7.6, Anaconda 3:Python 3.7.4, R 3.6.0, Hadoop 3.3.0, Hive 3.1.2, Apache Spark 2.4.5, Derby 10.14.2.0, Zeppelin 0.8.2, Sqoop 1.4.7, Kafka 2.5.0, HBase 2.4.10, NiFi 1.17.0, KafkaEsque. **Used in all current trainings.** | 
 |  | 
|   * [[Linux Hadoop Minimal Installation Instructions VERSION 2]] (Read First)   | [[Linux Hadoop Minimal Installation Instructions VERSION 2]] (Read First)   | 
|   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-V2.0-beta8.MD5.txt|Linux Hadoop Minimal V2.0-beta8 MD5]]  |   | 
|   * Linux Hadoop Minimal Virtual Machine V2.0-beta8 OVA file [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-V2.0-beta8.ova|US]] [[http://134.209.239.225/download/Linux-Hadoop-Minimal-V2.0-beta8.ova|Europe]] (11.0G) **NOTE:** Chrome may block //http// downloads. If it does, right-click the link, choose "Save Link As", then click "Keep" next to the blue discard box at the bottom of the browser.  | ==For VirtualBox on x86 PC, Mac, and Linux Machines==  | 
|   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hadoop-Minimal-Install-Notes-V2-beta8.tgz|Hadoop Minimal Build Notes (tgz format)]] |   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-V2.0-8.1.ova.MD5.txt|Linux Hadoop Minimal V2.0-8.1 MD5]]  | 
|   |   * Linux Hadoop Minimal Virtual Machine V2.0-8.1 OVA file [[http://161.35.229.207/download/Linux-Hadoop-Minimal-V2.0-8.1.ova|US]] [[http://134.209.239.225/download/Linux-Hadoop-Minimal-V2.0-8.1.ova|Europe]] (13.0G) **NOTE:** Chrome may block //http// downloads. If it does, right-click the link, choose "Save Link As", then click "Keep" next to the blue discard box at the bottom of the browser.  | 
|   |   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hadoop-Minimal-Install-Notes-V2.0-8.1.tgz|Hadoop Minimal Build Notes x86 Virtual Box]] (tgz format)  | 
|   |   | 
|   | ==For UTM Apple Mac M Machines==  | 
|   |   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-V2.0-M8.2.utm.zip.MD5.txt|Linux Hadoop Minimal V2.0-M8.2.zip MD5]]  | 
|   |   * Linux Hadoop Minimal Virtual Machine V2.0-M8.2 UTM file [[http://161.35.229.207/download/Linux-Hadoop-Minimal-V2.0-M8.2.utm.zip|US]] [[http://134.209.239.225/download/Linux-Hadoop-Minimal-V2.0-M8.2.utm.zip|Europe]] (8.0G) **NOTE:** Chrome may block //http// downloads. If it does, right-click the link, choose "Save Link As", then click "Keep" next to the blue discard box at the bottom of the browser.  | 
|   |   * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hadoop-Minimal-Install-Notes-V2.0-M8.2.tgz|Hadoop Minimal Build Notes Mac UTM]] (tgz format)  | 
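
Each virtual machine image above is listed with an MD5 file, so a download can be checked before it is imported. The following is a minimal Python sketch of that check and is not part of the class notes. The function names are hypothetical, and it assumes the MD5 text file begins with the hex digest in the usual ''md5sum'' layout (digest first, then the file name), which has not been confirmed for these files.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reads keep memory use small for multi-gigabyte images.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_expected_md5(md5_txt_path):
    """Return the first whitespace-separated token of the MD5 text file.

    ASSUMPTION: the file uses the md5sum layout "<digest>  <file name>".
    """
    tokens = Path(md5_txt_path).read_text().split()
    if not tokens:
        raise ValueError(f"no digest found in {md5_txt_path}")
    return tokens[0].lower()


def verify_download(image_path, md5_txt_path):
    """Return True when the image's MD5 matches the published digest."""
    return md5_of_file(image_path) == read_expected_md5(md5_txt_path)
```

For example, with both files saved under their original names in the current directory, ''verify_download("Linux-Hadoop-Minimal-V2.0-8.1.ova", "Linux-Hadoop-Minimal-V2.0-8.1.ova.MD5.txt")'' returns ''True'' when the digests match. A ''False'' result usually points to an incomplete download.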
 |  | 
| ---- | ---- | 
|   * Book: [[https://www.clustermonkey.net/Practical-Data-Science-with-Hadoop-and-Spark|Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale]]   |   * Book: [[https://www.clustermonkey.net/Practical-Data-Science-with-Hadoop-and-Spark|Practical Data Science with Hadoop® and Spark: Designing and Building Effective Analytics at Scale]]   | 
|   *  Video Tutorial: [[https://www.oreilly.com/videos/data-engineering-foundations/9780137440580|Data Engineering Foundations Part 1: LiveLessons: Using Spark, Hive, and Hadoop® Tools]]   |   *  Video Tutorial: [[https://www.oreilly.com/videos/data-engineering-foundations/9780137440580|Data Engineering Foundations Part 1: LiveLessons: Using Spark, Hive, and Hadoop® Tools]]   | 
|   *  Video Tutorial (**NEW**): [[https://www.informit.com/store/data-engineering-foundations-part-2-building-data-pipelines-9780138086992|Data Engineering Foundations Part 2: Building Data Pipelines with Kafka and NiFi]] |   *  Video Tutorial: [[https://www.informit.com/store/data-engineering-foundations-part-2-building-data-pipelines-9780138086992|Data Engineering Foundations Part 2: Building Data Pipelines with Kafka and NiFi]]  | 
|   |   * Video Tutorial (**NEW**): [[https://www.oreilly.com/library/view/kafka-essentials-livelessons/9780138176761/|Kafka Essentials LiveLessons: A Quick-Start for Building Effective Data Pipelines]]  | 
 |  | 
| ---- | ---- | 
| ---- | ---- | 
 |  | 
| **Unless otherwise noted, all training content, notes, and examples (c) Douglas Eadline 2019, 2020, 2022 All rights reserved.** | **Unless otherwise noted, all training content, notes, and examples (c) Douglas Eadline 2019-2024 All rights reserved.** | 
 |  | 
 |  |