first_steps_for_scalable_pyspark_for_data_science [2024/01/07 23:08] (current)
deadline created
 +====== Scalable PySpark for Data Science ======
  
 +The following steps are a "quick start": they explain how to load and start the Linux Hadoop Minimal Virtual Machine (LHM-VM) and download the course notes files. A full explanation is provided as part of the class.
 +
 +If you are using Linux or macOS, the built-in terminal application already includes an ssh client.
 +
 +If you are using Windows, you will need to install an ssh client. Either of the two listed below will work, and both are available at no cost. MobaXterm is recommended.
 +
 +  - [[http://www.putty.org|PuTTY]] - provides a terminal for ssh sessions.
 +  - [[http://mobaxterm.mobatek.net|MobaXterm]] - provides a terminal for ssh sessions and supports remote X Window sessions.
 +
 +See [[Linux Hadoop Minimal Installation Instructions]] for instructions on how to start the Linux Hadoop Minimal Virtual Machine (LHM-VM).
 +
 +==== When the VM is Started ====
 +
 +Open a terminal (using ''Putty'' or ''MobaXterm'' on Windows) and enter the following to log in to the LHM-VM as user "hands-on" (password="minimal"). Note: use MobaXterm if you want to use the Kafkaesque graphical tool.
 +<code>
 +  ssh hands-on@127.0.0.1 -p 2222
 +</code>
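If you log in often, the connection details above can be kept in one place. The following is a minimal sketch, not part of the class materials: the variable names and the functions ''lhm_ssh_cmd'' and ''lhm_login'' are hypothetical, and only the host, port, and user come from the login command above.

```shell
# Hypothetical helpers; not part of the class materials.
# Host, port, and user are the values from the login command above.
LHM_HOST=127.0.0.1
LHM_PORT=2222
LHM_USER=hands-on

# Print the ssh command line so it can be inspected before use.
lhm_ssh_cmd() {
  printf 'ssh -p %s %s@%s\n' "$LHM_PORT" "$LHM_USER" "$LHM_HOST"
}

# Open the session; ssh will prompt for the password.
lhm_login() {
  ssh -p "$LHM_PORT" "$LHM_USER@$LHM_HOST"
}
```

Run ''lhm_ssh_cmd'' to see the command that would be used, or ''lhm_login'' to connect.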
 +
 +Once you are logged in to the LHM-VM, you should see the following prompt string:
 +<code>
 +  [hands-on@localhost ~]$
 +</code>
 +
 +The ''[hands-on@localhost ~]'' portion of the prompt is not shown in the rest of the class documentation. A lone ''$'' indicates the prompt where you type input.
 +
 +To download the **Scalable PySpark for Data Science** class notes, run the following commands from inside the LHM-VM to fetch and extract the course files:
 +<code>
 +  $ wget --no-check-certificate https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Scalable-PySpark-v1.tgz
 +  $ tar xvzf Scalable-PySpark-v1.tgz
 +</code>
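An interrupted download can leave a truncated archive behind. As an optional precaution, the sketch below lists the archive contents without extracting them, which fails if the file is not a readable gzip-compressed tar archive. The function name ''check_archive'' is hypothetical and not part of the class materials.

```shell
# Hypothetical helper; not part of the class materials.
# "tar tzf" lists the archive without extracting it; a truncated or
# corrupt download makes tar exit with a non-zero status.
check_archive() {
  if tar tzf "$1" > /dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "bad: $1" >&2
    return 1
  fi
}

# Example use: extract only when the archive reads cleanly.
# check_archive Scalable-PySpark-v1.tgz && tar xvzf Scalable-PySpark-v1.tgz
```

If the check reports a bad archive, delete the file and repeat the ''wget'' step before extracting.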
 +
 +These steps will be performed as part of the class.
first_steps_for_scalable_pyspark_for_data_science.txt · Last modified: 2024/01/07 23:08 by deadline