Data Engineering Quick Start

The following steps explain how to load and start the Linux Hadoop Minimal Virtual Machine (LHM-VM) and download the course notes files. This is a “quick start”; a full and expanded explanation is provided as part of the class.

If you are using Linux or Mac, a terminal application is available that includes an “ssh client.”

If you are using Windows, you will need to install an “ssh client.” Either of the two listed below will work, and both are available at no cost. (MobaXterm is recommended.)

  1. PuTTY - provides a terminal for ssh sessions.
  2. MobaXterm - provides a terminal for ssh sessions and allows remote X Windows sessions.

See the Linux Hadoop Minimal Installation Instructions for how to start the Linux Hadoop Minimal Virtual Machine (LHM-VM).

When the VM is started:

Open a terminal (using PuTTY or MobaXterm on Windows) and enter the following to log in to the LHM-VM as user “hands-on” (password=“minimal”):

  ssh hands-on@127.0.0.1 -p 2222

Once you are logged in to the LHM-VM, you should see the following prompt string:

  [hands-on@localhost ~]$

The [hands-on@localhost ~] portion will not be shown in the rest of the class documentation. A lone $ will indicate the prompt string for input.
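
As an optional convenience for OpenSSH clients (the default on Linux and Mac), the login details above can be stored in the ssh client configuration so that the port and user name do not have to be retyped. The following is a minimal sketch, not part of the class materials: the alias `lhm-vm` and the function name `add_lhm_alias` are hypothetical, while the address, port, and user are the ones from the ssh command above.

```shell
# Sketch: append a host alias for the LHM-VM to an OpenSSH client config file.
# "lhm-vm" and add_lhm_alias are hypothetical names chosen for illustration;
# 127.0.0.1, port 2222, and user hands-on come from the login step above.
add_lhm_alias() {
    config_file="$1"
    # Do nothing if the alias is already present in the file.
    if [ -f "$config_file" ] && grep -q '^Host lhm-vm$' "$config_file"; then
        return 0
    fi
    # Append the alias block (the file is created if it does not exist).
    cat >> "$config_file" <<'EOF'

Host lhm-vm
    HostName 127.0.0.1
    Port 2222
    User hands-on
EOF
}
```

Run this on the host machine, not inside the LHM-VM, for example as `add_lhm_alias ~/.ssh/config` (assuming the `~/.ssh` directory already exists). After that, `ssh lhm-vm` should behave like the full command shown earlier and will still ask for the password.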

To get the Data Engineering at Scale class notes, download and extract the course files from inside the LHM-VM as shown below:

  $ wget --no-check-certificate https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Data-Engineering-at-Scale-V1.0.tgz
  $ tar xvzf Data-Engineering-at-Scale-V1.0.tgz

If the file extracted correctly, you should see:

  $ ls
  Data-Engineering-at-Scale-V1.0  Data-Engineering-at-Scale-V1.0.tgz
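
The same check can be scripted. The sketch below is only an illustration: the function name `check_extracted` is hypothetical, and it assumes, as in the listing above, that the archive unpacks into a directory named after the archive without the `.tgz` suffix. It first confirms that tar can read the archive (which catches a truncated download) and then that the directory exists.

```shell
# Sketch: confirm that a .tgz archive is readable and was extracted.
# check_extracted is a hypothetical helper, not part of the class files.
check_extracted() {
    archive="$1"
    # Assumed directory name: the archive name without the .tgz suffix.
    dir="${archive%.tgz}"
    # "tar tzf" lists the archive contents and fails if the file is damaged.
    if tar tzf "$archive" >/dev/null 2>&1 && [ -d "$dir" ]; then
        echo "ok: $dir"
    else
        echo "problem: $archive is unreadable or $dir is missing" >&2
        return 1
    fi
}
```

Inside the LHM-VM this could be called as `check_extracted Data-Engineering-at-Scale-V1.0.tgz` from the directory where the download was saved. If it reports a problem, repeating the wget and tar steps is a reasonable first thing to try.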

These steps will be performed as part of the class.

first_steps_for_data_engineering_class.txt · Last modified: 2020/06/23 20:46 by deadline