=====Linux Hadoop Minimal VM Notes=====
**Version:** .42\\
**Date:** June 3, 2019\\
**Author:** Douglas Eadline\\
**Email:** deadline(you know what goes here)basement-supercomputing.com
  
**Unless otherwise noted, all course content, notes, and examples are
(c) Copyright Basement Supercomputing 2019, All rights reserved.**
  
====What Is This?====
  
The Linux Hadoop Minimal is a virtual machine (VM) that can be used to
try the examples presented in the following on-line courses:
  
  * [[https://www.oreilly.com/search/?query=Practical%20Linux%20Command%20Line%20for%20Data%20Engineers%20and%20Analysts%20Eadline| Practical Linux Command Line for Data Engineers and Analysts]]
  * [[https://www.safaribooksonline.com/search/?query=Hands-on%20Introduction%20to%20Apache%20Hadoop%20and%20Spark%20Programming&field=title|Hands-on Introduction to Apache Hadoop and Spark Programming]]
  
It can also be used for the [[https://www.clustermonkey.net/download/LiveLessons/Hadoop_Fundamentals/|examples]] provided in the companion on-line
video tutorial (14+ hours):
  
  * [[https://www.safaribooksonline.com/library/view/hadoop-and-spark/9780134770871|Hadoop® and Spark Fundamentals: LiveLessons]]
  
The machine has many important Hadoop and Spark packages installed and
at the same time tries to keep resource usage as low as possible
so the VM can be used on most laptops. (See below for resource recommendations.)
  
To learn more about the course and my other analytics books and videos, go to:
  
====Student Usage====
If you have taken the "Hands-on" course mentioned above, you can download
the ''NOTES.txt'' files, examples, and data archive directly to the VM
using ''wget''. The archive is available in both compressed tar (tgz) and
Zip (zip) formats. It is recommended that you either make a new user account
or use the "hands-on" account for the archive (and run most of the examples from
this account).
  
For instance, to download and extract the archive for the "Hands-on" course from //within the VM//:
  
  wget https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hands_On_Hadoop_Spark-V1.5.tgz
  tar xvzf Hands_On_Hadoop_Spark-V1.5.tgz
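
The archive is also provided in Zip format; assuming the Zip file uses the same base name as the tgz archive, it can be extracted with ''unzip'':

  unzip Hands_On_Hadoop_Spark-V1.5.zip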
  
Similarly, for the "Linux Command Line" course (do this within the VM)
  scp -P2222  SOURCE-FILE USERNAME@127.0.0.1:PATH
  
''USERNAME'' is a valid account on the VM. There is a user account called ''hands-on'' that can
be used for most of the examples. Therefore, the command to copy a file (''SOURCE-FILE'') from your
host system to the VM is (it places the file in ''/home/hands-on'' in the VM):
  
  scp -P2222  SOURCE-FILE hands-on@127.0.0.1:/home/hands-on
  
See the [[#Connect From Your Local Machine to the LHM Sandbox|Connect From Your Local Machine to the LHM Sandbox]] section below for more information
on using ''ssh'' and ''scp''.
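
For example, to log into the VM as the ''hands-on'' user (password "**minimal**") from your local machine:

  ssh -p 2222 hands-on@127.0.0.1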
  
====General Usage Notes====
**Step 3:** Make sure hardware virtualization is enabled in your BIOS.
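
On a Linux host, you can check whether the CPU exposes the virtualization extensions (the BIOS setting must still be enabled separately); a non-empty result means the extensions are present:

  grep -E 'vmx|svm' /proc/cpuinfo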
  
**Step 4:** Download the https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-0.42.ova image and load it into VirtualBox. (NOTE: a newer version may be available.)
  
**Step 5:** Start the VM. All the essential Hadoop services should be started automatically.
You should now be in the ''/root'' directory.
  
To confirm all the Hadoop daemons have started, enter ''jps'' as root. The results should list the 10 daemons as shown below. (Process numbers will be different.)
  
<code>
# jps
1938 NetworkServerControl
2036 ZeppelinServer
</code>
====Copying Files In and Out of the Virtual Machine====
  
To copy a file from your LOCAL MACHINE into the VM, use the ''scp'' command.
For instance, the following copies the file ''SOURCE-FILE'' from your local directory on your
''LOCAL MACHINE'' to the "**hands-on**" account. The password is "**minimal**" and
the command places the file in the ''/home/hands-on'' directory in the VM.
  
  scp -P2222  SOURCE-FILE  hands-on@127.0.0.1:/home/hands-on
  
To be clear, the above command is run on your ''LOCAL MACHINE''.
On Macintosh and Linux systems run this from a terminal. On Windows
run it from MobaXterm.
  
To copy a file from the VM to your ''LOCAL MACHINE'' and place it
in your current directory, use the following (don't forget the ''.''):
  
  scp -P2222 hands-on@127.0.0.1:/home/hands-on/SOURCE-FILE .
To be clear, the above command is run on your ''LOCAL MACHINE''.
  
On Windows, the data will be placed in the MobaXterm "Persistent
Home Directory." In the case of Windows 10 with user "Doug"
this would be the following:
  
  C:\Users\Doug\Documents\MobaXterm\home
====Adding Users====
  
As configured, the LHM comes with one general user account. The account is called **hands-on** and the password is **minimal**. **It is highly recommended that this account be used for the class examples.** Remember that you need to be user ''hdfs'' to do any administrative work in HDFS, and running as user ''hdfs'' gives you full ''root'' control of the HDFS file system. The ''hdfs'' account has no active password. To become the ''hdfs'' user, log in as root and issue a ''su - hdfs'' command.

To add yourself as a user with a different user name, follow these steps.
  
**Step 1.** As root do the following to create a user and add a password:
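
A minimal sketch, assuming the standard ''useradd'' and ''passwd'' commands (''USERNAME'' is a placeholder for the new account name):

  useradd USERNAME
  passwd USERNAME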
and describe the situation.
  
When the VM is stopped (see below) with the ''poweroff'' command or restarted with the ''reboot'' command, a script called ''/sbin/halt.local'' shuts down all the daemons.
  
====Stopping the VM====
These issues have been addressed in the current version of the VM. Please use the latest VM to avoid these issues.
  
1. If you have problems loading the OVA image into VirtualBox, check the MD5 signature of the OVA file. The MD5 signature returned by running the program below should match the signature provided [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-0.42.MD5.txt|here]]. For each OS, use the following commands (note the name of the OVA file may be different):

For **Linux** use "md5sum"

  $ md5sum Linux-Hadoop-Minimal-0.42.ova

For **Macintosh** use "md5"

  $ md5 Linux-Hadoop-Minimal-0.42.ova

For **Windows 10** (in PowerShell) use "Get-FileHash" (Also, note the use of uppercase)

  C:\Users\Doug> Get-FileHash .\Linux-Hadoop-Minimal-0.42.ova -Algorithm MD5

2. Either create your own user account as described above or use the existing "hands-on" user account. The examples will not work if run as the root account.
  
3. If zip is not installed on your version of the VM, you can install it by entering the following as root and answering "y" when asked. Zip will then be installed and available for use.
  
   # yum install zip
     zip.x86_64 0:3.0-1.el6_7.1
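
Once installed, zip can be used to create and inspect archives; for example (the file names here are hypothetical):

  zip archive.zip NOTES.txt
  unzip -l archive.zip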
            
4. In previous versions there was a permission issue in HDFS that prevented Hive jobs from working. To fix it, perform the following steps:
  
a) Log in to the VM as root (pw="hadoop")
  
     ssh root@127.0.0.1 -p 2222
  
b) Then change to the ''hdfs'' user
  
     su - hdfs
  
c) Fix the permission error:
  
     hdfs dfs -chmod o+w /user/hive/warehouse
  
d) Check the result
  
     hdfs dfs -ls /user/hive
  
e) The output of the previous command should look like:
  
     Found 1 items
     drwxrwxrwx   - hive hadoop          0 2019-01-24 20:43 /user/hive/warehouse
  
f) Exit out of the ''hdfs'' account
  
     exit
  
g) Exit out of the root account
  
     exit