=====Linux Hadoop Minimal VM Notes=====
**Version:** .42\\
**Date:** June 3, 2019\\
**Author:** Douglas Eadline\\
**Email:** deadline(you know what goes here)basement-supercomputing.com

**Unless otherwise noted, all course content, notes, and examples are
(c) Copyright Basement Supercomputing 2019, All rights reserved.**
  
====What Is This?====
  
The Linux Hadoop Minimal is a virtual machine (VM) that can be used to
try the examples presented in the following on-line courses:

  * [[https://www.oreilly.com/search/?query=Practical%20Linux%20Command%20Line%20for%20Data%20Engineers%20and%20Analysts%20Eadline|Practical Linux Command Line for Data Engineers and Analysts]]
  * [[https://www.safaribooksonline.com/search/?query=Hands-on%20Introduction%20to%20Apache%20Hadoop%20and%20Spark%20Programming&field=title|Hands-on Introduction to Apache Hadoop and Spark Programming]]
  
It can also be used for the [[https://www.clustermonkey.net/download/LiveLessons/Hadoop_Fundamentals/|examples]] provided in the companion on-line
video tutorial (14+ hours):

  * [[https://www.safaribooksonline.com/library/view/hadoop-and-spark/9780134770871|Hadoop® and Spark Fundamentals: LiveLessons]]
  
The machine has many important Hadoop and Spark packages installed and at the same time tries to keep the resource usage as low as possible so the VM can be used on most laptops. (See below for resource recommendations.)
  
To learn more about the courses and my other analytics books and videos, go to:

  * [[https://www.safaribooksonline.com/search/?query=eadline|Safari Books Online]]
  
PLEASE NOTE: This version of Linux Hadoop Minimal (LHM) is still considered
"beta." If you use it and find problems, please send any issues to
deadline(you know what goes here)basement-supercomputing.com with "LHM" in the subject line.
  
====Student Usage====

If you have taken the "Hands-on" course mentioned above, you can download the ''NOTES.txt'' files, examples, and data archive directly to the VM using ''wget''. The archive is in both compressed tar (tgz) and Zip (zip) formats. It is recommended that you either make a new user account or use the "hands-on" account for the archive (and run most of the examples from this account).
  
For instance, to download and extract the archive for the "Hands-on" course from //within the VM//:
  
  wget https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Hands_On_Hadoop_Spark-V1.5.tgz
  tar xvzf Hands_On_Hadoop_Spark-V1.5.tgz
  
Similarly, for the "Linux Command Line" course (do this within the VM):

  tar xvzf Linux-Command-Line-V1.0.tgz
  
If you want to move files from your local machine to the VM, then you can use ''scp''
on your host. (''scp'' is natively available on Linux and Macintosh systems; it is part of the
MobaXterm package on Windows.)
  
  scp -P2222  SOURCE-FILE USERNAME@127.0.0.1:PATH
  
''USERNAME'' is a valid account on the VM. There is a user account called ''hands-on'' that can
be used for most of the examples. Therefore, the command to copy a file (''SOURCE-FILE'') from your
host system to the VM is (it places the file in ''/home/hands-on'' in the VM):
  
  scp -P2222  SOURCE-FILE hands-on@127.0.0.1:/home/hands-on
  
See the [[#Connect From Your Local Machine to the LHM Sandbox|Connect From Your Local Machine to the LHM Sandbox]] section below for more information
on using ''ssh'' and ''scp''.
  
====General Usage Notes====

1. The Linux Hadoop Minimal includes the following Apache software. Note: Spark 1.6.3 is installed because later versions need Python 2.7+ (not available in CentOS).\\
<code>
CentOS Linux 6.9 minimal
Apache Hadoop 2.8.1
Apache Pig 0.17.0
Apache Hive 2.3.2
Apache Spark 1.6.3
Apache Derby 10.13.1.1
Apache Zeppelin 0.7.3
Apache Sqoop 1.4.7
Apache Flume 1.8.0
</code>
  
2. The Linux Hadoop Minimal has been tested with VirtualBox on Linux, MacOS 10.12, and Windows 10 Home edition. It has not been tested with VMware.

3. The Linux Hadoop Minimal Virtual Machine is designed to work on minimal hardware. It is recommended that, at a MINIMUM, your system have 2 cores, 4 GB memory, and 70 GB of disk space. The VM is set to use 2.5 GB of memory. This will cause some applications to swap to disk, but it should allow the virtual machine to run on a 4 GB laptop/desktop. (If you are thinking of using the Hortonworks sandbox, then 4+ cores and 16+ GB of memory are recommended.)
  
4. The above packages have not been fully tested, although all of the examples from the course should work.
  
====Installation Steps====
  
**Step 1:** Download and install VirtualBox for your environment. VirtualBox is freely available. Note: Some Windows environments may need the Extension Pack. See the [[https://www.virtualbox.org|VirtualBox Web Page]].
  
**Step 2:** Follow the installation instructions for your Operating System environment. For Red Hat based systems this page, https://tecadmin.net/install-oracle-virtualbox-on-centos-redhat-and-fedora, is helpful. With Linux there are some dependencies on kernel versions and modules that need to be addressed.\\
If you are using Windows, you will need an "ssh client." Either of these will work; both are freely available. (MobaXterm is recommended.)

    * [[http://www.putty.org|Putty]] (provides a terminal for ssh sessions)
    * [[http://mobaxterm.mobatek.net|MobaXterm]] (provides a terminal for ssh sessions and allows remote X Windows sessions)
  
**Step 3:** Make sure hardware virtualization is enabled in your BIOS.
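If your host runs Linux, one quick way to check whether the CPU virtualization extensions are visible (a minimal sketch; BIOS menu names vary by vendor) is:

<code>
# A non-zero count means the CPU reports Intel VT-x (vmx) or AMD-V (svm) support
egrep -c '(vmx|svm)' /proc/cpuinfo
</code>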
  
**Step 4:** Download the https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-0.42.ova image and load it into VirtualBox. (NOTE: a newer version may be available.)
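If you prefer the command line, the same OVA can be imported with VirtualBox's ''VBoxManage'' tool (a sketch only; the GUI "File → Import Appliance" dialog does the same thing):

<code>
# Import the downloaded appliance; the new VM then appears in the VirtualBox Manager
VBoxManage import Linux-Hadoop-Minimal-0.42.ova
</code>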
  
  
**Step 5:** Start the VM. All the essential Hadoop services should be started automatically.
  
  
====Connect From Your Local Machine to the LHM Sandbox====
  
It is possible to log in and use the sandbox from the VirtualBox terminal; however, you will have much
more flexibility with local terminals. Follow the instructions below for local terminal access.
  
As a test, open a text terminal and connect to the sandbox as the root user with ''ssh''. Macintosh and
Linux machines have ''ssh'' and a terminal installed; for Windows, see above (Putty or MobaXterm) or this document:
  
  * [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/DOS-Linux-HDFS-cheatsheet.pdf|DOS Linux HDFS Cheatsheet]]

The root password is: **hadoop**
  
  ssh root@127.0.0.1 -p 2222
  
You should now be in the ''/root'' directory.
  
To confirm all the Hadoop daemons have started, enter ''jps'' as root. The results should list the 10 daemons as shown below. (Process numbers will be different.)
  
<code>
# jps
1938 NetworkServerControl
2036 ZeppelinServer
1841 NodeManager
2445 Jps
</code>
  
====Copying Files In and Out of the Virtual Machine====
  
To copy a file from your LOCAL MACHINE into the VM, use the ''scp'' command. For instance, the following copies the file ''SOURCE-FILE'' from your local directory on your ''LOCAL MACHINE'' to the "**hands-on**" account. The password is "**minimal**" and the command places the file in the ''/home/hands-on'' directory in the VM.
  
  scp -P2222  SOURCE-FILE  hands-on@127.0.0.1:/home/hands-on
  
To be clear, the above command is run on your ''LOCAL MACHINE''. On Macintosh and Linux systems, run this from a terminal. On Windows, run it from MobaXterm.
  
To copy a file from the VM to your ''LOCAL MACHINE'' and place it in your current directory, use the following. (Don't forget the ''.''):
  
  scp -P2222 hands-on@127.0.0.1:/home/hands-on/SOURCE-FILE .
  
To be clear, the above command is run on your ''LOCAL MACHINE''.

On Windows, the data will be placed in the MobaXterm "Persistent Home Directory." In the case of Windows 10 with user "Doug" this would be the following:
  
  C:\Users\Doug\Documents\MobaXterm\home
  
====Adding Users====
  
As configured, the LHM comes with one general user account. The account is called **hands-on** and the password is **minimal**. **It is highly recommended that this account be used for the class examples.** Remember, you need to be user ''hdfs'' to do any administrative work in HDFS, and running as user ''hdfs'' gives you full ''root'' control of the HDFS file system. The ''hdfs'' account has no active password. To become the ''hdfs'' user, log in as root and issue a ''su - hdfs'' command.
  
To add yourself as a user with a different user name, follow these steps.
  
**Step 1.** As root, do the following to create a user and add a password:

<code>
useradd -G hadoop USERNAME
passwd USERNAME
</code>
  
**Step 2.** These steps change to user ''hdfs'' and create the user directory in HDFS (as root):

<code>
su - hdfs
hdfs dfs -mkdir /user/USERNAME
hdfs dfs -chown USERNAME:hadoop /user/USERNAME
exit
</code>
  
**Step 3.** Log out and log in to the new account.
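As a quick sanity check (a sketch; ''USERNAME'' is the placeholder from the steps above), the new user should be able to list and write to their HDFS home directory:

<code>
# Run as the new user after logging in
hdfs dfs -ls /user/USERNAME                           # should be owned by USERNAME:hadoop
hdfs dfs -put /etc/hosts /user/USERNAME/hosts-test    # simple write test
hdfs dfs -rm /user/USERNAME/hosts-test                # clean up
</code>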
  
====Web Access====
  
The various web interfaces shown in class are available using the following URLs. Enter the desired
URL in your local browser and the VM should respond.
<code>
HDFS web interface:       http://127.0.0.1:50070
YARN Jobs web interface:  http://127.0.0.1:8088
Zeppelin web notebook:    http://127.0.0.1:9995
</code>
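If a page does not come up, a quick way to check that the VM is answering on a given port is ''curl'' from your local machine (a sketch; any of the ports above can be substituted):

<code>
# An HTTP status code (e.g. 200) means the port forward and the service are working
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:50070
</code>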
  
The Zeppelin interface is not configured (i.e., it is run in anonymous mode without the need to log in).
The "Zeppelin Tutorial/Basic Features" notebook used in class works, as do some of the ''SparkR'' notebooks.
  
The ''PySpark Example'' that was demonstrated in class also works. Also, the ''md'' and ''sh'' interpreters have been tested and work.
  
====Getting Data into Zeppelin====

If you want to load your own data into a Zeppelin notebook, place the data in the zeppelin account under ''/home/zeppelin''. Log in as root to place data in this account, then change the ownership to the zeppelin user, for example:
  
  # cp DATA /home/zeppelin
  # chown zeppelin:hadoop /home/zeppelin/DATA
  
This location is the default path for the Zeppelin interpreter (run ''pwd'' in the ''%sh'' interpreter).
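For example, a ''%sh'' paragraph in a notebook could then confirm the location and read the copied file (''DATA'' is just the placeholder name used above):

<code>
%sh
# The interpreter's working directory should be /home/zeppelin
pwd
head /home/zeppelin/DATA
</code>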
  
====Database for Sqoop Example====
  
MySQL has been installed in the VM. The World database used in the Sqoop example from the class
has been preloaded into MySQL. The SQL login and password for the Sqoop database are **sqoop** and **sqoop**.
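As a quick check from inside the VM (a sketch only; the database name ''world'' is an assumption based on the class example), you can log in to MySQL with those credentials and list the tables:

<code>
mysql -u sqoop -p          # enter "sqoop" at the password prompt
mysql> SHOW DATABASES;
mysql> USE world;
mysql> SHOW TABLES;
mysql> QUIT;
</code>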
  
====Log Files====
  
There is currently no logfile management, and the log directory may fill up and use the sandbox storage.
There is a ''clean-logs.sh'' script in ''/root/Hadoop-Minimal-Install-Notes/Hadoop-Pig-Hive/scripts''.
This script will remove most of the Hadoop/Spark and system logs (somewhat aggressive).
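A sketch of how the script might be used (run as root inside the VM; the ''df'' calls just show how much space was reclaimed):

<code>
df -h /                                                            # free space before
/root/Hadoop-Minimal-Install-Notes/Hadoop-Pig-Hive/scripts/clean-logs.sh
df -h /                                                            # free space after
</code>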
  
=====Stopping and Starting the Hadoop Daemons=====

The Hadoop daemons are started in the ''/etc/rc.local'' file (the last script file that
is run when the system boots). The actual scripts are in ''/usr/sbin'' and are very
simple, with no checking. If you are knowledgeable, you can check ''/var/log/boot.log''
for errors and issues. The scripts are run in the following order:
  /usr/sbin/start-hdfs.sh
  /usr/sbin/start-yarn.sh

A corresponding "stop script" is run when the system is shut down or rebooted.
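For example, to restart HDFS and YARN by hand (a sketch; the ''stop-hdfs.sh'' and ''stop-yarn.sh'' names are assumed to mirror the start scripts above, as the Hive stop/start scripts below do), run as root:

<code>
/usr/sbin/stop-yarn.sh        # stop YARN first, then HDFS
/usr/sbin/stop-hdfs.sh
/usr/sbin/start-hdfs.sh       # restart in boot order: HDFS, then YARN
/usr/sbin/start-yarn.sh
jps                           # confirm the daemons are back
</code>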
  
As mentioned, if all the scripts are running, the ''jps'' command
(run as root) should show the following (process numbers will be different).
The RunJar entries are for the ''hiveserver2'' and ''hive-metastore'' processes.

  # jps

For YARN to be running correctly, the following daemons need to be running:

  ResourceManager
  JobHistoryServer
  NodeManager
  
A local metadata database (called Derby) is needed for Hive. If
the ''NetworkServerControl'' daemon is not running, then stop and restart
the Derby daemon:

  /usr/sbin/stop-derby.sh
  /usr/sbin/start-derby.sh
  
Spark can use Hive tables through the hive-metastore and hiveserver2 services. To stop and restart the services (in the following order):

  /usr/sbin/stop-hiveserver2.sh
  /usr/sbin/stop-hive-metastore.sh
  /usr/sbin/start-hive-metastore.sh
  /usr/sbin/start-hiveserver2.sh
  
Finally, if the Zeppelin web page cannot be reached, the Zeppelin daemon
may need to be restarted. If that does not fix the problem, send an email (see the address above)
and describe the situation.
  
When the VM is stopped (see below) with ''poweroff'' or restarted with ''reboot'' commands, a script called ''/sbin/halt.local'' shuts down all the daemons.
  
====Stopping the VM====

To stop the VM, click on "Machine" in the VirtualBox menu bar. Select "Close" and then
select the "Save State" option. The next time the machine starts, it will have all the
changes you made.
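Alternatively, the VM can be shut down cleanly from an ssh session (a sketch; as noted above, ''poweroff'' triggers ''/sbin/halt.local'', which stops the Hadoop daemons):

<code>
ssh root@127.0.0.1 -p 2222
# then, inside the VM as root:
poweroff
</code>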
  
  
====VM Installation Documentation====

Please see the ''/root/Hadoop-Minimal-Install-Notes'' directory in the VM for how the packages were installed.

====Issues/Bugs====

These issues have been addressed in the current version of the VM. Please use the latest VM to avoid these issues.
1. If you have problems loading the OVA image into VirtualBox, check the MD5 signature of the OVA file. The MD5 signature returned by running the program below should match the signature provided [[https://www.clustermonkey.net/download/Hands-on_Hadoop_Spark/Linux-Hadoop-Minimal-0.42.MD5.txt|here]]. For each OS, use the following commands (note the name of the OVA file may be different):

For **Linux** use "md5sum"

  $ md5sum Linux-Hadoop-Minimal-0.42.ova

For **Macintosh** use "md5"

  $ md5 Linux-Hadoop-Minimal-0.42.ova

For **Windows 10** (in PowerShell) use "Get-FileHash" (also note the use of uppercase)

  C:\Users\Doug> Get-FileHash .\Linux-Hadoop-Minimal-0.42.ova -Algorithm MD5
  
2. Either create your own user account as described above or use the existing "hands-on" user account. The examples will not work if run as the root account.

3. If zip is not installed on your version of the VM, you can install it by entering the following as root, and answering "y" when asked. Zip will then be installed and available for use.

  # yum install zip

  Installed:
    zip.x86_64 0:3.0-1.el6_7.1

4. In previous versions there is a permission issue in HDFS that prevents Hive jobs from working. To fix it, perform the following steps:
  
a) Log in to the VM as root (pw="hadoop")

  ssh root@127.0.0.1 -p 2222

b) Then change to the hdfs user

  su - hdfs

c) Fix the permission error:

  hdfs dfs -chmod o+w /user/hive/warehouse

d) Check the result

  hdfs dfs -ls /user/hive

e) The output of the previous command should look like:

  Found 1 items
  drwxrwxrwx   - hive hadoop          0 2019-01-24 20:43 /user/hive/warehouse

f) Exit out of the hdfs account

  exit

g) Exit out of the root account

  exit