Last Updated: February 25, 2016 · vidyasagar

Hadoop CDH 4.4.0 Installation

  1. Create a Hadoop directory. Download all your components under this directory.

    sudo mkdir /usr/local/hadoop
  2. Change to the directory where all components will be installed

    cd /usr/local/hadoop
  3. Download the Hadoop tarball

    wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.4.0.tar.gz
  4. Unpack the tarball file

    sudo tar -zxvf hadoop-2.0.0-cdh4.4.0.tar.gz
  5. Create the Hadoop datastore directories. The second directory is named hadoop-<username>; for the user hadoop it is hadoop-hadoop.

    sudo mkdir hadoop-datastore
    sudo mkdir hadoop-datastore/hadoop-<username>
  6. Change ownership and permissions of all the folders to the current user

    sudo chown -R hadoop:root *
    sudo chown -R hadoop:root .
    sudo chmod 755 *
    sudo chmod 755 .
  7. Add the Hadoop binaries to the PATH in /etc/environment

Current path: hadoop@localhost:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0$

sudo nano /etc/environment

Make the following changes in this file:

        PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/java-6-openjdk-amd64/bin:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0/bin:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0/sbin"
JAVA_HOME="/usr/lib/jvm/java-6-openjdk-amd64"
HADOOP_HOME="/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0"
HADOOP_CONF_DIR="/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0/etc/hadoop"

 source /etc/environment
 echo $HADOOP_HOME

Make sure this command shows the path below:

/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0

Type hado and hit Tab twice at the prompt; the hadoop keyword should be auto-completed. (This confirms Hadoop was installed successfully and is on your PATH.)
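As an additional check, ask Hadoop to print its version; assuming the PATH from /etc/environment is active in your shell, it should report 2.0.0-cdh4.4.0:

hadoop version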

  1. Make sure the Hadoop installation directory is readable and writable by the current user

    hadoop@localhost:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0$ sudo chown -R hadoop:root *

    hadoop@localhost:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0$ sudo chown -R hadoop:root .

    hadoop@localhost:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0$ sudo chmod 755 .

    hadoop@localhost:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0$ sudo chmod 755 *

  2. Configuring Hadoop

Current path: hadoop@localhost:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0/etc/hadoop$

sudo nano core-site.xml

<configuration>
<property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
  <property>
     <name>hadoop.tmp.dir</name>
     <value>/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}</value>
  </property>

  <!-- OOZIE proxy user setting -->
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>

sudo nano hadoop-env.sh

Add these two lines at the end of file

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export JAVA_HOME="/usr/lib/jvm/java-6-openjdk-amd64"
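To confirm that the JAVA_HOME path above points at a working JDK (a quick sanity check; adjust the path if your Java installation lives elsewhere), you can run:

/usr/lib/jvm/java-6-openjdk-amd64/bin/java -version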

sudo nano hdfs-site.xml

Make sure you have the following contents in the file

<configuration>
<property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
     <name>dfs.permissions</name>
     <value>false</value>
  </property>
  <!-- Immediately exit safemode as soon as one DataNode checks in.
       On a multi-node cluster, these configurations must be removed.  -->
  <property>
    <name>dfs.safemode.extension</name>
    <value>0</value>
  </property>
  <property>
     <name>dfs.safemode.min.datanodes</name>
     <value>1</value>
  </property>
  <property>
     <!-- specify this so that running 'hadoop namenode -format' formats the right dir -->
     <name>dfs.name.dir</name>
     <value>/usr/local/hadoop/hadoop-datastore/hadoop/dfs/name</value>
  </property>
</configuration>

In the dfs.name.dir value above, the hadoop directory segment under hadoop-datastore must be your username.
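Depending on the tarball layout, etc/hadoop may ship only a mapred-site.xml.template rather than mapred-site.xml (this is an assumption based on the standard Hadoop 2.x distribution). If the file is missing, copy the template before editing:

    sudo cp mapred-site.xml.template mapred-site.xml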

    sudo nano mapred-site.xml

Make sure you have similar contents in this file:

    <configuration>
    <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
       </property>
    </configuration>

sudo nano yarn-site.xml

Make sure you have similar contents in this file

<configuration>
<property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce.shuffle</value>
   </property>
   <property>
      <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
      <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
</configuration>

You’re done with Hadoop installation.
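Before starting Hadoop for the first time, format the NameNode. This is a one-time step that initializes the dfs.name.dir configured in hdfs-site.xml above:

hadoop namenode -format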
These are commands to start and stop Hadoop:

start-all.sh

This should start all five daemons (NameNode, Secondary NameNode, DataNode, ResourceManager, and NodeManager).
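You can confirm this by listing the running Java processes with jps (part of the JDK); the output should include NameNode, SecondaryNameNode, DataNode, ResourceManager, and NodeManager:

jps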

stop-all.sh

This command stops all five daemons running in your cluster.

You can start the Job History Server using the command:

mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver

You can stop the history server using the command below:

mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver