Hadoop CDH 4.4.0 Installation
-
Create a Hadoop directory. Download all your components under this directory.
sudo mkdir /usr/local/hadoop
-
Change to the installation directory
cd /usr/local/hadoop
-
Download Hadoop tarball file
wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.4.0.tar.gz
-
Unpack the tarball file
sudo tar -zxvf hadoop-2.0.0-cdh4.4.0.tar.gz
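As a quick sanity check (assuming the extraction finished without errors), list the unpacked directory; you should at least see the bin, etc and sbin directories that are referenced later in /etc/environment:
ls hadoop-2.0.0-cdh4.4.0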
-
Create Hadoop datastore directory
sudo mkdir hadoop-datastore
sudo mkdir hadoop-datastore/hadoop-<username>
Here <username> is your Linux username, so for the user hadoop the directory is hadoop-datastore/hadoop-hadoop (this matches the hadoop.tmp.dir value set later in core-site.xml).
-
Change ownership and permissions of all the folders to the current user
sudo chown -R hadoop.root *
sudo chown -R hadoop.root .
sudo chmod 755 *
sudo chmod 755 .
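To confirm the ownership change took effect, a minimal check (the expected owner hadoop and group root assume your username is hadoop, as in the commands above):
ls -la /usr/local/hadoop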
Adding hadoop binaries to /etc/environment
Current path: hadoop@localhost:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0$
sudo nano /etc/environment
Make the changes in this file as shown below:
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/java-6-openjdk-amd64/bin:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0/bin:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0/sbin"
JAVA_HOME="/usr/lib/jvm/java-6-openjdk-amd64"
HADOOP_HOME="/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0"
HADOOP_CONF_DIR="/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0/etc/hadoop"
source /etc/environment
echo $HADOOP_HOME
Make sure this command prints the path below:
/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0
Type hado and hit Tab twice at the prompt; the hadoop keyword should be auto-completed. (This confirms a successful installation of Hadoop.)
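As an additional check that the new PATH is active, run the version command; it should report the CDH build (2.0.0-cdh4.4.0):
hadoop version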
-
Make sure the Hadoop installation directory is readable and writable by the current user
hadoop@localhost:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0$ sudo chown -R hadoop.root *
hadoop@localhost:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0$ sudo chown -R hadoop.root .
hadoop@localhost:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0$ sudo chmod 755 .
hadoop@localhost:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0$ sudo chmod 755 *
Configuring Hadoop
Current path: hadoop@localhost:/usr/local/hadoop/hadoop-2.0.0-cdh4.4.0/etc/hadoop$
sudo nano core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}</value>
</property>
<!-- OOZIE proxy user setting -->
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
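Note that ${user.name} in hadoop.tmp.dir is expanded at runtime, so for the user hadoop it resolves to the hadoop-datastore/hadoop-hadoop directory created earlier. A minimal check that the directory exists for your user (assuming you created it in the earlier step):
ls -ld /usr/local/hadoop/hadoop-datastore/hadoop-$(whoami)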
sudo nano hadoop-env.sh
Add these two lines at the end of the file
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
export JAVA_HOME="/usr/lib/jvm/java-6-openjdk-amd64"
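Before moving on, it is worth confirming that the JAVA_HOME path above actually exists on your machine (the exact directory name can differ depending on which JDK package is installed):
ls /usr/lib/jvm/java-6-openjdk-amd64/bin/java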
sudo nano hdfs-site.xml
Make sure you have the following contents in the file
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<!-- Immediately exit safemode as soon as one DataNode checks in.
On a multi-node cluster, these configurations must be removed. -->
<property>
<name>dfs.safemode.extension</name>
<value>0</value>
</property>
<property>
<name>dfs.safemode.min.datanodes</name>
<value>1</value>
</property>
<property>
<!-- specify this so that running 'hadoop namenode -format' formats the right dir -->
<name>dfs.name.dir</name>
<value>/usr/local/hadoop/hadoop-datastore/hadoop/dfs/name</value>
</property>
</configuration>
In the dfs.name.dir value above, the directory segment after hadoop-datastore/ (here, hadoop) has to be your username.
sudo nano mapred-site.xml
Make sure you have the following contents in this file
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
sudo nano yarn-site.xml
Make sure you have the following contents in this file
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
You’re done with Hadoop installation.
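Before starting the daemons for the first time, format the NameNode; this is a one-time step that initializes the dfs.name.dir configured above (run it as the hadoop user):
hadoop namenode -format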
These are commands to start and stop Hadoop:
start-all.sh
This should leave all 5 daemons (i.e., NameNode, Secondary NameNode, DataNode, ResourceManager and NodeManager) running
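A quick way to verify this is jps (shipped with the JDK), which lists the running Java processes; you should see one line per daemon (NameNode, SecondaryNameNode, DataNode, ResourceManager and NodeManager, each prefixed with its process ID):
jps
You can also browse the NameNode web UI at http://localhost:50070 and the ResourceManager web UI at http://localhost:8088 (the Hadoop 2 default ports).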
stop-all.sh
This command stops all 5 daemons running in your cluster
You can start the Job History Server using the command
mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
You can stop the history server using the command below
mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver
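As an optional end-to-end smoke test (with the daemons running), you can submit the bundled pi example MapReduce job. The jar path below is an assumption about the CDH 4.4.0 tarball layout; if it does not match your system, locate the examples jar first and substitute its path:
find $HADOOP_HOME -name 'hadoop-mapreduce-examples*.jar'
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce2/hadoop-mapreduce-examples-2.0.0-cdh4.4.0.jar pi 2 10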