Installation and Configuration
Java Installation
Java JDK 8
Download Java
Extract Java
tar -xzf jdk-8u181-linux-x64.tar.gz
Move the extracted directory (jdk1.8.0_181) to /usr/local
mv jdk1.8.0_181 /usr/local/
Set Path (~/.bashrc)
export JAVA_HOME=/usr/local/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin
Apply changes
source ~/.bashrc
Java Version
java -version
Adding a dedicated Hadoop system user
We will use a dedicated Hadoop user account for running Hadoop.
sudo addgroup hadoop_group
sudo adduser --ingroup hadoop_group hduser
Optionally, give hduser sudo privileges:
sudo adduser hduser sudo
Configuring SSH
The Hadoop control scripts rely on SSH to perform cluster-wide operations. For example, there is a script for stopping and starting all the daemons in the cluster. To work seamlessly, SSH needs to be set up to allow password-less login for the hadoop user from machines in the cluster. The simplest way to achieve this is to generate a public/private key pair and share it across the cluster.
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine. For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created earlier.
First, we have to generate an SSH key for the hduser user.
su - hduser
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The final step is to test the SSH setup by connecting to the local machine with the hduser user. This step is also needed to save your local machine's host key fingerprint to the hduser user's known_hosts file.
ssh localhost
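The key-setup steps above can also be run non-interactively; the sketch below assumes it is run as the hduser account, and uses -P "" to give the key an empty passphrase so Hadoop's scripts can log in without prompting:

```shell
# Sketch: non-interactive SSH key setup for password-less login
# (assumes it is run as the hduser account created above)
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
# -P "" sets an empty passphrase; skip generation if a key already exists
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa"
# Authorize the new public key for logins to this machine
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

After this, ssh localhost should connect without asking for a password.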
SINGLE-NODE HADOOP INSTALLATION
Now, download and extract Hadoop 3.1.1.
Download Hadoop
Extract Hadoop
tar -xzf hadoop-3.1.1.tar.gz
Move it to /usr/local/hadoop (so it matches HADOOP_HOME below)
mv hadoop-3.1.1 /usr/local/hadoop
Setting up Hadoop Environment Variables (~/.bashrc)
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
Configuration
hadoop-env.sh
Change the file etc/hadoop/hadoop-env.sh (under $HADOOP_HOME) and set:
export JAVA_HOME=/usr/local/jdk1.8.0_181
etc/hadoop/*-site.xml
Paste the following between the <configuration> and </configuration> tags in each file.
• In file etc/hadoop/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system </description>
</property>
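Note that in Hadoop 2.x and later the fs.default.name key is deprecated in favour of fs.defaultFS; with the Hadoop 3.1.1 release used here, the equivalent property is:

```xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system.</description>
</property>
```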
• In file etc/hadoop/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. </description>
</property>
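In Hadoop 2.x/3.x the mapred.job.tracker key is ignored, since YARN replaced the JobTracker; to actually run MapReduce jobs on YARN with the 3.1.1 release, the commonly used setting is:

```xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Run MapReduce jobs on YARN instead of the legacy framework.</description>
</property>
```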
• In file etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
• In file etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopdatastore/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopdatastore/hdfs/datanode</value>
</property>
</configuration>
NOTE: You need to create the folder structure where the HDFS files will be stored (the namenode and datanode paths configured above).
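A sketch of creating that structure; HDFS_BASE is a hypothetical helper variable, and the default of $HOME/hadoopdatastore is an assumption for illustration, while the hdfs-site.xml above uses /home/hadoop/hadoopdatastore as the base path, so adjust to your setup:

```shell
# Sketch: create the local directories backing HDFS.
# The base path must match the dfs.name.dir / dfs.data.dir values in
# hdfs-site.xml; $HOME/hadoopdatastore here is an illustrative default.
HDFS_BASE="${HDFS_BASE:-$HOME/hadoopdatastore}"
mkdir -p "$HDFS_BASE/hdfs/namenode" "$HDFS_BASE/hdfs/datanode"
# The user running Hadoop (hduser) must own these directories, e.g.:
#   sudo chown -R hduser:hadoop_group "$HDFS_BASE"
```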
Formatting the HDFS filesystem via the NameNode
To format the filesystem (which simply initializes the directory specified by the dfs.name.dir property), run the command:
/usr/local/hadoop/bin/hadoop namenode -format
(With Hadoop 3.x this prints a deprecation warning; the preferred form is hdfs namenode -format.)
Run the command:
start-all.sh
This will start a NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager on the machine (in Hadoop 3.x, YARN's ResourceManager and NodeManager take the place of the old JobTracker and TaskTracker).
Verify the running daemons with:
jps
Errors:
1. If your DataNode is not starting, erase the contents of the hadoop.tmp.dir folder (/hadoop/tmp in the configuration above). The command that can be used:
sudo rm -Rf /hadoop/tmp/*
2. You can also check with netstat whether Hadoop is listening on the configured ports. The command that can be used:
sudo netstat -plten | grep java
3. For any other errors, examine the log files in the $HADOOP_HOME/logs/ directory.

