Installation and Configuration


Java Installation

Java JDK 8

Download Java


Extract Java

tar -xzvf jdk-8u181-linux-x64.tar.gz

Move it to /usr/local

mv jdk-8u181-linux-x64 /usr/local/
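The extract-and-move sequence can be rehearsed in a throwaway directory before touching /usr/local. A sketch, where the dummy archive and the `$work/usr/local` prefix are stand-ins for the real tarball and the real /usr/local:

```shell
# Rehearse the extract-and-move steps in a temp dir.
# jdk1.8.0_181 mirrors the folder the real jdk-8u181 tarball unpacks to.
work=$(mktemp -d)
cd "$work"
mkdir -p jdk1.8.0_181/bin                  # stand-in for the JDK contents
tar -czf jdk-8u181-linux-x64.tar.gz jdk1.8.0_181
rm -r jdk1.8.0_181

tar -xzvf jdk-8u181-linux-x64.tar.gz       # same flags as the real step
mkdir -p "$work/usr/local"                 # stand-in for /usr/local
mv jdk1.8.0_181 "$work/usr/local/"
ls "$work/usr/local"
```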

Set Path (~/.bashrc)

export JAVA_HOME=/usr/local/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin

Apply changes

source ~/.bashrc
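If the setup is re-run, appending the export lines blindly duplicates them in ~/.bashrc. A grep guard keeps the file clean; a sketch against a temporary file (the JDK path matches the one assumed above):

```shell
# Append the JAVA_HOME exports only once, guarded by a grep check.
bashrc=$(mktemp)                       # stands in for ~/.bashrc
add_java_env() {
  if ! grep -q '^export JAVA_HOME=' "$1"; then
    printf 'export JAVA_HOME=/usr/local/jdk1.8.0_181\n' >> "$1"
    printf 'export PATH=$PATH:$JAVA_HOME/bin\n' >> "$1"
  fi
}
add_java_env "$bashrc"
add_java_env "$bashrc"                 # second run is a no-op
grep -c '^export JAVA_HOME=' "$bashrc"   # prints 1
```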

Java Version

java -version






Adding a dedicated Hadoop system user

We will use a dedicated user account, hduser, for running Hadoop. This keeps the Hadoop installation separate from other software and user accounts on the same machine.



sudo addgroup hadoop_group
sudo adduser --ingroup hadoop_group hduser
sudo adduser hduser sudo
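Afterwards you can confirm the account and group exist; getent needs no root privileges, and the names below match the commands above:

```shell
# Report whether the dedicated group and user were created.
checked=0
for name in hadoop_group hduser; do
  if getent group "$name" >/dev/null 2>&1 || getent passwd "$name" >/dev/null 2>&1; then
    echo "$name: present"
  else
    echo "$name: missing"
  fi
  checked=$((checked + 1))
done
```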






Configuring SSH

The Hadoop control scripts rely on SSH to perform cluster-wide operations. For example, there is a script for stopping and starting all the daemons in the cluster. To work seamlessly, SSH needs to be set up to allow password-less login for the hadoop user from machines in the cluster. The simplest way to achieve this is to generate a public/private key pair and share it across the cluster.

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine. For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created earlier.

We have to generate an SSH key for the hduser user.


su - hduser
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
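The key setup can be rehearsed in a throwaway HOME before doing it as hduser. A sketch assuming OpenSSH's ssh-keygen is installed; note that ~/.ssh and authorized_keys must not be group- or world-writable, or sshd will ignore them:

```shell
# Rehearse the password-less key setup in a temporary HOME.
home=$(mktemp -d)
mkdir -p "$home/.ssh" && chmod 700 "$home/.ssh"
ssh-keygen -q -t rsa -N "" -f "$home/.ssh/id_rsa"   # empty passphrase
cat "$home/.ssh/id_rsa.pub" >> "$home/.ssh/authorized_keys"
chmod 600 "$home/.ssh/authorized_keys"
ls "$home/.ssh"
```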






The final step is to test the SSH setup by connecting to the local machine as the hduser user. This step is also needed to save your local machine's host key fingerprint to the hduser user's known_hosts file.

ssh localhost



SINGLE-NODE HADOOP INSTALLATION

Now, download and extract Hadoop 3.1.1

Download Hadoop



Extract Hadoop

tar -xzvf hadoop-3.1.1.tar.gz

Move it to /usr/local and rename it to match HADOOP_HOME

mv hadoop-3.1.1 /usr/local/hadoop


Setting up the Hadoop Environment Variables (~/.bashrc)

export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
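After sourcing ~/.bashrc, a quick sanity check that the PATH additions took effect in the current shell. HADOOP_HOME below is the install path assumed in this guide:

```shell
# Verify HADOOP_HOME and the PATH entries it contributes.
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
on_path=no
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) on_path=yes ;;
esac
echo "hadoop bin on PATH: $on_path"
```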

Configuration

hadoop-env.sh

Change the file: etc/hadoop/hadoop-env.sh (in Hadoop 3.x the configuration files live under $HADOOP_HOME/etc/hadoop, not conf/)

export JAVA_HOME=/usr/local/jdk1.8.0_181

etc/hadoop/*-site.xml

Paste the following between the <configuration> and </configuration> tags.

  In file etc/hadoop/core-site.xml

<property>
          <name>hadoop.tmp.dir</name>
          <value>/hadoop/tmp</value>
          <description>A base for other temporary directories.</description> 
</property>
<property>
          <name>fs.defaultFS</name>
          <value>hdfs://localhost:54310</value>
          <description>The name of the default file system.</description>
</property>






  In file etc/hadoop/mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The runtime framework for executing MapReduce jobs. In Hadoop 3.x the JobTracker no longer exists; jobs run on YARN.</description>
</property>


  In file etc/hadoop/yarn-site.xml

<configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
</configuration>


  In file etc/hadoop/hdfs-site.xml

<configuration>
   <property>
      <name>dfs.replication</name>
      <value>1</value>
   </property>
   <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:///home/hadoop/hadoopdatastore/hdfs/namenode</value>
   </property>
   <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///home/hadoop/hadoopdatastore/hdfs/datanode</value>
   </property>
</configuration>

*NOTE: you need to create the folder structure where HDFS will store its files, i.e. the namenode and datanode directories under /home/hadoop/hadoopdatastore/hdfs configured above.
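Both directories can be created in one go. The sketch below uses a temporary DATA_ROOT so it can be rehearsed anywhere; on the real machine, substitute /home/hadoop/hadoopdatastore (the path from hdfs-site.xml):

```shell
# Create the NameNode and DataNode storage directories.
DATA_ROOT=$(mktemp -d)   # stand-in for /home/hadoop/hadoopdatastore
mkdir -p "$DATA_ROOT/hdfs/namenode" "$DATA_ROOT/hdfs/datanode"
ls "$DATA_ROOT/hdfs"
```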


Formatting the HDFS filesystem via the NameNode

Before starting Hadoop for the first time, format the filesystem; this simply initializes the directory that the NameNode configuration in hdfs-site.xml points to. Run the command


/usr/local/hadoop/bin/hdfs namenode -format


Run the command

start-all.sh

This will start up a NameNode, a SecondaryNameNode, a DataNode, a ResourceManager, and a NodeManager on the machine. Verify the running Java processes with

jps


Errors:

1. If your DataNode does not start, erase the contents of the hadoop.tmp.dir folder /hadoop/tmp (configured in core-site.xml) and reformat the NameNode. The command that can be used

sudo rm -Rf /hadoop/tmp/*

2. You can also check with netstat whether Hadoop is listening on the configured ports. The command that can be used

sudo netstat -plten | grep java

3. If there are errors, examine the log files in the $HADOOP_HOME/logs/ directory.


