Installation and Configuration
Java Installation
Java JDK 8
Download Java
Extract Java
tar -xzf jdk-8u181-linux-x64.tar.gz
Move the extracted directory (jdk1.8.0_181) to /usr/local
mv jdk1.8.0_181 /usr/local/
Set Path (~/.bashrc)
export JAVA_HOME=/usr/local/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin
Apply changes
source ~/.bashrc
Java Version
java -version
Adding a dedicated Hadoop system user
We will use a dedicated Hadoop user account for running Hadoop.
sudo addgroup hadoop_group
sudo adduser --ingroup hadoop_group hduser
Optionally, give hduser sudo privileges:
sudo adduser hduser sudo
Configuring SSH
The Hadoop control scripts rely on SSH to perform cluster-wide operations. For example, there is a script for stopping and starting all the daemons in the cluster. To work seamlessly, SSH needs to be set up to allow password-less login for the hadoop user from machines in the cluster. The simplest way to achieve this is to generate a public/private key pair and share it across the cluster.
Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine. For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created earlier.
First, we have to generate an SSH key for the hduser user.
su - hduser
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
The final step is to test the SSH setup by connecting to the local machine with the hduser user. This step is also needed to save your local machine's host key fingerprint to the hduser user's known_hosts file.
ssh localhost
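The key-setup steps above can also be run non-interactively; the sketch below assumes it is run as the hduser account, and uses -P "" to give the key an empty passphrase so Hadoop's scripts can log in without prompting:

```shell
# Sketch: non-interactive SSH key setup for password-less login
# (assumes it is run as the hduser account created above)
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
# -P "" sets an empty passphrase; skip generation if a key already exists
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -P "" -f "$HOME/.ssh/id_rsa"
# Authorize the new public key for logins to this machine
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```

After this, ssh localhost should connect without asking for a password.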
SINGLE-NODE HADOOP INSTALLATION
Now, download and extract Hadoop 3.1.1.
Download Hadoop
Extract Hadoop
tar -xzf hadoop-3.1.1.tar.gz
Move it to /usr/local/hadoop (so it matches HADOOP_HOME below)
mv hadoop-3.1.1 /usr/local/hadoop
Setting up Hadoop Environment Variables (~/.bashrc)
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
Configuration
hadoop-env.sh
Change the file etc/hadoop/hadoop-env.sh (under $HADOOP_HOME) and set:
export JAVA_HOME=/usr/local/jdk1.8.0_181
etc/hadoop/*-site.xml
Paste the following between the <configuration> and </configuration> tags in each file.
• In file etc/hadoop/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system </description>
</property>
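Note that in Hadoop 2.x and later the fs.default.name key is deprecated in favour of fs.defaultFS; with the Hadoop 3.1.1 release used here, the equivalent property is:

```xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system.</description>
</property>
```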
• In file etc/hadoop/mapred-site.xml
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. </description>
</property>
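In Hadoop 2.x/3.x the mapred.job.tracker key is ignored, since YARN replaced the JobTracker; to actually run MapReduce jobs on YARN with the 3.1.1 release, the commonly used setting is:

```xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>Run MapReduce jobs on YARN instead of the legacy framework.</description>
</property>
```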
• In file etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
• In file etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>file:///home/hadoop/hadoopdatastore/hdfs/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>file:///home/hadoop/hadoopdatastore/hdfs/datanode</value>
</property>
</configuration>
NOTE: You need to create the folder structure where the HDFS files will be stored (the namenode and datanode paths configured above).
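A sketch of creating that structure; HDFS_BASE is a hypothetical helper variable, and the default of $HOME/hadoopdatastore is an assumption for illustration, while the hdfs-site.xml above uses /home/hadoop/hadoopdatastore as the base path, so adjust to your setup:

```shell
# Sketch: create the local directories backing HDFS.
# The base path must match the dfs.name.dir / dfs.data.dir values in
# hdfs-site.xml; $HOME/hadoopdatastore here is an illustrative default.
HDFS_BASE="${HDFS_BASE:-$HOME/hadoopdatastore}"
mkdir -p "$HDFS_BASE/hdfs/namenode" "$HDFS_BASE/hdfs/datanode"
# The user running Hadoop (hduser) must own these directories, e.g.:
#   sudo chown -R hduser:hadoop_group "$HDFS_BASE"
```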
Formatting the HDFS filesystem via the NameNode
To format the filesystem (which simply initializes the directory specified by the dfs.name.dir property), run the command:
/usr/local/hadoop/bin/hadoop namenode -format
(With Hadoop 3.x this prints a deprecation warning; the preferred form is hdfs namenode -format.)
Run the command:
start-all.sh
This will start a NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager on the machine (in Hadoop 3.x, YARN's ResourceManager and NodeManager take the place of the old JobTracker and TaskTracker).
Verify the running daemons with:
jps
Errors:
1. If your DataNode is not starting, erase the contents of the hadoop.tmp.dir folder (/hadoop/tmp in the configuration above). The command that can be used:
sudo rm -Rf /hadoop/tmp/*
2. You can also check with netstat whether Hadoop is listening on the configured ports. The command that can be used:
sudo netstat -plten | grep java
3. For any other errors, examine the log files in the $HADOOP_HOME/logs/ directory.

