Hadoop Installation:

You can configure Hadoop on any operating system, but its real power comes out on Linux.

1. configure on Windows -  using Cygwin
            -  using a virtual machine: install Ubuntu, then follow the link below.
(http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleNodeSetup.html)

2. configure on a Linux machine.

/home/Admin/

su - root
password: root

adduser dav
passwd dav
(now my username and password are both set to dav)

su - dav
password: dav
(now I am logged in as my user)

pwd

/home/dav/

step1: download Hadoop - use the latest stable release from the Hadoop site. I used hadoop-1.2.1.tar.gz, a tarball (see the download sketch after these steps).
step2: install Java JDK 1.6 or 1.7, or just extract the JDK tarball here as it is.
step3: tar -zxvf hadoop-1.2.1.tar.gz
(this unpacks everything into a hadoop-1.2.1 directory)
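For step 1, assuming wget is available, the tarball can be pulled from the Apache archive (URL shown for illustration; pick a mirror if you prefer):

cd /home/dav
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz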

mkdir hdfstmp
(this creates the hdfstmp directory here; the core-site.xml config below points hadoop.tmp.dir at it)
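Optionally tighten the permissions on this directory; 750 is a common recommendation for the Hadoop tmp dir, though not strictly required:

chmod 750 /home/dav/hdfstmp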

go inside the Hadoop directory and edit the following files:
cd hadoop-1.2.1

vi conf/hadoop-env.sh
remove the # from the JAVA_HOME line and point it at your JDK: export JAVA_HOME=/home/dav/jdk1.7.0_45
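To confirm the path you set points at a working JDK, run its java binary directly (path matches the example above; adjust to your install):

/home/dav/jdk1.7.0_45/bin/java -version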
 
vi conf/core-site.xml
(each <property> block in this and the next two files goes inside that file's <configuration> ... </configuration> element)
 
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/dav/hdfstmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
 
vi conf/hdfs-site.xml 
 
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property> 
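Once HDFS is formatted and running (later steps), you can verify the replication factor actually applied to files with fsck, run from the hadoop-1.2.1 directory:

bin/hadoop fsck / -files -blocks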

vi conf/mapred-site.xml 
 
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
 
vi conf/slaves
enter the slave machines' IP addresses, or localhost for a single-node setup.
 
vi conf/masters
enter the master machine's IP address, or localhost (in Hadoop 1.x this file actually controls where the SecondaryNameNode runs).
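For a single-node setup, a minimal sketch of both files, plus the passwordless ssh that start-all.sh depends on (it sshes into every host listed in masters and slaves; key paths are the OpenSSH defaults):

echo "localhost" > conf/masters
echo "localhost" > conf/slaves
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
(the last command should log you in without a password prompt)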
 
then format the NameNode:
bin/hadoop namenode -format
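If the format succeeds you should see a "successfully formatted" message, and the NameNode's metadata directory appears under hadoop.tmp.dir (path assumes the core-site.xml value above):

ls /home/dav/hdfstmp/dfs/name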

after it formats successfully, start Hadoop:
bin/start-all.sh
 
run jps to verify; it should list the five Hadoop daemons (plus Jps itself):
jps
NameNode
DataNode
TaskTracker
JobTracker
SecondaryNameNode
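With all five daemons up, a quick smoke test is to run the example job bundled with the 1.2.1 release and list the HDFS root; you can also browse the NameNode web UI at http://localhost:50070 and the JobTracker UI at http://localhost:50030:

bin/hadoop jar hadoop-examples-1.2.1.jar pi 2 10
bin/hadoop fs -ls /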
 
to stop Hadoop:
bin/stop-all.sh

jps
 
if jps shows only itself, Hadoop is stopped.
