Hadoop Installation:

You can configure Hadoop on any operating system, but its real power comes out on Linux.

1. configure on Windows -  using Cygwin
            -  using a virtual machine: install Ubuntu, then follow the link below.
(http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SingleNodeSetup.html)

2. configure on a Linux machine.

/home/Admin/

su - root
password: root

adduser dav
passwd dav
(now my username and password are both set to dav)

su - dav
password: dav
(now I am logged in as my user)

pwd

/home/dav/

step1: download Hadoop - use the latest stable release from the Hadoop site. I used hadoop-1.2.1.tar.gz, a tarball (see the download sketch after these steps).
step2: install Java JDK 1.6 or 1.7, or just extract the JDK tarball here as it is.
step3: tar -zxvf hadoop-1.2.1.tar.gz
(this unpacks everything into a hadoop-1.2.1 directory)
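For step 1, assuming wget is available, the tarball can be pulled from the Apache archive (URL shown for illustration; pick a mirror if you prefer):

cd /home/dav
wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz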

mkdir hdfstmp
(this creates the hdfstmp directory here; the core-site.xml config below points hadoop.tmp.dir at it)
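Optionally tighten the permissions on this directory; 750 is a common recommendation for the Hadoop tmp dir, though not strictly required:

chmod 750 /home/dav/hdfstmp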

go inside the Hadoop directory and edit the following files:
cd hadoop-1.2.1

vi conf/hadoop-env.sh
remove the # from the JAVA_HOME line and point it at your JDK: export JAVA_HOME=/home/dav/jdk1.7.0_45
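To confirm the path you set points at a working JDK, run its java binary directly (path matches the example above; adjust to your install):

/home/dav/jdk1.7.0_45/bin/java -version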
 
vi conf/core-site.xml
(each <property> block in this and the next two files goes inside that file's <configuration> ... </configuration> element)
 
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/dav/hdfstmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
 
vi conf/hdfs-site.xml 
 
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property> 
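Once HDFS is formatted and running (later steps), you can verify the replication factor actually applied to files with fsck, run from the hadoop-1.2.1 directory:

bin/hadoop fsck / -files -blocks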

vi conf/mapred-site.xml 
 
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
 
vi conf/slaves
enter the slave machines' IP addresses, or localhost for a single-node setup.
 
vi conf/masters
enter the master machine's IP address, or localhost (in Hadoop 1.x this file actually controls where the SecondaryNameNode runs).
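For a single-node setup, a minimal sketch of both files, plus the passwordless ssh that start-all.sh depends on (it sshes into every host listed in masters and slaves; key paths are the OpenSSH defaults):

echo "localhost" > conf/masters
echo "localhost" > conf/slaves
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
(the last command should log you in without a password prompt)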
 
then format the NameNode:
bin/hadoop namenode -format
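If the format succeeds you should see a "successfully formatted" message, and the NameNode's metadata directory appears under hadoop.tmp.dir (path assumes the core-site.xml value above):

ls /home/dav/hdfstmp/dfs/name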

after it formats successfully, start Hadoop:
bin/start-all.sh
 
run jps to verify; it should list the five Hadoop daemons (plus Jps itself):
jps
NameNode
DataNode
TaskTracker
JobTracker
SecondaryNameNode
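With all five daemons up, a quick smoke test is to run the example job bundled with the 1.2.1 release and list the HDFS root; you can also browse the NameNode web UI at http://localhost:50070 and the JobTracker UI at http://localhost:50030:

bin/hadoop jar hadoop-examples-1.2.1.jar pi 2 10
bin/hadoop fs -ls /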
 
to stop Hadoop:
bin/stop-all.sh

jps
 
if jps shows only itself, Hadoop is stopped.
