Hadoop Tutorial | Spark Kafka Nosql and BigData tools for DWH

May 13, 2014

Hadoop architecture

Input Data

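-(Split: the input is divided into splits, one per map task)

-(Record Reader: turns each split into key-value records for the mapper)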

Mapper (tokenizes each record into words and emits a [word,1] pair per word) - [a,1][a,1][b,1], [a,1][b,1]

Combiner (combines pairs that share a key within each mapper's output) - [a,1,1][b,1], [a,1][b,1]

Partitioning (assigns each key to a reducer, so that all pairs with the same key go to the same reducer) - [a,1,1], [b,1], [a,1], [b,1]

Shuffle and sort (the shuffle phase moves the pairs produced by the combiner phase to the reducers and sorts them by key, so the values for each key arrive together at the reducer) - [a,1,1,1], [b,1,1]

Reducer (sums the values for each key) - [a,3], [b,2]
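The steps above can be sketched as a small in-memory simulation. This is only an illustration in plain Python, not Hadoop code: the function names, the two sample splits ("a a b" and "a b"), and the toy partition function are assumptions made for this example. Hadoop's default partitioner hashes the key; a simple byte sum stands in for that here.

```python
from collections import defaultdict


def map_phase(split):
    """Mapper: tokenize each line and emit a (word, 1) pair per token."""
    return [(word, 1) for line in split for word in line.split()]


def combine_phase(pairs):
    """Combiner: group the values of identical keys within one mapper's output."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def partition(key, num_reducers):
    """Partitioner: choose a reducer for a key (toy stand-in for a key hash)."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(combined_outputs, num_reducers):
    """Shuffle: route each key to its reducer, merge its values, sort by key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for output in combined_outputs:
        for key, values in output.items():
            reducers[partition(key, num_reducers)][key].extend(values)
    return [dict(sorted(r.items())) for r in reducers]


def reduce_phase(reducer_input):
    """Reducer: sum the values collected for each key."""
    return {key: sum(values) for key, values in reducer_input.items()}


# Two input splits, matching the two mapper outputs in the walkthrough.
splits = [["a a b"], ["a b"]]

mapped = [map_phase(s) for s in splits]        # [a,1][a,1][b,1], [a,1][b,1]
combined = [combine_phase(m) for m in mapped]  # [a,1,1][b,1], [a,1][b,1]
shuffled = shuffle_and_sort(combined, num_reducers=2)

result = {}
for reducer_input in shuffled:
    result.update(reduce_phase(reducer_input))

print(result)
```

With two reducers, "a" and "b" land on different reducers, and the final counts are a = 3 and b = 2, as in the Reducer step.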

For more information, see https://developer.yahoo.com/hadoop/tutorial/module4.html
