Hadoop Architecture


Input Data

Split(divide the input data into input splits, one per map task)

Record Reader(read each split and turn it into key-value records for the mapper)

Mapper(tokenize each record into words and emit a [word,1] pair per word)-[a,1][a,1][b,1], [a,1][b,1]

Combiner(combine pairs with the same key within each mapper's output)-[a,1,1][b,1], [a,1][b,1]

Partitioning(assign each key to a reducer, so that all pairs with the same key go to the same reducer)-[a,1,1], [b,1], [a,1], [b,1]

Shuffle and sort(the pairs coming out of the combiner phase are moved to the reducers and sorted by key, so each reducer receives all values for a key together)-[a,1,1,1], [b,1,1]

Reducer(sum the values for each key)-[a,3], [b,2]
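
The stages above can be traced with a small in-memory simulation. This is only a sketch in plain Python, not Hadoop code: the function names (mapper, combiner, partition, shuffle_and_sort, reducer, run_job) are made up for illustration, the partition function is a simple stand-in for a hash partitioner, and the combiner groups values as in the walkthrough above instead of pre-summing them.

```python
from collections import defaultdict


def mapper(line):
    # Tokenize one record into words and emit a (word, 1) pair per word.
    return [(word, 1) for word in line.split()]


def combiner(pairs):
    # Group the values of identical keys within one mapper's output.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def partition(key, num_reducers):
    # Assumption: a deterministic stand-in for a hash partitioner.
    # The same key always maps to the same reducer index.
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(combined_outputs, num_reducers):
    # Route every key to its reducer, merge the value lists coming from
    # different mappers, and sort each reducer's input by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for combined in combined_outputs:
        for key, values in combined.items():
            buckets[partition(key, num_reducers)][key].extend(values)
    return [sorted(bucket.items()) for bucket in buckets]


def reducer(key, values):
    # Sum the values collected for one key.
    return (key, sum(values))


def run_job(splits, num_reducers=2):
    # Each split is a list of records (lines) handled by one mapper.
    combined_outputs = []
    for split in splits:
        pairs = [pair for line in split for pair in mapper(line)]
        combined_outputs.append(combiner(pairs))
    reducer_inputs = shuffle_and_sort(combined_outputs, num_reducers)
    results = []
    for reducer_input in reducer_inputs:
        for key, values in reducer_input:
            results.append(reducer(key, values))
    return sorted(results)


# Two splits, matching the walkthrough: "a a b" and "a b".
print(run_job([["a a b"], ["a b"]]))  # [('a', 3), ('b', 2)]
```

With the two splits "a a b" and "a b", the intermediate values match the walkthrough: the combiner yields [a,1,1][b,1] and [a,1][b,1], the reducers receive [a,1,1,1] and [b,1,1], and the final output is [a,3], [b,2].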

For more information, see https://developer.yahoo.com/hadoop/tutorial/module4.html
