Hadoop architecture
Input Data
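-(Split: the input is divided into splits, one per map task)
-(Record Reader: reads a split and turns it into key/value records for the mapper)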
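Mapper (tokenizes each record into words and emits a [word, 1] pair per word) - [a,1][a,1][b,1], [a,1][b,1]
Combiner (combines pairs that share a key, locally on each mapper's output) - [a,1,1][b,1], [a,1][b,1]
Partitioning (assigns each key to a reducer, so that all pairs with the same key go to the same reducer) - [a,1,1], [b,1], [a,1], [b,1]
Shuffle and sort (the shuffle phase transfers the partitioned output of the combiner phase to the reducers, and the sort phase merges it by key before the data goes to the reducer) - [a,1,1,1], [b,1,1]
Reducer (sums the values for each key) - [a,3], [b,2]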
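The flow above can be traced with a small in-memory sketch. This is plain Python, not the Hadoop API: the function names (mapper, combiner, partition, shuffle_and_sort, reducer, word_count) and the byte-sum partition rule are assumptions made only for illustration. The combiner here groups values the way the notation above does; a real word-count combiner often pre-sums them instead.

```python
from collections import defaultdict


def mapper(record):
    """Tokenize one record and emit a (word, 1) pair per word."""
    return [(word, 1) for word in record.split()]


def combiner(pairs):
    """Group the values of equal keys within one mapper's output."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def partition(key, num_reducers):
    """Pick a reducer index for a key (toy stand-in for a hash partitioner)."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(combined_outputs, num_reducers):
    """Route every key to its reducer, then sort the keys inside each reducer."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for output in combined_outputs:
        for key, values in output.items():
            buckets[partition(key, num_reducers)][key].extend(values)
    return [dict(sorted(bucket.items())) for bucket in buckets]


def reducer(key, values):
    """Sum all values collected for one key."""
    return key, sum(values)


def word_count(splits, num_reducers=2):
    """Run the whole flow; each split is a list of records (lines)."""
    combined = [
        combiner([pair for record in split for pair in mapper(record)])
        for split in splits
    ]
    result = {}
    for bucket in shuffle_and_sort(combined, num_reducers):
        for key, values in bucket.items():
            word, total = reducer(key, values)
            result[word] = total
    return result


# Two splits matching the trace above: "a a b" and "a b".
counts = word_count([["a a b"], ["a b"]])
print(counts["a"], counts["b"])
```

With the two splits shown, the sketch gives a count of 3 for a and 2 for b, the same as the Reducer line above.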
For more information, see https://developer.yahoo.com/hadoop/tutorial/module4.html