Code-Searching: What is the "Partitioning, Shuffle and Sort" phase after the Map phase?

Wednesday, 18 May 2016


Partitioning

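Partitioning determines which reducer instance receives which intermediate keys and values.
For any given key, the destination partition must be the same regardless of which mapper instance generated it, so that all values for that key reach the same reducer. By default, Hadoop's HashPartitioner computes the partition from the key's hash code modulo the number of reduce tasks.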
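To make the "same key, same partition" requirement concrete, here is a minimal Python simulation (not Hadoop code) of the default hash-partitioning rule. The helpers `java_style_hash` and `get_partition` are hypothetical names; `java_style_hash` is a deterministic 31-multiplier string hash assumed here as a stand-in for a key's `hashCode()`. Python's built-in `hash()` is avoided on purpose, because it is randomized per process and would break exactly the cross-mapper consistency described above.

```python
def java_style_hash(key: str) -> int:
    """Deterministic 31-multiplier string hash, wrapped to a signed 32-bit int.

    Assumption: a stand-in for the key's hashCode(), not Hadoop's own code.
    """
    h = 0
    for ch in key:
        # Keep only the low 32 bits, mimicking Java int overflow.
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed value, as Java would.
    return h - (1 << 32) if h >= (1 << 31) else h


def get_partition(key: str, num_reducers: int) -> int:
    """Hash-partitioning formula: (hash & Integer.MAX_VALUE) % numReduceTasks.

    Masking off the sign bit keeps the result non-negative.
    """
    return (java_style_hash(key) & 0x7FFFFFFF) % num_reducers


# Two hypothetical mappers that happen to emit some of the same keys.
mapper_a = ["apple", "banana", "cherry"]
mapper_b = ["cherry", "apple", "date"]

NUM_REDUCERS = 3
for key in mapper_a + mapper_b:
    print(key, "-> reducer", get_partition(key, NUM_REDUCERS))
```

Because the partition depends only on the key and the number of reducers, "apple" and "cherry" land on the same reducer no matter which mapper emitted them.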

Shuffle

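Shuffle is the process of moving map outputs to the reducers.
Shuffling begins as soon as the first map tasks have completed: although the nodes may still be running several more map tasks each, they also start transferring the intermediate map outputs to the nodes where the reducers need them. The reduce function itself, however, does not run until a reducer has received the map output for its partition from every map task.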
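A rough sketch of the shuffle, again as an in-memory Python simulation rather than Hadoop's actual networked transfer: each mapper's output is bucketed by destination reducer, and each reducer then pulls its own bucket from every mapper. The function names are hypothetical, and CRC32 is an assumed stand-in partition function, chosen only because it is deterministic across processes.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    # Stand-in partition function (assumption): CRC32 is deterministic.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_map_output(pairs, num_reducers):
    """Map side: bucket one mapper's (key, value) output by destination reducer."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


def fetch_for_reducer(reducer_id, partitioned_outputs):
    """Reduce side: pull this reducer's bucket from every finished mapper."""
    fetched = []
    for buckets in partitioned_outputs:
        fetched.extend(buckets.get(reducer_id, []))
    return fetched


# Hypothetical word-count style output from two mappers.
map_outputs = [
    [("apple", 1), ("banana", 1), ("apple", 1)],  # mapper 0
    [("banana", 1), ("cherry", 1)],               # mapper 1
]
NUM_REDUCERS = 2
partitioned = [partition_map_output(p, NUM_REDUCERS) for p in map_outputs]
for r in range(NUM_REDUCERS):
    print("reducer", r, "receives", fetch_for_reducer(r, partitioned))
```

Every pair ends up at exactly one reducer, and all pairs for a given key end up at the same one.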

Sort

Hadoop automatically sorts the intermediate keys on each reducer node before presenting them to the Reducer, so all values for a given key arrive together and keys are processed in sorted order.
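The sort step can be sketched the same way. This simplified Python version uses hypothetical helper names, and a single `sorted()` call stands in for Hadoop's merge of already-sorted map output segments. It orders one reducer's pairs by key and groups their values, so the reduce function (here an assumed word-count style sum) is called once per key.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_group(fetched_pairs):
    """Sort one reducer's pairs by key, then yield (key, [values]) groups."""
    ordered = sorted(fetched_pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _, value in group]


def reduce_sum(key, values):
    # Example reduce function: sum the counts for one key.
    return key, sum(values)


fetched = [("banana", 1), ("apple", 1), ("banana", 1), ("apple", 1), ("cherry", 1)]
for key, values in sort_and_group(fetched):
    print(reduce_sum(key, values))
# ('apple', 2)
# ('banana', 2)
# ('cherry', 1)
```

`groupby` only merges adjacent equal keys, which is why the sort must come first; the same reasoning is why the reducer sees sorted input.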
