HashPartitioner is the default partitioner in Hadoop; it is applied internally by Hadoop if no custom partitioner has been defined.
Hash partitioning, which is the default partitioning scheme in Hadoop.
The default partitioner, which buckets keys using a hash function, is used.
Hashing is used by default; otherwise, write code to emit the partitioning column as the key and set it in the mapper class.
By default, the HashPartitioner is used for partitioning the data.
Add the partition column as the key in the mapper output.
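To route records by one column of a composite key rather than the whole key, a custom partitioner hashes only that column. A minimal standalone sketch of the routing logic (the composite key format `country\tuserId` is a hypothetical example, and the class does not depend on the Hadoop API):

```java
public class ColumnPartitionDemo {
    // Hypothetical composite key "country\tuserId": partition on the
    // country column only, so every record for a country reaches the
    // same reducer regardless of the userId part of the key.
    static int partitionByColumn(String compositeKey, int numReduceTasks) {
        String column = compositeKey.split("\t")[0];
        // Mask the sign bit so negative hash codes still map to a valid bucket
        return (column.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        // Same country, different users -> same partition
        System.out.println(partitionByColumn("US\tuser1", 4)
                == partitionByColumn("US\tuser2", 4)); // prints true
    }
}
```

In a real job the same logic would live in a subclass of Hadoop's `Partitioner`, registered on the job via `job.setPartitionerClass(...)`.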
If there is no custom partitioner, MapReduce by default uses a hash algorithm (the hash code of the map output key), and keys with the same hash code are sent to the same reducer.
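The formula Hadoop's HashPartitioner applies is `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. A self-contained sketch of that formula (standalone, without the Hadoop dependency, so the class name and `main` are illustration only):

```java
public class HashPartitionDemo {
    // Same formula as Hadoop's HashPartitioner.getPartition: the bitwise
    // AND with Integer.MAX_VALUE clears the sign bit, so keys whose
    // hashCode() is negative still map to a bucket in [0, numReduceTasks).
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        // The mapping is deterministic: equal keys always land in the
        // same partition, so one reducer sees all values for a key.
        System.out.println(getPartition("apple", reducers)
                == getPartition("apple", reducers)); // prints true
    }
}
```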
Map output is stored in an in-memory buffer and spilled to disk when it reaches the buffer threshold.
The spill files are then merged into a single partitioned and sorted output file.
The maximum number of streams merged at once is 10 by default.
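These spill and merge thresholds are configurable. A sketch of the relevant `mapred-site.xml` properties, shown here with their usual defaults (100 MB sort buffer, spill at 80% full, merge factor of 10):

```xml
<configuration>
  <!-- Size of the in-memory map output buffer, in MB -->
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>100</value>
  </property>
  <!-- Fraction of the buffer that triggers a spill to disk -->
  <property>
    <name>mapreduce.map.sort.spill.percent</name>
    <value>0.80</value>
  </property>
  <!-- Maximum number of spill streams merged at once -->
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>10</value>
  </property>
</configuration>
```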