About Big Data and Hadoop

by:

Uncategorized

What is MapReduce ?

MapReduce is a processing technique and a program model for distributed computing based on Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). Secondly, reduce task which takes the output from a map as an input and combines those data tuples into a smaller set of tuples. Reduce task is always performed after the map job. The major advantage of MapReduce is that it is easy to scale data processing over multiple computing nodes.

Generally, MapReduce paradigm is based on sending the computer to where the data resides. MapReduce program executes in three stages, namely map stage, shuffle stage and reduce stage.

Previous

Leave a Reply

Your email address will not be published. Required fields are marked *