Apache Hadoop, Spark, and Big Data
Apache Hadoop is a platform for managing large amounts of data. It includes many tools and applications under one framework.
- Details
- Written by Douglas Eadline
- Hits: 4515
From the elephant in the room department
Talk to most people about Apache™ Hadoop® and the conversation will quickly turn to using the MapReduce algorithm. MapReduce works quite well as a processing model for many types of problems. In particular, when multiple mapping process are used to span TBytes of data the power of a scalable Hadoop cluster becomes evident. In Hadoop version 1, the MapReduce process was one of two core components. The other component is the Hadoop Distributed File System (HDFS). Once data is stored and replicated in HDFS, the MapReduce process could move computational processes to the server on which specific data resides. The result is a very fast and parallel computational approach to problems with large amounts of data. But, MapReduce is not the whole story.