Apache Hadoop, Spark, and Big Data

Apache Hadoop is a platform for managing large amounts of data. It includes many tools and applications under one framework.

From the elephant in the room department

Hadoop Logo Talk to most people about Apache™ Hadoop® and the conversation will quickly turn to using the MapReduce algorithm. MapReduce works quite well as a processing model for many types of problems. In particular, when multiple mapping process are used to span TBytes of data the power of a scalable Hadoop cluster becomes evident. In Hadoop version 1, the MapReduce process was one of two core components. The other component is the Hadoop Distributed File System (HDFS). Once data is stored and replicated in HDFS, the MapReduce process could move computational processes to the server on which specific data resides. The result is a very fast and parallel computational approach to problems with large amounts of data. But, MapReduce is not the whole story.


Login And Newsletter

Create an account to access exclusive content, comment on articles, and receive our newsletters.


Creative Commons License
©2005-2019 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.5 License. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.