Hadoop Ecosystem¶
Hadoop is not a single product, but rather a software family. Its common components consist of the following:
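- Pig, a high-level scripting platform (its language is Pig Latin) used to quickly write MapReduce jobs that handle unstructured sources
- Hive, which imposes a table structure on the data and lets you query it with a SQL-like language (HiveQL)
- HCatalog, a table and storage management layer that provides interoperability between these internal systems
- HBase, which is essentially a column-oriented NoSQL database built on top of HDFS
- HDFS, the actual distributed file system for Hadoop
- Mahout, a library of scalable machine learning algorithms
- Bigtop, used for packaging and testing the Hadoop components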
In short, Hive gives data stored in Hadoop a queryable structure, while Pig makes it easy to process unstructured data with short scripts.
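To make the MapReduce model behind Pig more concrete, here is a minimal sketch of a word count in plain Python. It does not use Pig, Hive, or any Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions chosen for illustration, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical unstructured input: free-form text lines.
    sample = ["Hadoop stores data in HDFS", "Pig and Hive run on Hadoop"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps would run in parallel on many nodes over data stored in HDFS; a Pig script expresses the same kind of pipeline in a few lines and leaves the job generation to the platform.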
Hadoop and Mongo¶
AWS EMR¶
Amazon EMR includes the following applications:
- Ganglia
- Hadoop
- HBase
- HCatalog
- Hive
- Hue
- Mahout
- Oozie
- Phoenix
- Pig
- Presto
- Spark
- Sqoop
- Tez
- Zeppelin
- ZooKeeper
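When launching a cluster you normally pick only the applications you need from this list. The sketch below builds that selection in two shapes: a list of `{"Name": ...}` dictionaries, and the `Name=...` form accepted by `aws emr create-cluster --applications`. The helper names and the chosen subset are assumptions for illustration, and the code only builds the values; it does not call AWS.

```python
# Hypothetical subset of the applications listed above.
SELECTED = ["Hadoop", "Hive", "Pig", "Spark"]


def unique_in_order(names):
    """Drop duplicate application names while keeping their first-seen order."""
    seen = []
    for name in names:
        if name not in seen:
            seen.append(name)
    return seen


def as_application_dicts(names):
    """Return the selection as a list of {"Name": ...} dictionaries."""
    return [{"Name": name} for name in unique_in_order(names)]


def as_cli_arguments(names):
    """Return the selection in the Name=... form used on the command line."""
    return " ".join(f"Name={name}" for name in unique_in_order(names))


if __name__ == "__main__":
    print(as_application_dicts(SELECTED))
    print("aws emr create-cluster --applications", as_cli_arguments(SELECTED), "...")
```

A real `create-cluster` call needs further options (release label, instance settings, roles) that are left out here, so treat the printed command as a fragment rather than something to run as-is.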