Apache Hadoop 2.7.0 Components
- The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
- The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
- It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
The project includes these modules:
• Hadoop Common: The common utilities that support the other Hadoop modules.
• Hadoop Distributed File System (HDFS): A distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster.
• Hadoop YARN: A resource-management platform responsible for managing computing resources in clusters and using them to schedule users' applications.
• Hadoop MapReduce: A YARN-based system for parallel processing of large data sets (a programming model for large-scale data processing).
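To make the MapReduce programming model concrete, here is a minimal in-memory sketch of a word count in plain Python. It does not use Hadoop's own API (which is Java-based) and runs on a single machine. The function names map_phase, shuffle, reduce_phase and word_count are hypothetical names chosen for illustration. The sketch only shows the three logical steps that a MapReduce framework distributes across a cluster: map each input record to key/value pairs, group the values by key, and reduce each group to a result.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key.

    In a real cluster the framework does this between the map and reduce
    steps; here it is simulated with an in-memory dictionary.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(sample))
```

In a real Hadoop job the mapper and reducer run as separate tasks on different machines, and the input lines would typically be read from HDFS rather than from a Python list.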
There are five pillars to Hadoop that make it enterprise ready:
1. Data Management: Apache Hadoop YARN, HDFS
2. Data Access: Apache Hive, Apache Pig, MapReduce, Apache Spark, Apache Storm, Apache HBase, Apache Tez, Apache Kafka, Apache HCatalog, Apache Slider, Apache Solr, Apache Mahout, Apache Accumulo
3. Data Governance and Integration: Apache Falcon, Apache Flume, Apache Sqoop
4. Security: Apache Knox, Apache Ranger
5. Operations: Apache Ambari, Apache Oozie, Apache ZooKeeper
Providers
Commercial Vendors:
- Cloudera
- Hortonworks
- IBM InfoSphere BigInsights
- MapR Technologies
- Think Big Analytics
- Amazon Web Services (Cloud based)
- Microsoft Azure (Cloud based)
Open Source Vendors:
- Apache
- Apache Bigtop
- Cascading
- Cloudspace
- Datameer
- Data Mine Lab
- Datasalt
- DataStax
- DataTorrent
- Debian
- Emblocsoft
- HStreaming
- Impetus
- Pentaho
- Talend
- Jaspersoft
- Karmasphere
- Apache Mahout
- Nutch
- NGData
- Pervasive Software
- Pivotal
- Sematext International
- Syncsort
- Tresata
- WANdisco
- etc.