Friday, August 18, 2017

Difference Between Hadoop/HDFS and HBase

Hadoop stores big data in a distributed file system, HDFS. HDFS has some limitations, though, and NoSQL databases (HBase, Cassandra, MongoDB, etc.) emerged in part to overcome them.
Hadoop is suited to offline batch processing, whereas HBase is suited to real-time reads and writes.

Hadoop's MapReduce model performs batch processing only, and the data is accessed sequentially. That means a job has to scan the entire dataset even for the simplest lookup. Processing a huge dataset often produces another huge dataset, which must also be processed sequentially. At this point, a new solution is needed that can reach any piece of data in roughly a single unit of time (random access).
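To make the difference between sequential and random access concrete, here is a small toy sketch in plain Python. It does not use Hadoop or HBase at all; the record layout and the function names `scan_lookup` and `keyed_lookup` are made up for illustration. It only shows why finding one record by scanning costs far more than finding it by key.

```python
# Toy data set: 100,000 (row_key, value) records, stored in order.
records = [(f"user{i:05d}", i * 10) for i in range(100_000)]


def scan_lookup(rows, key):
    """Sequential access: read rows in order until the key turns up.

    Returns (value, rows_read) so the cost of the scan is visible.
    """
    rows_read = 0
    for row_key, value in rows:
        rows_read += 1
        if row_key == key:
            return value, rows_read
    return None, rows_read


# Keyed structure built once; later lookups do not rescan the data.
index = dict(records)


def keyed_lookup(table, key):
    """Random access: jump straight to the record for this key."""
    return table.get(key)


value, rows_read = scan_lookup(records, "user99999")
print(value, rows_read)                    # 999990 100000
print(keyed_lookup(index, "user99999"))    # 999990
```

Both calls return the same value, but the scan had to read every one of the 100,000 records to reach the last key, while the keyed lookup went directly to it.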


Like any other file system, HDFS provides storage, but it does so in a fault-tolerant manner, with high throughput and a lower risk of data loss (because of replication). Being a file system, however, HDFS lacks random read and write access. This is where HBase comes into the picture. It’s a distributed, scalable big data store, modeled after Google’s Bigtable. Cassandra is somewhat similar to HBase.
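The replication point can also be sketched with a toy model. The `ToyCluster` class below is purely hypothetical and is not the HDFS API; it assumes a replication factor of 3 and simply copies each block to three nodes, then shows that a read still succeeds after some of those nodes fail.

```python
import random


class ToyCluster:
    """Hypothetical in-memory stand-in for a cluster of data nodes."""

    def __init__(self, node_names, replication=3):
        self.nodes = {name: {} for name in node_names}  # node -> {block_id: data}
        self.alive = set(node_names)
        self.replication = replication

    def write_block(self, block_id, data):
        """Store a copy of the block on `replication` distinct live nodes."""
        targets = random.sample(sorted(self.alive), self.replication)
        for name in targets:
            self.nodes[name][block_id] = data
        return targets

    def fail_node(self, name):
        """Simulate a node going down."""
        self.alive.discard(name)

    def read_block(self, block_id):
        """Return the block from any live node that still holds a copy."""
        for name in sorted(self.alive):
            if block_id in self.nodes[name]:
                return self.nodes[name][block_id]
        raise IOError(f"all replicas of {block_id} are lost")


cluster = ToyCluster(["node1", "node2", "node3", "node4"])
holders = cluster.write_block("blk_1", b"some data")

# Lose two of the three replicas; one copy is still reachable.
cluster.fail_node(holders[0])
cluster.fail_node(holders[1])
print(cluster.read_block("blk_1"))  # b'some data'
```

Only when every node holding a copy is down does the read fail, which is the sense in which replication lowers the risk of data loss.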