Data Empowerment Blog: March 2021

What is a Data Lake?

A Data Lake is a large repository that holds every kind of data in its raw format until someone in the organization needs to analyze it.
A Data Lake is not Hadoop. It uses a range of different tools, and Hadoop implements only a subset of its functionality.
A Data Lake is not a database in the traditional sense of the word. A typical implementation uses various NoSQL and in-memory databases that can coexist with their relational counterparts.
A Data Lake cannot be implemented in isolation. It has to be implemented alongside a data warehouse (DW), since it complements various functions of the DW.
It stores large volumes of both unstructured and structured data. It also stores fast-moving streamed data from machine sensors and logs.
It takes a store-all approach to huge volumes of data.
It is optimized for high-latency, batch-mode data crunching and is not geared toward transaction processing.
It supports flexible data models that can be revised without redesigning the database.
It can quickly perform data enrichment, which covers enhancement, augmentation, classification, and standardization of the data.
All of the data stored in the Data Lake can be used to get an all-inclusive view. This enables near-real-time, more precise predictive models that go beyond sampling, and it helps in building multi-dimensional models too.
It is a data scientist's favorite hunting ground. Data scientists can access the data in its raw glory at its most granular level, so they can run any ad-hoc query and build advanced models iteratively, at any time. The classic data warehouse approach does not offer this ability to shorten the time between data intake and insight generation.
A key attribute of a Data Lake is that data is not classified when it is stored. As a result, the data preparation, cleansing, and transformation tasks that generally take the lion's share of time in a Data Warehouse are eliminated at load time.
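The store-all, schema-on-read idea in the points above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a real Data Lake: the in-memory list standing in for the raw zone, the helper names `ingest`, `read_with_schema`, and `enrich`, and the sensor fields are all hypothetical. Records are kept exactly as they arrive, and a schema is applied only when someone reads them, so two analyses can impose different schemas on the same raw data.

```python
import json
from typing import Any, Callable, Dict, List, Optional

# Hypothetical raw zone: an append-only list of JSON strings kept exactly as
# they arrived. Nothing is validated or transformed at write time.
RawZone = List[str]


def ingest(zone: RawZone, raw_record: str) -> None:
    """Store-all: append the raw payload untouched, whatever its shape."""
    zone.append(raw_record)


def read_with_schema(zone: RawZone,
                     schema: Dict[str, Callable[[Any], Any]]) -> List[Dict[str, Any]]:
    """Schema-on-read: pick and cast fields only when the data is queried.

    `schema` maps a field name to a casting function. A record that lacks a
    field gets None for it instead of being rejected at load time.
    """
    rows = []
    for line in zone:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


def enrich(row: Dict[str, Any]) -> Dict[str, Any]:
    """Illustrative enrichment: standardize a country code, classify a reading."""
    out = dict(row)
    country: Optional[str] = out.get("country")
    if country:
        out["country"] = country.strip().upper()  # standardization
    temp = out.get("temp_c")
    # classification (the 30 degree threshold is an arbitrary example)
    out["temp_class"] = None if temp is None else ("hot" if temp >= 30 else "normal")
    return out


# Three differently shaped records land in the same raw zone.
zone: RawZone = []
ingest(zone, '{"sensor": "s1", "temp_c": "31.5", "country": " us "}')
ingest(zone, '{"sensor": "s2", "temp_c": 22, "extra": {"fw": "1.2"}}')
ingest(zone, '{"sensor": "s3"}')

# One analysis applies a schema focused on temperature readings...
readings = [enrich(r) for r in
            read_with_schema(zone, {"sensor": str, "temp_c": float, "country": str})]
# ...and another reads the same raw records with a different schema,
# with no reload and no database redesign.
firmware = read_with_schema(zone, {"sensor": str, "extra": dict})
```

In a real deployment the raw zone would more likely be object storage or a distributed file system, and the reads would run as batch jobs; the sketch only shows where the schema is applied. Cleansing still happens (in `enrich`), but it moves from load time to read time.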

Friday, March 5, 2021

Data Lake design Architecture