What is a Data Lake?
- A Data Lake is a large repository that holds every kind of data in its raw format until someone in the organization needs to analyze it.
 - A Data Lake is not Hadoop. It draws on a range of tools, and Hadoop implements only a subset of the required functionality.
 - A Data Lake is not a database in the traditional sense of the word. A typical Data Lake implementation uses various NoSQL and in-memory databases that can coexist with their relational counterparts.
 - A Data Lake cannot be implemented in isolation. It has to be implemented alongside a data warehouse (DW), because it complements various DW functions.
 - It stores large volumes of both unstructured and structured data. It also stores fast-moving streamed data from machine sensors and logs.
 - It advocates a Store-All approach to huge volumes of data.
 - It is optimized for data crunching in high-latency batch mode; it is not geared toward transaction processing.
 - It helps in creating data models that are flexible and can be revised without database redesign.
 - It can quickly perform data enrichment: enhancement, augmentation, classification, and standardization of the data.
 - All of the data stored in the Data Lake can be used to build an all-inclusive view. This enables near-real-time, more precise predictive models that go beyond sampling, and it also helps in generating multi-dimensional models.
 - It is a data scientist's favorite hunting ground. Data scientists can access the data in its raw form at its most granular level, so they can run ad-hoc queries and build advanced models iteratively, at any time. The classic data warehouse approach does not support this ability to condense the time between data intake and insight generation.
 - A key attribute of a Data Lake is that data is not classified when it is stored. As a result, data preparation, cleansing, and transformation are not required up front at load time; they are deferred until the data is actually used. These tasks generally take the lion's share of time in a Data Warehouse.
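
The "store everything raw, classify later" idea in the points above is often called schema-on-read. The following is a minimal sketch of it in plain Python, not a real Data Lake implementation: the `raw_lake` list, the sample records, and the function names `read_temperatures` and `read_user_actions` are all hypothetical and only illustrate how structure can be applied when the data is read rather than when it is stored.

```python
import json

# Hypothetical raw records, kept exactly as they arrived.
# No schema was enforced at write time, so the shapes differ.
raw_lake = [
    '{"sensor": "t1", "temp_c": "21.5", "ts": "2024-01-01T00:00:00"}',
    '{"sensor": "t2", "temp_c": 19, "site": "plant-a"}',
    '{"user": "alice", "action": "login"}',
]


def read_temperatures(lines):
    """Apply a 'temperature reading' schema at read time.

    Records that do not fit the schema are skipped, not rejected at load.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        if "sensor" in record and "temp_c" in record:
            rows.append(
                {
                    "sensor": record["sensor"],
                    # Standardize the type: the raw data mixes strings and numbers.
                    "temp_c": float(record["temp_c"]),
                }
            )
    return rows


def read_user_actions(lines):
    """A second, later-defined view over the same raw data.

    Adding it needs no change to how the data was stored.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        if "user" in record and "action" in record:
            rows.append({"user": record["user"], "action": record["action"]})
    return rows


temperatures = read_temperatures(raw_lake)
user_actions = read_user_actions(raw_lake)
print(temperatures)
print(user_actions)
```

Both views are derived from the same untouched raw records, so a new question can be answered by writing a new reader instead of redesigning a table. The cleansing work (here, converting `temp_c` to a number) still happens, but only for the data a given analysis actually needs.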