Building Blocks for a Big Data Project
- Working knowledge of Hadoop & the Hadoop ecosystem
o Be comfortable with basic Linux commands
o Data warehousing knowledge and SQL commands
o Programming languages such as Java, Python, R, Perl, etc.
- Understanding of data structures & business objectives
- Data visualization tools like Tableau, QlikView, JasperReports, etc.
- Be comfortable with analytics tools like R, Python, Spark, SAS, etc.
- Be comfortable with statistics (exploratory) and machine learning algorithms
What disrupted the Data Center?
Every industry is graced with more data…
• Richer transactional data from a portfolio of dozens or hundreds of business applications
• Usage and behavior data from web and mobile apps
• Social media data
• Sensor and event data from IoT devices
• Data economy – firms buying and selling data
• Derived data from analytics
What is the challenge?
• The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization
• The main challenge lies in identifying the value, the relevant information within this data, and then transforming and extracting that data for further analysis.
What is Big Data?
• Is it a technology?
• Is it a solution?
• Is it a problem?
• Is it a platform?
• Is it a statement/phrase?
Big Data – 4 V’s
- According to IDC (International Data Corporation), the size of the digital universe was 4.4 zettabytes in 2013, with a forecast of tenfold growth to 44 zettabytes by 2020
- A zettabyte is 10^21 bytes, or one thousand exabytes, or one million petabytes, or one billion terabytes
- The NYSE generates about 4-5 terabytes of data per day
- Facebook hosts more than 240 billion photos, growing at 7 petabytes per month
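The unit relationships above can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming decimal (SI) prefixes, which is what the 10^21 figure implies; the constant and function names are made up for illustration.

```python
# Decimal (SI) storage units, expressed in bytes (assumed, per the 10^21 figure).
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE = 10**18
ZETTABYTE = 10**21


def units_per_zettabyte(unit_bytes: int) -> int:
    """Return how many of the given unit fit in one zettabyte."""
    return ZETTABYTE // unit_bytes


print(units_per_zettabyte(EXABYTE))   # exabytes in a zettabyte
print(units_per_zettabyte(PETABYTE))  # petabytes in a zettabyte
print(units_per_zettabyte(TERABYTE))  # terabytes in a zettabyte

# Derived figure: the quoted 7 petabytes per month, extrapolated to a year.
facebook_growth_pb_per_year = 7 * 12
print(facebook_growth_pb_per_year)
```

The yearly figure is only an extrapolation of the monthly rate quoted above, not a number from the source.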
IBM’s Definition of Big Data
Big Data – Myths
· It's big: you need to have lots of data to talk about big data
· You need to apply it right away
· The more granular the data, the better
· Big Data is good data
· Big Data means that analysts become all-important
· Big Data gives you concrete answers
· Big Data predicts the future
· Big Data is a magical solution
· Big Data can create self-learning algorithms
· Big Data is only for big corporations
· We Have So Much Data, We Don't Need to Worry About Every Little Data Flaw
· Big Data Technology Will Eliminate the Need for Data Integration
· It's Pointless Using a Data Warehouse for Advanced Analytics
· Data Lakes Will Replace the Data Warehouse
· Hadoop is the holy grail of big data
· Machine Learning Overcomes Human Bias
Big Data – Scenarios
What is Hadoop?
DBMS vs. HADOOP
Why Hadoop?
- Hadoop is an open-source data management framework with scale-out storage & distributed processing
Hadoop is not a database. Hadoop (from the Apache Software Foundation) is a Java-based software framework for scalable, decentralized software applications that supports handling and analyzing vast data volumes.
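To make "distributed processing" more concrete: Hadoop's classic processing model is MapReduce, where a map step runs independently on each block of data and a reduce step combines the results. The sketch below imitates that flow inside a single Python process. It does not use Hadoop's API, and the function names and sample data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(block):
    """Emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle_phase(mapped_blocks):
    """Group the emitted values by key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(blocks):
    # Here the blocks are mapped one after another; on a cluster each
    # block could be mapped on a different node.
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle_phase(mapped))


# Hypothetical input: each string stands in for one block of a larger file.
blocks = ["big data is big", "hadoop stores big data"]
print(word_count(blocks))
```

Because each call to map_phase depends only on its own block, the map step can be spread across many machines, which is the property that lets this style of processing scale out.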
Existing Data Architecture
Limitations of Existing Data Analytics Architecture
An Emerging Data Architecture
Emerging Data Analytics Architecture
DBMS vs. HADOOP
Why Hadoop?
· Supports use of inexpensive, commodity hardware: no RAID needed. Also, the servers need not be the latest and greatest hardware.
· Provides for simple, massive parallelism
· Provides resilience by replicating data and eliminating tape backups
· Provides locality of execution, as it knows where the data is
· The software is free
· High-quality support is available at modest cost
· Certification is available
· Easy to support when using a GUI such as Cloudera Manager or Ambari
· Add-on tools are available at relatively low cost, or in some cases at no cost
· Evolving technology with a high degree of interest around the world
Hadoop Ecosystem
Analytics mapping – Hadoop 1.x
Analytics mapping – Hadoop 2.x
Typical Big Data Project – Role of Hadoop Ecosystem
Opportunity and Market Outlook
Who is using Hadoop?
Which companies have implemented Hadoop?
http://wiki.apache.org/hadoop/poweredBy
The next post will be on Hadoop 2.x.
This post uses information from Analytic lab.