
Thursday, December 7, 2017

Big Data understanding

Building Blocks for Big Data Project

-        Working knowledge of Hadoop and the Hadoop ecosystem
o   Be comfortable with basic Linux commands
o   Data warehousing knowledge and SQL commands
o   Programming languages such as Java, Python, R, Perl etc.
-        Understanding of data structure and business objectives
-        Data visualization tools like Tableau, QlikView, JasperReports etc.
-        Be comfortable with analytics tools like R, Python, Spark, SAS etc.
-        Be comfortable with statistics (exploratory) and machine learning algorithms




What disrupted the Data Center?




Every industry is graced with more data…

• Richer transactional data from a portfolio of dozens or hundreds of
    business applications
• Usage and behavior data from web and mobile apps
• Social media data
• Sensor and event data from IoT devices
• Data economy – firms buying and selling data
• Derived data from analytics

What is the challenge?

• The challenges include capture, curation, storage, search, sharing,
   transfer, analysis and visualization
• The main challenge lies in identifying the value, the relevant information within this data, and then transforming and extracting that data for further analysis.


What is Big Data?

• Is it a technology?
• Is it a solution?
• Is it a problem?
• Is it a platform?
• Is it a statement/phrase?

Big Data – 4 V’s
  • According to IDC (International Data Corporation), the size of the digital universe was 4.4 zettabytes in 2013, with a forecast of tenfold growth to 44 zettabytes by 2020
  • A zettabyte is 10^21 bytes, which equals one thousand exabytes, one million petabytes or one billion terabytes
  • The NYSE generates about 4-5 terabytes of data per day
  • Facebook hosts more than 240 billion photos, growing at 7 petabytes per month
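
The unit relationships above can be checked with a few lines of Python. This is only a small sketch: the `UNITS` table and the `convert` helper are names made up for this post, and decimal (power of ten) units are assumed.

```python
# Decimal (SI) storage units expressed in bytes.
# UNITS and convert() are illustrative names, not part of any library.
UNITS = {
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
}

def convert(amount, from_unit, to_unit):
    """Convert a whole number of from_unit into to_unit (larger to smaller)."""
    return amount * UNITS[from_unit] // UNITS[to_unit]

# One zettabyte expressed in the smaller units.
print(convert(1, "zettabyte", "exabyte"))   # 1000
print(convert(1, "zettabyte", "petabyte"))  # 1000000
print(convert(1, "zettabyte", "terabyte"))  # 1000000000

# The 7 petabytes per month quoted for Facebook, in terabytes.
print(convert(7, "petabyte", "terabyte"))   # 7000
```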


IBM’s Definition of Big Data


Big data – Myths

·       It's big: you need to have lots of data to talk about big data
·       You need to apply it right away
·       The more granular the data, the better
·       Big Data is good data
·       Big Data means that analysts become all-important
·       Big Data gives you concrete answers
·       Big Data predicts the future
·       Big Data is a magical solution
·       Big Data can create self-learning algorithms
·       Big Data is only for big corporations
·       We have so much data, we don't need to worry about every little data flaw
·       Big Data technology will eliminate the need for data integration
·       It's pointless using a data warehouse for advanced analytics
·       Data lakes will replace the data warehouse
·       Hadoop is the holy grail of big data
·       Machine learning overcomes human bias


Big Data – Scenarios





What is Hadoop?
  • Hadoop is an open-source data management framework with scale-out storage and distributed processing

Hadoop is not a database. Hadoop (from the Apache Software Foundation) is a Java-based software framework for scalable, decentralized software applications that makes it easy to handle and analyze vast data volumes.





Existing Data Architecture



Limitations of Existing Data Analytics Architecture




An Emerging Data Architecture



Emerging Data Analytics Architecture



DBMS vs. HADOOP







Why Hadoop?


·        Supports use of inexpensive, commodity hardware
                - No RAID needed. Also, the servers need not be the latest and greatest hardware.
·        Provides for simple, massive parallelism
·        Provides resilience by replicating data, eliminating the need for tape backups
·        Provides locality of execution, as it knows where the data is
·        The software is free
·        High quality support available at modest cost
·        Certification available
·        Easy to support when using a GUI such as Cloudera Manager or Ambari
·        Add-on tools available at relatively low cost, or in some cases no cost
·        Evolving technology with a high degree of interest around the world
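
To make "simple, massive parallelism" and "locality of execution" a bit more concrete, here is a rough sketch of the map, shuffle and reduce steps behind Hadoop's classic MapReduce processing model, written as a word count. It is plain single-process Python, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample blocks are made up for illustration, and a real cluster would run the map step on the nodes that hold each block.

```python
from collections import defaultdict

def map_phase(block):
    """Map step: emit a (word, 1) pair for every word in one block of text."""
    return [(word.lower(), 1) for word in block.split()]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(blocks):
    # On a cluster each block would be mapped where it is stored, in parallel;
    # in this sketch the blocks are processed one after another.
    pairs = []
    for block in blocks:
        pairs.extend(map_phase(block))
    return reduce_phase(shuffle(pairs))

# Hypothetical sample data standing in for two blocks of a large file.
blocks = ["big data is big", "hadoop stores big data"]
counts = word_count(blocks)
print(counts["big"])   # 3
print(counts["data"])  # 2
```

Because each block is mapped independently, adding more machines lets more blocks be processed at the same time, which is where the parallelism comes from.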


Hadoop Ecosystem





Analytics mapping – Hadoop 1.x



Analytics mapping – Hadoop 2.x





Typical Big Data Project – Role of Hadoop Ecosystem




Opportunity and Market Outlook



Who is using Hadoop?




Which companies have implemented Hadoop?

http://wiki.apache.org/hadoop/poweredBy



The next post will be on Hadoop 2.x.

This post uses information from Analytic lab.