Blog Archive

Monday, July 25, 2011

sdlc of data warehouse









1. OLTP:
OLTP stands for “ONLINE TRANSACTION PROCESSING”. OLTP
Consists of complete working transaction processing of a system. This OLTP system forms the base as input to the data warehouse.
The input to the DWH is from various sources like
1. Oracle tables, dump files and data files
2. Flat files or text files, Excel sheets.

2. CRS AND SRS: 
CRS stands for “CUSTOMER REQUIREMENT SPECIFICATIONS” and SRS for “SYSTEM REQUIREMENT SPECIFICATION”.
In this clients will specify their requirements as per business needs and to analyze the business process. From this they want to get answers to specific requirements.    

3. ETL SPECIFICATIONS:
(EXTRACTION TRANSFORMATION LOADING)
ETL Tools are used for Extraction, transformation and loading the source data. In this we have to specify from where we have to extract and load the data.

4.HLD: HLD stands for “ HIGH LEVEL DOCUMENT”.
In this client will tell us which tool we have to use and what is our
Source system and what are naming conventions.

5. DATAWAREHOUSE DESIGN:
In data warehouse design there are three phases of design.
They are as follows:

a. Conceptual Design:
On the basis of CRS we have to design the DWH so we create a conceptual diagram, which gives the concept of the warehouse. It consists of dimensions and facts without any attributes or details.

b. Logical Design:
Logical design is what you draw with a pen and paper or design with Oracle Warehouse Builder or Designer before building your warehouse. The logical model forms the primary basis for the physical model. With the help of dimension modeling technique we can design the logical model. It consists of attributes and the constraints related to dimensions.

c. Physical Design:
Physical design is the creation of the database with SQL statements. The physical model will differ from logical in terms of details specified for the physical database, which includes the data types and data size. In this step we will standardize the naming of database objects. We determine which aggregate tables are necessary and how to partition the large tables. We should complete the indexing strategy and plan for other performance options. The objective of physical design is to improve the performance, provide ease of administration and it should be designed for flexibility.


6. LOADING OR CODING:
In this step we load the prepared data into the data warehouse and store it in the database.
Loading is generally done in 3 ways:
a. Initial load:
                        Populating all the data warehouse tables for the first time.
b. Incremental load:
                               Applying ongoing changes as necessary in a periodic manner.
c. Full Refresh:
Completely erasing the contents of one or more tables and reloading with fresh data.

1.Staging:

Staging area contains the data, which will be loaded into the DWH using ETL tools. So in order to adopt for the changes in the source. All operations related to cleansing and formatting is done here.

2.Dimensions:

Dimension is a table that contains only textual information of Business. In a well-designed dimensional modeling, dimensions may have many columns or attributes. Each dimension table is identified by its primary key, which serves as the basis for referential integrity with any given fact table to which it is joined. A dimension with good attributes delivers good slicing and dicing capabilities. These are entry points into the fact table.

3. Facts:

Next step is to load the fact tables into the DWH. We use the term
Fact to represent a business measure. A fact table is the primary table in a dimensional model where the numeric performance measurements of the business process are stored. We access fact tables via the dimension tables joined to it. The fact table itself has its own primary key made up of subsets of foreign keys. In other words in dimensional model every table that expresses many to many relationship must be a fact table and all other tables are dimension tables.



7. TESTING:

 After loading the data we have to test the loaded data by using unit and system testing.
Unit testing is done to check the smallest piece of software that can be tested independently by a particular developer. For e.g. Small SQL program or procedures are developed and tested independently by the developers individually.
In integration and system testing various modules are integrated together and the testing of the complete software is carried out.



8. CERTIFICATION:

In this step we have to compare ETL spec with mappings done by developers. If our design meets the ETL spec, then it is implemented. 


9. PRODUCTION PHASE:

The final phase is the production phase where further enhancements are carried out depending upon the customer’s request. And after it is successful the project is ready for full deployment.

No comments:

Post a Comment