Custom Search

Popular Posts

Sunday, June 22, 2014

DATA WAREHOUSE SCHEMA


One of the key questions to be answered by the database designer is: How can we design a database that allows unknown queries to be performant? This question encapsulates the differences between designing for a data warehouse and designing for an operational system. In a data warehouse one designs to support the business process rather than specific query requirements. In order to achieve this, the designer must understand the way in which the information within the data warehouse will be used.

In general, the queries directed at a data warehouse tend to ask questions about

some essential fact, analyzed in different ways. For example reporting on:

-The average number of light bulbs sold per store over the past month

-The top ten most viewed cable-TV programs during the past week

-Top spending customers over the past quarter

-Customers with average credit card balances more than Rs.10,000 during the past year. 

Each of these queries has one thing in common: they are all based on factual data. The content and presentation of the results may differ between examples, but the factual and transactional nature of the underlying data is the same.  

sbobet asia indonesia Read More About News, Trends and Ideas that matter most to Entrepreneurs Visit SnapMunk

Table 1: Fact Tables and Attributes
REQUIREMENT
FACTS
ATTRIBUTES
Sales of Light Bulbs
EPOS Transaction.
Quantity Sold
Product Identifier (SKU)
Store Identifier
Data and Time
Revenue Achieved.
Cable Programs
Cable Pay-per-view
Transaction
Customer Identifier
Cable Channel Watched
Program Watched
Data and Time
Duration
Household Identifier.
Customer Spend
Loyalty Card
Transaction
Customer Identifier
Store Identifier
Transaction Value
Date and Time
Customer Account
Account Transactions
Customer Identifier
Account Number
Type of Transaction
Destination Account Number


Fact data possesses some characteristics that allow the underlying information in the database to be structured. Facts are transactions that have occurred at some point in the past, and are unlikely to change in the future. Facts can be analyzed in different ways by cross-referencing the facts with different reference information.

For example, we can look at sales by store, sales by region, or sales by product. In a data warehouse facts also tend to have few attributes, because there are no operational data overheads. For each of the examples described, the attributes of the fact could be as listed in Table 1 above.

One of the major technical challenges within the design of a data warehouse is to structure a solution that will be effective for a reasonable period of time (at least three to five years). This implies that the data should not have to be restructured when the business changes or the query profiles change. This is an important point, because in more traditional applications it is not uncommon to restructure the underlying data in order to address query performance issues. 

Fig : 1 Star Schema 
 
The inherent advantage of factual transactions is that their content is unlikely to change, regardless of how it is analyzed. Also, the majority of the data volume of a data warehouse comes from the factual information. It is therefore possible to treat fact data as read-only data, and the reference data (used to interpret the fact data) as data that is liable to change over time. Thus, if reference data needs to change, the voluminous fact data would not have to be changed or restructured. 

Star schemas (so called due to their ‘star like’ appearance) are physical database structures that store the factual data in the ‘center’, surrounded by the reference (or dimension) data (see Figure 1 above).

The dimension tables should be relatively small (typically less than 10 GB in total) in comparison to the size of the data warehouse, so that restructuring costs are small as long as the keys to the fact tables are not changed. In star schema arrangements, the reference information is often denormalized to a single table to speed up query performance. The redundancy overheads are acceptable, as the sizes involved are small and even the reference information changes infrequently. 

The dimensional information, represented by the reference data is often organized in form of concept hierarchies that are used for analysis, and designing data warehouse aggregations. 

A typical concept hierarchy for a retail chain is depicted in Figure 2 below :
Fig. 2 : Concept Hierarchies
The number of concept hierarchies that can be defined on any given dimension is by no means restricted to one. There may be several arbitrary concept hierarchies in use in any organization to enable data analysis from various angles. Figure 3 below shows how the ‘time’ dimension in a retail data warehouse (represented by the day / date reference data) may be organized into several concept hierarchies or groupings (which are single level concept hierarchies – ‘Easter’ and ‘Summer’ in Figure 3) 

Fig. 3 : Multiple Hierarchies 

When the concept hierarchies and groupings are incorporated into a star schema diagram (like Figure 4 below), the appearance resembles a ‘snowflake’. Hence, schemas of this type are called Snowflake schemas.
Fig 4 : Snowflake Schema



0 comments:

Blog Widget by LinkWithin