Custom Search

Popular Posts

Sunday, June 22, 2014


Metadata in a data warehouse is similar to the data dictionary in the context of a database. It stores data about data in the data warehouse.

Types of Metadata

Metadata in a data warehouse fall into three major categories:

-Operational Metadata

-Extraction and Transformation Metadata

-End-User Metadata

Operational Metadata: As we know, data for the data warehouse comes from several operational systems of the enterprise. These source systems contain different data structures. The data elements selected for the data warehouse have various field lengths and data types. Selecting data from different source files, and loading it into the data warehouse, requires splitting of records, combining parts of records from different source files, and dealing with multiple coding schemes and field lengths. When information is delivered to the end-users, it is essential to be able relate back to the original source data sets. Operational metadata contain all of this information about the operational data sources that allow us to trace back to the original source. Vi hjelper deg med alt for din reise på Her finner du alt du trenger  

Extraction and Transformation Metadata: Extraction and transformation metadata contain data about the extraction of data from the source systems, namely, the extraction frequencies, extraction methods, and business rules for the data extraction. Also, this category of metadata contains information about all the data transformations that take place in the data staging area.

End-User Metadata. The end-user metadata is the navigational map of the data warehouse. It enables the end-users to find information from the data warehouse. The end-user metadata allows the end-users to use their own business terminology and look for information in those ways in which they normally think of the business.

Special Significance

Why is metadata especially important in a data warehouse?

1.   First, it acts as the glue that connects all parts of the data warehouse.
2.   Next, it provides information about the contents and structure to the developers.
3.   Finally, it opens the door to the end-users and makes the contents recognizable in their own terms. 

Metadata Requirements

According to Inmon, a new user approaching a data warehouse wants to know :  

·         What tables, attributes, and keys does the data warehouse contain?
·         From where did each set of data come?
·         What transformation logic was applied in loading the data?
·         How has the data changed over time?
·         What aliases exist, and how are they related to each other?
·         What are the cross-references between technical and business terms? (For instance, the field name XVT-351J presumably meant something to a COBOL programmer in 1965, but what does it mean to me today?)
·         How often does the data get reloaded?
·         How much data is there? This helps end-users to avoid submitting unrealistic queries. Given some means of determining the size of tables, staff can tell the end users, “You can do what you like with 15,000 rows, but if it turns out be 15 million rows, back off and ask for help!” 

Metadata requirements of various classes of users are summarized in Table 1 below:

Table 1: Uses of Metadata

IT Professionals
Power Users
Casual Users
and Discovery
Database Tables, Columns,
Server Platforms.
Databases, Tables, Columns
List of Predefined Queries and Reports, Business views.
of Data
Data structures, Data Definitions, Data Mapping, Cleansing Functions,
Transformation Rules
Business Terms, Data Definitions, Data Mapping,
Cleansing Functions,
Transformation Rules
Business Terms, Data Definitions, Filters, Data Sources, Conversion, Data
Program Code in SQL, 3GL, 4GL, Front-end Applications,
Query Toolsets, Database Access for Complex
Authorization Requests, Information Retrieval into Desktop Applications such as Spreadsheets.

Metadata Components

Warehouse metadata is not very different in kind from ordinary database metadata, although it is versioned in order to permit historical analysis. Prism gives the following breakdown of warehouse metadata in its Tech Topic, “Metadata in the Data Warehouse:” 


The mapping information records how data from operational sources is transformed on its way into the warehouse. Typical contents are:

·         Identification of source fields
·         Simple attribute-to-attribute mapping
·         Attribute conversions
·         Physical characteristic conversions
·         Encoding/reference table conversions
·         Naming changes
·         Key changes
·         Defaults
·         Logic to choose from among multiple sources
·         Algorithmic changes 

Extract History

Whenever historical information is analyzed, meticulous update records have to be kept. The metadata history is a good place to start any time-based report, because the analyst has to know when the rules changed in order to apply the right rules to the right data. If, for example, sales territories were remapped in 2001, results from before that date may not be directly comparable with more recent results. 


·         Aliases can make the warehouses much more use-friendly by allowing a table to be queried by “Widgets produced by each factory” rather than “MFSTATS.” Aliases also come in useful when different departments want to use their own names to refer to the same underlying data. Obviously, though, aliases can also cause a great deal of confusion if they are not carefully tracked.

·         Often, parts of the same data warehouse may be in different stages of development. Status information can be used to keep track of this: for instance, tables might be classified “in-design,” “in-test,” ‘inactive,” or “active.”

·         Volumetric information lets users know how much data they are dealing with, so that they can have some idea how much their queries will cost in terms of time and computational resources. Volumetrics could usefully include such information as number of rows, growth rate, usage characteristics, indexing, and byte specifications.

·         It is also useful to publish the criteria and time scales for purging old data. 

Summarization and Aggregation Algorithms

As discussed above, a typical data warehouse contains lightly and heavily summarized data, and aggregations as well as full detailed records. The algorithms for summarizing (and aggregating) the detail data are obviously of interest to anyone who takes responsibility for interpreting the meaning of the summaries. This metadata can also save time by making it easier to decide which level of summarization is most appropriate for a given purpose.

Relationship Artifacts and History

Data warehouses implement relationships in a different way form production databases. Metadata pertaining to related tables, constraints, and cardinality are maintained, together with text descriptions and ownership records. This information and the history of changes to it can be useful to analysts.

Ownership / Stewardship

Operational databases are often owned by particular departments or business groups. In an enterprise data warehouse, however, all data is stored in a common format and accessible to all. This makes it necessary to identify the originator of each set of data, so that inquiries and corrections can be made by the proper group. It is useful to distinguish between ownership of data in the operational environment and stewardship in the data warehouse.

Access Patterns

It is desirable to record patterns of access to the warehouse in order to optimize and tune performance. Less frequently used data can be migrated to cheaper storage media, and various methods can be used to accelerate access to the data that is most in demand. Most databases do a good job of hiding such physical details, but specialized performance analysis tools are usually available. Some general-purpose tools, such as Information Builders’ Site Analyzer, also are available.

Reference Tables / Encoded Data

Reference data is stored in an external table (see discussion on Star Schema) and contains commonly used translations of encoded values. The contents of these tables must be stored in order to guarantee the ability to recover the original unencoded data, together with effective from and effective to dates.

Data Model-Design Reference

Building a data warehouse without first constructing a data model is very difficult and frustrating. When a data model is used, metadata describing the mapping between the data model and the physical design should be stored. This allows all ambiguities or uncertainties to be resolved.

From the point of view of the Query Manager of the data warehouse, the Metadata Repository can be perceived to have three logical layers: the Information Navigator, the Business Metadata, and the Technical Metadata.

Figure-1 : Metadata Repository

Figure-1 above illustrates this concept. The query manager accesses the metadata through the Information Navigator layer which is the topmost layer of the metadata repository. The higher layers, in turn, access more detailed metadata components resident in the lower layers whenever required. 



Blog Widget by LinkWithin