Wednesday, June 23, 2010

Metadata for Data Warehousing and Business Intelligence

Metadata is one of the important keys to the success of the data warehousing and business intelligence effort. Metadata management answers these questions:

What is Metadata?
How can Metadata be Managed?
Extracting Metadata from Legacy Systems

What is Metadata?
Metadata is your control panel to the data warehouse. It is data that describes the data warehousing and business intelligence system:

1. Reports
2. Cubes
3. Tables (Records, Segments, Entities, etc.)
4. Columns (Fields, Attributes, Data Elements, etc.)
5. Keys
6. Indexes

Metadata is often used to control the handling of data and describes:

1. Rules
2. Transformations
3. Aggregations
4. Mappings

The power of metadata is that enables data warehousing personnel to develop and control the system without writing code in languages such as: Java, C# or Visual Basic. This saves time and money both in the initial set up and on going management.

Data Warehouse Metadata

Data warehousing has specific metadata requirements. Metadata that describes tables typically includes:

1. Physical Name
2. Logical Name
3. Type: Fact, Dimension, Bridge
4. Role: Legacy, OLTP, Stage,
5. DBMS: DB2, Informix, MS SQL Server, Oracle, Sybase
6. Location
7. Definition
8. Notes

Metadata describes columns within tables:

1. Physical Name
2. Logical Name
3. Order in Table
4. Datatype
5. Length
6. Decimal Positions
7. Nullable/Required
8. Default Value
9. Edit Rules
10.Definition
11.Notes

How can Data Warehousing Metadata be Managed?

Data warehousing and business intelligence metadata is best managed through a combination of people, process and tools.

The people side requires that people be trained in the importance and use of metadata. They need to understand how and when to use tools as well as the benefits to be gained through metadata.

The process side incorporates metadata management into the data warehousing and business intelligence life cycle. As the life cycle progresses metadata is entered into the appropriate tool and stored in a metadata repository for further use.

Metadata can be managed through individual tools:

Metadata manager / repository
Metadata extract tools
Data modeling
ETL
BI Reporting
Metadata Manager / Repository

Metadata can be managed through a shared repository that combines information from multiple sources.



The metadata manager can be purchased as a software package or built as "home grown" system. Many organizations start with a spreadsheet containing data definitions and then grow to a more sophisticated approach.

Extracting Metadata from Input Sources

Metadata can be obtained through a manual process of keying in metadata or through automated processes. Scanners can extract metadata from text such as SQL DDL or COBOL programs. Other tools can directly access metadata through SQL catalogs and other metadata sources.

Picking the appropriate metadata extract tools is a key part of metadata management.

Many data modeling tools include a metadata extract capability - otherwise known as "reverse engineering". Through this tool, database information about tables and columns can be extracted. The information can then be exported from the data modeling tool to the metadata manager.

No comments:

Post a Comment