Principles of Dimensional Modeling:
Dimensional modeling is the name of a logical design technique often used for data warehouses. DM is the only viable technique for databases that are designed to support end-user queries in a data warehouse. It is different from, and contrasts with, entity-relation modeling. ER is very useful for the transaction capture and the data administration phases of constructing a data warehouse, but it should be avoided for end-user delivery. This paper explains the dimensional modeling and how dimensional modeling technique varies/ contrasts with ER models.
Dimensional Modeling(DM):
Dimensional Modeling is a favorite modeling technique in data warehousing. DM is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access. It is inherently dimensional, and it adheres to a discipline that uses the relational model with some important restrictions. Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table. This characteristic “star-like” structure is often called a star join. A fact table, because it has a multipart primary key made up of two or more foreign keys, always expresses a many-to-many relationship. The most useful fact tables also contain one or more numerical measures, or “facts,” that occur for the combination of keys that define each record. Dimension tables, by contrast, most often contain descriptive textual information. Dimension attributes are used as the source of most of the interesting constraints in data warehouse queries, and they are virtually always the source of the row headers in the SQL answer set. Dimension Attributes are the various columns in a dimension table. In the Location dimension, the attributes can be Location Code, State, Country, Zip code. Generally the Dimension Attributes are used in report labels, and query constraints such as where Country=’UK’. The dimension attributes also contain one or more hierarchical relationships. Before designing a data warehouse, one must decide upon the subjects.
Contrast with E-R:
In DM, a model of tables and relations is constituted with the purpose of optimizing decision support query performance in relational databases, relative to a measurement or set of measurements of the outcomes of the business process being modeled. In contrast, conventional E-R models are constituted to remove redundancy in the data model, to facilitate retrieval of individual records having certain critical identifiers, and therefore, optimize On-line Transaction Processing (OLTP) performance.
In a DM, the grain of the fact table is usually a quantitative measurement of the outcome of the business process being analyzed. The dimension tables are generally composed of attributes measured on some discrete category scale that describe, qualify, locate, or constrain the fact table quantitative measurements. Ralph Kimball views that the data warehouse should always be modeled using a DM/star schema. Indeed Kimball has stated that while DM/star schemas have the advantages of greater understandability and superior performance relative to E-R models, their use involves no loss of information, because any E-R model can be represented as a set of DM/star schema models without loss of information. In E-R models, normalization through addition of attributive and sub-type entities destroys the clean dimensional structure of star schemas and creates “snowflakes,” which, in general, slow browsing performance. But in star schemas, browsing performance is protected by restricting the formal model to associative and fundamental entities, unless certain special conditions exist (The key to understanding the relationship between DM and ER is that a single ER diagram breaks down into multiple DM diagrams. The ER diagram does itself a disservice by representing on one diagram multiple processes that never coexist in a single data set at a single consistent point in time. It’s no wonder the ER diagram is overly complex. Thus the first step in converting an ER diagram to a set of DM diagrams is to separate the ER diagram into its discrete business processes and to model each one separately. The dimensional model has a number of important data warehouse advantages that the ER model lacks. The dimensional model is a predictable, standard framework. Report writers, query tools, and user interfaces can all make strong assumptions about the dimensional model to make the user interfaces more understandable and to make processing more efficient.
The wild variability of the structure of ER models means that each data warehouse needs custom, handwritten and tuned SQL. It also means that each schema, once it is tuned, is very vulnerable to changes in the user’s querying habits, because such schemas are asymmetrical. By contrast, in a dimensional model all dimensions serve as equal entry points to the fact table. Changes in users’ querying habits don’t change the structure of the SQL or the standard ways of measuring and controlling performance
Conclusion:
It can be concluded that dimensional modeling is the only viable technique for designing end-user delivery databases. ER modeling defeats end-user delivery and should not be used for this purpose. ER modeling does not really model a business; rather, it models the micro relationships among data elements
Tuesday, March 9, 2010
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment