Thursday, February 25, 2010
Data Warehouse Design & ETL Concepts
Data warehouses and data marts are built on dimensional data modeling where fact tables are connected with dimension tables. This is most useful for users to access data since a database can be visualized as a cube of several dimensions.
Dimensional Data Modeling
Dimensional Data Modeling comprises of one or more dimension tables and fact tables.
Dimensional Data Modeling is used for calculating summarized data. For example, sales data could be collected on a daily basis and then be aggregated to the week level, the week data could be aggregated to the month level, and so on.
Dimension Tables
Dimension table is one that describes the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products. Dimension tables are sometimes called lookup or reference tables.
For example, Product dimension table will store information about products(Product Category, Product Sub Category, Product and Product Features) and location dimension table will store information about location( country, state, county, city, zip.
Fact Tables
The centralized table in a star schema is called as FACT(measures) table. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.
For example Sales Amount can be the measure for any business.
Measure Types
Additive - Measures that can be added across all dimensions.
Non Additive - Measures that cannot be added across all dimensions.
Semi Additive - Measures that can be added across few dimensions and not with others.
Star Schema
Star Schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema that contains one or more dimensions and fact tables. It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions. The center of the star schema consists of a large fact table and it points towards the dimension tables. The advantage of star schema is slicing down, performance increase and easy understanding of data.
Snowflake Schema
A snowflake schema is a term that describes a star schema structure normalized through the use of outrigger tables. Dimension table hierarchies are broken into simpler tables.
ETL Concept
Extraction, transformation, and loading. ETL refers to the methods involved in accessing and manipulating source data and loading it into target database.
The first step in ETL process is mapping the data between source systems and target database(data warehouse or data mart). The second step is cleansing of source data in staging area. The third step is transforming cleansed source data and then loading into the target system.
Note that ETT (extraction, transformation, transportation) and ETM (extraction, transformation, move) are sometimes used instead of ETL.
Glossary of ETL
Source System
A database, application, file, or other storage facility from which the data in a data warehouse is derived.
Mapping
The definition of the relationship and data flow between source and target objects.
Metadata
Data that describes data and other structures, such as objects, business rules, and processes. For example, the schema design of a data warehouse is typically stored in a repository as metadata, which is used to generate scripts used to build and populate the data warehouse. A repository contains metadata.
Staging Area
A place where data is processed before entering the warehouse.
Cleansing
The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.
Transformation
The process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources.
Transportation
The process of moving copied or transformed data from a source to a data warehouse.
Target System
A database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.
Monday, February 22, 2010
Business Intelligence Overview - Part I
Business intelligence is the art of gaining business advantage from data. It is the use of an organization's disparate data to provide meaningful information and analysis to employees, customers, suppliers, and partners for more effective decision making.
Business Intelligence Models are based on multi dimensional analysis and key performance indicators (KPI) of an enterprise.
Why Is It Important?
· Real, Actionable Data for Decision-Making
· Greater Returns on Investment
· Higher Level of Member Service
· Business Insight and Information-Sharing
· the best possible utilization of resources
· Competitor analysis
· Key client planning
How does BI Works?
Uses of Business Intelligence:
Business Intelligence is used to answer the organization queries like:
• Who are my best and worst customers (and therefore, where should I concentrate my future sales efforts)
• What parameters affect my sales (Is there a brilliant sales person? Has a campaign been successful?)
• What advantages does my business offer customers, as compared with the competition?
• Where are we making/losing money (in terms of geography, product line, campaigns, resources etc.)
• What is the net income, expenses, gross profit, and net profit for this quarter, semester and year?
Business Intelligence Solution Architecture:
Data Warehouse Concepts
What is Data Warehouse?
A data warehouse is collection of historical data; it is subject oriented data, integrated, time variant and non volatile data for decision making process. Following are key points of data warehouse:
· Data warehouse is a relational and multinational database.
· Designed for query and analysis rather than transaction process.
· Data warehouse usually historical data and derived from online transaction data.
· Data Stored in Demoralized form
· Data warehouse environment often consist an ETL solution, OLAP design.
· The data warehouse is nothing more than the union of all the constituent data marts.
Elements of the data warehouse:
Source System: Multiple Sources, Excel, People Soft, RDBMS, Flat File and SAP etc.
Data Staging Area: ETL- Extraction, Transformation and Loading.
Presentation Server: Physical Data Warehouse.
Dimensional Modeling: Alternative of E/R Modeling, designed based on Dimension and Fact table.
Business Process: A business process is usually a set of activities like "order processing" or "customer pipeline management," but business processes can overlap, and certainly the definition of an individual business process will evolve over time.
Data Marts: Designed for particular line of business activity like Sales, Marketing and Finance etc. A data mart is a complete “pie-wedge” of the overall data warehouse pie.
Operation Data Store: ODS is more than the real enterprise data warehouse, data is refreshed in near time and used for routine business activity.
On-Line Analytic Process: Online Analytical Processing is an approach that helps organization to take advantages of DATA. Technically, OLAP cube allows one to analyze data across multiple dimensions by providing multidimensional view of aggregated, grouped data.
Relation OLAP: ROLAP stands for Relational Online Analytical Process that provides multidimensional analysis of data, stored in a Relational database.
Multidimensional OLAP: MOLAP provides the analysis of data stored in a multi-dimensional data cube.
Hybrid OLAP: HOLAP a combination of both ROLAP and MOLAP can provide multidimensional analysis simultaneously of data stored in a multidimensional database and in a relational database
Desktop or Database OLAP: DOLAP provide multidimensional analysis locally in the client machine on the data collected from relational or multidimensional database servers.
End User Application: Application which provides decision making information in targeted format. Like Excel sheet and Graphical etc.
End User Data Access Tool: Client of the data ware house to access and review the data warehouse information.
Metadata: All of the information in the data warehouse environment that is not the actual data itself.
Sunday, February 21, 2010
Dynamic Management View in SQL Server 2005
The Dynamic management View in SQL server 2005 provides a window into what's going on inside SQL Server. They can provide information on what's currently happening inside the server as well as the objects it's storing. They are designed to be used instead of system tables and the various functions provided in SQL Server 2000.
The Dynamic Management Views are actually composed of both views and table-valued functions. Some apply to the entire server and are stored in the master database. Others are specific to each database. All are stored in the sys schema. They all start with dm_ in the name. They have been broken up into twelve categories:
1. Common Language Runtime Related Dynamic Management Views:
2. I/O Related Dynamic Management Views and Functions
3. Database Mirroring Related Dynamic Management Views
4. Query Notifications Related Dynamic Management Views
5. Database Related Dynamic Management Views
6. Replication Related Dynamic Management Views
7. Execution Related Dynamic Management Views and Functions
8. Service Broker Related Dynamic Management Views
9. Full-Text Search Related Dynamic Management Views
10. SQL Operating System Related Dynamic Management Views
11. Index Related Dynamic Management Views and Functions
12. Transaction Related Dynamic Management Views and Functions
Few are very common views:
Sessions (To view current session):
SELECT session_id, login_name, last_request_end_time, cpu_time
FROM sys.dm_exec_sessions
WHERE session_id >= 51
GO
Connections (To view current connections):
SELECT connection_id, session_id, client_net_address, auth_scheme
FROM sys.dm_exec_connections
GO
Request (To view current Request):
SELECT session_id, status, command, sql_handle, database_id
FROM sys.dm_exec_requests
WHERE session_id >= 51
GO
SQL Text (To view current SQL text):
SELECT st.text
FROM sys.dm_exec_requests r
CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS st
WHERE r.session_id = 56
GO
Security:
In order to query these views a user needs specific permissions granted. To view the server-wide DMVs the user must be granted the VIEW SERVER STATE on the server. Administrator can give the permission using following statement:
GRANT VIEW SERVER STATE to username
Thursday, February 18, 2010
Data modeling
Data models document entities (the persons, places and things [product, warehouse, partner etc.] an organization encounters in the course of business); the relationships of entities (e.g. employee WORKS in warehouse, MANAGES product and SHIPS to partner); and the attributes of entities (description, order number, address, account balance etc.).
There are three common types of data models. Conceptual data models define and describe business concepts at a high level for stakeholders addressing a business challenge. Logical data models are more detailed and describe entities, attributes and relationships in business terms. Physical data models define database objects, schema and the actual columns and tables of data that will be created in the database.
Like the blueprint of a building, a data model is the design specification for a database. Data modeling can be helped by off the shelf data models that can be adapted to a specific use. But data architects warn that without proper time and attention to "design before you build," organizations face inaccurate reporting, incorrect data, costly remediation and difficulty in meeting new user requirements.