Wednesday, March 31, 2010

What's New in Reporting Services 2008 – Part I

Microsoft SQL Server 2008 Reporting Services (SSRS) introduces many new features and enhancements that extend the reporting capabilities available to people who develop reporting solutions. New features and enhancements span all SSRS levels:

Report Authoring

This release introduces the Tablix, Chart, and Gauge data regions. It also introduces support for richly formatted text, new data source types, and Report Builder 2.0, which offers many new features, such as enhanced data layout and visualization, in an Office-like authoring environment. Finally, this section describes incremental changes to the authoring tools and the Report Definition Language (RDL) that allow a report author to take full advantage of new processing features.

1. New and Enhanced Data Source Types:
Included in SQL Server 2008 Reporting Services (SSRS) are new data processing extensions that enable you to report from the Teradata data source.

2. Enhanced Chart Data Region:

The new Chart data region supports a richer set of data visualization features. The new chart types include bar/column cylinder, pyramid, funnel, polar, radar, stock, candlestick, range column, range bar, smooth area, smooth line, stepped line, and box plot chart types. There is also built-in support for Pareto and Histogram charts.
A series can be displayed on any specified chart area and in any specified legend. You can insert empty points in place of missing data. You can combine multiple chart areas, multiple legends, and multiple titles on the same chart.
There is new support for secondary axes, which enable you to display multiple series using more than one axis. On each axis, you can set options to add scale breaks, logarithmic scales, custom axis intervals, and interlaced strip lines.
The new Chart data region supports automatic interval labeling to avoid label collisions, customizable rotation angles, font size and text-wrap properties for axis label calculations, support for axis crossing at custom positions, and support for background strip lines at regular or custom intervals.
You have more control to label, align and color your chart. Smart labels for data points adjust text to avoid label collisions. Custom color palettes and automatic alignment of multiple chart areas make it easier to improve the appearance and layout of charts in your report. You can collect small slices on a pie chart into one slice, drill down or show ToolTip information on individual data points, and place stacked bars or columns side-by-side in any number of stacks. The pie, doughnut, bar, and column charts support 2D and 3D drawing effects.
New design and run-time support for automatically calculated series on the chart are introduced in this release. Reporting Services provides built-in functions for 15 commonly used calculations, including statistical analysis, moving averages, and financial indicators. You can select a formula to create a new series of data points that is calculated at run time.
The user interface has been enhanced to make it easier to work with chart types. There are new dialog boxes for each chart element, shortcut menus for each chart element, support for text editing directly on the chart for titles and labels, a new chart type selector, and the ability to drag, drop, and rearrange fields in the drop-zone of a chart. This enables you to easily change the order in which series appear on a chart.

3. New Gauge Data Region:

The new Gauge data region is most often used to provide a high-level summary of your data by highlighting key performance indicator (KPI) values. The gauge uses a pointer to show a single value. You can add a range to highlight a subset of values on your scale and control the size and positioning of the scale on the gauge to create different visual effects.
The Gauge data region is a gauge panel that contains one or more gauges. You can combine multiple gauges in a gauge panel to show single values side by side. You can apply filtering or sorting on the gauge panel. You can choose between radial and linear gauge types. There is a specific thermometer gauge type for visualizing temperature data, and you can select a bullet graph from the list of linear gauges.

4. New Tablix Data Region:

The new Tablix data region is a generalized data region item that combines features of both a table and a matrix. In SQL Server 2008 Reporting Services (SSRS), the Tablix data region replaces the Table, Matrix, and List data regions by combining their features into one flexible grid layout. In RDL syntax, the Tablix element replaces the Table, Matrix, and List elements from previous RDL schemas.
The Tablix data region improves report layout flexibility and provides a more consistent rendering behavior. It uses a flexible grid layout to support multiple row groups and column groups. Groups are organized on a row hierarchy and a column hierarchy. Groups can be nested, adjacent, or recursive. After you add a Tablix data region, you can use the Grouping pane to quickly and easily create row and column groups, and add totals, subtotals, and grand totals. The Tablix data region automatically adjusts to include rows and columns on which to display group and summary data.
Tablix opens up new possibilities for working with complex and aggregated data, but does not prevent you from working with simple table and matrix structures. Report Designer provides three templates for the Tablix data region: Table, Matrix, and List, which you can drag from the Toolbox and use as a starting point for your reports. From here, you can design complex reports that integrate features of the various report types. In a tablix cell, you can embed any report item, including another data region.

5. New Enhancements for Text and HTML:

The text box report item has been enhanced. You can mix fonts and colors, add bold and italic styles, and use paragraph styles such as alignment and hanging indents. You can format an entire text box or you can format specific text, numbers, expressions, or fields within the text box. You can also import basic HTML from a field in your database for display in the report. You can create mail merges or template reports that mix data from a database with literal text on the design surface. The new release also provides full support for international complex scripts.
Report authors who use expressions in their reports will notice that simple expressions appear on the design surface as placeholder text. For example, in a text box on the design surface, the expression =Fields!Sales.Value is displayed as [Sales].

6. Enhancements in Report Designer:

The most noticeable change to the Report Designer work area is that the Data tab in Report Designer has been replaced with a Report Data pane that organizes all report data into one place that is always visible as you create the report layout. The Report Data pane shows you data sources, datasets, parameters, images, and built-in fields. Query designers can be accessed from the dataset.
The design surface supports a richer designer experience. Rulers show the current report item position and visible snap lines while you drag and resize items.
Other changes include a new Grouping pane that provides a convenient way to create groups for a Tablix data region, and new property dialog boxes to set properties for report items.

7. Report Builder 2.0:

The new release of Microsoft SQL Server 2008 Report Builder 2.0 offers many new features, like enhanced data layout, visualization and text formatting, and on-demand rendering. Report Builder 2.0 leverages your experience with Office 2007 products by featuring an Office-like authoring environment. You can create reports with multiple data regions, pull data from multiple data sources into a single report, and directly query relational and multidimensional data sources.

You can customize and update all existing reports, regardless of whether they were designed in Report Designer or in the previous version of the tool, Report Builder 1.0, which was installed with SQL Server 2005. Report Builder 1.0 will continue to be installed with Microsoft SQL Server 2008.
You can install the new Report Builder 2.0 separately from the Web. This version of Report Builder 2.0 is a stand-alone version that you or your administrator installs directly on your computer.

SQL Server 2008 PCU1 provides a ClickOnce version of Report Builder 2.0. The ClickOnce version can be used with Reporting Services in native mode or SharePoint integrated mode. You can start Report Builder ClickOnce from Report Manager or a SharePoint library.

8. New and Enhanced Report Items and RDL Schema:

Report Definition Language (RDL) defines the syntax for a Reporting Services report. In this release, the major changes to RDL include the addition of the Tablix, Chart, and Gauge data regions, which are described in the previous sections. However, this release also includes incremental enhancements to support the new report processing model and the new report rendering model.

RDL Page Elements

RDL page elements have been redesigned to provide more consistent control over page breaks when you export a report to different renderers. New RDL elements include PageSections, PageSection, and PageBreak. Page headers and footers, page margins, columns, column spacing, the InteractiveHeight element and the InteractiveWidth element have moved from the Report parent element to the Page parent element. New support for KeepTogether and KeepWithGroup has been introduced to support better control over pagination.

Explicit Null Values

The new attribute xsi:nil="true" enables you to distinguish between an element that is explicitly null and one that is not specified.

Processing-time Variables

You can declare variables that are global throughout the report or local to particular group scopes. You can use a global variable to calculate an expression one time, and then use that value in expressions throughout the report. The DeferVariableEvaluation element controls whether a variable is evaluated during on-demand processing of the report. Do not set this value if the variable has a time-dependency.

Data type for constants

The Value element has an optional DataType attribute that specifies the data type of the value if it is a constant. If DataType is omitted, a constant uses the String data type. DataType can be Boolean, DateTime, Integer, Float, or String.

I’ll cover the following topics in the next article:
• Report Processing and Rendering
• Server Architecture and Tools
• Report Programmability

Thanks,
Krishan

Saturday, March 20, 2010

Overview of SQL Server Reporting Services – PART I

This article provides some insight into SQL Server Reporting Services and how we can use the SSRS tool to enhance our reporting capabilities.

Article Highlights:
· Reporting: Challenges
· Introduction of Microsoft Reporting Solution
· What is Reporting Services?
· Reporting Services Architecture
· Reporting Services Key Concepts
· Report Delivery Mechanisms
· Reporting Services Integration
· Reporting Services Pros & Cons

Reporting Challenges:
· Data resides in many places
· Needed “in” applications
· Users’ Reporting Needs Change
· Web Interface
· Multiple Formats Wanted
· High Scale Reporting

How do I?
We can address these challenges by building reports that draw on multiple data sources, enabling interactive access to information, letting end users build (or modify) their own reports, delivering reports in the formats end users want, and integrating reporting into applications easily.

Introduction of Microsoft Reporting Solution
Microsoft introduced a server-based reporting platform that you can use to create and manage tabular, matrix, graphical, and free-form reports that contain data from relational and multidimensional data sources. The reports that you create can be viewed and managed over a Web-based connection.
· A complete set of tools that you can use to create, manage, and view reports.
· Variety of outputs including HTML, PDF, TIFF, Excel, CSV, and more.
Note: SQL Server 2008 SSRS can also render reports in Microsoft Word (DOC) format.
· Set of APIs.

What is Reporting Services?
Reporting Services is an open and extensible platform supporting the authoring, management, and delivery of rich, interactive reports to the entire enterprise. SSRS enhances reporting capabilities with:
· Scalable Server
1. Rich, Enterprise Reporting Platform
2. Scalable, Multiple data sources with multiple delivery options
3. Manageable Web Services architecture
4. Scheduling, Snapshots, Caching, more…
· Advanced Authoring Tools
· End-user, Ad Hoc Reporting
· Strong Management Story
1. SOAP Web Service APIs
2. Report Manager portal
3. Extensible security model
4. Integration with SSAS, SSIS, management tools


Reporting Services Architecture



Scale-Out Architecture



Report Server Catalog
· The SSRS catalog encompasses two databases: Report Server database (the default name is ReportServer) and Report Server temporary database (the default name is ReportServerTempDB). Report Server database is a SQL Server database that stores part of SSRS configuration, report definitions, report metadata, report history, cache policy, snapshots, resources, security settings, encrypted data, scheduling and delivery data, and extension information.
· Another database that SSRS uses is the Report Server temporary database. This database is responsible for storing intermediate processing products, such as cached reports, and session and execution data.
· Report Server periodically cleans up expired and orphaned data in ReportServerTempDB (a query sketch for inspecting the catalog follows below).
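
As a quick, hedged sketch of how you might peek into the catalog (assuming the default ReportServer database name and the dbo.Catalog table and ExecutionLog2 view that ship with SSRS 2008; adjust the names if your installation differs):

use ReportServer;

-- Published items: Type = 2 identifies reports in the catalog.
select Name, Path, CreationDate, ModifiedDate
from dbo.Catalog
where Type = 2
order by Path;

-- Recent report executions, from the built-in ExecutionLog2 view.
select top (20) ReportPath, UserName, TimeStart, TimeEnd, Status
from dbo.ExecutionLog2
order by TimeStart desc;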

Reporting Services Key Concepts
· Reporting Services Configuration Manager
· Report Manager
· Report Designer
· Report Model Designer
· Report Builder
· Report Server
· Data Source (.ds)
· Data Source View (.dsv)
· Report Model (.smdl)
· Report Definition (.rdl)

Report Designer
· Integrated into Visual Studio 2005
o Supports all languages (C#, VB, J#)
· Drag & drop metaphor, similar to Windows Forms Designer
· Use tables, lists, charts, matrices, etc. to display data
· Specify grouping, sorting, filtering, etc.
· Style the report by specifying fonts, colors, etc.
· Use expressions to change style dynamically based on data values
· Publishing and deployment features

Report Builder
· Ad hoc Reporting for the End-User
· Click Once Install
· With Report Builder you can:
1. Modify a Report
2. Build a New Report
3. Report on Relational or OLAP data




Report Designer vs. Builder




Model Designer
· Hosted in Visual Studio 2005 (new project type)
· For generating or editing models used by Report Builder
· Auto-generation of models from data sources





Report Controls
· Controls make it easy to embed reporting functionality into applications
· Windows Forms (rich client) and Web Forms (ASP.NET) control
· Local processing mode (no server) or connected server mode



Report Delivery Mechanisms
· Pull Delivery
1. URL Access
2. Web Services
3. SharePoint Web Parts
· Push Delivery
1. Standard Subscription
2. Data Driven Subscription
· Local Delivery
1. Windows Forms control
2. ASP.NET server control
· Ad hoc report via Report Builder

Web Services Interface
· Namespace Management
· Item Properties
· Report Parameters
· Report Execution
· Data Source Management
· Report History
· Scheduling
· Job Management
· Model Management
· Subscription & Delivery

Reporting Services-Pros
· It is free (provided you have a SQL Server license).
· Tight data integration with SQL Server, but it handles anything .NET can (Oracle, ODBC, etc.).
· 2008 has native support for Teradata too.
· Integration with Visual Studio, Management Studio, SharePoint, and PerformancePoint Server (PPS).
· It is just a web application though so integration into any web page or application that can talk to a web server is easy too.
· Built in tools to do subscriptions (i.e. emails that get sent out on a regular basis to a list of people with the report on them).
· Export to a variety of formats (XML, CSV, Excel, PDF, etc.).
· 2008 adds Word (DOC) too.
· Ability to design templates which power users can use to build reports without knowing SQL.

Reporting Services-Cons
· Charting controls look like Excel 2003 and are limited.
· 2008 has the Dundas controls in by default so they are much more powerful, more varied and better looking.
· Earlier versions run as an IIS web application; 2008 removes that dependency as it is no longer an IIS web app. It runs its own web server based on the IIS core but is closer to a stand-alone app, so the associated security issues aren't a problem.
· Designer support is a pain. 2000 reports must be developed in VS 2003, 2005 reports must be developed in VS 2005, and 2008 reports must be developed in VS 2008. By Visual Studio I mean either the full product or the slimmed-down version you get with the SQL Server management tools.
· Compatibility. Each version of reporting services can run only the current version and one version back of the reports.
· Security is limited to Integrated Windows or Anonymous
· 2008 adds support for forms-based security and for custom providers, like you get with ASP.NET.

Features - Out of Article Scope
· SharePoint Web Parts
· Rich Client Printing
· Visual Studio Integration
· Management Studio Integration
· Subscription
· Caching
· Snapshot
· Security management
· New features in SSRS 2008
· Scale-out Reporting Server environment

I’ll cover these features in Overview SSRS PART –II……………………………….
Thanks,
Krishan

Friday, March 19, 2010

SSAS - MOLAP, ROLAP and HOLAP

Introduction
There are three standard storage modes (MOLAP, ROLAP and HOLAP) in OLAP applications, which affect the performance of OLAP queries and cube processing, the storage requirements, and where the data is stored.
SSAS (2005 and 2008) supports not only these three standard storage modes but also proactive caching, a feature introduced in SSAS 2005 that lets you combine the best of both worlds (ROLAP and MOLAP storage) for both frequency of data refresh and OLAP query performance.
Basic storage modes
The cube data can be divided into three different types - metadata, detail data and aggregate data. No matter what storage mode is used, the metadata is always stored on the OLAP server, but storage of the detail data and aggregate data depends on the storage mode you specify.

MOLAP (Multidimensional OLAP)
This is the default and most frequently used storage mode. In this mode when you process the cube, the source data is pulled from the relational store, the required aggregation is then performed and finally the data is stored in the Analysis Services server in a compressed and optimized multidimensional format.
After processing, once the data has been retrieved from the underlying relational database, no connection to the relational data store is maintained. Any subsequent changes in the relational data after processing will not be reflected in the cube unless the cube is reprocessed; hence this is called offline data-set mode.
Since both the detail and aggregate data are stored locally on the OLAP server, the MOLAP storage mode is very efficient and provides the fastest query performance.
Pros
1. Stores the detail and aggregate data in the OLAP server in a compressed multidimensional format; as a result the cube browsing is fastest in this mode.
2. Provides maximum query performance, because all the required data (a copy of the detail data and calculated aggregate data) are stored in the OLAP server itself and there is no need to refer to the underlying relational database.
3. All the calculations are pre-generated when the cube is processed and stored locally on the OLAP server, so even complex calculations, as part of the query result, are performed quickly.
4. MOLAP does not need a permanent connection to the underlying relational database (only at the time of processing), as it stores the detail and aggregate data on the OLAP server, so the data can be viewed even when there is no connection to the relational database.
5. MOLAP uses compression to store the data on the OLAP server and so has lower storage requirements than a relational database for the same amount of data. (Note, however, that beginning with SQL Server 2008 you can use data compression at the relational database level as well.)
Cons
1. With MOLAP mode, you need to process the cube frequently to pull in data refreshed since the last processing, which is a drain on system resources.
2. Latency: changes made in the relational database after processing are not reflected on the OLAP server until the cube is reprocessed.
3. MOLAP stores a copy of the relational data on the OLAP server and so requires additional storage investment.
4. If the data volume is high, cube processing can take longer, though you can use incremental processing to overcome this.

ROLAP (Relational OLAP)
In comparison with MOLAP, ROLAP does not pull data from the underlying relational database source into the OLAP server; instead, both the cube detail data and the aggregations stay in the relational database source. In order to store the calculated aggregations, the database server creates additional database objects (indexed views). In other words, the ROLAP mode does not copy the detail data to the OLAP server, and when a query result cannot be obtained from the query cache, the created indexed views are accessed to provide the results.
Pros
1. Ability to view the data in near real-time.
2. Since ROLAP does not make another copy of the data as MOLAP does, it has lower storage requirements. This is very advantageous for large datasets that are queried infrequently, such as historical data.
3. In ROLAP mode, the detail data is stored in the underlying relational database, so the data size ROLAP can support is limited only by what the relational database can handle. In a nutshell, it can handle huge volumes of data.
Cons
1. Compared to MOLAP or HOLAP, query response is generally slower because everything is stored in the relational database and not locally on the OLAP server.
2. A permanent connection to the underlying database must be maintained to view the cube data.

Note:
If you use ROLAP storage mode and your relational database is SQL Server, the Analysis Services server may create indexed views for the aggregations. However, this requires a few prerequisites to be met - for example, the data source must be a table, not a view, and the table name must use a two-part naming convention (qualified with the owner/schema name), among others. For a complete list of these prerequisites you can refer to the link provided in the reference section. A sketch of such an indexed view follows.
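
To make this concrete, here is a minimal sketch of the kind of indexed view the engine relies on for ROLAP aggregations. The table and column names are hypothetical illustrations, not objects from this article, and the usual indexed-view restrictions (SCHEMABINDING, two-part names, COUNT_BIG with GROUP BY, a unique clustered index) apply.

-- Hypothetical aggregation over a fact table, materialized as an indexed view.
-- Assumes sales_dollars is declared NOT NULL (a requirement for SUM in indexed views).
create view dbo.v_sales_by_rep
with schemabinding
as
select sales_rep_id,
       sum(sales_dollars) as total_sales,
       count_big(*) as row_count        -- COUNT_BIG(*) is required when grouping
from dbo.fact_sales
group by sales_rep_id;
go

-- A unique clustered index materializes (persists) the view's result set.
create unique clustered index ix_v_sales_by_rep
on dbo.v_sales_by_rep (sales_rep_id);
go
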
HOLAP (Hybrid OLAP)
This mode is a hybrid of MOLAP and ROLAP and attempts to provide the greater data capacity of ROLAP and the fast processing and high query performance of MOLAP.
In HOLAP storage mode, the cube detail data remains in the underlying relational data store and the aggregations are stored on the OLAP server. If you query only summary data covered by the aggregations, the HOLAP storage mode behaves much like MOLAP. For detail data queries, HOLAP drills through to the detail data in the underlying relational data store, so performance is not as good as MOLAP. Therefore, a query is as fast as MOLAP if its result can be served from the query cache or the aggregations, but performance degrades if it needs detail data from the relational data store.
Pros
1. HOLAP balances the disk space requirement, as it only stores the aggregate data on the OLAP server while the detail data remains in the relational database, so no duplicate copy of the detail data is maintained.
2. Since HOLAP does not store detail data on the OLAP server, the cubes and partitions are smaller than MOLAP cubes and partitions.
3. Performance is better than ROLAP because the summary data is stored on the OLAP server and queries can be satisfied from this summary data.
4. HOLAP is optimal in scenarios where fast query response is required and query results are based on aggregations over large volumes of data.
Cons
1. Query performance (response time) degrades when a query has to drill through to the detail data in the relational data store; in this case HOLAP performs very much like ROLAP.





Summary




Configuring the storage mode
Select a particular OLAP object in BIDS (Business Intelligence Development Studio), right click on it and then select properties. The property called StorageMode will allow you to set the storage mode to MOLAP, ROLAP or HOLAP.

Friday, March 12, 2010

SSIS - Lookup Cache Modes - Full, Partial, None

There are many, many resources out on the ‘Net regarding SSIS and the Lookup component, what each of its cache modes does, and how to implement them in your own package. This is going to be a technical post, for those of you interested in what each cache mode does behind the scenes.


For this post, use the following schema and data:
create table fact_sales
(
    id int identity(1,1),
    sales_rep_id int,
    sales_dollars decimal(18,2)
);

create table dim_sales_rep
(
    id int identity(1,1),
    first_name varchar(30),
    last_name varchar(50)
);

insert into fact_sales (sales_rep_id, sales_dollars) values (1,120.99);
insert into fact_sales (sales_rep_id, sales_dollars) values (2,24.87);
insert into fact_sales (sales_rep_id, sales_dollars) values (3,98.11);
insert into fact_sales (sales_rep_id, sales_dollars) values (4,70.64);
insert into fact_sales (sales_rep_id, sales_dollars) values (4,114.19);
insert into fact_sales (sales_rep_id, sales_dollars) values (4,37.00);
insert into fact_sales (sales_rep_id, sales_dollars) values (5,401.50);

insert into dim_sales_rep (first_name, last_name) values ('K','S');
insert into dim_sales_rep (first_name, last_name) values ('S','D');
insert into dim_sales_rep (first_name, last_name) values ('Larry','White');
insert into dim_sales_rep (first_name, last_name) values ('Carrie','Green');
insert into dim_sales_rep (first_name, last_name) values ('Adam','Smith');
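
For reference, the Lookup configurations discussed below are roughly equivalent to joining each fact row to its sales rep by key. A hedged T-SQL sketch of that join (every sales_rep_id above has a match in dim_sales_rep, so an inner join returns all seven fact rows enriched with the rep's name):

select f.sales_rep_id,
       f.sales_dollars,
       d.first_name,
       d.last_name
from fact_sales f
join dim_sales_rep d
    on d.id = f.sales_rep_id;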


FULL Cache Mode
First, it is always advisable to build a query for the lookup, instead of choosing a table in the Table/View drop-down. The primary reason is so that you can limit the resultset to only the columns needed to perform the lookup as well as return any columns needed downstream, and to have the ability to add a WHERE clause if needed.
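
For example, rather than pointing the Lookup at dim_sales_rep in the Table/View drop-down, a trimmed-down query like the following returns only the join key and the columns needed downstream (the WHERE clause here is purely illustrative):

select id, first_name, last_name
from dim_sales_rep
where last_name is not null;   -- hypothetical filter; add whatever restriction you actually need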

The full cache mode will run the specified query (or its own depending on how you assigned the lookup table) and attempt to cache all of the results. It will execute this query very early on in the package execution to ensure that the first set of rows coming out of the source(s) are cached. If SSIS runs out of memory on the machine though, the data flow will fail as the lookup component will not spool its memory overflow to disk. Be cautious of this fact. Once the data is cached, the lookup component will not go back to the database to retrieve its records, so long as the data flow is not restarted. (In SQL Server 2008, you can now reuse lookup caches.)

Using SQL Profiler, you can see that only one database call is made:
declare @p1 int
set @p1=1
exec sp_prepare @p1 output,NULL,N'select sales_rep_id, sales_dollars
from fact_sales',1
select @p1
go
exec sp_execute 1
go
SET NO_BROWSETABLE ON
go
declare @p1 int
set @p1=1
exec sp_prepare @p1 output,NULL,N'select id, first_name, last_name
from dim_sales_rep',1
select @p1
go
exec sp_execute 1
go
exec sp_unprepare 1
go
exec sp_unprepare 1
go


PARTIAL Cache Mode
Partial cache mode will not execute a query immediately at package execution. Instead, it waits until its first input row arrives. When a row arrives, the lookup value being passed in (in this case, sales_rep_id) is substituted for a parameter, and SSIS sends the query to the database for retrieval. At that point, all of the data returned is cached for future lookups. If a new sales_rep_id is encountered, the query is re-executed and the new resultset is added to the lookup cache.

In other words, in the above data, if my source is “select sales_rep_id, sales_dollars from fact_sales”, we should have five database calls made by the lookup component. Even though for sales_rep_id = 4 we have three entries, in partial cache mode the first time we retrieve the lookup records for sales_rep_id = 4, the results will be cached, allowing future occurrences of sales_rep_id = 4 to be retrieved from cache.

This is illustrated in the SQL Profiler data:
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',1
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',2
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',3
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',4
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',5
go
exec sp_unprepare 1
go

In the above data, you can see at the end each sales_rep_id being passed in. Note that we only have one line for sales_rep_id = 4. That’s because the remaining two records were bounced against the lookup cache, avoiding a trip to the database.
NO Cache Mode
Using the NO Cache Mode will essentially tell SSIS that you want each incoming row (from fact_sales in this case) to be bounced against the database. Since we have seven fact_sales rows, we will see seven calls to the database - MOST of the time. It is important to note that even though we are telling the lookup component to avoid caching rows, it will keep the last match in memory and use it for the next comparison. If the next comparison’s key value matches the value still in memory, a database call is avoided, and the value is carried forward.

In our example data above, if we sort by sales_rep_id, we will still have only five calls to the database, because after we look up our first value of sales_rep_id = 4, it is reused for the subsequent sales_rep_id = 4 lookups. If we sort our data by sales_dollars, we will have six database calls, because only two of the sales_rep_id = 4 records are adjacent, so the cached match is reused only once.
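
The two scenarios correspond to the following hypothetical source queries (the same columns, just ordered differently):

-- Sorted by sales_rep_id: repeated keys arrive back to back, so only five calls are made.
select sales_rep_id, sales_dollars
from fact_sales
order by sales_rep_id;

-- Sorted by sales_dollars: only two of the sales_rep_id = 4 rows are adjacent, so six calls are made.
select sales_rep_id, sales_dollars
from fact_sales
order by sales_dollars;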

Here is a simple table illustrating each no cache example mentioned above:
SALES_REP_ID, SALES_DOLLARS, LOOKUP DATABASE CALL Y or N
1 120.99 Y
2 24.87 Y
3 98.11 Y
4 70.64 Y
4 114.19 N
4 37.00 N
5 401.50 Y

SALES_REP_ID, SALES_DOLLARS, LOOKUP DATABASE CALL Y or N
2 24.87 Y
4 37.00 Y
4 70.64 N
3 98.11 Y
4 114.19 Y
1 120.99 Y
5 401.50 Y

The SQL Profiler data for the second example above is here:
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',2
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',4
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',3
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',4
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',1
go
exec sp_executesql N'select * from (select id, first_name, last_name
from dim_sales_rep) [refTable]
where [refTable].[id] = @P1',N'@P1 int',5
go

Thursday, March 11, 2010

Commit to excellence

Robin Sharma Said:
....Commit to excellence. Become massively innovative and wear your passion on your sleeve. They might call you different or weird or even crazy. But please remember, every great leader (or visionary or brave thinker) was initially laughed at. Now they are revered.

Tuesday, March 9, 2010

Principles for Enterprise Data Warehouse Design

Data warehousing is not a stagnant technology; it's alive and kicking. Indeed, most companies deploy data warehousing technology to some extent, and many have an enterprise-wide DW.

However, as with any technology, a DW can quickly become a quagmire if it's not designed, implemented and maintained properly. With this in mind, there are a few principles that can help us start - and keep - our DW design and implementation on the road to achieving our desired results. There are seven principles in total, split between business and IT, because most IT issues really involve business and IT equally.

Business Principles
Organizational Consensus (Agreement):
Make every effort to gain acceptance for, and minimize resistance to, the DW. If we can involve the stakeholders early in the process, they're much more likely to embrace the DW, use it and, hopefully, champion it to the rest of the company.
Data Integrity:
The goal of data warehousing - or of any business intelligence (BI) project - is a single version of the truth about organizational data. Any DW design should begin by minimizing the chances for data replication and inconsistency. It should also promote data integration and standardization. Any reasonable methodology we choose to achieve data integrity should work, as long as we can implement the methodology effectively with the end result in mind.
Implementation Efficiency:
To help meet the needs of the company as early as possible and minimize project costs, the DW design should be straightforward and efficient to implement. This is a fundamental design issue: we can design a technically impressive DW, but if that design is difficult to understand or implement, or doesn't meet user needs, our DW project will run into difficulty and costs will increase.
User Friendliness:
The data warehouse should be user friendly and easy to use. The DW should present a common front-end across the company - based on user roles and security levels - and it should be intuitive enough to have a minimal learning curve for most users.
Operational Efficiency:
Operational efficiency can be achieved only with a DW design that is easy to implement and maintain. A technically elegant solution might be beautiful, but a practical, easy-to-maintain solution gives better results in the long run. The data warehouse should easily accommodate business change requests. Errors and exceptions should also be easy to remedy, and support costs should remain moderate over the life of the DW.
IT Principles
Scalability:
Scalability is often a big problem with DW design. The solution is to build in scalability from the start. Choose toolsets and platforms that support future expansions of data volumes and types as well as changing business requirements.
Compliance with IT Standards:
The DW design should comply with IT standards and take advantage of the skill sets of both IT and business users.


These principles can put us in a better position to recognize and address potential problems before they turn into project killers, but they won't guarantee that we will always achieve our desired results in designing and implementing a DW.

Principles of Dimensional Modeling

Principles of Dimensional Modeling:
Dimensional modeling is the name of a logical design technique often used for data warehouses. DM is the only viable technique for databases that are designed to support end-user queries in a data warehouse. It is different from, and contrasts with, entity-relationship (ER) modeling. ER modeling is very useful for the transaction capture and the data administration phases of constructing a data warehouse, but it should be avoided for end-user delivery. This article explains dimensional modeling and how it contrasts with ER models.
Dimensional Modeling (DM):
Dimensional Modeling is a favorite modeling technique in data warehousing. DM is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access. It is inherently dimensional, and it adheres to a discipline that uses the relational model with some important restrictions.

Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table. This characteristic “star-like” structure is often called a star join. A fact table, because it has a multipart primary key made up of two or more foreign keys, always expresses a many-to-many relationship. The most useful fact tables also contain one or more numerical measures, or “facts,” that occur for the combination of keys that define each record.

Dimension tables, by contrast, most often contain descriptive textual information. Dimension attributes are used as the source of most of the interesting constraints in data warehouse queries, and they are virtually always the source of the row headers in the SQL answer set. Dimension attributes are the various columns in a dimension table. In the Location dimension, the attributes can be Location Code, State, Country, and Zip Code. Generally the dimension attributes are used in report labels and in query constraints such as where Country=’UK’. The dimension attributes also contain one or more hierarchical relationships. Before designing a data warehouse, one must decide upon the subjects.
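
To ground the star structure described above, here is a minimal, hypothetical T-SQL sketch (the Date and Location dimensions and the daily sales fact are illustrative examples, not taken from the text): each dimension has a single-part primary key, and the fact table's multipart primary key is made up of foreign keys to those dimensions.

create table dim_date
(
    date_key int not null primary key,          -- single-part primary key
    calendar_date date not null,
    month_name varchar(20) not null,
    calendar_year int not null
);

create table dim_location
(
    location_key int not null primary key,
    location_code varchar(10) not null,
    state varchar(50) not null,
    country varchar(50) not null,
    zip_code varchar(10) not null
);

create table fact_daily_sales
(
    date_key int not null references dim_date (date_key),
    location_key int not null references dim_location (location_key),
    sales_amount decimal(18,2) not null,         -- numerical measures ("facts")
    units_sold int not null,
    primary key (date_key, location_key)         -- multipart key made up of foreign keys
);
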
Contrast with E-R:
In DM, a model of tables and relations is constituted with the purpose of optimizing decision support query performance in relational databases, relative to a measurement or set of measurements of the outcomes of the business process being modeled. In contrast, conventional E-R models are constituted to remove redundancy in the data model, to facilitate retrieval of individual records having certain critical identifiers, and therefore, optimize On-line Transaction Processing (OLTP) performance.
In a DM, the grain of the fact table is usually a quantitative measurement of the outcome of the business process being analyzed. The dimension tables are generally composed of attributes measured on some discrete category scale that describe, qualify, locate, or constrain the fact table’s quantitative measurements. Ralph Kimball’s view is that the data warehouse should always be modeled using a DM/star schema. Indeed, Kimball has stated that while DM/star schemas have the advantages of greater understandability and superior performance relative to E-R models, their use involves no loss of information, because any E-R model can be represented as a set of DM/star schema models without loss of information. In E-R models, normalization through the addition of attributive and sub-type entities destroys the clean dimensional structure of star schemas and creates “snowflakes,” which, in general, slow browsing performance. In star schemas, by contrast, browsing performance is protected by restricting the formal model to associative and fundamental entities, unless certain special conditions exist.

The key to understanding the relationship between DM and ER is that a single ER diagram breaks down into multiple DM diagrams. The ER diagram does itself a disservice by representing on one diagram multiple processes that never coexist in a single data set at a single consistent point in time; it’s no wonder the ER diagram is overly complex. Thus the first step in converting an ER diagram to a set of DM diagrams is to separate the ER diagram into its discrete business processes and to model each one separately.

The dimensional model has a number of important data warehouse advantages that the ER model lacks. The dimensional model is a predictable, standard framework: report writers, query tools, and user interfaces can all make strong assumptions about the dimensional model to make the user interfaces more understandable and to make processing more efficient. The wild variability of the structure of ER models means that each data warehouse needs custom, handwritten and tuned SQL. It also means that each schema, once it is tuned, is very vulnerable to changes in the user’s querying habits, because such schemas are asymmetrical. By contrast, in a dimensional model all dimensions serve as equal entry points to the fact table, and changes in users’ querying habits don’t change the structure of the SQL or the standard ways of measuring and controlling performance.
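
As a hedged illustration of that symmetry, using the hypothetical star schema sketched earlier, dimension attributes supply the constraints and the row headers while the fact table supplies the measures:

select d.calendar_year,
       l.country,
       sum(f.sales_amount) as total_sales
from fact_daily_sales f
join dim_date d on d.date_key = f.date_key
join dim_location l on l.location_key = f.location_key
where l.country = 'UK'                 -- constraint on a dimension attribute
group by d.calendar_year, l.country    -- dimension attributes become the row headers
order by d.calendar_year;
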
Conclusion:
It can be concluded that dimensional modeling is the only viable technique for designing end-user delivery databases. ER modeling defeats end-user delivery and should not be used for this purpose. ER modeling does not really model a business; rather, it models the micro-relationships among data elements.