Friday, March 19, 2010

SSAS - MOLAP, ROLAP and HOLAP

Introduction
There are three standard storage modes (MOLAP, ROLAP and HOLAP) in OLAP applications which affect the performance of OLAP queries and cube processing, storage requirements and also determine storage locations.
SSAS (2005 and 2008) supports not only these three standard storage modes but supports proactive caching, a new feature with SSAS 2005 which enables you to combine the best of both worlds (ROLAP and MOLAP storage) for both frequency of data refresh and OLAP query performance.
Basic storage modes
The cube data can be divided into three different types - meta-data, detail data and aggregate data. No matter what storage is used, the meta-data will always be stored on the OLAP server but storage of the detail data and aggregate data will depend on the storage mode you specify.

MOLAP (Multidimensional OLAP)
This is the default and most frequently used storage mode. In this mode when you process the cube, the source data is pulled from the relational store, the required aggregation is then performed and finally the data is stored in the Analysis Services server in a compressed and optimized multidimensional format.
After processing, once the data from the underlying relational database is retrieved there exists no connection to the relational data stores. So if there is any subsequent changes in the relational data after processing that will not reflect in the cube unless the cube is reprocessed and hence it is called offline data-set mode.
Since both the detail and aggregate data are stored locally on the OLAP server, the MOLAP storage mode is very efficient and provides the fastest query performance.
Pros
1. Stores the detail and aggregate data in the OLAP server in a compressed multidimensional format; as a result the cube browsing is fastest in this mode.
2. Provides maximum query performance, because all the required data (a copy of the detail data and calculated aggregate data) are stored in the OLAP server itself and there is no need to refer to the underlying relational database.
3. All the calculations are pre-generated when the cube is processed and stored locally on the OLAP server hence even the complex calculations, as a part the query result, will be pefromed quickly.
4 MOLAP does not need to have a permanent connection to the underlying relational database (only at the time of processing) as it stores the detail and aggregate data in the OLAP server so the data can be viewed even when there is connection to the relational database.
5. MOLAP uses compression to store the data on the OLAP server and so has less storage requirements than relational databases for same amount of data. (Note however, that beginning with SQL Server 2008 you can use data compression at relational database level as well).
Cons
1. With MOLAP mode, you need frequent processing to pull refreshed data after last processing resulting in drain on system resources.
2. Latency; just after the processing if there is any changes in the relational database it will not be reflected on the OLAP server unless re-processing is performed.
3. MOLAP stores a copy of the relational data at OLAP server and so requires additional investment for storage.
4. If the data volume is high, the cube processing can take longer, though you can use incremental processing to overcome this.

ROLAP (Relational OLAP)
In comparison with MOLAP, ROLAP does not pull data from the underlying relational database source to the OLAP server but rather both cube detail data and aggregation stay at relational database source. In order to store the calculated aggregation the database server creates additional database objects (indexed views). In other words, the ROLAP mode does not copy the detail data to the OLAP server, and when a query result cannot be obtained from the query cache the created indexed views are accessed to provide the results.
Pros
1. Ability to view the data in near real-time.
2. Since ROLAP does not make another copy of data as in case of MOLAP, it has less storage requirements. This is very advantageous for large datasets which are queried infrequently such as historical data.
3. In ROLAP mode, the detail data is stored on the underlying relational database, so there is no limitation on data size that ROLAP can support or limited by the data size of relational database. In nutshell, it can even handle huge volumes of data.
Cons
1. Compared to MOLAP or HOLAP the query response is generally slower because everything is stored on relational database and not locally on the OLAP server.
2. A permanent connection to the underlying database must be maintained to view the cube data.

Note:
If you use ROLAP storage mode and your relational database is SQL Server, the Analysis Services server may create indexed views for aggregation. However this requires a few prerequisite to be available - for example, the data source must be a table, not a view. The table name must use two part naming convention or it must be qualified with owner/schema name etc. For a complete list of these prerequisites you can refer to the link provided in reference section.
HOLAP (Hybrid OLAP)
This mode is a hybrid of MOLAP and ROLAP and attempts to provide the greater data capacity of ROLAP and the fast processing and high query performance of MOLAP.
In HOLAP storage mode, the cube detail data remains in the underlying relational data store and the aggregations are stored on the OLAP server. If you query only summary data in aggregation, the HOLAP storage mode will work similar to MOLAP. For the detail data queries, HOLAP will drill through the detail data in underlying relational data store and hence performance would not be as good as MOLAP. Therefore, your query would be as fast as MOLAP in if your query result can be provided from query cache or aggregation but performance would degrade if it needs the detail data from relational data store.
Pros
1. HOLAP balances the disk space requirement, as it only stores the aggregate data on the OLAP server and the detail data remains in the relational database. So no duplicate copy of the detail data is maintained.
2. Since HOLAP does not store detail data on the OLAP server, the cube and partitions would be smaller in size than MOLAP cubes and partitions.
Performance is better than ROLAP as in HOLAP the summary data are stored on the OLAP server and queries can be satisfied from this summary data.
HOLAP would be optimal in the scenario where query response is required and query results are based on aggregations on large volumes of data.
Cons
1. Query performance (response time) degrades if it has to drill through the detail data from relational data store, in this case HOLAP performs very much like ROLAP.





Summary




Configuring the storage mode
Select a particular OLAP object in BIDS (Business Intelligence Development Studio), right click on it and then select properties. The property called StorageMode will allow you to set the storage mode to MOLAP, ROLAP or HOLAP.

No comments:

Post a Comment