Wednesday, August 29, 2012

SQL Server – Transaction locks, Isolation Levels and SSIS Transaction


Transaction:

Let's discuss a real-world scenario. When you give something to me and I take it, that is a transaction. When you withdraw money from an ATM and receive it, that is a transaction. But what happens when you withdraw money from an ATM and the account still shows the same balance as before the withdrawal? What happens when you and your wife both withdraw all your money from a joint account at the same time from different ATMs? There must be some methodology to keep track of all such conditions.

To achieve this in a database system we have a locking mechanism. It works like this: suppose there is a room that only one person can access at a time with a provided key and password, but another person wants the same room at the same time for a different purpose. The security interface must be manageable, so the security system must provide different types of security codes or passwords to the corresponding people. Let's take a deeper look. Suppose you are withdrawing money from an ATM, and at the same time the bank manager is doing a routine check of your transactions, which is a totally different operation, and at the same time the bank teller is checking your account for the remaining balance. All these operations are different but access the same entity or resource: your account information kept inside the database. Of these operations, only yours writes to the database, since you are withdrawing money and the remaining balance has to be updated. So a proper security mechanism must be implemented here to ensure these operations do not conflict. Security is ensured by putting locks (and of course choosing the type of lock) for each type of operation, which means you are isolating the resource from other transactions that may hamper its consistency. Here comes the role of isolation levels.

Let's have a look at locks and the types of locks. A lock can be treated as a policy which prevents you or a process from performing any action (one that may conflict with other actions) on an object or resource if that object or resource is already occupied by another process or user. It's like going to propose to someone who is already with someone else. For such transactions, we have several types of locks.

Types of Locks

Shared Locks (S): This lock is useful when you are doing read operations with no manipulation such as write operations (Update/Delete/Insert). A shared lock prevents other transactions from modifying the data while it is held, so reads under shared locks do not return dirty data.
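As a sketch (assuming the sample TableName table created later in this post), the HOLDLOCK hint keeps shared locks for the length of the transaction, and sys.dm_tran_locks lets you observe them:

```sql
-- Hold shared (S) locks until the transaction ends instead of
-- releasing them as soon as each row has been read
BEGIN TRAN
SELECT * FROM TableName WITH (HOLDLOCK)

-- Inspect the locks held by this session
SELECT resource_type, request_mode, request_status
FROM sys.dm_tran_locks
WHERE request_session_id = @@SPID

ROLLBACK
```

While the transaction is open, another session attempting an UPDATE on the same rows will wait, but other readers are not blocked.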

Exclusive Locks (X): These locks are the big possessive type. They are not compatible with any other locks, meaning an exclusive lock will not be granted if any other lock is already held on the resource. These locks are used for data modification.

Update Locks (U): This can be seen as a combination of the two locks above (shared and exclusive). Let's take an example: suppose you are updating one particular record out of 100. First you search for that record, and then you update it, which requires an exclusive lock. The update lock is held during the search and converts to an exclusive lock at write time. This avoids deadlocks to a great extent.
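A minimal sketch of this pattern, assuming the sample TableName table created later in this post: the UPDLOCK hint requests a U lock during the read, so two sessions cannot both hold S locks on the row and then deadlock when each tries to convert to X.

```sql
BEGIN TRAN
-- Take a U lock (not S) while locating the row
SELECT Col1
FROM TableName WITH (UPDLOCK)
WHERE Id = 5

-- The U lock converts to an exclusive (X) lock for the actual write
UPDATE TableName
SET Col1 = Col1 + 1
WHERE Id = 5

COMMIT
```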

Intent Locks: Also called demand locks, these are used to establish a lock hierarchy. For example, suppose we are reading some data with a shared lock and at the same time another user wants to modify the data with an exclusive lock. Shared locks are compatible with other shared locks, so any number of shared locks can be obtained on a piece of data, and the user with the exclusive lock would have to wait until all the shared lock operations complete. Intent locks avoid this situation: if the second user comes in with an intent exclusive lock, no further transaction can grab a shared lock, and the second user can claim the exclusive lock as soon as the first transaction completes.

There are basically three types of Intent Locks that are most popular:
• Intent Shared Lock(IS)
• Intent exclusive (IX)
• Shared with intent exclusive (SIX)

Schema Locks: These locks protect the schema of the database and deal with DDL commands. There are two types of schema locks:
• Schema Modification (Sch-M): Applied only when modifying the structure.
• Schema Stability (Sch-S): Applied when querying data; prevents execution of DDL commands on the table.

Bulk Update Locks: This lock is useful when performing bulk operations on a table. It prevents any other normal T-SQL operations from executing against the table.
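As a hedged sketch, a bulk update (BU) lock is requested when you bulk load with the TABLOCK hint (the file path below is purely illustrative):

```sql
-- TABLOCK during a bulk load requests a bulk update (BU) lock on the table,
-- allowing parallel bulk loads into the same table while blocking
-- normal T-SQL access until the load completes
BULK INSERT TableName
FROM 'C:\data\rows.csv'   -- hypothetical source file
WITH (TABLOCK, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n')
```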

Lost Updates: A lost update generally occurs when more than one transaction tries to update a specific record at the same time: one transaction successfully updates the data, and a second update accidentally overwrites that information.
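One common defence against lost updates, sketched below against the sample TableName table created later in this post, is an optimistic check: the UPDATE only succeeds if the value is still what we originally read.

```sql
DECLARE @OriginalCol1 INT

SELECT @OriginalCol1 = Col1
FROM TableName
WHERE Id = 1

-- Only update if nobody changed the value since we read it
UPDATE TableName
SET Col1 = @OriginalCol1 + 1
WHERE Id = 1
  AND Col1 = @OriginalCol1

IF @@ROWCOUNT = 0
    PRINT 'Conflict: another transaction changed the row first'
```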

Isolation Level:

Isolation levels in SQL Server control the way locking works between transactions. The isolation levels are categorized by the types of locks they use at each level. At lower isolation levels more users can access the same resource without conflict, but they may face concurrency-related issues such as dirty reads and data inaccuracy (described below). At higher isolation levels these issues are eliminated, but only a limited number of users can access the resource.
SQL Server 2008 supports the following isolation levels
• Read Uncommitted
• Read Committed (The default)
• Repeatable Read
• Serializable
• Snapshot

Before I run through each of these in detail, you may want to create a new database to run the examples. Run the following script on the new database to create the sample data.

Note: Drop the TableName table and re-run this script before each example to reset the data.

GO
CREATE TABLE TableName
(
    Id   INT IDENTITY,
    Col1 INT,
    Col2 INT,
    Col3 INT
)

INSERT INTO TableName (Col1, Col2, Col3)
SELECT 1, 2, 3
UNION ALL SELECT 1, 2, 3
UNION ALL SELECT 1, 2, 3
UNION ALL SELECT 1, 2, 3
UNION ALL SELECT 1, 2, 3
UNION ALL SELECT 1, 2, 3
UNION ALL SELECT 1, 2, 3

Before we start each isolation level in details we need to understand two things:

Dirty Read: Quite often in database processing we come across a situation where one transaction changes a value, and a second transaction reads this value before the original change has been committed or rolled back. This is known as a dirty read, because there is always the possibility that the first transaction may roll back the change, leaving the second transaction having read an invalid value.

Phantom Reads: These occur when an insert or delete is performed against a row that belongs to a range of rows being read by a transaction. (By contrast, inconsistent analysis occurs when a second transaction accesses the same row several times and reads different data each time.)

Read Uncommitted

This is the lowest isolation level there is. Read Uncommitted causes no shared locks to be requested, which allows you to read data that is currently being modified in other transactions. It also allows other transactions to modify data that you are reading. As you can probably imagine, this can cause unexpected results in a variety of ways. For example, data returned by a select could be in a halfway state if an update was running in another transaction, causing some of your rows to come back with the updated values and some not. To see Read Uncommitted in action, let's run Query1 in one tab of Management Studio and then quickly run Query2 in another tab before Query1 completes.

-- Query1
BEGIN TRAN
UPDATE TableName SET Col1 = 3
WAITFOR DELAY '00:00:10'
--ROLLBACK
--COMMIT

-- Query2 (run in a second tab before Query1 completes)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT * FROM TableName


There is a syntactic shortcut for querying data using the read uncommitted isolation level by using the NOLOCK table hint. You could change the above Query2 to look like this and it would do the exact same thing.

SELECT * FROM TableName WITH (NOLOCK)

Read Committed

This is the default isolation level and means selects will only return committed data. Select statements issue shared lock requests against the data you're querying, which causes you to wait if another transaction already has an exclusive lock on that data. Once you have your shared lock, any other transaction trying to modify that data will request an exclusive lock and be made to wait until your Read Committed transaction finishes.

-- Query1
BEGIN TRAN
UPDATE TableName SET Col1 = 3
WHERE Id <= 3
WAITFOR DELAY '00:00:10'
--ROLLBACK
--COMMIT

-- Query2 (run in a second tab)
--SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SELECT * FROM TableName


The second query waited for the first transaction to complete before returning, and the data returned is the data we started off with, because the first query did a rollback. No isolation level was specified because Read Committed is the default isolation level for SQL Server. If you want to check which isolation level you are running under, you can run "DBCC USEROPTIONS".

DBCC USEROPTIONS
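An alternative sketch, if you prefer a query over DBCC output, is the transaction_isolation_level column of sys.dm_exec_sessions:

```sql
-- 0 = Unspecified, 1 = ReadUncommitted, 2 = ReadCommitted,
-- 3 = RepeatableRead, 4 = Serializable, 5 = Snapshot
SELECT transaction_isolation_level
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID
```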

Repeatable Read

This is similar to Read Committed but with the additional guarantee that if you issue the same select twice in a transaction you will get the same results both times.


-- Query1
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT * FROM TableName
WAITFOR DELAY '00:00:10'
SELECT * FROM TableName
ROLLBACK

-- Query2 (run in a second tab)
UPDATE TableName SET Col1 = 1

Serializable

This isolation level takes Repeatable Read and adds the guarantee that no new data will be added, eliminating the chance of getting phantom reads.

-- Query1
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN
SELECT * FROM TableName
WAITFOR DELAY '00:00:10'
SELECT * FROM TableName
ROLLBACK

-- Query2 (run in a second tab)
INSERT INTO TableName (Col1, Col2, Col3)
VALUES (100, 100, 100)


Snapshot

This provides the same guarantees as Serializable, so what's the difference? Under snapshot isolation, inserting or updating doesn't block other queries from reading the data touched by the snapshot transaction. But you need to enable snapshot isolation on your database first:

ALTER DATABASE databasename
SET ALLOW_SNAPSHOT_ISOLATION ON
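Once the database option is enabled, each session still has to opt in per transaction. A minimal sketch against the sample TableName table:

```sql
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
BEGIN TRAN
-- Reads see the committed data as of the start of the transaction;
-- concurrent writers are not blocked by these reads
SELECT * FROM TableName
WAITFOR DELAY '00:00:10'
SELECT * FROM TableName   -- same row versions, even if another session updated
COMMIT
```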
Summary


Now we have a good idea how each of the different isolation levels works. The higher the isolation level, the less concurrency and the more blocking on the table. We should always try to use the lowest isolation level that meets our needs; I recommend Read Committed, which is the default isolation level in SQL Server.

Monday, March 5, 2012

SQL Server Analysis Service Events - Useful Information

Microsoft Analysis Services Events

Date Event
1996-07-01 Microsoft opens new team to build an OLAP product, codenamed Plato (permutation of letters from OLAP)

1996-07-15 Panorama Software delegation meets with Microsoft
1996-10-27 Microsoft announces acquisition of Panorama Software development team
1998-11 OLAP Services 7.0 (codename Sphinx) ships
2000-08 Analysis Services 2000 (codename Shiloh) ships
2001-11 XML for Analysis Software Development Kit 1.0 ships
2003-04 ADOMD.NET and XML for Analysis SDK 1.1 ship
2005-10-28 Analysis Services 2005 (codename Yukon) ships
2008-08-06 Analysis Services 2008 (codename Katmai) ships
2010-12-21 Analysis Services 2008 R2
2011-07-11 Analysis Services 2008 R2 – SP1


SQL Server Release History
Version  Year  Release Name                      Codename
1.0      1989  SQL Server 1.0 (OS/2, 16-bit)     -
1.1      1991  SQL Server 1.1 (OS/2, 16-bit)     -
4.21     1993  SQL Server 4.21 (WinNT)           SQLNT
6.0      1995  SQL Server 6.0                    SQL95
6.5      1996  SQL Server 6.5                    Hydra
7.0      1998  SQL Server 7.0                    Sphinx
-        1999  SQL Server 7.0 OLAP Tools         Plato
8.0      2000  SQL Server 2000                   Shiloh
8.0      2003  SQL Server 2000 64-bit Edition    Liberty
9.0      2005  SQL Server 2005                   Yukon
10.0     2008  SQL Server 2008                   Katmai
10.25    2010  SQL Azure                         Matrix (aka CloudDB)
10.5     2010  SQL Server 2008 R2                Kilimanjaro (aka KJ)
11.0     2012  SQL Server 2012                   Denali

Thursday, March 1, 2012

So, what is the BI semantic Model (BISM)?

BISM is a marketing concept just like 'UDM' (Unified Dimensional Model), introduced with SQL Server 2005. Many people were confused about 'UDM'. It was just another name for an Analysis Services cube, but it also represented the nebulous concept of Analysis Services being a semantic layer suitable for all your reporting needs; people often thought it was a new product that somehow replaced Analysis Services cubes and got themselves tied in all sorts of knots as a result. Thankfully, this terminology died out pretty quickly as everyone started referring to Analysis Services as, well, Analysis Services.

These days we can see a similar amount of confusion about the term 'BI Semantic Model', or BISM, for many of the same reasons. What is the BI Semantic Model exactly? It is Analysis Services 2012 plus Power Pivot. Let's be clear here: it is not just the new Analysis Services Tabular model, and it's not even a specific bit of software. Analysis Services 2012 consists of two parts: the Multidimensional model, which is the Analysis Services of cubes and dimensions that we already had in 2008 R2 and earlier versions, and the new Tabular model, which is the Analysis Services of tables, relationships, in-memory storage and column store. BISM refers to both models plus Power Pivot, or rather it refers to the way that Analysis Services and Power Pivot can be used as a semantic layer on top of other data for reporting purposes.

So what's the point of a term like BISM if it doesn't refer to something concrete? Why not just call Analysis Services "Analysis Services" and Power Pivot "Power Pivot"? BISM is just a marketing concept like 'UDM', for three reasons:

• A single term like BISM suggests that Analysis Services 2012 and Power Pivot are a single, cohesive product, whereas the Tabular and Multidimensional models are actually very different monsters. If you are going to work with Analysis Services 2012 on a project, the first decision you will have to make is which type of model to use, and if you change your mind later you will have to start development from scratch and learn a lot of new skills. Perhaps one day the two models will merge, but it won't happen soon.

• Microsoft has identified that many people who want to do BI are put off by the complexity of building Multidimensional models. The simplicity of the Tabular model goes a long way toward solving this problem; Tabular also replaces Report Builder models, which were really just a semantic layer for people who didn't like SSAS or had valid reasons to stay with relational reporting.

• Calling something a "semantic layer" suggests that it's a nice, light, thin, easy-to-implement layer on top of your relational data warehouse, with no data duplication involved. In actual fact, anyone who has used the Multidimensional model will know you almost always use MOLAP storage, which involves all the data being copied into Analysis Services; and I suspect that when people start using the Tabular model they will be using it in Vertipaq mode (where, again, all the data gets copied into Analysis Services) rather than in DirectQuery mode (where all queries are translated to SQL which is then run against SQL Server).

This is my understanding of BISM, and it may differ from others' understanding, but I don't think using the term BISM is useful in any technical context, just as the term UDM wasn't in the past, because it is a blanket term for several different things.

Monday, January 9, 2012

Mixed datatypes in Excel Cell problem in SSIS

Case

My column in Excel contains mixed data types (strings and numbers). The string values are shown as NULL. How can I get all values?

Solution

The Excel driver reads the first 8 rows to determine the datatype. If the majority of the rows are numbers it will choose a numeric datatype instead of a string datatype, and vice versa.

So if a column in your Excel sheet starts with a number, the column gets DT_R8, and if it starts with a string it gets DT_WSTR.



TypeGuessRows

You can change the default of checking 8 rows (to anywhere from 0 to 16) by changing the Windows registry. Start the Windows Registry Editor and search for "TypeGuessRows". The place of that key depends on the version of Windows and Office. Examples:
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\3.0\Engines\Excel

Note: For performance reasons, setting the TypeGuessRows value to zero (0) is not recommended (especially when your Excel table is very large). With this zero value Excel will scan all records in your table to determine the type of data in each column.

To change the value of TypeGuessRows, follow these steps:

1. On the Start menu, click Run. In the Run dialog box, type Regedt32, and then click OK.

2. Open the following key in the Registry editor:

a. HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel
Note For 64-bit systems, the corresponding key is as follows:

b. HKLM\SOFTWARE\wow6432node\microsoft\jet\4.0\engines\excel

3. Double-click TypeGuessRows.

4. In the DWORD editor dialog box, click Decimal under Base. Type a value between 0 and 16, inclusive, for Value data.

5. Click OK, and then exit the Registry Editor.


Import Mode

If you want to get the data of all rows regardless of the datatypes in Excel, you should add the string ;IMEX=1 to the Excel connection string. Changing to Import mode (0 is Export mode, 1 is Import mode and 2 is Linked mode) will result in getting everything (including numbers) as a string. Go to the properties of the Excel connection manager and change the ConnectionString property:


Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\MyExcelFile1.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=D:\MyExcelFile.xlsx;Extended Properties="Excel 12.0 XML;HDR=YES;IMEX=1";