WebsiteSpark

Monday, 9 January 2012

TSQL Merge

SQL Merge statement is specifically targeted towards ETL operation for a data warehouse. This facilitates insert update and delete in a single statement and with conditions depending upon match / unmatched between source and target.
Typically a MERGE statement works on two tables, namely source and target.
In similar to Join statement, there has to be a join criteria for MERGE.
It does the following
1.   Update when there is a match between source & target.
2.   Insert on target, when there is no matching record on target and
3.   Optionally delete when there are extra records on target.
Let’s create a scenario for demonstrating MERGE.
I’ve a database with customer table filled with some hypothetical data assuming this to be a data warehouse Customers dimension.
I create another table tmpCustomers as a copy, and then I perform some updates and deletes on the table to make them out of sync and finally execute MERGE to bring them back in sync.        
Here is the code


/*
-- script for customer table

CREATE TABLE [dbo].[Customers](
      [CustomerID] [int] IDENTITY(1,1) NOT NULL,
      [FirstName] [nvarchar](40) NOT NULL,
      [MiddleInitial] [nvarchar](40) NULL,
      [LastName] [nvarchar](40) NOT NULL,
 CONSTRAINT [CustomerPK] PRIMARY KEY CLUSTERED
(
      [CustomerID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

*/

-- we don't want unnecessary count getting displayed
SET NOCOUNT ON;
GO

-- copy data into tmpCustomers
SELECT * INTO
      tmpCustomers
FROM
      Customers;

-- 19759 rows in all


-- lets update customer First name & Last name for all the customers
UPDATE
      tmpCustomers
SET
      FirstName = 'Some',
      LastName = 'Somewhere'

-- lets delete some rows from tmpCustomers

DECLARE @i INT;
SET @i = 1;

WHILE (@i < 19760)
BEGIN
       IF( @i % 3 = 0 or @i % 5 = 0)
            DELETE FROM tmpCustomers
            WHERE CustomerId = @i;
           
      SELECT @i = @i + 1;
END
GO

SELECT COUNT(1) FROM tmpCustomers;
GO
--10539 rows now


-- lets insert some data into tmpCustomer
DECLARE @i INT;
SET @i = 19760

-- need to set IDENTITY_INSERT  as we are inserting into PK, which is identity col.
SET IDENTITY_INSERT tmpCustomers ON;

WHILE(@i < 20001)
BEGIN
     
      INSERT INTO tmpCustomers
            (CustomerID, FirstName,MiddleInitial,LastName)
      VALUES
            (@i, N'New', NULL,N'Insert');

SELECT @i = @i + 1;
END;
Go

Set IDENTITY_INSERT tmpCustomers off;

SELECT COUNT(1) FROM tmpCustomers;
--10780 rows in all

-- using merge to update the titles back

SET IDENTITY_INSERT tmpCustomers ON;
-- seting identity insert because the insert part of MERGE will insert identity values.

MERGE dbo.tmpCustomers AS Target
USING (SELECT * FROM dbo.Customers) AS Source
ON (TARGET.CustomerID = SOURCE.CustomerID)
WHEN MATCHED THEN
      UPDATE SET
            Target.FirstName = Source.FirstName,
            Target.MiddleInitial = Source.MiddleInitial,
            Target.LastName = Source.LastName
WHEN NOT MATCHED BY Target THEN
      INSERT
            (CustomerID, FirstName,MiddleInitial,LastName)
      VALUES
            (source.CustomerID, source.FirstName,source.MiddleInitial,source.LastName)
WHEN NOT MATCHED BY Source THEN
      DELETE 
OUTPUT $ACTION, INSERTED.*, DELETED.*;

SET IDENTITY_INSERT tmpCustomers OFF;

SELECT COUNT(1) FROM Customers;
SELECT COUNT(1) FROM tmpCustomers;




No comments:

Post a Comment