Dynamic T-SQL Pivot

In a prior blog post (Standard T-SQL PIVOT) I showed how we can use a PIVOT query to take distinct values in a column and make those values new columns. However,  data changes and new columns may be needed. Looking at the data from the prior post, we only had three types of candy bars. What happens when a new candy bar shows up in our transactions? Well, we could update our query to use the new column value. If another new candy bar shows up we could update the query again. This process could go on and on and on. I am going to propose an alternative: using a dynamic PIVOT query. In this blog post I am going to show how we can use dynamic SQL to implement a dynamic PIVOT query.

Let’s start out by looking at the original data:

CustomerID ProductName Quantity
1 Snickers 1
1 Butterfinger 1
1 Rolo 2
1 Snickers 1
1 Butterfinger 1
2 Rolo 2

After using a PIVOT query we will see the data as described in the table below:

CustomerIDButterfingerRoloSnickers

CustomerID Butterfinger Rolo Snickers
1 2 2 2
2 NULL 2 NULL

Now update the original data with a few new products:

CustomerID ProductName Quantity
1 Snickers 1
1 Butterfinger 1
1 Rolo 2
1 Snickers 1
1 Butterfinger 1
2 Rolo 2
1 MilkyWay 3
2 Twix 2

If we apply that same query from the prior blog post, listed below, we see the same results as before.

SELECT   [CustomerID], 
         [Butterfinger], [Milkyway], [Rolo], [Snicker]
FROM    #CustomerProducts P 
PIVOT ( SUM(Quantity) FOR P.ProductName IN (
                          [Butterfinger], [Milkyway], [Rolo], [Snicker])
) AS PVT
CustomerID Butterfinger Rolo Snickers
1 2 2 2
2 NULL 2 NULL

But what we really wanted was

CustomerID Butterfinger Milkyway Rolo Snickers Twix
1 2 3 2 2 NULL
2 NULL NULL 2 NULL 2

As I mentioned before, we could simply update our query and this blog post is over. However, we are going to push on and create our dynamic version. Let’s take a look at creating the dynamic portion of the PIVOT query. The key part of the dynamic query is the distinct column list. Getting the distinct columns is just a simple DISTINCT query. To get the comma separated column list we will employ a trick that involves a COALESCE or ISNULL statement.

SELECT @PivotList = COALESCE(@PivotList + ',', '')

Once we have the distinct comma separated list, we can start to build out the dynamic query. We will create an nvarchar variable that we will use to store our dynamic SQL statement, and then we will execute that statement with the sp_executesql function.

Looking at our PIVOT query again, what we will want to do is replace the distinct columns with our comma separated list of values. Then we will take and store the new query into our variable

SELECT   [CustomerID], 
         [Butterfinger], [Milkyway], [Rolo], [Snicker]
FROM    #CustomerProducts P 
PIVOT ( SUM(Quantity) FOR P.ProductName IN (
                          [Butterfinger], [Milkyway], [Rolo], [Snicker])
) AS PVT

Converts to this

SET @Output = 'SELECT  [CustomerID]
	, ' + @PivotList + '
FROM    #CustomerProducts P 
PIVOT ( SUM(Quantity) FOR P.ProductName IN (' + @PivotList + ') ) AS PVT'

Now all that is left is to execute the stored query using the sp_executesql function. Here is the entire Dynamic SQL Pivot Script:

CREATE TABLE #CustomerProducts (
   CustomerID int
 , ProductName nvarchar(50)
 , Quantity int)
 
INSERT   INTO #CustomerProducts
         (CustomerID, ProductName, Quantity)
VALUES   (1, N'Snicker', 1),
         (1, N'Butterfinger', 1),
         (1, N'Rolo', 2),
         (1, N'Milkyway', 1),
         (1, N'Snicker', 1),
         (1, N'Butterfinger', 1),
         (2, N'Rolo', 2),
         (1, N'Milkyway', 3),
         (2, N'Twix', 2);
 
DECLARE @Output nvarchar(max) = N''
 , @PivotList varchar(max)
 
SELECT   @PivotList = COALESCE(@PivotList + ', ', N'') + N'[' + ProductName + N']'
 
FROM     (
          SELECT DISTINCT
                  ProductName
          FROM    #CustomerProducts
         ) AS CustProds;
 
SET @Output = 'SELECT  [CustomerID]
	, ' + @PivotList + '
FROM    #CustomerProducts P 
PIVOT ( SUM(Quantity) FOR P.ProductName IN (' + @PivotList + ') ) AS PVT'
 
EXEC sp_executesql @Output;
 
IF OBJECT_ID('tempdb..#CustomerProducts') IS NOT NULL	
	DROP TABLE #CustomerProducts

Standard T-SQL PIVOT

Pivoting data is a common practice while analyzing data. The basic idea is to take a single column’s data and pivot the unique values into separate columns. In this post I am going to show a simple example of how we can implement a PIVOT statement in a query. In a future post I will discuss how we can create a dynamic PIVOT based on the functionality below. SQL Server 2005 introduced functionality to help with pivoting data. You can PIVOT data using the following syntax

SELECT  [AnyColumn], [PivotCol1],[PivotCol2],...,[PivotColN]
FROM    (
	   SELECT  [AnyColumn]
                 , [DataColumn]
		 , [AggregateColumn]
	   FROM    SomeTable	   
	  ) P PIVOT ( SUM(AggregateColumn) FOR DataColumn IN ([PivotCol1],[PivotCol2],...,[PivotColN]) ) AS PVT

So, let’s take a look at a bit of data. The temp table #CustomerProducts below stores a snippet of transaction data. This data is describing the number of candy bars ordered per transaction by a given customer, over time.

CREATE TABLE #CustomerProducts (
   CustomerID int
 , ProductName nvarchar(50)
 , Quantity int)
 
INSERT   INTO #CustomerProducts
         (CustomerID, ProductName, Quantity)
VALUES   (1, N'Snickers', 1),
         (1, N'Butterfinger', 1),
         (1, N'Rolo', 2),
         (1, N'Snickers', 1),
         (1, N'Butterfinger', 1),
         (2, N'Rolo', 2);
CustomerID ProductName Quantity
1 Snickers 1
1 Butterfinger 1
1 Rolo 2
1 Snickers 1
1 Butterfinger 1
2 Rolo 2

However, we would like to see how many candy bars each customer has ordered total, as described in the table below:

CustomerID Butterfinger Rolo Snickers
1 2 2 2
2 NULL 2 NULL

So how do we accomplish this task? Simple, we will implement a PIVOT query…

Before we can develop the PIVOT query we need to determine the PIVOT columns. These columns will come from the data located in the product name column. According the the table above we want to use the following columns: Butterfinger, Rolo, and Snickers. Next we will want to determine the column that we are going to aggregate. In our example, we will use the Quantity column and we will summarize the results. With this information we can plug it into the a PIVOT query and get the query listed below.

CREATE TABLE #CustomerProducts (
   CustomerID int
 , ProductName nvarchar(50)
 , Quantity int)
 
INSERT   INTO #CustomerProducts
         (CustomerID, ProductName, Quantity)
VALUES   (1, N'Snickers', 1),
         (1, N'Butterfinger', 1),
         (1, N'Rolo', 2),         
         (1, N'Snickers', 1),
         (1, N'Butterfinger', 1),
         (2, N'Rolo', 2);
 
 SELECT  [CustomerID]
	, [Butterfinger],[Rolo],[Snickers]
FROM    (
	   SELECT  [CustomerID]
			 , [ProductName]
			 , [Quantity]
	   FROM    #CustomerProducts	   
	  ) P PIVOT ( SUM(Quantity) FOR P.ProductName IN ([Butterfinger],[Rolo],[Snickers]) ) AS PVT
 
IF OBJECT_ID('tempdb..#CustomerProducts') IS NOT NULL	
	DROP TABLE #CustomerProducts

So, whats next? Well, what if another candy bar type is sold? What if 10 different types are sold? We can continue to update the PIVOT query above, or we can use a Dynamic PIVOT Query. In another post I will describe how to create a Dynamic PIVOT Query.

Deleting Records from Large tables

Database maintenance is a necessary evil for a database administrator and deleting records is one of these necessary evils that we need to deal with. Deleting records can be tedious task especially in Very Large Databases (VLDB). This post will demonstrate different ways to handle deleting records from a database table.

Basic Delete

The easiest way to delete records from a database table is to use a DELETE FROM statement

DELETE FROM VLDB_Table WHERE ColumnName = 'Value';

As the number of records grow, the amount of time it takes to delete records from our table increases. While the records are being deleted, the table is locked and other processes will be blocked until the completion of the delete statement.

Truncate Table

The truncate statement is a much faster delete process, which removes all records from a table.

TRUNCATE TABLE VLDB_Table;

A truncate statement is not always an option on a table, if the table has a foreign key relationship with another table, the table is used in an indexed view or used in replication, the truncate option is not an option.

Batch Delete

With the limitation of the truncate statement and potential for blocking using the basic delete, that leaves us looking for an optimal solution for cleaning up records. Using a basic delete in conjunction with the TOP option and a loop, we can delete a records in a batch.

DECLARE @BatchSize int  = 1000;
 
WHILE EXISTS (SELECT 1 FROM VLDB_Table)
BEGIN
   DELETE TOP (@BatchSize) FROM VLDB_Table WHERE ColumnName = 'Value';
END

Another batch delete scrip that I have come across uses the ROWCOUNT as the factor for the loop.

DECLARE @BatchSize int  = 1000;
-- Delete the initial batch outside the loop, this is needed to set @@ROWCONT
DELETE TOP (@BatchSize) FROM VLDB_Table WHERE ColumnName = 'Value';
 
WHILE (@@ROWCOUNT > 0)
BEGIN
   DELETE TOP (@BatchSize) FROM VLDB_Table WHERE ColumnName = 'Value';
END

Fun with Checksum…

Typically when we think of the TSQL function CHECKSUM, we are thinking about a hash index. A hash index can be used to improve indexing of columns with long character data types. Rather than trying to add an index to a NVARCHAR(2000) column, you can hash the NVARCHAR(2000) column and index on that hashed value, using the CHECKSUM function. The CHECKSUM function can take multiple columns as parameters, and it builds a hash value for the multiple columns. This gives us even greater flexibility in our hash indexing.

I am going to demonstrate another usage for the TSQL function CHECKSUM, and I will also bring the aggregate version called CHECKSUM_AGG along for the ride.

Alternate use:

What if we used the CHECKSUM_AGG to aggregate the CHECKSUM values over multiple columns. This would give us the ability to handle complex search requirements for sets of data within a given table. Consider the following dataset:

GCID	HoleNum	Par	Sand	Water	Yard
1	1	4	1	0	380
1	2	5	1	1	500
1	3	3	0	1	180
1	4	5	0	0	420
2	1	4	1	0	370
2	2	3	1	1	510
2	3	5	0	1	190
2	4	4	0	0	400
3	1	4	1	1	380
3	2	3	1	0	500
3	3	5	0	0	180
3	4	5	0	0	420

Let’s say we want to know which courses have the following conditions:

  • Hole 1 is Par 4 with sand and water
  • Hole 2 is Par 3 with sand and no water
  • Hole 3 is Par 5 with no sand and no water

We can use both the CHECKSUM AND CHECKSUM_AGG to calculate a hash value from a table variable, and compare the hash value to calculated hash value from our data set.

--Setup Golf Course Holes Table
DECLARE @CGHoles TABLE(CGID int, HoleNum tinyint, Par tinyint, Details nvarchar(20), Yards smallint)
 
--Populate with some test data
INSERT INTO @CGHoles (CGID, HoleNum, Par, Details, Yards)
VALUES (1,1,4,'Sand',380),(1,2,5,'Water',500),(1,3,3,'Sand',180),(1,4,5,'Both',420)
,(2,1,4,'Sand',370),(2,2,3,'Both',510),(2,3,5,'Water',190),(2,4,4,'None',400)
,(3,1,4,'Both',380),(3,2,3,'Sand',500),(3,3,5,'None',180),(3,4,5,'Water',420)
 
--Create Search Criteria Table 
DECLARE @Search TABLE(HoleNum tinyint, Par tinyint, Details nvarchar(20))
 
--Populate Search Criteria
INSERT INTO @Search
         (HoleNum, Par, Details)
VALUES   (1,4,'Both'),(2,3,'Sand'),(3,5,'None')
 
-- Using a Common Table Expression, calculate the aggragated checksum and 
-- compare against the tables aggregated checksum
;WITH  CTE AS (
    SELECT   CKSM = CHECKSUM_AGG(CHECKSUM(HoleNum, Par, Details))
    FROM     @Search S
 )
    SELECT   CGID
    FROM     @CGHoles CH
    CROSS JOIN CTE
    WHERE    HoleNum BETWEEN 1 AND 3
    GROUP BY CGID, CKSM
    HAVING   CKSM = CHECKSUM_AGG(CHECKSUM(HoleNum, Par, Details))

Problem with CHECKSUM:

Notice that in the example code I switched to using words to represent sand, water, both, or none instead of using bit values. I did this because of a range of values limitation in the CHECKSUM function. The CHECKSUM function has a limited range of values of 231 possible values. With this limitation we cannot guarantee values will not be re-used, which is what happened while using the bit values. The issue was both GCID 1 and 3 both were a match where only GCID 3 is the only true match. In a future post I will an alternative way to solve this same problem

With the limitation in range of values, it is advised to use the CHECKSUM functions with caution. However, if your applications can allow for a margin of error in comparisons, or use of an additional check to verify the comparisons, then using the CHECKSUM functions can provide fantastic functionality.

Bamboozled by Fizz Buzz

Introduction:

In my very first Technical Blog I am going to show how one can accomplish the Fizz Buzz Challenge a few different ways using T-SQL. Fizz Buzz is a popular interview question that will allow an interviewee to demonstrate basic programming knowledge. This is normally done in a programming language such as C# or Java. But today we are going to implement Fizz Buzz using T-SQL…

Background:

A friend, Rick Krueger ( @DataOgre | premierapps.com ), and I have recently decided to give back to the SQL Community by starting to Tweet, Blog, and Present at Local PASS Meetings as well as SQL Saturday events. We had come up with a potential presentation where we needed a Developer and a DBA, the question was who is the DBA and who is the Developer. So a week or so had passed and I found myself with a locked account and a significant amount of wasted time with the help desk.

Rick noticed my frustration and asked the following:

  • (rick) Why on earth did they need to log onto your box to reset something on the server?
  • (me) Not sure, I guess he didnt believe me when I said my account was locked and he had to try for himself?
  • (rick) Did you try FizzBin?
  • (rick) http://www.hanselman.com/blog/FizzBinTheTechnicalSupportSecretHandshake.aspx
  • (me) Nope, but I wish I would have…
  • (rick) Ever hear of FizzBuzz a good interview question…
  • (rick) How would you do on that one?

Now any good developer will know who Scott Hanselman is and has followed his work with .Net and MVC… And being that I came from the development world I know who Hanselman is. So I quickly created a C#.net example and sent it off to Rick.

Fizz Buzz:

  • Output the numbers from 1 to 100.
  • When a number is a multiple of three output “Fizz”.
  • When a number is a multiple of five output “Buzz”.
  • When a number is a multiple of both three and five output “FizzBuzz”
  • When a number is not a multiple of either three or five output the number

C# Code of FizzBuzz:

for(int i = 1; i <= 100; i++){
   if(i % 3 == 0) 
      Console.Write("Fizz "); 
   if(i % 5 == 0) 
      Console.Write("Buzz "); 
   if((i % 3 > 0) && (i % 5 > 0)) 
      Console.Write(i); 
   Console.WriteLine(""); 
}

Bamboozled:

So I asked Rick how he would have implemented the same code to which he simply replied ‘T-SQL’. Without missing a beat, he proceeded to tell me that we now know who the developer and who the DBA is for our presentations…. I was bamboozled into writing C#.net code…

T-SQL:

Below are some T-SQL examples of the Fizz Buzz code. Obviously there are many more ways to write it, but here are a couple I wrote to redeem myself in the DB community…

The example below implements a while loop that populates a table, using the Fizz Buzz logic.

DECLARE @FizzBuzzTable TABLE(Number int, FizzBuzz varchar(10));
DECLARE @i int = 1;
WHILE @i <= 100
BEGIN
 INSERT INTO @FizzBuzzTable
   (Number, FizzBuzz)
 SELECT @i, FizzBuzz = CASE WHEN @i % 3 = 0 AND @i % 5 = 0 THEN 'Fizz Buzz'
   WHEN @i % 3 = 0 THEN 'Fizz'
   WHEN @i % 5 = 0 THEN 'Buzz'
   ELSE CAST(@i AS varchar(3)) END
 SET @i += 1
END
SELECT Number, FizzBuzz FROM @FizzBuzzTable FBT

Using a while loop to insert records may not be the best approach, so the example below implements a recursive CTE to gain 100 values, then applies the Fizz Buzz Logic query.

 ;WITH FizzBuzzTable( Number ) AS (
 SELECT 1 UNION ALL
 SELECT 1 + Number FROM FizzBuzzTable WHERE Number < 100
 )
 SELECT FizzBuzz= CASE WHEN Number % 3 = 0 AND Number % 5 = 0 THEN 'Fizz Buzz'
   WHEN Number % 3 = 0 THEN 'Fizz'
   WHEN Number % 5 = 0 THEN 'Buzz'
   ELSE CAST(Number AS varchar(3)) END
 FROM FizzBuzzTable

Using a recursive CTE can have performance issues, so the example below uses SQL written by Ben Itzak to gain 256 values, then applies the Fizz Buzz Logic query.

;WITH L1 AS (SELECT V1 = 1 UNION ALL SELECT V1 = 1) -- 2
 , L2 AS (SELECT V1 = 1 FROM L1 A CROSS APPLY L1 B)  -- 4
 , L3 AS (SELECT V1 = 1 FROM L2 A CROSS APPLY L2 B)  -- 16
 , L4 AS (SELECT V1 = 1 FROM L3 A CROSS APPLY L3 B)  -- 256
 , FizzBuzzTable AS (SELECT TOP 100 Number = ROW_NUMBER() OVER (ORDER BY V1) FROM L4)
 SELECT FizzBuzz = CASE WHEN Number % 3 = 0 AND Number % 5 = 0 THEN 'Fizz Buz'
   WHEN Number % 3 = 0 THEN 'Fizz'
   WHEN Number % 5 = 0 THEN 'Buzz'
   ELSE CAST(Number AS varchar(3)) END
 FROM FizzBuzzTable