Standard T-SQL UNPIVOT

In prior blog posts I talked about a Standard PIVOT query and a Dynamic PIVOT query. It is only fitting that I talk about how to implement an UNPIVOT query. As DBA’s we commonly unpivot data and turn it into normalized data for storing in a relational data store. Understanding how to use the UNPIVOT query is a great skill to have as a DBA. Just like the PIVOT operator, the UNPIVOT operator was introduced with SQL Server 2005. In this blog post I will show how we can use the UNPIVOT operator.

I got the idea for this blog post as a result from my online grade book. I noticed that the data is displayed as a denormalized set of data. As an instructor I thought it would be interesting to see average, sum, minimum and maximum assignment scores for each student. Below is an example of the display of the online grade book.

StudentID Assignment1 Assignment2 Assignment3 Assignment4
101 90 100 95 100
102 70 65 40 80
103 100 100 100 100

I found that writing a simple query to get the sum (adding each column) and average (adding each column and dividing by the number of columns) was not too bad. However, finding the minimum and maximum proved to be a bit tricky. Sure we could implement a CASE WHEN statement, however for each column the complexity and possibility of error increases significantly, and maintainability dang near goes out the window. In comes UNPIVOT to save the day. After unpivoting the data, we simply use the AVG(), SUM(), MIN() and MAX() functions.

Let’s take a look at the basic structure of the UNPIVOT query. Using the UNPIVOT statement, you specify a value and data column, then select each of the columns you want to unpivot into these new columns. The new value and data columns become select-able columns in the query, as shown in the following code.

SELECT   [AnyColumn]
       , [DataColumn]
       , [ValueColumn]
FROM     SomeTable ST 
UNPIVOT ( ValueColumn FOR DataColumn IN (UnPivotCol1, UnPivotCol2, UnPivotCol2) ) AS UNPVT

Applying this UNPIVOT syntax to our data above we yield the following query.

SELECT   StudentID
       , Assignment
       , ASMT.Score
FROM     #ClassGrades CG 
UNPIVOT ( Score FOR Assignment IN (Assignment1, Assignment2, Assignment3, Assignment4) ) AS ASMT

This query returns the following normalized view of the data

StudentID Assignment Score
101 Assignment1 90
101 Assignment2 100
101 Assignment3 95
101 Assignment4 100
102 Assignment1 70
102 Assignment2 65
102 Assignment3 40
102 Assignment4 80
103 Assignment1 100
103 Assignment2 100
103 Assignment3 100
103 Assignment4 100

Now that we have this data in a normalized result, applying SQL functions such as AVG() and SUM(), is relatively simple. Below is an updated version of the query using the AVG(), SUM(), MIN(), and MAX() functions grouping by StudentID.

SELECT   StudentID
       , Average = AVG(ASMT.Score)
	   , MaxScore = MAX(ASMT.Score)
	   , MinScore = MIN(ASMT.Score)
	   , Total = SUM(ASMT.Score)
FROM     #ClassGrades CG 
UNPIVOT ( Score FOR Assignment IN (Assignment1, Assignment2, Assignment3, Assignment4) ) AS ASMT
GROUP BY StudentID

This query returns the following view of the data

StudentID Average MaxScore MinScore Total
101 96 100 90 385
102 63 80 40 255
103 100 100 100 400

As you can see the PIVOT and UNPIVOT statements are not overly complex and are definitely great TSQL tools for any DBA to keep in their back-pocket. Below is the entire script with test data included.

IF OBJECT_ID('tempdb..#ClassGrades') IS NOT NULL 
   DROP TABLE #ClassGrades;
 
CREATE TABLE #ClassGrades (
   StudentID int PRIMARY KEY
                 NOT NULL
 , Assignment1 int NULL
 , Assignment2 int NULL
 , Assignment3 int NULL
 , Assignment4 int NULL);
 
INSERT   INTO #ClassGrades
         (StudentID, Assignment1, Assignment2, Assignment3, Assignment4)
VALUES   (101, 90, 100, 95, 100),
         (102, 70, 65, 40, 80),
         (103, 100, 100, 100, 100); 
 
SELECT   StudentID
       , Assignment
       , ASMT.Score
FROM     #ClassGrades CG 
UNPIVOT ( Score FOR Assignment IN (Assignment1, Assignment2, Assignment3, Assignment4) ) AS ASMT
 
SELECT   StudentID
       , Average = AVG(ASMT.Score)
	   , MaxScore = MAX(ASMT.Score)
	   , MinScore = MIN(ASMT.Score)
	   , Total = SUM(ASMT.Score)
FROM     #ClassGrades CG 
UNPIVOT ( Score FOR Assignment IN (Assignment1, Assignment2, Assignment3, Assignment4) ) AS ASMT
GROUP BY StudentID;

Dynamic T-SQL Pivot

In a prior blog post (Standard T-SQL PIVOT) I showed how we can use a PIVOT query to take distinct values in a column and make those values new columns. However,  data changes and new columns may be needed. Looking at the data from the prior post, we only had three types of candy bars. What happens when a new candy bar shows up in our transactions? Well, we could update our query to use the new column value. If another new candy bar shows up we could update the query again. This process could go on and on and on. I am going to propose an alternative: using a dynamic PIVOT query. In this blog post I am going to show how we can use dynamic SQL to implement a dynamic PIVOT query.

Let’s start out by looking at the original data:

CustomerID ProductName Quantity
1 Snickers 1
1 Butterfinger 1
1 Rolo 2
1 Snickers 1
1 Butterfinger 1
2 Rolo 2

After using a PIVOT query we will see the data as described in the table below:

CustomerIDButterfingerRoloSnickers

CustomerID Butterfinger Rolo Snickers
1 2 2 2
2 NULL 2 NULL

Now update the original data with a few new products:

CustomerID ProductName Quantity
1 Snickers 1
1 Butterfinger 1
1 Rolo 2
1 Snickers 1
1 Butterfinger 1
2 Rolo 2
1 MilkyWay 3
2 Twix 2

If we apply that same query from the prior blog post, listed below, we see the same results as before.

SELECT   [CustomerID], 
         [Butterfinger], [Milkyway], [Rolo], [Snicker]
FROM    #CustomerProducts P 
PIVOT ( SUM(Quantity) FOR P.ProductName IN (
                          [Butterfinger], [Milkyway], [Rolo], [Snicker])
) AS PVT
CustomerID Butterfinger Rolo Snickers
1 2 2 2
2 NULL 2 NULL

But what we really wanted was

CustomerID Butterfinger Milkyway Rolo Snickers Twix
1 2 3 2 2 NULL
2 NULL NULL 2 NULL 2

As I mentioned before, we could simply update our query and this blog post is over. However, we are going to push on and create our dynamic version. Let’s take a look at creating the dynamic portion of the PIVOT query. The key part of the dynamic query is the distinct column list. Getting the distinct columns is just a simple DISTINCT query. To get the comma separated column list we will employ a trick that involves a COALESCE or ISNULL statement.

SELECT @PivotList = COALESCE(@PivotList + ',', '')

Once we have the distinct comma separated list, we can start to build out the dynamic query. We will create an nvarchar variable that we will use to store our dynamic SQL statement, and then we will execute that statement with the sp_executesql function.

Looking at our PIVOT query again, what we will want to do is replace the distinct columns with our comma separated list of values. Then we will take and store the new query into our variable

SELECT   [CustomerID], 
         [Butterfinger], [Milkyway], [Rolo], [Snicker]
FROM    #CustomerProducts P 
PIVOT ( SUM(Quantity) FOR P.ProductName IN (
                          [Butterfinger], [Milkyway], [Rolo], [Snicker])
) AS PVT

Converts to this

SET @Output = 'SELECT  [CustomerID]
	, ' + @PivotList + '
FROM    #CustomerProducts P 
PIVOT ( SUM(Quantity) FOR P.ProductName IN (' + @PivotList + ') ) AS PVT'

Now all that is left is to execute the stored query using the sp_executesql function. Here is the entire Dynamic SQL Pivot Script:

CREATE TABLE #CustomerProducts (
   CustomerID int
 , ProductName nvarchar(50)
 , Quantity int)
 
INSERT   INTO #CustomerProducts
         (CustomerID, ProductName, Quantity)
VALUES   (1, N'Snicker', 1),
         (1, N'Butterfinger', 1),
         (1, N'Rolo', 2),
         (1, N'Milkyway', 1),
         (1, N'Snicker', 1),
         (1, N'Butterfinger', 1),
         (2, N'Rolo', 2),
         (1, N'Milkyway', 3),
         (2, N'Twix', 2);
 
DECLARE @Output nvarchar(max) = N''
 , @PivotList varchar(max)
 
SELECT   @PivotList = COALESCE(@PivotList + ', ', N'') + N'[' + ProductName + N']'
 
FROM     (
          SELECT DISTINCT
                  ProductName
          FROM    #CustomerProducts
         ) AS CustProds;
 
SET @Output = 'SELECT  [CustomerID]
	, ' + @PivotList + '
FROM    #CustomerProducts P 
PIVOT ( SUM(Quantity) FOR P.ProductName IN (' + @PivotList + ') ) AS PVT'
 
EXEC sp_executesql @Output;
 
IF OBJECT_ID('tempdb..#CustomerProducts') IS NOT NULL	
	DROP TABLE #CustomerProducts

Standard T-SQL PIVOT

Pivoting data is a common practice while analyzing data. The basic idea is to take a single column’s data and pivot the unique values into separate columns. In this post I am going to show a simple example of how we can implement a PIVOT statement in a query. In a future post I will discuss how we can create a dynamic PIVOT based on the functionality below. SQL Server 2005 introduced functionality to help with pivoting data. You can PIVOT data using the following syntax

SELECT  [AnyColumn], [PivotCol1],[PivotCol2],...,[PivotColN]
FROM    (
	   SELECT  [AnyColumn]
                 , [DataColumn]
		 , [AggregateColumn]
	   FROM    SomeTable	   
	  ) P PIVOT ( SUM(AggregateColumn) FOR DataColumn IN ([PivotCol1],[PivotCol2],...,[PivotColN]) ) AS PVT

So, let’s take a look at a bit of data. The temp table #CustomerProducts below stores a snippet of transaction data. This data is describing the number of candy bars ordered per transaction by a given customer, over time.

CREATE TABLE #CustomerProducts (
   CustomerID int
 , ProductName nvarchar(50)
 , Quantity int)
 
INSERT   INTO #CustomerProducts
         (CustomerID, ProductName, Quantity)
VALUES   (1, N'Snickers', 1),
         (1, N'Butterfinger', 1),
         (1, N'Rolo', 2),
         (1, N'Snickers', 1),
         (1, N'Butterfinger', 1),
         (2, N'Rolo', 2);
CustomerID ProductName Quantity
1 Snickers 1
1 Butterfinger 1
1 Rolo 2
1 Snickers 1
1 Butterfinger 1
2 Rolo 2

However, we would like to see how many candy bars each customer has ordered total, as described in the table below:

CustomerID Butterfinger Rolo Snickers
1 2 2 2
2 NULL 2 NULL

So how do we accomplish this task? Simple, we will implement a PIVOT query…

Before we can develop the PIVOT query we need to determine the PIVOT columns. These columns will come from the data located in the product name column. According the the table above we want to use the following columns: Butterfinger, Rolo, and Snickers. Next we will want to determine the column that we are going to aggregate. In our example, we will use the Quantity column and we will summarize the results. With this information we can plug it into the a PIVOT query and get the query listed below.

CREATE TABLE #CustomerProducts (
   CustomerID int
 , ProductName nvarchar(50)
 , Quantity int)
 
INSERT   INTO #CustomerProducts
         (CustomerID, ProductName, Quantity)
VALUES   (1, N'Snickers', 1),
         (1, N'Butterfinger', 1),
         (1, N'Rolo', 2),         
         (1, N'Snickers', 1),
         (1, N'Butterfinger', 1),
         (2, N'Rolo', 2);
 
 SELECT  [CustomerID]
	, [Butterfinger],[Rolo],[Snickers]
FROM    (
	   SELECT  [CustomerID]
			 , [ProductName]
			 , [Quantity]
	   FROM    #CustomerProducts	   
	  ) P PIVOT ( SUM(Quantity) FOR P.ProductName IN ([Butterfinger],[Rolo],[Snickers]) ) AS PVT
 
IF OBJECT_ID('tempdb..#CustomerProducts') IS NOT NULL	
	DROP TABLE #CustomerProducts

So, whats next? Well, what if another candy bar type is sold? What if 10 different types are sold? We can continue to update the PIVOT query above, or we can use a Dynamic PIVOT Query. In another post I will describe how to create a Dynamic PIVOT Query.