Fun with Checksum…

Typically when we think of the TSQL function CHECKSUM, we are thinking about a hash index. A hash index can be used to improve indexing of columns with long character data types. Rather than trying to add an index to a NVARCHAR(2000) column, you can hash the NVARCHAR(2000) column and index on that hashed value, using the CHECKSUM function. The CHECKSUM function can take multiple columns as parameters, and it builds a hash value for the multiple columns. This gives us even greater flexibility in our hash indexing.

I am going to demonstrate another usage for the TSQL function CHECKSUM, and I will also bring the aggregate version called CHECKSUM_AGG along for the ride.

Alternate use:

What if we used the CHECKSUM_AGG to aggregate the CHECKSUM values over multiple columns. This would give us the ability to handle complex search requirements for sets of data within a given table. Consider the following dataset:

GCID	HoleNum	Par	Sand	Water	Yard
1	1	4	1	0	380
1	2	5	1	1	500
1	3	3	0	1	180
1	4	5	0	0	420
2	1	4	1	0	370
2	2	3	1	1	510
2	3	5	0	1	190
2	4	4	0	0	400
3	1	4	1	1	380
3	2	3	1	0	500
3	3	5	0	0	180
3	4	5	0	0	420

Let’s say we want to know which courses have the following conditions:

  • Hole 1 is Par 4 with sand and water
  • Hole 2 is Par 3 with sand and no water
  • Hole 3 is Par 5 with no sand and no water

We can use both the CHECKSUM AND CHECKSUM_AGG to calculate a hash value from a table variable, and compare the hash value to calculated hash value from our data set.

--Setup Golf Course Holes Table
DECLARE @CGHoles TABLE(CGID int, HoleNum tinyint, Par tinyint, Details nvarchar(20), Yards smallint)
 
--Populate with some test data
INSERT INTO @CGHoles (CGID, HoleNum, Par, Details, Yards)
VALUES (1,1,4,'Sand',380),(1,2,5,'Water',500),(1,3,3,'Sand',180),(1,4,5,'Both',420)
,(2,1,4,'Sand',370),(2,2,3,'Both',510),(2,3,5,'Water',190),(2,4,4,'None',400)
,(3,1,4,'Both',380),(3,2,3,'Sand',500),(3,3,5,'None',180),(3,4,5,'Water',420)
 
--Create Search Criteria Table 
DECLARE @Search TABLE(HoleNum tinyint, Par tinyint, Details nvarchar(20))
 
--Populate Search Criteria
INSERT INTO @Search
         (HoleNum, Par, Details)
VALUES   (1,4,'Both'),(2,3,'Sand'),(3,5,'None')
 
-- Using a Common Table Expression, calculate the aggragated checksum and 
-- compare against the tables aggregated checksum
;WITH  CTE AS (
    SELECT   CKSM = CHECKSUM_AGG(CHECKSUM(HoleNum, Par, Details))
    FROM     @Search S
 )
    SELECT   CGID
    FROM     @CGHoles CH
    CROSS JOIN CTE
    WHERE    HoleNum BETWEEN 1 AND 3
    GROUP BY CGID, CKSM
    HAVING   CKSM = CHECKSUM_AGG(CHECKSUM(HoleNum, Par, Details))

Problem with CHECKSUM:

Notice that in the example code I switched to using words to represent sand, water, both, or none instead of using bit values. I did this because of a range of values limitation in the CHECKSUM function. The CHECKSUM function has a limited range of values of 231 possible values. With this limitation we cannot guarantee values will not be re-used, which is what happened while using the bit values. The issue was both GCID 1 and 3 both were a match where only GCID 3 is the only true match. In a future post I will an alternative way to solve this same problem

With the limitation in range of values, it is advised to use the CHECKSUM functions with caution. However, if your applications can allow for a margin of error in comparisons, or use of an additional check to verify the comparisons, then using the CHECKSUM functions can provide fantastic functionality.

Bamboozled by Fizz Buzz

Introduction:

In my very first Technical Blog I am going to show how one can accomplish the Fizz Buzz Challenge a few different ways using T-SQL. Fizz Buzz is a popular interview question that will allow an interviewee to demonstrate basic programming knowledge. This is normally done in a programming language such as C# or Java. But today we are going to implement Fizz Buzz using T-SQL…

Background:

A friend, Rick Krueger ( @DataOgre | premierapps.com ), and I have recently decided to give back to the SQL Community by starting to Tweet, Blog, and Present at Local PASS Meetings as well as SQL Saturday events. We had come up with a potential presentation where we needed a Developer and a DBA, the question was who is the DBA and who is the Developer. So a week or so had passed and I found myself with a locked account and a significant amount of wasted time with the help desk.

Rick noticed my frustration and asked the following:

  • (rick) Why on earth did they need to log onto your box to reset something on the server?
  • (me) Not sure, I guess he didnt believe me when I said my account was locked and he had to try for himself?
  • (rick) Did you try FizzBin?
  • (rick) http://www.hanselman.com/blog/FizzBinTheTechnicalSupportSecretHandshake.aspx
  • (me) Nope, but I wish I would have…
  • (rick) Ever hear of FizzBuzz a good interview question…
  • (rick) How would you do on that one?

Now any good developer will know who Scott Hanselman is and has followed his work with .Net and MVC… And being that I came from the development world I know who Hanselman is. So I quickly created a C#.net example and sent it off to Rick.

Fizz Buzz:

  • Output the numbers from 1 to 100.
  • When a number is a multiple of three output “Fizz”.
  • When a number is a multiple of five output “Buzz”.
  • When a number is a multiple of both three and five output “FizzBuzz”
  • When a number is not a multiple of either three or five output the number

C# Code of FizzBuzz:

for(int i = 1; i <= 100; i++){
   if(i % 3 == 0) 
      Console.Write("Fizz "); 
   if(i % 5 == 0) 
      Console.Write("Buzz "); 
   if((i % 3 > 0) && (i % 5 > 0)) 
      Console.Write(i); 
   Console.WriteLine(""); 
}

Bamboozled:

So I asked Rick how he would have implemented the same code to which he simply replied ‘T-SQL’. Without missing a beat, he proceeded to tell me that we now know who the developer and who the DBA is for our presentations…. I was bamboozled into writing C#.net code…

T-SQL:

Below are some T-SQL examples of the Fizz Buzz code. Obviously there are many more ways to write it, but here are a couple I wrote to redeem myself in the DB community…

The example below implements a while loop that populates a table, using the Fizz Buzz logic.

DECLARE @FizzBuzzTable TABLE(Number int, FizzBuzz varchar(10));
DECLARE @i int = 1;
WHILE @i <= 100
BEGIN
 INSERT INTO @FizzBuzzTable
   (Number, FizzBuzz)
 SELECT @i, FizzBuzz = CASE WHEN @i % 3 = 0 AND @i % 5 = 0 THEN 'Fizz Buzz'
   WHEN @i % 3 = 0 THEN 'Fizz'
   WHEN @i % 5 = 0 THEN 'Buzz'
   ELSE CAST(@i AS varchar(3)) END
 SET @i += 1
END
SELECT Number, FizzBuzz FROM @FizzBuzzTable FBT

Using a while loop to insert records may not be the best approach, so the example below implements a recursive CTE to gain 100 values, then applies the Fizz Buzz Logic query.

 ;WITH FizzBuzzTable( Number ) AS (
 SELECT 1 UNION ALL
 SELECT 1 + Number FROM FizzBuzzTable WHERE Number < 100
 )
 SELECT FizzBuzz= CASE WHEN Number % 3 = 0 AND Number % 5 = 0 THEN 'Fizz Buzz'
   WHEN Number % 3 = 0 THEN 'Fizz'
   WHEN Number % 5 = 0 THEN 'Buzz'
   ELSE CAST(Number AS varchar(3)) END
 FROM FizzBuzzTable

Using a recursive CTE can have performance issues, so the example below uses SQL written by Ben Itzak to gain 256 values, then applies the Fizz Buzz Logic query.

;WITH L1 AS (SELECT V1 = 1 UNION ALL SELECT V1 = 1) -- 2
 , L2 AS (SELECT V1 = 1 FROM L1 A CROSS APPLY L1 B)  -- 4
 , L3 AS (SELECT V1 = 1 FROM L2 A CROSS APPLY L2 B)  -- 16
 , L4 AS (SELECT V1 = 1 FROM L3 A CROSS APPLY L3 B)  -- 256
 , FizzBuzzTable AS (SELECT TOP 100 Number = ROW_NUMBER() OVER (ORDER BY V1) FROM L4)
 SELECT FizzBuzz = CASE WHEN Number % 3 = 0 AND Number % 5 = 0 THEN 'Fizz Buz'
   WHEN Number % 3 = 0 THEN 'Fizz'
   WHEN Number % 5 = 0 THEN 'Buzz'
   ELSE CAST(Number AS varchar(3)) END
 FROM FizzBuzzTable