Fun with Checksum…

Typically when we think of the TSQL function CHECKSUM, we are thinking about a hash index. A hash index can be used to improve indexing of columns with long character data types. Rather than trying to add an index to a NVARCHAR(2000) column, you can hash the NVARCHAR(2000) column and index on that hashed value, using the CHECKSUM function. The CHECKSUM function can take multiple columns as parameters, and it builds a hash value for the multiple columns. This gives us even greater flexibility in our hash indexing.

I am going to demonstrate another usage for the TSQL function CHECKSUM, and I will also bring the aggregate version called CHECKSUM_AGG along for the ride.

Alternate use:

What if we used the CHECKSUM_AGG to aggregate the CHECKSUM values over multiple columns. This would give us the ability to handle complex search requirements for sets of data within a given table. Consider the following dataset:

GCID	HoleNum	Par	Sand	Water	Yard
1	1	4	1	0	380
1	2	5	1	1	500
1	3	3	0	1	180
1	4	5	0	0	420
2	1	4	1	0	370
2	2	3	1	1	510
2	3	5	0	1	190
2	4	4	0	0	400
3	1	4	1	1	380
3	2	3	1	0	500
3	3	5	0	0	180
3	4	5	0	0	420

Let’s say we want to know which courses have the following conditions:

  • Hole 1 is Par 4 with sand and water
  • Hole 2 is Par 3 with sand and no water
  • Hole 3 is Par 5 with no sand and no water

We can use both the CHECKSUM AND CHECKSUM_AGG to calculate a hash value from a table variable, and compare the hash value to calculated hash value from our data set.

--Setup Golf Course Holes Table
DECLARE @CGHoles TABLE(CGID int, HoleNum tinyint, Par tinyint, Details nvarchar(20), Yards smallint)
 
--Populate with some test data
INSERT INTO @CGHoles (CGID, HoleNum, Par, Details, Yards)
VALUES (1,1,4,'Sand',380),(1,2,5,'Water',500),(1,3,3,'Sand',180),(1,4,5,'Both',420)
,(2,1,4,'Sand',370),(2,2,3,'Both',510),(2,3,5,'Water',190),(2,4,4,'None',400)
,(3,1,4,'Both',380),(3,2,3,'Sand',500),(3,3,5,'None',180),(3,4,5,'Water',420)
 
--Create Search Criteria Table 
DECLARE @Search TABLE(HoleNum tinyint, Par tinyint, Details nvarchar(20))
 
--Populate Search Criteria
INSERT INTO @Search
         (HoleNum, Par, Details)
VALUES   (1,4,'Both'),(2,3,'Sand'),(3,5,'None')
 
-- Using a Common Table Expression, calculate the aggragated checksum and 
-- compare against the tables aggregated checksum
;WITH  CTE AS (
    SELECT   CKSM = CHECKSUM_AGG(CHECKSUM(HoleNum, Par, Details))
    FROM     @Search S
 )
    SELECT   CGID
    FROM     @CGHoles CH
    CROSS JOIN CTE
    WHERE    HoleNum BETWEEN 1 AND 3
    GROUP BY CGID, CKSM
    HAVING   CKSM = CHECKSUM_AGG(CHECKSUM(HoleNum, Par, Details))

Problem with CHECKSUM:

Notice that in the example code I switched to using words to represent sand, water, both, or none instead of using bit values. I did this because of a range of values limitation in the CHECKSUM function. The CHECKSUM function has a limited range of values of 231 possible values. With this limitation we cannot guarantee values will not be re-used, which is what happened while using the bit values. The issue was both GCID 1 and 3 both were a match where only GCID 3 is the only true match. In a future post I will an alternative way to solve this same problem

With the limitation in range of values, it is advised to use the CHECKSUM functions with caution. However, if your applications can allow for a margin of error in comparisons, or use of an additional check to verify the comparisons, then using the CHECKSUM functions can provide fantastic functionality.

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.