Typically when we think of the TSQL function CHECKSUM, we are thinking about a hash index. A hash index can be used to improve indexing of columns with long character data types. Rather than trying to add an index to a NVARCHAR(2000) column, you can hash the NVARCHAR(2000) column and index on that hashed value, using the CHECKSUM function. The CHECKSUM function can take multiple columns as parameters, and it builds a hash value for the multiple columns. This gives us even greater flexibility in our hash indexing.
I am going to demonstrate another usage for the TSQL function CHECKSUM, and I will also bring the aggregate version called CHECKSUM_AGG along for the ride.
What if we used the CHECKSUM_AGG to aggregate the CHECKSUM values over multiple columns. This would give us the ability to handle complex search requirements for sets of data within a given table. Consider the following dataset:
GCID HoleNum Par Sand Water Yard 1 1 4 1 0 380 1 2 5 1 1 500 1 3 3 0 1 180 1 4 5 0 0 420 2 1 4 1 0 370 2 2 3 1 1 510 2 3 5 0 1 190 2 4 4 0 0 400 3 1 4 1 1 380 3 2 3 1 0 500 3 3 5 0 0 180 3 4 5 0 0 420
Let’s say we want to know which courses have the following conditions:
- Hole 1 is Par 4 with sand and water
- Hole 2 is Par 3 with sand and no water
- Hole 3 is Par 5 with no sand and no water
We can use both the CHECKSUM AND CHECKSUM_AGG to calculate a hash value from a table variable, and compare the hash value to calculated hash value from our data set.
--Setup Golf Course Holes Table DECLARE @CGHoles TABLE(CGID int, HoleNum tinyint, Par tinyint, Details nvarchar(20), Yards smallint) --Populate with some test data INSERT INTO @CGHoles (CGID, HoleNum, Par, Details, Yards) VALUES (1,1,4,'Sand',380),(1,2,5,'Water',500),(1,3,3,'Sand',180),(1,4,5,'Both',420) ,(2,1,4,'Sand',370),(2,2,3,'Both',510),(2,3,5,'Water',190),(2,4,4,'None',400) ,(3,1,4,'Both',380),(3,2,3,'Sand',500),(3,3,5,'None',180),(3,4,5,'Water',420) --Create Search Criteria Table DECLARE @Search TABLE(HoleNum tinyint, Par tinyint, Details nvarchar(20)) --Populate Search Criteria INSERT INTO @Search (HoleNum, Par, Details) VALUES (1,4,'Both'),(2,3,'Sand'),(3,5,'None') -- Using a Common Table Expression, calculate the aggragated checksum and -- compare against the tables aggregated checksum ;WITH CTE AS ( SELECT CKSM = CHECKSUM_AGG(CHECKSUM(HoleNum, Par, Details)) FROM @Search S ) SELECT CGID FROM @CGHoles CH CROSS JOIN CTE WHERE HoleNum BETWEEN 1 AND 3 GROUP BY CGID, CKSM HAVING CKSM = CHECKSUM_AGG(CHECKSUM(HoleNum, Par, Details))
Problem with CHECKSUM:
Notice that in the example code I switched to using words to represent sand, water, both, or none instead of using bit values. I did this because of a range of values limitation in the CHECKSUM function. The CHECKSUM function has a limited range of values of 231 possible values. With this limitation we cannot guarantee values will not be re-used, which is what happened while using the bit values. The issue was both GCID 1 and 3 both were a match where only GCID 3 is the only true match. In a future post I will an alternative way to solve this same problem
With the limitation in range of values, it is advised to use the CHECKSUM functions with caution. However, if your applications can allow for a margin of error in comparisons, or use of an additional check to verify the comparisons, then using the CHECKSUM functions can provide fantastic functionality.