T-SQL Bitwise Matching Logic – #TSQL2sDay

[Image: T-SQL Tuesday logo]

A good friend of mine, Rick Krueger (@DataOgre | blog), is hosting this month’s T-SQL Tuesday blog party. The T-SQL Tuesday blog parties were started by Adam Machanic (@AdamMachanic | blog) back in December of 2009. This month’s topic is Rube Goldberg Machines. Rick asked us to look through our closets and dig up a skeleton: something we made T-SQL do that might make other DBAs cringe a bit. After hearing this, I knew exactly what I was going to blog about.

Bitwise Matching!

I remember shaking my head when I came up with the idea of using bitwise matching to implement this feature. The requirements I was given were to match two entities based on assigned lookup values. This would not have been that bad, except the lookup values were not static; they were dynamically entered through a web UI. When I first pitched the idea, I used a dating site as a way to explain the logic, so in this blog post I will use that same concept.

Concept

Within a dating site, one person is matched to another based on activities they both enjoy. Activities can be grouped into different categories. In this example we have two categories: Exercise and Things to Do. People must have at least one matching activity in each category to be a match. We are going to try to match Bob with another person.

People and their activities

  • Bob enjoys Walking, Biking, and Camping.
  • Sally enjoys Walking and Biking.
  • Nancy enjoys Camping.
  • Beth enjoys Walking, Running, Golfing, and Camping.

The following T-SQL creates a few temporary tables we will use to implement the bitwise matching logic.

-- Create the People that can be matched.
CREATE TABLE #Person (
   PersonID smallint
 , Name nvarchar(25));
INSERT   INTO #Person
         (PersonID, Name)
VALUES   (1, N'Bob'), (2, N'Sally'), (3, N'Nancy'), (4, N'Beth');
 
-- Create the Activities that people may have in common.
CREATE TABLE #Activities (
   ActivityID smallint
 , Activity nvarchar(25)
 , CategoryID smallint);
 
INSERT   INTO #Activities
         (ActivityID, Activity, CategoryID)
VALUES   (1, N'Walking', 1), (2, N'Biking', 1), (3, N'Running', 1), (4, N'Yoga', 1),
         (5, N'Movies', 2), (6, N'Golf', 2), (7, N'Camping', 2);
 
-- Create the Weak Entity Table to store each Person's Activities.
CREATE TABLE #PersonActivities (
   ActivityID smallint
 , PersonID smallint);
 
INSERT   INTO #PersonActivities
         (PersonID, ActivityID)
VALUES   (1, 1), (1, 2), (1, 7),
         (2, 1), (2, 2),
         (3, 7),
         (4, 1), (4, 3), (4, 6), (4, 7);

Bitwise Logic

Using the activities, we can assign each a bitwise value of 2 raised to the power of its ActivityID.
Example:

Activity | Activity ID | Category ID | Bitwise Value
---------|-------------|-------------|--------------
Walking  | 1           | 1           | 2^1 = 2
Biking   | 2           | 1           | 2^2 = 4
Running  | 3           | 1           | 2^3 = 8
Yoga     | 4           | 1           | 2^4 = 16
Movies   | 5           | 2           | 2^5 = 32
Golf     | 6           | 2           | 2^6 = 64
Camping  | 7           | 2           | 2^7 = 128

If we sum the bitwise values per person and category, we can use the bitwise AND operator (&) to determine whether any matches exist. Example:

Summed Bitwise Value | 2^3 | 2^2 | 2^1 | 2^0
---------------------|-----|-----|-----|----
6                    | 0   | 1   | 1   | 0
12                   | 1   | 1   | 0   | 0
6 & 12 = 4           | 0   | 1   | 0   | 0
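The AND test above can be checked directly; a quick Python sketch of the same intersection (Python is used here only to illustrate the bit arithmetic, not as part of the T-SQL solution):

```python
# Summed bitwise values from the table above:
#   6 = Walking (2) + Biking (4)
#  12 = Biking (4) + Running (8)
person_a = 6
person_b = 12

# Bitwise AND keeps only the bits set in both sums.
common = person_a & person_b
print(common)        # 4 -> the shared Biking bit
print(common != 0)   # True -> a match exists in this category
```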

The value of 6 represents Walking and Biking, and the value of 12 represents Biking and Running. The intersection of the two is Biking, which gives a matched value of 4. Because we have a matched value, the result is greater than 0. Using this logic, we can implement the following query.

WITH  PeopleSums
        AS (
            SELECT   p.Name
                   , p.PersonID
                   , a.CategoryID
                   , BitWiseSum = SUM(POWER(2, pa.ActivityID))
            FROM     #Person p
            JOIN     #PersonActivities pa ON p.PersonID = pa.PersonID
            JOIN     #Activities a ON a.ActivityID = pa.ActivityID
            GROUP BY p.Name
                   , p.PersonID
                   , a.CategoryID
           )
   SELECT   ps2.Name
   FROM     PeopleSums ps
   JOIN     PeopleSums ps2 ON ps.PersonID != ps2.PersonID
                              AND ps.BitWiseSum & ps2.BitWiseSum <> 0
                              AND ps.CategoryID = ps2.CategoryID
   WHERE    ps.Name = 'Bob'
   GROUP BY ps.Name
          , ps2.Name
   HAVING   COUNT(*) = 2;

This query uses a CTE to calculate and then summarize the bitwise values, grouped by person and category. We then self-join the CTE using the bitwise AND operator (&), keeping rows where the result is anything but zero. This shows us whom Bob matches; we can use the table below to illustrate it.

Person that Matches Bob | Category 1      | Category 2
------------------------|-----------------|-----------
Sally                   | Walking, Biking |
Nancy                   |                 | Camping
Beth                    | Walking         | Camping

To make sure that we have a match in each category, we count the matched categories and ensure the count equals the number of categories we currently have (two). And voilà: Beth is a match for Bob, because Beth is the only candidate who had matches in both categories.
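The whole matching pass (sum 2^ActivityID per category, AND the sums between two people, and require a hit in every category) can also be sketched outside of SQL Server; a minimal Python version of the same logic, using the sample data from the tables above:

```python
from collections import defaultdict

# CategoryID per ActivityID, mirroring the #Activities table.
category = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2}

# ActivityIDs per person, mirroring the #PersonActivities table.
activities = {
    "Bob":   [1, 2, 7],
    "Sally": [1, 2],
    "Nancy": [7],
    "Beth":  [1, 3, 6, 7],
}

def bitwise_sums(ids):
    """Sum 2**ActivityID per category, like the CTE's SUM(POWER(2, ActivityID))."""
    sums = defaultdict(int)
    for i in ids:
        sums[category[i]] += 2 ** i
    return sums

def matches(name, num_categories=2):
    mine = bitwise_sums(activities[name])
    result = []
    for other, ids in activities.items():
        if other == name:
            continue
        theirs = bitwise_sums(ids)
        # Count categories where the bitwise AND finds a shared activity.
        hits = sum(1 for c in mine if mine[c] & theirs.get(c, 0))
        if hits == num_categories:   # same role as HAVING COUNT(*) = 2
            result.append(other)
    return result

print(matches("Bob"))  # ['Beth']
```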

LAG() and LEAD() Analytical Window Functions

I have been finding more and more uses for the OVER clause recently, and I figured it was about time I did a blog post on it. The OVER clause defines a window of data that can be aggregated and included with the details of the original result set. A set of window functions can be applied with the OVER clause to generate row numbers and ranking values, compute averages, and retrieve the previous and next values in a set. Getting the previous and next values in a set of data is a problem I ran into recently. This blog post describes the usage of the LAG and LEAD analytical functions, two new analytical window functions introduced in SQL Server 2012.

Recently I was tasked with calculating the difference between two values that existed in different rows. My first thought was to self-join the table and compare the two results. Using Adventure Works 2012, I ran the following query, and the results are shown.

SELECT   soh.PurchaseOrderNumber
       , soh.OrderDate
       , soh.CustomerID
       , PreviousTotal = soh1.TotalDue
       , CurrentTotal = soh.TotalDue
       , NextTotal = soh2.TotalDue
FROM     Sales.SalesOrderHeader soh
JOIN     Sales.SalesOrderHeader soh1 ON soh1.SalesOrderID = soh.SalesOrderID - 1
JOIN     Sales.SalesOrderHeader soh2 ON soh2.SalesOrderID = soh.SalesOrderID + 1;

[Image: query results (LagLeadImg1)]

This result set shows the previous and next record values for the TotalDue column. This got me halfway to what was needed to perform the difference calculations: I needed the results for a given customer rather than all customers. I updated the query to allow a customer ID to be passed in, and the results are shown.

DECLARE @CustomerID int = 30117;
SELECT   soh.PurchaseOrderNumber
       , soh.OrderDate
       , soh.CustomerID
       , PreviousTotal = soh1.TotalDue
       , CurrentTotal = soh.TotalDue
       , NextTotal = soh2.TotalDue
FROM     Sales.SalesOrderHeader soh
LEFT JOIN Sales.SalesOrderHeader soh1 ON soh1.SalesOrderID = soh.SalesOrderID - 1
                                         AND soh1.CustomerID = soh.CustomerID
LEFT JOIN Sales.SalesOrderHeader soh2 ON soh2.SalesOrderID = soh.SalesOrderID + 1
                                         AND soh2.CustomerID = soh.CustomerID
WHERE    soh.CustomerID = @CustomerID;

[Image: query results (LagLeadImg2)]

This was a problem. Because I was using the SalesOrderID to perform the self-joins, and the customer's records were not in sequential order, I was not getting any of the previous and next records.

LEAD()/LAG() OVER (PARTITION BY)

Instead of going down the path of craziness, I decided to eliminate the self-referencing joins and opted to use the OVER clause with LEAD and LAG to solve the problem. Using the following query and output, we can explain LEAD(), LAG(), and OVER (PARTITION BY).

DECLARE @CustomerID int = 30117;
SELECT   PurchaseOrderNumber
       , OrderDate
       , CustomerID
       , CurrentTotalDue = TotalDue
FROM     Sales.SalesOrderHeader soh
WHERE    soh.CustomerID = @CustomerID;

[Image: query results (LagLeadImg3)]

In the blue box is the CustomerID column; this is the column we want to partition by.

In the red box are the PurchaseOrderNumber and OrderDate columns; these are the columns we want to order the window's results by.

In the green box is the CurrentTotalDue; this is the column we will pass to the LEAD and LAG window functions.

I updated the query to use the LEAD and LAG window functions and the results are shown. The red highlights show the results of the LEAD function and the green highlights show the results of the LAG function.

DECLARE @CustomerID int = 30117;
SELECT   PurchaseOrderNumber
       , OrderDate
       , PreviousTotal = LAG(TotalDue) OVER (PARTITION BY CustomerID ORDER BY OrderDate , PurchaseOrderNumber)
       , CurrentTotal = TotalDue
       , NextTotal = LEAD(TotalDue) OVER (PARTITION BY CustomerID ORDER BY OrderDate , PurchaseOrderNumber)
FROM     Sales.SalesOrderHeader soh
WHERE    soh.CustomerID = @CustomerID;

[Image: query results (LagLeadImg5)]
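If it helps to see what LAG, LEAD, and PARTITION BY are doing, the same previous/next logic can be mimicked in application code; a small Python sketch over hypothetical (CustomerID, OrderDate, TotalDue) rows (the values are made up for illustration):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (customer_id, order_date, total_due)
rows = [
    (30117, "2007-09-01", 100.0),
    (30117, "2008-03-01", 250.0),
    (30117, "2008-09-01", 175.0),
    (29825, "2008-01-01", 300.0),
]

# ORDER BY customer_id, order_date; groupby needs sorted input.
rows.sort(key=itemgetter(0, 1))

# PARTITION BY CustomerID: LAG/LEAD never cross into another customer's rows.
for _, group in groupby(rows, key=itemgetter(0)):
    part = list(group)
    for i, (cust, date, total) in enumerate(part):
        lag = part[i - 1][2] if i > 0 else None               # LAG(TotalDue)
        lead = part[i + 1][2] if i < len(part) - 1 else None  # LEAD(TotalDue)
        print(cust, date, lag, total, lead)
```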

As you can see, the syntax for using LEAD and LAG is not overly complex. In a future blog post I will dive deeper into the OVER clause.

Rules of Engagement – Key Value Stores

Well, Rick Krueger (blog | @DataOgre) and I are back at it again. We have decided to pair up and do a talk about a rather controversial topic in the SQL Server community: NoSQL. We are not jumping ship and saying that NoSQL is the way of the future, because that is simply not true. Relational databases have been around since the 1970s and still outperform NoSQL solutions at most tasks. The idea we are going to seed is that these NoSQL solutions may have a place in enterprise applications. We are not going to say you should use SQL Server or a NoSQL solution exclusively; rather, we are going to advocate polyglot persistence. Polyglot persistence is the idea of using multiple data stores to solve multiple problems, hence Rules of Engagement: NoSQL is SQL Server’s Ally. In this blog post I will discuss the simplest of the NoSQL solutions: Key Value Stores.

Key Value Stores

[Image: Key Value Store diagram]

There are many Key Value store implementations, but at the root of each solution they are simply storing values by a specific key. You can think of this as a two-column table, a hash table, or an associative array. They use a unique indexed key, which allows for fast retrieval of data, and a blob-type field for the value. Key Value stores are typically written in some type of programming language, commonly Java. This gives the application developer the freedom to store data how they see fit, in a schema-less data store. Yes, I used application developer, freedom, data, and schema-less all in the same sentence; this is a good thing. Typically we, as the DBAs, are not going to be distracted by how the data is stored in the Key Value store. Rather, we will be involved with serving up the data that will be pushed into the Key Value store and helping with any transaction support that is needed.
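At its simplest, the access pattern is just get/set/delete against a unique key; a toy Python sketch of the idea (not any real product's API, just an illustration of the pattern):

```python
class ToyKeyValueStore:
    """Minimal key-value store: a unique key mapping to an opaque value."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value        # schema-less: the value can be anything

    def get(self, key):
        return self._data.get(key)     # fast lookup by the indexed key

    def delete(self, key):
        self._data.pop(key, None)

store = ToyKeyValueStore()
store.set("user:42", {"name": "Bob", "likes": ["Walking", "Biking"]})
print(store.get("user:42")["name"])   # Bob
store.delete("user:42")
print(store.get("user:42"))           # None
```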

One of the biggest benefits of most NoSQL solutions, including Key Value stores, is horizontal scaling. We all know that SQL Server and horizontal scaling, while possible, do not play well together. Typically, if you need more from SQL Server, you scale vertically, which can be costly.

Another benefit of Key Value stores is the lack of a schema, which allows the data structure to change as needed, making it a bit more flexible. With SQL Server, altering a table could mean that stored procedures, functions, views, and so on need updates, which takes time and a DBA resource.

Key Value stores support “eventual consistency”; if a feature in your application does not need full ACID support, this may not be a significant drawback. However, the features of your application that do need ACID transaction support should use an RDBMS, like SQL Server.

Another drawback is querying data from a Key Value store. Simple lookups by key are a minor task because keys are indexed. However, supporting joins or complex queries requires code written by an application developer, and hand-coded queries lose out on years of SQL Server optimization and caching features.
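That trade-off is easy to see in code: a "join" across two key spaces has to be written by hand in the application; a hypothetical Python sketch (the key names and data are made up):

```python
# Two hypothetical key spaces in a key-value store.
users = {"u1": {"name": "Bob"}, "u2": {"name": "Sally"}}
orders = {
    "o1": {"user": "u1", "total": 25.0},
    "o2": {"user": "u1", "total": 10.0},
    "o3": {"user": "u2", "total": 40.0},
}

# Application-side "join": scan and match keys by hand -- no optimizer,
# no secondary indexes, no cached execution plan.
def orders_for(user_key):
    return [o for o in orders.values() if o["user"] == user_key]

print(sum(o["total"] for o in orders_for("u1")))  # 35.0
```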

A Key Value Store Implementation

[Image: Redis example]
Redis is an advanced Key Value store, meaning it can support more complex data types such as lists, hashes, and sets. Redis is both an in-memory and a persisted-to-disk data store. This gives Redis great performance on reads and writes, but it puts a limitation on what can be stored, as memory becomes a constraint. The basic commands for CRUD operations in Redis are rather simple: GET, SET, and DEL. http://try.redis.io/ offers an online utility for learning some of the commands.

Polyglot Persistence, SQL Server and Key Value Stores

Obviously there are pros and cons to Key Value stores; however, creating a hybrid solution could address the drawbacks of both SQL Server and Key Value stores. Every application is different, and ultimately you will need to decide what the best fit is for your application and budget.

With that said, one example use for a Key Value store is session data, where the session ID is the key and all the session data is the value. An in-memory data store such as Redis could produce quick results for the UI, significantly improving the web application's performance. When the session ends, the updated user details would ultimately need to be persisted, say in a SQL Server Users table.

Or consider a shopping cart feature: again, the shopping cart ID would be the key and the contents of the shopping cart would be the value. What happens when the user checks out? That’s right: we will need to store that transaction, user information, product information, and inventory in an RDBMS such as SQL Server.

Conclusion

I am not recommending that you rewrite your applications to make room for a NoSQL solution; I am simply suggesting that you, as the DBA (keeper of the data), keep an open mind to other possibilities. If there is another solution out there that could assist SQL Server, or is perhaps a better fit, exploring that avenue might result in a better data solution. A square object will fit into a round hole if you have a big enough hammer, but it may take time and produce a bad product.

T-SQL Tuesday #39 – Can you shell what the PoSH is Cooking?

[Image: T-SQL Tuesday logo]

Wayne Sheffield (blog | @DBAWayne) invited me to a party, and anyone who knows me knows I love a good party. This month’s party is all about how we have integrated PoSH into our SQL lives. Thank you, Wayne, for hosting this month’s T-SQL Tuesday #39: Can you shell what the PoSH is Cooking?

I have to admit that I am relatively new to PowerShell and all its functionality, however I have quickly learned the power of such a tool.

Before this party started, I was tasked with creating a process to upload reports to Reporting Services. This process had to be repeatable, automatic, and configurable. My initial suggestion was to just upload each report to the necessary folders manually, but that still involves the human factor, so repeatability is out. After digging around on the inter-webs, I came across examples of pushing SSRS reports using a Web Services proxy. After a few modifications, I was able to use a couple of cmdlets and a Web Services method to upload reports. In this blog post I am going to show how to use PowerShell v2.0 cmdlets and proxy calls to upload SSRS reports to a Reporting Services web server.

The following example demonstrates how to upload a report (RDL file) to the appropriate folder using the Reporting Services Web Services.

$SSRSService = "http://localhost/ReportServer/ReportService2005.asmx"
$SSRSReport = "C:\Dev\SSRSReport.rdl"
$SSRSNameWithoutExtension = [System.IO.Path]::GetFileNameWithoutExtension($SSRSReport);
$SSRSPath = "/"
 
Write-Output "Reporting Services Server: $SSRSService"
$SSRSProxyObject = New-WebServiceProxy -Uri $SSRSService -UseDefaultCredential 
 
Write-Output "Converting $SSRSReport to a byte array"
$ByteArray = Get-Content $SSRSReport -Encoding byte     
 
Write-Output "SSRS Report Name: $SSRSNameWithoutExtension"
$SSRSProxyObject.CreateReport($SSRSNameWithoutExtension, $SSRSPath, $true, $ByteArray, $null)

The key PowerShell statements are:

New-WebServiceProxy (Cmdlet)

The New-WebServiceProxy cmdlet is used for creating a web service proxy object. We will use it to create a proxy object from our Reporting Services server's Web Services.
New-WebServiceProxy Cmdlet
Example:

$SSRSProxyObject = New-WebServiceProxy -Uri $SSRSService -UseDefaultCredential

Get-Content (Cmdlet)

We will use the Get-Content Cmdlet to get the byte array representation of the RDL file.
Get-Content for FileSystem Cmdlet
Example:

$ByteArray = Get-Content $SSRSReport -Encoding byte

CreateReport (SSRS Web-Service Call)

After we have our Reporting Services Proxy object, we can use it to call a CreateReport() method, which uploads the report to your selected $SSRSPath.
ReportingService2005.CreateReport Method
Example:

$SSRSProxyObject.CreateReport($SSRSNameWithoutExtension, $SSRSPath, $true, $ByteArray, $null)

SQL Saturday Homecoming

Minnesota Sql Saturday #149

RA RA REE kick them in the knee

RA RA RASS kick them in the other knee

And the crowd goes wild!

OK OK OK, this is why I am a DBA not a cheerleader…

Homecoming is always a fun time, and for Rick (b | t) and me it is no different. We started our journey a few months back with the grand vision of presenting at SQL Saturday #149. We went out on the road to SQL Saturday #159 Kansas City and SQL Saturday #161 East Iowa to share our presentations, and now it's time for our homecoming. This coming Saturday, September 29th, 2012, Rick and I will be presenting again at SQL Saturday #149 Minnesota.

Currently we are scheduled to present Freaky Fast Development Tips in Room: Rapson 45 at 8:30 AM. Then we follow that up with SSRS: Reporting on Reports in Room: Rapson 43 at 2:15 PM.

SQL Saturday #149 Schedule

So come on out and celebrate our homecoming with us!

Freaky Fast Database Development Tips

Join Rick and Dave on this most excellent adventure as they teach you how to get more work done in less time. Leave your checkbook at home, because this is all about maximum productivity with minimum budget. Leave with Visual Studio and SQL Server Management Studio shortcut keys, extensions, macros, and add-ins. We guarantee you will see something you haven’t seen before, or your money back. Co-Presenter: Rick Krueger (@DataOgre)

SSRS: Reporting on Reports

Business Intelligence gets a lot of press these days, but do you often wonder to yourself as you are writing a report, ‘Who is going to use this?’ Learn about the rich information that SSRS is already capturing for you about your reports: who is running them and how often, how long they take to execute, how much time is spent in data retrieval vs. rendering, which parameters were passed in, and more… Co-Presenter: Rick Krueger (@DataOgre)