Rules of Engagement – NoSQL Graph Databases

Well, Rick Krueger (blog | @DataOgre) and I are back at it again. We have decided to pair up and do a talk about a rather controversial topic in the SQL Server community: NoSQL. Now, we are not jumping ship and saying that NoSQL is the way of the future, because that is simply not true. Relational databases have been around since the '70s and still outperform NoSQL solutions at most tasks. The idea that we are going to seed is that these NoSQL solutions may have a place in enterprise applications. We are not going to say you should use SQL Server or a NoSQL solution exclusively; rather, we are going to advocate polyglot persistence. Polyglot persistence is the idea of using multiple data stores to solve multiple problems, hence Rules of Engagement: NoSQL is SQL Server's Ally. In previous blog posts I discussed Key Value Stores and Column Stores; in this post I will discuss another of the NoSQL solutions, Graph Databases.

Graph Databases

Graph databases are data structures that consist of nodes, properties, and edges. In RDBMS terms, a node would be a record in a table, a property would be the data within the record, and an edge would be similar to the relationship between tables. However, edges can store properties too; these properties describe the relationship between the nodes.

In RDBMS land, writing queries that join many large tables, especially self-referencing tables, can degrade query performance and be rather costly, even with proper index usage. This is where graph databases really shine: a graph database has the ability to traverse nodes using the edges and the properties within those edges. Using a graph database to find relationships between highly interconnected data is very fast. Finding data that would take multiple joins or a recursive query in an RDBMS is a simple task with a graph database.

You can visualize a graph database as a flattened RDBMS table structure with named connections. Looking at the relational tables below, we see a simple car buying and selling database. This database has a list of cars that people currently own and the types of cars they are looking for.

GraphDB_DBStructure

If we were to flatten and denormalize the structure, we would get something that looks like this.

GraphDB_GraphStructure

Here the Car node is the CarType and Manufacture tables denormalized, the Person node is the Person table, and the edges that connect these two nodes correspond to the Request and Cars tables. Depending on the data you are looking for, both of these structures have their merits. Let's say you need to write a query that returns a list of cars for a given owner who is willing to trade with another owner. This is a simple query for both. However, what if there is no direct match, but there is a match using a third party for a trade? Or maybe a fourth or fifth party is needed for a four- or five-way trade. This query is starting to get very complicated in an RDBMS, and the performance at each level degrades progressively. This is where a graph database really stands out! Because the emphasis is on the edges between the nodes, the query stays simple and will give you the performance your application desires.
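To make that multi-party trade search concrete, here is a minimal sketch in Python using plain dictionaries as a stand-in graph. The names and car types are invented for illustration; a real graph database would traverse its edges natively rather than scanning, but the shape of the search is the same:

```python
from collections import deque

# Hypothetical data: who owns which car type, and who wants which.
owns  = {"Alice": "Mustang", "Bob": "Corvette", "Carol": "Camaro"}
wants = {"Alice": "Corvette", "Bob": "Camaro", "Carol": "Mustang"}

def find_trade_cycle(start):
    """Breadth-first search over WANTS edges: follow each person to
    whoever owns the car they want, until the path loops back to start."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        target = wants[path[-1]]                # car the last person wants
        owners = [p for p, car in owns.items() if car == target]
        for nxt in owners:
            if nxt == start and len(path) > 1:  # closed the cycle: a trade!
                return path
            if nxt not in path:
                queue.append(path + [nxt])
    return None

print(find_trade_cycle("Alice"))  # three-way trade: ['Alice', 'Bob', 'Carol']
```

Notice that a two-way, three-way, or five-way trade is the same search; only the path length grows, whereas in an RDBMS each extra party means another self-join.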

Graph Database Implementations

One of the most recognized graph database implementations is Neo4j, currently made available by Neo Technologies.

“Neo4j is a robust (fully ACID) transactional property graph database. Due to its graph data model, Neo4j is highly agile and blazing fast. For connected data operations, Neo4j runs a thousand times faster than relational databases.” – http://www.neo4j.org/

Neo4j is available for most platforms and comes with a WebAdmin management console for viewing the graph database. Below are some screen captures of the WebAdmin utility showing a dashboard for overall database health and some ways to view the data inside the database.
Neo4JDashboard
Neo4JDataBrowser
Neo4JDataViewer

Polyglot Persistence, SQL Server and Graph Databases

Recently Facebook announced a graph search feature in their application.

“With graph search you can look up anything shared with you on Facebook, and others can find stuff you’ve shared with them.” – https://www.facebook.com/about/graphsearch

This idea of using a graph database within their application is the idea behind polyglot persistence. Their primary data could be stored in any data structure, such as an RDBMS, while their graph search feature would be backed by a graph database.

Conclusion

I am not recommending that you re-write your applications to make room for a NoSQL solution; I am simply suggesting that you, as the DBA (keeper of the data), should keep an open mind to other possibilities. If there is another solution out there that could assist SQL Server, or is maybe a better fit, exploring that avenue might result in a better data solution. As always, a round peg will fit into a square hole if you have a big enough hammer.

Rules of Engagement – NoSQL Column Data Stores

Well, Rick Krueger (blog | @DataOgre) and I are back at it again. We have decided to pair up and do a talk about a rather controversial topic in the SQL Server community: NoSQL. Now, we are not jumping ship and saying that NoSQL is the way of the future, because that is simply not true. Relational databases have been around since the '70s and still outperform NoSQL solutions at most tasks. The idea that we are going to seed is that these NoSQL solutions may have a place in enterprise applications. We are not going to say you should use SQL Server or a NoSQL solution exclusively; rather, we are going to advocate polyglot persistence. Polyglot persistence is the idea of using multiple data stores to solve multiple problems, hence Rules of Engagement: NoSQL is SQL Server's Ally. In a previous blog post I discussed Key Value Stores; in this post I will discuss another of the NoSQL solutions, Column Data Stores.

Column Data Stores

Column data stores and relational databases share some concepts, like rows and columns. However, column data stores do not require columns to be defined, nor is space allocated for each column. A column is simply a key-value pair, where the key is an identifier and the value stores data related to that key. Columns can be added without the overhead that is incurred in a relational database. A column can exist for every row, some rows, or only one row, and a row can have many different columns or just one. Because the primary focus of column data stores is the column, performing aggregations or returning column data becomes very fast. The image below depicts a sample column store with the concept of column families, which are groupings of like column data; this grouping helps with more complex queries.

ColumnDataStore

The image above shows a simple example of what a column's key values could look like; however, they can store much more. The columns themselves can store rows and columns, so thinking of this as a table within a table would be correct. In a relational database we would use a secondary table linking back to the primary with a foreign key. The example below shows the comparison between a relational model and a column store model.

ColumnDataStoreAdvanced

The big advantage here is that we are bringing related data closer together, which makes data retrieval fast. However, trying to aggregate this type of data becomes difficult and typically requires a MapReduce job.
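A rough sketch of the idea, modeling a column family as nested Python dictionaries (row key to column name to value); the row keys and columns are invented for illustration. Note how rows need not share the same columns, and how a column-oriented aggregate only touches one column:

```python
# Column family "people": each row key maps to whatever columns that row has.
people = {
    "row1": {"name": "Dave", "city": "Minneapolis", "age": 38},
    "row2": {"name": "Rick", "age": 41},              # no "city" column
    "row3": {"name": "Ed", "city": "Cedar Rapids"},   # no "age" column
}

def column_values(family, column):
    """Pull one column across all rows, skipping rows that lack it."""
    return [row[column] for row in family.values() if column in row]

print(column_values(people, "age"))       # [38, 41]
print(sum(column_values(people, "age")))  # 79
```

In a relational table, row2 and row3 would carry NULLs for their missing columns; here those columns simply do not exist for those rows.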

Just like key value stores, two big benefits of column stores are horizontal scaling and the lack of a schema requirement. However, with column stores, having a good understanding of the tables and key structure before development begins helps in implementing a useful data store.

Column Data Stores Implementations

One of the most recognized column data stores comes from the Hadoop world: HBase. Apache HBase is the database that sits on top of Hadoop and HDFS (the Hadoop Distributed File System). HBase was designed to handle big data: millions of records with millions of columns. This is something you should keep in mind when selecting a data store; if you have big data, then looking at a solution like HBase is worth the effort. One of the biggest benefits I have seen is the ability to store all the data that we would normally have to archive in an RDBMS. Here is an example: think of an automobile company like GM. They have been selling and servicing cars for years. That is big data! In an RDBMS we would eventually have to archive data to keep costs down, but with an HBase solution we simply add more nodes to the Hadoop cluster and let HDFS work its magic. Imagine being able to go back and retrieve every service record for a car built in the '50s; this is what HBase and Hadoop can give you.

Polyglot Persistence, SQL Server and Column Data Stores

An example would be an enterprise application that stores US census data. Here again we have big data. Let's say the requirement is to store the past 50 years of census data, but only the past 3 years need to be returned in real time; the rest of the data can be returned over time, perhaps in a queued report. We could store all 50 years of data in an RDBMS, but that could get rather costly and would most likely impact performance. Instead, we could store the past 3 years in an RDBMS, push the remaining data into a NoSQL solution, and scale horizontally as needed. This hybrid data solution is polyglot persistence, and best of all, it solves the problem at hand.
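One way to sketch that routing decision in application code (the function name and the three-year cutoff are assumptions for illustration, not a prescribed API):

```python
from datetime import date

REALTIME_YEARS = 3  # recent data lives in the RDBMS

def route_census_query(year, today=None):
    """Decide which store serves a census query for a given year."""
    today = today or date.today()
    if today.year - year < REALTIME_YEARS:
        return "rdbms"        # e.g. SQL Server, answered in real time
    return "nosql-batch"      # e.g. HBase, answered via a queued report

print(route_census_query(2012, today=date(2013, 6, 1)))  # rdbms
print(route_census_query(1970, today=date(2013, 6, 1)))  # nosql-batch
```

The application, not the data stores, owns this split; that is the price and the flexibility of polyglot persistence.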

Conclusion

I am not recommending that you re-write your applications to make room for a NoSQL solution; I am simply suggesting that you, as the DBA (keeper of the data), should keep an open mind to other possibilities. If there is another solution out there that could assist SQL Server, or is maybe a better fit, exploring that avenue might result in a better data solution. As always, a round peg will fit into a square hole if you have a big enough hammer.

Rules of Engagement – Key Value Stores

Well, Rick Krueger (blog | @DataOgre) and I are back at it again. We have decided to pair up and do a talk about a rather controversial topic in the SQL Server community: NoSQL. Now, we are not jumping ship and saying that NoSQL is the way of the future, because that is simply not true. Relational databases have been around since the '70s and still outperform NoSQL solutions at most tasks. The idea that we are going to seed is that these NoSQL solutions may have a place in enterprise applications. We are not going to say you should use SQL Server or a NoSQL solution exclusively; rather, we are going to advocate polyglot persistence. Polyglot persistence is the idea of using multiple data stores to solve multiple problems, hence Rules of Engagement: NoSQL is SQL Server's Ally. In this blog post I will be discussing the simplest of the NoSQL solutions, Key Value Stores.

Key Value Stores

KeyValueStore

There are many key value store implementations; however, at the root of each solution, they simply store values by a specific key. You can think of this as a two-column table, a hash table, or an associative array.
They use a unique indexed key, which allows for fast retrieval of data, and a blob-type field for the value. Key value stores are typically written in some type of programming language, commonly Java. This gives the application developer the freedom to store data however they see fit, in a schema-less data store. Yes, I used application developer, freedom, data, and schema-less all in the same sentence; this is a good thing. Typically we, as the DBAs, are not going to be distracted by how the data is stored in the key value store. Rather, we will be involved with serving up the data that will be pushed into the key value store and helping with any transaction support that is needed.

One of the biggest benefits of most NoSQL solutions, including key value stores, is horizontal scaling. We all know that SQL Server and horizontal scaling, while possible, do not play well together. Typically, if you need more from SQL Server, you scale vertically, which can be costly.

Another benefit of key value stores is the lack of a schema, which allows the data structure to change as needed, making it a bit more flexible. With SQL Server, by contrast, altering a table could mean that stored procedures, functions, views, etc. need updates, which take time and a DBA resource.

Key value stores support "eventual consistency"; if a feature in your application does not need full ACID support, this may not be a significant drawback. However, the features of your application that do need ACID transaction support should use an RDBMS, like SQL Server.

Another drawback is querying data from a key value store. Simple lookups by key are a minor task because the keys are indexed. However, supporting joins or complex queries takes code written by an application developer, and hand-coded queries lose out on years of SQL Server optimization and caching features.
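To make that drawback concrete, here is a sketch of a "join" done by hand in application code across two key lookups (the key naming scheme and data are invented for illustration). Everything the query optimizer would normally handle now lives in, and must be maintained by, the application:

```python
# Two "tables" flattened into one key space, as a key value store sees them.
store = {
    "user:42": {"name": "Dave", "cart": "cart:7"},
    "cart:7":  {"items": ["wrench", "hammer"]},
}

def get_user_with_cart(user_key):
    """Hand-rolled join: fetch the user, then follow the cart reference."""
    user = store[user_key]
    cart = store[user["cart"]]   # a second round trip in a real store
    return {"name": user["name"], "items": cart["items"]}

print(get_user_with_cart("user:42"))
```

Each extra relationship means another lookup and more application code, with no optimizer, statistics, or plan cache helping out.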

A Key Value Store Implementation

RedisExample
Redis is an advanced key value store, meaning that it can support more complex data types such as lists, hashes, sets, etc. Redis combines an in-memory data store with persistence to disk. This gives Redis great performance on reads and writes; however, it puts a limitation on what can be stored, as memory becomes a constraint. The basic commands for CRUD operations in Redis are rather simple: GET, SET, and DEL. http://try.redis.io/ offers an online utility for learning some of the commands.
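The GET/SET/DEL semantics are easy to picture with a tiny in-memory stand-in written in Python. This is a sketch of the command behavior only, not the real Redis client library:

```python
class TinyKV:
    """In-memory dictionary mimicking Redis's basic string commands."""
    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return "OK"                   # Redis replies OK on a successful SET

    def get(self, key):
        return self._data.get(key)    # nil (None here) when the key is absent

    def delete(self, key):
        # Redis DEL returns the number of keys removed.
        return 1 if self._data.pop(key, None) is not None else 0

kv = TinyKV()
kv.set("greeting", "hello")
print(kv.get("greeting"))     # hello
print(kv.delete("greeting"))  # 1
print(kv.get("greeting"))     # None
```

The whole API surface for basic usage fits in a dozen lines, which is exactly the appeal: no schema, no joins, just keys in and values out.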

Polyglot Persistence, SQL Server and Key Value Stores

Obviously there are pros and cons to key value stores; however, creating a hybrid solution could offset the drawbacks of both SQL Server and key value stores. Every application is different, and ultimately you will need to decide what the best fit is for your application and budget.

With that said, one example fit for a key value store is session data, where the session ID is the key and all the session data is in the value. An in-memory database such as Redis could produce quick results to the UI, significantly improving the web application's performance. However, when that session ends, all of the updated user details would ultimately need to be persisted, say in a SQL Server Users table.
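A toy sketch of that hand-off, using a plain dictionary as a stand-in for Redis and SQLite as a stand-in for SQL Server (the table and column names are invented for illustration):

```python
import sqlite3

session_store = {}  # stand-in for Redis: session id -> session data

def start_session(session_id, user):
    session_store[session_id] = dict(user)   # fast, in-memory while active

def end_session(session_id, conn):
    """On session end, persist the user data to the relational store."""
    user = session_store.pop(session_id)
    conn.execute(
        "INSERT OR REPLACE INTO Users (id, name) VALUES (?, ?)",
        (user["id"], user["name"]),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT)")

start_session("sess-1", {"id": 42, "name": "Dave"})
end_session("sess-1", conn)
print(conn.execute("SELECT id, name FROM Users").fetchall())  # [(42, 'Dave')]
```

The hot path (every page view) only touches the in-memory store; the relational database sees one write per session, which is the polyglot split doing its job.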

Or maybe a shopping cart feature: again, the shopping cart ID would be the key and the contents of the cart would be in the value. What happens when the user checks out? That's right, we will need to store that transaction, along with the user, product, and inventory information, in an RDBMS such as SQL Server.

Conclusion

I am not recommending that you re-write your applications to make room for a NoSQL solution; I am simply suggesting that you, as the DBA (keeper of the data), should keep an open mind to other possibilities. If there is another solution out there that could assist SQL Server, or is maybe a better fit, exploring that avenue might result in a better data solution. A square object will fit into a round hole if you have a big enough hammer, but it could take time and produce a bad product.

SQL Saturday Homecoming

Minnesota SQL Saturday #149

RA RA REE kick them in the knee

RA RA RASS kick them in the other knee

And the crowd goes wild!

OK OK OK, this is why I am a DBA not a cheerleader…

Homecoming is always a fun time, and for Rick ( b | t ) and me it is no different. We started our journey a few months back with the grand vision of presenting at SQL Saturday #149. We went out on the road to SQL Saturday #159 Kansas City and SQL Saturday #161 East Iowa to share our presentations, and now it's time for our homecoming. This coming Saturday, September 29th, 2012, Rick and I will be presenting again at SQL Saturday #149 Minnesota.

Currently we are scheduled to present Freaky Fast Development Tips in Room: Rapson 45 at 8:30 AM. Then we follow that up with SSRS: Reporting on Reports in Room: Rapson 43 at 2:15 PM.

SQL Saturday #149 Schedule

So come on out and celebrate our homecoming with us!

Freaky Fast Database Development Tips

Join Rick and Dave on this most excellent adventure as they teach you how to get more work done in less time. Leave your checkbook at home, because this is all about maximum productivity with minimum budget. Leave with Visual Studio and SQL Server Management Studio shortcut keys, extensions, macros, and add-ins. We guarantee you will see something you haven’t seen before, or your money back. Co-Presenter: Rick Krueger (@DataOgre)

SSRS: Reporting on Reports

Business Intelligence gets a lot of press these days, but do you often wonder to yourself as you are writing a report, ‘Who is going to use this’? Learn about the rich information that SSRS is already capturing for you about your reports: who is running them and how often, how long do they take to execute, how much time is spent in data retrieval vs rendering, which parameters were passed in, and more… Co-Presenter: Rick Krueger (@DataOgre)

Another Success at SQL Saturday #161

East Iowa SQL Saturday #161

So my adventure continues…

One short week after a successful pair of sessions in Kansas City, Rick ( b | t ) and I found ourselves back on the road to East Iowa. This road trip was much shorter, and we were able to make (and find) the happy hour event at Back Pocket Brewing in Coralville, IA. If you happen to be in Coralville, I recommend the Double Bock or the Slingshot; both are pretty tasty. We met up with Ed Leighton-Dick ( b | t ) first, followed by Pat Phelan ( t ) and Eric Selje ( b | t ); later, Sheila Acker ( t ) and Larry Toothman ( t ) arrived. After a couple of hours of great conversations, it was close to 8:00 PM, and we all proceeded to the speakers' dinner. Ed picked a great Italian restaurant called Monica's. I was able to sit next to Karla Landrum ( b | t ) and Russell Fustino ( b | t ) and had great conversations about everything from SQL Server to family and friends.

One of my primary objectives in speaking at the SQL Saturdays is getting out and meeting new people in the SQL Family, and so far so good. But now it was time to get back to my hotel and review the session materials one more time…

Dave and a Hawkeye

The next morning we made our way to the SQL Saturday #161 event. One thing was very apparent: we were in Hawkeye territory. As a side note, I saw two kilts at the SQL Saturday #159 event in Kansas City, and now here in East Iowa I am seeing another kilt; maybe this is a new trend for all SQL Saturdays…

We presented Development Tips at 11:00 AM and Reporting on Reports at 3:45 PM. We received very positive feedback and had great dialog in both sessions; thank you to all of you who came. I was able to see Erin Stellato ( b | t ) present on statistics, and she did an amazing job! Overall I thought the event was great; Ed and his team did an amazing job organizing the schedule and the event as a whole. Afterwards we stayed and mingled at the after party for about an hour before making our way back home to Minnesota.

Minnesota SQL Saturday #149

The drive home to Minnesota was nice; all the stress and worries were gone, and I could finally relax. As I reflected on the past two weeks, I was rather amazed at our accomplishments and realized that I am hooked on presenting at these events. Now I am looking forward to SQL Saturday #149 in our home town…

BTW, I just found out yesterday that both of our sessions were selected. Hope to see you there as my adventure continues…