Dave + SQL Server = Ingenious SQL….

SQL Thoughts from @IngeniousSQL

SSIS Importing and Validating Credit Card Data

POS Touch PadI spent some of my early career working with POS data. The irony of Point of Sale data is it is really was POS data. I spent a lot of time validating data all sorts of data, one of the big pain points was validating credit card data. When I decided to write a presentation that’s all about SSIS Scripting, the first demo idea that came to me was validating credit card data. In this blog post we will import credit card information from a flat text file. We will feed the credit card data into a Script Component as a transformation. Using the script component we will validate the credit card number against the LUHN algorithm, if the credit card is valid we will also determine the type of credit card. We will create separate flows for American Express, Master Card, Visa, and invalid credit cards. From the flows we will persist the results into their respective flat text files.

LUHN Algorithm

Before we get into the SSIS package definition having a basic understanding of the LUHN algorithm and factors for credit card types may be useful. The LUHN algorithm uses the last digit in the credit card number as a check digit. Then by processing the remaining numbers with a specific set of rules, shown below, we can calculate what the check digit should be.

  1. Save and removed the last digit from the CC Number. This digit is what we will use to check against the calculated check digit
  2. Starting with the last digit and working backwards, double every other value (the odd spaces) any result higher than 9 summing the digits 18 == 1 + 8 == 9
  3. Add all the numbers together and multiply by 9
  4. The check digit (that the last number of the card) should match the result modulus 10

Here is the implementation of the LUHN algorithm in C#.net.

private bool IsLUHN(string CCNumber) {
  /********************************************************************************************************************************************
  The LUHN Algorithm:
     1.  Save and removed the last digit from the CC Number. This digit is what we will use to check against the calculated check digit
     2.  Reverse the array, this is easier than going backward in an array from the last position.
     3.  Multiply the values in the odd indexes by 2 and subtract 9 from any result higher than 9.  This is the same as summing the digits 18 - 9 == 1 + 8
     4.  Add all the numbers together and multiply by 9
     5.  The check digit (the last number of the card) is the amount that you would need to add to get a multiple of 10 (Modulo 10)
  ********************************************************************************************************************************************/
  char[] charArray;
  int[] intArray;
  int lastValue;
  int sum = 0;
  int checkDigit = 0;  
 
  // Convert Credit Card Number to an integer Array
  charArray = CCNumber.ToCharArray();
  intArray = Array.ConvertAll(charArray, value => (int)Char.GetNumericValue(value));
 
  // Store the last value of the CC, this will be the check bit.
  lastValue = intArray[intArray.Length - 1];
 
  // Step 1 drop the last digit from the array
  Array.Resize(ref intArray, intArray.Length - 1);
  // Step 2 Reverse the array
  Array.Reverse(intArray);
  // Step 3 Double the odd array entries
  for (int i = 0; i < intArray.Length; i++)
  {
     if (i % 2 == 0)
     { 
        intArray[i] *= 2;
        if (intArray[i] > 9) intArray[i] -= 9;
     }
     // Step 4 Sum the array values.
     sum += intArray[i];
  }
  // Steps 4 and 5 multiple sum by 9 and get the last digit by using modulus 10.
  checkDigit = (sum * 9) % 10;
  return checkDigit == lastValue;
}

Credit Card Types

There are more credit card types, and rules, than what we are going to look at in this blog post. For simplicity we are going to use American Express, Visa, and Master card and only look the beginning numbers of the credit card to determine the card type. In the table below are the rules that we will use for this blog post

Credit Card Issuer Starts With
American Express 34, 37
Visa 4
MasterCard 51, 52, 53, 54, 55

Credit Card Validation SSIS package

The Credit Card Validation SSIS package will open a flat text file in the data flow as its source, process the flat text file through a script component transformation and depending on the results of the LUHN validation and credit card number rules, we will send to one of four outputs. Each of these outputs will be written to flat text files for persistence.

Let’s start by creating a new SSIS package and name it CCValidation. We will use five flat text file connection to either read or write data to. The following flat text file connection will be created to support the file processing.

CCValid_ConnectionManagers

Each of the flat file connections are setup the same, except for the file name. The file name will be unique to each file.

connection Name Path
CCValidationData \\localhost\Development\SSIS\Files\CCValidation\CCValidationData.txt
InvalidCCData \\localhost\Development\SSIS\Files\CCValidation\CCValidationInvalid.txt
ValidAmex \\localhost\Development\SSIS\Files\CCValidation\CCValidationAmex.txt
ValidMC \\localhost\Development\SSIS\Files\CCValidation\CCValidationMC.txt
ValidVisa \\localhost\Development\SSIS\Files\CCValidation\ CCValidationVisa.txt

Flat File Connections

In the general section of the flat file connection manager editor we will use a file name from the table above. Our files have header information, so we will need to check the “Columns names in the first data row” option.

CCValid_FileGeneral

In the columns section of the flat file connection manager editor we will select the Vertical Bar {|} value for the Column delimiter drop down list, because the flat text file is pipe delimited.

CCValid_FileColumns

In the Advanced section of the flat file connection manager editor we will need to adjust each of the columns to match the data size and type for the corresponding database column.

CCValid_FileAdvanced

Package Tasks

DFT – Process CC Data File

The Credit Card Validation SSIS package will use a single Data Flow task to validate the credit card data. This Data Flow task “DFT – Process CC Data File” implements the credit card validation process, using a Flat File Source, Script Component and flat text file destinations.

CCValid_SSISControlFlow

Data Flows

DFT – Process CC Data File

This data flow implements a total of six components a Flat File Source for the source of the credit cards, a Script Component for splitting the data flows based on credit card logic, and four flat file text file destinations for persisting the resulting output from the script component.

CCValid_SSISDataFlow

FF_SRC – Open the CC Validate Data File

This data source will define the connection manager and the output columns from the flat text file into a data stream. This data stream will provide the input to the script component. On the connection manager section select the CCValidataionData flat file connection manager.

CCValid_FlatFileSource_ConnectionManager

On the Columns section, we will output all three columns; FirstName, LastName, and CCNumber. As a best practice, if you are not going to use one of the columns in the data flow, its best to deselect it hear. It will takes up less space on your data stream, increase the performance of your package, and prevent warnings that would be written to log files.

CCValid_FlatFileSource_Columns

SCR – Test Each CC number Determine the type of card and if its valid

This Script Component will be used to validate that credit card data is valid by implementing the LUHN algorithm. We will also create data flows for each different credit card type that we find; American Express, Master Card, and Visa. When we first drop the Script Component to the Data Flow window the “Select Script Component Type” window will appear, select the Transformation.

CCValid_ScriptComponent_Type

 

 

 

 

 

 

 

 

 

 

 

 

Typically declaring the Read and Write variables would be done on the Script section of the Script Transformation Editor. However, in this Script Component we will not be using any variables.

CCValid_ScriptComponent_Script

Before we click on the Edit Script… button, we need to define out inputs and outputs. On the Input Columns section, we are going to select all three columns (FirstName, LastName, CCNumber) to be included in the data flow. Notice the Input Name (Input 0), this will become an object type that we can use in the script.

CCValid_ScriptComponent_Inputs

In the Inputs and Outputs section, will define the input and the four outputs MC, Invalid, Visa and Amex. We could change the name of the input in Common Properties for the Input 0, however for this package will leave the default name of Input 0. We will select the Add Output button for each of the four outputs and name them. These outputs will become buffer objects in the data flow.

CCValid_ScriptComponent_Outputs

On each of the outputs we will add three columns; FirstName, LastName, and CCNumber. Each of the columns that we add, will become properties in the buffer object in the data flow. Using the Common Properties we will define the types for each of the columns. For this Script Component all columns will be 50 character strings.

CCValid_ScriptComponent_Outputs2

We will now define the code that will implement out validation logic as well as separate the different types of credit cards. Here is the Input0_ProcessInputRow method. This method will be called one time for each row that is found in the flat text file. It will take the contents of the row and put them in the Row variable. Using the Row variable we can access the FirstName, LastName, and CCNumber properties. Each of our outputs can be accessed through buffer objects that are created when we defined out outputs. Amex == AmexBuffer, Visa == VisaBuffer, MC ==MCBuffer and Invalid = InvalidBuffer. Using these buffer objects we can add rows and give values to either FirstName, LastName, or CCNumber.

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
  if (IsLUHN(Row.CCNumber))
  {
    /*
       Credit Card Issuer	Starts With
       American Express	34, 37
       Write to the Amex Buffer and write to the Valid Amex File.
    */
    if ((Row.CCNumber.StartsWith("34")) || (Row.CCNumber.StartsWith("37")))
    {
      AmexBuffer.AddRow();
      AmexBuffer.CCNumber = Row.CCNumber;
      AmexBuffer.FirstName = Row.FirstName;
      AmexBuffer.LastName = Row.LastName;
    }
    /*
      Credit Card Issuer	Starts With
      Visa	4
      Write to the Visa Buffer and write to the Valid Visa File.
    */
    else if (Row.CCNumber.StartsWith("4"))
    {
      VisaBuffer.AddRow();
      VisaBuffer.CCNumber = Row.CCNumber;
      VisaBuffer.FirstName = Row.FirstName;
      VisaBuffer.LastName = Row.LastName;
    }
    /*
      Credit Card Issuer	Starts With
      MasterCard	51, 52, 53, 54, 55 
      Write to the MC Buffer and write to the Valid MC File.
    */
    else if ((Row.CCNumber.StartsWith("51")) || (Row.CCNumber.StartsWith("52")) || (Row.CCNumber.StartsWith("53")) ||
     (Row.CCNumber.StartsWith("54")) || (Row.CCNumber.StartsWith("54")))
    {
      MCBuffer.AddRow();
      MCBuffer.CCNumber = Row.CCNumber;
      MCBuffer.FirstName = Row.FirstName;
      MCBuffer.LastName = Row.LastName;
    }
    // CC Number is not Amex, MC, or Visa so move to Invalid buffer and write to Invalid File.
    else
    {
      InvalidBuffer.AddRow();
      InvalidBuffer.CCNumber = Row.CCNumber;
      InvalidBuffer.FirstName = Row.FirstName;
      InvalidBuffer.LastName = Row.LastName;
    }
  } 
  // CC Number did not pass the LUHN Algorithm add to the Invalid Flow and write to the Invalid File.
  else
  {
    InvalidBuffer.AddRow();
    InvalidBuffer.CCNumber = Row.CCNumber;
    InvalidBuffer.FirstName = Row.FirstName;
    InvalidBuffer.LastName = Row.LastName;
  }
}

Looking at the American Express section of code the IF statement looks to see of the incoming credit card number, accessed through the Row variable, starts with either 34 or 37. If the credit card starts with either 34 or 37, we will call the AddRow() method on our buffer object. This will create a new row on our data flow that we can add values to. All we have to do now is set the FirstName, LastName and CCNumber properties in our buffer object, in this case AmexBuffer, to the respective values in the Row Variable. The other three inputs all follow this same logic.

/* 
   Credit Card Issuer	Starts With
   American Express	34, 37
   Write to the Amex Buffer and write to the Valid Amex File.
*/
if ((Row.CCNumber.StartsWith("34")) || (Row.CCNumber.StartsWith("37")))
{
  AmexBuffer.AddRow();
  AmexBuffer.CCNumber = Row.CCNumber;
  AmexBuffer.FirstName = Row.FirstName;
  AmexBuffer.LastName = Row.LastName;
}

FF_DST – Output valid Visa data to file
FF_DST – Output valid Master Card data to file
FF_DST – Output valid American Express data to file
FF_DST – Output invalid data to file

Each of these Flat File Destinations will define the details of the destination of our data into the flat text files and which data stream columns will be mapped to which flat text file columns. When we map each output from the script component it should correlate to the corresponding file type. Being that all of the Flat File Destinations are equivalent we will only look at a single one. In the Connection Manager section, we will select the appropriate Flat File Connection Manager.

CCValid_FlatFileDesination_ConnectionManager

On the Mappings section, if the Flat File Destination can match the names, lines will be drawn connecting the two columns. If any columns are not automatically mapped, we will either need to drag and drop the column on the corresponding columns that should be mapped, or select the corresponding column value from the Input Column drop down list.

CCValid_FlatFileDesination_Mapping

Execution

After the package is executed the credit card data will be loaded into one of four flat text files. Depending on if the credit card data is valid, invalid, or a specific type the file details will be into the respective file type. There you have it, a credit card validation SSIS package.

Simple SSIS: Date Formats

One feature that I would like to see added to SSIS is the ability to format a date string with a specified format. I have been tasked many times with outputting file names or folders with a specific date format, such as YYYYMMDD, and found myself writing custom functionality to accomplish this task. This is not difficult in T-SQL, but in SSIS it’s not all that easy. In this blog post I am going to show a few ways that we can format a date string into a specific format. I will show how to use an Expression task, SQL task, and a Script task.

DateFormat_SSISPackage

Expression Task

EXP – Format Date as YYYMMDD
The expression task is a new features with 2012 which allows us to store the result of an expression in a variable. Basically we create an expression using the Expression Builder window and store the result into a SSIS variable. In the windows below you can see an example of the expression being set to the SSISDateFormat variable.

@[User::SSISDateFormat] =  
               (DT_WSTR,4)YEAR(GETDATE())  
+ RIGHT ( "0" + (DT_WSTR,2)MONTH(GETDATE()),2) 
+ RIGHT ( "0" + (DT_WSTR,2)DAY(GETDATE()),2)

DateFormat_ExpressionBuilder

 

 

 

 

 

 

 

 

 

 

 

 

 

SQL Task

SQL – Format Date as YYYYMMDD
The SQL Task is a commonly used task in SSIS. This task allows us to execute queries against a data source and return a full result set, a single row, or nothing at all. We also have the opportunity to pass variables into the query as well. However, for this example we will not pass anything in, and only return back a single row. Using T-SQL’s CONVERT function it’s very easy to format a date string. We will use this CONVERT function and return the single row and store the results into a SSIS Variable. The SQL syntax that we will use is:

SELECT SQLDateFormat = CONVERT(NVARCHAR(8),GETDATE(),112);

Here is a link to additional formats for the CONVERT function. https://technet.microsoft.com/en-us/library/ms187928.aspx

DateFormat_SQLTask

 

 

 

 

 

 

 

 

 

 

 

 

After the SQL Task executes the result will be stored in the SQLDateFormat SSIS variable, which is configured in the window below.

DateFormat_SQLTaskVariables

 

 

 

 

 

 

 

 

 

 

 

 

Script Task

SCR – Format Date YYYYMMDD
The script task is one of my favorite tasks, mainly because all of the additional functionality that I have access too, but it also gives me an opportunity to play with C#.net code. By passing the SSIS variable ScriptDateFormat into the ReadWriteVariables box, we can access that variable through code. In programming languages like C#.net it’s very easy to format a date string. There is a ToString() method that provides many date formats by using specific characters. When we click on the Edit Script button the Visual Studio for Apps design time environment appears. We can access the ScriptDateFormat variable through the Dts.Variables array and save string values to it. The result of C#.net DateTime.Now.ToString(“yyyyMMdd”) method call will give us a formatted date, this formatted date we can save to our Dts.Variables array. To get a date in T-SQL we use the GETDATE and in C#.net we use DateTime.Now. So the entire line of code would look like this.

Dts.Variables["ScriptDateFormat"].Value = DateTime.Now.ToString("yyyyMMdd");

Here is a link to MSDN page that has the format specifiers.
https://msdn.microsoft.com/en-us/library/8kb3ddd4.aspx

DateFormat_ScriptTask

 

 

 

 

 

 

 

 

 

 

 

 

Summary

So which is the best way? I personally prefer the Script task, mainly because it’s clean and simple. The Expression Task is a complicated expression, and the SQL Task involves an unnecessary round trip to the server to get a formatted date string. With that said, all three options work and I am sure there are other ways as well.

Winner Winner Chicken Dinner

DaveKrisVegasSign

“Winner Winner Chicken Dinner”, and “Vegas, Vegas, Vegas” are quotes from two of my favorite Las Vegas movies. It’s no secret that my wife Kristin and I love Las Vegas, it’s also no secret that I love coming up with ingenious database solutions to random problems. So I was thinking, what it would take to create a simple 3 wheel slot machine in SQL Server. Is SQL Server the right place to put slot machine business logic? Ummmm, probably not, but that has never stopped me before. In this blog post I am going to build a 3 wheel slot simulation using small data set that represents the possible combinations that could appear and a random selection process to determine a pull outcome. I will also add a small table to track the results of each pull including a running balance in a players table.

Slot machine details

SlotA standard 3 wheel slot with 20 stopping position yields 8000 possible result combinations (203 = 8000). If there is only one jackpot symbol on each of the wheels, this means there is a 1 in 8000 chance of hitting the jackpot. If there are 5 empty spaces per wheel, there are 125 possibilities to hit all empty spaces. The simple math is multiply the count of a specific symbol on each wheel. Example, if there are five single bars on wheel number one, two 7’s on wheel number two, and one Wild on wheel three, thus 5 X 2 X 1 hence there are 10 opportunities for this (B-7-W) combination to appear. The total number of distinct combinations can be found by multiplying the count of different possible symbols per wheel. We have 6 different symbols on each wheel, thus 6 X 6 X 6 = 216 possibilities.

Slot machine data set

Armed with the combination logic we can build out a dataset for all 8000 possible combinations. Using an opportunity field, we can store the number of opportunities that each specific combination can appear. See the combinations table for full data set and opportunity count. Combining the opportunity field with a numbers (virtual-auxiliary-table-numbers) table, thanks @ItzikBenGan, we can build out all 8000 combinations. Once we have all 8000 possible combinations we can randomly select a single combination.

Slot machine SQL logic

For this simulation we will need a few tables to store the combinations, the results log for our players, and a table for our players. The combinations table will store symbol for each column, a winning amount for the combination, and the number of opportunities the combination will appear. The results log will store the resulting combination, the winning amount, the player, and a timestamp the slot machine was played. The players table will store the players name and the available funds for playing the slot machine.

Table definitions

Combinations Table

CREATE TABLE Combinations (
  CombinationID INT PRIMARY KEY IDENTITY(1,1) NOT NULL
, Reel1 VARCHAR(5) NOT NULL
, Reel2 VARCHAR(5) NOT NULL
, Reel3 VARCHAR(5) NOT NULL
, WinningAmount INT NOT NULL
, Opportunities INT NOT NULL
) ON  [PRIMARY];

ResultsLog Table

CREATE TABLE ResultsLog (
ResultsLogID INT PRIMARY KEY IDENTITY(1,1) NOT NULL
, PlayerID INT NOT NULL 
, Reel1 VARCHAR(5) NOT NULL
, Reel2 VARCHAR(5) NOT NULL
, Reel3 VARCHAR(5) NOT NULL
, WinningAmount INT NOT NULL
, ResultDate DATETIME NOT NULL
);

Player Table

CREATE TABLE Player (
PlayerID INT PRIMARY KEY IDENTITY(1,1) NOT NULL, 
, Name VARCHAR(100) NOT NULL
, AvailableFunds INT NOT NULL
);

Building the Combinations Dataset

Using a CROSS APPLY between the combinations table data and Itzik Ben-Gan’s numbers common table expression will yield 8000 total possible combinations.

WITH L0 AS (SELECT 1 AS C UNION ALL SELECT 1 AS O) , -- 2 rows
     L1 AS (SELECT 1 AS C FROM L0 AS A CROSS JOIN L0 AS B), --4 rows
     L2 AS (SELECT 1 AS C FROM L1 AS A CROSS JOIN L1 AS B),-- 16 rows
     L3 AS (SELECT 1 AS C FROM L2 AS A CROSS JOIN L2 AS B), -- 256 rows
     Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Number FROM L3)
SELECT  Reel1
        , Reel2
        , Reel3
        , WinningAmount
FROM    Combinations cbn
CROSS APPLY Nums nms
WHERE   nms.Number <= cbn.Opportunities;

Selecting a winning combination

Using TSQL’s random function we can select a random winning combination. The RAND function takes a seed value to start the randomization process, if the same seed is provided, the same random value will be returned. So we will use the NEWID and CHECKSUM functions to gather a random seed value. To get a value between 0 – 7999, we can multiply the RAND function result with 8000 and cast as an integer. Adding one gives us a random value between 1 and 8000.

SELECT CAST(RAND(CHECKSUM(NEWID())) * 8000 AS INT) + 1

Adding a bit of SQL ingenuity we can order the combinations dataset by the random value function. This will completely randomize the combinations dataset. All that’s left to do is select a single record, say the first record.

WITH L0 AS (SELECT 1 AS C UNION ALL SELECT 1 AS O) , -- 2 rows
     L1 AS (SELECT 1 AS C FROM L0 AS A CROSS JOIN L0 AS B), --4 rows
     L2 AS (SELECT 1 AS C FROM L1 AS A CROSS JOIN L1 AS B),-- 16 rows
     L3 AS (SELECT 1 AS C FROM L2 AS A CROSS JOIN L2 AS B), -- 256 rows
     Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Number FROM L3)
SELECT TOP 1
        Reel1
      , Reel2
      , Reel3
      , WinningAmount
FROM    Combinations cbn
CROSS APPLY Nums nms
WHERE   nms.Number <= cbn.Opportunities
ORDER BY CAST(RAND(CHECKSUM(NEWID())) * 8000 AS INT) + 1;

Time to play the slot

Running the query 5 times yielded the following results.

Reel1 Reel2 Reel3 Winning Amount
_ BB B 0
BBB 7 BB 0
B B _ 0
B W _ 0
7 BBB _ 0

Hmmmm… I didn’t win a single time… I think I successfully created a slot machine.

Log the results

All that’s left is to save the results of each spin and update the players available funds. A simple insert into the Results log, including the player, reels combination, winning amount and time-stamp will handle the logging side.

DECLARE @Reel1 NVARCHAR(5);
DECLARE @Reel2 NVARCHAR(5);
DECLARE @Reel3 NVARCHAR(5);
DECLARE @WinningAmount INT;
 
WITH L0 AS (SELECT 1 AS C UNION ALL SELECT 1 AS O) , -- 2 rows
     L1 AS (SELECT 1 AS C FROM L0 AS A CROSS JOIN L0 AS B), --4 rows
     L2 AS (SELECT 1 AS C FROM L1 AS A CROSS JOIN L1 AS B),-- 16 rows
     L3 AS (SELECT 1 AS C FROM L2 AS A CROSS JOIN L2 AS B), -- 256 rows
     Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS Number FROM L3)
SELECT TOP 1
        @Reel1 = Reel1
      , @Reel1 = Reel2
      , @Reel1 = Reel3
      , @WinningAmount = WinningAmount
FROM    Combinations cbn
CROSS APPLY Nums nms
WHERE   nms.Number <= cbn.Opportunities
ORDER BY CAST(RAND(CHECKSUM(NEWID())) * 8000 AS INT) + 1;
 
INSERT INTO dbo.ResultsLog (PlayerID, Reel1, Reel2, Reel3, WinningAmount, ResultDate)
VALUES  (
  	 529
	,@Reel1
	,@Reel2
	,@Reel3
	,@WinningAmount
	,GETDATE()
);

A simple update will handle the players available funds.

UPDATE  Player
SET     AvailableFunds = AvailableFunds + @WinningAmount - 1
WHERE   PlayerID = 529;

Summary

Obviously this is a simplified implementation of a 3 reel slot machine but it meets the statistical opportunities for all combinations based on the number of symbols per reel. So using some ingenious ideas, a numbers table from Itzik Ben-Gan, and standard T-SQL functionality, we have a functional SQL slot machine that logs its results and keeps players hard earned virtual money. All that’s left to do is book a trip to Las Vegas…

DONE!combacksoon

 

SSIS Find Non-Imported Files

Traversing a folder looking for files to import into a database table is a common practice using integration services. What if you have little control over the import folder or files because of security restrictions? What if there are many files stored in the folder and number of files grows daily? One way to deal with this is using a Foreach Loop Container, call to a database table or log and determine if the file had previously been imported. This may be a sufficient solution if the number of files is small, however if the number of files is larger this could have a big impact on database performance and/or package performance. A better approach may be to use a Script Task. In this blog post I will show how we can use a Script Task with some simple C# code to capture a list of files, remove processed files, and use a Foreach Loop Container to traverse the new files.

File Import Table

Before we get into creating the Integration Services package, we will need a FileImport table to keep a list of all files that have already been imported.

CREATE TABLE dbo.FileImport(
     FileImportID INT IDENTITY(1, 1) NOT NULL
   , FileName NVARCHAR(1000) NOT NULL
   , ImportDate DATETIME NULL
   , RecordCount INT NULL
   , ErrorMessage NVARCHAR(MAX) NULL   
) ON  [PRIMARY];

New File Import SSIS package

The New File Import SSIS package will query the FileImport table to capture a list of previously imported files as well as capturing a list of available files in the import folder. Using some code we will eliminate files that have already been imported and then loop over this new set of files that have not been imported.

Let’s start by creating a new SSIS Package and named it NewFileImport. We will use an OLE DB connection to query the FileImport table and a few package variables to store processing data. The following package variables will be created for the associated purposes.

Trav_SSIS Variables

Variable Definitions

  • The FileName variable will be used store each of the new files as we traverse the new files using a Foreach Loop Container.
  • The ImportedFiles variable will be used to store the previously imported files that is received from FileImport Table
  • The ImportedPath variable is the path where the files to be imported are stored.
  • The NewFiles variable is a list of files that have not been imported into the system yet.

Package Tasks

The package will use a SQL Task, Script Task and Foreach Loop Container.Trav_SSIS Tasks

SQL – Get a list of all processed files

This Task will capture a list of all the files that have been previously imported files and will store them into the ImportedFiles variable. The ResultSet should be set to “Full result set” and a simple query to the FileImport table “SELECT FileName FROM dbo.FileImport;” should be saved in the SQLStatement.

Trav_SQLTaskGeneral

The Result Set should be saved to the User::ImportedFiles variable.

Trav_SQLTaskResultSet

SCR – Get a list of files to Import

This task will create a data set and array list object that will store all of the file names in the FileImport Table and all the file names in the Import Path. A loop will be used to traverse the files names in FileImport object. On each pass through the loop the corresponding file name in the Import Path object will be removed. When the loop is complete a list of only new file names will be remaining in the Import Path object. This Import Path object will be stored in the NewFiles variable and used in the Foreach Loop Container. The following C# script implements this logic.

public void Main(){			
   OleDbDataAdapter da = new OleDbDataAdapter();
   DataTable dt = new DataTable();
   object ImportedFiles = Dts.Variables["ImportedFiles"].Value;
   string ImportedPath = Dts.Variables["ImportedPath"].Value.ToString();
 
   // Load the processed files from the record set into a DataTable variable.              
   da.Fill(dt, ImportedFiles);
 
   // Load the files to be imported from the directory to be imported into an 
   // ArrayList variable.                        
   ArrayList ImportDataFiles = new ArrayList(Directory.GetFiles(ImportedPath));
 
   // Traverse the DataTable and remove any matching files from the ArrayList.
   foreach (DataRow dr in dt.Rows) {
      ImportDataFiles.Remove(dr[0].ToString());
   }
 
   // Return the ArrayList which will only have new files.
   Dts.Variables["NewFiles"].Value = ImportDataFiles;
 
   Dts.TaskResult = (int)ScriptResults.Success;
}

The ImportedFiels, ImportPath and NewFiles are used in the script, so we will need to declare their use in the script.

Trav_ScriptTaskScript

FELC – Loop through all files

This Foreach Loop Container will loop through all of the files that are stored in the NewFiles variable. The enumerator type that will be used is the “Foreach From Variable Enumerator”. The selected Variable should be User::NewFiles.

Trav_FELCCollection

The Variable Mapping should store the result of each loop into the User::FileName variable.

Trav_FELCVariableMapping

On each iteration of the Foreach Loop Container, the full path to the file, from the NewFiles variable will be stored into the FileName variable. Having the path to the file will help in the file import process inside the Data Flow Task. The details of importing file data can be found in this Simple SSIS: Importing Data from Flat Text Files blog post.

Simple SSIS: Traversing Files in a Directory

There are many common daily tasks that (SSIS) SQL Server Integration Services implements with minimal effort. These tasks can be done by adding a few SSIS tasks, data flows, and containers. In this blog series I am going to explore some of these simple tasks and how I go about implementing them. Looping through a directory of files definitely qualifies as simple task using SSIS. In this blog post I will show how to traverse a set of files and log the files found using SSIS. This blog post builds on a previous Simple SSIS blog post, Importing Data from Flat Text Files

Prerequisites

We will need a database table called FileImport, which will store the file details, for the File Loop SSIS package.

FileImport

The File Import table is the historical logging table. This table contains all of the file details, such as start and end time, if it was imported of if an error occurred, as well as a path to the file.

CREATE TABLE dbo.FileImport (
     FileImportID INT IDENTITY(1, 1) NOT NULL
   , FileName NVARCHAR(1000) NOT NULL
   , ImportStartDate DATETIME NULL
   , ImportEndDate DATETIME NULL
   , RecordCount INT NULL
   , Imported BIT NULL
   , ErrorMessage NVARCHAR(MAX) NULL
   , CONSTRAINT PK_FileImport PRIMARY KEY CLUSTERED (FileImportID ASC)
) ON  [PRIMARY];

File Loop SSIS package

The File Loop SSIS package will traverse a specified folder capturing the path of each file stored in the folder. For each file, we will write some basic file details to a log table and import data from the file into a file details table. The details of the file import is left to a previous Simple SSIS blog post.

Let’s start by creating a new SSIS Package and renaming it to FileLoop.dtsx. The package will need a couple variables which are defined below.

FileLoop_SSIS_Variables

Variable Definitions

  • The ImportFolder is the folder where all of the files to be imported are stored.
  • The ImportPath is the actual path to each file. This will be updated on each iteration of the foreach loop container.

Package Tasks

The File Loop package will use a Foreach Loop Container, an Execute SQL Task, and a Data Flow Task.

FileLoop_SSIS_Tasks

FELC – Traverse the Import Path

This Foreach Loop Container will traverse the list of files that are stored in the designated folder. Using an expression we can dynamically set the Directory/Folder at runtime. For this import we are only looking for files that start with FileData and have a .txt extension. We will want the fully qualified path to the file and do not want to traverse the sub folders.

FileLoop_FELCCollection

On each iteration of the loop we will save the path to each of the files in the ImportPath SSIS variable.

FileLoop_FELCVarMapping

SQL – Insert the current file information into FileImport table

This Execute SQL Task will create a new entry into the FileImport Table and return the newly inserted identity value. We will need to set the Result Set to Single Row to capture the identity returned from the INSERT statement. Select the Connection type of OLE DB and the previously setup Connection. We will use a simple SQL Statement to Insert The file name and start date and return the identity.

INSERT  INTO dbo.FileImport (FileName, ImportStartDate)
VALUES  (?, GETDATE());
SELECT FileImportID = @@IDENTITY;

FileLoop_SQLTaskInsGeneral

We are only using a single input parameter so we will need to add and map the ImportPath variable. Select the Data Type as NVARCHAR, Parameter Name = 0, and Parameter Size = 1000.
NOTE: The Parameter Names are a numerical ordering because we are using OLE DB Connections. For other types of connection types see Parameters and Return Codes in the Execute SQL Task, https://msdn.microsoft.com/en-us/library/cc280502.aspx

FileLoop_SQLTaskInsMapping

DFT – Import File data into the File Details staging table

This Data Flow Task will be used to import each of the files into a staging table. In another Simple SSIS blog post the details for the file import process is documented.

Execution

After executing the File Loop SSIS package, all of the file paths that match the pattern FileData*.txt will have been placed in a temporary SSIS string variable, which can be used for other operations such as file imports. In my next Simple SSIS blog post I will review importing data from pipe delimited flat text files.