Map a DataTable Object as a SQLCommand Parameter

Background

I wrangle data from several environments in an inventory project. One of the newest environments is a SQL database that contains a ticketing system’s data. Rather than use a foreach() method that “upserts” and deletes SQL items Row-By-Arduous-Row, I sought a new strategy that utilized SQL’s powerful JOIN operators. This approach has played out extremely well on the enterprise level, and I want to share what I’ve learned about the PowerShell side of the operation. Comments and questions are encouraged—iron sharpens iron.

Insights

The goal of the code is to map a DataTable object to a single SqlCommand parameter, which is achieved in the following line.

$manageTargetTable.Parameters.Add([SqlParameter]::new("@dt",[SqlDbType]::Structured)).Value = $SalesOrderDetails;

The @dt SQL parameter is typed as a SQL User-Defined Table Type (UDTT), which is declared as a Table-Valued Parameter (TVP) in a SQL stored procedure (a sproc). A DataTable can be mapped to the TVP, which is typed by the [SqlDbType]::Structured declaration. I have yet to find a suitable alternative object that can be mapped to the TVP, not even a class that extends Data.DataTable. Therefore, I use one class as a blueprint for the table row and a PowerShell function to store each instance in the DataTable. This strategy has the additional benefit of skipping the need for a SQL temporary table (a tempdb object) or a persistent SQL staging table.
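
For clarity, here is the same mapping built step by step. When CommandType is StoredProcedure, the engine infers the UDTT from the sproc's signature; for an ad-hoc batch you must set the parameter's TypeName yourself. The UDTT name below is a hypothetical one, used only for illustration.

# The inline Parameters.Add() call from above, unrolled.
# "Inventory.SalesOrderDetailType" is a hypothetical UDTT name.
$tvp = [SqlParameter]::new("@dt",[SqlDbType]::Structured);
$tvp.TypeName = "Inventory.SalesOrderDetailType"; # required for ad-hoc SQL; inferred for sprocs
$tvp.Value = $SalesOrderDetails;
[void]$manageTargetTable.Parameters.Add($tvp);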

Some Prerequisites

  • Exploration of the source data.
  • A SQL table that stores the source data in its final form.
  • A SQL User-Defined Table Type.
    • The UDTT has a primary key to speed queries.
  • A SQL stored procedure that uses a Table-Valued Parameter.
    • The sproc executes UPDATE, INSERT, and DELETE statements with joins between the source table (the UDTT) and a target SQL table (see the sketch after this list).
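
For reference, below is a minimal sketch of what the SQL side can look like, wrapped in a here-string so it can be deployed with Invoke-Sqlcmd (SqlServer module). Only the sproc name and the @dt parameter come from the demo code; the UDTT name, the target table name, the column types, and the RowModifiedDate handling are illustrative assumptions and should be adapted to your schema.

$sqlSide = @'
-- Hypothetical UDTT; the demo DataTable's columns map to these by name.
CREATE TYPE Inventory.SalesOrderDetailType AS TABLE
(
    SalesOrderID          int          NOT NULL,
    SalesOrderDetailID    int          NOT NULL,
    CarrierTrackingNumber nvarchar(25) NULL,
    OrderQty              int          NOT NULL,
    ProductID             int          NOT NULL,
    SpecialOfferID        int          NOT NULL,
    UnitPrice             decimal(9,3) NOT NULL,
    UnitPriceDiscount     decimal(9,3) NOT NULL,
    LineTotal             float        NOT NULL,
    HashID                char(32)     NOT NULL,  -- MD5 hex string
    PRIMARY KEY (SalesOrderID, SalesOrderDetailID) -- speeds the joins below
);
GO
CREATE PROCEDURE Inventory.ManageSalesOrderDetails
    @dt Inventory.SalesOrderDetailType READONLY   -- TVPs must be READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- UPDATE rows present in both tables whose HashID differs.
    UPDATE t
       SET t.CarrierTrackingNumber = s.CarrierTrackingNumber,
           t.OrderQty              = s.OrderQty,
           t.ProductID             = s.ProductID,
           t.SpecialOfferID        = s.SpecialOfferID,
           t.UnitPrice             = s.UnitPrice,
           t.UnitPriceDiscount     = s.UnitPriceDiscount,
           t.LineTotal             = s.LineTotal,
           t.HashID                = s.HashID,
           t.RowModifiedDate       = SYSUTCDATETIME()
      FROM Inventory.SalesOrderDetails AS t
      JOIN @dt AS s
        ON s.SalesOrderID = t.SalesOrderID
       AND s.SalesOrderDetailID = t.SalesOrderDetailID
     WHERE s.HashID <> t.HashID;

    -- INSERT rows present in the source but not in the target.
    INSERT Inventory.SalesOrderDetails
           (SalesOrderID, SalesOrderDetailID, CarrierTrackingNumber, OrderQty,
            ProductID, SpecialOfferID, UnitPrice, UnitPriceDiscount, LineTotal, HashID)
    SELECT s.SalesOrderID, s.SalesOrderDetailID, s.CarrierTrackingNumber, s.OrderQty,
           s.ProductID, s.SpecialOfferID, s.UnitPrice, s.UnitPriceDiscount, s.LineTotal, s.HashID
      FROM @dt AS s
      LEFT JOIN Inventory.SalesOrderDetails AS t
        ON t.SalesOrderID = s.SalesOrderID
       AND t.SalesOrderDetailID = s.SalesOrderDetailID
     WHERE t.SalesOrderID IS NULL;

    -- DELETE rows present in the target but not in the source.
    DELETE t
      FROM Inventory.SalesOrderDetails AS t
      LEFT JOIN @dt AS s
        ON s.SalesOrderID = t.SalesOrderID
       AND s.SalesOrderDetailID = t.SalesOrderDetailID
     WHERE s.SalesOrderID IS NULL;

    RETURN 0; -- the demo voids this on the PowerShell side
END
'@;
Invoke-Sqlcmd -ServerInstance 'MySQLServerName' -Database 'MySQLDatabase' -Query $sqlSide;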

Demo Code

I chose a CSV file as the data source because the format is a familiar one on this forum. It contains 121,317 rows from the AdventureWorks 2014 SQL database. More information is in the Remarks section.

using namespace System.Collections.Generic;
using namespace System.Data;
using namespace System.Data.SqlClient;
using namespace System.IO;
using namespace Microsoft.SqlServer.Management.Smo;

[void][reflection.assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo"); # deprecated but concise; Import-Module SqlServer is the modern alternative.

Class SalesOrderDetail
{
    [string] $CarrierTrackingNumber;
    [float] $LineTotal;
    [int] $OrderQty;
    [int] $ProductID;
    [int] $SalesOrderDetailID;
    [int] $SalesOrderID;
    [int] $SpecialOfferID;
    [decimal] $UnitPrice;
    [decimal] $UnitPriceDiscount; # decimal to match the ctor parameter and the DataTable column.
    [string] $HashID;

    SalesOrderDetail(){} # an empty default ctor.
    SalesOrderDetail([string]$trackNbr,[float]$lineTot,[int]$quantity,[int]$prodId,[int]$detailId,[int]$orderId,[int]$offerId,[decimal]$price,[decimal]$discountPrice)
    {
        $this.CarrierTrackingNumber = $trackNbr;
        $this.LineTotal = $lineTot;
        $this.OrderQty = $quantity;
        $this.ProductID = $prodId;
        $this.SalesOrderDetailID = $detailId;
        $this.SalesOrderID = $orderId;
        $this.SpecialOfferID = $offerId;
        $this.UnitPrice = $price;
        $this.UnitPriceDiscount = $discountPrice;
        $this.HashID = $this.GenerateHash(($trackNbr,$lineTot,$quantity,$price,$discountPrice)-join'');
    }

    [string] GenerateHash([string] $p1)
    {
        $paramStream = [MemoryStream]::new();
        $streamWriter = [StreamWriter]::new($paramStream);
        $streamWriter.Write($p1);
        $streamWriter.Flush();
        $paramStream.Position = 0;
        # Hash the stream, then dispose the writer (which also disposes the underlying stream).
        $hash = (Get-FileHash -InputStream $paramStream -Algorithm MD5).Hash;
        $streamWriter.Dispose();
        return $hash;
    }
}

$SalesOrderDetails = [DataTable]::new("SalesOrderDetails","Inventory")
    [void]$SalesOrderDetails.Columns.Add("CarrierTrackingNumber",[string]);
    [void]$SalesOrderDetails.Columns.Add("LineTotal",[float]);
    [void]$SalesOrderDetails.Columns.Add("OrderQty",[int]);
    [void]$SalesOrderDetails.Columns.Add("ProductID",[int]);
    [void]$SalesOrderDetails.Columns.Add("SalesOrderDetailID",[int]);
    [void]$SalesOrderDetails.Columns.Add("SalesOrderID",[int]);
    [void]$SalesOrderDetails.Columns.Add("SpecialOfferID",[int]);
    [void]$SalesOrderDetails.Columns.Add("UnitPrice",[decimal]);
    [void]$SalesOrderDetails.Columns.Add("UnitPriceDiscount",[decimal]);
    [void]$SalesOrderDetails.Columns.Add("HashID",[string]);
    $SalesOrderDetails.PrimaryKey = @($SalesOrderDetails.Columns["SalesOrderID"],$SalesOrderDetails.Columns["SalesOrderDetailID"]); # composite key; used for source data validation.

    function Add-SalesOrderDetail([SalesOrderDetail] $item)
    {
        $row = $SalesOrderDetails.NewRow();
        $row.CarrierTrackingNumber = $item.CarrierTrackingNumber;
        $row.LineTotal = $item.LineTotal;
        $row.OrderQty = $item.OrderQty;
        $row.ProductID = $item.ProductID;
        $row.SalesOrderID = $item.SalesOrderID;
        $row.SalesOrderDetailID = $item.SalesOrderDetailID;
        $row.SpecialOfferID = $item.SpecialOfferID;
        $row.UnitPrice = $item.UnitPrice;
        $row.UnitPriceDiscount = $item.UnitPriceDiscount;
        $row.HashID = $item.HashID;
        $SalesOrderDetails.Rows.Add($row);
    }

$source = (Import-Csv A:\SalesOrderDetail.csv -Header 'SalesOrderID','SalesOrderDetailID','CarrierTrackingNumber','OrderQty','ProductID','SpecialOfferID','UnitPrice','UnitPriceDiscount','LineTotal');

foreach($i in $source)
{
    $order = [SalesOrderDetail]::new($i.CarrierTrackingNumber,$i.LineTotal,$i.OrderQty,$i.ProductID,$i.SalesOrderDetailID,$i.SalesOrderID,$i.SpecialOfferID,$i.UnitPrice,$i.UnitPriceDiscount)
    Add-SalesOrderDetail $order;
}

[Server]$targetServer = "MySQLServerName";
[Database]$targetDatabase = $targetServer.Databases.Item("MySQLDatabase");

$targetConnection = [SQLConnection]::new("Server=$($targetServer.Name); Database=$($targetDatabase.Name); Integrated Security=true");

$targetConnection.Open()
    $manageTargetTable = [SqlCommand]::new("Inventory.ManageSalesOrderDetails",$targetConnection);
    $manageTargetTable.CommandType = [CommandType]::StoredProcedure;
    $manageTargetTable.Parameters.Add([SqlParameter]::new("@dt",[SqlDbType]::Structured)).Value = $SalesOrderDetails;
    [void]$manageTargetTable.ExecuteNonQuery(); #void b/c my sql sprocs return a 0 when successful. 
$targetConnection.Close()
return

Results

The result set (not reproduced here) is the first 3 rows for each UnitPriceDiscount by RowModifiedDate. The data are unmodified from the AdventureWorks data, except that I use decimal(9,3) on the two Unit~ elements. I withheld some rows from the initial insert and loaded them later to produce different RowModifiedDate values, some of which appear in the results.

Remarks

  • Data exploration is “homework” that allows for better modeling and architectures downstream.
  • I put most elements in alphabetical order to keep track of them as I move system-to-system and language-to-language.
  • The primary key on the DataTable is used for source data validation.
  • HashID is used in the sproc to determine if one or more changes have occurred in a subset of the tuple members.
  • (SQL results) RowModifiedDate is SYSUTCDATETIME() by default.
  • (SQL results) I do not use the money SQL datatype; I format the decimal values to currency (and percentages) in SQL views after calculations or aggregations (see the sketch after this list).
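
A minimal sketch of that last remark, assuming SQL Server 2012+ for FORMAT(); the view and table names are hypothetical.

$viewDdl = @'
CREATE VIEW Inventory.vSalesOrderDetails
AS
SELECT SalesOrderID,
       SalesOrderDetailID,
       FORMAT(UnitPrice, 'C', 'en-US') AS UnitPrice,        -- currency string
       FORMAT(UnitPriceDiscount, 'P2') AS UnitPriceDiscount -- percentage string
  FROM Inventory.SalesOrderDetails;
'@;
Invoke-Sqlcmd -ServerInstance 'MySQLServerName' -Database 'MySQLDatabase' -Query $viewDdl;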

See Also

The fastest method I’ve found to get data into SQL is a BULK insert into a staging table, followed by executing a MERGE SQL script/stored proc:

Loading Data With Powershell – SQLServerCentral

SQL Server MERGE to insert, update and delete at the same time (mssqltips.com)

Another benefit of this is that you can capture deltas between the STAGING and PROD tables to track changes in the data for audit. The slowest part is the capture of the data and transforming it to a DataTable, but once you do that you can do millions of rows in a couple of minutes.
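
For reference, a minimal sketch of the bulk-copy step described above, assuming a hypothetical staging table (Staging.SalesOrderDetails) whose columns match the DataTable built earlier:

using namespace System.Data.SqlClient;

$bulkConnection = [SqlConnection]::new("Server=MySQLServerName; Database=MySQLDatabase; Integrated Security=true");
$bulkConnection.Open();
$bulkCopy = [SqlBulkCopy]::new($bulkConnection);
$bulkCopy.DestinationTableName = "Staging.SalesOrderDetails"; # hypothetical staging table
$bulkCopy.BatchSize = 5000;                                   # tune per environment
$bulkCopy.WriteToServer($SalesOrderDetails);                  # streams the DataTable to staging
$bulkCopy.Close();
$bulkConnection.Close();
# ...then EXEC the MERGE script/stored proc against the staging table.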

Many thanks, @rob-simmers, for the reply!

I don’t have actual timing captures from the SQL Server side, but the TVP upsert/delete is sub-second on the enterprise servers for ~90K rows (from watching a debugger). It would be interesting to see numbers on whether this method outperforms the overall bulk insert/staging/prod method by streamlining the overall ETL.

I currently use PowerShell for bulk inserts of Active Directory User and Group data (we have a massive forest). Data manipulation is done prior to the insert, allowing me to skip the transforms downstream. It’s pretty much a lock for those types of data.

Thanks again! And thanks for the links. Our DBA team has kittens anytime they see a MERGE statement go by because of the bad code pushed by some folks. PowerShell has allowed me to sneak MERGE past them, even statements that included HASHBYTES().

-d

Pulling out of a SQL conversation in a PowerShell forum :parachute: