Is there a way to bulkcopy without sorting the data?

Given the following code, how can i achieve something like this:

pseudocode:

$sqlBulkCopy.WriteToServer($CSVDataTable)| No-sort

current code:

$CSVDataTable = Import-Csv $csvFile | Out-DataTable

# Build the sqlbulkcopy connection, and set the timeout to infinite
$sqlBulkCopy = New-Object ("Data.SqlClient.SqlBulkCopy") -ArgumentList $SqlConnection
$sqlBulkCopy.DestinationTableName = "$schemaName.[$csvFileBaseName]"
$sqlBulkCopy.bulkcopyTimeout = 0
$sqlBulkCopy.batchsize = 50000
$sqlBulkCopy.DestinationTableName = "$schemaName.[$csvFileBaseName]"

foreach ($column in $CSVDataTable.Columns) { 
    $sqlBulkCopy.ColumnMappings.Add($column.ColumnName, $column.ColumnName) > $null 
}

$sqlBulkCopy.WriteToServer($CSVDataTable)

Why do i want to do this?

we are bulk copying data from a csv file to sql table using code below. it works great, but one suggestion we received is to retain the order of the source data, i.e. do not sort it, because they connect eventually to a database in excel that has this data as is from the sql table, and its sorted currently which they wish its not. So the question is: how can we bulkcopy the data into the sql table in the order it is in the csv file, i.e. without sorting?

csv file (unsorted):

Report Year
2013
2014
2015
2016
2017
2018
2021
2022
2023
2024
2025
2026
2027
2028
2029
2019 0+12
2019 12+0
2019 8+4
2020 0+12
2020 1+11
BP19
BP20

sql table after bulkcopy (sorted):

sql table

of course, we have an option to add a column to the csv files that lists numbers sequentially, which would retain the order of the rest of the columns, for example:

sequence Report Year_sfx
1 2013
2 2014
3 2015
4 2016
5 2017
6 2018
7 2021
8 2022
9 2023
10 2024
11 2025
12 2026
13 2027
14 2028
15 2029
16 2019 0+12
17 2019 12+0
18 2019 8+4
19 2020 0+12
20 2020 1+11
21 BP19
22 BP20
but i prefer not having to do this because i have many csv files i'd have to do this for...

Good question. It looks like if you’re using bulk copy, there is no guarantee. This answer suggests you add an order column, which you might be able to programmatically while importing the CSV. Hopefully someone else has already figured this out and will chime in. I’ll keep thinking about it in the meantime.

The SQL engine is controlling the bulk data import and doing what is the most efficient way to import the data. Not sure why it’s automatically sorting dates, but the goal is to get data into the database fast, so adding sorting is just extra time. Here is a similar question:

https://www.codeproject.com/Questions/841643/Bulk-Copy-Sequential-insert-into-sql

If you’re trying to retain a non-sortable order, then you should add a sortable column for the application data layer to sort it for presentation, such as the sequence number.

[quote quote=227884]Good question. It looks like if you’re using bulk copy, there is no guarantee. This answer suggests you add an order column, which you might be able to programmatically while importing the CSV. Hopefully someone else has already figured this out and will chime in. I’ll keep thinking about it in the meantime.

[/quote]
thanks Doug, i checked out the link, the accepted answer is exactly what the other option i mentioned in my post, so thats something i could consider if theres no way to retain the same order while bulkcopying. I also tagged you in a previous thread you helped me out with in creating that export from excel to csv script. I figured since the code was perfected there, might as well tag you over there about ti instead of opening a new thread completely :slight_smile:

the second answer in the link suggests creating a staging table. but in my case, wouldnt the datatable technically be this “temporary” staging like table? i tested out the import-csv, it retains the order of the csv file, then i tested out the out-datatable after the import from csv, and it also retains the csv file order. so it looks like it gets sorted at the sqlbulkcopy part…really hoping its as simple as the pseudocode i mentioned, or at least close, because that would make things sooo much easier!

[quote quote=227893]The SQL engine is controlling the bulk data import and doing what is the most efficient way to import the data. Not sure why it’s automatically sorting dates, but the goal is to get data into the database fast, so adding sorting is just extra time. Here is a similar question:

https://www.codeproject.com/Questions/841643/Bulk-Copy-Sequential-insert-into-sql

If you’re trying to retain a non-sortable order, then you should add a sortable column for the application data layer to sort it for presentation, such as the sequence number.

[/quote]
yep, thats what we’re trying to do, retain a non-sortable order, essentially keeping the same order of the csv file data.

the sequence number is definitely an option, but if its avoidable in a much easier way such as piping a non-sort functionality to the sqlbulkcopy, then that would make things much simpler :slight_smile:

All I was trying to say is you don’t have to do your order column manually.
Given this CSV file

columna,columnb,columnc
a,b,c
a,b,c
a,b,c
a,b,c
a,b,c
a,b,c

You can add orderID programmatically.

$count = 0
import-csv C:\temp\original.csv | foreach{
    $_ | Select-Object @{N='OrderID';E={$count}},*
    $count++
} | Export-Csv c:\temp\exportwithorderid.csv -NoTypeInformation
import-csv C:\temp\exportwithorderid.csv

Output:

OrderID columna columnb columnc
------- ------- ------- -------
0       a       b       c      
1       a       b       c      
2       a       b       c      
3       a       b       c      
4       a       b       c      
5       a       b       c   

And if you would rather the orderID column be on the end, just switch the position of the properties.

Select-Object *,@{N='OrderID';E={$count}}

[quote quote=227923]All I was trying to say is you don’t have to do your order column manually.

Given this CSV file

PowerShell
8 lines
<textarea class="ace_text-input" style="opacity: 0; height: 17.9048px; width: 7.20119px; left: 45px; top: 0px;" spellcheck="false" wrap="off"></textarea>
1
2
3
4
5
6
7
8
columna,columnb,columnc
a,b,c
a,b,c
a,b,c
a,b,c
a,b,c
a,b,c
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
You can add orderID programmatically.
PowerShell
7 lines
<textarea class="ace_text-input" style="opacity: 0; height: 17.9048px; width: 7.20119px; left: 45px; top: 0px;" spellcheck="false" wrap="off"></textarea>
1
2
3
4
5
6
7
$count = 0
import-csv C:\temp\original.csv | foreach{
$_ | Select-Object @{N='OrderID';E={$count}},*
$count++
} | Export-Csv c:\temp\exportwithorderid.csv -NoTypeInformation
import-csv C:\temp\exportwithorderid.csv
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Output:
PowerShell
9 lines
<textarea class="ace_text-input" style="opacity: 0; height: 17.9048px; width: 7.20119px; left: 45px; top: 0px;" spellcheck="false" wrap="off"></textarea>
1
2
3
4
5
6
7
8
9
OrderID columna columnb columnc
------- ------- ------- -------
0 a b c
1 a b c
2 a b c
3 a b c
4 a b c
5 a b c
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
And if you would rather the orderID column be on the end, just switch the position of the properties.

<textarea class=“ace_text-input” style=“opacity: 0; height: 17.9048px; width: 7.20119px; left: 45px; top: 0px;” spellcheck=“false” wrap=“off”></textarea>

1
2
Select-Object *,@{N='OrderID';E={$count}}
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
[/quote] Ohhh that makes sense! Maybe I can add it to the import itself basically, not the csv file itself, and just pipe it directly to the out-datatable

That’s actually another good idea, I like it! Will try it out and let you know!

Another option:

$services = Get-Service | Select -First 5 -Property Name, Status

$services | Select @{Name='Index';Expression={[array]::IndexOf($services, $_)}}, Name, Status

Output:

Index Name          Status
----- ----          ------
    0 AarSvc_d253b Stopped
    1 AESMService  Running
    2 AJRouter     Stopped
    3 ALG          Stopped
    4 AppIDSvc     Stopped

[quote quote=227941]Another option:

PowerShell
4 lines
<textarea class="ace_text-input" style="opacity: 0; height: 18px; width: 6.59781px; left: 44px; top: 0px;" spellcheck="false" wrap="off"></textarea>
1
2
3
4
$services = Get-Service | Select -First 5 -Property Name, Status
$services | Select @{Name='Index';Expression={[array]::IndexOf($services, $_)}}, Name, Status
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Output:
PowerShell
8 lines
<textarea class="ace_text-input" style="opacity: 0; height: 18px; width: 6.59781px; left: 44px; top: 0px;" spellcheck="false" wrap="off"></textarea>
1
2
3
4
5
6
7
8
Index Name Status
----- ---- ------
0 AarSvc_d253b Stopped
1 AESMService Running
2 AJRouter Stopped
3 ALG Stopped
4 AppIDSvc Stopped
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
[/quote] [quote quote=227923]All I was trying to say is you don’t have to do your order column manually.

Given this CSV file

PowerShell
8 lines
<textarea class="ace_text-input" style="opacity: 0; height: 18px; width: 6.59781px; left: 44px; top: 0px;" spellcheck="false" wrap="off"></textarea>
1
2
3
4
5
6
7
8
columna,columnb,columnc
a,b,c
a,b,c
a,b,c
a,b,c
a,b,c
a,b,c
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
You can add orderID programmatically.
PowerShell
7 lines
<textarea class="ace_text-input" style="opacity: 0; height: 18px; width: 6.59781px; left: 44px; top: 0px;" spellcheck="false" wrap="off"></textarea>
1
2
3
4
5
6
7
$count = 0
import-csv C:\temp\original.csv | foreach{
$_ | Select-Object @{N='OrderID';E={$count}},*
$count++
} | Export-Csv c:\temp\exportwithorderid.csv -NoTypeInformation
import-csv C:\temp\exportwithorderid.csv
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Output:
PowerShell
9 lines
<textarea class="ace_text-input" style="opacity: 0; height: 18px; width: 6.59781px; left: 44px; top: 0px;" spellcheck="false" wrap="off"></textarea>
1
2
3
4
5
6
7
8
9
OrderID columna columnb columnc
------- ------- ------- -------
0 a b c
1 a b c
2 a b c
3 a b c
4 a b c
5 a b c
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
And if you would rather the orderID column be on the end, just switch the position of the properties.

<textarea class=“ace_text-input” style=“opacity: 0; height: 18px; width: 6.59781px; left: 44px; top: 0px;” spellcheck=“false” wrap=“off”></textarea>

1
2
Select-Object *,@{N='OrderID';E={$count}}
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
[/quote] ok so it turns out that because the table has a clustered index (PK) it wont work even with a sequence column, because the ordering is determined based on the Primary Key (the Report Year)

hi

Yes, but how is the data being consumed? The sequence allows you to do:

SELECT Sequence, 'Report Year'
FROM myTable
ORDER BY Sequence

Also, you should not have spaces in your headers, just asking for pain. Storing the data in the database is not the same as consuming the data in a report or application. It would be like this:

SELECT SEQ AS 'Sequence',
       REPORT_YEAR AS 'Report Year'
FROM myTable
ORDER BY Sequence

This is getting into Relational Data structure best practices, but variables and table headers should not have spaces.

[quote quote=228214]Yes, but how is the data being consumed? The sequence allows you to do:

PowerShell
4 lines
<textarea class="ace_text-input" style="opacity: 0; height: 18px; width: 6.59781px; left: 44px; top: 0px;" spellcheck="false" wrap="off"></textarea>
1
2
3
4
SELECT Sequence, 'Report Year'
FROM myTable
ORDER BY Sequence
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Also, you should not have spaces in your headers, just asking for pain. Storing the data in the database is not the same as consuming the data in a report or application. It would be like this:
PowerShell
5 lines
<textarea class="ace_text-input" style="opacity: 0; height: 18px; width: 6.59781px; left: 44px; top: 0px;" spellcheck="false" wrap="off"></textarea>
1
2
3
4
5
SELECT SEQ AS 'Sequence',
REPORT_YEAR AS 'Report Year'
FROM myTable
ORDER BY Sequence
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
This is getting into Relational Data structure best practices, but variables and table headers should not have spaces.

[/quote]
unfortunately, the data retrieval process is outta my scope because the tables are just the datasource that feeds data into a tabular cube. the data is fed through a process known as “Process Database”, which i can’t really specify a query in since its a backend microsoft process.

the users end up connecting to the cube though in excel…maybe they can specify the query there somehow? MDX/DAX perhaps?

as for the naming convention and spacing, ya i agree, i actually built our POC version by modifying every single column name to include underscores to eliminate the undesirable characters, but because we keep getting a new file everytime and they wanted us to recreate the cube with more dimensions, i just decided to roll it out however it is in the file. theres like 80 columns and i dont have time to redo this multiple times with different column names. so meh, it is what it is lol