Is there a way to bulkcopy without sorting the data?

cataster · May 13, 2020, 12:36pm

Given the following code, how can i achieve something like this:

pseudocode:

$sqlBulkCopy.WriteToServer($CSVDataTable)| No-sort

current code:

$CSVDataTable = Import-Csv $csvFile | Out-DataTable

# Build the sqlbulkcopy connection, and set the timeout to infinite
$sqlBulkCopy = New-Object ("Data.SqlClient.SqlBulkCopy") -ArgumentList $SqlConnection
$sqlBulkCopy.DestinationTableName = "$schemaName.[$csvFileBaseName]"
$sqlBulkCopy.bulkcopyTimeout = 0
$sqlBulkCopy.batchsize = 50000
$sqlBulkCopy.DestinationTableName = "$schemaName.[$csvFileBaseName]"

foreach ($column in $CSVDataTable.Columns) { 
    $sqlBulkCopy.ColumnMappings.Add($column.ColumnName, $column.ColumnName) > $null 
}

$sqlBulkCopy.WriteToServer($CSVDataTable)

Why do i want to do this?

we are bulk copying data from a csv file to sql table using code below. it works great, but one suggestion we received is to retain the order of the source data, i.e. do not sort it, because they connect eventually to a database in excel that has this data as is from the sql table, and its sorted currently which they wish its not. So the question is: how can we bulkcopy the data into the sql table in the order it is in the csv file, i.e. without sorting?

csv file (unsorted):

Report Year
2013
2014
2015
2016
2017
2018
2021
2022
2023
2024
2025
2026
2027
2028
2029
2019 0+12
2019 12+0
2019 8+4
2020 0+12
2020 1+11
BP19
BP20

sql table after bulkcopy (sorted):

of course, we have an option to add a column to the csv files that lists numbers sequentially, which would retain the order of the rest of the columns, for example:

sequence	Report Year_sfx
1	2013
2	2014
3	2015
4	2016
5	2017
6	2018
7	2021
8	2022
9	2023
10	2024
11	2025
12	2026
13	2027
14	2028
15	2029
16	2019 0+12
17	2019 12+0
18	2019 8+4
19	2020 0+12
20	2020 1+11
21	BP19
22	BP20

but i prefer not having to do this because i have many csv files i'd have to do this for...

krzydoug · May 13, 2020, 1:14pm

Good question. It looks like if you’re using bulk copy, there is no guarantee. This answer suggests you add an order column, which you might be able to programmatically while importing the CSV. Hopefully someone else has already figured this out and will chime in. I’ll keep thinking about it in the meantime.

rob-simmers · May 13, 2020, 1:24pm

The SQL engine is controlling the bulk data import and doing what is the most efficient way to import the data. Not sure why it’s automatically sorting dates, but the goal is to get data into the database fast, so adding sorting is just extra time. Here is a similar question:

If you’re trying to retain a non-sortable order, then you should add a sortable column for the application data layer to sort it for presentation, such as the sequence number.

cataster · May 13, 2020, 1:24pm

[quote quote=227884]Good question. It looks like if you’re using bulk copy, there is no guarantee. This answer suggests you add an order column, which you might be able to programmatically while importing the CSV. Hopefully someone else has already figured this out and will chime in. I’ll keep thinking about it in the meantime.

[/quote]
thanks Doug, i checked out the link, the accepted answer is exactly what the other option i mentioned in my post, so thats something i could consider if theres no way to retain the same order while bulkcopying. I also tagged you in a previous thread you helped me out with in creating that export from excel to csv script. I figured since the code was perfected there, might as well tag you over there about ti instead of opening a new thread completely

the second answer in the link suggests creating a staging table. but in my case, wouldnt the datatable technically be this “temporary” staging like table? i tested out the import-csv, it retains the order of the csv file, then i tested out the out-datatable after the import from csv, and it also retains the csv file order. so it looks like it gets sorted at the sqlbulkcopy part…really hoping its as simple as the pseudocode i mentioned, or at least close, because that would make things sooo much easier!

cataster · May 13, 2020, 1:27pm

[quote quote=227893]The SQL engine is controlling the bulk data import and doing what is the most efficient way to import the data. Not sure why it’s automatically sorting dates, but the goal is to get data into the database fast, so adding sorting is just extra time. Here is a similar question:

https://www.codeproject.com/Questions/841643/Bulk-Copy-Sequential-insert-into-sql

If you’re trying to retain a non-sortable order, then you should add a sortable column for the application data layer to sort it for presentation, such as the sequence number.

[/quote]
yep, thats what we’re trying to do, retain a non-sortable order, essentially keeping the same order of the csv file data.

the sequence number is definitely an option, but if its avoidable in a much easier way such as piping a non-sort functionality to the sqlbulkcopy, then that would make things much simpler

krzydoug · May 13, 2020, 1:54pm

All I was trying to say is you don’t have to do your order column manually.
Given this CSV file

columna,columnb,columnc
a,b,c
a,b,c
a,b,c
a,b,c
a,b,c
a,b,c

You can add orderID programmatically.

$count = 0
import-csv C:\temp\original.csv | foreach{
    $_ | Select-Object @{N='OrderID';E={$count}},*
    $count++
} | Export-Csv c:\temp\exportwithorderid.csv -NoTypeInformation
import-csv C:\temp\exportwithorderid.csv

Output:

OrderID columna columnb columnc
------- ------- ------- -------
0       a       b       c      
1       a       b       c      
2       a       b       c      
3       a       b       c      
4       a       b       c      
5       a       b       c

And if you would rather the orderID column be on the end, just switch the position of the properties.

Select-Object *,@{N='OrderID';E={$count}}

cataster · May 13, 2020, 2:07pm

[quote quote=227923]All I was trying to say is you don’t have to do your order column manually.

Given this CSV file

PowerShell

8 lines

1

2

3

4

5

6

7

8

columna,columnb,columnc

a,b,c

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

You can add orderID programmatically.

PowerShell

7 lines

1

2

3

4

5

6

7

$count = 0

import-csv C:\temp\original.csv | foreach{

$_ | Select-Object @{N='OrderID';E={$count}},*

$count++

} | Export-Csv c:\temp\exportwithorderid.csv -NoTypeInformation

import-csv C:\temp\exportwithorderid.csv

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Output:

PowerShell

9 lines

1

2

3

4

5

6

7

8

9

OrderID columna columnb columnc

------- ------- ------- -------

0 a b c

1 a b c

2 a b c

3 a b c

4 a b c

5 a b c

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

And if you would rather the orderID column be on the end, just switch the position of the properties.

1

2

Select-Object *,@{N='OrderID';E={$count}}

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

[/quote] Ohhh that makes sense! Maybe I can add it to the import itself basically, not the csv file itself, and just pipe it directly to the out-datatable

That’s actually another good idea, I like it! Will try it out and let you know!

rob-simmers · May 13, 2020, 2:29pm

Another option:

$services = Get-Service | Select -First 5 -Property Name, Status

$services | Select @{Name='Index';Expression={[array]::IndexOf($services, $_)}}, Name, Status

Output:

Index Name          Status
----- ----          ------
    0 AarSvc_d253b Stopped
    1 AESMService  Running
    2 AJRouter     Stopped
    3 ALG          Stopped
    4 AppIDSvc     Stopped

cataster · May 14, 2020, 10:29am

[quote quote=227941]Another option:

PowerShell

4 lines

1

2

3

4

$services = Get-Service | Select -First 5 -Property Name, Status

$services | Select @{Name='Index';Expression={[array]::IndexOf($services, $_)}}, Name, Status

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Output:

PowerShell

8 lines

1

2

3

4

5

6

7

8

Index Name Status

----- ---- ------

0 AarSvc_d253b Stopped

1 AESMService Running

2 AJRouter Stopped

3 ALG Stopped

4 AppIDSvc Stopped

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

[/quote] [quote quote=227923]All I was trying to say is you don’t have to do your order column manually.

Given this CSV file

PowerShell

8 lines

1

2

3

4

5

6

7

8

columna,columnb,columnc

a,b,c

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

You can add orderID programmatically.

PowerShell

7 lines

1

2

3

4

5

6

7

$count = 0

import-csv C:\temp\original.csv | foreach{

$_ | Select-Object @{N='OrderID';E={$count}},*

$count++

} | Export-Csv c:\temp\exportwithorderid.csv -NoTypeInformation

import-csv C:\temp\exportwithorderid.csv

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Output:

PowerShell

9 lines

1

2

3

4

5

6

7

8

9

OrderID columna columnb columnc

------- ------- ------- -------

0 a b c

1 a b c

2 a b c

3 a b c

4 a b c

5 a b c

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

And if you would rather the orderID column be on the end, just switch the position of the properties.

1

2

Select-Object *,@{N='OrderID';E={$count}}

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

[/quote] ok so it turns out that because the table has a clustered index (PK) it wont work even with a sequence column, because the ordering is determined based on the Primary Key (the Report Year)

rob-simmers · May 14, 2020, 10:49am

Yes, but how is the data being consumed? The sequence allows you to do:

SELECT Sequence, 'Report Year'
FROM myTable
ORDER BY Sequence

Also, you should not have spaces in your headers, just asking for pain. Storing the data in the database is not the same as consuming the data in a report or application. It would be like this:

SELECT SEQ AS 'Sequence',
       REPORT_YEAR AS 'Report Year'
FROM myTable
ORDER BY Sequence

This is getting into Relational Data structure best practices, but variables and table headers should not have spaces.

cataster · May 14, 2020, 11:51am

[quote quote=228214]Yes, but how is the data being consumed? The sequence allows you to do:

PowerShell

4 lines

1

2

3

4

SELECT Sequence, 'Report Year'

FROM myTable

ORDER BY Sequence

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Also, you should not have spaces in your headers, just asking for pain. Storing the data in the database is not the same as consuming the data in a report or application. It would be like this:

PowerShell

5 lines

1

2

3

4

5

SELECT SEQ AS 'Sequence',

REPORT_YEAR AS 'Report Year'

FROM myTable

ORDER BY Sequence

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

This is getting into Relational Data structure best practices, but variables and table headers should not have spaces.

[/quote]
unfortunately, the data retrieval process is outta my scope because the tables are just the datasource that feeds data into a tabular cube. the data is fed through a process known as “Process Database”, which i can’t really specify a query in since its a backend microsoft process.

the users end up connecting to the cube though in excel…maybe they can specify the query there somehow? MDX/DAX perhaps?

as for the naming convention and spacing, ya i agree, i actually built our POC version by modifying every single column name to include underscores to eliminate the undesirable characters, but because we keep getting a new file everytime and they wanted us to recreate the cube with more dimensions, i just decided to roll it out however it is in the file. theres like 80 columns and i dont have time to redo this multiple times with different column names. so meh, it is what it is lol

Topic		Replies	Views
Reading from Excel and Exporting to CSV? PowerShell Help	1	407	May 16, 2024
export data from csv to sql database PowerShell Help	15	538	May 16, 2024
Sorting Table, Results Not As Expected PowerShell Help	1	239	May 16, 2024
Bulk Insert Via Data.SqlClient.SqlBulkCopy PowerShell Help	1	813	May 16, 2024
Import-Csv and sort by end of column cell value PowerShell Help	4	436	May 16, 2024

Is there a way to bulkcopy without sorting the data?

Related topics