Transforming and exporting SQL agent output to CSV

We have a SQL Server that exports a number of files on a regular basis.
For some reason the files exported via SQL Agent aren't CSV files but plain text files with some wonky formatting, and according to the people responsible for it there's no getting around that.

I’ve been tasked with cleaning up the files and exporting them as CSV files, which I really didn’t think would be a problem. But I think I may be overthinking this because I’m not getting it right.

An example file looks like this:

Job 'Export to [SYSTEM]' : Step 1, 'SQL-query DB version (Test)' : Began Executing 2022-10-19 11:45:01

DGGRP DGGRPBEN            
----- --------------------
11    Se.tj heltid tillsv.
12    Se.tj deltid tillsv.
13    Semtj hel an tj     
14    Semtj del an tj     
15    Uppeh.tj hel tillsv.

Both columns are padded with spaces, not tabs.
The desired output would be something like this:

DGGRP;DGGRPBEN
11;Se.tj heltid tillsv.
12;Se.tj deltid tillsv.
13;Semtj hel an tj
14;Semtj del an tj
15;Uppeh.tj hel tillsv.

It’s no problem skipping the first lines and trimming the ends of the lines, but for some reason I can’t get my thinking right when it comes to separating the two columns.
It seems that the width of each column is set by its widest element.

My first thought was going with something like this:

$Text = Get-Content -Path ExportFile.txt | Select-Object -Skip 2 | ForEach-Object {
    $_.Trim()
}
$Text = $Text -replace '\s{2}', ';'

I was thinking that would replace every run of more than one whitespace character with a semicolon, but \s{2} matches each pair of whitespace characters, so instead I just got double semicolons on each line except for the first two with the header and the dashes, as they only contain single spaces.
I’m probably overthinking this… I tend to do that, but I’m not really getting anywhere at the moment.
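(An editorial side note, not from the thread: the mismatch above is easy to reproduce outside PowerShell. A quick check with Python's re module, using the sample rows from the file, shows why neither simple quantifier alone solves this.)

```python
import re

line   = "11    Se.tj heltid tillsv."   # data row: four spaces of column padding
header = "DGGRP DGGRPBEN"               # header row: a single space between columns

# '\s{2}' matches each *pair* of whitespace characters, so a four-space
# gap is matched twice and yields two semicolons:
print(re.sub(r'\s{2}', ';', line))      # 11;;Se.tj heltid tillsv.

# '\s{2,}' collapses the whole run on data rows...
print(re.sub(r'\s{2,}', ';', line))     # 11;Se.tj heltid tillsv.

# ...but never matches the header, whose columns are only one space apart:
print(re.sub(r'\s{2,}', ';', header))   # DGGRP DGGRPBEN
```

That is why anchoring the replacement to the first column, as the next reply suggests, is needed rather than a quantifier on the whitespace alone.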

I also need to remove the second line with the dashes. I can probably brute-force it by ripping out the first lines and just adding the header line back in, but if there’s a better or more elegant solution I’m all ears.

You can use a positive lookbehind to replace only the first run of spaces at the front of the line. You also need to get rid of the dash line:

$text = (Get-Content -Path ExportFile.txt | Select-Object -Skip 2) -notmatch '^--.+'

$text -replace '(?<=^\S+)\s+',';' | ConvertFrom-Csv -Delimiter ';'

You can take a look at the regex demo here
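(Editorial cross-check, not from the thread: .NET regex, which PowerShell uses, supports the variable-width lookbehind (?<=^\S+); Python's re module only allows fixed-width lookbehinds, so the equivalent capture-group form is used below to verify the behavior on the sample lines.)

```python
import re

lines = ["DGGRP DGGRPBEN", "11    Se.tj heltid tillsv."]

# Equivalent to PowerShell's  -replace '(?<=^\S+)\s+',';' :
# keep the first column and replace only the whitespace run after it.
out = [re.sub(r'^(\S+)\s+', r'\1;', l) for l in lines]
print(out)   # ['DGGRP;DGGRPBEN', '11;Se.tj heltid tillsv.']
```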


Thank you Doug!

I know I need to dedicate some (more) time to Regex… It really makes a difference when dealing with poorly formatted text.
I was still stuck with the trailing space in the second column, but I managed to find another regex that solved that as well.
Here’s the current iteration of the script:

[CmdLetBinding()]
Param (
  [Parameter(
    Mandatory=$true,
    ValueFromPipeLine=$true
    )
  ]
  [string[]]$FileList
)

foreach ($File in $FileList) {
  $CsvFileName = Split-Path -Path $File -Leaf
  $CsvFileName = $CsvFileName -replace '\.txt$','.csv'
  $CsvFileName = "$(Get-Date -Format 'yyyyMMddHHmmss')-$($CsvFileName)"
  # Regex and POSH: strips the first two lines from the imported file and any line starting with two or more dashes
  $Text = (Get-Content -Path $File | Select-Object -Skip 2) -notmatch '^--.+'
  # Regex: the first replace finds the first run of whitespace and replaces it with a semicolon.
  # The second replace finds all whitespace at the end of each line and replaces it with nothing (basically a TrimEnd())
  $Text -replace '(?<=^\S+)\s+',';' -replace '[ \t]+$','' | Out-File -FilePath "C:\Temp\$($CsvFileName)" -Encoding utf8
}

I’m getting close to where I want to be with the script, but I’m missing something about accepting pipeline input.

The script works fine if I call it like this:

.\Export-DataBaseCsv.ps1 -FileList .\TestFile_1.txt, .\TestFile_2.txt, .\TestFile_3.txt

It outputs three CSV-files to the C:\Temp directory.
If I call it like this, however:

Get-Childitem T*.txt | .\Export-DataBaseCsv.ps1

It only creates a CSV file for the last text file in the listing.

Shouldn’t the script accept the file listing as an array, just like the manually written parameter list?

A separate question:
I’d prefer to slim down the $CsvFileName assignment, but when I tried to write it like this:

$CsvFileName = Split-Path -Path $File -Leaf | $_ -replace '.txt','.csv'

Or this:

$CsvFileName = Split-Path -Path $File -Leaf | ($_).Replace('.txt','.csv')

VSCode complained that expressions are only allowed as the first element in a pipeline.
I feel like there should be a working syntax for using -replace and similar in the pipeline, but my google-fu is failing me.

You’re making this way more complicated than it has to be. The *-Csv cmdlets will deal with extra whitespace for you, so use them to import/export. Next, if you actually get the FileInfo object of the file you can simply grab its BaseName. And finally, you can’t accept pipeline input properly without a process block.

[CmdLetBinding()]
Param (
    [Parameter(
        Mandatory=$true,
        ValueFromPipeLine=$true
    )]
    [string[]]$FileList
)

process {
    foreach ($File in $FileList) {
        $CsvFileName = "{0}-{1}.csv" -f (Get-Date -Format "yyyyMMddHHmmss"),(Get-Item $File).BaseName

        $text = (Get-Content -Path $File | Select-Object -Skip 2) -notmatch '^--.+'

        $text -replace '(?<=^\S+)\s+',';' |
            ConvertFrom-Csv -Delimiter ';' |
                Export-Csv -Path $CsvFileName -Delimiter ';' -NoTypeInformation
    }
}

This is why I appreciate coming here, thank you!
I know I tend to overthink and overcomplicate things.

The process{} block is what’s needed for the script to handle each item coming in from the pipeline, instead of just operating on the last item in the array, I’m guessing?

I will say that I did still need to add the second -replace to trim the end of the second column, as you can see here from Excel: without the second -replace the cursor sits a number of whitespace characters to the right of the text, while with it there’s no extraneous whitespace.
(screenshot: CsvTrimDifference)
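(Editorial illustration, not from the thread: a CSV parser keeps that trailing padding as part of the field unless it is trimmed first. A quick check with Python's csv module on a padded row like the sample above shows the difference.)

```python
import csv
import io

# Row as exported: the second column is padded out to the full column width.
line = "13;Semtj hel an tj     "

row = next(csv.reader(io.StringIO(line), delimiter=';'))
print(row)       # ['13', 'Semtj hel an tj     '] - the padding survives parsing

row = next(csv.reader(io.StringIO(line.rstrip()), delimiter=';'))
print(row)       # ['13', 'Semtj hel an tj']
```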

This is the state of the script now:

[CmdLetBinding()]
Param (
  [Parameter(
    Mandatory=$true,
    ValueFromPipeLine=$true
    )
  ]
  [string[]]$FileList
)

process {
  foreach ($File in $FileList) {
    $CsvFileName = "{0}-{1}.csv" -f (Get-Date -Format 'yyyyMMddHHmmss'),(Get-Item $File).BaseName
    # Regex and POSH: Strips the first two lines from the imported file and any lines starting with two or more dashes
    $Text = (Get-Content -Path $File | Select-Object -Skip 2) -notmatch '^--.+'
    # Regex: the first replace finds the first occurrence of whitespace and replaces it with a semicolon.
    # The next replace finds all whitespace at the end of each line and replaces it with nothing (basically a TrimEnd())
    $Text -replace '(?<=^\S+)\s+',';' -replace '[ \t]+$','' |
      ConvertFrom-Csv -Delimiter ';' |
      Export-Csv -Path "C:\Temp\$($CsvFileName)" -Delimiter ';' -Encoding utf8 -NoTypeInformation
  }
}

Interesting. I would probably just add a trim in that case.

[CmdLetBinding()]
Param (
    [Parameter(
        Mandatory=$true,
        ValueFromPipeLine=$true
    )]
    [string[]]$FileList
)

process {
    foreach ($File in $FileList) {
        $CsvFileName = "{0}-{1}.csv" -f (Get-Date -Format "yyyyMMddHHmmss"),(Get-Item $File).BaseName

        $text = (Get-Content -Path $File | Select-Object -Skip 2) -notmatch '^--.+'

        $text.Trim() -replace '(?<=^\S+)\s+',';' |
            ConvertFrom-Csv -Delimiter ';' |
                Export-Csv -Path $CsvFileName -Delimiter ';' -NoTypeInformation
    }
}