Importing CSV and Removing Data/Cells

Hello!

It’s been a while since I’ve posted but here I am again looking for help!

I have been tasked with somehow creating a way to import a CSV that has more than 2000 lines and scrubbing the data from it that isn’t needed.

The CSV Looks something like this:

“Plugin”,“Plugin Name”,“Family”,“Severity”,“IP Address”,“Protocol”,“Port”,“Exploit?”,“Repository”

“111111”,“Ping the remote host”,“Port scanners”,“Info”,“X.X.X.X”,“TCP”,“0”,“No”,“Individual Scan”

Then has data that ends up coming in different strings such as:

Object UUID :00000000000-0000-0000-000000000000
UUID : xxx-xxxx-xxxx-xxxxx-xxxxxx version 1.0
Description : Unknown RPC service
Annotation : Impl friendly name
Type : Local RPC service
Named pipe : LRPC-1234567

Then further down has webaddresses and hash numbers that start with 0x, and all of this data always shows up in the Plugin row.

The script I have somewhat works, but I find that when I try to go in and remove data, or look at the CSV in notepad the data I thought was removed actually isn’t, but doesn’t show up if I open it in CSV. I have tried using get-content, but it exports the CSV unformatted, and I’ve tried using other comparisons such as -notlike and -notmatching but they don’t seem to work either.

 

Imported/Exported Variables
$CSVData = Import-Csv -Path "C:\Test.csv" 
$MatchingExportFile = "C:\RemovedItems.csv"
$NonMatchingExportFile = "C:\NewOutput.csv"

$MatchingData = @()
$NonMatchingData = @()

#Strings to search and remove from excel - Possibility to Add More
$RegexStrings = '*Object UUID*', '*UUID*', '*Description*','*Annotation*','*Type*','*senssvc*','*Named*','*Windows*','*0x*','*3A*','*navigation*','*Value*','*+ CGI*','*Methods*','*Argument*','*- https*','*static*','*Attached*'
#Headers
$headerSelection = "Plugin", "Plugin Name","Family","IP Address","Port","Exploit","DNS Name","NetBIOS Name","Plugin Text","Synopsis","Description","Solution","Check Type","Version"

Foreach ($Row in $CSVData) {
$MatchFound = $False
Foreach ($TestString in $RegexStrings) {
If ($Row.Plugin -like $TestString) {
$MatchFound = $True
Break
}
}
If ($MatchFound) {$MatchingData += $Row}
Else {$NonMatchingData += $Row}
}

If ($MatchingData.Count -gt 0) {
$MatchingData | Export-Csv -Path $MatchingExportFile -Force -NoTypeInformation
Write-Host "Matching Data exported to $MatchingExportFile"
}
Else {Write-Host "No matching data to export!"}
If ($NonMatchingData.Count -gt 0) {
$NonMatchingData | Select $headerSelection | Export-Csv -Path $NonMatchingExportFile -Force -NoTypeInformation
Write-Host "Non-matching Data exported to $NonMatchingExportFile"
}
Else {Write-Host "No non-matching data to export!"}

My goal is to import the CSV, delete the cells with the data that is not needed, and delete other blank cells that show up, then export to new CSV. But I cannot get it to successfully do either.

Can somebody provide some input on what I may be doing wrong?

Thank you

You will definitely get help, but please post this in General QnA Forum. This forum is dedicated for Pester related queries.

Thanks I have moved it to the correct forum! Hopefully I can get some help now!

I could imagine that it would be helpful to have some sample source data and some sample targe data… I mean more than 1 line and fomratted as code … the actual data. :wink:

Here is some pseudo code, but you can probably do something like this with a Regex OR and calculated expression

$regex = 'Object UUID|UUID|Description|Annotation|Type|senssvc|Named|Windows|0x|3A|navigation|Value|CGI|Methods|Argument|https|static|Attached'

$test = @()
$test += [pscustomobject]@{
    Plugin = 'This Annotation a string with a UUID in it'
}
$test += [pscustomobject]@{
    Plugin = 'This Description is awesome'
}
$test += [pscustomobject]@{
    Plugin = 'Nothing to match here'
}
$test += [pscustomobject]@{
    Plugin = 'dsdgsWindowssgafgafg'
}

$results = $test | 
           Select Plugin, 
                  @{Name='Match';Expression={$_.Plugin -match $regex}}

Output:

PS C:\> $results

Plugin                                     Match
------                                     -----
This Annotation a string with a UUID in it  True
This Description is awesome                 True
Nothing to match here                      False
dsdgsWindowssgafgafg                        True

Then you can simply filter the results and Export to CSV or whatever else you want to do:

$results | Where{$_.Match -eq $true} | Export-CSV -Path C:\mymatches.csv -NoTypeInformation

Thanks Rob I will see if I can test it today!

Olaf,

I have uploaded a sample CSV here I just make generic data however this is essentially what it looks like, but about 2000+ lines. The data that we are looking to remove is always in the Plugin field.

Hmmm … am I wrong or do you want to remove all lines where you have more than 3 empty cells next to each other? That should be quite forward.

Get-Content -Path C:\sample\sample.csv | 
    Where-Object {$_ -notlike '*,,,,,,,*'}

… works for me … at least with the sample data you provided.
Now it should be easy to use the data just like any other CSV data.

You aren’t wrong about that. I do want to delete the empty lines, but I’m also trying to remove the UUID down to the named pipe, as well as the url and the lines that have the random alphanumeric combinations which occur.

Did you try the code I posted?

Do the original data have the format the sample data have? … like this:

Plugin,Plugin Name,Family,Severity,IP Address,Protocol,Port,Expl,Repository,MAC Address,DNS Name,NetBIOS Name,Plugin Text,Discovered,Last Observed,Frameworks,Synopsis,Description,Solution
12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,Plugin Output: Output Goes Here,Jan-19,Jan-19,Synopsis Goes Here,Description Here,n/a
12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,Plugin Output: Output Goes Here,Jan-19,Jan-19,Synopsis Goes Here,Description Here,n/a
12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,Plugin Output: Output Goes Here,Jan-19,Jan-19,Synopsis Goes Here,Description Here,n/a
,
: senssvc,
Object UUID : 00000000-0000-0000-0000-000000000000,
UUID : abcd-1234-abcd-1234,
Description : Words Usually Follow,
Annotation : Something goes here,
Named pipe : Words,
,
12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,Plugin Output: Output Goes Here,Jan-19,Jan-19,Synopsis Goes Here,Description Here,n/a
12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,Plugin Output: Output Goes Here,Jan-19,Jan-19,Synopsis Goes Here,Description Here,n/a
,
,
0000xxx000023231231,
0000xxx000023231231,
0000xxx000023231231,
0000xxx000023231231,
0000xxx000023231231,
,
12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,Plugin Output: Output Goes Here,Jan-19,Jan-19,Synopsis Goes Here,Description Here,n/a
12345,Example Name,Family Info goes here,Sev,000.000.00.0,Protocl,0,No,Scan,Plugin Output: Output Goes Here,Jan-19,Jan-19,Synopsis Goes Here,Description Here,n/a
,
,
http://www.worldwideweb.com,
http://www.worldwideweb.com,
http://www.worldwideweb.com,
http://www.worldwideweb.com,

OK, this one here should even work when the original data is not valid CSV.

Get-Content -Path C:\sample\sample.csv |
Where-Object {$_ -match ‘([a-zA-z0-9.\s]+,){9,}’}

It looks for at least 9 cells with something in it.

Olaf,

 

I tried what you put, the extra data seems to be coming from the plugin text. I am not sure why It is moving it into the plugin column though. When I ran your code it looked like it deleted all the plugin numbers from the column.

Keith,

hmmm … what does that mean? Did it solve your problem? If not - what exactly is your problem? You may post again some example data and some of the data in the format you desire.

If you are just trying to find the rows that have a no data vs “full” data row vs only plugin, then you can do that with basic filtering. Using the example Olaf was using, you could do this:

$csv = Import-CSV -Path C:\Plugin.csv
$csv.Count

#Remove null plugins
$csv2 = $csv | Where{$_.'Plugin'}
$csv2.Count

#Get all of the rows where there is more than just the Plugin
$csv3 = $csv2 | Where{$_.'Plugin Name'}
$csv3.Count

#Get all of the rows where there is only Plugin value
$csv4 = $csv2 | Where{!($_.'Plugin Name')}
$csv4.Count

Output:

All rows count: 29
Count after removing NULL: 22
Count of full row: 7
Count of rows with only a Plugin value: 15

If you just want the full rows, you can just do it with a one-liner:

$csv = Import-CSV -Path C:\Plugin.csv |
       Where{$_.'Plugin Name'}

So I am trying to delete the cells in Row A, that don’t have an actual plugin number. I found yesterday that the data I am trying to delete is actually coming from the Plugin Text cells and appending to the Plugin cell (row A), due to meeting the 32k excel/csv character limit. This is all coming from a Nessus Security Center report, which is a known issue with exporting these reports to a csv.

Excuse me, Keith, I don’t mean to offend you, but is it possible that our replies overwhelm you? If our code examples did not do what you expected - what was wrong?

No offense taken Olaf. I initially thought the code worked, but that was because I ran the code with only a few headers selected. When I ran the code with all of the headers selected, the Plugin Text column is where the data is generating from and reaching the 32k CSV limit and therefore appending it to the Plugins section. I believe this is just an issue with the Nessus client reports itself and may be able to be fixed by working through the client with a Tenable rep. I am going to attempt to contact them however and see if they can help. I appreciate your help

Get the (good) entries as objects while sending them to a file

[pre]

$objects = Get-Content -Path .\Bad.csv |
Where-Object {$_ -notlike ‘,’} |
Tee-Object -FilePath Good.csv |
ConvertFrom-Csv

[/pre]