Delete all text before a particular XML tag in PowerShell

Hi All

I have a requirement where I want to remove all the text before a particular XML tag. Also, I don't want any junk characters left in the file once this conversion happens.

Sample xml file sample.xml

 










As shown in my sample.xml file, I want to create a new XML file in a different path from sample.xml, with all the text before a particular tag deleted. My target XML would be as below:







Or, in other words, I want my target XML file to contain only what lies between a particular pair of tags.




Instead of my file starting with line 1 and line 2, I want PowerShell to trim off line 1 and line 2 so that I am left with just line 3 in all my XML files.

line 1 -
line 2-
line 3-

Did you try

Get-Content -Path 'Your Sample XML File' | Select-Object -Skip 2
?

Hi Olaf

Thanks for your reply!

This will not work because the XML files are not always formatted the same way. The leading text can span two separate lines, and at times it is all on the same line.
So -Skip 2 will not work.

I am looking for a script that can trim off all the text in an XML file before a particular keyword and write the result to a new file in a separate directory.

Thanks
Rahul Kumar

If you know this particular keyword, you just have to search for it and delete everything in front of it. What’s the actual problem?

Hey,

Here is one way. Probably not the most elegant, and it does not export the new file at the end, but I imagine you can sort that part out.

#grab your file
$file = Get-Content -Path C:\MyScripts\myFile.txt

#join the array of lines into a single string, keeping the line breaks
$text = $file -join [Environment]::NewLine

#find the position of your keyword (IndexOf returns -1 if it is not found)
$index = $text.IndexOf("line 3")
if ($index -lt 0) { throw "Keyword not found" }

#grab both sides - before and after your keyword
$beforeKeyword = $text.Substring(0,$index)

#here is the string you want. export it somehow.
$afterKeyword = $text.Substring($index)

If you see this, Olaf, please show me the method you mentioned.

Thank you

EDIT

I just found this Where() method, and it is awesome. You can use ‘SkipUntil’, if you have a keyword to use. Like this…

#grab your file
$file = Get-Content -Path C:\MyScripts\myFile.txt

#set your keyword
$keyword = "my keyword"

#use the Where() method with a scriptblock to match on $keyword, and skip everything in the collection until keyword is found
$keepAfterKeyword = $file.Where({$_ -match $keyword}, 'SkipUntil')

Pretty awesome.
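
Two caveats with this approach. The Where() method was introduced in PowerShell 4.0, so it is not available on older versions. Also, -match treats $keyword as a regular expression, so a keyword such as '<?xml' would not be read literally (the "?" makes the "<" optional, and a line that merely contains "xml" would match). A minimal sketch of escaping the keyword first, using made-up sample lines in place of a real file:

```powershell
# Hypothetical sample lines standing in for the output of Get-Content
$lines = @(
    'notes about the xml export',
    'generated by some tool',
    '<?xml version="1.0"?>',
    '<root>',
    '</root>'
)

# The keyword contains "?", which is a regex quantifier
$keyword = '<?xml'

# Escape it so that -match compares the literal characters
$pattern = [regex]::Escape($keyword)

# Skip every line until the first one that contains the literal keyword
$kept = $lines.Where({ $_ -match $pattern }, 'SkipUntil')
```

Here $kept starts at the '<?xml version="1.0"?>' line. Anything that sits before the keyword on that same line is still kept, because the filter works on whole lines.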

Thanks, let me try this.

Hi Olaf,

I imagine the problem is that Rahul does not know how to do what you are suggesting. Please have a look at my methods above and share yours. We can all learn something :).

Thank you

SkipUntil is not working.

Error below:

Method invocation failed because [System.Object] doesn't contain a method named 'Where'.
At line:6 char:33
+ $keepAfterKeyword = $file1.Where <<<< ({$_ -match $keyword}, 'SkipUntil')
    + CategoryInfo          : InvalidOperation: (Where:String) [], RuntimeException
    + FullyQualifiedErrorId : MethodNotFound

Where is your code?

Hi

Thanks for your reply!

I think I am unable to post code here. I have XML files where I have to remove everything before
a particular tag. The way the XML files are created, that DTD tag has no fixed position.
It can be on line 1, line 2, or line 3, so we cannot rely on line numbers. If we
can remove everything in the file before that specific tag and write the contents into a new file,
then that should be okay.

Thanks

I am not referencing any line numbers. I am using Get-Content and looking for a keyword. That is what you are asking to do.

Post the code you are using. Thanks


$file_temp = "C:\DTD_R2_RAW"
$xml_in = "C:\DTD_R2_REM"
$file_archive="C:\D2_RAW_ARCHIVE"

$xml_files = Get-ChildItem $file_temp *.XML 

if($xml_files)
{
    foreach ($file in $xml_files){
        $file1 = Get-Content -Path $file
        $keyword = ""
        $keepAfterKeyword = $file1.Where({$_ -match $keyword}, 'SkipUntil')
        cat $keepAfterKeyword | sc $xml_in\$file
    }
}

Hi

Thanks for your inputs. One more comment worth mentioning: I am trying to load the XML files
into Oracle via SQL*Loader (sqlldr). Not sure why, but most of the files error out with the error below,
apparently because of junk characters. Is there any way to deal with this?

Record 4: Rejected - Error on table TABLE_XML, column XMLDATA.
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00210: expected '<' instead of '¿'
Error at line 1
ORA-06512: at "SYS.XMLTYPE", line 5
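
The '¿' reported at line 1 might be a byte order mark (BOM) or some other non-XML byte at the very start of the file. That is only a guess from the error text. One way to check is to look at the first bytes of a rejected file. The sketch below is an illustration only: the function name Get-BomType and the sample file are made up, and it uses plain .NET calls so it should also run on older PowerShell versions.

```powershell
# Hypothetical helper: report which common byte order mark a file starts with
function Get-BomType {
    param([string]$Path)

    # Reads the whole file; fine for a quick check on small files
    $bytes = [System.IO.File]::ReadAllBytes($Path)

    if ($bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF) {
        return 'UTF-8 BOM'
    }
    # Note: a UTF-32 LE BOM also starts with FF FE and would be reported here
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFF -and $bytes[1] -eq 0xFE) {
        return 'UTF-16 LE BOM'
    }
    if ($bytes.Length -ge 2 -and $bytes[0] -eq 0xFE -and $bytes[1] -eq 0xFF) {
        return 'UTF-16 BE BOM'
    }
    return 'no BOM'
}

# Demo on a temporary file that is deliberately written with a UTF-8 BOM
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) 'bom_check_sample.xml'
$utf8WithBom = New-Object System.Text.UTF8Encoding($true)
[System.IO.File]::WriteAllText($samplePath, '<root/>', $utf8WithBom)

$bomType = Get-BomType -Path $samplePath
```

If a rejected file reports a BOM, or if its first byte is simply not '<', then whatever writes the trimmed file has to produce the encoding the loader expects. In Windows PowerShell, Out-File writes UTF-16 with a BOM unless -Encoding says otherwise, which is worth keeping in mind when choosing how to save the output.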

There are some strange things going on with your code. Look at this example and try to make it work for you. This works great for me. First create a directory to hold all of your output files. Then look at this…

#set your directory
$file_temp = "C:\DTD_R2_RAW"

#grab your files
$xml_files = Get-ChildItem $file_temp *.XML -Recurse

#designate your keyword
$keyword = "my keyword"

#create your new 'keep' folder
New-Item -ItemType Directory C:\DTD_R2_RAW\Keep

#if there are files, do something...
if ($xml_files) {

    #for each file, skip all lines until you find the keyword, then output everything from that point
    ForEach ($x in $xml_files) {

        #use the full path so files found in subfolders by -Recurse resolve correctly
        $file = Get-Content -Path $x.FullName
        
        $keep = $file.Where({$_ -match $keyword}, 'SkipUntil') | Out-File C:\DTD_R2_RAW\keep\$($x.name)

    }
}

Thanks for extending your help, much appreciated!

My PowerShell version is 2, and apparently the Where() method is not available there.
Is there any workaround, please?

Hi,

Thanks for your reply, I appreciate your help.

I am getting the error below. Apparently the Where() method will not work with my version of PowerShell.

Mode LastWriteTime Length Name


d---- 14-Nov-17 10:00 AM Keep
Method invocation failed because [System.Object] doesn't contain a method named 'Where'.
At C:\MYLAN\PROJECT\ARGUS_UPGRADE\BFC\EMA_Rule_Increased_Files\CODE\PROCESSING\Camel.ps1:33 char:28
+     $keep = $file.Where <<<< ({$_ -match $keyword}, 'SkipUntil') | Out-File
    

Thanks

Try using | Where-Object instead, or update PowerShell. You should be on version 5, or later if it is available.
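
Note that the Where-Object cmdlet has no 'SkipUntil' mode, so on version 2 the "have we seen the keyword yet" state has to be tracked by hand. A minimal sketch with a plain foreach loop and a flag, using made-up sample lines in place of a real file:

```powershell
# Hypothetical sample lines standing in for the output of Get-Content
$lines = @('line 1 junk', 'line 2 junk', '<root>', '<child/>', '</root>')

# Escape the keyword so that -match treats it as literal text
$pattern = [regex]::Escape('<root>')

$found = $false
$kept = @(
    foreach ($line in $lines) {
        # Flip the flag on the first matching line and leave it on
        if (-not $found -and $line -match $pattern) { $found = $true }

        # Emit lines only once the keyword has been seen
        if ($found) { $line }
    }
)
```

Like 'SkipUntil', this works on whole lines, so any junk that shares a line with the keyword is kept.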

You could also try the more long-winded approach I gave first…

#grab your file
$file = Get-Content -Path C:\MyScripts\myFile.txt

#join the array of lines into a single string, keeping the line breaks
$text = $file -join [Environment]::NewLine

#find the position of your keyword (IndexOf returns -1 if it is not found)
$index = $text.IndexOf("line 3")
if ($index -lt 0) { throw "Keyword not found" }

#grab both sides - before and after your keyword
$beforeKeyword = $text.Substring(0,$index)

#here is the string you want. export it somehow.
$afterKeyword = $text.Substring($index)

But if I were you, I would update your Windows Management Framework.
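
For anyone who has to stay on PowerShell 2.0, the long-winded approach can be wrapped into a small function that also writes the result to a new file. This is a sketch under assumptions, not a tested production script: the function name Remove-TextBeforeKeyword and the demo file names are made up, and it assumes the source files are UTF-8 or carry a byte order mark. It works on the whole file as one string, so it does not matter whether the keyword sits on its own line or shares a line with the text to remove.

```powershell
# Hypothetical helper: copy a file, dropping everything before the keyword
function Remove-TextBeforeKeyword {
    param(
        [string]$SourcePath,   # full path of the file to read
        [string]$TargetPath,   # full path of the file to write
        [string]$Keyword       # literal text; everything before it is dropped
    )

    # ReadAllText honours a byte order mark if one is present
    $text = [System.IO.File]::ReadAllText($SourcePath)

    # Plain string search, so XML characters need no regex escaping
    $index = $text.IndexOf($Keyword, [System.StringComparison]::Ordinal)
    if ($index -lt 0) {
        Write-Warning "Keyword not found, skipping: $SourcePath"
        return $false
    }

    # The two-argument WriteAllText writes UTF-8 without a byte order mark
    [System.IO.File]::WriteAllText($TargetPath, $text.Substring($index))
    return $true
}

# Demo on temporary files (hypothetical names and content)
$tempDir = [System.IO.Path]::GetTempPath()
$src = Join-Path $tempDir 'keyword_demo_in.xml'
$dst = Join-Path $tempDir 'keyword_demo_out.xml'
[System.IO.File]::WriteAllText($src, "junk line`r`nmore junk <root>`r`n<child/>`r`n</root>")

$ok = Remove-TextBeforeKeyword -SourcePath $src -TargetPath $dst -Keyword '<root>'
$result = [System.IO.File]::ReadAllText($dst)
```

To process a folder, the function could be called once per file from a Get-ChildItem loop, passing $file.FullName as the source and Join-Path with the output directory and $file.Name as the target. Full paths matter here because the .NET file methods do not follow the current PowerShell location.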