Extracting data from .xml files

Hi all

I’m a keen to learn novice at this so I wonder if someone could help. I have a number of .xml files that I wish to extract certain variables from in order to identify the ones that need updating.
A previous colleague had written the formula below to extract equation values so I thought I’d edit this one to extract sub batch types and associated variables but it doesn’t work. Bizarrely the old one also doesn’t work anymore. What am I missing if I wish to extract subbatchtype name, labeltemplate and layouttemplate for each of the worksheets I have?


Get-ChildItem “S:\QA\Worksheets\QAS\QAS current*.xml” | foreach-object {$x = [xml] (gc $_); Write-Output "$($x.worksheet.basicData.code), $($x.worksheet.basicData.worksheetVersion.version), $($x.worksheet.basicData.name), [$($x.worksheet.equations.equation.fixed.variableName), $($x.worksheet.equations.equation.fixed.value), $($x.worksheet.equations.equation.fixed.unit) ] > desktop\XXEquationValuestoday.txt

Example of original .xml files

Welcome DrMartin!

Would you do me a huge favor and edit your post and format your code? See: How to format code on PowerShell.org

so a couple things stand out:

  1. Even formatted ‘as code’, that code IMO is not easy to follow as written. It really would benefit from a small refactor/cleanup.
  2. What that code is doing is grabbing the XML and turning into a PowerShell Object. The ‘nodes’ essentially become properties. You can access the ‘root’ property by using the ‘dot’, then typing it, in this case worksheet. If there’s a sub-node under worksheet, you access it by doing the dot once more, so, for example basicData. In some examples I’m not seeing the property in the file (for example, where is code or WorksheetVersion?)
  3. As mentioned in two, each of those nodes is turned into a hierarchy of properties. To get the data, you’d need to get the XML file like you’re doing in your code then drill down. That said I don’t deal with XML enough to know how it handles those nodes with the same name so you might just need to look at the data manually in PowerShell to see what it looks like. For Example, if you have the XML content in a variable named X you should be able to get to the subBatchTypes node like this:
$x.worksheet.basicData.subBatchTypes

Then you can see what data is being displayed to see how you can access it.

Thanks dotnVo
The first piece relating to the basicdata “parameters” works fine. And they iterate for each of the 300 xml files I have as they are single line entries. The problem lies when I try to get it to ‘loop’ through a number of parameters such as subBatchTypes. It almost seems to me that I’m missing an additional command of the Foreach type to make it ‘loop’ through all the subBatchType entries before it moves on to the next .xml file.

Sorry for the odd terminology. Like I say I’m a novice with this.

In XML those are ‘nodes’ I believe, when translated into a PS object, they are properties, my previous posts has some suggestions to try and play with it. I personally would just import a single document as an XML doc, and play around with by accessing the properties, etc.

You might need to do some looping depending on what you or you might actually be able to access the ‘array’ of the sub objects by using the method I previously posted. I can’t tell you for sure, I don’t have your data set, and I haven’t worked a ton with XML and PowerShell. For me to tell you me concretely, I’d need a copy of one of the XML files so I can see what it looks like (not a screenshot, the actual file so I can pull it into PS). I’d be doing exactly what I suggested in my previous post, just toying around w/ the object to see what data is exposed and how it looks.

$($x.worksheet.equations.equation.fixed.unit) ] > desktop\XXEquationValuestoday.txt

Same as @dotnVo, I have no way to test, however, you may actually be there as you are overwriting your text file for each item. Change:

>

To:

>>

To “append” to the file.

1 Like

Ah, thanks for that reminder. I meant to comment on the redirection, as part of something OP may want to change, but forgot to do so :D.

Hi both
I’ve tried your suggestions but its still not working. I reckon I’m being particularly dull.

Here’s an example of one of the xml files inits entirety I hope this helps.

<worksheet>
  <basicData>
    <worksheetVersion version="12" />
    <code>HSS-02-QAS</code>
    <name>Hydrocortisone in Sodium Chloride 0.9% w/v (Manual) – I.V./I.M.</name>
    <cofcProductName>Hydrocortisone</cofcProductName>
    <coshhStatement>
      <line1>Hydrocortisone must be manufactured in a Biological Safety Cabinet</line1>
      <line2></line2>
    </coshhStatement>
    <shelfLife minimumConcentration="0.94" maximumConcentration="50" description="Infusion bags" days="41" hours="0" minutes="0" />
    <shelfLife minimumConcentration="0.94" maximumConcentration="50" description="Syringes" days="81" hours="0" minutes="0" />
    <storage tempMin="2" tempMax="8" otherInstructions="Protect from light during storage. Keep out of reach and sight of children." />
    <worksheetDiluentName>Sodium Chloride 0.9% w/v</worksheetDiluentName>
    <CofARequired>false</CofARequired>
    <doseUnit>mg</doseUnit>
    <limits>
      <vials maximum="40" />
    </limits>
  </basicData>
  <equations>
    <equation number="9">
      <fixed variableName="X" value="0" unit="ml" />
      <fixed variableName="R" value="50" unit="mg/ml" />
    </equation>
  </equations>
  <method>
    <methodStep>
      <text>Clean cabinet, record readings and report if outside limit to Production Supervisor or QC before proceeding</text>
      <sopReferences>
        <sopReference number="4" />
        <sopReference number="6" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>Clean and set-up any equipment being used and record serial numbers</text>
      <sopReferences>
        <sopReference number="4" />
        <sopReference number="6" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>Make one sub-batch at a time</text>
      <sopReferences>
        <sopReference number="14" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>For the reconstitution of the first vial, first weigh the recon syringe and draw Water for Injection (WFI) to 2ml from WFI bag. Ensure the weight falls between the limits 2.0g – 2.0g (adjust if outside of these limits).</text>
      <sopReferences>
        <sopReference number="15" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>If appropriate, reconstitute (L) x 100mg vials with 2ml of Water for Injection and record the weight of the vial. Repeat for all vials.</text>
      <sopReferences>
        <sopReference number="15" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>Shake vials vigorously to ensure complete dissolution of Hydrocortisone</text>
      <sopReferences>
        <sopReference number="15" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>If appropriate, withdraw required volume of NaCl from bag as shown on sub-batch worksheet. </text>
      <sopReferences>
        <sopReference number="15" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>For Devices, transfer H ml of Sodium Chloride 0.9% into to the final device. Record addition weight or obtain volume check as per sub-batch sheet.</text>
      <sopReferences>
        <sopReference number="15" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>Transfer F ml of Hydrocortisone into C devices as shown on sub-batch worksheet</text>
      <sopReferences>
        <sopReference number="15" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>Check each device contains correct volume, expel any air bubbles and cap</text>
      <sopReferences>
        <sopReference number="15" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>Inspect each device before removing from cabinet</text>
      <sopReferences>
        <sopReference number="14" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>Repeat steps 7-11 for all remaining sub-batches</text>
      <sopReferences>
        <sopReference number="14" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>Perform finger dabs and contacts if the batch is the last to be manufactured in the session</text>
      <sopReferences>
        <sopReference number="4" />
        <sopReference number="5" />
      </sopReferences>
    </methodStep>
    <methodStep>
      <text>Pass all devices and vials out to release area</text>
      <sopReferences>
        <sopReference number="4" />
      </sopReferences>
    </methodStep>
  </method>
  <ingredients>
    <drug name="Hydrocortisone" type="NonSpecific" labelBox="None" manufacturer="Pfizer">
      <size quantity="2" unit="ml" description="Hydrocortisone Sodium Succinate 100mg">
        <allowedStartingMaterials>
          <startingMaterial id="47432e07-4458-4489-95b2-d2cc021f5012" />
          <startingMaterial id="e3d2b27d-a414-4f14-9327-426a58c14913" />
        </allowedStartingMaterials>
      </size>
    </drug>
    <devices>
      <DeviceSubGroup id="bbd5782e-d588-4eef-a53b-a7169a759064" />
      <Device id="ef6c3268-9beb-41fa-9a7f-1945880ce6d9" />
      <Device id="fb78e638-9d16-424d-b346-8f2d58fc6572" />
      <Device id="670c0bd9-e69f-44f1-bbbd-8d42e58d9e2b" />
      <Device id="990949da-6b17-46af-a6c1-2c949f1ef19d" />
      <Device id="80bcf64d-1ea3-44a0-8ea6-1a1a91ac9eda" />
      <Device id="b6e432f8-0db9-42e5-b41b-00fa92c9563b" />
      <Device id="d16d6519-c0df-4769-84c2-a4418eec3c46" />
      <Device id="0cab1a36-6bf7-4beb-a2d4-b8e0a73e4ce0" />
      <Device id="2f7394fe-4d8d-415b-b474-d5b7f673f751" />
      <Device id="48ae7ea3-4d0a-40d5-a74e-e8b7a8c1c548" />
    </devices>
    <diluents>
      <diluent name="Sodium Chloride 0.9% w/v">
        <size quantity="1000" unit="ml">
          <allowedStartingMaterials>
            <startingMaterial id="4d51599f-4993-489d-a556-d1b22732b23d" />
          </allowedStartingMaterials>
        </size>
      </diluent>
    </diluents>
    <other>
      <item name="Additive Port Cover">
        <allowedStartingMaterials>
          <startingMaterial id="9b19dcec-463c-46d9-ab5f-63b4d2fec32e" />
        </allowedStartingMaterials>
      </item>
      <item name="Viaflo Cap">
        <allowedStartingMaterials>
          <startingMaterial id="27c75cbb-35ba-44b5-8a65-453dfa7c2e6c" />
        </allowedStartingMaterials>
      </item>
      <item name="Blind Hub">
        <allowedStartingMaterials>
          <startingMaterial id="83a4f3fb-a72c-456b-99cf-980154721d12" />
        </allowedStartingMaterials>
      </item>
      <item name="Tamper Evident Cap">
        <allowedStartingMaterials>
          <startingMaterial id="d11b08f5-93d0-4969-8986-c24523513f6a" />
        </allowedStartingMaterials>
      </item>
      <item name="Multi Ad Cap">
        <allowedStartingMaterials>
          <startingMaterial id="e791dc61-9235-4c8c-bb49-9a19bb79d871" />
        </allowedStartingMaterials>
      </item>
      <item name="Water For Injection 1000ml (Reconstituent)" quantity="1">
        <allowedStartingMaterials>
          <startingMaterial id="78e8415c-a45f-4cc9-b4bc-817ff9fb685a" />
        </allowedStartingMaterials>
      </item>
    </other>
  </ingredients>
  <equipment />
  <subBatchTypes>
    <subBatchType name="Infusion Bags" labelTemplate="BagLabel" layoutTemplate="B" addAndWithdrawOption="true" namedPatient="true" weightLimitPercentage="0.01" solutionWeight="1.024">
      <routes>
        <route name="Intravenous Infusion" labelText="for Intravenous Infusion" shortName="I.V.I" yellowPanel="false" routeId="8de8a7bf-b115-4aa3-83c0-f9a0ecc0b91b" />
      </routes>
      <warningNotices>For single use only. Discard any unused solution. Do not use if damaged or contains particles.</warningNotices>
    </subBatchType>
    <subBatchType name="Syringes - Devices" labelTemplate="SmallLabel" layoutTemplate="E" addAndWithdrawOption="false" namedPatient="true" weightLimitPercentage="0.01" solutionWeight="1.024">
      <routes>
        <route name="Intramuscular Injection" labelText="for Intramuscular Injection" shortName="I.M" yellowPanel="false" routeId="d01f1fd4-e5c2-43ae-a3eb-d70db47c805d" />
        <route name="Intravenous Infusion" labelText="for Intravenous Infusion" shortName="I.V.I" yellowPanel="false" routeId="8de8a7bf-b115-4aa3-83c0-f9a0ecc0b91b" />
        <route name="Intravenous Injection" labelText="for Intravenous Injection" shortName="I.V." yellowPanel="false" routeId="ea4c9672-8824-49fd-bf78-9d6c35eaa683" />
      </routes>
      <warningNotices>For single use only. Discard any unused solution. Do not use if damaged or contains particles.</warningNotices>
    </subBatchType>
    <subBatchType name="Flagged Syringes - Devices" labelTemplate="FlaggedLabel" layoutTemplate="E" addAndWithdrawOption="false" namedPatient="true" weightLimitPercentage="0.01" solutionWeight="1.024">
      <routes>
        <route name="Intramuscular Injection" labelText="for Intramuscular Injection" shortName="I.M" yellowPanel="false" routeId="d01f1fd4-e5c2-43ae-a3eb-d70db47c805d" />
        <route name="Intravenous Infusion" labelText="for Intravenous Infusion" shortName="I.V.I" yellowPanel="false" routeId="8de8a7bf-b115-4aa3-83c0-f9a0ecc0b91b" />
        <route name="Intravenous Injection" labelText="for Intravenous Injection" shortName="I.V." yellowPanel="false" routeId="ea4c9672-8824-49fd-bf78-9d6c35eaa683" />
      </routes>
      <warningNotices>For single use only. Discard any unused solution. Do not use if damaged or contains particles.</warningNotices>
    </subBatchType>
  </subBatchTypes>
</worksheet>

Just as an example, I took that data set from the single file and wrote just the subbatchtypes to a file:

[xml]$Data = Get-Content -Path '..\Test.XML'
$Data.WorkSheet.subBatchTypes.subBatchType | Out-File -FilePath '..\Text.txt'

This worked pretty well, but does include all the properties, If you only wanted some of the properties of subbatch type i’d just pipe that ( | ) to Select-Object and type in the property names you want.

I know you’re writing more to a file than that, is it in a particular format or are you just trying to get the data into a file? I sorta re-wrote the code a bit you shared, because I noticed it didn’t seem to be correct, there were brackets and missing double quotes for the write-output:

$XMLFile = Get-ChildItem -Path "..\test.xml"
$XMLFile | ForEach-Object {
    $x = [xml] (get-content $_)
    Write-Output "$($x.worksheet.basicData.code)",
    "$($x.worksheet.basicData.worksheetVersion.version)", 
    "$($x.worksheet.basicData.name)", 
    "$($x.worksheet.equations.equation.fixed.variableName)", 
    "$($x.worksheet.equations.equation.fixed.value)", 
    "$($x.worksheet.equations.equation.fixed.unit)" 
}
| Out-File -FilePath '..\Test.txt'  -Append

The output of that file looks like this:

HSS-02-QAS
12
Hydrocortisone in Sodium Chloride 0.9% w/v (Manual) – I.V./I.M.
X R
0 50
ml mg/ml

Is that what is expected, outside the additional data? If so I think that should work you just need to update the file path of all ur xml docs (my code uses my local test path), add in the bit about subbatch types with a select-object to grab only the properties you want, and update the Out-File path. Be sure to reference the same variable name (I used $Data in one example and $x in your slight code re-factor). Note I also changed to Out-File , as I rarely use > or >>. Lastly because I just was working with a single example I didn’t need the -Append on the Out-File but I went ahead and added it, in case you did not to append the data to the existing file instead of overwriting it.

This is brilliant. Thanks very much. In an ideal world having the outputs X, R etc with their associated values and units on one line would be fantastic eg
X 0 ml
R 50 mg/ml
but I bet this is difficult and time consuming to code.
I assume that the code as above literally outputs all the individual variables (name, value, unit) without having to do the first one and then the next one but my looping idea would be needed to do the X 0 ml thing until it ran out of variables and then move on to the next xml file.
In normal usage its just about extracting the data into a new file and then comparing the contents of this file with an older one to flag up what has changed. Its only something we do intermittently usually this time of the year to produce annual report but occasional if something goes wrong and we want to pin point a potential failure point.

With PS you can pretty much do anything you just have to toy with it some . That’s one of the great things about PowerShell, especially with work like this, as its really powerful for easy or ‘quick’ data parsing, IMO.

I think the best thing you can do further toy with the output is pull in one xml file, and explore it with powershell:

[xml]$Data = Get-Content -Path '.\23451.XML'

This will load a variable in your PS named $Data (again, update the path to your actual xml file). Then just type $x.worksheet it’ll output something like:

image

Then drill down further. $x.worksheet.equations and you’ll get:

image

Then continue going down that path… if you want fixed you add .fixed

That let’s you explore the data at each node and get a feel for what ‘PowerShell’ sees.

You talked about wanting the value/units on one line and you can do that, but you just need to manipulate the data further. at the ‘fixed’ property level, there’s X and R. Because of that you need to loop through it and output a one line string instead:

$x.worksheet.equations.equation.fixed | ForEach-Object {
Write-Output "$($_.variableName) $($_.value) $($_.unit)"
}

would output in powershell:

image

You can add separators as well if you want to do that. or maybe you don’t, but this is where playing around with the data will help you understand what’s happening under the hood, but how to manipulate it for your business needs.

I think between the last post and this post that probably gives you enough code to play around with.