
How to loop through XML files and create a CSV file?

I have a working PowerShell script (thanks to Ansgar Wiechers) that exports the desired fields from an XML file to a CSV file:

$goal = '\\LC\ARCHIV\INPUT_' + (Get-Date -Format yyyyMMddss) + '.xml'
[xml]$xml = Get-Content '\\mcsonlines-impexp\Onlines\LCMS\IMPORT\*.xml'
$xml.SelectNodes('//COMPOUND') |
  Select-Object @{n='SampleID';e={[int]$_.ParentNode.id}},
                @{n='SampleName';e={"B" + $_.ParentNode.name}},
                @{n='CompoundID';e={[int]$_.id}},
                @{n='CompoundName';e={$_.name}},
                @{n='analconc';e={[double]$_.PEAK.analconc}} |
  Export-Csv '\\LC\IMPORT\quandata.csv' -NoType -Delimiter ';'

Move-Item -Path \\LC\IMPORT\*.xml -destination $goal

The XML file:

<?xml version="1.0"?>
<QUANDATASET>
  <XMLFILE>
  <DATASET>
  <GROUPDATA>
    <GROUP>
      <METHODDATA/>
      <SAMPLELISTDATA>
        <SAMPLE id="1" groupid="1" name="Routine_2016_05_30_002">
          <COMPOUND id="1" sampleid="1" groupid="1" name="Leu">
            <PEAK foundscan="0" analconc="0.023423456">
              <ISPEAK/>
            </PEAK>
          </COMPOUND>
          <COMPOUND id="2" sampleid="1" groupid="1" name="Iso">
             <PEAK foundscan="0" analconc="0.123456789">
               <ISPEAK/>
             </PEAK>
          </COMPOUND>
          <COMPOUND id="3" sampleid="1" groupid="1" name="Thre">
          ...
          ...
          ...
        <SAMPLE id="2" groupid="1" name="Routine_2016_05_30_003">
          <COMPOUND id="1" sampleid="2" groupid="1" name="Leu">
          ...
          ...
          ...

The CSV Export looks like:

SampleID  SampleName              CompoundID  CompoundName  analconc
...
6         Routine_2016_11_11_006  1           Leu           60,30064828
6         Routine_2016_11_11_006  2           Iso           60,38823887
6         Routine_2016_11_11_006  3           Thre          74,00187964
...

Now to my question: is it possible to process multiple XML files at once with the script and export them to one CSV file? With my changes, the script unfortunately does nothing at all.

First try:

$file = Get-ChildItem '\\LC\IMPORT\*.xml' -Recurse
foreach ($file in $files) {
  [xml]$xml = (Get-Content $file)
  $xml.SelectNodes('//COMPOUND') |
    Select-Object @{n='SampleID';e={[int]$_.ParentNode.id}},
                  @{n='SampleName';e={"B" + $_.ParentNode.name}},
                  @{n='CompoundID';e={[int]$_.id}},
                  @{n='CompoundName';e={$_.name}},
                  @{n='analconc';e={[double]$_.PEAK.analconc}} |
    Export-Csv '\\LC\IMPORT\quandata.csv' -NoType -Delimiter ';'
}

This did not work at all.

Second try:

Get-ChildItem '\\LC\IMPORT\' *.xml -Recurse | % {
  $xml = [xml](Get-Content $_.FullName)
  #$goal = '\\LC\ARCHIV\INPUT_' + (Get-Date -Format yyyyMMddss) + '.xml'

  $xml.SelectNodes('//COMPOUND') |
    Select-Object @{n='SampleID';e={[int]$_.ParentNode.id}},
                  @{n='SampleName';e={"B" + $_.ParentNode.name}},
                  @{n='CompoundID';e={[int]$_.id}},
                  @{n='CompoundName';e={$_.name}},
                  @{n='analconc';e={[double]$_.PEAK.analconc}} |
    Export-Csv '\\LC\IMPORT\quandata.csv' -NoType -Delimiter ';'
}

With this attempt only one XML file is exported to a CSV file.

Here is the link to my first post:

How to output child elements separately, not as one space-delimited string?

You are overwriting the CSV file on every iteration. Use:

[..]Export-Csv '\\LC\IMPORT\quandata.csv' -NoType -Delimiter ';' -Append

instead. -Append makes PowerShell add the new rows to the existing file rather than replace it.
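
The difference can be reproduced without any XML at all. The following is only a minimal sketch: the scratch file in the temp directory and the two sample rows are made up for illustration and are not taken from the question.

```powershell
# Hypothetical scratch file; not the \\LC\IMPORT path from the question.
$csv = Join-Path ([System.IO.Path]::GetTempPath()) ("append-demo-{0}.csv" -f [guid]::NewGuid())

# Two made-up rows standing in for the output of two loop iterations.
$batches = @(
    [pscustomobject]@{ SampleID = 1; CompoundName = 'Leu' },
    [pscustomobject]@{ SampleID = 2; CompoundName = 'Iso' }
)

# Without -Append every call replaces the file, so only the last row survives.
foreach ($row in $batches) {
    $row | Export-Csv $csv -NoTypeInformation -Delimiter ';'
}
$overwritten = @(Import-Csv $csv -Delimiter ';')

Remove-Item $csv

# With -Append (PowerShell 3.0 or later) every call adds its rows to the file.
foreach ($row in $batches) {
    $row | Export-Csv $csv -NoTypeInformation -Delimiter ';' -Append
}
$appended = @(Import-Csv $csv -Delimiter ';')

Remove-Item $csv

"Rows without -Append: $($overwritten.Count); rows with -Append: $($appended.Count)"
```

The first loop leaves one row in the file, the second leaves two.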

Your first approach didn't do anything because you collect the list of XML files in a variable $file, but then iterate over a variable $files (note the trailing "s"), which is empty.
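
A typo like this fails silently by default. One way to surface it, shown here only as a sketch with made-up file names, is Set-StrictMode, which turns a reference to an uninitialized variable into an error. The script block is used so that strict mode stays limited to the demonstration.

```powershell
# Made-up stand-in for the Get-ChildItem result from the question.
$file = @('a.xml', 'b.xml')

# Default behavior: $files was never assigned, so it is $null and the loop
# body never runs (PowerShell 3.0 and later). No error is reported.
$count = 0
foreach ($f in $files) { $count++ }

# Under strict mode the same reference raises an error that can be caught.
$strictError = $null
try {
    & {
        Set-StrictMode -Version Latest
        foreach ($f in $files) { $count++ }
    }
} catch {
    $strictError = $_.Exception.Message
}

"Iterations: $count"
"Strict-mode error: $strictError"
```

Putting Set-StrictMode -Version Latest at the top of a script is a common way to catch misspelled variable names early, though it may also flag other constructs that the script currently relies on.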

Your second approach overwrites the output file with each iteration, because you use Export-Csv inside the loop without the parameter -Append.

Either put the Export-Csv statement after the loop:

Get-ChildItem '\\LC\IMPORT\*.xml' -Recurse | ForEach-Object {
  [xml]$xml = Get-Content $_.FullName

  $xml.SelectNodes('//COMPOUND') |
    Select-Object @{n='SampleID';e={[int]$_.ParentNode.id}},
                  @{n='SampleName';e={"B" + $_.ParentNode.name}},
                  @{n='CompoundID';e={[int]$_.id}},
                  @{n='CompoundName';e={$_.name}},
                  @{n='analconc';e={[double]$_.PEAK.analconc}}
} | Export-Csv '\\LC\IMPORT\quandata.csv' -NoType -Delimiter ';'

or call Export-Csv with the parameter -Append inside the loop, so that each iteration appends to the CSV:

Get-ChildItem '\\LC\IMPORT\*.xml' -Recurse | ForEach-Object {
  [xml]$xml = Get-Content $_.FullName

  $xml.SelectNodes('//COMPOUND') |
    Select-Object @{n='SampleID';e={[int]$_.ParentNode.id}},
                  @{n='SampleName';e={"B" + $_.ParentNode.name}},
                  @{n='CompoundID';e={[int]$_.id}},
                  @{n='CompoundName';e={$_.name}},
                  @{n='analconc';e={[double]$_.PEAK.analconc}} |
    Export-Csv '\\LC\IMPORT\quandata.csv' -Append -NoType -Delimiter ';'
}

The first approach is preferable, though, because it avoids repeatedly opening and closing the output file and therefore performs better. Also, the parameter -Append isn't available prior to PowerShell v3, so the second approach will not work on PowerShell v2 or earlier.
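
To try the first approach without touching the real share, it can be run against throwaway files. This is a rough sketch under stated assumptions: the scratch directory, the file names in1.xml and in2.xml, and the trimmed-down XML are invented for the test and only mimic the SAMPLE/COMPOUND/PEAK nesting from the question.

```powershell
# Hypothetical scratch directory standing in for \\LC\IMPORT.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("xml-demo-{0}" -f [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null

# Two minimal, made-up input files; {0} is replaced by the sample number.
$template = @'
<QUANDATASET><SAMPLE id="{0}" name="Routine_{0}">
  <COMPOUND id="1" name="Leu"><PEAK analconc="0.5"/></COMPOUND>
  <COMPOUND id="2" name="Iso"><PEAK analconc="1.25"/></COMPOUND>
</SAMPLE></QUANDATASET>
'@
1..2 | ForEach-Object {
    Set-Content -Path (Join-Path $dir "in$($_).xml") -Value ($template -f $_)
}

# The loop only emits objects; Export-Csv runs once for the whole pipeline.
$csv = Join-Path $dir 'quandata.csv'
Get-ChildItem -Path $dir -Filter *.xml | ForEach-Object {
    [xml]$xml = Get-Content $_.FullName
    $xml.SelectNodes('//COMPOUND') |
        Select-Object @{n='SampleID';e={[int]$_.ParentNode.id}},
                      @{n='CompoundID';e={[int]$_.id}},
                      @{n='analconc';e={[double]$_.PEAK.analconc}}
} | Export-Csv $csv -NoTypeInformation -Delimiter ';'

# Read the result back before cleaning up the scratch directory.
$rows = @(Import-Csv $csv -Delimiter ';')
Remove-Item $dir -Recurse -Force

"Rows exported: $($rows.Count)"
```

Both input files end up in the same CSV: two COMPOUND rows per file, four rows in total.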
