PowerShell Delete XML Nodes with namespace or prefix

I need some help with this issue:

I have a lot of XML files like this in a directory, and I need to delete part of the XML data (everything prefixed with opex:, plus ExtendedXIP and LegacyXIP), but I can't figure out what I am doing wrong.

This is my XML example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<opex:OPEXMetadata xmlns:opex="http://www.openpreservationexchange.org/opex/v1.0">
<opex:Transfer>
<opex:SourceID>4720184e-9d02-47f6-867f-2603ab116669</opex:SourceID>
</opex:Transfer>
<opex:Properties>
<opex:Title>Fakamae_2018002_RM</opex:Title>
<opex:Description>Fakamae_2018002_RM</opex:Description>
<opex:Identifiers>
  <opex:Identifier type="code">Fakamae_2018002_RM</opex:Identifier>
</opex:Identifiers>
</opex:Properties>
<opex:DescriptiveMetadata>
<LegacyXIP xmlns="http://preservica.com/LegacyXIP">
  <AccessionRef>88158870-ba1a-44a1-ad70-5cc898a5b436</AccessionRef>
  <AccumulationRef>3b955682-e827-43bb-a446-2dd635f01ef0</AccumulationRef>
</LegacyXIP>
<ExtendedXIP xmlns="http://preservica.com/ExtendedXIP/v6.0">
  <DigitalSurrogate>false</DigitalSurrogate>
  <CoverageFrom>2019-09-21T00:00:00.000Z</CoverageFrom>
     </ExtendedXIP>
<METATRANSCRIPT:METATRANSCRIPT xmlns:METATRANSCRIPT="http://www.mpi.nl/IMDI/Schema/IMDI"
     xmlns="http://www.mpi.nl/IMDI/Schema/IMDI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ArchiveHandle="hdl:2196/00-0000-0000-0013-4C3F-7" Date="2018-12-17" FormatId="IMDI 3.04" Originator="CMDI Maker by CLASS - Cologne Language Archive Services" Type="SESSION" Version="1.0" xsi:schemaLocation="http://www.mpi.nl/IMDI/Schema/IMDI http://www.mpi.nl/IMDI/Schema/IMDI_3.0.xsd">
  <Session>
    <Name>Fakamae_2018002_RM</Name>
    <Title>Rua Tau Tupuna</Title>
    <Date>2018-05-09</Date>
    <Description LanguageId="ISO639-3:eng" Link="">A story about a grandmother and granddaughtertranslations.</Description>
    <MDGroup>
      <Location>
        <Continent Link="http://www.mpi.nl/IMDI/Schema/Continents.xml"  Type="ClosedVocabulary">Oceania</Continent>
        <Country Link="http://www.mpi.nl/IMDI/Schema/Countries.xml"  Type="OpenVocabulary">Vanuatu</Country>
        <Region>Shefa Province</Region>
        <Address>Tongamea village Emae island</Address>
      </Location>
      <Project>
        <Name>fakamae-dewar-0487</Name>
          <Contact>
          <Name>Amy Dewar</Name>
          <Address />
          <Email>amy.e.dewar@uon.edu.au</Email>
          <Organisation>University of Newcastle, Australia</Organisation>
        </Contact>
        <Description LanguageId="ISO639-3:eng" Link="" />
      </Project>
    </MDGroup>
  </Session>
</METATRANSCRIPT:METATRANSCRIPT>
</opex:DescriptiveMetadata>
</opex:OPEXMetadata>

This is what I need to get:

 <METATRANSCRIPT:METATRANSCRIPT xmlns:METATRANSCRIPT="http://www.mpi.nl/IMDI/Schema/IMDI"  xmlns="http://www.mpi.nl/IMDI/Schema/IMDI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  ArchiveHandle="hdl:2196/00-0000-0000-0013-4C3F-7" Date="2018-12-17" FormatId="IMDI 3.04" Originator="CMDI Maker by CLASS - Cologne Language Archive Services" Type="SESSION" Version="1.0" xsi:schemaLocation="http://www.mpi.nl/IMDI/Schema/IMDI http://www.mpi.nl/IMDI/Schema/IMDI_3.0.xsd">
    <Name>Fakamae_2018002_RM</Name>
    <Title>Rua Tau Tupuna</Title>
    <Date>2018-05-09</Date>
    <Description LanguageId="ISO639-3:eng" Link="">A story about a grandmother The grandmother sends her granddaughter around the village asking for fire from all the old women. This text was recorded in video and archived files are in mp4 video format and wav audio format. The eaf ELAN file contains both English and Bislama translations.</Description>
    <MDGroup>
      <Location>
        <Continent Link="http://www.mpi.nl/IMDI/Schema/Continents.xml"  Type="ClosedVocabulary">Oceania</Continent>
        <Country Link="http://www.mpi.nl/IMDI/Schema/Countries.xml" Type="OpenVocabulary">Vanuatu</Country>
        <Region>Shefa Province</Region>
        <Address>Tongamea village Emae island</Address>
      </Location>
      <Project>
        <Name>fakamae-dewar-0487</Name>
        <Title>Documentation of Fakamae, a Polynesian Outlier of Vanuatu</Title>
        <Id>MDP0369</Id>
        <Contact>
          <Name>Amy Dewar</Name>
          <Address />
          <Email>amy.e.dewar@uon.edu.au</Email>
          <Organisation>University of Newcastle, Australia</Organisation>
        </Contact>
        <Description LanguageId="ISO639-3:eng" Link="" />
      </Project>
    </MDGroup>
 </METATRANSCRIPT:METATRANSCRIPT>

This is the code I have so far:

$XMLFile = "C:\Users\User\Documents\task.xml"
$xml = [xml](Get-Content $XMLFile)

# Load the existing document

$DeleteNames = Select-Xml -Xml $xml -Namespace @{opex='http://www.openpreservationexchange.org /opex/v1.0'} -Xpath //opex:Transfer/opex:Properties
# Specify tag names to delete and then find them

($Doc.Task.ChildNodes |Where-Object { $DeleteNames -contains $_.Name }) | ForEach-Object {
# Remove each node from its parent
[void]$_.ParentNode.RemoveChild($_)
}

# Save the modified document
$xml.Save($XMLFile)

I only need the XML data between the opening and closing METATRANSCRIPT tags.

Thanks a lot for any help.

In PowerShell, an XML document is a tree of nodes nested within nodes. The problem you are running into is that removing a parent node also removes all of its children. METATRANSCRIPT is a child of opex:DescriptiveMetadata, so if you remove opex:DescriptiveMetadata you remove METATRANSCRIPT along with it. One approach would be to treat the file as plain text rather than XML and delete the lines that start with <opex and so on. Another approach would be to get all nodes, recursively check their parents to see whether those parents are being kept, and clean up the rest.
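To make the parent/child point concrete, here is a minimal sketch rather than a definitive fix. It assumes a trimmed inline sample in place of the real file, and the prefixes legacy and imdi are names picked here for illustration; they only need to map to the right namespace URIs. An XmlNamespaceManager is used because LegacyXIP and METATRANSCRIPT each sit in their own namespace, so an XPath without prefixes would not match them. (The script in the question also reads from $Doc, which is never assigned, so its loop has nothing to work on.)

```powershell
# Trimmed, hypothetical sample standing in for one of the real files
[xml]$xml = @'
<opex:OPEXMetadata xmlns:opex="http://www.openpreservationexchange.org/opex/v1.0">
  <opex:Transfer><opex:SourceID>1</opex:SourceID></opex:Transfer>
  <opex:DescriptiveMetadata>
    <LegacyXIP xmlns="http://preservica.com/LegacyXIP"><AccessionRef>2</AccessionRef></LegacyXIP>
    <METATRANSCRIPT:METATRANSCRIPT xmlns:METATRANSCRIPT="http://www.mpi.nl/IMDI/Schema/IMDI" xmlns="http://www.mpi.nl/IMDI/Schema/IMDI">
      <Session><Name>Fakamae_2018002_RM</Name></Session>
    </METATRANSCRIPT:METATRANSCRIPT>
  </opex:DescriptiveMetadata>
</opex:OPEXMetadata>
'@

# Map prefixes to namespace URIs so XPath can address namespaced elements
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace('opex',   'http://www.openpreservationexchange.org/opex/v1.0')
$ns.AddNamespace('legacy', 'http://preservica.com/LegacyXIP')
$ns.AddNamespace('imdi',   'http://www.mpi.nl/IMDI/Schema/IMDI')

# Transfer and LegacyXIP do not contain METATRANSCRIPT, so removing them is safe.
# @() takes a snapshot so the list is not modified while it is being enumerated.
foreach ($node in @($xml.SelectNodes('//opex:Transfer | //legacy:LegacyXIP', $ns))) {
    [void]$node.ParentNode.RemoveChild($node)
}

# METATRANSCRIPT survives, but it is still wrapped by its opex: ancestors
$kept = $xml.SelectSingleNode('//imdi:METATRANSCRIPT', $ns)
$kept.ParentNode.Name   # -> opex:DescriptiveMetadata
```

Adding opex:DescriptiveMetadata to that XPath would remove METATRANSCRIPT as well, since it is a descendant of that element.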

That being said, deleting unwanted nodes may be the wrong approach to the problem you are describing. If you just want the contents of METATRANSCRIPT, then the following would do the trick:

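# Parse the whole file as a single XML document
[xml]$xml = Get-Content test.xml -Raw
# Write the first METATRANSCRIPT element, including its own tags, to a new file
$xml.GetElementsByTagName("METATRANSCRIPT:METATRANSCRIPT")[0].OuterXml | Out-File Newxml.xml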
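Since the question mentions a whole directory of such files, the same extraction could be wrapped in a loop. The following is only a sketch under assumptions: Export-MetaTranscript, its parameters, and the folder paths in the usage line are hypothetical names introduced here, and the files are assumed to be well-formed XML. It copies the element into a fresh document with ImportNode so that METATRANSCRIPT becomes the root and the output gets its own XML declaration.

```powershell
# Hypothetical helper: extracts METATRANSCRIPT from every *.xml file in a folder
function Export-MetaTranscript {
    param(
        [Parameter(Mandatory)] [string]$InputDir,
        [Parameter(Mandatory)] [string]$OutputDir
    )

    # Create the output folder if needed and resolve it to a full path for .NET
    $OutputDir = (New-Item -ItemType Directory -Path $OutputDir -Force).FullName

    foreach ($file in Get-ChildItem -Path $InputDir -Filter *.xml -File) {
        $source = New-Object System.Xml.XmlDocument
        $source.Load($file.FullName)

        $node = $source.GetElementsByTagName('METATRANSCRIPT:METATRANSCRIPT')[0]
        if ($null -eq $node) {
            Write-Warning "No METATRANSCRIPT element in $($file.Name); skipped"
            continue
        }

        # Copy the element (deep) into a new document so it becomes the root
        $target = New-Object System.Xml.XmlDocument
        [void]$target.AppendChild($target.CreateXmlDeclaration('1.0', 'UTF-8', $null))
        [void]$target.AppendChild($target.ImportNode($node, $true))
        $target.Save((Join-Path $OutputDir $file.Name))
    }
}
```

A call might look like Export-MetaTranscript -InputDir 'C:\Users\User\Documents\xml' -OutputDir 'C:\Users\User\Documents\xml-clean' (both paths are placeholders). Writing to a separate folder leaves the original files untouched in case a file does not contain the expected element.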
