
How do I stop getting an object from two similarly named XML nodes when creating a custom object in PowerShell?

I am trying to parse several RSS news feeds, which I will later filter based on what I am looking for. Each feed has a slightly different XML schema, but in general each has a title, description, link and pubDate. Some use a CDATA section and some don't, so I incorporated an if statement for those that use it. I am trying to write one routine that handles all of them. Here is a sample of the XML giving me the headache:

<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:media="http://search.yahoo.com/mrss/" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title><![CDATA[ABC7 RSS Feed]]></title>
    <link><![CDATA[https://abc7news.com/feed]]></link>
    <lastBuildDate><![CDATA[Thu, 13 Jan 2022 15:49:04 +0000]]></lastBuildDate>
    <pubDate><![CDATA[Thu, 13 Jan 2022 15:49:04 +0000]]></pubDate>
    <description>Keep up with news from your local ABC station.</description>
    <copyright>Copyright 2022 ABC Inc., KGO-TV San Francisco</copyright>
    <managingEditor>KGO-TVWebTeam@email.disney.com(KGO-TV San Francisco)</managingEditor>
    <webMaster>KGO-TVWebTeam@email.disney.com(KGO-TV San Francisco)</webMaster>
    <language><![CDATA[en]]></language>
    <item>
      <title><![CDATA[Biden gives COVID response update; administration to deploy military teams to hospitals | LIVE]]></title>
      <description><![CDATA[Starting next week, 1,000 military medical personnel will begin arriving to help mitigate staffing crunches at hospitals across the country. ]]></description>
      <pubDate><![CDATA[Thu, 13 Jan 2022 15:38:02 +0000]]></pubDate>
      <link><![CDATA[https://abc7news.com/us-covid-biden-speech-today-hospitalizations/11462828/]]></link>
      <type><![CDATA[post]]></type>
      <guid><![CDATA[https://abc7news.com/us-covid-biden-speech-today-hospitalizations/11462828/]]></guid>
      <dc:creator><![CDATA[AP]]></dc:creator>
      <media:keywords><![CDATA[us covid, biden covid, biden speech today, covid hospitalizations, omicron variant, us hospitals, covid cases, covid omicron, biden military medical teams]]></media:keywords>
      <category><![CDATA[Health & Fitness,omicron variant,Coronavirus,military,joe biden,hospitals,u.s. & world]]></category>
      <guid isPermaLink="false">health/live-biden-highlighting-federal-surge-to-help-weather-omicron/11462828/</guid>
    </item>
    <item>
      <title><![CDATA[Massive backup on Bay Bridge after early morning crash]]></title>
      <description><![CDATA[A massive backup continues on the Bay Bridge after an earlier multi-vehicle crash past Treasure Island.]]></description>
      <pubDate><![CDATA[Thu, 13 Jan 2022 15:30:15 +0000]]></pubDate>
      <link><![CDATA[https://abc7news.com/bay-bridge-crash-traffic-accident-sf-commute/11463119/]]></link>
      <type><![CDATA[post]]></type>
      <guid><![CDATA[https://abc7news.com/bay-bridge-crash-traffic-accident-sf-commute/11463119/]]></guid>
      <dc:creator><![CDATA[KGO]]></dc:creator>
      <media:title><![CDATA[Crash triggers massive backup on Bay Bridge]]></media:title>
      <media:description><![CDATA[A crash on the Bay Bridge triggered massive gridlock for the Thursday morning commute.]]></media:description>
      <media:videoId>11463404</media:videoId>
      <media:thumbnail url="https://cdn.abcotvs.com/dip/images/11463261_011322-kgo-sky7-bay-bridge-traffic-img.jpg" width="1280" height="720" />
      <enclosure url="https://vcl.abcotv.net/video/kgo/011322-kgo-6am-bay-bridge-crash-vid.mp4" length="79" type="video/mp4" />
      <media:keywords><![CDATA[Bay Bridge crash, traffic, accident, SF commute, Oakland drive times, bay bridge toll plaza backup, Bay Area, treasure island,]]></media:keywords>
      <category><![CDATA[Traffic,Treasure Island,Oakland,San Francisco,CHP,bay bridge,crash]]></category>
      <guid isPermaLink="false">traffic/massive-backup-on-bay-bridge-after-early-morning-crash/11463119/</guid>
    </item>
  </channel>
</rss>

And here is the parsing code, which puts each item into an object ($posts):

    $rss = [xml] (Get-Content 'I:\RSS_Project\Feeds\feed-3.xml')
    $rss.SelectNodes('//item')|%{
    $posts += New-Object psobject -Property @{
        Title = If($_.Title."#cdata-section"){$_.Title."#cdata-section"}else{$_.Title}
        Desc = If($_.description."#cdata-section"){$_.description."#cdata-section"}else{$_.description}
        link = If($_.link."#cdata-section"){$_.link."#cdata-section"}else{$_.link}
        pubDate = If($_.pubDate."#cdata-section"){$_.pubDate."#cdata-section"}else{$_.pubDate}
        
        }
    }

I get the right link and pubDate with this feed, but some items also contain a media:title and a media:description (yes, this is inconsistent within the same feed), so the title property of the custom objects in $posts ends up holding {title, media:title}. With this data it would be {Massive backup on Bay Bridge after early morning crash, Crash triggers massive backup on Bay Bridge}. I can't figure out how to avoid capturing the media:title data. My other XML feeds don't have media:title. Can I make a pre-emptive strike and remove it ahead of time if it exists in any feed? I tried using $_.Title[0], which worked on this feed, but as the other feeds don't return an array, it did not work on those. I have the same issue where media:description exists in the item. I output the data into an HTML table, which only lists "System.Object" when the title or description is an array. Any help keeping media:title out of my object would be greatly appreciated.

PowerShell's XML type adapter can be a bit "wonky" (for lack of a better technical term), because it attempts to simplify something complex. As a result, it ignores namespace prefixes and resolves nodes by their local name instead, so $_.title resolves to both the <title> and <media:title> elements.
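
To see the behavior in isolation, here is a minimal sketch. The inline sample document is a made-up, trimmed-down stand-in for the feed in the question, and the variable names ($doc, $adapted, $node) are only for illustration.

```powershell
# Hypothetical sample: one item with both <title> and <media:title>
[xml]$doc = @'
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title><![CDATA[Plain title]]></title>
      <media:title><![CDATA[Media title]]></media:title>
    </item>
  </channel>
</rss>
'@

$item = $doc.SelectSingleNode('//item')

# Adapter-based property access matches on the local name "title",
# so the prefixed and the un-prefixed element are both returned
$adapted = @($item.title)
$adapted.Count

# An un-prefixed name in XPath only matches elements in no namespace,
# so <media:title> is not selected here
$node = $item.SelectSingleNode('./title')
$node.InnerText
```

On this sample the adapter-based lookup should yield two elements, while the XPath lookup should yield only the un-prefixed <title>.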

Instead, use XPath to resolve the values as well:

$fields = 'title','description','pubDate','link'

$posts = foreach($item in $rss.SelectNodes('//item')) {
    # create dictionary to hold properties of the object we want to construct
    $properties = [ordered]@{}

    # now let's try to resolve them all
    foreach($fieldName in $fields) {
        # use a relative XPath expression to extract relevant child node from current item
        $value = $item.SelectSingleNode("./${fieldName}")

        # handle content wrapped in CData
        if($value.HasChildNodes -and $value.ChildNodes[0] -is [System.Xml.XmlCDataSection]){
            $value = $value.ChildNodes[0]
        }

        # add node value to dictionary
        $properties[$fieldName] = $value.InnerText
    }

    # output resulting object
    [pscustomobject]$properties
}
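
As a usage sketch, the same approach can be wrapped in a function so it can be run against each feed and piped into an HTML table. The function name Get-FeedPost, its parameters, and the inline sample feed are hypothetical and not part of the original code. The sketch also relies on InnerText returning the text of CDATA and plain text children alike, so it skips the explicit CDATA check.

```powershell
function Get-FeedPost {
    param(
        [Parameter(Mandatory)]
        [xml]$Feed,

        [string[]]$Fields = @('title', 'description', 'pubDate', 'link')
    )

    foreach ($item in $Feed.SelectNodes('//item')) {
        $properties = [ordered]@{}

        foreach ($fieldName in $Fields) {
            # Un-prefixed XPath name: selects <title>, never <media:title>
            $node = $item.SelectSingleNode("./$fieldName")

            # Missing elements become $null instead of raising an error
            $properties[$fieldName] = if ($null -ne $node) { $node.InnerText.Trim() } else { $null }
        }

        [pscustomobject]$properties
    }
}

# Hypothetical sample: one item with media:title and CDATA, one without either
[xml]$sample = @'
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <item>
      <title><![CDATA[First title]]></title>
      <media:title><![CDATA[First media title]]></media:title>
      <link><![CDATA[https://example.com/1]]></link>
    </item>
    <item>
      <title>Second title</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>
'@

$posts = @(Get-FeedPost -Feed $sample)

# Every property is a plain string (or $null), so the table shows real text
$html = $posts | ConvertTo-Html -Property title, description, pubDate, link -Fragment
```

Because each property holds a single string rather than an array, the resulting table cells should contain the actual titles instead of a type name.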
