
XmlSlurper to parse XML and get the value of inner elements using Groovy

I am trying to parse the following XML:

    <body>
      <section id="5f884f20-6638-461f-a3f5-3d237341c048" outputclass="definition_and_scope">
        <title>Definition and Scope</title>
        <p>A work that is modified for a purpose, use, or medium other than that for which it was originally intended.</p>
        <p>This relationship applies to changes in form or to works completely rewritten in the same form.</p>
      </section>
      <section id="a7cf019f-dc82-46e2-b5ae-2e947d3c2509" outputclass="popup:ready_reference">
        <title>Element Reference</title>
        <div id="8472e205-3a32-40e3-a7ea-8bd7dbd43715" outputclass="iri">
          <p id="e6ddf17a-6b4b-4de3-886e-a315d88545ea" outputclass="title">
            <b>IRI</b>
          </p>
          <p id="c69f6279-27a3-4cd8-84a6-bb2c5a7b0424">
            <xref format="html" href="http://rdaregistry.info/Elements/w/P10142" scope="external">http://rdaregistry.info/Elements/w/P10142</xref>
          </p>
        </div>
        <div id="3e979983-cbac-4982-84c7-57ae9756e2bb" outputclass="domain">
          <p id="9815dbdf-7483-4dcf-8166-7ea50138b3e5" outputclass="title">
            <b>Domain</b>
          </p>
          <p id="328a1035-1eaf-4c4b-aead-d604586b3f64">
            <xref keyref="rdacC10001/ala-c3e1fff8-0a79-35c6-bee1-39b6b4c9ed35">Work</xref>
          </p>
        </div>
        <div id="13163eda-dcfd-48d9-aea4-cc8abef2f675" outputclass="range">
          <p id="d07d4e37-dff1-4561-baab-f8f557d99662" outputclass="title">
            <b>Range</b>
          </p>
          <p id="3873a6ab-5f73-47e2-9daa-441169e66c36">
            <xref keyref="rdacC10001/ala-c3e1fff8-0a79-35c6-bee1-39b6b4c9ed35">Work</xref>
          </p>
        </div>
      </section>
    </body>

I want to extract the values of all p tags under section and section/div, and append each value to a StringBuilder.

Here is my code:

import org.xml.sax.InputSource

def docText = new StringBuilder()
// Only part of the XML is pasted in this question; the full document starts with doc/topic/body.
def bodyObject = doc.topic.body.toXmlString(true)
def parseBodyObject = new XmlSlurper().parse(new InputSource(new StringReader(bodyObject)))
def findAllSection = parseBodyObject.depthFirst().findAll { it.name() == 'section' }

findAllSection.each { section ->
    docText.append(" " + section.p)
    docText.append(" " + section.div.p + " ")
}

Output: my docText looks like this:

A work that is modified for a purpose, use, or medium other than that for which it was originally intended.This relationship applies to changes in form or to works completely rewritten in the same form. IRIhttp://rdaregistry.info/Elements/w/P10142DomainWorkRangeWorkAlternate labelsUser tasksRecording methodsDublin Core TermsMARC 21 Bibliographic Recording an unstructured descriptionRecording a structured descriptionRecording an identifierRecording an IRI For the inverse of this element, see Work: adapted as work For broader elements, see Work: based on workFor narrower elements, see

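I am stuck on adding spaces between the pieces of text. For example, when going through section/div/p, it joins all the p values together without any spaces, like this: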

IRIhttp://rdaregistry.info/Elements/w/P10142DomainWorkRangeWorkAlternate

The expected output is:

IRI http://rdaregistry.info/Elements/w/P10142 Domain Work 

How can I separate these values? Any help is appreciated.

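I believe depthFirst().findAll { it.name() == 'section' } returns a list of section nodes, and an expression such as section.div.p then evaluates to the inner text of all matching p tags combined into a single string.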
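The concatenation can be reproduced with a small sketch. The XML here is a trimmed-down, hypothetical fragment of the one in the question, and the names xml, section, joined and separate are illustrative only. It assumes Groovy 3 or later, where XmlSlurper lives in groovy.xml; in Groovy 2.x the class is groovy.util.XmlSlurper and needs no import.

```groovy
import groovy.xml.XmlSlurper // assumption: Groovy 3+; Groovy 2.x auto-imports groovy.util.XmlSlurper

// Hypothetical, shortened version of the second section from the question.
def xml = '''<section>
  <div><p><b>IRI</b></p><p><xref>http://rdaregistry.info/Elements/w/P10142</xref></p></div>
  <div><p><b>Domain</b></p><p><xref>Work</xref></p></div>
</section>'''

def section = new XmlSlurper().parseText(xml)

// Converting a multi-node GPath result to a String concatenates the text of every match.
def joined = section.div.p.toString()

// Collecting text() node by node keeps the individual values apart.
def separate = section.div.p.collect { it.text() }

println joined
println separate.join(' ')
```

The first println shows the run-together text from the question; the second shows the same values separated by spaces.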

Let's define the sample XML as xmlDoc. The following snippet works as expected:

def parseBodyObject = new XmlSlurper().parseText(xmlDoc)
def findAllPtags = parseBodyObject.children().depthFirst().findAll { 
   it.name() == 'p'
}
def docText = new StringBuilder()
findAllPtags.each { p ->
   docText.append("\n" + p)
}

You can replace "\n" with a space.
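As a possible variation on the snippet above, the separator can be applied with collect and join instead of appending inside a loop, which also avoids a leading separator. This is only a sketch under assumptions: the xmlDoc content is a shortened, hypothetical stand-in for the real document, the import assumes Groovy 3 or later, and the result is a plain String rather than a StringBuilder.

```groovy
import groovy.xml.XmlSlurper // assumption: Groovy 3+; Groovy 2.x auto-imports groovy.util.XmlSlurper

// Hypothetical, shortened stand-in for the XML in the question.
def xmlDoc = '''<body>
  <section>
    <title>Definition and Scope</title>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </section>
  <section>
    <title>Element Reference</title>
    <div>
      <p><b>IRI</b></p>
      <p><xref>http://rdaregistry.info/Elements/w/P10142</xref></p>
    </div>
  </section>
</body>'''

def body = new XmlSlurper().parseText(xmlDoc)

// Visit every node depth-first, keep the p elements,
// take each one's text separately, and join with a single space.
def docText = body.depthFirst()
        .findAll { it.name() == 'p' }
        .collect { it.text().trim() }
        .join(' ')

println docText
```

If a StringBuilder is still needed downstream, the joined string can be passed to new StringBuilder(docText).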
