Remove all occurrences of a specific attribute from XML

Question

I have an XML file with content like this:

<document>
  <section>
    <section SectionName="abstract">
      <paragraph>
        <word Endpoint="1" SciomeSRIE_Sentence.ExposureSentence="1">gutkha</word>
        <word ExposureSentence="1">split_identifier,</word>
        <word ExposureSentence="1">and</word>
        <word ExposureSentence="1">what</word>
        <word ExposureSentence="1">role</word>
        <word ExposureSentence="1">split_identifier,</word>
        <word ExposureSentence="1">if</word>
        <word ExposureSentence="1">any</word>
        <word ExposureSentence="1">split_identifier,</word>
        <word ExposureSentence="1">nicotine</word>
        <word ExposureSentence="1">contributes</word>
        <word ExposureSentence="1">to</word>
        <word ExposureSentence="1">the</word>
        <word ExposureSentence="1">effects</word>
        <word ExposureSentence="1">split_identifier.</word>
        <word EB_NLP_Tagger.Participant="3" AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">Adult</word>
        <word EB_NLP_Tagger.Participant="3" Sex="1" AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">male</word>
        <word EB_NLP_Tagger.Participant="3" Species="1" AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">mice</word>
        <word AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">were</word>
        <word AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">treated</word>
        <word AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">daily</word>
        <word AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">for</word>

I want to remove every occurrence of the "ExposureSentence" attribute. Only attributes named exactly ExposureSentence should go; SciomeSRIE_Sentence.ExposureSentence stays. The output would be:

<word Endpoint="1" SciomeSRIE_Sentence.ExposureSentence="1">gutkha</word>
<word >split_identifier,</word>
<word >and</word>
<word >what</word>
<word >role</word>
<word >split_identifier,</word>
<word >if</word>
<word >any</word>
<word >split_identifier,</word>
<word >nicotine</word>
<word >contributes</word>
<word >to</word>
<word >the</word>
<word >effects</word>
<word >split_identifier.</word>
<word EB_NLP_Tagger.Participant="3" AnimalGroupSentence="1" DoseGroupSentence="1" >Adult</word>
<word EB_NLP_Tagger.Participant="3" Sex="1" AnimalGroupSentence="1" DoseGroupSentence="1" >male</word>
<word EB_NLP_Tagger.Participant="3" Species="1" AnimalGroupSentence="1" DoseGroupSentence="1" >mice</word>
<word AnimalGroupSentence="1" DoseGroupSentence="1" >were</word>
<word AnimalGroupSentence="1" DoseGroupSentence="1" >treated</word>
<word AnimalGroupSentence="1" DoseGroupSentence="1" >daily</word>
<word AnimalGroupSentence="1" DoseGroupSentence="1" >for</word>

I tried the following, but I'm not sure how to proceed further:

        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(new ByteArrayInputStream(xml.getBytes()));
        NodeList sectionNodeList = doc.getElementsByTagName("section");
        for (int i = 0; i < sectionNodeList.getLength(); i++)
        {
            Node sectionNode = sectionNodeList.item(i);

        }
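One way to finish this DOM-only attempt without XPath is a minimal sketch like the following. It assumes the whole document fits in a String, as in the question. `getElementsByTagName("*")` returns every element, so the attribute is removed wherever it appears, not only on `<word>`. `Element.removeAttribute` matches the exact attribute name and does nothing when the attribute is absent. The class name `RemoveAttributeDom` and the method `removeAttribute(String, String)` are hypothetical. Serializing with the JDK's identity `Transformer` is just one common way to get the result back as text.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RemoveAttributeDom {

    // Hypothetical helper: removes the attribute named exactly attrName from every element.
    // Attributes whose name only contains attrName (e.g. SciomeSRIE_Sentence.ExposureSentence)
    // are left untouched.
    static String removeAttribute(String xml, String attrName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // "*" matches every element in the document, in document order
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            // removeAttribute has no effect when the attribute is not present
            ((Element) all.item(i)).removeAttribute(attrName);
        }

        // Serialize the modified DOM back to a String, without the XML declaration
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<section>"
                + "<word Endpoint=\"1\" SciomeSRIE_Sentence.ExposureSentence=\"1\">gutkha</word>"
                + "<word Sex=\"1\" ExposureSentence=\"2\">male</word>"
                + "</section>";
        System.out.println(removeAttribute(xml, "ExposureSentence"));
    }
}
```

Because the loop only removes attributes and never adds or removes elements, iterating over the live `NodeList` is safe here.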

Answer 1

XPath makes this straightforward:

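import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.stream.Collectors;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// xml holds the input document as a String, as in the question
public static void main(String... args)
        throws Exception
{
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    // Encode with an explicit charset instead of the platform default
    Document doc = dBuilder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));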

    XPathFactory xPathfactory = XPathFactory.newInstance();
    XPath xpath = xPathfactory.newXPath();

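    // Find word elements with an ExposureSentence attribute.
    // The predicate matches the exact attribute name, so
    // SciomeSRIE_Sentence.ExposureSentence is left in place.
    // Use "//*[@ExposureSentence]" to cover elements other than <word>.
    XPathExpression query = xpath.compile("//word[@ExposureSentence]");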
    NodeList words = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
    for (int i = 0; i < words.getLength(); i++) {
        // Remove the attribute
        ((Element) words.item(i)).removeAttribute("ExposureSentence");
    }

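    // Handle ComponentName elements (not in the question's sample): presumably they
    // hold comma-separated attribute names, so drop ExposureSentence from each list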
    query = xpath.compile("//ComponentName");
    NodeList componentNames = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
    for (int i = 0; i < componentNames.getLength(); i++) {
        String content = componentNames.item(i).getTextContent();
        componentNames.item(i).setTextContent(
            Arrays.stream(content.split(","))
                .map(String::trim)
                .filter(s -> !s.equals("ExposureSentence"))
                .collect(Collectors.joining(", ")));
    }

    // Omitted: Save the XML
}
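The answer leaves out the saving step. Assuming the goal is to write the modified `doc` back to a file, a minimal sketch using the JDK's identity `Transformer` from `javax.xml.transform` might look like this. The class name `SaveXml` and the method `save(Document, Path)` are hypothetical, and the target path is whatever the caller chooses.

```java
import java.io.ByteArrayInputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class SaveXml {

    // Hypothetical helper: writes a DOM Document to the given file as UTF-8 XML.
    static void save(Document doc, Path target) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        // try-with-resources closes the stream even if the transform fails
        try (OutputStream os = Files.newOutputStream(target)) {
            t.transform(new DOMSource(doc), new StreamResult(os));
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the doc produced by the attribute-removal code
        String xml = "<section><word Sex=\"1\">male</word></section>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Path out = Files.createTempFile("cleaned", ".xml");
        save(doc, out);
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.UTF_8));
    }
}
```

The output starts with an XML declaration. Set `OutputKeys.OMIT_XML_DECLARATION` to `"yes"` if the original file had none.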

Answer 2

I think the simplest solution is to replace all occurrences of ExposureSentence="1" with a simple regex: read the whole XML content as a String and replace those specific occurrences, with no XML parsing needed.

With XML parsing, you have to parse the document, apply your changes, and then rebuild (serialize) the XML infoset.
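Taken literally, the pattern ExposureSentence="1" misses ExposureSentence="2", which appears in the sample. It also matches inside SciomeSRIE_Sentence.ExposureSentence="1", which the expected output keeps. A slightly stricter pattern avoids both problems. The minimal sketch below requires whitespace immediately before the exact name and accepts any single- or double-quoted value. It also drops the leading space, so you get `<word>` rather than `<word >`. The class and method names are hypothetical. A regex does not understand XML, so the same text inside comments, CDATA, or element content would also be rewritten.

```java
import java.util.regex.Pattern;

public class RemoveAttributeRegex {

    // Whitespace + exact name ExposureSentence + '=' + quoted value.
    // Requiring whitespace directly before the name means a prefixed name such as
    // SciomeSRIE_Sentence.ExposureSentence (preceded by '.') is not matched.
    private static final Pattern EXPOSURE_ATTR =
            Pattern.compile("\\s+ExposureSentence\\s*=\\s*(\"[^\"]*\"|'[^']*')");

    // Hypothetical helper: strips every ExposureSentence attribute from raw XML text.
    static String removeExposureSentence(String xml) {
        return EXPOSURE_ATTR.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<word Endpoint=\"1\" SciomeSRIE_Sentence.ExposureSentence=\"1\">gutkha</word> "
                + "<word ExposureSentence=\"1\">and</word> "
                + "<word Sex=\"1\" AnimalGroupSentence=\"1\" ExposureSentence=\"2\">male</word>";
        // Prints the three words with only the bare ExposureSentence attributes removed
        System.out.println(removeExposureSentence(xml));
    }
}
```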

2 answers

Answer 1: 2 votes, accepted, 2019-10-09 15:33:34

Answer 2: -1 votes, 2019-10-09 15:04:22