Remove all occurrences of a specific attribute from an XML document

I have an XML file with content like this:

<document> <section> <section SectionName="abstract"> <paragraph> <word Endpoint="1" SciomeSRIE_Sentence.ExposureSentence="1">gutkha</word> <word ExposureSentence="1">split_identifier,</word> <word ExposureSentence="1">and</word> <word ExposureSentence="1">what</word> <word ExposureSentence="1">role</word> <word ExposureSentence="1">split_identifier,</word> <word ExposureSentence="1">if</word> <word ExposureSentence="1">any</word> <word ExposureSentence="1">split_identifier,</word> <word ExposureSentence="1">nicotine</word> <word ExposureSentence="1">contributes</word> <word ExposureSentence="1">to</word> <word ExposureSentence="1">the</word> <word ExposureSentence="1">effects</word> <word ExposureSentence="1">split_identifier.</word> <word EB_NLP_Tagger.Participant="3" AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">Adult</word> <word EB_NLP_Tagger.Participant="3" Sex="1" AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">male</word> <word EB_NLP_Tagger.Participant="3" Species="1" AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">mice</word> <word AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">were</word> <word AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">treated</word> <word AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">daily</word> <word AnimalGroupSentence="1" DoseGroupSentence="1" ExposureSentence="2">for</word>

I want to remove all occurrences of the "ExposureSentence" attribute. The output would be:

 <word Endpoint="1" SciomeSRIE_Sentence.ExposureSentence="1">gutkha</word> <word >split_identifier,</word> <word >and</word> <word >what</word> <word >role</word> <word >split_identifier,</word> <word >if</word> <word >any</word> <word >split_identifier,</word> <word >nicotine</word> <word >contributes</word> <word >to</word> <word >the</word> <word >effects</word> <word >split_identifier.</word> <word EB_NLP_Tagger.Participant="3" AnimalGroupSentence="1" DoseGroupSentence="1" >Adult</word> <word EB_NLP_Tagger.Participant="3" Sex="1" AnimalGroupSentence="1" DoseGroupSentence="1" >male</word> <word EB_NLP_Tagger.Participant="3" Species="1" AnimalGroupSentence="1" DoseGroupSentence="1" >mice</word> <word AnimalGroupSentence="1" DoseGroupSentence="1" >were</word> <word AnimalGroupSentence="1" DoseGroupSentence="1" >treated</word> <word AnimalGroupSentence="1" DoseGroupSentence="1" >daily</word> <word AnimalGroupSentence="1" DoseGroupSentence="1" >for</word>

I tried the following, but I am not sure how to proceed further.

        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(new ByteArrayInputStream(xml.getBytes()));
        NodeList sectionNodeList = doc.getElementsByTagName("section");
        for (int i = 0; i < sectionNodeList.getLength(); i++)
        {
            Node sectionNode = sectionNodeList.item(i);

        }

XPath makes this straightforward:

public static void main(String... args)
        throws Exception
{
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(new ByteArrayInputStream(xml.getBytes()));

    XPathFactory xPathfactory = XPathFactory.newInstance();
    XPath xpath = xPathfactory.newXPath();

    // Find word elements with ExposureSentence attribute
    XPathExpression query = xpath.compile("//word[@ExposureSentence]");
    NodeList words = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
    for (int i = 0; i < words.getLength(); i++) {
        // Remove the attribute
        ((Element) words.item(i)).removeAttribute("ExposureSentence");
    }

    // Handle ComponentName
    query = xpath.compile("//ComponentName");
    NodeList componentNames = (NodeList) query.evaluate(doc, XPathConstants.NODESET);
    for (int i = 0; i < componentNames.getLength(); i++) {
        String content = componentNames.item(i).getTextContent();
        componentNames.item(i).setTextContent(
            Arrays.stream(content.split(","))
                .map(String::trim)
                .filter(s -> !s.equals("ExposureSentence"))
                .collect(Collectors.joining(", ")));
    }

    // Omitted: Save the XML
}
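
The snippet above leaves out saving the XML and assumes an existing `xml` string. A minimal, self-contained sketch of the same XPath approach, with the result serialized back to a string through the standard `javax.xml.transform` API, might look like the following. The class name `RemoveAttributeExample`, the helper `removeAttribute`, and the shortened sample input are assumptions made for illustration; they are not part of the original answer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named for this example only.
class RemoveAttributeExample {

    // Removes the given attribute from every <word> element and returns the XML as a string.
    static String removeAttribute(String xml, String attributeName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Select only <word> elements that carry exactly this attribute name.
            // "SciomeSRIE_Sentence.ExposureSentence" is a different name and is not matched.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList words = (NodeList) xpath.evaluate(
                    "//word[@" + attributeName + "]", doc, XPathConstants.NODESET);
            for (int i = 0; i < words.getLength(); i++) {
                ((Element) words.item(i)).removeAttribute(attributeName);
            }

            // Serialize the modified DOM tree back to text (the "save" step).
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process the XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, well-formed version of the sample from the question.
        String xml = "<document><section><paragraph>"
                + "<word Endpoint=\"1\" SciomeSRIE_Sentence.ExposureSentence=\"1\">gutkha</word>"
                + "<word ExposureSentence=\"1\">and</word>"
                + "<word Sex=\"1\" DoseGroupSentence=\"1\" ExposureSentence=\"2\">male</word>"
                + "</paragraph></section></document>";
        System.out.println(removeAttribute(xml, "ExposureSentence"));
    }
}
```

To write to a file instead of a string, a `StreamResult` built from a `java.io.File` can be passed to `transform`. The order of the remaining attributes in the serialized output is up to the serializer, so it may differ from the input.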

I think the simplest solution is to replace all occurrences of ExposureSentence="1" using a simple regex. Read the whole XML content as a String and replace the specific occurrences; that way you do not need to parse and modify the XML at all.

With XML parsing, you have to parse the document, manipulate it, and then rebuild the XML infoset.
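
A rough sketch of the regex idea, with two adjustments for the sample in the question. First, a bare search for `ExposureSentence="1"` would also hit the tail of `SciomeSRIE_Sentence.ExposureSentence="1"`, so the pattern below requires whitespace before the attribute name. Second, the sample also contains `ExposureSentence="2"`, so the pattern accepts any quoted value. The class name `RegexAttributeRemover` and the helper `removeAttribute` are made up for this example. Because a regex does not understand XML, the same text would also be removed from comments, CDATA sections, or element content if it appeared there.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
class RegexAttributeRemover {

    // Deletes every occurrence of the attribute, including its leading whitespace
    // and its quoted value, treating the XML purely as text.
    static String removeAttribute(String xml, String attributeName) {
        Pattern pattern = Pattern.compile(
                "\\s+" + Pattern.quote(attributeName) + "\\s*=\\s*(\"[^\"]*\"|'[^']*')");
        return pattern.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<word Endpoint=\"1\" SciomeSRIE_Sentence.ExposureSentence=\"1\">gutkha</word>"
                + "<word ExposureSentence=\"1\">and</word>"
                + "<word Sex=\"1\" DoseGroupSentence=\"1\" ExposureSentence=\"2\">male</word>";
        System.out.println(removeAttribute(xml, "ExposureSentence"));
    }
}
```

Since the leading whitespace is removed together with the attribute, the result has `<word>` rather than the `<word >` shown in the question's expected output.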
