
Remove multiple nodes from an XML file using SAX and Java

I am new to XML parsing with Java and the SAX parser. I have a very large XML file, and because of its size I was advised to use a SAX parser. I have finished the parsing part of my task and it works as expected. One task remains: deleting or updating some nodes at the user's request.

I am able to find all tags by their names, change their data attributes, etc. If I can do these with SAX, deleting should also be possible.

The sample XML describes some functionality under certain cases. The user's inputs are the case names (case1, case2).

<ruleset>
    <rule id="1">
        <condition>
            <case1>somefunctionality</case1>
            <allow>true</allow>
        </condition>
    </rule>
    <rule id="2">
        <condition>
            <case2>somefunctionality</case2>
            <allow>false</allow>
        </condition>
    </rule>
</ruleset>

If the user wants to delete one of these cases (for example case1), the complete rule tag must be deleted, not just the case1 tag. If case1 is deleted, the XML becomes:

<ruleset>
    <rule id="2">
        <condition>
            <case2>somefunctionality</case2>
            <allow>false</allow>
        </condition>
    </rule>
</ruleset>

My question is: can this be done using SAX? I can't use DOM or any other parser at this point. The only other option is even worse: string search. How can it be done using SAXParser?

Try this:

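    import javax.xml.transform.Result;
    import javax.xml.transform.Source;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.sax.SAXSource;
    import javax.xml.transform.stream.StreamResult;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLFilterImpl;
    import org.xml.sax.helpers.XMLReaderFactory;

    public class RemoveRule {
        public static void main(String[] args) throws Exception {
            // XMLReaderFactory is deprecated since Java 9 but still works;
            // SAXParserFactory is the suggested replacement.
            XMLReader xr = new XMLFilterImpl(XMLReaderFactory.createXMLReader()) {
                private int skipDepth;      // > 0 while inside the rule being removed
                private boolean afterSkip;  // true until the next tag after a removed rule

                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts)
                        throws SAXException {
                    if (skipDepth > 0) {
                        skipDepth++;        // nested element inside the removed rule
                    } else if (qName.equals("rule") && "1".equals(atts.getValue("id"))) {
                        skipDepth = 1;      // null-safe: a rule without an id is kept
                    } else {
                        afterSkip = false;
                        super.startElement(uri, localName, qName, atts);
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) throws SAXException {
                    if (skipDepth > 0) {
                        afterSkip = (--skipDepth == 0);  // stop skipping at the matching </rule>
                    } else {
                        afterSkip = false;
                        super.endElement(uri, localName, qName);
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) throws SAXException {
                    // Text right after a removed rule is dropped too, so no blank line is left behind.
                    if (skipDepth == 0 && !afterSkip) {
                        super.characters(ch, start, length);
                    }
                }
            };
            Source src = new SAXSource(xr, new InputSource("test.xml"));
            Result res = new StreamResult(System.out);
            TransformerFactory.newInstance().newTransformer().transform(src, res);
        }
    }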

Output:

<?xml version="1.0" encoding="UTF-8"?><ruleset>
    <rule id="2">
        <condition>
            <case2>somefunctionality</case2>
            <allow>false</allow>
        </condition>
    </rule>
</ruleset>

What you need to construct is a SAX event buffer.

When you come across a <rule> element, you need to save it (or the information required to regenerate it) and all of the other events that occur between it and the 'case' you want to delete.

If the 'rule' you have saved is the same as the one that needs to be deleted, just throw out the info and continue.

If the 'rule' you saved is not the one that needs to be deleted, you should regenerate the SAX events that were saved and then continue.
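
The buffering idea above could be sketched roughly as follows. This is only an illustration, not a tested production filter: the class name RuleBufferFilter, the Event interface and the filter helper are hypothetical names made up for this sketch. It assumes the decision can only be made at the closing </rule> tag (a case element may appear anywhere inside the rule), so it buffers one whole rule at a time, then either replays or discards it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: buffers each <rule> and drops it if it contains <caseName>.
class RuleBufferFilter extends XMLFilterImpl {
    /** One recorded SAX event that can be replayed later. */
    private interface Event { void replay() throws SAXException; }

    private final String caseName;
    private final List<Event> buffer = new ArrayList<>();
    private int depth;      // > 0 while inside a <rule>
    private boolean drop;   // true once the unwanted case was seen in the current rule

    RuleBufferFilter(XMLReader parent, String caseName) {
        super(parent);
        this.caseName = caseName;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (depth == 0 && !"rule".equals(qName)) {
            super.startElement(uri, local, qName, atts);  // outside any rule: pass through
            return;
        }
        depth++;
        if (qName.equals(caseName)) drop = true;
        if (drop) { buffer.clear(); return; }             // throw out what was saved
        Attributes copy = new AttributesImpl(atts);       // the parser may reuse atts
        buffer.add(() -> super.startElement(uri, local, qName, copy));
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (depth == 0) { super.endElement(uri, local, qName); return; }
        if (!drop) buffer.add(() -> super.endElement(uri, local, qName));
        if (--depth == 0) {                               // reached </rule>: decide
            if (!drop) for (Event e : buffer) e.replay(); // regenerate the saved events
            buffer.clear();
            drop = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) { super.characters(ch, start, length); return; }
        if (drop) return;
        char[] copy = Arrays.copyOfRange(ch, start, start + length); // ch is reused too
        buffer.add(() -> super.characters(copy, 0, copy.length));
    }

    /** Runs the filter over an XML string and returns the serialized result. */
    static String filter(String xml, String caseName) throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setNamespaceAware(true);
        XMLReader reader = new RuleBufferFilter(f.newSAXParser().getXMLReader(), caseName);
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(reader, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ruleset><rule id=\"1\"><condition><case1>f</case1></condition></rule>"
                + "<rule id=\"2\"><condition><case2>f</case2></condition></rule></ruleset>";
        System.out.println(filter(xml, "case1"));
    }
}
```

Memory use is bounded by the largest single rule rather than by the whole file. Whitespace around a dropped rule is passed through unchanged, so the output may contain an extra blank line where the rule used to be.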

SAX is most commonly used for reading/parsing XML. But there is an article on how to use SAX to write files, and it appears that chapter is available online; see:

http://xmlwriter.net/sample_chapters/Professional_XML/31100604.shtml

[The article is dated 1999, so it uses an old version of SAX, but the concepts still apply.]

The basic idea is that you create a custom DocumentHandler/ContentHandler. Whenever it receives a SAX event, it serializes the event and writes it to a stream, file, or whatever. So you use your input document as a source of SAX events and forward these events to the XMLOutputter.
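
A serializing handler of this kind might look roughly like the sketch below. It is not the code from the article: SimpleXmlWriter and its copy helper are hypothetical names, and the sketch only handles elements, attributes and text (comments, processing instructions, the DOCTYPE and CDATA markers are not reproduced, and empty elements come out as an open/close pair).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: writes each SAX event back out as XML text.
class SimpleXmlWriter extends DefaultHandler {
    private final Writer out;

    SimpleXmlWriter(Writer out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        StringBuilder sb = new StringBuilder("<").append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            sb.append(' ').append(atts.getQName(i))
              .append("=\"").append(escape(atts.getValue(i))).append('"');
        }
        write(sb.append('>').toString());
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        write("</" + qName + ">");
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        write(escape(new String(ch, start, length)));
    }

    // Escapes the characters that are not allowed verbatim in text or attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // SAX callbacks cannot throw IOException, so wrap it.
    private void write(String s) throws SAXException {
        try {
            out.write(s);
        } catch (IOException e) {
            throw new SAXException(e);
        }
    }

    /** Parses the given XML and returns what the handler wrote. */
    static String copy(String xml) throws Exception {
        StringWriter out = new StringWriter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new SimpleXmlWriter(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copy("<a x=\"1\">t &amp; u<b/></a>"));
    }
}
```

Editing logic would then sit between the parser and this handler, deciding which events get forwarded.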

The hard part is getting to the point where you can parse your XML document into a stream of SAX events, drive the XMLOutputter, and generate an exact copy of the input file. Once you get that working, you can move on to the editing logic, where you read your rules and use them to modify the output file.

It's a lot more work than DOM, JDOM, XSLT, etc., but it may help in your situation because you never have to store the entire document in memory.
