How to preserve XML nodes that are not bound to an object when parsing with SAX

Question

I am working on an Android app that interfaces with a Bluetooth camera. For each clip stored on the camera, we store some fields about the clip (some of which the user can change) in an XML file.

Currently this app is the only one writing this XML data to the device, but in the future a desktop app or an iPhone app may write data here too. I don't want to assume that another app couldn't have additional fields as well (especially if it is a newer version of the app that added fields this version doesn't support yet).

So what I want to prevent is a situation where we add new fields to this XML file in another application, and then the user opens the Android app and it wipes out those other fields because it doesn't know about them.

So let's take a hypothetical example:

<data>
  <title>My Title</title>
  <date>12/24/2012</date>
  <category>Blah</category>
</data>

When read from the device, this gets translated into a Clip object that looks like this (simplified for brevity):

public class Clip {
  public String title, category;
  public Date date;
}

So I'm using SAX to parse the data and store it in a Clip. I simply collect the characters in a StringBuilder and write them out when I reach the end element for title, category and date.

I realized, though, that when I write this data back to the device, any other tags in the original document would not get written, because I only write out the fields I know about.

This makes me think that maybe SAX is the wrong option and perhaps I should use DOM or something else that would let me more easily write out any other elements that existed originally.

Alternatively, I was thinking my Clip class could contain an ArrayList of some generic XML type (maybe DOM). In startElement I would check whether the element is one of the predefined tags, and if it is not, I would store the whole structure until I reach the end of that tag (but in what?). Then, when writing back out, I would go through all of the additional tags and write them to the XML file (along with the fields I know about, of course).
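One possible answer to the "but in what?" part is a plain string per unknown child. The following is only a minimal sketch of that idea, not a tested Android implementation: PreservingHandler, its known map and its extras list are hypothetical names, the root element is assumed to be data as in the example, and comments, CDATA markers and processing instructions inside unknown elements are not kept.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: binds known fields and keeps unknown subtrees as XML text. */
class PreservingHandler extends DefaultHandler {
    private static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("title", "date", "category"));

    final Map<String, String> known = new LinkedHashMap<>(); // fields the app understands
    final List<String> extras = new ArrayList<>();           // unknown children, re-serialized

    private final StringBuilder text = new StringBuilder();
    private StringBuilder raw; // non-null while inside an unknown child of the root
    private int depth;         // 1 = root element, 2 = direct child of the root

    static PreservingHandler parse(String xml) {
        PreservingHandler handler = new PreservingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse clip XML", e);
        }
        return handler;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (raw == null && depth == 2 && !KNOWN.contains(qName)) {
            raw = new StringBuilder(); // start capturing an unknown child
        }
        if (raw != null) {
            raw.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                raw.append(' ').append(atts.getQName(i)).append("=\"")
                   .append(escape(atts.getValue(i))).append('"');
            }
            raw.append('>');
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (raw != null) {
            raw.append(escape(new String(ch, start, length)));
        } else {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (raw != null) {
            raw.append("</").append(qName).append('>');
            if (depth == 2) { // the unknown subtree is complete
                extras.add(raw.toString());
                raw = null;
            }
        } else if (depth == 2) {
            known.put(qName, text.toString());
        }
        depth--;
    }

    /** Writes the known fields first, then every preserved unknown child. */
    String toXml() {
        StringBuilder out = new StringBuilder("<data>");
        for (Map.Entry<String, String> field : known.entrySet()) {
            out.append('<').append(field.getKey()).append('>')
               .append(escape(field.getValue()))
               .append("</").append(field.getKey()).append('>');
        }
        for (String extra : extras) {
            out.append(extra);
        }
        return out.append("</data>").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

With this shape, the Clip would only need to carry the extras list alongside its own fields. The trade-off is that unknown elements are moved after the known ones when the file is rewritten.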

Is this a common problem with a well-known solution?

-- Update 5/22/12 --

I didn't mention that in the actual XML the root node (actually called annotation) carries a version number, which is currently set to 1. What I'm going to do in the short term is require that the version number my app supports is >= the version number of the XML data. If the XML has a greater version number, I will still attempt to parse it for reading but will deny any saves to the model. I'm still interested in any kind of working example of how to do this, though.
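The short-term version check could look roughly like the sketch below. It assumes the version is stored as a version attribute on the root element, which the post does not actually state, and it treats a missing attribute as version 0. VersionGate, readVersion and canSave are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical guard: saving is allowed only for versions this app understands. */
class VersionGate {
    static final int SUPPORTED_VERSION = 1;

    /** Reads the (assumed) "version" attribute of the root element; 0 if absent. */
    static int readVersion(String xml) {
        final int[] version = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)), new DefaultHandler() {
                        private boolean seenRoot;

                        @Override
                        public void startElement(String uri, String localName,
                                                 String qName, Attributes atts) {
                            if (!seenRoot) { // only the first element is the root
                                seenRoot = true;
                                String value = atts.getValue("version");
                                if (value != null) {
                                    version[0] = Integer.parseInt(value.trim());
                                }
                            }
                        }
                    });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not read version", e);
        }
        return version[0];
    }

    /** Newer documents may still be read, but must not be written back. */
    static boolean canSave(String xml) {
        return readVersion(xml) <= SUPPORTED_VERSION;
    }
}
```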

BTW, I thought of another solution that should be pretty easy. I figured I could use XPath to find the nodes I know about and replace their content when the data is updated. However, I ran some benchmarks, and the overhead of parsing the XML into memory is absurd. The parsing operation alone, without any lookups, was 20 times slower than SAX. Using XPath was generally 30-50 times slower for parsing, which is really bad considering I parse these in a list view. So my idea is to keep SAX for parsing the nodes into clips, but store the entire XML in a variable of the Clip class (remember, this XML is short, less than 2 KB). Then, when I write the data back out, I can use XPath to replace the nodes I know about in the original XML.
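The write-back half of that plan might be sketched as follows, using only the standard JAXP DOM, XPath and Transformer APIs. RawXmlUpdater and its update method are hypothetical names, and the field names are assumed to be fixed, trusted element names that are direct children of the root, since they are concatenated into the XPath expression.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: rewrites known fields inside the stored original XML. */
class RawXmlUpdater {
    /** Overwrites the text of each known child of the root; other nodes are untouched. */
    static String update(String rawXml, Map<String, String> knownFields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rawXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (Map.Entry<String, String> field : knownFields.entrySet()) {
                Node node = (Node) xpath.evaluate(
                        "/*/" + field.getKey(), doc, XPathConstants.NODE);
                if (node == null) { // field missing in the original: append it
                    node = doc.getDocumentElement()
                            .appendChild(doc.createElement(field.getKey()));
                }
                node.setTextContent(field.getValue());
            }
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) { // sketch: collapse the checked JAXP exceptions
            throw new IllegalStateException("could not update clip XML", e);
        }
    }
}
```

The DOM parse only happens on save, so the list view would keep using the SAX path for reading.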

I'm still interested in any other solutions, though. I probably won't accept an answer unless it includes some code examples.

Answer 1

You're right to say that SAX is probably not the best option if you want to keep the nodes that you've not "consumed". You could still do it using some kind of "SAX store" that keeps the SAX events and replays them (there are a few implementations of such a thing around), but an object-model-based API would be much easier to use: you'd simply keep the complete object model and update only "your" nodes.

Of course, you can use DOM, which is the standard, but you may also want to consider alternatives that provide easier access to the specific nodes you'll be using in an arbitrary data model. Among them, JDOM ( http://www.jdom.org/ ) and XOM ( http://www.xom.nu/ ) are interesting candidates.

Answer 2

Here's how you can go about it with SAX filters:

When you read your document with SAX, you record all the events and relay them to the next level of SAX reader. You basically stack two layers of SAX readers (with XMLFilter): one records and relays, and the other is your current SAX handler that creates objects.
When you're ready to write your modifications back to disk, you replay the recorded SAX events through your writer, which overwrites the values/nodes you have altered.

I spent some time with the idea and it worked. It basically came down to proper chaining of XMLFilters. Here's what the unit test looks like; your code would do something similar:

// Build the chain: SAX parser <- recorder <- ClipHolder.
final SAXParserFactory factory = SAXParserFactory.newInstance();
final SAXParser parser = factory.newSAXParser();

final RecorderProxy recorder = new RecorderProxy(parser.getXMLReader());
final ClipHolder clipHolder = new ClipHolder(recorder);

// Parsing through the outermost filter records every event along the way.
clipHolder.parse(new InputSource(new StringReader(srcXml)));

assertTrue(recorder.hasRecordingToReplay());

final Clip clip = clipHolder.getClip();
assertNotNull(clip);
assertEquals("My Title", clip.title);
assertEquals("Blah!", clip.category);
assertEquals(Clip.DATE_FORMAT.parse("12/24/2012"), clip.date);

// Change the fields the app knows about.
clip.title = "My Title Updated";
clip.category = "Something else";

// Replay the recording through the serializer, which overlays the clip's values.
final ClipSerializer serializer = new ClipSerializer(recorder);
serializer.setClip(clip);

final TransformerFactory xsltFactory = TransformerFactory.newInstance();
final Transformer t = xsltFactory.newTransformer();
final StringWriter outXmlBuffer = new StringWriter();

t.transform(new SAXSource(serializer,
            new InputSource()), new StreamResult(outXmlBuffer));

assertEquals(targetXml, outXmlBuffer.getBuffer().toString());

The important lines are:

Your SAX events recorder is wrapped around the SAX parser.
Your Clip parser (ClipHolder) is wrapped around the recorder.
When the XML is parsed, the recorder records everything and your ClipHolder only looks at what it knows about.
You then do whatever you need to do with the clip object.
The serializer is then wrapped around the recorder (basically re-mapping it onto itself).
You then work with the serializer, and it takes care of feeding the recorded events (delegating to the parent and registering itself as a ContentHandler), overlaid with what it has to say about the clip object.

Please find the DVR code and the Clip test over on GitHub. I hope it helps.

P.S. It's not a generic solution, and the whole record -> replay + overlay concept is very rudimentary in the provided implementation; it is basically an illustration. If your XML is more complex and gets "hairy" (e.g. the same element names on different levels), the logic will need to be augmented. The concept will remain the same, though.
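For readers who want to see the shape of the record-and-relay layer without opening the repository, here is a rough, independent sketch. It is not the RecorderProxy or ClipSerializer code from this answer: EventRecorder, replay and roundTrip are hypothetical names, only element and text events are recorded, and overrides match by element name at any depth, which is exactly the "hairy" case the P.S. warns about. Serialization uses the standard JAXP TransformerHandler; its availability on a given Android version is an assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical recorder: relays SAX events downstream and remembers them. */
class EventRecorder extends XMLFilterImpl {
    private static final class Event {
        final char kind;       // 'S' start element, 'E' end element, 'T' text
        final String name;     // qName for 'S' and 'E'
        final Attributes atts; // copied attributes for 'S'
        final String text;     // character data for 'T'
        Event(char kind, String name, Attributes atts, String text) {
            this.kind = kind; this.name = name; this.atts = atts; this.text = text;
        }
    }

    private final List<Event> events = new ArrayList<>();

    EventRecorder(XMLReader parent) { super(parent); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        // Copy the attributes: the parser may reuse the object it passes in.
        events.add(new Event('S', qName, new AttributesImpl(atts), null));
        super.startElement(uri, local, qName, atts); // relay to the next layer
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        events.add(new Event('E', qName, null, null));
        super.endElement(uri, local, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        events.add(new Event('T', null, null, new String(ch, start, length)));
        super.characters(ch, start, length);
    }

    /** Replays the recording into target, replacing the content of overridden elements. */
    void replay(ContentHandler target, Map<String, String> overrides) throws SAXException {
        int skipDepth = 0; // > 0 while skipping the original content of an override
        target.startDocument();
        for (Event e : events) {
            if (skipDepth > 0) {
                if (e.kind == 'S') skipDepth++;
                if (e.kind == 'E' && --skipDepth == 0) target.endElement("", e.name, e.name);
                continue;
            }
            if (e.kind == 'S') {
                target.startElement("", e.name, e.name, e.atts);
                String value = overrides.get(e.name);
                if (value != null) {
                    target.characters(value.toCharArray(), 0, value.length());
                    skipDepth = 1;
                }
            } else if (e.kind == 'E') {
                target.endElement("", e.name, e.name);
            } else {
                target.characters(e.text.toCharArray(), 0, e.text.length());
            }
        }
        target.endDocument();
    }

    /** Parses xml through a recorder, then replays it with the given overrides. */
    static String roundTrip(String xml, Map<String, String> overrides) {
        try {
            EventRecorder recorder = new EventRecorder(
                    SAXParserFactory.newInstance().newSAXParser().getXMLReader());
            // recorder.setContentHandler(...) is where the object-building handler would go.
            recorder.parse(new InputSource(new StringReader(xml)));
            TransformerHandler out = ((SAXTransformerFactory)
                    TransformerFactory.newInstance()).newTransformerHandler();
            out.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter buffer = new StringWriter();
            out.setResult(new StreamResult(buffer));
            recorder.replay(out, overrides);
            return buffer.toString();
        } catch (Exception e) { // sketch: collapse the checked SAX/JAXP exceptions
            throw new IllegalStateException("record/replay failed", e);
        }
    }
}
```

Unknown elements pass through the replay untouched and keep their original position, because only the text of overridden elements is swapped out.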

Answer 3

If you're not bound to a specific XML schema, you should consider doing something like this:

<data>
    <element id="title">
        myTitle
    </element>
    <element id="date">
         18/05/2012
    </element>
    ...
</data>

and then store all those elements in a single ArrayList. This way you wouldn't lose information, and you could still choose which elements to show, edit, etc.
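A minimal sketch of reading that layout with SAX follows. GenericFields is a hypothetical name, and an insertion-ordered map keyed by id is used here in place of the ArrayList mentioned above so that known fields can be looked up directly; element text is trimmed because the example puts it on its own line.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical reader for the generic <element id="..."> layout. */
class GenericFields extends DefaultHandler {
    final Map<String, String> fields = new LinkedHashMap<>(); // id -> text, document order
    private final StringBuilder text = new StringBuilder();
    private String currentId; // id of the <element> being read, or null

    static GenericFields parse(String xml) {
        GenericFields handler = new GenericFields();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse fields", e);
        }
        return handler;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("element".equals(qName)) {
            currentId = atts.getValue("id");
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (currentId != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("element".equals(qName) && currentId != null) {
            fields.put(currentId, text.toString().trim());
            currentId = null;
        }
    }

    /** Writes every field back, including ids this app version does not know. */
    String toXml() {
        StringBuilder out = new StringBuilder("<data>");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            out.append("<element id=\"").append(escape(field.getKey())).append("\">")
               .append(escape(field.getValue())).append("</element>");
        }
        return out.append("</data>").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```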

Answer 4

Your assumption that XPath is 20x slower than SAX parsing is flawed. SAX parsing is just a low-level tokenizer on which your processing logic would be built, and that processing logic would require additional parsing. XPath's performance has a lot to do with the implementation. As far as I know, vtd-xml's XPath is at least an order of magnitude faster than DOM in general, and is far better suited for heavy-duty XML processing. Below are a few links for further reference:

http://sdiwc.us/digitlib/journal_paper.php?paper=00000582.pdf

Android - XPath evaluate very slow

Answer 1 (score 1): 2012-05-22 18:13:12
Answer 2 (score 1, accepted): 2012-05-23 21:25:58
Answer 3 (score 0): 2012-05-18 08:01:27
Answer 4 (score 0): 2016-04-22 06:36:12
