Is there an easier way to parse XML in Java?

I'm trying to figure out how to parse some XML (for an Android app), and it seems pretty ridiculous how difficult it is to do in Java. It seems to require creating an XML handler with various callbacks (startElement, endElement, and so on), and then you have to take care of turning all that data into objects. Something like this tutorial.

All I really need is to turn an XML document into a multidimensional array, and even better would be some sort of Hpricot-style processor. Is there any way to do this, or do I really have to write all the extra code in the example above?

There are two different types of XML processors in Java (three, actually, but one is weird). What you have is a SAX parser, and what you want is a DOM parser. Take a look at http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/ for how to use the DOM parser. DOM builds a tree that you can navigate pretty easily. SAX is best for large documents, but DOM is much easier to use, at the cost of being slower and much more memory-intensive.
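A minimal sketch of the DOM approach using only the JDK's javax.xml.parsers API (not the code from the linked tutorial); the company/staff/name layout is made-up sample data. Each staff element becomes one String[] row, which is close to the "multidimensional array" the question asks for.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomExample {
    // Loads the whole document into an in-memory DOM tree, then walks it.
    static List<String[]> readStaff(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String[]> rows = new ArrayList<>();
        NodeList staff = doc.getElementsByTagName("staff");
        for (int i = 0; i < staff.getLength(); i++) {
            Element e = (Element) staff.item(i);
            // One row per <staff>: its "id" attribute and the text of its <name> child.
            String id = e.getAttribute("id");
            String name = e.getElementsByTagName("name").item(0).getTextContent();
            rows.add(new String[] {id, name});
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company><staff id=\"1\"><name>Ann</name></staff>"
                   + "<staff id=\"2\"><name>Bob</name></staff></company>";
        for (String[] row : readStaff(xml)) {
            System.out.println(row[0] + " " + row[1]); // "1 Ann", then "2 Bob"
        }
    }
}
```

The trade-off described above is visible here: the entire document sits in memory before the loop starts, which is what makes random navigation easy.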

Try http://simple.sourceforge.net. It's an XML-to-Java serialization and binding framework that is fully compatible with Android and very lightweight: 270K and no dependencies.

Check this article for ways to handle XML on Android. Maybe the DOM or XML Pull style suits you better:

Working with XML on Android
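Android's pull API is org.xmlpull.v1.XmlPullParser, which is not part of the standard JDK, so this sketch uses the JDK's StAX XMLStreamReader to show the same pull style on a desktop JVM. On Android the equivalent calls would be XmlPullParser's next(), getName() and nextText(); the title elements are sample data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullExample {
    // Collects the text of every <title> element, pulling events one at a time.
    static List<String> titles(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        try {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals("title")) {
                    // getElementText() reads text-only content and leaves the
                    // cursor on the matching END_ELEMENT.
                    result.add(r.getElementText());
                }
            }
        } finally {
            r.close();
        }
        return result;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<rss><channel><title>News</title>"
                   + "<item><title>First</title></item></channel></rss>";
        System.out.println(titles(xml)); // [News, First]
    }
}
```

Unlike SAX, your code drives the loop, so you can simply stop calling next() once you have what you need.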

Kyle,

(Please excuse the self-promotional nature of this post... I've been working on this library for months and it's all open source/Apache 2, so it's not that self-serving; I'm just trying to help.)

I just released a library I'm calling SJXP, or "Simple Java XML Parser": http://www.thebuzzmedia.com/software/simple-java-xml-parser-sjxp/

It is a very small, tight (4 classes) abstraction layer that sits on top of any spec-compliant XML Pull Parser.

On Android and non-Android Java platforms, pull parsing is probably one of the most performant methods of parsing (both in speed and in low memory overhead). Unfortunately, coding directly against a pull parser ends up looking a lot like any other XML parsing code (e.g. SAX): you have exception handlers, parser state to maintain, error checking, event handling, value parsing, etc.

What SJXP does is let you define XPath-like "paths" to the elements or attributes in a document whose values you want, like:

/rss/channel/title

and it will invoke your callback with the value when that rule matches. The API is really straightforward and has intuitive support for namespace-qualified elements, if that is what you are trying to parse.

The code for a standard parser would look something like this (an example that parses an RSS2 feed title):

IRule titleRule = new DefaultRule(Type.CHARACTER, "/rss/channel/title") {
    @Override
    public void handleParsedCharacters(XMLParser parser, String text) {
        // Store the title in a DB or something fancy
    }
};

Then you just create an XMLParser instance and give it all the rules you want it to care about:

XMLParser parser = new XMLParser(titleRule);
parser.parse(xmlStream);

And that's it: the parser will invoke the handler method every time the rule matches. You can stop parsing at any time by calling parser.stop() if you want.

Additionally (and this is the real win of this library), matching namespace-qualified elements and attributes is dead easy: you just put their namespace URI inside brackets, prefixing the name of the element in your path.

As an example, say you want the dc:language value out of the channel element of an RSS feed so you can tell what language it is in (ref: http://web.resource.org/rss/1.0/modules/dc/). You just use the unique namespace URI for that 'language' element with the 'dc' prefix, and the rule path ends up looking like this:

/rss/channel/[http://purl.org/dc/elements/1.1/]language

The same goes for namespace-qualified attributes.

With all that ease, the only overhead you add to the parsing process is an O(1) hash lookup at each location in the XML document, plus a few hundred bytes (maybe 1k) for the parser's internal location state.
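To make the "O(1) hash lookup at each location" idea concrete, here is a hypothetical, stripped-down path matcher on top of the JDK's StAX reader. It is not SJXP's code or API: PathMatcher and on() are invented names. It tracks the current location as a string such as /rss/channel/title, writes namespace-qualified segments as [uri]localName in the same style as the rule paths above, and does one HashMap lookup per element.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not the SJXP implementation.
class PathMatcher {
    private final Map<String, Consumer<String>> rules = new HashMap<>();

    // Registers a callback for the text of elements at exactly this path.
    PathMatcher on(String path, Consumer<String> callback) {
        rules.put(path, callback);
        return this;
    }

    void parse(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Deque<String> paths = new ArrayDeque<>(); // top = current element's path
        paths.push("");
        StringBuilder text = new StringBuilder(); // text of the current leaf element
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    String ns = r.getNamespaceURI();
                    String segment = (ns == null || ns.isEmpty())
                            ? r.getLocalName()
                            : "[" + ns + "]" + r.getLocalName();
                    paths.push(paths.peek() + "/" + segment);
                    text.setLength(0);
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    text.append(r.getText());
                    break;
                case XMLStreamConstants.END_ELEMENT: {
                    // The single hash lookup per element.
                    Consumer<String> callback = rules.get(paths.pop());
                    if (callback != null) callback.accept(text.toString().trim());
                    text.setLength(0);
                    break;
                }
                default:
                    break;
            }
        }
        r.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<rss xmlns:dc=\"http://purl.org/dc/elements/1.1/\"><channel>"
                   + "<title>News</title><dc:language>en</dc:language></channel></rss>";
        new PathMatcher()
            .on("/rss/channel/title", t -> System.out.println("title: " + t))
            .on("/rss/channel/[http://purl.org/dc/elements/1.1/]language",
                l -> System.out.println("language: " + l))
            .parse(xml);
    }
}
```

Note that the rule is keyed by namespace URI, not by the dc prefix, so a feed that binds the same URI to a different prefix still matches.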

The library works on Android with no additional dependencies (because the platform already provides an org.xmlpull implementation) and in any other Java runtime by adding the XPP3 dependency.

This library is the result of many months of writing custom pull parsers for every kind of feed XML out there, in every language, and realizing (over time) that about 90% of parsing can be distilled down into this really basic paradigm.

I hope you find it handy.

Starting with Java 5, there is an XPath library in the SDK. See this tutorial for an introduction to it.
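A minimal sketch of the JDK's javax.xml.xpath API (standard JDK, so it applies to desktop Java rather than the Android releases discussed further down); the RSS-like sample document is made up. One call returns the string value of an expression, the other returns every node an expression selects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathExample {
    // Returns the string value of the expression (the first matching node's text).
    static String string(String xml, String expr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, new InputSource(new StringReader(xml)));
    }

    // Returns the text of every node the expression selects.
    static List<String> strings(String xml, String expr) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(nodes.item(i).getTextContent());
        }
        return out;
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<rss><channel><title>News</title>"
                   + "<item><title>First</title></item>"
                   + "<item><title>Second</title></item></channel></rss>";
        System.out.println(string(xml, "/rss/channel/title")); // News
        System.out.println(strings(xml, "//item/title"));      // [First, Second]
    }
}
```

Each call above re-parses the string; for many queries against one document it would be cheaper to parse once into a DOM Document and pass that as the item argument to evaluate.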

In my opinion, you should use a SAX parser because:
- it's fast
- you can control everything in the XML document

You will spend more time coding, but only once, because you will create a code template for parsing XML.

From then on, you only need to edit the parts that change.

Good luck!
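The "code template" idea can be sketched with the JDK's SAX API: a reusable handler that buffers character data and passes each element's name and trimmed text to a single method, so each new document type only needs that one method. SaxTemplate, TextHandler and onElement are hypothetical names; the single text buffer is only meaningful for leaf elements, not mixed content.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxTemplate {
    // Reusable part: buffers text and reports (element name, text) pairs.
    abstract static class TextHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            text.setLength(0); // start collecting text for this element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // SAX may split text into several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            onElement(qName, text.toString().trim());
            text.setLength(0);
        }

        // The only part that changes from one document type to the next.
        protected abstract void onElement(String name, String text);
    }

    static void parse(String xml, DefaultHandler handler) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> fields = new LinkedHashMap<>();
        parse("<user><name>Ann</name><email>ann@example.com</email></user>",
              new TextHandler() {
                  @Override
                  protected void onElement(String name, String text) {
                      if (!text.isEmpty()) fields.put(name, text);
                  }
              });
        System.out.println(fields); // {name=Ann, email=ann@example.com}
    }
}
```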

In my opinion, using XPath for parsing XML may be your easiest coding approach. You can embody the logic for pulling nodes out of an XML document in a single expression, rather than having to write code that traverses the document's object graph.

I note that another answer to this question already suggested using XPath. But it won't work for your Android project yet. As of right now, the XPath parsing class is not supported in any Android release (even though the javax.xml namespace is defined in the Dalvik JVM, which could fool you, as it did me at first).

Inclusion of the XPath class in Android is currently a work item in its late phase (it is being tested and debugged by Google as I write this). You can track the status of adding XPath to Dalvik here: http://code.google.com/p/android/issues/detail?id=515

(It's an annoyance that you cannot assume things supported in most Java VMs are included in the Android Dalvik VM yet.)

Another option, while waiting for official Google support, is JDOM, which presently claims Dalvik VM compatibility and also XPath support (in beta). (I have not checked this out; I'm just repeating the current claims from their web site.)

You can try this: http://xml.jcabi.com/
It is an extra layer on top of DOM that allows simple parsing, printing, and transforming of XML documents and nodes.

I've created a really simple API to solve precisely this problem. It's just a single class that you can include in your code base, and it makes parsing any XML really clean and easy. You can find it here:

http://argonrain.wordpress.com/2009/10/27/000/

There is a very good example of using XmlPullParser with any type of XML. It also parses in a generic way: you don't need to change anything, just take that class and put it into your Android project.

Generic XmlPullParser

You could also use Castor to map the XML to Java beans. I have used it before and it works like a charm.

Writing a SAX handler is the best way to go. And once you do that, you will never go back to anything else. It's fast and simple, and it crunches away as it goes, never sucking large parts of the document (or, god forbid, a whole DOM) into memory.

A couple of weeks ago I knocked together a small library (a wrapper around javax.xml.stream.XMLEventReader) that lets you parse XML in a similar fashion to a hand-written recursive descent parser. The source is available on github, and a simple usage example is below. Unfortunately, Android doesn't support this API, but it is very similar to the XmlPullParser API, which is supported, so porting wouldn't be too time-consuming.

accept("tilesets");
    while (atTag("tileset")) {
        String filename = attrib("file");
        File tilesetFile = new File(filename);
        if (!tilesetFile.isAbsolute()) {
            tilesetFile = new File(FilenameUtils.concat(file.getParent(), filename));
        }
        int tilesize = Integer.valueOf(attrib("tilesize"));
        Tileset t = new Tileset(tilesetFile, tilesize);
        t.setID(attrib("id"));
        tilesets.add(t);

        accept();
        close();
    }
close();

expect("map");

int width       = Integer.valueOf(attrib("width"));
int height      = Integer.valueOf(attrib("height"));
int tilesize    = Integer.valueOf(attrib("tilesize"));
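The helpers in the snippet (accept, atTag, attrib, close, expect) come from the author's library, whose source isn't reproduced here. As a rough, hypothetical sketch of how such helpers can sit on javax.xml.stream.XMLEventReader, here is a self-contained Descent class. Its semantics are assumptions: atTag and expect peek without consuming, accept(name) consumes a start tag, close() skips to the matching end tag. The map/tileset sample XML is invented too.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helpers in the spirit of the snippet above; not the author's library.
class Descent {
    private final XMLEventReader in;

    Descent(String xml) throws XMLStreamException {
        in = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
    }

    // Drops whitespace, comments and document events; returns the next
    // significant event without consuming it (null at end of input).
    private XMLEvent peek() throws XMLStreamException {
        while (in.hasNext()) {
            XMLEvent e = in.peek();
            if (e.isStartElement() || e.isEndElement()
                    || (e.isCharacters() && !e.asCharacters().isWhiteSpace())) {
                return e;
            }
            in.nextEvent();
        }
        return null;
    }

    // True if the next significant event is a start tag with this local name.
    boolean atTag(String name) throws XMLStreamException {
        XMLEvent e = peek();
        return e != null && e.isStartElement()
                && e.asStartElement().getName().getLocalPart().equals(name);
    }

    // Like atTag, but fails loudly when the tag is missing.
    void expect(String name) throws XMLStreamException {
        if (!atTag(name)) throw new XMLStreamException("expected <" + name + ">");
    }

    // Reads an attribute of the start tag we are positioned at (null if absent).
    String attrib(String name) throws XMLStreamException {
        XMLEvent e = peek();
        if (e == null || !e.isStartElement()) throw new XMLStreamException("not at a start tag");
        Attribute a = e.asStartElement().getAttributeByName(new QName(name));
        return a == null ? null : a.getValue();
    }

    // Consumes the start tag <name>.
    void accept(String name) throws XMLStreamException {
        expect(name);
        in.nextEvent();
    }

    // Skips any remaining children and consumes the matching end tag.
    void close() throws XMLStreamException {
        int depth = 0;
        while (in.hasNext()) {
            XMLEvent e = in.nextEvent();
            if (e.isStartElement()) depth++;
            else if (e.isEndElement() && depth-- == 0) return;
        }
        throw new XMLStreamException("unexpected end of document");
    }

    public static void main(String[] args) throws XMLStreamException {
        Descent p = new Descent("<map width=\"10\"><tilesets>"
                + "<tileset file=\"a.png\"/><tileset file=\"b.png\"/></tilesets></map>");
        p.expect("map");
        int width = Integer.parseInt(p.attrib("width"));
        p.accept("map");
        p.accept("tilesets");
        List<String> files = new ArrayList<>();
        while (p.atTag("tileset")) {
            files.add(p.attrib("file"));
            p.accept("tileset");
            p.close(); // </tileset>
        }
        p.close();     // </tilesets>
        System.out.println(width + " " + files); // 10 [a.png, b.png]
    }
}
```

The appeal of this style is that the Java control flow mirrors the document's nesting: a while loop per repeated element, one accept/close pair per level.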

Well, parsing XML is not an easy task.

Its basic structure is a tree, in which any node can hold a container consisting of an array of more trees.

Each node in the tree contains a tag and a value, but in addition can contain an arbitrary number of named attributes and an arbitrary number of children or containers.
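The tree described above maps naturally onto a small recursive node type. This is a hypothetical sketch (XmlNode is an invented name) that copies a JDK DOM tree into tag, value, attributes and children fields; it is also roughly the "multidimensional array" the question asked for.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical node type: a tag, a text value, named attributes and child trees.
class XmlNode {
    final String tag;
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<XmlNode> children = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    XmlNode(String tag) { this.tag = tag; }

    String value() { return text.toString().trim(); }

    static XmlNode parse(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        return fromElement(root);
    }

    // Recursively copies one DOM element and everything below it.
    private static XmlNode fromElement(Element e) {
        XmlNode node = new XmlNode(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            node.attributes.put(a.getNodeName(), a.getNodeValue());
        }
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node k = kids.item(i);
            short type = k.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                node.children.add(fromElement((Element) k));
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                node.text.append(k.getNodeValue());
            }
        }
        return node;
    }

    public static void main(String[] args) throws Exception {
        XmlNode order = parse("<order id=\"7\"><item sku=\"A1\">2</item>"
                + "<item sku=\"B2\"><![CDATA[5]]></item></order>");
        System.out.println(order.tag + " id=" + order.attributes.get("id")); // order id=7
        for (XmlNode item : order.children) {
            System.out.println(item.attributes.get("sku") + " x " + item.value());
        }
    }
}
```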

XML parsing tasks tend to fall into three categories.

Things that can be done with a "regex". E.g. you want to find the value of the first "MailTo" tag and are not interested in the contents of any other tags.

Things you can parse yourself. The XML structure is always very simple, e.g. a root node and ten well-known tags with simple values.

All the rest! Even though an XML message format can look deceptively simple, home-made parsers are easily confused by extra attributes, CDATA, and unexpected children. Full-blown XML parsers can handle all of these situations. Here the basic choice is between a stream parser and a DOM parser. If you intend to use most of the entities/attributes given, in whatever order you want to use them, then a DOM parser is ideal. If you are only interested in a few attributes and intend to use them in the order they are presented, if you have performance constraints, or if the XML files are large (> 500MB), then a stream parser is the way to go; the callback mechanism takes a bit of "grokking", but it's actually quite simple to program once you get the hang of it.
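For the first category, a regular expression can be enough. A minimal sketch, assuming the MailTo element holds plain text (no CDATA, comments, entities to decode, or nested MailTo tags); anything less predictable belongs in the third category. RegexExample and firstMailTo are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexExample {
    // Matches <MailTo> or <MailTo attr="...">, then lazily captures up to the
    // first </MailTo>. Entities such as &amp; are NOT decoded.
    private static final Pattern MAIL_TO =
            Pattern.compile("<MailTo(?:\\s[^>]*)?>(.*?)</MailTo>", Pattern.DOTALL);

    // Returns the trimmed text of the first MailTo element, or null if there is none.
    static String firstMailTo(String xml) {
        Matcher m = MAIL_TO.matcher(xml);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String xml = "<msg><From>x</From><MailTo type=\"primary\"> ann@example.com </MailTo>"
                   + "<MailTo>bob@example.com</MailTo></msg>";
        System.out.println(firstMailTo(xml)); // ann@example.com
    }
}
```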

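Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM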