How to get look ahead for XMLStreamReader?
I cannot find any peek or unread function in the XMLStreamReader documentation. What is the preferred way to get at least one token of look ahead in order to parse a list of child elements, as in an HTML list for example?
<ul>
<li>
<li>
</ul>
When I create a recursive descent parser with parse functions for ul and li, the li parse function has to terminate when it finds the closing tag of ul, but it must not consume it, because the ul parse function needs it to succeed.
I am used to solving such problems with peek or unread, but they seem to be missing. What is the preferred Java way to solve this problem?
Update: I implemented the parser without look ahead using the XMLStreamReader.
There's a common way of implementing recursive parsers that avoids the need for unread or peek, by pre-reading the next token, storing it, and testing against that:
Then just test it against all the tokens you are looking for (e.g. <li> and </ul>). In effect, you have already peeked ahead.
The 1st edition of the Dragon compiler book has a good example of this in its early overview chapter, in C (the 2nd edition uses Java, but it's unnecessarily overblown, IMHO; the C style works fine in Java).
I'll try to extract an example from my own source code, but my code is separated into a library layer with helper methods that make it easier to use. I'll try to combine them to make a clear example, but it probably won't run standalone. Think of it as pseudo-code that illustrates the idea; you'll need to fill in the gaps.
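import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import static javax.xml.stream.XMLStreamConstants.*;

XMLStreamReader in;
int token;          // the pre-read token: this is the one-token look ahead
String localname;   // element name of the pre-read token, or null if it is not a tag

public void parse() throws XMLStreamException {
    next();
    if (token==START_ELEMENT && localname.equals("ul")) ul();
}

void ul() throws XMLStreamException {
    next(); // assume we are called when a <ul> is seen, so we consume it
    while (true) { // loops for list
        if (token==START_ELEMENT && localname.equals("li")) li(); // ifs for choice
        else if (token==START_ELEMENT && localname.equals("sometag")) sometag();
        else break;
    }
    if (token==END_ELEMENT && localname.equals("ul")) next();
    else throw new RuntimeException("expected </ul>");
    // <li> or <sometag> would also be acceptable
}

void li() throws XMLStreamException {
    next();
    ...
}

void next() throws XMLStreamException {
    token = in.next(); // consuming a token means setting up the next one
    // getLocalName() throws IllegalStateException on events that are not tags
    localname = (token==START_ELEMENT || token==END_ELEMENT) ? in.getLocalName() : null;
}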
I found it much easier to use if you create a library layer to handle the repetitive stuff, e.g. I have:
boolean startTag(String name): just returns true if it matches
void requireStartTag(String name): consumes if match, else throws exception
But I think the example is clearer keeping it all literal.
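For readers who want to see what such a layer might look like, here is a minimal sketch. Only startTag and requireStartTag are named above; the class name ListParser, the endTag and requireEndTag helpers, and the choice to return an item count are assumptions made for illustration, and the sketch only handles empty <li> elements with no whitespace between tags.

```java
import static javax.xml.stream.XMLStreamConstants.END_ELEMENT;
import static javax.xml.stream.XMLStreamConstants.START_ELEMENT;

import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper layer over a pre-read token (names are illustrative). */
class ListParser {
    private final XMLStreamReader in;
    private int token;        // the pre-read token
    private String localname; // its element name, or null if it is not a tag

    ListParser(String xml) throws XMLStreamException {
        in = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        next(); // pre-read the first token
    }

    private void next() throws XMLStreamException {
        token = in.next();
        boolean isTag = token == START_ELEMENT || token == END_ELEMENT;
        localname = isTag ? in.getLocalName() : null;
    }

    /** Tests the pre-read token without consuming it. */
    boolean startTag(String name) {
        return token == START_ELEMENT && name.equals(localname);
    }

    boolean endTag(String name) {
        return token == END_ELEMENT && name.equals(localname);
    }

    /** Consumes the pre-read token if it matches, otherwise throws. */
    void requireStartTag(String name) throws XMLStreamException {
        if (!startTag(name)) throw new XMLStreamException("expected <" + name + ">", in.getLocation());
        next();
    }

    void requireEndTag(String name) throws XMLStreamException {
        if (!endTag(name)) throw new XMLStreamException("expected </" + name + ">", in.getLocation());
        next();
    }

    /** Parses a <ul> of empty <li> elements and returns the number of items. */
    int ul() throws XMLStreamException {
        requireStartTag("ul");
        int items = 0;
        while (startTag("li")) { // stops at </ul> without consuming it
            li();
            items++;
        }
        requireEndTag("ul");
        return items;
    }

    void li() throws XMLStreamException {
        requireStartTag("li");
        requireEndTag("li");
    }
}
```

With this layer, new ListParser("<ul><li></li><li/></ul>").ul() walks both items, and a document such as <ul><p/></ul> fails in requireEndTag("ul") because the pre-read token is <p>.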
And there are other issues, like skipping non-element tokens (comments, PIs, etc.) and tracking which line you're on for more helpful exceptions.
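As a rough sketch of those two points, a next() method could loop past ignorable events, and error messages could include the position reported by XMLStreamReader.getLocation(). The class name SkippingReader, the error helper, and the exact set of events treated as ignorable are assumptions for illustration, not part of the code above.

```java
import static javax.xml.stream.XMLStreamConstants.CHARACTERS;
import static javax.xml.stream.XMLStreamConstants.COMMENT;
import static javax.xml.stream.XMLStreamConstants.DTD;
import static javax.xml.stream.XMLStreamConstants.PROCESSING_INSTRUCTION;
import static javax.xml.stream.XMLStreamConstants.SPACE;

import java.io.StringReader;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical wrapper that hides comments, PIs and whitespace from the parser. */
class SkippingReader {
    private final XMLStreamReader in;
    int token; // the pre-read token, never an ignorable one after next()

    SkippingReader(String xml) throws XMLStreamException {
        in = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
    }

    /** Advances to the next token the grammar cares about. */
    void next() throws XMLStreamException {
        do {
            token = in.next();
        } while (isIgnorable());
    }

    private boolean isIgnorable() {
        switch (token) {
            case COMMENT:
            case PROCESSING_INSTRUCTION:
            case SPACE:
            case DTD:
                return true;
            case CHARACTERS:
                return in.isWhiteSpace(); // skip whitespace-only text between tags
            default:
                return false;             // tags, real text, END_DOCUMENT
        }
    }

    /** Builds an exception whose message carries the current line and column. */
    RuntimeException error(String message) {
        Location loc = in.getLocation();
        return new RuntimeException(
            message + " at line " + loc.getLineNumber() + ", column " + loc.getColumnNumber());
    }
}
```

A parse function can then write throw error("expected </ul>") instead of building the message by hand.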
There seems to be no straightforward way of doing this. Could you perhaps use the XMLEventReader to accomplish the same functionality?
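XMLEventReader does offer a peek() method that returns the next event without consuming it, so the ul/li case could be sketched roughly as follows. The class name EventListParser and the helper names are assumptions for illustration, and the sketch assumes text-only <li> elements with no whitespace between tags.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical recursive descent parser built on XMLEventReader.peek(). */
class EventListParser {
    private final XMLEventReader in;

    EventListParser(String xml) throws XMLStreamException {
        in = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
    }

    private boolean peekStart(String name) throws XMLStreamException {
        XMLEvent e = in.peek(); // look ahead without consuming
        return e != null && e.isStartElement()
            && e.asStartElement().getName().getLocalPart().equals(name);
    }

    private boolean peekEnd(String name) throws XMLStreamException {
        XMLEvent e = in.peek();
        return e != null && e.isEndElement()
            && e.asEndElement().getName().getLocalPart().equals(name);
    }

    private void expectStart(String name) throws XMLStreamException {
        if (!peekStart(name)) throw new XMLStreamException("expected <" + name + ">");
        in.nextEvent();
    }

    private void expectEnd(String name) throws XMLStreamException {
        if (!peekEnd(name)) throw new XMLStreamException("expected </" + name + ">");
        in.nextEvent();
    }

    /** Parses a <ul> and returns the text of each <li>. */
    List<String> ul() throws XMLStreamException {
        if (in.peek().isStartDocument()) in.nextEvent();
        expectStart("ul");
        List<String> items = new ArrayList<>();
        while (peekStart("li")) { // </ul> is only peeked at, never consumed here
            items.add(li());
        }
        expectEnd("ul");
        return items;
    }

    String li() throws XMLStreamException {
        expectStart("li");
        StringBuilder text = new StringBuilder();
        while (in.peek().isCharacters()) {
            text.append(in.nextEvent().asCharacters().getData());
        }
        expectEnd("li");
        return text.toString();
    }
}
```

The trade-off is that XMLEventReader allocates an event object per token, whereas the pre-read approach keeps working on the cursor-style XMLStreamReader.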