How to get look ahead for XMLStreamReader?

I cannot find any peek or unread function in the XMLStreamReader documentation. What is the preferred way to get at least one token of lookahead in order to parse a list of child elements, as in an HTML list, for example?

<ul>
  <li>
  <li>
</ul>

When I create a recursive descent parser with parse functions for ul and li, the li parse function has to terminate when it finds the closing tag of ul, but it must not consume it, because the ul parse function needs it to succeed.

I am used to solving such problems with peek or unread, but they seem to be missing. What is the preferred Java way to solve this problem?

Update: I implemented the parser without lookahead using the XMLStreamReader.

There's a common way of implementing recursive parsers that avoids the need for unread or peek, by pre-reading the next token, storing it, and testing against that:

  • when you read in a token, you store it in a (global) variable.
  • then you just test it against all the tokens you're looking for (e.g. <li> and </ul>)
  • when you've found the right one, you call the method that handles it (or continue)
  • (which reads in the next token, having "consumed" the matching one)

In effect, you have already peeked ahead.

The 1st ed of the Dragon compiler book has a good example of this, in their early overview chapter, in C (they use Java in the 2nd ed, but it's unnecessarily overblown, IMHO - the C style works fine in Java).

I'll try to extract an example from my own source code, but my code is separated into a library layer with methods that make the handling easier to use. I'll try to combine them to make a clear example, but it probably won't run standalone. Think of it as pseudo-code to illustrate the idea; you'll need to fill in the gaps.

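import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import static javax.xml.stream.XMLStreamConstants.*;

// the fields and methods below live inside your parser class
XMLStreamReader in;
int token;          // the pre-read (lookahead) event type
String localname;   // its element name, or null for non-element events

public void parse() throws XMLStreamException {
  next();
  if (token==START_ELEMENT && localname.equals("ul")) ul();
}

void ul() throws XMLStreamException {
  next();          // assume we are called when a <ul> is seen, so we consume it
  while (true) {   // loops for list
    if (token==START_ELEMENT && localname.equals("li")) li();  // ifs for choice
    else if (token==START_ELEMENT && localname.equals("sometag")) sometag();
    else break;
  }
  if (token==END_ELEMENT && localname.equals("ul")) next();
  else throw new RuntimeException("expected </ul>");
       // <li> or <sometag> would also have been acceptable here
}

void li() throws XMLStreamException {
  next();
  ...
}

void next() throws XMLStreamException {
  token = in.next();         // consuming a token means setting up the next one
  // getLocalName() throws IllegalStateException on events such as CHARACTERS or COMMENT
  localname = (token==START_ELEMENT || token==END_ELEMENT) ? in.getLocalName() : null;
}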

I found it much easier to use if you create a library layer to handle the repetitive stuff, e.g. I have:

  • boolean startTag(String name) just returns true if it matches
  • void requireStartTag(String name) consumes if match, else throws exception

But I think the example is clearer keeping it all literal.

And there are other issues, like skipping non-element tokens (comments, PIs, etc.) and tracking which line you're on for more helpful exceptions.
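A minimal sketch of what such a helper layer might look like, covering the two issues just mentioned (skipping non-element tokens and reporting line numbers). Only startTag and requireStartTag are named above; the class name ListParser, the endTag/requireEndTag helpers, the static parse entry point, and the choice to collect the text of each li are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper layer over XMLStreamReader (all names are illustrative).
class ListParser {
    private final XMLStreamReader in;
    private int token;         // pre-read (lookahead) event type
    private String localname;  // element name, or null for non-element events

    ListParser(XMLStreamReader in) throws XMLStreamException {
        this.in = in;
        next();                // prime the lookahead with the first significant token
    }

    // "Consume" the current token by pre-reading the next significant one,
    // skipping comments, processing instructions and whitespace-only text.
    private void next() throws XMLStreamException {
        do {
            token = in.next();
        } while (token == XMLStreamConstants.COMMENT
              || token == XMLStreamConstants.PROCESSING_INSTRUCTION
              || token == XMLStreamConstants.SPACE
              || (token == XMLStreamConstants.CHARACTERS && in.isWhiteSpace()));
        boolean element = token == XMLStreamConstants.START_ELEMENT
                       || token == XMLStreamConstants.END_ELEMENT;
        localname = element ? in.getLocalName() : null;
    }

    boolean startTag(String name) {
        return token == XMLStreamConstants.START_ELEMENT && name.equals(localname);
    }

    boolean endTag(String name) {
        return token == XMLStreamConstants.END_ELEMENT && name.equals(localname);
    }

    void requireStartTag(String name) throws XMLStreamException {
        if (!startTag(name)) throw expected("<" + name + ">");
        next();
    }

    void requireEndTag(String name) throws XMLStreamException {
        if (!endTag(name)) throw expected("</" + name + ">");
        next();
    }

    private IllegalStateException expected(String what) {
        int line = in.getLocation().getLineNumber();
        return new IllegalStateException("expected " + what + " at line " + line);
    }

    List<String> ul() throws XMLStreamException {
        requireStartTag("ul");
        List<String> items = new ArrayList<>();
        while (startTag("li")) items.add(li());  // stops at </ul> without consuming it
        requireEndTag("ul");                     // so ul() still sees its own end tag
        return items;
    }

    String li() throws XMLStreamException {
        requireStartTag("li");
        StringBuilder text = new StringBuilder();
        while (token == XMLStreamConstants.CHARACTERS) {
            text.append(in.getText());
            next();
        }
        requireEndTag("li");
        return text.toString();
    }

    // Convenience entry point; wraps the checked XMLStreamException.
    static List<String> parse(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            return new ListParser(factory.createXMLStreamReader(new StringReader(xml))).ul();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<ul> <li>one</li> <!-- note --> <li>two</li> </ul>"));
    }
}
```

With helpers like these, ul() and li() read almost like the grammar, and the lookahead lives entirely in the token and localname fields.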

There seems to be no straightforward way of doing this. Could you perhaps use the XMLEventReader to accomplish the same functionality?
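For what it's worth, XMLEventReader does declare a peek() method, which returns the next event without consuming it (or null when there are none left). A rough sketch of the ul/li case on top of it follows; the class name PeekingListParser and all helper names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example: one-event lookahead via XMLEventReader.peek().
class PeekingListParser {

    // Drop events the grammar does not care about, stopping before the first one it does.
    private static void skipIgnorable(XMLEventReader in) throws XMLStreamException {
        while (in.peek() != null) {
            XMLEvent e = in.peek();
            boolean ignorable = e.isStartDocument()
                    || e.isProcessingInstruction()
                    || e.getEventType() == XMLStreamConstants.COMMENT
                    || (e.isCharacters() && e.asCharacters().isWhiteSpace());
            if (!ignorable) return;
            in.nextEvent();
        }
    }

    // True if the next significant event is <name>; nothing significant is consumed.
    private static boolean atStart(XMLEventReader in, String name) throws XMLStreamException {
        skipIgnorable(in);
        XMLEvent e = in.peek();
        return e != null && e.isStartElement()
                && e.asStartElement().getName().getLocalPart().equals(name);
    }

    // True if the next significant event is </name>; nothing significant is consumed.
    private static boolean atEnd(XMLEventReader in, String name) throws XMLStreamException {
        skipIgnorable(in);
        XMLEvent e = in.peek();
        return e != null && e.isEndElement()
                && e.asEndElement().getName().getLocalPart().equals(name);
    }

    private static void expectStart(XMLEventReader in, String name) throws XMLStreamException {
        if (!atStart(in, name)) throw new IllegalStateException("expected <" + name + ">");
        in.nextEvent();
    }

    private static void expectEnd(XMLEventReader in, String name) throws XMLStreamException {
        if (!atEnd(in, name)) throw new IllegalStateException("expected </" + name + ">");
        in.nextEvent();
    }

    private static List<String> ul(XMLEventReader in) throws XMLStreamException {
        expectStart(in, "ul");
        List<String> items = new ArrayList<>();
        while (atStart(in, "li")) items.add(li(in));  // peeks; </ul> is left in the stream
        expectEnd(in, "ul");
        return items;
    }

    private static String li(XMLEventReader in) throws XMLStreamException {
        expectStart(in, "li");
        StringBuilder text = new StringBuilder();
        while (in.peek() != null && in.peek().isCharacters()) {
            text.append(in.nextEvent().asCharacters().getData());
        }
        expectEnd(in, "li");
        return text.toString();
    }

    // Convenience entry point; wraps the checked XMLStreamException.
    static List<String> parse(String xml) {
        try {
            XMLEventReader in = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            return ul(in);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>"));
    }
}
```

The trade-off is that XMLEventReader allocates an event object per token, whereas the cursor-style XMLStreamReader does not, so the pre-read approach may still be preferable for large documents.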
