
How to get look ahead for XMLStreamReader?

I cannot find any peek or unread function in the XMLStreamReader documentation. What is the preferred way to get at least one token of look-ahead in order to parse a list of child elements, as in an HTML list, for example?

<ul>
  <li>
  <li>
</ul>

When I write a recursive descent parser with parse functions for ul and li, the li parse function has to terminate when it finds the closing tag of ul, but it must not consume it, because the ul parse function needs it in order to succeed.

I am used to solving such problems with peek or unread, but they seem to be missing. What is the preferred Java way to solve this problem?

Update: I implemented the parser without look-ahead using the XMLStreamReader.

There's a common way of implementing recursive parsers that avoids the need for unread or peek, by pre-reading the next token, storing it, and testing against that:

  • when you read in a token, you store it in a (global) variable.
  • then you just test against it with all the tokens you're looking for (eg <li> and </ul>)
  • when you've found the right one, you call the method that handles that (or continue)
  • (which reads in the next token, having "consumed" the matching one)

In effect, you have already peeked ahead.

The 1st ed of the Dragon compiler book has a good example of this, in their early overview chapter, in C (they use Java in the 2nd ed, but it's unnecessarily overblown, IMHO - the C style works fine in Java).

I'll try to extract an example from my own source code, but my code is split off into a library layer with methods that make the handling easier to use. I'll try to combine them to make a clear example, but it probably won't run standalone. Think of it as pseudo-code, to illustrate the idea, and you'll need to fill in the gaps.

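import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import static javax.xml.stream.XMLStreamConstants.*;

XMLStreamReader in;
int token;          // the pre-read (look-ahead) event type
String localname;   // its local name, or null if it is not an element event

public void parse() throws XMLStreamException {
  next();
  if (token==START_ELEMENT && localname.equals("ul")) ul();
}

void ul() throws XMLStreamException {
  next();          // we are only called when a <ul> is seen, so we consume it
  while (true) {   // loops for list
    if (token==START_ELEMENT && localname.equals("li")) li();  // ifs for choice
    else if (token==START_ELEMENT && localname.equals("sometag")) sometag();
    else break;
  }
  if (token==END_ELEMENT && localname.equals("ul")) next();
  else throw new RuntimeException("expected </ul>");
       // <li> or <sometag> would also be acceptable
}

void li() throws XMLStreamException {
  next();
  ...
}

void next() throws XMLStreamException {
  token = in.next();         // consuming a token means setting up the next one
  // getLocalName() throws IllegalStateException on text, comments etc., so guard it
  localname = (token==START_ELEMENT || token==END_ELEMENT) ? in.getLocalName() : null;
}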

I found it much easier to use if you create a layer-library to handle the repetitive stuff, eg I have:

  • boolean startTag(String name) just returns true if it matches
  • void requireStartTag(String name) consumes if match, else throws exception

But I think the example is clearer keeping it all literal.
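A minimal sketch of what such a helper layer might look like. Only startTag and requireStartTag are named above; the class name TagHelpers, the endTag and requireEndTag counterparts, and the item-counting ul() method are assumptions added for illustration. It only handles input without whitespace or comments between the tags:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper layer around the pre-read token; not the answerer's actual library.
class TagHelpers {
    private final XMLStreamReader in;
    private int token;         // pre-read event type, i.e. the look-ahead
    private String localname;  // local name for element events, otherwise null

    TagHelpers(String xml) throws XMLStreamException {
        in = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        next();  // prime the look-ahead with the first event
    }

    // Consuming a token means reading the next one into the look-ahead fields.
    void next() throws XMLStreamException {
        token = in.next();
        boolean element = token == XMLStreamConstants.START_ELEMENT
                       || token == XMLStreamConstants.END_ELEMENT;
        localname = element ? in.getLocalName() : null;
    }

    // Just returns true if the look-ahead is the given start tag; consumes nothing.
    boolean startTag(String name) {
        return token == XMLStreamConstants.START_ELEMENT && name.equals(localname);
    }

    // Assumed counterpart of startTag for end tags.
    boolean endTag(String name) {
        return token == XMLStreamConstants.END_ELEMENT && name.equals(localname);
    }

    // Consumes the start tag if it matches, else throws.
    void requireStartTag(String name) throws XMLStreamException {
        if (!startTag(name)) throw new IllegalStateException("expected <" + name + ">");
        next();
    }

    // Assumed counterpart of requireStartTag for end tags.
    void requireEndTag(String name) throws XMLStreamException {
        if (!endTag(name)) throw new IllegalStateException("expected </" + name + ">");
        next();
    }

    // Parses <ul> with empty <li/> children and returns how many items it saw.
    int ul() throws XMLStreamException {
        requireStartTag("ul");
        int items = 0;
        while (startTag("li")) {   // the test does not consume, so </ul> stays available
            requireStartTag("li");
            requireEndTag("li");
            items++;
        }
        requireEndTag("ul");
        return items;
    }
}
```

With these helpers the loop in ul() stops at the first token that is not <li>, and the closing </ul> is still in the look-ahead when requireEndTag("ul") checks for it.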

There are also other issues, like skipping non-element tokens (comments, PIs etc.) and tracking which line you're on for more helpful exceptions.
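Both issues can be handled in the token-reading step. The following is a rough sketch, not code from the original answer: the names SkippingScanner, nextSignificant, describe and countItems are made up for illustration. It skips comments, processing instructions and whitespace-only text, and prefixes error messages with the line number reported by the reader's getLocation():

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import static javax.xml.stream.XMLStreamConstants.*;

// Hypothetical helper; all names here are illustrative assumptions.
class SkippingScanner {

    // Advances to the next event that is not a comment, PI or whitespace-only text.
    static int nextSignificant(XMLStreamReader in) throws XMLStreamException {
        int token;
        do {
            token = in.next();
        } while (token == COMMENT
              || token == PROCESSING_INSTRUCTION
              || token == SPACE
              || (token == CHARACTERS && in.isWhiteSpace()));
        return token;
    }

    // Prefixes a message with the line the reader is currently on.
    static String describe(XMLStreamReader in, String message) {
        return "line " + in.getLocation().getLineNumber() + ": " + message;
    }

    // Counts the text-only <li> children of a <ul> root element.
    static int countItems(String xml) throws XMLStreamException {
        XMLStreamReader in =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int token = nextSignificant(in);
        if (token != START_ELEMENT || !in.getLocalName().equals("ul")) {
            throw new IllegalStateException(describe(in, "expected <ul>"));
        }
        int items = 0;
        token = nextSignificant(in);
        while (token == START_ELEMENT && in.getLocalName().equals("li")) {
            in.getElementText();          // reads the item text; reader is now on </li>
            items++;
            token = nextSignificant(in);  // move past </li> to the next significant event
        }
        if (token != END_ELEMENT || !in.getLocalName().equals("ul")) {
            throw new IllegalStateException(describe(in, "expected </ul>"));
        }
        return items;
    }
}
```

Note that getElementText() throws if an item contains child elements, so this sketch only fits text-only list items.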

There seems to be no straightforward way of doing this with XMLStreamReader. Could you perhaps use the XMLEventReader instead? It provides a peek() method that returns the next event without consuming it.
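As a sketch of that alternative: XMLEventReader.peek() returns the next event without consuming it, which gives the one-token look-ahead the question asks for. The class PeekingListParser and its methods are hypothetical names, and the sketch assumes list items contain only text; text outside the items is silently skipped:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical example class; not part of any library.
class PeekingListParser {
    private final XMLEventReader in;

    PeekingListParser(String xml) throws XMLStreamException {
        in = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
    }

    // Skips events that are not tags and returns the next tag WITHOUT consuming it.
    private XMLEvent peekTag() throws XMLStreamException {
        while (in.hasNext()) {
            XMLEvent e = in.peek();
            if (e.isStartElement() || e.isEndElement()) return e;
            in.nextEvent();  // discard text, comments, document start/end
        }
        return null;         // no more tags
    }

    private boolean peekStart(String name) throws XMLStreamException {
        XMLEvent e = peekTag();
        return e != null && e.isStartElement()
            && e.asStartElement().getName().getLocalPart().equals(name);
    }

    private boolean peekEnd(String name) throws XMLStreamException {
        XMLEvent e = peekTag();
        return e != null && e.isEndElement()
            && e.asEndElement().getName().getLocalPart().equals(name);
    }

    // Parses <ul> and returns the text of each <li>.
    List<String> ul() throws XMLStreamException {
        if (!peekStart("ul")) throw new IllegalStateException("expected <ul>");
        in.nextEvent();                        // consume <ul>
        List<String> items = new ArrayList<>();
        while (peekStart("li")) items.add(li());
        // The loop only peeked at </ul>, so it is still there to be consumed here.
        if (!peekEnd("ul")) throw new IllegalStateException("expected </ul>");
        in.nextEvent();                        // consume </ul>
        return items;
    }

    private String li() throws XMLStreamException {
        in.nextEvent();                        // consume <li>
        StringBuilder text = new StringBuilder();
        while (in.peek() != null && in.peek().isCharacters()) {
            text.append(in.nextEvent().asCharacters().getData());
        }
        if (!peekEnd("li")) throw new IllegalStateException("expected </li>");
        in.nextEvent();                        // consume </li>
        return text.toString();
    }
}
```

For example, new PeekingListParser("<ul><li>a</li><li>b</li></ul>").ul() yields the two item texts in document order.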
