
Java parse nested “tags”

I have "tags" in the following format:

{id|attribute|context|comment|flag1|flag2|...}

The thing is, the id section can be a nested tag, like so:

{{id|attribute|||flag}|attribute}

or even

{{{{id|attribute}|attribute}|attribute}|attribute}

The nesting can theoretically go on indefinitely. I'm trying to find a good way to parse text that may contain any number of these tags, like so:

{7953|title} is a {7953|generic} in {{7953|setting}|title}.
{5514|name} lives in {7953|title}.
{{{3216|carrier|20140205191631}|origin}|pronoun||deeply rooted|first|possessive} favorite ...

You get the idea. I need a way to find every "tag" in a given block of text. Some things to note:

  • The field delimiter is |
  • Only the first two fields are required for a tag
    • Missing fields are represented by consecutive | characters
  • Tags can be arbitrarily nested, but only in the first (id) position
  • Whitespace IS significant (it is part of the fields and must not be ignored)
  • There can be an arbitrary number of flag fields
  • Any field can contain any character (including id and context), so {, }, and | must be escapable with a backslash \ (e.g. \| does not separate fields)
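The rules above are enough to split a single tag into its fields. The following is a minimal sketch, assuming the input is exactly one complete tag (as found by a scanner): it splits on | only when the pipe is not escaped and not inside the nested id, keeps consecutive pipes as empty fields, keeps whitespace, and leaves escape sequences verbatim so the nested id can be split again. The class TagFields and its split method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class TagFields {
    /**
     * Splits one complete tag into its raw fields, e.g.
     * "{{7953|setting}|title}" -> ["{7953|setting}", "title"].
     * Escape sequences are kept verbatim, so a nested id can be split again.
     */
    static List<String> split(String tag) {
        if (tag.length() < 2 || tag.charAt(0) != '{' || tag.charAt(tag.length() - 1) != '}') {
            throw new IllegalArgumentException("not a tag: " + tag);
        }
        String body = tag.substring(1, tag.length() - 1);
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0; // > 0 while inside a nested tag in the id position
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                // keep the backslash and the escaped character together
                current.append(c).append(body.charAt(i + 1));
                i++;
            } else if (c == '|' && depth == 0) {
                // field boundary; consecutive pipes produce empty fields
                fields.add(current.toString());
                current.setLength(0);
            } else {
                if (c == '{') {
                    depth++;
                } else if (c == '}') {
                    depth--;
                }
                current.append(c); // whitespace is kept as-is
            }
        }
        fields.add(current.toString());
        if (fields.size() < 2) {
            throw new IllegalArgumentException("id and attribute are required: " + tag);
        }
        return fields;
    }
}
```

When the first returned field starts with {, calling split on it again walks one level down the nesting.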

I know I can parse it by traversing the string, keeping track of where a tag starts and how deeply nested I am, and grabbing everything in between once the depth returns to 0, but it's a bit of a pain.
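The traversal described above can stay fairly small if the same loop also builds the output, which gives the effect of appendReplacement/appendTail without a Matcher. This is a sketch under some assumptions: a backslash escapes exactly the next character both inside and outside tags, a stray } outside any tag is plain text, and an unterminated tag is copied unchanged. The class TagScanner, the method replaceTags, and its renderer callback are hypothetical.

```java
import java.util.function.Function;

final class TagScanner {
    /**
     * Replaces every top-level tag in text with renderer.apply(tag) and copies
     * everything else unchanged. Nested tags are passed to the renderer as part
     * of their enclosing top-level tag.
     */
    static String replaceTags(String text, Function<String, String> renderer) {
        StringBuilder out = new StringBuilder(text.length());
        int copyFrom = 0;  // start of the text not yet copied to out
        int tagStart = -1; // index of the '{' opening the current top-level tag
        int depth = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, whatever it is
            } else if (c == '{') {
                if (depth++ == 0) {
                    tagStart = i;
                }
            } else if (c == '}' && depth > 0) {
                if (--depth == 0) {
                    out.append(text, copyFrom, tagStart);                         // text before the tag
                    out.append(renderer.apply(text.substring(tagStart, i + 1))); // rendered tag
                    copyFrom = i + 1;
                }
            }
        }
        out.append(text, copyFrom, text.length()); // remaining text, like appendTail
        return out.toString();
    }
}
```

For example, replaceTags(text, t -> parseTag(t).render()) would plug in the question's own parser.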

I would like to do it with regex if at all possible, but Java doesn't support recursive regex.
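Java's regex engine has no recursion, but because nesting is only allowed in the id position, a pattern for a bounded depth can be built by substituting the tag pattern into its own first field a fixed number of times; the pattern grows linearly with the depth. This is a sketch assuming some maximum depth is acceptable. The class BoundedTagRegex and its methods are hypothetical. A tag nested deeper than maxDepth is not matched as a whole; an inner part of it may be reported instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BoundedTagRegex {
    // One field: escape sequences (\x) or any character except braces, pipe and backslash.
    private static final String FIELD = "(?:\\\\.|[^{}|\\\\])*";

    /** Regex for a tag whose id may itself be a tag, up to maxDepth extra levels. */
    static String tagRegex(int maxDepth) {
        // Innermost level: a plain id followed by any number of |field parts.
        String tag = "\\{" + FIELD + "(?:\\|" + FIELD + ")*\\}";
        for (int i = 0; i < maxDepth; i++) {
            // Each level lets the first field be either a nested tag or a plain field.
            tag = "\\{(?:" + tag + "|" + FIELD + ")(?:\\|" + FIELD + ")*\\}";
        }
        return tag;
    }

    /** Returns every top-level tag in text, skipping escaped characters outside tags. */
    static List<String> findTags(String text, int maxDepth) {
        // Alternative 1 consumes an escape such as \{ so it can never start a tag;
        // alternative 2 (group 1) is the tag itself. DOTALL lets "\." cover newlines.
        Pattern p = Pattern.compile("\\\\.|(" + tagRegex(maxDepth) + ")", Pattern.DOTALL);
        Matcher m = p.matcher(text);
        List<String> tags = new ArrayList<>();
        while (m.find()) {
            if (m.group(1) != null) {
                tags.add(m.group(1));
            }
        }
        return tags;
    }
}
```

Since the same compiled pattern works with a Matcher, it can also drive appendReplacement/appendTail directly, as long as the depth bound holds.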

What is the best way to go about parsing this?

Extra info

If it makes a difference, the "tags" will be parsed into an object (I already have both the parsing and the object built), and the object can then be rendered to the string it represents. That is why regex is preferable: I could use Matcher::appendReplacement and Matcher::appendTail.
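If (and this is an assumption the question does not confirm) the rendered text of an inner tag can stand in for the id of the outer tag, then a non-recursive regex that matches only innermost tags can be applied repeatedly with appendReplacement/appendTail until nothing changes. The sketch below also assumes the renderer never outputs unescaped braces; otherwise the loop may not terminate. The class InnermostFirst and the method renderAll are hypothetical. The StringBuilder overloads of appendReplacement/appendTail need Java 9 or later (use StringBuffer on older versions).

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InnermostFirst {
    // Alternative 1: an escape such as \{ (consumed so it can never open a tag).
    // Alternative 2: a tag with no unescaped braces inside, i.e. an innermost tag;
    // group 1 is its content between the braces.
    private static final Pattern TOKEN =
            Pattern.compile("\\\\.|\\{((?:\\\\.|[^{}\\\\])*)\\}", Pattern.DOTALL);

    /** Renders innermost tags pass by pass until no tag is left. */
    static String renderAll(String text, Function<String, String> renderer) {
        boolean changed = true;
        while (changed) {
            changed = false;
            Matcher m = TOKEN.matcher(text);
            StringBuilder sb = new StringBuilder();
            while (m.find()) {
                String replacement;
                if (m.group(1) != null) {
                    replacement = renderer.apply(m.group(1)); // content of an innermost tag
                    changed = true;
                } else {
                    replacement = m.group(); // escape sequence: keep it unchanged
                }
                // quoteReplacement stops '$' and '\' in the output from being interpreted
                m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(sb);
            text = sb.toString();
        }
        return text;
    }
}
```

The trade-off is that the outer tag only ever sees the inner tag's rendered string, not a parsed object.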

This is the code I used to parse the text containing the "tags":

public static String parseText(String text) {
    StringBuilder oldText = new StringBuilder(text);
    StringBuilder newText = new StringBuilder();
    // Next '{' (escaped braces outside a tag are not skipped here)
    int firstTag = oldText.indexOf("{");
    FullBreak:
    while (firstTag >= 0) {
        // Copy the text before the tag through unchanged
        newText.append(oldText.substring(0, firstTag));
        oldText.delete(0, firstTag);
        int depth = 1;
        int position = 0;
        // Walk forward until the brace that closes the outermost tag
        while (depth > 0) {
            position++;
            if (position > oldText.length() - 1) {
                break FullBreak; // unterminated tag: the rest is appended as-is below
            }
            char c = oldText.charAt(position);
            if (c == '\\') {
                position++; // skip whatever character is escaped (also handles an escaped backslash)
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            }
        }
        position++; // include the closing brace
        newText.append(parseTag(oldText.substring(0, position)).render());
        oldText.delete(0, position);
        firstTag = oldText.indexOf("{");
    }
    newText.append(oldText);
    return newText.toString();
}

In this case, parseTag(String) returns a Tag, which has a render() method.
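The question does not show parseTag or Tag, so the following is only an illustration of one way to parse a single tag: a recursive-descent parser producing a hypothetical ParsedTag (not the question's Tag). The id position either recurses into a nested tag or reads a plain field, each further field is read after a |, escapes are removed from field values, and a missing attribute is rejected. All names here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class ParsedTag {
    final ParsedTag nestedId; // non-null when the id position holds another tag
    final String id;          // plain id when nestedId is null
    final List<String> fields; // attribute, context, comment, flags... (unescaped)

    ParsedTag(ParsedTag nestedId, String id, List<String> fields) {
        this.nestedId = nestedId;
        this.id = id;
        this.fields = fields;
    }

    /** Parses exactly one complete tag such as "{{7953|setting}|title}". */
    static ParsedTag parse(String s) {
        int[] pos = {0};
        ParsedTag tag = parse(s, pos);
        if (pos[0] != s.length()) {
            throw new IllegalArgumentException("trailing text at " + pos[0]);
        }
        return tag;
    }

    // pos[0] points at '{' on entry and just past the matching '}' on return.
    private static ParsedTag parse(String s, int[] pos) {
        expect(s, pos, '{');
        ParsedTag nested = null;
        String id = null;
        if (pos[0] < s.length() && s.charAt(pos[0]) == '{') {
            nested = parse(s, pos); // nesting is only allowed in the id position
        } else {
            id = readField(s, pos);
        }
        List<String> fields = new ArrayList<>();
        while (pos[0] < s.length() && s.charAt(pos[0]) == '|') {
            pos[0]++;
            fields.add(readField(s, pos)); // "" for consecutive pipes
        }
        expect(s, pos, '}');
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("attribute field is required");
        }
        return new ParsedTag(nested, id, fields);
    }

    // Reads up to the next unescaped '|' or '}', dropping the backslash of each escape.
    private static String readField(String s, int[] pos) {
        StringBuilder sb = new StringBuilder();
        while (pos[0] < s.length()) {
            char c = s.charAt(pos[0]);
            if (c == '|' || c == '}') {
                break;
            }
            if (c == '{') {
                throw new IllegalArgumentException("unescaped '{' at " + pos[0]);
            }
            if (c == '\\' && pos[0] + 1 < s.length()) {
                pos[0]++;
                c = s.charAt(pos[0]);
            }
            sb.append(c);
            pos[0]++;
        }
        return sb.toString();
    }

    private static void expect(String s, int[] pos, char c) {
        if (pos[0] >= s.length() || s.charAt(pos[0]) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at " + pos[0]);
        }
        pos[0]++;
    }

    /** Nesting depth in the id position (0 for a plain id). */
    int depth() {
        return nestedId == null ? 0 : 1 + nestedId.depth();
    }
}
```

Because each nesting level is one recursive call, the depth is limited only by the call stack.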
