Using backreference to refer to a pattern rather than actual match

I am trying to write a regex that matches a (not necessarily repeating) sequence of text blocks, e.g.:

foo,bar,foo,bar

My initial thought was to use backreferences, something like

(foo|bar)(,\\1)*

But it turns out that this regex only matches foo,foo or bar,bar but not foo,bar or bar,foo (and so on).
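The behaviour described above can be reproduced with a small test. This is only a sketch: the class name `BackreferenceDemo` and the helper `matchesWhole` are made up for illustration, and `matches()` is used so that the whole input has to fit the pattern.

```java
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // \1 re-matches the exact text captured by group 1, not the pattern of group 1.
    static final Pattern BACKREF = Pattern.compile("(foo|bar)(,\\1)*");

    // Hypothetical helper: true only if the entire input matches.
    static boolean matchesWhole(String input) {
        return BACKREF.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("foo,foo")); // true
        System.out.println(matchesWhole("bar,bar")); // true
        System.out.println(matchesWhole("foo,bar")); // false: \1 insists on "foo" again
    }
}
```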

Is there any other way to refer to a part of a pattern?

In the real world, foo and bar are regexes more than 50 characters long, and I simply want to avoid copy-pasting them to define a sequence.

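With a decent regex flavor (PCRE, for example) you could use (foo|bar)(?:,(?-1))* or the like, where (?-1) re-runs the pattern of the preceding group instead of re-matching its text. But Java does not support subpattern calls.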

So you are left with two options: build the pattern with String replace/format, as in ajx's answer, or make the comma conditional if you know when it should and should not be present. For example:

(?:(?:foo|bar)(?:,(?!$|\s)|))+
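To see what the pattern above accepts, here is a small check. It is a sketch only: the class name `OptionalCommaDemo` and the helper `matchesWhole` are invented for this example, and the literal foo and bar stand in for the real sub-patterns.

```java
import java.util.regex.Pattern;

public class OptionalCommaDemo {
    // Each item is followed either by a comma (not at end of input, not before
    // whitespace) or by nothing at all.
    static final Pattern LIST =
            Pattern.compile("(?:(?:foo|bar)(?:,(?!$|\\s)|))+");

    // Hypothetical helper: true only if the entire input matches.
    static boolean matchesWhole(String input) {
        return LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("foo,bar,foo,bar")); // true
        System.out.println(matchesWhole("foo,bar,"));        // false: trailing comma rejected
        System.out.println(matchesWhole("foobar"));          // true: the comma is optional
    }
}
```

Note the last case: because the comma branch has an empty alternative, items written without a separator also match, so this form only fits if that is acceptable for your input.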

Perhaps you could build your regex bit by bit in Java, as in:

String subRegex = "foo|bar";
String fullRegex = String.format("(%1$s)(,(%1$s))*", subRegex);

The second line could be factored out into a function. The function would take a subexpression and return a full regex that would match a comma-separated list of subexpressions.
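A minimal sketch of such a function might look like this. The names `ListRegexBuilder` and `commaSeparatedList` are hypothetical, and this version wraps the subexpression in non-capturing groups rather than the capturing groups used above, so that the wrapper itself adds no group numbers.

```java
import java.util.regex.Pattern;

public class ListRegexBuilder {
    // Builds a regex for one or more subRegex items separated by commas.
    // The (?:...) wrapper keeps an alternation such as "foo|bar" contained.
    static String commaSeparatedList(String subRegex) {
        return String.format("(?:%1$s)(?:,(?:%1$s))*", subRegex);
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(commaSeparatedList("foo|bar"));
        System.out.println(p.matcher("foo,bar,foo,bar").matches()); // true
        System.out.println(p.matcher("foo").matches());             // true
        System.out.println(p.matcher("foo,baz").matches());         // false
    }
}
```

One thing to keep in mind: the subexpression is pasted into the result twice, so any capturing groups inside it will also appear twice and be numbered separately.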

The point of the back reference is to match the actual text that matches, not the pattern, so I'm not sure you could use that.

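Can you use quantifiers, like this:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "foo,bar,foo,bar";
    String externalPattern = "(foo|bar)"; // comes from somewhere else
    // One item, then zero or more ",item" repetitions; the * must cover the comma as well.
    Pattern p = Pattern.compile(externalPattern + "(?:," + externalPattern + ")*");
    Matcher m = p.matcher(s);
    boolean b = m.find(); // finds the list anywhere in s; use matches() to require the whole string

which would match one or more instances of foo or bar, separated by commas.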

