
Splitting a string with regex, ignoring delimiters that occur within braces

Suppose I have a string:

Max and Bob and Merry and {Jack and Co.} and Lisa .

I need to split it using "and" as the delimiter, but only where "and" does not occur within curly braces.

So from the above string I should get 5 strings:
Max, Bob, Merry, Jack and Co., Lisa.

I tried something like this pattern:

[^\\\{.+]\\band\\b[^.+\\\}]

But it doesn't work: "Jack and Co." is still split as well. (I use C++, so I have to escape special characters twice.)

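If QRegExp supports lookaheads, you can check whether an "and" sits inside braces by looking ahead from its trailing word boundary: if a closing } follows with no opening { in between, the "and" is inside braces and the match is rejected. This assumes the braces are balanced and not nested.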

\band\b(?![^{]*})

See this demo at regex101

The backslashes need to be escaped as required by a C++ string literal, or you can use a raw string literal, as @SMeyer suggested in the comments.
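As an illustration of the escaping point, here is a minimal sketch that writes the lookahead pattern both ways and uses it to split. It assumes std::regex (ECMAScript grammar) rather than QRegExp, escapes the closing brace as \} to be safe across implementations, and the names kEscaped, kRaw and split_on_and are made up for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// The same pattern written twice: once with doubled backslashes,
// once as a raw string literal (C++11 and later).
const std::string kEscaped = "\\band\\b(?![^{]*\\})";
const std::string kRaw = R"(\band\b(?![^{]*\}))";

// Hypothetical helper: splits on every "and" that is not followed by a
// closing brace without an opening brace in between.
std::vector<std::string> split_on_and(const std::string& text) {
    const std::regex re(kRaw);
    // Submatch index -1 selects the text between matches.
    std::sregex_token_iterator it(text.begin(), text.end(), re, -1), end;
    return std::vector<std::string>(it, end);
}
```

Since this pattern matches only the word itself, each piece keeps its surrounding spaces and braces and would still need trimming.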

Here is a possible solution, partially based on the comment by bobble-bubble. It produces the five strings as requested, without surrounding whitespace or curly braces.

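#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string text = "Max and Bob and Merry and {Jack and Co.} and Lisa";
    // Delimiter: " and " outside braces, plus an adjacent '}' or '{' if present.
    std::regex re(R"(\}? +and +(?![^{]*\})\{?)");

    // Submatch index -1 yields the text between delimiter matches.
    std::sregex_token_iterator it(text.begin(), text.end(), re, -1);
    std::sregex_token_iterator end;

    while (it != end)
        std::cout << *it++ << std::endl;
}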

I tried to keep it simple; you might want to replace the literal spaces around "and" with general whitespace matching (\s+). An interactive version is available here.
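The whitespace suggestion could be sketched as follows, assuming std::regex with its default ECMAScript grammar. The only change to the pattern is that each " +" becomes "\s+"; the function name split_names is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: same delimiter as in the answer above, but any
// whitespace (tabs, newlines, repeated spaces) may surround "and".
std::vector<std::string> split_names(const std::string& text) {
    static const std::regex re(R"(\}?\s+and\s+(?![^{]*\})\{?)");
    // Submatch index -1 selects the text between delimiter matches.
    std::sregex_token_iterator it(text.begin(), text.end(), re, -1), end;
    return std::vector<std::string>(it, end);
}
```

One limitation to be aware of: a brace group at the very start or end of the input keeps its outer brace, because there is no neighbouring delimiter to absorb it.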

Let the {...} part match first. That is, put it on the left side of the | alternation.

\{.*?\}|and

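That will match {foo and bar} as a whole where possible; only where no brace group starts will it try to match "and". Matches that begin with { can then be skipped, so that only the bare "and" matches act as delimiters.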
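A minimal sketch of how this alternation could drive a split, assuming std::regex: walk over all matches, ignore the ones that are brace groups, and cut the string at the rest. The helper name split_skipping_braces is hypothetical, and the \b anchors around "and" are an addition to avoid cutting inside words such as "Sandy".

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits on "and" while treating {...} as opaque.
std::vector<std::string> split_skipping_braces(const std::string& text) {
    // The brace alternative comes first, so it wins wherever a group starts.
    static const std::regex re(R"(\{.*?\}|\band\b)");
    std::vector<std::string> parts;
    auto piece_start = text.begin();
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        if (m.str().front() == '{')
            continue;  // brace group: it stays inside the current piece
        parts.emplace_back(piece_start, m[0].first);  // text before this "and"
        piece_start = m[0].second;                    // resume after this "and"
    }
    parts.emplace_back(piece_start, text.end());  // trailing piece
    return parts;
}
```

The pieces come back with their surrounding spaces and braces intact, so trimming them is left to the caller.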
