The correct regular expression to search for multiple occurrences

I have this source text:

{ff0000}Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.{8b00ff}Ut enim {FFFFFF}ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.{0000ff}Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

The task is to split this text into 4 regular expression matches, each one starting at a HEX color tag.

The results should look like this:

  1. {ff0000}Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
  2. {8b00ff}Ut enim
  3. {FFFFFF}ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.
  4. {0000ff}Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

I have written a regular expression that finds the HEX color tags, but I can't make a match stop at the next occurrence.

Regular expression: \{[a-zA-Z0-9]{6}\}

I tried: \{[a-zA-Z0-9]{6}\}.* | (\{[a-zA-Z0-9]{6}\}{1}).* | ((\{[a-zA-Z0-9]{6}\}{1}).*){1}

In all three cases, once .* is added, the remaining HEX color tags are no longer matched as tags; they are swallowed by .* instead. The question is how to make each match stop where the next HEX color tag begins.
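To make the symptom concrete, here is a minimal C++ sketch that counts how many matches a pattern produces. The helper name CountMatches is hypothetical (it does not appear in the question), and the sketch assumes the default ECMAScript grammar of std::regex.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts the non-overlapping matches of `pattern` in `text`.
std::size_t CountMatches(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(text.begin(), text.end(), re),
        std::sregex_iterator()));
}
```

On a shortened input such as {ff0000}a{8b00ff}b, the bare tag pattern \{[a-zA-Z0-9]{6}\} should yield 2 matches, while \{[a-zA-Z0-9]{6}\}.* should yield only 1. The reason is that . also matches { and }, and * is greedy, so the first match runs to the end of the text.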

To match characters that are not {, use a negated character class.

\{[A-Za-z\d]{6}\}[^{]*

See this demo at regex101
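Since the other answer here is in C++, this is a rough sketch of how the pattern could be applied with std::sregex_iterator. The function name SplitByColorTag is an assumption made for illustration, and [A-Za-z0-9] is written out in place of [A-Za-z\d], which matches the same characters here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` into segments that each start with a
// {xxxxxx} tag and run up to (but not including) the next '{' or the end.
std::vector<std::string> SplitByColorTag(const std::string& text)
{
    // Tag followed by any run of characters that are not '{'.
    static const std::regex re(R"(\{[A-Za-z0-9]{6}\}[^{]*)");

    std::vector<std::string> parts;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
    {
        parts.push_back(it->str());  // whole match: tag plus the text after it
    }
    return parts;
}
```

Two things to keep in mind with this approach: each segment keeps any whitespace that precedes the next tag (for example the space after "Ut enim"), and a { that does not start a color tag still ends the current segment, so the text from that { up to the next valid tag is dropped.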

Just use the positions of the matches to determine the relevant substrings. Every substring starts at an occurrence of the pattern and ends either at the next match of the pattern or at the end of the input string, whichever comes first.

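#include <cstddef>
#include <iostream>
#include <regex>
#include <string>
#include <string_view>
#include <vector>

std::vector<std::string> FindMatches(std::string_view const input)
{
    std::regex reg("\\{[a-zA-Z0-9]{6}\\}");
    // Every match is exactly 8 characters long: '{', six alphanumerics, '}'.
    constexpr std::size_t MatchLength = 8;

    std::vector<std::string> result;

    std::match_results<std::string_view::const_iterator> match;
    if (std::regex_search(input.begin(), input.end(), match, reg))
    {
        // The first part starts at the first tag in the input.
        auto partStart = input.begin() + match.position();
        // Look for the next tag, starting just past the current one.
        while (std::regex_search(partStart + MatchLength, input.end(), match, reg))
        {
            // match.position() is relative to the start of this search.
            auto partEnd = partStart + (MatchLength + match.position());
            result.emplace_back(partStart, partEnd);
            partStart = partEnd;
        }
        // The last part runs to the end of the input.
        result.emplace_back(partStart, input.end());
    }

    return result;
}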

int main()
{
    using namespace std::literals::string_view_literals;

    auto const haystack = "{ff0000}Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.{8b00ff}Ut enim {FFFFFF}ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.{0000ff}Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."sv;

    std::size_t index = 1;

    for (auto& part : FindMatches(haystack))
    {
        std::cout << index << ". " << part << '\n';
        ++index;
    }
}
