
C++ Boost:regex_search expression - Issue combining expressions to catch all sequences

I'm trying to write a template parser and need to pick up three distinct kinds of sequences for string replacement.

// Each of these expressions works perfectly on its own.
// All sequences start with a | (pipe), followed by one of the patterns below.

boost::regex expr {"(\\|[0-9]{2})"};               // 2 Digits only.
boost::regex expr {"(\\|[A-Z]{1,2}+[0-9]{1,2})"};  // 1 or 2 Uppercase Chars and 1 or 2 Digits.
boost::regex expr {"(\\|[A-Z]{2})(?!\\d)"};        // 2 Uppercase Chars with no following digits.

However, once I try to combine them into a single expression, I can't get them to catch all of the sequences. Can anyone shed some light on what I'm missing?

Here is what I have so far:

// The alternatives are separated by | (alternation) between the parenthesized groups.
boost::regex expr {"(\\|[0-9]{2})|(\\|[A-Z]{1,2}+[0-9]{1,2})|(\\|[A-Z]{2})(?!\\d)"};

I'm using the following string for testing. It is probably a little more than needed, but here is the code as well.

#include <boost/regex.hpp>
#include <string>
#include <iostream>


int main()
{
    std::string str = "|MC01 |U1 |s |A22 |12 |04 |2 |EW |SSAADASD |15";

    boost::regex expr {"(\\|[0-9]{2})|(\\|[A-Z]{1,2}+[0-9]{1,2})|(\\|[A-Z]{2})(?!\\d)"};

    boost::smatch matches;
    std::string::const_iterator start = str.begin(), end = str.end();
    while(boost::regex_search(start, end, matches, expr))
    {
        // Whole match, with the text before it (prefix) and after it (suffix).
        std::cout << "Matched Sub '" << matches.str()
                  << "' following '" << matches.prefix().str()
                  << "' preceding '" << matches.suffix().str()
                  << "'" << std::endl;

        // Continue the next search right after the current match.
        start = matches[0].second;
        for(size_t s = 1; s < matches.size(); ++s)
        {
            std::cout << "+ Matched Sub " << matches[s].str()
                      << " at offset " << matches[s].first - str.begin()
                      << " of length " << matches[s].length()
                      << std::endl;
        }
    }
    return 0;
}

I believe this is what you want:

const boost::regex expr {"(\\|[0-9]{2})|(\\|[A-Z]{1,2}+[0-9]{1,2})|(\\|[A-Z]{2})"}; // i.e. drop the (?!\\d) lookahead from the last alternative

I also suggest being explicit about the flags used to construct expr and the flags passed to regex_search.
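
As a rough illustration of that suggestion, the sketch below spells out both the syntax flag given to the regex constructor and the match flag given to the search call. It rests on a few assumptions and is not the poster's code: it uses std::regex from the standard library instead of Boost so that it builds without extra dependencies, the helper name find_tokens is hypothetical, and the possessive {1,2}+ is written as a plain {1,2} because the ECMAScript grammar of std::regex has no possessive quantifiers. With Boost, the corresponding spellings would be boost::regex::perl and boost::match_default.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every token the combined expression finds in `text`.
std::vector<std::string> find_tokens(const std::string& text)
{
    // The grammar flag is named explicitly instead of relying on the default.
    static const std::regex expr{
        R"((\|[0-9]{2})|(\|[A-Z]{1,2}[0-9]{1,2})|(\|[A-Z]{2}))",
        std::regex::ECMAScript};

    std::vector<std::string> tokens;
    std::smatch m;
    auto start = text.cbegin();
    const auto end = text.cend();

    // The match flag is passed explicitly as well.
    while (std::regex_search(start, end, m, expr,
                             std::regex_constants::match_default))
    {
        assert(m[0].matched);      // a successful search always sets m[0]
        tokens.push_back(m.str());
        start = m[0].second;       // resume right after the current match
    }
    return tokens;
}
```

For the test string from the question, find_tokens should return |MC01, |U1, |A22, |12, |04, |EW, |SS and |15.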

I also found that adding an extra check on each sub-match's matched flag filters out the subexpressions that did not take part in the match, which is what was throwing me off.

for(size_t s = 1; s < matches.size(); ++s)
{
    if (matches[s].matched)  // true only if this subexpression took part in the match
    {
        std::cout << "+ Matched Sub " << matches[s].str()
                  << " at offset " << matches[s].first - str.begin()
                  << " of length " << matches[s].length()
                  << std::endl;
    }
}

Without it, the unmatched sub-matches were shown with an offset at the end of the string and a length of 0. I hope this helps anyone else who runs into this.
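
To make the zero-length entries concrete, here is a minimal sketch that records the state of every capture group after one search. It assumes std::regex rather than Boost (so the possessive + is dropped from the pattern), and GroupInfo and inspect_groups are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical record describing one capture group after a search.
struct GroupInfo
{
    bool matched;        // did this group take part in the match?
    std::size_t length;  // 0 when it did not
};

// Hypothetical helper: runs one search and reports the state of groups 1..3.
std::vector<GroupInfo> inspect_groups(const std::string& text)
{
    static const std::regex expr{
        R"((\|[0-9]{2})|(\|[A-Z]{1,2}[0-9]{1,2})|(\|[A-Z]{2}))"};

    std::vector<GroupInfo> groups;
    std::smatch m;
    if (std::regex_search(text, m, expr))
    {
        assert(m.size() == 4);  // whole match plus three capture groups
        for (std::size_t s = 1; s < m.size(); ++s)
        {
            groups.push_back({m[s].matched,
                              static_cast<std::size_t>(m[s].length())});
        }
    }
    return groups;
}
```

Calling inspect_groups("|EW") should report the first two groups as unmatched with length 0 and only the third as matched, which is why printing every group without the check produces the confusing extra lines.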

Another tip: inside the loop, the value of s (1, 2, or 3) tells you which part of the expression matched. Since I have three alternatives, s is 1 when matched is true for the first alternative, and 2 or 3 for the others. Pretty nice!
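
The tip above can be sketched as a small function that returns the number of the alternative that matched. This is only an illustration under stated assumptions: it uses std::regex instead of Boost (without the possessive +), and alternative_index is a hypothetical name, not part of either library.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns 1, 2 or 3 for the alternative that produced
// the first match in `text`, or 0 when nothing matches.
int alternative_index(const std::string& text)
{
    static const std::regex expr{
        R"((\|[0-9]{2})|(\|[A-Z]{1,2}[0-9]{1,2})|(\|[A-Z]{2}))"};

    std::smatch m;
    if (!std::regex_search(text, m, expr))
    {
        return 0;
    }

    assert(m.size() == 4);  // whole match plus three capture groups
    for (std::size_t s = 1; s < m.size(); ++s)
    {
        if (m[s].matched)   // only one group takes part in any given match
        {
            return static_cast<int>(s);
        }
    }
    return 0;
}
```

With this, a replacement routine could switch on the returned value, for example treating 1 as a two-digit code, 2 as letters followed by digits, and 3 as a two-letter code.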


 