
What is the right way to write a regular expression in C++?

I am having a hard time writing the regular expression below in C++:

(?=[a-zA-Z])*(?=[\s])?(00|\+)[\s]?[0-9]+[\s]?[0-9]+(?=[\sa-zA-Z])*

Example string: "ABC + 91 9997474545 DEF"

Matched string must be: "+ 91 9997474545"

C++ code:

#include <iostream> 
#include <regex> 

using namespace std; 
int main() 
{ 
    string a = "ABC + 91 9997474545 DEF"; 
    try
    {
        regex b("(?=[a-zA-Z])*(?=[\\s])?(00|\\+)[\\s]?[0-9]+[\\s]?[0-9]+(?=[\\sa-zA-Z])*"); 

        smatch amatch;
        if ( regex_search(a, amatch, b) )
        {
            for(const auto& aMa : amatch)
            {
                cout<< "match :" <<aMa.str()<<endl;
            }
        }
    }
    catch (const regex_error& err)
    { 
        std::cout << "There was a regex_error caught: " << err.what() << '\n'; 
    }
    return 0; 
}

Output:

There was a regex_error caught: regex_error

What is wrong with the code?

Edit: an improved version (based on Toto's comment):

regex b(R"(([[:alpha:]]*\s*)(\+?\s*\d+\s*\d+)(\s*[[:alpha:]]*))");
  • Use the [[:alpha:]] character class, which matches alphabetic characters only, instead of \w, which also matches digits and the underscore. Note the double brackets: a bare [alpha] would match only the single characters a, l, p, h.
  • In the second (main) group (\+?\s*\d+\s*\d+), use + instead of * to require at least one digit.

Two suggestions to make your code more readable:

  • Use a raw string literal (R"(...)") to avoid doubling every backslash
  • Use character classes such as \w (for word characters: letters, digits, and underscore), \s (for whitespace), \d (for digits)

Then your regex could be simplified like this:

regex b(R"((\w*\s*)(\+?\s*\d*\s*\d*)(\s*\w*))");

which would yield the following results (assuming you want to extract the number with an optional plus sign):

match :ABC + 91 9997474545 DEF
match :ABC 
match :+ 91 9997474545
match : DEF

Note that the regex above contains 3 groups:

  • (\w*\s*) - some leading word characters and spaces
  • (\+?\s*\d*\s*\d*) - an optional plus sign, then some digits (91), optional spaces, and more digits (9997474545)
  • (\s*\w*) - some spaces, then some word characters
