
Is the symbol ’ special in Boost regex?

Regular expression: “[^”]*“

String: “lips“

Result: match

String: “lips’“

Result: no match

I expect both strings to match.

C++ code:

#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;
using namespace boost;

int main()
{
    const string s1 = "“lips“";
    const string s2 = "“lips’“";
    if (regex_search(s1, regex("“[^”]*“"))) cout << "s1 matched" << endl;
    if (regex_search(s2, regex("“[^”]*“"))) cout << "s2 matched" << endl;
    return 0;
}

Output: s1 matched

Is the symbol ’ special? Why does the second string not match?

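Boost.Regex does not treat input as UTF-8 by default: boost::regex matches one char (one byte) at a time. In UTF-8 the closing quote ” (U+201D) is encoded as the bytes E2 80 9D and the apostrophe ’ (U+2019) as E2 80 99, so the two share the bytes E2 and 80. The class [^”] therefore rejects each of the bytes E2, 80 and 9D individually and cannot consume the apostrophe, which is why the regex does not match. Code for UTF-8: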

#include <iostream>
#include <string>
#include <boost/regex.hpp>
#include <boost/regex/icu.hpp>

using namespace std;
using namespace boost;

int main()
{
    const string s1 = "“lips“";
    const string s2 = "“lips’“";
    if (u32regex_search(s1, make_u32regex("“[^”]*“"))) cout << "s1 matched" << endl;
    if (u32regex_search(s2, make_u32regex("“[^”]*“"))) cout << "s2 matched" << endl;
    return 0;
}

Compilation: g++ -std=c++11 ./test.cc -licuuc -lboost_regex

Output:

s1 matched
s2 matched
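
The shared bytes mentioned in the answer can be inspected with the standard library alone. The following is a minimal sketch, not Boost's implementation: the helper names hex_bytes and has_excluded_byte are made up for illustration, and has_excluded_byte only approximates how a byte-oriented regex treats a class such as [^”], namely by excluding every byte of the multi-byte character separately. The characters are written as explicit byte escapes so the result does not depend on the source file's encoding.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Render each byte of a UTF-8 encoded string as two uppercase hex digits,
// separated by single spaces.
std::string hex_bytes(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Illustrative approximation (an assumption, not Boost code): a byte-oriented
// class [^X] with a multi-byte X excludes every byte of X on its own.
// Returns true if `text` contains any byte that also occurs in `excluded`.
bool has_excluded_byte(const std::string& text, const std::string& excluded) {
    assert(!excluded.empty());  // an empty exclusion set makes no sense here
    return text.find_first_of(excluded) != std::string::npos;
}
```

For example, hex_bytes("\xE2\x80\x9D") gives the bytes of ” and hex_bytes("\xE2\x80\x99") gives the bytes of ’. Calling has_excluded_byte("lips\xE2\x80\x99", "\xE2\x80\x9D") returns true, because the apostrophe contains bytes that the class rejects, while the same call with plain "lips" returns false.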
