C++ regex to exclude double quotes not working

Question

I am considering input files with lines like

"20170103","MW JANE DOE","NL01 INGB 1234 5678 90","NL02 INGB 1234 5678 90","GT","Af","12,34","Internetbankieren","Mededeling_3"
"20170102","MW JANE DOE","NL01 INGB 1234 5678 90","NL02 INGB 1234 5678 90","GT","Af","12,34","Internetbankieren","Mededeling_2"
"20170101","MW JANE DOE","NL01 INGB 1234 5678 90","NL02 INGB 1234 5678 90","GT","Af","12,34","Internetbankieren","Mededeling_1"

I want to get the separate strings without the double quotes and store them in a std::vector<std::string>. So, for instance, I want to have 20170101, MW JANE DOE, NL01 INGB 1234 5678 90, NL02 INGB 1234 5678 90, GT, Af, 12,34, Internetbankieren, and Mededeling_1 as a result.

I try to do so with the code

std::regex re("\"(.*?)\"");
std::regex_iterator<std::string::iterator> it (line.begin(),line.end(),re);
std::regex_iterator<std::string::iterator> end;
std::vector<std::string> lineParts;
std::string linePart="";

// Split 'line' into line parts and save these in the vector 'lineParts'.
while (it!=end)
{
    linePart=it->str();
    std::cout<<linePart<<std::endl; // Print substring.
    lineParts.push_back(linePart);
    ++it;
}

However, the double quotes are still included in the elements of lineParts, even though I used the regex "\"(.*?)\"" so that supposedly only the part within the double quotes is saved, and not the double quotes themselves.

What am I doing wrong?

Answer 1

You have a pattern with a capturing group. So, when your regex finds a match, the double quotes are part of the whole match value (stored in element [0]), but the captured part is stored in element [1].

So, you just need to access the contents of capturing group #1:

linePart=it->str(1);
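For context, here is a minimal sketch of the question's loop with that one change applied. The function name splitQuoted is hypothetical and only wraps the original code so it can be called on a single line of input.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns the text found
// between each pair of double quotes in 'line'.
std::vector<std::string> splitQuoted(const std::string& line)
{
    static const std::regex re("\"(.*?)\"");
    std::vector<std::string> lineParts;

    // std::sregex_iterator walks over every non-overlapping match in 'line';
    // a default-constructed iterator marks the end of the sequence.
    for (std::sregex_iterator it(line.begin(), line.end(), re), end; it != end; ++it)
    {
        // str(1) is capturing group 1, i.e. the match without the quotes.
        // str() or str(0) would return the whole match, quotes included.
        lineParts.push_back(it->str(1));
    }
    return lineParts;
}
```

Called on one of the input lines from the question, this should yield nine elements, with 12,34 kept as a single element because the comma sits inside a quoted field.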

See regular-expressions.info, Finding a Regex Match:

When the function call returns true, you can call the str(), position(), and length() member functions of the match_results object to get the text that was matched, or the starting position and its length of the match relative to the subject string. Call these member functions without a parameter or with 0 as the parameter to get the overall regex match. Call them passing 1 or greater to get the match of a particular capturing group. The size() member function indicates the number of capturing groups plus one for the overall match. Thus you can pass a value up to size()-1 to the other three member functions.
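The member functions described in that quote can be tried out on the first quoted field of a line. The following is only a sketch: the struct MatchInfo and the function firstQuotedField are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical record of what the match_results accessors report.
struct MatchInfo
{
    std::string whole;        // str(): overall match, quotes included
    std::string group;        // str(1): contents of capturing group 1
    std::ptrdiff_t groupPos;  // position(1): offset of group 1 in the subject
    std::ptrdiff_t groupLen;  // length(1): number of characters in group 1
    std::size_t count;        // size(): capturing groups plus one
};

// Hypothetical helper: inspects the first quoted field found in 'line'.
MatchInfo firstQuotedField(const std::string& line)
{
    static const std::regex re("\"(.*?)\"");
    std::smatch m;
    if (!std::regex_search(line, m, re))
    {
        return MatchInfo{}; // no match: empty strings and zeros
    }
    return MatchInfo{m.str(), m.str(1), m.position(1), m.length(1), m.size()};
}
```

For a line beginning with "GT", the overall match is the four characters including both quotes, while group 1 is GT at offset 1 with length 2, and size() is 2 because the pattern has one capturing group.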

Answer 2

As others have said, regex_iterator::operator-> gives access to a match_results, and the argument of match_results::str defaults to 0:

The first sub_match (index 0) contained in a match_result always represents the full match within a target sequence made by a regex, and subsequent sub_matches represent sub-expression matches corresponding in sequence to the left parenthesis delimiting the sub-expression in the regex

So the problem with your code is that you're not using linePart = it->str(1).

A better solution would be to use a regex_token_iterator, with which you could use your re to initialize lineParts directly:

vector<string> lineParts { sregex_token_iterator(cbegin(line), cend(line), re, 1), sregex_token_iterator() };
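A self-contained version of that one-liner might look like the sketch below. The function name splitWithTokenIterator is hypothetical; the regex is the one from the question.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: builds the vector directly from a pair of
// regex_token_iterators instead of looping by hand.
std::vector<std::string> splitWithTokenIterator(const std::string& line)
{
    static const std::regex re("\"(.*?)\"");

    // The trailing 1 asks for capturing group 1 of every match.
    // A default-constructed iterator is the end-of-sequence marker.
    // Each sub_match converts implicitly to std::string.
    return std::vector<std::string>(
        std::sregex_token_iterator(line.begin(), line.end(), re, 1),
        std::sregex_token_iterator());
}
```

The iterators refer into line, so line has to stay alive until the vector has been built; here that is guaranteed because the vector is constructed inside the function.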

But I'd just like to point out that C++14 introduced quoted, which does exactly what you're trying to do here, and more (it even handles escaped quotes for you!). It'd be a shame not to use it.

You are probably already getting your input from a stream, but in case you're not, you'd need to initialize an istringstream; for the purposes of this example I'll call mine line. Then you can use quoted to populate lineParts like this:

for(string linePart; line >> quoted(linePart); line.ignore(numeric_limits<streamsize>::max(), ',')) {
    lineParts.push_back(linePart);
}
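For completeness, the loop above can be wrapped with the headers it needs. This is a sketch assuming a C++14 compiler; the function name splitWithQuoted and its parameter text are hypothetical, and the stream is built from a string only so the example stands alone.

```cpp
#include <iomanip>   // std::quoted (C++14)
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads comma-separated, double-quoted fields from 'text'.
std::vector<std::string> splitWithQuoted(const std::string& text)
{
    std::istringstream line(text);
    std::vector<std::string> lineParts;

    // Each iteration extracts one quoted field (quotes stripped, escapes
    // resolved), then skips everything up to and including the next comma.
    for (std::string linePart; line >> std::quoted(linePart);
         line.ignore(std::numeric_limits<std::streamsize>::max(), ','))
    {
        lineParts.push_back(linePart);
    }
    return lineParts;
}
```

By default std::quoted uses the double quote as delimiter and the backslash as escape character, so a field written as "say \"hi\"" is read as say "hi". Quotes escaped by doubling them, as some CSV dialects do, are not covered by these defaults.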

Live Example

Answer 1
2 Accepted 2017-02-14 14:25:31

Answer 2
2 2017-02-14 14:32:45
