C++ regex exclusion double quotes not working
I am considering input files with lines like
"20170103","MW JANE DOE","NL01 INGB 1234 5678 90","NL02 INGB 1234 5678 90","GT","Af","12,34","Internetbankieren","Mededeling_3"
"20170102","MW JANE DOE","NL01 INGB 1234 5678 90","NL02 INGB 1234 5678 90","GT","Af","12,34","Internetbankieren","Mededeling_2"
"20170101","MW JANE DOE","NL01 INGB 1234 5678 90","NL02 INGB 1234 5678 90","GT","Af","12,34","Internetbankieren","Mededeling_1"
I want to get the separate strings WITHOUT THE DOUBLE QUOTES and store them in a std::vector<std::string>. So, for instance, for the last line above I want to have the following as a result:
20170101
MW JANE DOE
NL01 INGB 1234 5678 90
NL02 INGB 1234 5678 90
GT
Af
12,34
Internetbankieren
Mededeling_1
I try to do so with the code
std::regex re("\"(.*?)\"");
std::regex_iterator<std::string::iterator> it (line.begin(),line.end(),re);
std::regex_iterator<std::string::iterator> end;
std::vector<std::string> lineParts;
std::string linePart="";
// Split 'line' into line parts and save these in the vector 'lineParts'.
while (it!=end)
{
    linePart=it->str();
    std::cout<<linePart<<std::endl; // Print substring.
    lineParts.push_back(linePart);
    ++it;
}
However, the double quotes are still included in the elements of lineParts, even though I used the regex "\"(.*?)\"" so that supposedly only the part within the double quotes is saved, and not the double quotes themselves.
What am I doing wrong?
You have a pattern with a capturing group. So, when your regex finds a match, the double quotes are part of the whole match value (stored in element [0]), but the captured part is stored in element [1].
So, you just need to access the contents of capturing group #1:
linePart=it->str(1);
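For context, here is a minimal sketch of the question's loop with that one change applied. The function name extractQuotedFields is a hypothetical helper introduced only for illustration; the regex is the one from the question.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns the text found
// between each pair of double quotes on one input line.
std::vector<std::string> extractQuotedFields(const std::string& line)
{
    static const std::regex re("\"(.*?)\"");
    std::vector<std::string> lineParts;

    for (std::sregex_iterator it(line.begin(), line.end(), re), end; it != end; ++it)
    {
        // it->str() or it->str(0) is the whole match, quotes included;
        // it->str(1) is only what capturing group 1 matched.
        lineParts.push_back(it->str(1));
    }
    return lineParts;
}
```

Calling extractQuotedFields on one of the sample lines should give nine elements, none of which contains a double quote, and the embedded comma in 12,34 stays intact because the split is driven by the quotes rather than by the commas.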
See regular-expressions.info, Finding a Regex Match:
"When the function call returns true, you can call the str(), position(), and length() member functions of the match_results object to get the text that was matched, or the starting position and its length of the match relative to the subject string. Call these member functions without a parameter or with 0 as the parameter to get the overall regex match. Call them passing 1 or greater to get the match of a particular capturing group. The size() member function indicates the number of capturing groups plus one for the overall match. Thus you can pass a value up to size()-1 to the other three member functions."
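The member functions named in that passage can be tried out on the first field of a line. The following is only a sketch: the GroupInfo struct and the describeFirstField function are hypothetical names made up for this example, while size(), str(), position() and length() are the std::match_results members the quote refers to.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical record (illustration only) holding what match_results
// reports for the first match on a line.
struct GroupInfo
{
    std::size_t groupCount;   // match.size(): capturing groups plus one
    std::string whole;        // match.str(), same as match.str(0)
    std::string inner;        // match.str(1)
    std::ptrdiff_t position;  // match.position(1), offset into the line
    std::ptrdiff_t length;    // match.length(1)
};

GroupInfo describeFirstField(const std::string& line)
{
    static const std::regex re("\"(.*?)\"");
    std::smatch match;
    GroupInfo info{};  // stays zero/empty if nothing matches

    if (std::regex_search(line, match, re))
    {
        info = { match.size(), match.str(), match.str(1),
                 match.position(1), match.length(1) };
    }
    return info;
}
```

For the input "abc","de" (quotes included), the whole match is "abc" with its quotes, group 1 is abc, its position is 1 and its length is 3, and size() is 2 because the pattern has one capturing group.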
As others have said, regex_iterator::operator-> returns a match_results, and the argument of match_results::str defaults to 0:
"The first sub_match (index 0) contained in a match_result always represents the full match within a target sequence made by a regex, and subsequent sub_matches represent sub-expression matches corresponding in sequence to the left parenthesis delimiting the sub-expression in the regex."
So the problem with your code is that you're not using linePart = it->str(1).
A better solution would be to use a regex_token_iterator, with which you could just use your re to directly initialize lineParts:
vector<string> lineParts { sregex_token_iterator(cbegin(line), cend(line), re, 1), sregex_token_iterator() };
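A self-contained version of that one-liner might look like the sketch below. It assumes line is a std::string holding one input line and wraps the initialization in a function named splitWithTokenIterator, which is a hypothetical name used only here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper (illustration only) around the token-iterator approach.
std::vector<std::string> splitWithTokenIterator(const std::string& line)
{
    static const std::regex re("\"(.*?)\"");

    // The final argument 1 tells the iterator to yield capturing group 1
    // of every match; a default-constructed iterator marks the end.
    std::sregex_token_iterator first(line.cbegin(), line.cend(), re, 1);
    std::sregex_token_iterator last;

    // Each sub_match converts to std::string as the vector is filled.
    return std::vector<std::string>(first, last);
}
```

The explicit parentheses in the return statement are a design choice: they select the iterator-range constructor directly instead of relying on brace initialization to fall back to it.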
But I'd just like to point out that C++14 introduced quoted, which does exactly what you're trying to do here, and more (it even handles escaped quotes for you!). It'd just be a shame not to use it.
You are probably already getting your input from a stream, but just in case you're not, you'd need to initialize an istringstream; for the purposes of the example I'll call mine line. Then you can use quoted to populate lineParts like this:
for(string linePart; line >> quoted(linePart); line.ignore(numeric_limits<streamsize>::max(), ',')) {
    lineParts.push_back(linePart);
}
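Put together with the headers it needs, the quoted approach could be sketched as follows. The function name splitWithQuoted and its parameter text are hypothetical; the stream is still called line, as in the snippet above, and a C++14 or later compiler is assumed.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper (illustration only); requires C++14 for std::quoted.
std::vector<std::string> splitWithQuoted(const std::string& text)
{
    std::istringstream line(text);
    std::vector<std::string> lineParts;

    // std::quoted reads one double-quoted field and strips the quotes,
    // unescaping \" inside it. ignore() then skips ahead past the next
    // comma; at the end of the input the following read fails and the
    // loop stops.
    for (std::string linePart; line >> std::quoted(linePart);
         line.ignore(std::numeric_limits<std::streamsize>::max(), ','))
    {
        lineParts.push_back(linePart);
    }
    return lineParts;
}
```

Unlike the regex versions, this one also accepts a field such as "x\"y" and stores it as x"y, since quoted treats the backslash as its default escape character.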