Using std::regex to filter input
I have an ugly mess of a string that is composed of several URIs:
:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/0_301_0.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02011.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02012.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02110000.svg
What I would like to do is strip out every occurrence of the characters :/., so that I end up with a single string that would be a valid filename.
I've written this simple regular expression in order to do just that: [^(:/,.)]
It seems to be the correct regular expression, according to http://www.regexpal.com/ .
However, when I run the following C++ code, I do not get back what I was expecting (just alphanumeric characters and underscores). I only get back the first alphanumeric character in the sequence: S
What am I doing incorrectly with std::regex, or is my regular expression off?
#include <iostream>
#include <regex>
#include <string>
static const std::string filenames {R"(:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/0_301_0.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02011.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02012.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02110000.svg)"};
static const std::regex filename_extractor("[^(:/,.)]");
int main() {
    std::smatch filename_match;
    if (std::regex_search(filenames, filename_match, filename_extractor))
    {
        std::cout << "Number of filenames: " << filename_match.size() << std::endl;
        for (std::size_t i = 0; i < filename_match.size(); ++i)
        {
            std::cout << i << ": " << filename_match[i] << std::endl;
        }
    }
    return 0;
}
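The size() of std::smatch returns the number of sub-expressions + 1. Sub-expressions are delimited by ( and ), and your pattern has none: inside [...] the parentheses are ordinary characters. So size() is 1 here, and filename_match[0] is the whole match, which is the single character S.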
You need to call std::regex_search repeatedly, or use std::regex_iterator.
In addition, your regex only ever matches a single character. You need to add a + to match the longest possible run of characters: [^(:/,.)]+
Here is your code, incorporating the example from cppreference.com:
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
static const std::string filenames {R"(:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/0_301_0.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02011.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02012.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02110000.svg)"};
static const std::regex filename_extractor("[^(:/,.)]+");
int main() {
    auto files_begin = std::sregex_iterator(filenames.begin(), filenames.end(), filename_extractor);
    for (auto i = files_begin; i != std::sregex_iterator(); ++i) {
        std::string filename = i->str();
        std::cout << filename << '\n';
    }
    return 0;
}
However, this also returns the intermediate "directories". If you use the regex [^(:,)]+ instead, you get the result I expect you wanted:
/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/0_301_0.svg
/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02011.svg
/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02012.svg
/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02110000.svg
std::regex_search searches only for the first occurrence of the regular expression, and any sub-expressions within.
For example, the expression ab([cd])([ef]) matches the string xxabcfxxabdef in two places. The first match is the part abcf, with c being the match for the first sub-expression [cd] and f being the match for the second sub-expression [ef].
The second match is the part abde (not abdef!), where d is the match for the first sub-expression and e is the match for the second.
With std::regex_search, you search for the first match only, and the matcher returns the complete first match plus the matches for its sub-expressions. If you want to find further matches, you have to restart the search from the rest of the string (std::smatch::suffix()).
In addition, the regex [ef] matches only a single character. [ef]+ would match the longest sequence of e's and f's. Thus, with the expression ab([cd])([ef]+), the second sub-expression of the second match in the target string above would be ef, and not just e.
I think std::regex_replace is what you need here:
#include <iostream>
#include <iterator> // std::back_inserter
#include <regex>
#include <string>
const std::string filenames {R"(:/MIL_STD/0_3.svg,:/SS/2525D/02011.svg)"};
const std::regex filename_extractor("[(:/,.)]");
int main()
{
    std::string r;
    // Replace every matching character with the empty string.
    std::regex_replace(std::back_inserter(r),
        filenames.begin(), filenames.end(), filename_extractor, "");
    std::cout << "before: " << filenames << '\n';
    std::cout << " after: " << r << '\n';
}
However, I think regex is probably overkill for simply removing characters; you can do this more efficiently with std::remove_copy_if:
#include <algorithm>
#include <iostream>
#include <iterator> // std::back_inserter
#include <string>
const std::string filenames {R"(:/MIL_STD/0_3.svg,:/SS/2525D/02011.svg)"};
const std::string filename_extractor("(:/,.)");
int main()
{
    std::string r;
    // Copy every character except those listed in filename_extractor.
    std::remove_copy_if(filenames.begin(), filenames.end(),
        std::back_inserter(r), [](char c)
        {
            return filename_extractor.find(c) != std::string::npos;
        });
    std::cout << "before: " << filenames << '\n';
    std::cout << " after: " << r << '\n';
}