Filtering input with std::regex

Question

I have an ugly mess of a string that is composed of several URIs:

:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/0_301_0.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02011.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02012.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02110000.svg

What I would like to do is strip out every occurrence of the characters ':', '/', '.' and ',' so that I end up with a single string that is a valid filename.

I've written this simple regular expression to do just that:

[^(:/,.)]

According to http://www.regexpal.com/, it seems to be correct.

However, when I run the following C++ code, I don't get back what I was expecting (just alphanumeric characters and underscores). I only get back the first alphanumeric character in the sequence: S.

What am I doing wrong with std::regex, or is my regular expression off?

#include <iostream>
#include <regex>
#include <string>

static const std::string filenames {R"(:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/0_301_0.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02011.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02012.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02110000.svg)"};
static const std::regex filename_extractor("[^(:/,.)]");

int main() {
    std::smatch filename_match;
    if(std::regex_search(filenames, filename_match, filename_extractor))
    {
        std::cout << "Number of filenames: " << filename_match.size() << std::endl;
        for(std::size_t i = 0; i < filename_match.size(); ++i)
        {
            std::cout << i << ": " << filename_match[i] << std::endl;
        }
    }

    return 0;
}

Answer 1

std::smatch::size() returns the number of marked sub-expressions (groups written with ( and ), of which your pattern has none) plus 1 for the whole match. So after a successful search your code always reports a size of 1.

Solution

You need to call std::regex_search repeatedly, or use std::regex_iterator.

In addition, your regex only matches a single character. Add a + so that each match covers the longest run of allowed characters: [^(:/,.)]+.

Here is your code, incorporating the example from cppreference.com:

#include <iostream>
#include <iterator>
#include <regex>
#include <string>

static const std::string filenames {R"(:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/0_301_0.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02011.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02012.svg,:/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02110000.svg)"};
static const std::regex filename_extractor("[^(:/,.)]+");

int main() {
    auto files_begin = std::sregex_iterator(filenames.begin(), filenames.end(), filename_extractor);

    for (auto i = files_begin; i != std::sregex_iterator(); ++i) {
        std::string filename = i->str(); 
        std::cout << filename << '\n';
    }   

    return 0;
}

However, this also returns the intermediate "directories". If you use the regex [^(:,)]+ instead, you get the result I expect you wanted:

/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/0_301_0.svg
/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02011.svg
/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02012.svg
/SymbolStandards/JMSymbology/MIL_STD_2525D_Symbols/02110000.svg

Your problem explained

std::regex_search searches only for the first occurrence of the regular expression, along with the matches for any sub-expressions within it.

For example, the expression ab([cd])([ef]) matches the string xxabcfxxabdef twice. The first match is the part abcf, with c being the match for the first sub-expression [cd] and f being the match for the second sub-expression [ef].

The second match is the part abde (not abdef!), with d matching the first sub-expression and e matching the second.

With std::regex_search, you search for the first match, and the matcher returns the complete first match together with the matches for the sub-expressions. To find further matches, you have to start the next search from the rest of the string (std::smatch::suffix()).

In addition, the regex [ef] matches only a single character, while [ef]+ matches the longest sequence of e and f characters. Thus, with the expression ab([cd])([ef]+), the second sub-expression in the second match against the target string above would match ef, and not just e.

Answer 2

I think std::regex_replace is what you need here:

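#include <iostream>
#include <iterator>  // std::back_inserter
#include <regex>
#include <string>

const std::string filenames {R"(:/MIL_STD/0_3.svg,:/SS/2525D/02011.svg)"};
// Matches any single character to remove. Inside [...] the parentheses are
// literal, so '(' and ')' are removed as well.
const std::regex filename_extractor("[(:/,.)]");

int main()
{
    std::string r;

    // Copy the input into r, replacing every match with the empty string.
    std::regex_replace(std::back_inserter(r),
        filenames.begin(), filenames.end(), filename_extractor, "");

    std::cout << "before: " << filenames << '\n';
    std::cout << " after: " << r << '\n';
}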

However, regex is probably overkill for removing characters; you can do this more efficiently with std::remove_copy_if:

#include <algorithm>
#include <iostream>
#include <iterator>  // std::back_inserter
#include <string>

const std::string filenames {R"(:/MIL_STD/0_3.svg,:/SS/2525D/02011.svg)"};
// The set of characters to remove.
const std::string filename_extractor("(:/,.)");

int main()
{
    std::string r;

    // Copy every character that is NOT in filename_extractor into r.
    std::remove_copy_if(filenames.begin(), filenames.end(),
        std::back_inserter(r), [](char c)
    {
        return filename_extractor.find(c) != std::string::npos;
    });

    std::cout << "before: " << filenames << '\n';
    std::cout << " after: " << r << '\n';
}

Question

2 answers

Answer 1
Score 3, accepted, 2016-08-12 17:49:25

Solution

Your problem explained

Answer 2
Score 2, 2016-08-12 17:48:49
