How to properly store regex matches in C++

Question

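I want to try out the C++11 regex library by parsing the input of UVA985. However, I don't understand how to store all the matches in a container so that I can iterate over them and work with them.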

#include <regex>
#include <string>
#include <iostream>
#include <vector>
#include <cstdio>

using namespace std;

vector<string> get_names(const string &sentence) {
    vector<string> vname;
    regex author_regex("(.+\\.\\,\\s)|(.+\\.:)", regex_constants::ECMAScript);
    smatch names; // This is always empty
    regex_match(sentence, names, author_regex); // Is this correct?
    for (auto name: names) {
        vname.push_back(name.str() + ".");
    }
    return vname;
}

int main(void) {
    const string papers[] = {
        "Smith, M.N., Martin, G., Erdos, P.: Newtonian forms of prime "
        "factor matrices",
        "Erdos, P., Reisig, W.: Stuttering in petri nets",
        "Smith, M.N., Chen, X.: First oder derivates in structured programming",
        "Jablonski, T., Hsueh, Z.: Selfstabilizing data structures" };
    vector<vector<string>> input_data;
    for (auto paper : papers) {
        input_data.push_back(get_names(paper));
    }

    int counter = 1;
    for (auto scenario : input_data) {
        cout << "Paper " << counter << ":\n";
        for (auto author: scenario) {
            cout << author << endl;
        }
        counter += 1;  // one increment per paper, not per author
    }
    return 0;
}

I tried changing the regex pattern to something simple, like ".", but the smatch container is always empty. What am I missing?

Answer 1

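Storing the matches in a container can be done in two ways: range construction, or default construction followed by insertion. The <regex> library provides std::sregex_token_iterator, which yields the substrings that match your pattern. We can use it to range-construct a std::vector<> and return it.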

std::vector<std::string> names(std::sregex_token_iterator(sentence.begin(), sentence.end(), author_regex),
                               std::sregex_token_iterator());
return names;

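Now your regex needs some work. Each author field in a citation is made up of a last name ("\\w+,") and initials for the first/middle names ("(\\w.)+"). We only want to do this until we reach the colon, so we can prefix the expression with "(?!:)". Strictly speaking, this lookahead only prevents a match from starting at a colon; the titles here are skipped because they contain no "Name, X." sequence. Just combining these three, we can now get all the author names from each citation. Unfortunately, every name except the first now has a leading space, which can be removed by ignoring any leading spaces ("[^ ]+"). Putting it all together, we get "(?!:)[^ ]+\\w+, (\\w.)+". Your get_names() now looks like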

#include <regex>
#include <string>
#include <vector>

std::vector<std::string> get_names(const std::string& sentence) {
   std::regex author_regex("(?!:)[^ ]+\\w+, (\\w.)+", std::regex_constants::ECMAScript);

   // Range-construct the vector from every match of author_regex in sentence
   std::vector<std::string> names(std::sregex_token_iterator(sentence.begin(), sentence.end(), author_regex),
                                  std::sregex_token_iterator());
   return names;
}

Back in main(), you can dump the names with std::copy(), either into a std::vector<> using std::back_inserter() or into a std::set<> using std::inserter().

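#include <algorithm>  // std::copy
#include <iterator>   // std::inserter
#include <set>
#include <string>

// get_names() from above must be declared before main()
int main() {
   const std::string citations[] = {"Smith, M.N., Martin, G., Erdos, P.: Newtonian forms of prime factor matrices",
                                    "Erdos, P., Reisig, W.: Stuttering in petri nets",
                                    "Smith, M.N., Chen, X.: First oder derivates in structured programming",
                                    "Jablonski, T., Hsueh, Z.: Selfstabilizing data structures"};
   std::set<std::string> all_authors;

   for (const auto& citation : citations) {
      auto citation_authors = get_names(citation);
      // std::set has no push_back(), so std::back_inserter() would not compile here;
      // std::inserter() calls insert() instead
      std::copy(citation_authors.begin(), citation_authors.end(), std::inserter(all_authors, all_authors.end()));
   }
}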

Answer 1 (score 3, accepted) 2014-07-24 06:44:09
