Regex C++: extract substring

Question

I would like to extract a substring that sits between two other strings.
For example: /home/toto/FILE_mysymbol_EVENT.DAT
or just FILE_othersymbol_EVENT.DAT
and I would like to get: mysymbol and othersymbol

I don't want to use Boost or other libraries, just standard C++. The one exception is CERN's ROOT library, which has TRegexp, but I don't know how to use it...

Answer 1

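Since last year's C++11 standard, C++ has regular expressions built into the standard library. This program shows how to use them to extract the string you are after: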

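#include <iostream>
#include <regex>
#include <string>

int main()
{
    const std::string s = "/home/toto/FILE_mysymbol_EVENT.DAT";
    // Capture group 1 holds the word characters between "FILE_" and "_EVENT.DAT".
    std::regex rgx(".*FILE_(\\w+)_EVENT\\.DAT.*");
    std::smatch match;

    // match[0] is the whole match; match[1] is the first capture group.
    if (std::regex_search(s.begin(), s.end(), match, rgx))
        std::cout << "match: " << match[1] << '\n';
}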

It will output:

match: mysymbol

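It should be noted, though, that at the time of writing (2012) this does not work with GCC, because its standard library's support for regular expressions is not very good. It works well in VS2010 (and probably VS2012), and should work in Clang.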

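By now (late 2016), all modern C++ compilers and their standard libraries are fully compliant with the C++11 standard, and with most if not all of C++14. GCC 6 and the upcoming Clang 4 also support most of the upcoming C++17 standard.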
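
As a hedged sketch, the same pattern can be wrapped in a small helper so that both inputs from the question, with and without a leading path, are handled along with the no-match case. The name extract_symbol and the choice to return an empty string on failure are assumptions made for illustration; they are not part of the answer above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text between "FILE_" and "_EVENT.DAT",
// or an empty string when the input does not contain that pattern.
std::string extract_symbol(const std::string& path)
{
    // regex_search finds the pattern anywhere in the input, so the
    // leading and trailing ".*" of the original pattern are not needed.
    static const std::regex rgx("FILE_(\\w+)_EVENT\\.DAT");
    std::smatch match;

    if (std::regex_search(path, match, rgx))
        return match[1].str();
    return std::string();
}
```

With this sketch, extract_symbol("FILE_othersymbol_EVENT.DAT") yields othersymbol, and an input without the FILE_..._EVENT.DAT pattern yields an empty string.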

Answer 2

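TRegexp supports only a very limited subset of regular expressions compared to other regex flavors. This makes it somewhat awkward to construct a single regex that suits your needs.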

One possible solution:

[^_]*_([^_]*)_

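will match the string up to the first underscore, then capture all characters up to the next underscore. The relevant part of the match is then found in group number 1.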

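But in your case, why use a regex at all? Just find the first and second occurrence of your delimiter _ in the string and extract the characters between those positions.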
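
A minimal sketch of that idea with std::string::find, assuming the first underscore in the input is the one after FILE (that is, the directory part contains no underscores). The function name between_underscores and the empty-string result on failure are hypothetical choices for illustration.

```cpp
#include <string>

// Hypothetical helper: returns the characters between the first and the
// second '_' in `name`, or an empty string if there are fewer than two.
std::string between_underscores(const std::string& name)
{
    const std::string::size_type first = name.find('_');
    if (first == std::string::npos)
        return std::string();

    const std::string::size_type second = name.find('_', first + 1);
    if (second == std::string::npos)
        return std::string();

    return name.substr(first + 1, second - first - 1);
}
```

Note that a path such as /home/to_to/FILE_x_EVENT.DAT would give to/FILE here, so the directory part would have to be stripped first if it can contain underscores.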

Answer 3

If you want to use regular expressions, I'd really recommend C++11's regexes or, if your compiler doesn't support them yet, Boost. I consider Boost to be almost part of standard C++.

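For this particular problem, however, you don't really need any form of regular expression. Something like this sketch should work just fine, after you add all the appropriate error checks (beg != npos, end != npos, etc.), test the code, and remove my typos: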

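#include <string>

std::string between(std::string const &in,
                    std::string const &before, std::string const &after) {
  // Position just past the first occurrence of `before`.
  std::string::size_type beg = in.find(before);
  beg += before.size();
  // First occurrence of `after` at or beyond that position.
  std::string::size_type end = in.find(after, beg);
  return in.substr(beg, end-beg);
}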

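Obviously, you could change std::string to a template parameter, and it should work just fine with std::wstring or with more seldom used instantiations of std::basic_string as well.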
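
One way the templated variant might look, as a sketch that also includes the npos checks mentioned above. Returning an empty string when a delimiter is missing is an assumption made here for illustration; a real caller may prefer a different error convention.

```cpp
#include <string>

// Sketch of `between` generalized over the character type.
// Returns an empty string when either delimiter is not found.
template <typename CharT, typename Traits, typename Alloc>
std::basic_string<CharT, Traits, Alloc>
between(std::basic_string<CharT, Traits, Alloc> const &in,
        std::basic_string<CharT, Traits, Alloc> const &before,
        std::basic_string<CharT, Traits, Alloc> const &after)
{
  typedef std::basic_string<CharT, Traits, Alloc> string_type;

  typename string_type::size_type beg = in.find(before);
  if (beg == string_type::npos)
    return string_type();
  beg += before.size();

  typename string_type::size_type end = in.find(after, beg);
  if (end == string_type::npos)
    return string_type();

  return in.substr(beg, end - beg);
}
```

Because the parameters are deduced from std::basic_string, the arguments have to be actual string objects, for example between(std::wstring(L"FILE_x_EVENT.DAT"), std::wstring(L"FILE_"), std::wstring(L"_EVENT.DAT")); plain string literals will not deduce.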

Answer 4

I would study corner cases before trusting it.

But this is a good candidate:

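// Requires <iostream>, <regex> and <string>.
std::string text = "/home/toto/FILE_mysymbol_EVENT.DAT";
// Group 3 is the text between "FILE_" and "_EVENT.DAT"; the dot is escaped to match literally.
std::regex reg("(.*)(FILE_)(.*)(_EVENT\\.DAT)(.*)");
std::cout << std::regex_replace(text, reg, "$3") << '\n';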
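
One corner case worth checking: when the pattern does not match at all, std::regex_replace returns the input unchanged, so the caller cannot tell a failed extraction from a successful one. The sketch below uses std::regex_match instead to make that case explicit. The function name extract_with_check and the boolean return convention are assumptions chosen for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: writes the text between "FILE_" and "_EVENT.DAT"
// into `out` and returns true, or returns false if `text` does not match.
bool extract_with_check(const std::string& text, std::string& out)
{
    static const std::regex reg(".*FILE_(.*)_EVENT\\.DAT.*");
    std::smatch match;

    // regex_match requires the whole input to match the pattern.
    if (!std::regex_match(text, match, reg))
        return false;

    out = match[1].str();
    return true;
}
```

On failure, out is left untouched, so the caller should test the return value before using it.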

Answer 5

The answers by Some programmer dude, Tim Pietzcker, and Christopher Creutzig are cool and correct, but they seemed to me not very obvious for beginners.

The following function is an attempt to provide a supporting illustration for the answers by Some programmer dude and Tim Pietzcker:

void ExtractSubString(const std::string& start_string
    , const std::string& string_regex_extract_substring_template)
{
    std::regex regex_extract_substring_template(
        string_regex_extract_substring_template);

    std::smatch match;

    std::cout << std::endl;

    std::cout << "A substring extract template: " << std::endl;
    std::cout << std::quoted(string_regex_extract_substring_template) 
        << std::endl;

    std::cout << std::endl;

    std::cout << "Start string: " << std::endl;
    std::cout << std::quoted(start_string) << std::endl;

    std::cout << std::endl;

    if (std::regex_search(start_string.begin(), start_string.end()
       , match, regex_extract_substring_template))
    {
        std::cout << "match0: " << match[0] << std::endl;
        std::cout << "match1: " << match[1] << std::endl;
        std::cout << "match2: " << match[2] << std::endl;
    }

    std::cout << std::endl;
}

The following overloaded function is an attempt to help illustrate Christopher Creutzig's answer:

void ExtractSubString(const std::string& start_string
    , const std::string& before_substring, const std::string& after_substring)
{
    std::cout << std::endl;

    std::cout << "A before substring: " << std::endl;
    std::cout << std::quoted(before_substring) << std::endl;

    std::cout << std::endl;

    std::cout << "An after substring: " << std::endl;
    std::cout << std::quoted(after_substring) << std::endl;

    std::cout << std::endl;

    std::cout << "Start string: " << std::endl;
    std::cout << std::quoted(start_string) << std::endl;

    std::cout << std::endl;

    size_t before_substring_begin 
        = start_string.find(before_substring);
    size_t extract_substring_begin 
        = before_substring_begin + before_substring.size();
    size_t extract_substring_end 
        = start_string.find(after_substring, extract_substring_begin);

    std::cout << "Extract substring: " << std::endl;
    std::cout
    << start_string.substr(extract_substring_begin
       , extract_substring_end - extract_substring_begin)
    << std::endl;

    std::cout << std::endl;
}

Here is the main function that runs the overloaded functions:

#include <iomanip>
#include <iostream>
#include <regex>
#include <string>

int main()
{
    const std::string start_string 
        = "/home/toto/FILE_mysymbol_EVENT.DAT";

    const std::string string_regex_extract_substring_template(
        ".*FILE_(\\w+)_EVENT\\.DAT.*");
    const std::string string_regex_extract_substring_template2(
        "[^_]*_([^_]*)_");

    ExtractSubString(start_string, string_regex_extract_substring_template);

    ExtractSubString(start_string, string_regex_extract_substring_template2);

    const std::string before_substring = "/home/toto/FILE_";
    const std::string after_substring = "_EVENT.DAT";

    ExtractSubString(start_string, before_substring, after_substring);
}

Here is the result of executing the main function:

A substring extract template: 
".*FILE_(\\w+)_EVENT\\.DAT.*"

Start string: 
"/home/toto/FILE_mysymbol_EVENT.DAT"

match0: /home/toto/FILE_mysymbol_EVENT.DAT
match1: mysymbol
match2: 


A substring extract template: 
"[^_]*_([^_]*)_"

Start string: 
"/home/toto/FILE_mysymbol_EVENT.DAT"

match0: /home/toto/FILE_mysymbol_
match1: mysymbol
match2: 


A before substring: 
"/home/toto/FILE_"

An after substring: 
"_EVENT.DAT"

Start string: 
"/home/toto/FILE_mysymbol_EVENT.DAT"

Extract substring: 
mysymbol

Answer 1
55 (accepted) 2012-07-24 11:21:13

Answer 2
4 2012-07-24 09:04:02

Answer 3
4 2012-07-24 09:25:56

Answer 4
0 2020-02-14 21:15:54

Answer 5
0 2022-07-03 17:57:45