How to avoid capturing whitespace after a newline with a regex in C++

Question

I'm trying to capture comments from C/C++/Java files, but I can't find a way to skip the whitespace that may follow a newline. My regex pattern is:

regex reg("(//.*|/\\*(.|\\n)*?\\*/)");

For example, in the code below (ignore the random snippets; they could be anything) the comments are captured correctly:

// my  program in C++
#include <iostream>
/** playing around in
a new programming language **/
using namespace std;

The output is:

// my  program in C++
/** playing around in
a new programming language **/

However, when a multi-line comment sits inside indented code, for example:

int main(){
        /* start always points to the first node of the linked list.
           temp is used to point to the last node of the linked list.*/
        node *start,*temp;
        start = (node *)malloc(sizeof(node));
        temp = start;
        temp -> next = NULL;
        temp -> prev = NULL;
        /* Here in this code, we take the first node as a dummy node.
           The first node does not contain data, but it used because to avoid handling special cases
           in insert and delete functions.
         */
        printf("1. Insert\n");

I capture:

/* start always points to the first node of the linked list.
           temp is used to point to the last node of the linked list.*/
/* Here in this code, we take the first node as a dummy node.
           The first node does not contain data, but it used because to avoid handling special cases
           in insert and delete functions.
         */

instead of:

/* start always points to the first node of the linked list.
temp is used to point to the last node of the linked list.*/
/* Here in this code, we take the first node as a dummy node.
The first node does not contain data, but it used because to avoid handling special cases
in insert and delete functions.
*/

How can I change the regex pattern to avoid this?

Note: if possible, I'd like to avoid string manipulation and solve this by modifying the regex alone.

Answer 1

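Converting my comment above into an answer.

A regex cannot match discontinuous text. Instead, you can match a chunk of text with one regex and then post-process the matched (or captured) value with another regex or with string operations.

Here is an example (not the best one, just to show the concept):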

#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string data("int main(){// Singleline content\n        /* start always points to the first node of the linked list.\n           temp is used to point to the last node of the linked list.*/\n        node *start,*temp;\n        start = (node *)malloc(sizeof(node));\n        temp = start;\n        temp -> next = NULL;\n        temp -> prev = NULL;\n        /* Here in this code, we take the first node as a dummy node.\n           The first node does not contain data, but it used because to avoid handling special cases\n           in insert and delete functions.\n         */\n        printf(\"1. Insert\\n\");");
    // Matches a single-line comment or a multi-line comment.
    std::regex pattern(R"(//.*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)");
    std::smatch result;

    while (std::regex_search(data, result, pattern)) {
        // Print the match with the indentation after each newline removed.
        std::cout << std::regex_replace(result[0].str(), std::regex(R"((^|\n)[^\S\r\n]+)"), "$1") << std::endl;
        // Continue searching in the text that follows the current match.
        data = result.suffix().str();
    }
}

See the IDEONE demo.

Note: raw string literals simplify the regex definition.
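To illustrate that note, here is a minimal sketch showing the comment pattern written once as an ordinary string literal and once as a raw string literal. The two function names are hypothetical and exist only for this comparison.

```cpp
#include <string>

// The same regex text spelled two ways (function names are illustrative only).

// Ordinary literal: every regex backslash must be escaped for the compiler.
std::string escaped_pattern() {
    return "//.*|/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/";
}

// Raw literal: the text between R"( and )" is taken verbatim.
std::string raw_pattern() {
    return R"(//.*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)";
}
```

Both functions return the same string, so either form can be passed to std::regex; the raw form is simply easier to read and to compare with the pattern as discussed in prose.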

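The pattern R"(//.*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)" has two alternatives. //.* matches // followed by any 0+ characters other than a newline (single-line comments). /\*[^*]*\*+(?:[^/*][^*]*\*+)*/ matches /* followed by 0+ non-* characters and 1+ *s, then 0+ sequences of a character other than / and *, followed by 0+ non-* characters and 1+ *s, and finally / (multi-line comments). This multi-line comment pattern is much more efficient than yours because it is written according to the unroll-the-loop technique.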
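As a rough sketch of the pattern in use, the helper below collects every comment it matches with std::sregex_iterator instead of repeatedly reassigning the input string. The name extract_comments is an assumption made for this illustration, and the sketch does not try to recognize comment markers that appear inside string literals.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every comment matched by the unrolled pattern.
// extract_comments is a hypothetical helper name, not part of any library.
std::vector<std::string> extract_comments(const std::string& source) {
    static const std::regex comment(R"(//.*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)");
    std::vector<std::string> comments;
    // A default-constructed sregex_iterator acts as the end-of-matches marker.
    for (std::sregex_iterator it(source.begin(), source.end(), comment), end;
         it != end; ++it) {
        comments.push_back(it->str());
    }
    return comments;
}
```

Applied to the first sample from the question, this should return the // line and the whole /** ... **/ block as two separate entries, with the newline inside the block preserved.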

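I removed the leading horizontal whitespace on each line with regex_replace(result[0].str(), std::regex(R"((^|\n)[^\S\r\n]+)"), "$1"): (^|\n)[^\S\r\n]+ matches and captures either the start-of-string anchor or a newline, followed by 1+ characters other than non-whitespace, CR and LF (that is, horizontal whitespace). The "$1" replacement puts the captured newline back.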
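The post-processing step can also be isolated in a small function, which makes it easier to try on its own. This is only a sketch: strip_line_indent is a hypothetical name, and the regex object is built once rather than on every call.

```cpp
#include <regex>
#include <string>

// Remove horizontal whitespace at the start of the text and after each newline.
// strip_line_indent is a hypothetical helper name used only for this sketch.
std::string strip_line_indent(const std::string& text) {
    // [^\S\r\n] reads "not non-whitespace, not CR, not LF": horizontal whitespace.
    static const std::regex indent(R"((^|\n)[^\S\r\n]+)");
    // "$1" re-inserts what group 1 captured: nothing at the start, or the newline.
    return std::regex_replace(text, indent, "$1");
}
```

For instance, passing a two-line comment whose second line is indented should give back the same comment with that indentation removed and the line break kept.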

Answer 1: accepted, 2016-05-03 19:23:14
