How to not capture whitespace after a newline with regex in C++
I'm trying to capture comments from C/C++/Java files, but I can't find a way to skip the whitespace that may follow a newline. My regex pattern is:
regex reg("(//.*|/\\*(.|\\n)*?\\*/)");
For example, in the code below (never mind the random code snippets, they could be anything...) I capture the comments correctly:
// my program in C++
#include <iostream>
/** playing around in
a new programming language **/
using namespace std;
The output is:
// my program in C++
/** playing around in
a new programming language **/
However, when the multiline comments in my code are indented, for example:
int main(){
    /* start always points to the first node of the linked list.
       temp is used to point to the last node of the linked list.*/
    node *start,*temp;
    start = (node *)malloc(sizeof(node));
    temp = start;
    temp -> next = NULL;
    temp -> prev = NULL;
    /* Here in this code, we take the first node as a dummy node.
       The first node does not contain data, but it used because to avoid handling special cases
       in insert and delete functions.
    */
    printf("1. Insert\n");
I capture:
/* start always points to the first node of the linked list.
       temp is used to point to the last node of the linked list.*/
/* Here in this code, we take the first node as a dummy node.
       The first node does not contain data, but it used because to avoid handling special cases
       in insert and delete functions.
    */
instead of:
/* start always points to the first node of the linked list.
temp is used to point to the last node of the linked list.*/
/* Here in this code, we take the first node as a dummy node.
The first node does not contain data, but it used because to avoid handling special cases
in insert and delete functions.
*/
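How can I change the regex pattern to avoid this?
Note: if possible, I'd like to avoid string manipulation and the like, and only modify the regex.
Turning my comment above into an answer.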
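It is not possible to match non-contiguous text: a regex match is always one contiguous substring, so a single match cannot include a comment while leaving out the indentation inside it. Instead, you can match the comment with one regex and then post-process the matched (or captured) value with another regex or with string operations.
Here is an example (not the best one, just to show the concept):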
#include <iostream>
#include <regex>
#include <string>

int main() {
    std::string data("int main(){// Singleline content\n /* start always points to the first node of the linked list.\n temp is used to point to the last node of the linked list.*/\n node *start,*temp;\n start = (node *)malloc(sizeof(node));\n temp = start;\n temp -> next = NULL;\n temp -> prev = NULL;\n /* Here in this code, we take the first node as a dummy node.\n The first node does not contain data, but it used because to avoid handling special cases\n in insert and delete functions.\n */\n printf(\"1. Insert\n\");");
    // Matches // comments and /* ... */ comments (unrolled-loop pattern).
    std::regex pattern(R"(//.*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)");
    std::smatch result;
    while (std::regex_search(data, result, pattern)) {
        // Remove leading horizontal whitespace from every line of the match.
        std::cout << std::regex_replace(result[0].str(), std::regex(R"((^|\n)[^\S\r\n]+)"), "$1") << std::endl;
        data = result.suffix().str();  // continue searching after this match
    }
}
Note: the raw string literal R"(...)" simplifies the regex definition, because backslashes no longer have to be doubled.
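R"(//.*|/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)"
matches either // followed by any 0+ characters other than line break characters (single-line comments), or
/\*[^*]*\*+(?:[^/*][^*]*\*+)*/
which matches /*, then 0+ characters other than *, then 1+ * characters, then 0+ sequences of a character other than / and *, 0+ characters other than * and 1+ * characters, and finally the closing / (multiline comments). This multiline comment pattern is much more efficient than yours because it is written according to the unroll-the-loop technique: runs of non-* characters are consumed in one step by [^*]*, whereas (.|\n)*? tries the closing \*/ and then a two-branch alternation at every single character.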
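I removed the leading horizontal whitespace on each line of the match with
regex_replace(result[0].str(), std::regex(R"((^|\n)[^\S\r\n]+)"), "$1")
where
(^|\n)[^\S\r\n]+
matches and captures either the start-of-string anchor or a newline (group 1), followed by 1+ whitespace characters other than CR and LF (that is, horizontal whitespace such as spaces and tabs). The "$1" replacement writes the captured newline back, so only the indentation is removed and the line breaks are kept.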