
Regex match word in a text but not in quotes or comments

I am building an extension for VS Code and using the formatter API to uppercase all keywords.

Suppose I have the following code in the editor.

TYPE MyStruct : STRUCT
        this.var1 : POINTER TO INT; (* Указатель 1 *)
        var2 : POINTER TO INT; (* this is Указатель 2 *)
        sStr: STRING(200) := "This 
            Test this line";    
        sStr: STRING(200) := "Test this line";    
        sStr: STRING(200) := 'Test this line';    
    END_STRUCT
END_TYPE

THIS.MyStruct := 100;

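I want to find every occurrence of the word "this" that is not inside a comment (* ... *) or a string (in single or double quotes).

My attempt, with the ig flags, was: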

(?<=^([^"'])*)\bthis\b

But it still matches inside comments when there is a line break.
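To see why the attempt misbehaves, it may help to run it on a single line. The lookbehind only checks that no quote character occurs between the start of the input and the match; it never looks at (* ... *) at all, and it rejects every match after the first quote, even in plain code. The sample string below is made up for illustration.

```javascript
// The attempted pattern: `this` preceded, from the start of the input,
// by characters that are not quotes.
const attempt = /(?<=^([^"'])*)\bthis\b/ig;

// Hypothetical sample: code, a comment, a string, then code again.
const sample = 'a this (* this *) "this" this';

// Collect the start index of every match.
const attemptHits = [...sample.matchAll(attempt)].map((m) => m.index);
console.log(attemptHits);
// Indexes 2 (code) and 10 (inside the comment) are reported;
// the last `this` (plain code after the string) is missed.
```

So the pattern both over-matches (comments) and under-matches (anything after a quote), which is why a different approach is needed rather than a tweak to the lookbehind.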

Here is my real code example:

let keywords = [
    'true', 'false', 'exit', 'continue', 'return', 'constant', 'retain',
    'public', 'private', 'protected', 'abstract','persistent','internal',
    'final','of','else','elsif','then','__try','__catch','__finally',
    '__endtry','do','to','by','task','with','using','uses','from',
    'until','or','or_else','and','and_then','not','xor','nor','ge',
    'le','eq','ne','gt','lt','__new','__delete', 'extends','implements',
    'this','super'
];
let regEx = new RegExp(`\\b(?:${keywords.join('|')}|AT|BOOL|BYTE|(?:D|L)?WORD|U?(?:S|D|L)?INT|L?REAL|TIME(?:_OF_DAY)?|TOD|DT|DATE(?:_AND_TIME)?|STRING|ARRAY|ANY)\\b`, "ig");
text = text.replace(regEx, (match) => {
    return match.toUpperCase();
});

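You need to match the contexts you want to discard (strings and comments), and then match and capture the occurrences of the pattern you actually want to modify: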

/(?<!\\(?:\\{2})*)"[^"\\]*(?:\\[\s\S][^\\"]*)*"|\(\*[\s\S]*?\*\)|\b(true|false|exit|continue|return|constant|retain|public|private|protected|abstract|persistent|internal|final|of|else|elsif|then|__try|__catch|__finally|__endtry|do|to|by|task|with|using|uses|from|until|or|or_else|and|and_then|not|xor|nor|ge|le|eq|ne|gt|lt|__new|__delete|extends|implements|this|super|AT|BOOL|BYTE|(?:D|L)?WORD|U?(?:S|D|L)?INT|L?REAL|TIME(?:_OF_DAY)?|TOD|DT|DATE(?:_AND_TIME)?|STRING|ARRAY|ANY)\b/gi

See the regex demo.

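I changed the first (?: in your pattern to ( so that your expected matches are captured into Group 1, and added (?<!\\(?:\\{2})*)"[^"\\]*(?:\\[\s\S][^\\"]*)*"|\(\*[\s\S]*?\*\)| at the start of the pattern:

  • (?<!\\(?:\\{2})*)"[^"\\]*(?:\\[\s\S][^\\"]*)*" - a location that is not immediately preceded by a backslash followed by any number of backslash pairs, and then a double-quoted string literal that supports escape sequences
  • | - or
  • \(\*[\s\S]*?\*\) - (*, then any zero or more characters, as few as possible, then *)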
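The idea behind the alternation can be sketched with a reduced pattern that knows only one keyword. This is a simplified illustration, not the full pattern: the leading lookbehind is left out and the sample line is made up. Strings and comments are consumed by the first two alternatives without setting Group 1, so a match is a keyword exactly when Group 1 is defined.

```javascript
// Reduced pattern: skip double-quoted strings and (* ... *) comments,
// capture only the keyword `this` into Group 1.
const skipOrCapture = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"|\(\*[\s\S]*?\*\)|\b(this)\b/gi;

// Hypothetical sample line.
const line = 'this.x := 1; (* this is a comment *) s := "this string"; THIS.y := 2;';

// Label every match by whether Group 1 participated.
const kinds = [...line.matchAll(skipOrCapture)].map((m) =>
    m[1] !== undefined ? `keyword:${m[1]}` : `skipped:${m[0]}`
);
console.log(kinds);
// The comment and the string each come back as one "skipped" match,
// so the `this` inside them is never offered to Group 1.
```

Because the regex engine consumes a whole string or comment in one match, the keyword alternative never gets a chance to start inside it.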

See the JavaScript demo:

const keywords = [
    'true', 'false', 'exit', 'continue', 'return', 'constant', 'retain',
    'public', 'private', 'protected', 'abstract','persistent','internal',
    'final','of','else','elsif','then','__try','__catch','__finally',
    '__endtry','do','to','by','task','with','using','uses','from',
    'until','or','or_else','and','and_then','not','xor','nor','ge',
    'le','eq','ne','gt','lt','__new','__delete', 'extends','implements',
    'this','super'
];
// String.raw keeps the backslashes as written, so the regex escapes need no doubling.
const regEx = new RegExp(String.raw`(?<!\\(?:\\{2})*)"[^"\\]*(?:\\.[^\\"]*)*"|\(\*.*?\*\)|\b(${keywords.join('|')}|AT|BOOL|BYTE|(?:D|L)?WORD|U?(?:S|D|L)?INT|L?REAL|TIME(?:_OF_DAY)?|TOD|DT|DATE(?:_AND_TIME)?|STRING|ARRAY|ANY)\b`, "igs");
let text = "TYPE MyStruct : STRUCT\n" +
    "        this.var1 : POINTER TO INT; (* Указатель 1 *)\n" +
    "        var2 : POINTER TO INT; (* this is Указатель 2 *)\n" +
    "        sStr: STRING(200) := \"This \n" +
    "            Test this line\";    \n" +
    "        sStr: STRING(200) := \"Test this line\";    \n" +
    "        sStr: STRING(200) := 'Test this line';    \n" +
    "    END_STRUCT\n" +
    "END_TYPE\n" +
    "\n" +
    "THIS.MyStruct := 100;";
// Uppercase only when Group 1 matched; strings and comments are returned unchanged.
text = text.replace(regEx, (match, group) => {
    return group != undefined ? match.toUpperCase() : match;
});
console.log(text);
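Note that the pattern above skips only double-quoted strings, while the question also mentions single quotes. One possible extension is sketched below. The quoted helper, the shortened keyword list and the sample line are hypothetical and not part of the answer, and it assumes single-quoted strings use the same backslash escapes as the double-quoted pattern, which may not hold for the target language.

```javascript
// Hypothetical helper: builds a "skip this string" pattern for the quote character q,
// assuming backslash escapes inside the string.
const quoted = (q) => String.raw`${q}[^${q}\\]*(?:\\[\s\S][^${q}\\]*)*${q}`;

// Shortened keyword list, for illustration only.
const words = ['this', 'to', 'int', 'string'];

const upperRe = new RegExp(
    [
        quoted('"'),                               // skip "..." strings
        quoted("'"),                               // skip '...' strings
        String.raw`\(\*[\s\S]*?\*\)`,              // skip (* ... *) comments
        String.raw`\b(${words.join('|')})\b`       // capture keywords into Group 1
    ].join('|'),
    'gi'
);

const src = "sStr: string(200) := 'Test this line'; (* this *) this.p : pointer to int;";

// Uppercase only the matches where Group 1 participated.
const upper = src.replace(upperRe, (match, group) =>
    group !== undefined ? match.toUpperCase() : match
);
console.log(upper);
```

Listing the skipped contexts as separate array entries keeps each alternative readable, and adding another context later (for example a line comment) would only mean one more entry.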

