

Regex: How to select everything but a specified regex pattern


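As you can see here: https://regex101.com/r/kFJFVi/2

The text pattern I want to ignore is <([^>]+?)([^>]*?)>(.*?)<\/\1>. I have tried a few strategies, but none has worked so far.

Based on other questions, I tried, for example: ^.*(<([^>]+?)([^>]*?)>(.*?)<\/\1>)?.*$ but this pattern selects all of the text and does not ignore the tags.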




     This is the second paragraph. It contains an ordered list:
     <ol>
         <li>Item 1</li>
         <li>Item 2</li>
         <li>Item 3</li>
     </ol>
     This is a text after the list in the second paragraph.
     This is another part of a paragraph
     <ol>
         <li>Item 1</li>
         <li>Item 2</li>
         <li>Item 3</li>
     </ol>
     This is a text after the other list in the second paragraph.
     This is a text after the list in the second paragraph.
     This is another part of a paragraph
     <ol>
         <li>Item 1</li>
         <li>Item 2</li>
         <li>Item 3</li>
     </ol>
     test to odfjdf iofsdfsoh



     This is a text after the list in the second paragraph.
            This is another part of a paragraph


    This is a text after the other list in the second paragraph.
    This is a text after the list in the second paragraph.
            This is another part of a paragraph


    test to odfjdf iofsdfsoh


     test to odfjdf iofsdfsoh

    Basically, all of the text that is not inside an HTML tag.

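    If RegExp is not an absolute requirement:

    Parsing XML/HTML with DOMParser is usually easier than doing it with RegExp. The code below creates a new document, removes the <ol> elements, and cleans up the result.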

     const p = new DOMParser();
     const doc = p.parseFromString(document.getElementById("content").innerHTML, "text/html");
     // Remove every <ol> element from the parsed copy.
     doc.querySelectorAll("body ol").forEach(n => n.remove());
     // Split the remaining text into lines, trim them, and drop the empty ones.
     let result = doc.querySelector("body").textContent.split("\n");
     result = result.map(str => str.trim()).filter(str => str !== "");
     console.log(result);

     <div id="content">
         This is the second paragraph. It contains an ordered list:
         <ol>
             <li>Item 1</li>
             <li>Item 2</li>
             <li>Item 3</li>
         </ol>
         This is a text after the list in the second paragraph.
         This is another part of a paragraph
         <ol>
             <li>Item 1</li>
             <li>Item 2</li>
             <li>Item 3</li>
         </ol>
         This is a text after the other list in the second paragraph.
         This is a text after the list in the second paragraph.
         This is another part of a paragraph
         <ol>
             <li>Item 1</li>
             <li>Item 2</li>
             <li>Item 3</li>
         </ol>
         test to odfjdf iofsdfsoh
     </div>

    Thanks to Jay, I found a way to reach a solution. Their JavaScript post led me to a way of doing an inverted regex search.

    My solution is in C#:

    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    var content = @"
    This is the second paragraph. It contains an ordered list:
            <ol>
                <li>Item 1</li>
                <li>Item 2</li>
                <li>Item 3</li>
            </ol>
            This is a text after the list in the second paragraph.
            This is another part of a paragraph
            <ol>
                <li>Item 1</li>
                <li>Item 2</li>
                <li>Item 3</li>
            </ol>
            This is a text after the other list in the second paragraph.
    This is a text after the list in the second paragraph.
            This is another part of a paragraph
            <ol>
                <li>Item 1</li>
                <li>Item 2</li>
                <li>Item 3</li>
            </ol>
    test to odfjdf iofsdfsoh";

    // First, I created a named regex group for the markup I want to ignore.
    // In .NET, unnamed groups are numbered before named ones, so \1 is the tag name.
    Regex textOutsideTag = new(@"(?<innerTags><([^>]+?)([^>]*?)>(.*?)<\/\1>)", RegexOptions.Singleline);

    // Using LINQ, I go through all matches, replace each one with the marker {break},
    // and then split on that marker to receive the remaining text as an array.
    var textGroups = textOutsideTag.Matches(content)
        .Aggregate(content, (text, m) => text.Replace(m.Groups["innerTags"].Value, "{break}"))
        .Split("{break}", StringSplitOptions.RemoveEmptyEntries);

    foreach (var texts in textGroups)
    {
        Console.WriteLine(texts.Trim());
    }

    // Output (leading whitespace omitted):
    // This is the second paragraph. It contains an ordered list:
    // This is a text after the list in the second paragraph.
    // This is another part of a paragraph
    // This is a text after the other list in the second paragraph.
    // This is a text after the list in the second paragraph.
    // This is another part of a paragraph
    // test to odfjdf iofsdfsoh
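
For readers working in JavaScript, the same replace-and-split idea might look roughly like the sketch below. The helper name textOutsideTags, the sample string, and the assumption that the marker {break} never occurs in the input are illustrative and not part of the answer above. The s flag plays the role of RegexOptions.Singleline.

```javascript
// Hypothetical helper: returns the pieces of text that lie outside matched tag pairs.
function textOutsideTags(html) {
  // Group 1 captures the tag name so the closing tag can be matched with \1.
  // The `s` flag lets `.` cross line breaks; `g` replaces every occurrence.
  const tagPattern = /<([^>]+?)([^>]*?)>(.*?)<\/\1>/gs;
  const marker = "{break}"; // assumption: this marker never appears in the input
  return html
    .replace(tagPattern, marker) // swap each tag pair for the marker
    .split(marker)               // cut the text at every marker
    .map((part) => part.trim())
    .filter((part) => part !== "");
}

const sample =
  "Intro text: <ol> <li>Item 1</li> <li>Item 2</li> </ol> " +
  "Text after the list. <ol> <li>Item 3</li> </ol> Closing text";

console.log(textOutsideTags(sample));
// [ 'Intro text:', 'Text after the list.', 'Closing text' ]
```

Replacing first and splitting on a plain string is deliberate: calling split with the regex directly would also insert the captured groups into the result array. Nested elements with the same tag name are not handled by this pattern.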

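    To create a regular expression that selects everything in a text except a specified pattern, you can use a negative lookahead assertion. A negative lookahead lets you specify a pattern that must not match, and the regular expression matches only where that pattern is not present at the current position.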

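    For example, to match all text that is not contained in the HTML tags specified in the question, you can try the following regular expression:

    (?!<([^>]+?)([^>]*?)>(.*?)<\/\1>).*

    This regular expression matches any character (.) zero or more times (*), provided the match does not begin at the HTML tag pattern given inside the negative lookahead ((?!...)). Note that the lookahead is checked only at the position where a match starts, so .* can still run across tags that appear later on the same line.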


    let input = "..."; // the input text
    let regex = /(?!<([^>]+?)([^>]*?)>(.*?)<\/\1>).*/g; // the regular expression
    let matches = input.match(regex); // get the matches
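
To see how the lookahead pattern behaves in practice, the sketch below runs it on a short, made-up input and then contrasts it with one possible alternative: matching the tags themselves and keeping the gaps between the matches. The input string and the helper gapsBetweenMatches are hypothetical names chosen for illustration only.

```javascript
const input = "before <b>bold</b> after";

// The pattern from the answer: the lookahead is tested once, where the match starts.
const lookahead = /(?!<([^>]+?)([^>]*?)>(.*?)<\/\1>).*/g;
const lookaheadMatches = input.match(lookahead);
console.log(lookaheadMatches[0]); // the whole line, tag included

// Alternative sketch: match the tag pairs and collect the text between them.
function gapsBetweenMatches(text, pattern) {
  const gaps = [];
  let cursor = 0;
  for (const m of text.matchAll(pattern)) {
    gaps.push(text.slice(cursor, m.index)); // text before this match
    cursor = m.index + m[0].length;         // continue after the match
  }
  gaps.push(text.slice(cursor)); // text after the last match
  return gaps.map((g) => g.trim()).filter((g) => g !== "");
}

// matchAll requires the `g` flag; `s` lets `.` cross line breaks.
const tagPattern = /<([^>]+?)([^>]*?)>(.*?)<\/\1>/gs;
console.log(gapsBetweenMatches(input, tagPattern)); // [ 'before', 'after' ]
```

Because the first match starts at a position that is not a tag, the lookahead succeeds there and .* consumes the rest of the line. Slicing between tag matches avoids that, at the cost of a small loop.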

