
Remove wikitext hyperlinks via regex

There are two kinds of wikitext hyperlinks:

[[stack]]
[[heap (memory region)|heap]]

I want to remove the hyperlinks but keep the following text:

stack
heap

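Currently, I run two passes, using two different regular expressions:

import java.util.regex.Pattern;

public class LinkRemover
{
    // Matches [[target|label]] and captures the label.
    private static final Pattern
    renamingLinks = Pattern.compile("\\[\\[[^\\]]+?\\|(.+?)\\]\\]");

    // Matches [[target]] and captures the target.
    private static final Pattern
    simpleLinks = Pattern.compile("\\[\\[(.+?)\\]\\]");

    public static String removeLinks(String input)
    {
        // Renaming links go first; otherwise simpleLinks would keep "target|label".
        String temp = renamingLinks.matcher(input).replaceAll("$1");
        return simpleLinks.matcher(temp).replaceAll("$1");
    }
}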

Is there a way to "fuse" the two regular expressions into one that produces the same result?

If you want to check the correctness of a proposed solution, here is a simple test class:

// JUnit 4 imports (assumed; the original does not name the JUnit version)
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class LinkRemoverTest
{
    @Test
    public void test()
    {
        String input = "A sheep's [[wool]] is the most widely used animal fiber, and is usually harvested by [[Sheep shearing|shearing]].";
        String expected = "A sheep's wool is the most widely used animal fiber, and is usually harvested by shearing.";
        String output = LinkRemover.removeLinks(input);
        assertEquals(expected, output);
    }
}

You can make the part up to the pipe optional:

\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]

To make sure the match always stays between the square brackets, use character classes.
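To illustrate why the character classes matter, here is a small sketch that compares the pattern above with a dot-based variant. The `DOT_BASED` pattern, the `strip` helper, and the class name `CharClassDemo` are hypothetical and not part of the original answer. With lazy dots, the optional prefix can run past the first closing brackets until it finds a pipe in a later link, so two links collapse into one match.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; all names here are hypothetical.
class CharClassDemo
{
    // Hypothetical variant that uses lazy dots instead of character classes.
    static final Pattern DOT_BASED =
        Pattern.compile("\\[\\[(?:.*?\\|)?(.+?)\\]\\]");

    // The pattern suggested above, built from character classes.
    static final Pattern CHAR_CLASS =
        Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]");

    // Replaces every match with the content of capture group 1.
    static String strip(Pattern pattern, String input)
    {
        return pattern.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args)
    {
        String input = "[[wool]] and [[Sheep shearing|shearing]]";
        System.out.println(strip(DOT_BASED, input));  // prints "shearing"
        System.out.println(strip(CHAR_CLASS, input)); // prints "wool and shearing"
    }
}
```

The dot-based variant swallows the first link and the text between the links, because `.*?` is free to cross `]]` on its way to the pipe. The character classes `[^\\]|]` and `[^\\]]` rule that out.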


Pattern details:

\\[\\[         # literal opening square brackets
(?:            # open a non-capturing group
    [^\\]|]*   # zero or more characters that are not a ] or a |
    \\|        # literal |
)?             # make the group optional
([^\\]]+)      # capture everything up to the closing square bracket
\\]\\]         # literal closing square brackets
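Putting this together, a single-pass version of the class from the question might look like the sketch below. The class name `SinglePassLinkRemover` and the field name `links` are assumptions, chosen to avoid clashing with the original `LinkRemover`. The pattern itself is the one given in this answer.

```java
import java.util.regex.Pattern;

// Minimal sketch: one pattern handles both [[target]] and [[target|label]].
class SinglePassLinkRemover
{
    private static final Pattern
    links = Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]]+)\\]\\]");

    public static String removeLinks(String input)
    {
        // Group 1 holds the label if a pipe is present, otherwise the target.
        return links.matcher(input).replaceAll("$1");
    }
}
```

To check it against the test class from the question, swap the call to `LinkRemover.removeLinks` for `SinglePassLinkRemover.removeLinks`.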
