Skipping over a regex pattern contained within the pattern I'm looking for

I'm parsing Pandoc-markdown files containing footnotes that begin with ^[ and end with ], some of which contain embedded [] pairs. For example:

...
to explain how the feature came to be as it is, so you can use generics more
effectively.^[Angelika Langer's [Java Generics FAQ](
www.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.html) as well as her other
writings (together with Klaus Kreft) were invaluable during the preparation of
this chapter.]
...

The simple approach (in Python):

re.compile(r"\^\[.+?\]", flags=re.DOTALL)

stops at the first ], so it doesn't capture the entire footnote. Is there a way to skip over the nested [] clauses?
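To make the failure concrete, here is a minimal sketch using only the standard-library re module. The sample string is a shortened, made-up footnote (not the text from the file above), chosen so the truncated match is easy to see:

```python
import re

# Hypothetical, shortened footnote with one nested [] pair.
sample = "text.^[See [the FAQ](faq.html) for details.] More text."

# The lazy quantifier stops at the first "]" it can reach,
# which is the one closing the nested link text.
naive = re.compile(r"\^\[.+?\]", flags=re.DOTALL)
match = naive.search(sample)

print(match.group(0))  # → ^[See [the FAQ]
```

The match ends after "FAQ]" because a lazy .+? has no notion of bracket depth; it only looks for the nearest closing bracket.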

You can do this with the PyPI regex module by using a subroutine call; you just need to be careful about where you place the group boundaries:

import regex  # third-party module: pip install regex
text = r"""...
to explain how the feature came to be as it is, so you can use generics more
effectively.^[Angelika Langer's [Java Generics FAQ](
www.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.html) as well as her other
writings (together with Klaus Kreft) were invaluable during the preparation of
this chapter.]
..."""
print([m.group(1) for m in regex.finditer(r'\^(\[(?:[^][]++|(?1))*])', text)])

Output:

["[Angelika Langer's [Java Generics FAQ](\nwww.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.html) as well as her other\nwritings (together with Klaus Kreft) were invaluable during the preparation of\nthis chapter.]"]

Pattern details:

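  • \^ - a literal ^ character
  • (\[(?:[^][]++|(?1))*]) - Group 1:
    • \[ - a [ character
    • (?:[^][]++|(?1))* - zero or more occurrences of:
      • [^][]++ - one or more characters other than ] and [ (possessive, so the engine does not backtrack into the run)
      • | - or
      • (?1) - the Group 1 pattern, recursed as a subroutine
    • ] - a ] character.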
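If installing a third-party module is not an option, the same balanced-bracket idea can be approximated with a plain depth counter. The following is a sketch only, not part of the answer above: find_footnotes is a hypothetical helper name, and it assumes brackets inside footnotes are balanced and never escaped with a backslash.

```python
def find_footnotes(text):
    """Return each ^[...] footnote, including its outer brackets."""
    footnotes = []
    pos = 0
    while True:
        start = text.find("^[", pos)
        if start == -1:
            break
        depth = 0
        end = None
        # Scan from the opening "[" and track nesting depth.
        for i in range(start + 1, len(text)):
            ch = text[i]
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
                if depth == 0:
                    end = i
                    break
        if end is None:
            # No matching "]" found: skip this opener and keep scanning.
            pos = start + 2
            continue
        footnotes.append(text[start + 1:end + 1])
        pos = end + 1
    return footnotes


# Hypothetical sample with one nested and one flat footnote.
sample = "a.^[See [the FAQ](faq.html) for details.] b ^[plain] c"
print(find_footnotes(sample))
# → ['[See [the FAQ](faq.html) for details.]', '[plain]']
```

Like group 1 in the regex version, each result keeps its surrounding brackets. The scanner does not reuse the regex engine, so escaped brackets or brackets inside code spans would need extra handling.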
