Extracting cited BibTeX keys from a .tex file with regex in Python

Question

I am trying to extract the cited BibTeX keys from a LaTeX document using regex in Python.

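I want to exclude a citation if it is commented out (preceded by %), but still include it if it is preceded by an escaped percent sign (\%).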

Here is what I have come up with so far:

\\(?:no|)cite\w*\{(.*?)\}

An example to try it on:

blablabla
Author et. al \cite{author92} bla bla. % should match
\citep{author93} % should match
\nocite{author94} % should match
100\%\nocite{author95} % should match
100\% \nocite{author95} % should match
%\nocite{author96} % should not match
\cite{author97, author98, author99} % should match
\nocite{*} % should not match
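To see why the first attempt is not enough, here is a quick check of the pattern above against the sample using Python's standard re module (the variable names are only illustrative). It also picks up the commented-out author96 and the * from \nocite{*}:

```python
import re

# The question's sample text.
latex = r"""
blablabla
Author et. al \cite{author92} bla bla. % should match
\citep{author93} % should match
\nocite{author94} % should match
100\%\nocite{author95} % should match
100\% \nocite{author95} % should match
%\nocite{author96} % should not match
\cite{author97, author98, author99} % should match
\nocite{*} % should not match
"""

# The question's pattern: it neither skips % comments nor rejects \nocite{*}.
naive = re.compile(r"\\(?:no|)cite\w*\{(.*?)\}")

found = naive.findall(latex)
print(found)
# ['author92', 'author93', 'author94', 'author95', 'author95', 'author96',
#  'author97, author98, author99', '*']
```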

Regex101 test: https://regex101.com/r/ZaI8kG/2/

I'd appreciate any help.

Answer 1

Use the newer regex module (pip install regex) with the following expression:

(?<!\\)%.+(*SKIP)(*FAIL)|\\(?:no)?citep?\{(?P<author>(?!\*)[^{}]+)\}

See the demo on regex101.com.

In more detail:

(?<!\\)%.+(*SKIP)(*FAIL)    # % (not preceded by \)
                            # and the whole line shall fail
|                           # or
\\(?:no)?citep?             # \nocite, \cite or \citep
\{                          # { literally
(?P<author>(?!\*)[^{}]+)    # must not start with a star
\}                          # } literally
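A minimal sketch of how the verbose form might be used from Python, assuming the third-party regex package is installed (it supports the (*SKIP)(*FAIL) verbs; the standard re module does not). The variable names are illustrative:

```python
# Requires the third-party package: pip install regex
import regex

latex = r"""
blablabla
Author et. al \cite{author92} bla bla. % should match
\citep{author93} % should match
\nocite{author94} % should match
100\%\nocite{author95} % should match
100\% \nocite{author95} % should match
%\nocite{author96} % should not match
\cite{author97, author98, author99} % should match
\nocite{*} % should not match
"""

pattern = regex.compile(r"""
    (?<!\\)%.+(*SKIP)(*FAIL)   # unescaped %: skip the rest of the line
    |                          # or
    \\(?:no)?citep?            # \nocite, \cite or \citep
    \{                         # { literally
    (?P<author>(?!\*)[^{}]+)   # key list; must not start with a star
    \}                         # } literally
    """, regex.VERBOSE)

# Only the citation alternative can produce a match, so "author" is always set.
keys = [m.group("author") for m in pattern.finditer(latex)]
print(keys)
```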

If you cannot install another library, you need to change the expression to

(?<!\\)%.+|(\\(?:no)?citep?\{((?!\*)[^{}]+)\})

and check programmatically whether the second capture group is set (i.e. not empty).
In Python, the latter could look like this:

import re

latex = r"""
blablabla
Author et. al \cite{author92} bla bla. % should match
\citep{author93} % should match
\nocite{author94} % should match
100\%\nocite{author95} % should match
100\% \nocite{author95} % should match
%\nocite{author96} % should not match
\cite{author97, author98, author99} % should match
\nocite{*} % should not match
"""

# Alternative 1 consumes an unescaped % and the rest of its line (group 2 stays unset);
# alternative 2 captures the keys of \cite, \citep or \nocite in group 2.
rx = re.compile(r'''(?<!\\)%.+|(\\(?:no)?citep?\{((?!\*)[^{}]+)\})''')

# Keep only the matches where the citation alternative fired.
authors = [m.group(2) for m in rx.finditer(latex) if m.group(2)]
print(authors)

which yields

 ['author92', 'author93', 'author94', 'author95', 'author95', 'author97, author98, author99']
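The last match still holds three keys in one string. If individual keys are wanted, a small post-processing step can split each match on commas. Dropping duplicates (author95 appears twice) is an extra assumption here, not something the question asks for, and split_keys is a hypothetical helper name:

```python
def split_keys(groups):
    """Split comma-separated citation arguments into single keys.

    Duplicates are dropped while keeping the order in which keys first
    appear (an assumption; return a plain list instead to keep repeats).
    """
    seen = {}
    for group in groups:
        for key in group.split(","):
            key = key.strip()
            if key:
                seen.setdefault(key, None)
    return list(seen)


# Output of the re-based fallback, copied from above.
authors = ['author92', 'author93', 'author94', 'author95', 'author95',
           'author97, author98, author99']
keys = split_keys(authors)
print(keys)
# ['author92', 'author93', 'author94', 'author95', 'author97', 'author98', 'author99']
```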

Answer 2

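I'm not following the logic of the last case; it seems to me that * might not be wanted inside the {}, in which case you might want to design an expression similar to: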

^(?!(%\\(?:no)?cite\w*\{([^}]*?)\}))[^*\n]*$

Not sure, though.
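This expression matches whole lines rather than keys, and ^ and $ only anchor at each line when the multiline flag is set (assumed here, since the answer does not say). A quick check of what it keeps from the question's sample, using Python's re module with illustrative variable names:

```python
import re

latex = r"""
blablabla
Author et. al \cite{author92} bla bla. % should match
\citep{author93} % should match
\nocite{author94} % should match
100\%\nocite{author95} % should match
100\% \nocite{author95} % should match
%\nocite{author96} % should not match
\cite{author97, author98, author99} % should match
\nocite{*} % should not match
"""

# Answer 2's pattern, unchanged; re.MULTILINE makes ^ and $ work per line.
line_rx = re.compile(r"^(?!(%\\(?:no)?cite\w*\{([^}]*?)\}))[^*\n]*$", re.MULTILINE)

# The pattern also matches empty lines, so skip empty matches.
kept = [m.group(0) for m in line_rx.finditer(latex) if m.group(0)]
for line in kept:
    print(line)
```

The kept lines still contain the citation commands (and the plain "blablabla" line), so a second pattern would be needed to pull the keys out of them.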

Answer 1: 3 votes, accepted, 2019-07-16 20:21:03
Answer 2: 1 vote, 2019-07-16 20:23:33
