Multiline regex match retrieving line numbers and matches
I am trying to iterate over all the lines in a file to find every occurrence of a pattern.
An example input is:
new File()
new
File()
there is a new File()
new



File()
there is not a matching pattern here File() new
new File() test new File() occurs twice on this line
The example output would be:
new File() Found on line 1
new File() Found on lines 2 & 3
new File() Found on line 4
new File() Found on lines 5 & 9
new File() Found on line 11
new File() Found on line 11
6 occurrences of new File() pattern in test.txt (Filename)
The regex pattern looks like this:
pattern = r'new\s+File\s*\({1}\s*\){1}'
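Looking at the re documentation, I can see that match, findall and finditer all return matches at the beginning of the string, but I don't see how to use the search function, which looks for the regex at any position in the string, across more than one line (the fourth case in my requirements above).
It is simple enough to match a regex that occurs more than once on a single line.
Example input: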
line = "new File() new File()"
Code:
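import re

matches = []
while line:
    matchObj = re.search(r"new\s+File\s*\({1}\s*\){1}", line, re.MULTILINE | re.DOTALL)
    if not matchObj:
        break  # nothing left to match; without this the loop never ends
    # drop everything up to the end of this match and search the remainder
    line = line[matchObj.end():]
    matches.append(matchObj.group())
print(matches)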
This prints the following matches (no line numbers etc. yet):
['new File()', 'new File()']
Is there a way to do what I am looking for with Python's regex?
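On the point about match, findall and finditer: in the standard re module only re.match is anchored to the start of the string, while re.search, re.findall and re.finditer scan the whole string. Because \s also matches newline characters, the pattern can span lines without re.MULTILINE or re.DOTALL (those flags only change the behaviour of ^, $ and the dot). A minimal sketch, where the sample text is made up for illustration:

```python
import re

pattern = re.compile(r"new\s+File\s*\(\s*\)")
text = "there is a new\nFile() here"  # hypothetical sample, not from the question

# re.match only tries position 0, so it finds nothing in this text
print(pattern.match(text))

# re.search scans forward and returns the first match anywhere in the string
print(pattern.search(text).span())

# finditer yields every non-overlapping match, wherever it starts
print([m.span() for m in pattern.finditer(text)])
```

So the whole file can be read into one string and handed to finditer; what remains is turning the character offsets of each match into line numbers.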
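You can first find all the \n characters in the text together with their positions (character indices). Since each \n ...well... starts a new line, the index of each value in this list is the number of the line (counting from 0) that this \n terminates. Then search for all occurrences of the pattern and use the list to look up the lines on which each match starts and ends ...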
import re
import bisect
text = """new
File()
aa new File()
new
File()
there is a new File() and new
File() again
new



File()
there is not a matching pattern here File() new
new File() test new File() occurs twice on this line
"""
# character indices of all \n characters in text
nl = [m.start() for m in re.finditer("\n", text, re.MULTILINE|re.DOTALL)]
matches = list(re.finditer(r"(new\s+File\(\))", text, re.MULTILINE|re.DOTALL))
match_count = 0
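for m in matches:
    match_count += 1
    # 0-based numbers of the lines covered by the match, first to last character
    r = range(bisect.bisect(nl, m.start()-1), bisect.bisect(nl, m.end()-1)+1)
    # collapse whitespace inside the match; no flag is passed because the
    # fourth positional argument of re.sub is count, not flags
    print(re.sub(r"\s+", " ", m.group(1)), "found on line(s)", *r)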
print(f"{match_count} occurrences of new File() found in file....")
output:
new File() found on line(s) 0 1
new File() found on line(s) 2
new File() found on line(s) 3 4
new File() found on line(s) 5
new File() found on line(s) 5 6
new File() found on line(s) 7 8 9 10 11
new File() found on line(s) 13
new File() found on line(s) 13
8 occurrences of new File() found in file....
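The same idea can be packaged as a small function that reports 1-based line numbers, as the question asks for. This is only a sketch: find_line_spans is a hypothetical helper name, the sample string is the question's input written with explicit \n escapes, and it assumes the pattern never produces an empty match.

```python
import bisect
import re

def find_line_spans(pattern, text):
    """Return (first_line, last_line) for each match, counting lines from 1."""
    # offsets of every newline; entry i is the newline that ends line i + 1
    newlines = [i for i, ch in enumerate(text) if ch == "\n"]
    spans = []
    for m in re.finditer(pattern, text):
        # newlines strictly before an offset = 0-based index of its line
        first = bisect.bisect_left(newlines, m.start()) + 1
        last = bisect.bisect_left(newlines, m.end() - 1) + 1
        spans.append((first, last))
    return spans

sample = (
    "new File()\nnew\nFile()\nthere is a new File()\nnew\n\n\n\nFile()\n"
    "there is not a matching pattern here File() new\n"
    "new File() test new File() occurs twice on this line"
)
print(find_line_spans(r"new\s+File\s*\(\s*\)", sample))
```

Building the newline list once and using a binary search per match keeps the lookup cheap even when the file is large and contains many matches.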
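You can count the newlines before the match, then count the newlines inside the matched text, and combine the two into the list of line numbers. See the Python demo: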
import re
s='new File()\nnew\nFile()\nthere is a new File()\nnew\n \n \n \nFile()\nthere is not a matching pattern here File() new\nnew File() test new File() occurs twice on this line'
pattern = r'new\s+File\s*\(\s*\)'
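for m in re.finditer(pattern, s):
    # line on which the match starts: number of newlines before it, plus one
    linenums = [s[:m.start()].count('\n') + 1]
    # one more line for every newline inside the match itself
    for _ in range(m.group().count('\n')):
        linenums.append(linenums[-1] + 1)
    print('{} Found on line {}'.format(re.sub(r'\s+', ' ', m.group()), ", ".join(map(str,linenums))))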
Output:
new File() Found on line 1
new File() Found on line 2, 3
new File() Found on line 4
new File() Found on line 5, 6, 7, 8, 9
new File() Found on line 11
new File() Found on line 11
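To get closer to the exact output format requested in the question (first and last line joined with "&", plus a final count), the counting approach can be wrapped as below. This is a sketch under assumptions: report is a hypothetical helper, the file name is passed in only for the summary line, and the text would normally come from reading the file (for example with pathlib.Path("test.txt").read_text()).

```python
import re

PATTERN = re.compile(r"new\s+File\s*\(\s*\)")

def report(text, filename):
    """Return the report lines for one file's contents (hypothetical helper)."""
    out = []
    for m in PATTERN.finditer(text):
        # newlines before the match give its first line (1-based)
        first = text.count("\n", 0, m.start()) + 1
        # newlines inside the match give the distance to its last line
        last = first + m.group().count("\n")
        shown = re.sub(r"\s+", " ", m.group())
        where = f"line {first}" if first == last else f"lines {first} & {last}"
        out.append(f"{shown} Found on {where}")
    count = len(out)
    out.append(f"{count} occurrences of new File() pattern in {filename}")
    return out

sample = (
    "new File()\nnew\nFile()\nthere is a new File()\nnew\n\n\n\nFile()\n"
    "there is not a matching pattern here File() new\n"
    "new File() test new File() occurs twice on this line"
)
print("\n".join(report(sample, "test.txt")))
```

Counting newlines from the start of the text for every match is simple but rescans the text each time; for very large inputs a precomputed list of newline offsets avoids that.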