regex to capture overlapping matches preceding any number with more than 4 digits
I am writing a regex to select the 30 characters that precede any number with more than 4 digits in the text below. Here is my code:
import re
text = "I went and I bought few tickets and ticket numbers 100000,100001 and 100002.I bought them for 200,300 and 400 USD. Box office collections were 55555555 USD"
reg = r".{0,30}(?:[\d]+[ .]?){5,}"
regc = re.compile(reg)
res = regc.findall(text)
This gives only a partial result:
I get the 30 characters before 100000.
How do I get the 30 characters before 100001? And the 30 characters before 100002?
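Since you need overlapping matches, you need to use lookarounds. However, lookbehinds in Python's re module must be fixed-width, so you can use a hack: reverse the string, apply a regex with a lookahead, and then reverse the matches: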
import re
rev_rx = r'((?:\d+[ .]?){5,})(?=(.{0,30}))'
text="I went and I bought few tickets and ticket numbers 100000,100001 and 100002.I bought them for 200,300 and 400 USD. Box office collections were 55555555 USD"
results = [ "{}{}".format(y[::-1], x[::-1]) for x, y in re.findall(rev_rx, text[::-1]) ]
print(results)
# => ['D. Box office collections were 55555555', 'cket numbers 100000,100001 and 100002', 'ets and ticket numbers 100000,100001', 'few tickets and ticket numbers 100000']
See the Python demo.
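The ((?:\d+[ .]?){5,})(?=(.{0,30})) regex matches and captures into Group 1 five or more sequences of 1+ digits, each followed by an optional space or dot. The positive lookahead then captures into Group 2 up to 30 characters that follow, without consuming them, which is what allows the matches to overlap. Because the string is reversed, the characters that follow the number in the reversed string are the ones that precede it in the original. So you only need to concatenate the reversed Group 2 and Group 1 values to get the required matches. Note that the results come back in reverse order of appearance.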
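The reversal hack can be wrapped in a small helper. The following is only a sketch under stated assumptions: the names `lookbehind_rejected` and `numbers_with_context` and the parameters `width` and `min_digits` are hypothetical, and it swaps in the stricter `\d{5,}` pattern (an unbroken run of digits) in place of the pattern above. It also shows the `re.error` that a variable-width lookbehind raises in the standard re module, which is the reason for reversing the string in the first place.

```python
import re

text = ("I went and I bought few tickets and ticket numbers 100000,100001 and 100002."
        "I bought them for 200,300 and 400 USD. Box office collections were 55555555 USD")

def lookbehind_rejected():
    """Return True if re refuses to compile a variable-width lookbehind."""
    try:
        re.compile(r'(?<=.{0,30})\d{5,}')
    except re.error:
        return True
    return False

def numbers_with_context(text, width=30, min_digits=5):
    """Return each long number prefixed by up to `width` preceding characters."""
    # Assumption: a "number" is an unbroken run of at least min_digits digits.
    rev_rx = re.compile(r'(\d{%d,})(?=(.{0,%d}))' % (min_digits, width))
    # Search the reversed text, then reverse each captured piece back.
    found = [ctx[::-1] + num[::-1] for num, ctx in rev_rx.findall(text[::-1])]
    # Matches were found back to front, so restore the original text order.
    return found[::-1]

print(lookbehind_rejected())
print(numbers_with_context(text))
```

Unlike the one-liner above, this helper returns the matches in the order in which the numbers appear in the text.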
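You can get the 30 characters preceding any number with more than 4 digits by combining a simple regex with string methods, rather than using a more complex regex to both find the matches and capture the required characters.
The example below uses a regex to find all numbers with more than 4 digits, then uses str.find() to get the position of each match in the original text so that you can slice out the preceding 30 characters: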
import re
text = "I went and I bought few tickets and ticket numbers 100000,100001 and 100002.I bought them for 200,300 and 400 USD. Box office collections were 55555555 USD"
patt = re.compile(r'\d{5,}')
nums = patt.findall(text)
matches = [text[:text.find(n)][-30:] for n in nums]
print(matches)
# OUTPUT (shown on multiple lines for readability)
# [
# 'ew tickets and ticket numbers ',
# 'ets and ticket numbers 100000,',
# 'ket numbers 100000,100001 and ',
# '. Box office collections were '
# ]
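One caveat with str.find(): it returns the position of the first occurrence of the substring, so if the same number appears twice in the text, both occurrences get the context of the first one. A possible variation, sketched here with a hypothetical function name `context_before_numbers` and hypothetical parameters `width` and `min_digits`, takes the positions straight from re.finditer() instead:

```python
import re

text = ("I went and I bought few tickets and ticket numbers 100000,100001 and 100002."
        "I bought them for 200,300 and 400 USD. Box office collections were 55555555 USD")

def context_before_numbers(text, width=30, min_digits=5):
    """Return up to `width` characters preceding each run of min_digits+ digits."""
    pattern = re.compile(r'\d{%d,}' % min_digits)
    # m.start() is the exact position of this occurrence, so repeated
    # numbers are handled correctly; max() guards against a negative index.
    return [text[max(0, m.start() - width):m.start()]
            for m in pattern.finditer(text)]

print(context_before_numbers(text))
print(context_before_numbers("a 12345 b 12345"))
```

For the sample text this should give the same four strings as the str.find() version; the second call shows a repeated number getting a different context for each occurrence.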