Non-greedy regex in Python
Given the text:
'Adf adf asdf asdf asfdf https://<somepage>.com/abcabcabc kdfja ladsjfladsjf ladksjf ladsjfl adsfadf adf asdf asdf asfdf https://<somepage>.com/abcabcabc\nkdfja ladsjfladsjf ladksjf ladsj https://<somepage>.com/djflkajdsfl\nka djldjfld djfladjf ldfdjlkfj ldfj.'
How can I match any URL of the form https://<somepage>.com/subdir [up to a space, newline, comma, or period]?
Attempts:
re.findall('http.*',s)
['https://<somepage>.com/abcabcabc kdfja ladsjfladsjf ladksjf ladsjfl adsfadf adf asdf asdf asfdf https://<somepage>.com/abcabcabc', 'https://<somepage>.com/djflkajdsfl']
re.findall('http.* ',s)
['https://<somepage>.com/abcabcabc kdfja ladsjfladsjf ladksjf ladsjfl adsfadf adf asdf asdf asfdf ']
re.findall('http.* ?',s)
['https://<somepage>.com/abcabcabc kdfja ladsjfladsjf ladksjf ladsjfl adsfadf adf asdf asdf asfdf https://<somepage>.com/abcabcabc', 'https://<somepage>.com/djflkajdsfl']
re.findall('http.* {1}?',s)
['https://<somepage>.com/abcabcabc kdfja ladsjfladsjf ladksjf ladsjfl adsfadf adf asdf asdf asfdf ']
re.findall('http.* +?',s)
['https://<somepage>.com/abcabcabc kdfja ladsjfladsjf ladksjf ladsjfl adsfadf adf asdf asdf asfdf ']
re.findall('http.*[^ \n]',s)
['https://<somepage>.com/abcabcabc kdfja ladsjfladsjf ladksjf ladsjfl adsfadf adf asdf asdf asfdf https://<somepage>.com/abcabcabc', 'https://<somepage>.com/djflkajdsfl']
re.findall('http.*[^ \\n]',s)
['https://<somepage>.com/abcabcabc kdfja ladsjfladsjf ladksjf ladsjfl adsfadf adf asdf asdf asfdf https://<somepage>.com/abcabcabc', 'https://<somepage>.com/djflkajdsfl']
re.findall('http.*[^ \\\n]',s)
['https://<somepage>.com/abcabcabc kdfja ladsjfladsjf ladksjf ladsjfl adsfadf adf asdf asdf asfdf https://<somepage>.com/abcabcabc', 'https://<somepage>.com/djflkajdsfl']
re.findall('http.* *?',s)
['https://imgur.com/abcabcabc kdfja ladsjfladsjf ladksjf ladsjfl adsfadf adf asdf asdf asfdf https://imgur.com/abcabcabc', 'https://somepage.com/djflkajdsfl']
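The problem in your first example is not that the regexp matches too many spaces; it matches too many characters before the space. So instead of putting your "non-greedy" ? modifier after the space, put it after the .*, because that is the part currently matching too much.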
py3.7 >>> re.findall('http.*? ', s)
['https://<somepage>.com/abcabcabc ']
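
To make the difference concrete, here is a minimal sketch comparing the greedy and non-greedy forms side by side. The sample string and the example.com URLs are made up for illustration and are not the text from the question.

```python
import re

# Hypothetical sample: two URLs on the first line, a third at the end of the second line.
sample = "see https://example.com/abc and https://example.com/def\nmore https://example.com/ghi\n"

# Greedy: .* grabs as much of the line as it can, then backs off to the LAST space.
greedy = re.findall(r"http.* ", sample)

# Non-greedy: .*? grows one character at a time and stops at the FIRST space.
lazy = re.findall(r"http.*? ", sample)

print(greedy)  # ['https://example.com/abc and ']
print(lazy)    # ['https://example.com/abc ']
```

Note that both forms still require a trailing space, so a URL that sits at the end of a line is not matched by either pattern.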
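On the other hand, [^ \n] is not a modifier at all; it is a complete matching expression in its own right. Putting it after your existing expression therefore does not make that expression match less; you now have two matching expressions that together match more.
You have to use it instead of the expression that matches too much, that is, in place of the . :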
py3.7 >>> re.findall('http[^ \n]*', s)
['https://<somepage>.com/abcabcabc', 'https://<somepage>.com/abcabcabc', 'https://<somepage>.com/djflkajdsfl']
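
The question also asks to stop at a comma or a period. Simply adding . to the excluded set would cut the match at the first dot in the host name, so one possible approach is to exclude whitespace and commas throughout and require only the final character to be something other than a period. The sketch below assumes this approach; the pattern, the find_urls helper, and the sample text are illustrative and not part of the original answer.

```python
import re

# Hypothetical pattern: the scheme, then a run of characters that are neither
# whitespace nor commas, ending on a character that is also not a period.
URL_RE = re.compile(r"https?://[^\s,]*[^\s,.]")

def find_urls(text):
    """Return every URL-like substring of `text`, without trailing periods."""
    return URL_RE.findall(text)

# Made-up sample: one URL followed by a comma, one by a period, one by a newline.
sample = "first https://example.com/abc, then https://example.com/def.\nlast https://example.com/ghi\n"
urls = find_urls(sample)
print(urls)
# ['https://example.com/abc', 'https://example.com/def', 'https://example.com/ghi']
```

The greedy [^\s,]* first consumes any trailing period, and backtracking then gives it up so that the final class can match the last real character of the URL.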