How to search for urls that match a specific pattern?
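So my goal is to write a Python script that reads an email, picks a specific link out of it, and then opens that link in a web browser.
At the moment I am stuck at the part where I extract all the URLs. I only want to keep the specific URL that contains "/user/cm-l.php?",
but everything after the question mark is randomly generated.
Does anyone know how to solve this, or how to edit the script so that it only keeps URLs containing that part?
I tried a few things with re.search/findall/match, but I could not get it to filter for only that URL.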
import imaplib
import email
import re
# imap and user credentials.
mail = imaplib.IMAP4_SSL('imap.domain.com')
mail.login('username@domain.com', 'password')
mail.list()
# connect to right mailbox inside inbox.
mail.select("inbox")
result, data = mail.search(None, "ALL")
# data is a list.
ids = data[0]
# ids is a space separated string.
id_list = ids.split()
# the index selects which e-mail to read; use -1 to get the latest e-mail.
latest_email_id = id_list[6]
result, data = mail.fetch(latest_email_id, "(RFC822)")
raw_email = data[0][1]
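# decode the bytes to text; str(raw_email) would give the "b'...'" repr with escaped line breaks.
raw_email = raw_email.decode("utf-8", errors="replace")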
# this will find all the URLs in an email.
def Find(string):
    regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/user)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
    url = re.findall(regex, string)
    return [x[0] for x in url]
# prints all of the URLs.
print(Find(raw_email))
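By defining the regex pattern with capture groups (..), you can find the exact string together with an optional prefix and suffix.
The pattern ([a-zA-Z\/]*?)(\/user\/cm-l\.php\?)(.*)? contains three groups: the optional prefix, the fixed part /user/cm-l.php?, and the optional suffix.
The following example shows how to access the extracted content.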
import re
mailstring = """
/user/cm-l.php?
some link : /main/home/user/cm-l.php?
link with suffix /user/cm-l.php?345TfvbzteW4rv#!_
"""
def Find(string):
    pattern = r'([a-zA-Z\/]*?)(\/user\/cm-l\.php\?)(.*)?'
    for idx, match in enumerate(re.findall(pattern, string)):
        print(f'### Match {idx}')
        print('full= ', ''.join(match))
        print('0= ', match[0])
        print('1= ', match[1])  # match[1] is the base url
        print('2= ', match[2])
Find(mailstring)
'''
### Match 0
full= /user/cm-l.php?
0=
1= /user/cm-l.php?
2=
### Match 1
full= /main/home/user/cm-l.php?
0= /main/home
1= /user/cm-l.php?
2=
### Match 2
full= /user/cm-l.php?345TfvbzteW4rv#!_
0=
1= /user/cm-l.php?
2= 345TfvbzteW4rv#!_
'''
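The question asks for complete URLs that can be opened in a browser, so the same idea can be applied to full http(s) links. The following is only a minimal sketch: the names CM_LINK, find_cm_links and open_first_cm_link, the example.com addresses, and the assumption that a URL ends at whitespace, a quote, or an angle bracket are all illustrative and not part of the original post. It also assumes the email body has already been decoded to plain text.

```python
import re
import webbrowser

# Assumed pattern: an http(s) URL containing "/user/cm-l.php?", where the URL
# is taken to end at whitespace, a quote character, or an angle bracket.
CM_LINK = re.compile(r"https?://[^\s\"'<>]*?/user/cm-l\.php\?[^\s\"'<>]*")


def find_cm_links(text):
    """Return every URL in text that contains '/user/cm-l.php?'."""
    return CM_LINK.findall(text)


def open_first_cm_link(text):
    """Open the first matching URL in the default browser, if there is one."""
    links = find_cm_links(text)
    if not links:
        return None
    webbrowser.open(links[0])
    return links[0]


# Hypothetical email body with one matching and one non-matching link.
sample = (
    'Hello, <a href="https://example.com/user/cm-l.php?a1B2c3">confirm</a> '
    "or visit https://example.com/help/faq.php?id=7 for help."
)
print(find_cm_links(sample))
# ['https://example.com/user/cm-l.php?a1B2c3']
```

Because the pattern has no capture groups, re.findall returns the whole matched URL rather than a tuple of groups, so the result can be passed to webbrowser.open directly.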