How do I search for a URL that matches a specific pattern?

Question

My goal is to write a Python script that reads an email, selects a specific link in it, and then opens that link in a web browser.

At the moment I'm stuck at the part where I get all the URL links. I want to filter those down to one specific URL. That URL contains "/user/cm-l.php?", but the part after the question mark is randomly generated.

Does anyone know how to fix this, or how to edit the script so that it only returns URLs that contain that part?

I tried re.search, re.findall and re.match, but I couldn't get any of them to filter for only that URL.

import imaplib 
import email
import re

# imap and user credentials.
mail = imaplib.IMAP4_SSL('imap.domain.com')
mail.login('username@domain.com', 'password')
mail.list()
# connect to right mailbox inside inbox.
mail.select("inbox")

result, data = mail.search(None, "ALL")

# data is a list.
ids = data[0]
# ids is a space separated string.
id_list = ids.split()
# index of the e-mail to read; use -1 to get the latest e-mail.
latest_email_id = id_list[6]

result, data = mail.fetch(latest_email_id, "(RFC822)")

raw_email = data[0][1]
raw_email = str(raw_email)

# this finds all the URLs in an email.
def Find(string):
    regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/user)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
    url = re.findall(regex,string)      
    return [x[0] for x in url] 

# prints all of the URLs.
print(Find(raw_email))

Answer 1

By defining a regex pattern with capture groups (..), you can find the exact string together with an optional prefix and suffix. The pattern ([a-zA-Z\/]*?)(\/user\/cm-l\.php\?)(.*)? contains three groups: the path before the marker, the marker itself, and everything after the question mark up to the end of the line.

The following example shows how to access the extracted content.

import re
mailstring = """
/user/cm-l.php?

some link : /main/home/user/cm-l.php?

link with suffix /user/cm-l.php?345TfvbzteW4rv#!_
"""


def Find(string):
    pattern = r'([a-zA-Z\/]*?)(\/user\/cm-l\.php\?)(.*)?'

    for idx,match in enumerate(re.findall(pattern,string)):
        print(f'### Match {idx}')
        print('full= ',''.join(match))
        print('0= ',match[0])
        print('1= ',match[1]) # match[1] is the base url
        print('2= ',match[2])

Find(mailstring)

'''
### Match 0
full=  /user/cm-l.php?
0=  
1=  /user/cm-l.php?
2=  
### Match 1
full=  /main/home/user/cm-l.php?
0=  /main/home
1=  /user/cm-l.php?
2=  
### Match 2
full=  /user/cm-l.php?345TfvbzteW4rv#!_
0=  
1=  /user/cm-l.php?
2=  345TfvbzteW4rv#!_
'''
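
The pattern above matches path fragments only. To get a complete link out of a real message, the same idea can be extended to full URLs. The following is a minimal sketch, not a definitive implementation. It assumes that the link starts with http:// or https://, that a link ends at whitespace, a quote or an angle bracket, and that the message is still the bytes object returned by mail.fetch (before str() is applied to it). The names LINK_RE, find_cm_links, email_text and the sample message are hypothetical and only serve as illustration.

```python
import re
from email import message_from_bytes, policy

# Assumption: a link runs until whitespace, a quote, or an angle bracket.
LINK_RE = re.compile(r"""https?://[^\s"'<>]*/user/cm-l\.php\?[^\s"'<>]*""")


def find_cm_links(text):
    """Return every http(s) URL in text that contains /user/cm-l.php?"""
    return LINK_RE.findall(text)


def email_text(raw_bytes):
    """Decode a raw RFC822 message and join the content of its text/* parts."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    parts = []
    for part in msg.walk():
        # Multipart containers have maintype "multipart" and are skipped.
        if part.get_content_maintype() == "text":
            parts.append(part.get_content())
    return "\n".join(parts)


# Hypothetical sample message standing in for data[0][1] from mail.fetch.
sample = (
    b"From: a@example.com\r\n"
    b"Subject: test\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Ignore https://example.com/other/page.php?x=1\r\n"
    b"Click https://example.com/user/cm-l.php?345TfvbzteW4rv here\r\n"
)

links = find_cm_links(email_text(sample))
print(links)
```

If links is not empty, passing links[0] to webbrowser.open from the standard library should open it in the default browser. Parsing the message with the email module first, instead of calling str() on the raw bytes, means that transfer encodings such as quoted-printable are decoded before the regex runs.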

Solution 1
0 2020-12-18 21:07:53
