How to identify and capture specific URL elements in a Python list?

I currently have a list of URLs, for example:

list = ['https://finance.yahoo.com/', 'https://query1.finance.yahoo.com/', 'https://ad.doubleclick.net/ddm/trackclk/']

I want to keep only the "query1.finance" URL and delete the others. I would like to do this across different lists with different elements, using a single criterion: any URL that contains the text "query1" is kept in each list.

Is there an easy way to do this? I am using a Selenium driver to pull hrefs off of websites, and the hrefs are all imported as URLs, but I only want one of the hrefs for my use.

If the only condition is that the URL contains the string 'query1', then the following code will work:

url_list = [
    'https://finance.yahoo.com/', 
    'https://query1.finance.yahoo.com/', 
    'https://ad.doubleclick.net/ddm/trackclk/'
]

filtered_list = [url for url in url_list if 'query1' in url]
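
A plain substring test also keeps URLs where 'query1' appears only in the path or query string. If that matters, here is a minimal sketch that checks the hostname instead, using the standard library's urllib.parse. The helper name keep_query1_hosts and the sample query string are made up for illustration.

```python
from urllib.parse import urlsplit

def keep_query1_hosts(urls, label="query1"):
    """Return the URLs whose hostname has `label` as one of its dot-separated parts."""
    kept = []
    for url in urls:
        # hostname is None when the string has no network location
        host = urlsplit(url).hostname or ""
        if label in host.split("."):
            kept.append(url)
    return kept

urls = [
    'https://finance.yahoo.com/',
    'https://query1.finance.yahoo.com/',
    'https://ad.doubleclick.net/ddm/trackclk/?next=query1',  # 'query1' only in the query string
]
print(keep_query1_hosts(urls))
```

The third URL contains 'query1' but not in its hostname, so the substring test would keep it and this version would not.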

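You could simply use a for loop and check whether 'query1.' is a substring of each URL. If it is not, remove that URL from the list. Loop over a copy of the list, because removing items from a list while iterating over it directly can skip elements.

# Iterate over a copy (list[:]) so that removals do not disturb the loop
for i in list[:]:
    if 'query1.' not in i:
        list.remove(i)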
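
If the list has to be modified in place (for example, because other code holds a reference to the same list object), slice assignment is one way to avoid calling remove inside a loop altogether. A minimal sketch, where url_list and alias are illustrative names:

```python
url_list = [
    'https://finance.yahoo.com/',
    'https://query1.finance.yahoo.com/',
    'https://ad.doubleclick.net/ddm/trackclk/',
]
alias = url_list  # a second reference to the same list object

# Slice assignment replaces the contents of the existing list
# instead of binding the name to a new list.
url_list[:] = [url for url in url_list if 'query1.' in url]

print(alias)
```

Because the list object itself is updated, alias sees the filtered contents too.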

The code below does the trick and returns a new list.

def filterLinks(lyst):
    # Collect every URL that contains 'query1' into a new list
    final_list = []
    for url in lyst:
        if 'query1' in url:
            final_list.append(url)
    return final_list
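
Since the question mentions needing only one href, it may be enough to take the first match rather than build a list. A small sketch, assuming that returning None when nothing matches is acceptable; first_match is a hypothetical helper name:

```python
def first_match(urls, text="query1"):
    """Return the first URL containing `text`, or None if there is none."""
    # next() stops at the first hit, so the rest of the list is not scanned
    return next((url for url in urls if text in url), None)

urls = [
    'https://finance.yahoo.com/',
    'https://query1.finance.yahoo.com/',
    'https://ad.doubleclick.net/ddm/trackclk/',
]
print(first_match(urls))
```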
