How to identify and capture specific URL elements in a Python list?

Question

I currently have a list of URLs, for example:

list = ['https://finance.yahoo.com/', 'https://query1.finance.yahoo.com/', 'https://ad.doubleclick.net/ddm/trackclk/']

I want to isolate the "query1.finance" URL and delete the others. I would like to be able to do this across different lists with different elements, using only the criteria that a URL that contains the text "query1" be kept in each list.

Is there an easy way to do this? I am using a Selenium driver to pull hrefs off of websites, and the hrefs are all imported as URLs, but I only need one of the hrefs for my use.

Answer 1

If the only condition is that the URL contains the string 'query1', then the following code will work:

url_list = [
    'https://finance.yahoo.com/', 
    'https://query1.finance.yahoo.com/', 
    'https://ad.doubleclick.net/ddm/trackclk/'
]

filtered_list = [url for url in url_list if 'query1' in url]
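
One caveat: a plain substring test also keeps URLs where 'query1' shows up only in the path or query string. If that matters, a stricter variant could look at the host name alone. This is a minimal sketch, assuming the host name is what should contain 'query1'; the helper name keep_by_host and the example.com URL are made up for illustration.

```python
from urllib.parse import urlparse

def keep_by_host(urls, keyword='query1'):
    """Return the URLs whose host name contains keyword (hypothetical helper)."""
    # urlparse(...).hostname is None for strings without a host, hence the `or ''`.
    return [u for u in urls if keyword in (urlparse(u).hostname or '')]

urls = [
    'https://finance.yahoo.com/',
    'https://query1.finance.yahoo.com/',
    'https://example.com/?ref=query1',  # made-up URL: keyword only in the query string
]

host_filtered = keep_by_host(urls)
```

Here the made-up example.com URL is dropped, whereas the plain substring test would have kept it.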

Answer 2

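You could simply use a for loop and check whether 'query1.' is a substring of each URL. If it is not, remove that URL from the list. Loop over a copy of the list, because removing items from a list while iterating over it directly causes elements to be skipped.

# url_list is the list of URLs to filter
for url in url_list[:]:  # url_list[:] is a copy, so removals are safe
    if 'query1.' not in url:
        url_list.remove(url)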
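
Mutating a list while looping over that same list is a common pitfall: each removal shifts the remaining items left, so the loop skips the next one. The sketch below demonstrates this with the question's URLs in a different order (the order is an assumption chosen to expose the problem) and shows slice assignment as one possible in-place alternative. The names urls, buggy, and fixed are illustrative.

```python
urls = [
    'https://finance.yahoo.com/',
    'https://ad.doubleclick.net/ddm/trackclk/',
    'https://query1.finance.yahoo.com/',
]

# Pitfall: removing from the list being iterated skips the item after each removal.
buggy = list(urls)
for u in buggy:
    if 'query1.' not in u:
        buggy.remove(u)
# The doubleclick URL is never examined, so it survives in buggy.

# In-place alternative: slice assignment replaces the contents of the same list object.
fixed = list(urls)
fixed[:] = [u for u in fixed if 'query1.' in u]
```

Slice assignment keeps the same list object, which matters if other variables refer to it; otherwise simply building a new list is usually simpler.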

Answer 3

The code below does the trick and returns a new list.

def filterLinks(lyst):
    final_list = []
    for link in lyst:
        # keep only the links that contain 'query1'
        if 'query1' in link:
            final_list.append(link)
    return final_list
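
Since the question asks for the same rule to be applied across different lists, a function like this can be called once per list. Here is a rough sketch with the keyword made a parameter; the name filter_links_by, the pages dictionary, and its keys are hypothetical, standing in for hrefs collected from separate pages.

```python
def filter_links_by(links, keyword='query1'):
    """Return a new list with only the links that contain keyword."""
    return [link for link in links if keyword in link]

# Hypothetical href lists, e.g. one per scraped page.
pages = {
    'page_a': ['https://finance.yahoo.com/', 'https://query1.finance.yahoo.com/'],
    'page_b': ['https://ad.doubleclick.net/ddm/trackclk/'],
}

# Apply the same filter to every list; a list with no match becomes [].
filtered = {name: filter_links_by(links) for name, links in pages.items()}

# If only a single URL is wanted, take the first match or None.
first_match = next(iter(filtered['page_a']), None)
```

A list with no matching URL yields an empty list, so callers that expect exactly one URL may want to handle the None case explicitly.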

3 answers

solution1
0 ACCEPTED 2020-12-01 01:59:27

solution2
0 2020-12-01 02:00:51

solution3
0 2020-12-01 02:15:59
