
Trying to extract each link that contains "dungeon", add each one to a list, and remove the duplicates

I'm trying to extract every link that contains "dungeon", add each one to a list, and remove the duplicates. Once the links are separated, removing the duplicates should be easy. I have a feeling I'm missing something simple. I would share the site, but it requires an account.

This is what I have so far. It grabs the links, but the result seems to be one big string. How do I separate them into a list so I can pick each one out? I will be clicking them later. Or is there a better way to filter for links with "dungeon" in them using XPath? Matching on link text doesn't work.

for elem in elems:
    if "dungeon" in elem.get_attribute("href"):
        list = elem.get_attribute("href")

        print(list)

        print(list[0])

And this is the output:

javascript:dungeon(0,84579684);
j
javascript:dungeon(0,84579684);
j
javascript:dungeon(0,84579674);
j
javascript:dungeon(0,84579674);
j
javascript:dungeon(0,84579672);
j
javascript:dungeon(0,84579672);
j
javascript:dungeon(0,84579662);
j
javascript:dungeon(0,84579662);
j
I think the output is one big string. This is what I get with only print(list):

javascript:dungeon(0,84579684);
javascript:dungeon(0,84579684);
javascript:dungeon(0,84579674);
javascript:dungeon(0,84579674);
javascript:dungeon(0,84579672);
javascript:dungeon(0,84579672);
javascript:dungeon(0,84579662);
javascript:dungeon(0,84579662);


I want to be able to call print(list[3]) and get javascript:dungeon(0,84579674); rather than the single character "a".

I would do something like this:

Use the .append method to add each href to a list:

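from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
url = "https://www.sofascore.com/de/tennis/2019-01-01"
driver.get(url)
href_bucket = []
# Selenium 4 form of the older find_elements_by_xpath("//a")
elems = driver.find_elements(By.XPATH, "//a")
print(len(elems))
counter = 1
fail_counter = 0
for ele in elems:
    href = ele.get_attribute('href')  # None when the <a> has no href
    if href and "de" in href:
        counter = counter + 1
        href_bucket.append(href)  # keep each matching link as its own list item
    else:
        # print("fail", fail_counter)
        fail_counter = fail_counter + 1

print(href_bucket[3])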
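
To see why print(list[0]) printed "j" in the question, here is a minimal sketch without Selenium. It assumes the hrefs are already plain strings (the sample values are copied from the question's output) and contrasts reassigning a variable inside the loop with appending to a list.

```python
# Sample hrefs copied from the question's output (duplicates included).
hrefs = [
    "javascript:dungeon(0,84579684);",
    "javascript:dungeon(0,84579684);",
    "javascript:dungeon(0,84579674);",
    "javascript:dungeon(0,84579674);",
]

# Reassigning inside the loop keeps only one string at a time.
last = None
for h in hrefs:
    last = h
first_char = last[0]  # indexing a string selects a single character

# Appending builds a real list, so an index selects a whole link.
links = []
for h in hrefs:
    if "dungeon" in h:
        links.append(h)
fourth = links[3]

print(first_char)
print(fourth)
```

The first print shows a single character because last is a string. The second shows a complete href because links is a list of strings.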

If you want to remove duplicates:

seen = set()
unique_hrefs = []  # de-duplicated links, in their original order
for item in href_bucket:
    if item not in seen:
        seen.add(item)
        unique_hrefs.append(item)
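
The question also asked whether XPath can do the filtering. The following is a rough sketch, not a tested recipe for the asker's site: it assumes a Selenium 4 driver, and the helper name dungeon_links is made up for illustration. The XPath contains() function selects only anchors whose href includes "dungeon", and dict.fromkeys removes duplicates while keeping the first occurrence of each link in order (Python 3.7+).

```python
def dungeon_links(driver):
    """Return unique hrefs containing 'dungeon', in page order.

    Hypothetical helper; `driver` is assumed to be a Selenium 4 WebDriver.
    """
    # Let XPath do the filtering. "xpath" is the value of selenium's By.XPATH.
    elems = driver.find_elements("xpath", "//a[contains(@href, 'dungeon')]")
    hrefs = [e.get_attribute("href") for e in elems]
    # Drop missing hrefs, then de-duplicate while preserving order.
    return list(dict.fromkeys(h for h in hrefs if h))
```

With a real driver this would be called as links = dungeon_links(driver), after which links[1] is the second distinct dungeon link.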
