Replace part of a string in Python for a bs4 script?

I have a BS4 script that scrapes for links. It returns hrefs that look like this: "/watch/f568a5e2sdfd783"

I paste those into Excel to mass-convert them into URLs like this: "https://website.com/f568a5e2sdfd783.jpg"

How do I modify the code below to skip the manual Excel step, i.e. replace "/watch/" with "https://website.com/" and append ".jpg" at the end before the link is printed?

Code:

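    import time

    import requests
    from bs4 import BeautifulSoup

    urls = []  # hypothetical: the page URLs the original script loops over

    for URL in urls:
        try:
            page = requests.get(URL)
            time.sleep(1)  # pause between requests

            soup = BeautifulSoup(page.content, 'html.parser')

            # every <a> inside <div id="view">
            for links in soup.find('div', id='view').find_all('a'):
                print(links['href'])
        except Exception:
            continue  # skip pages that fail to load or lack the div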
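One way to check the `soup.find('div', id='view').find_all('a')` selection without requesting a live page is to feed BeautifulSoup a small inline HTML string. The markup in `SAMPLE_HTML` (including the second `/watch/` id and the `/about` link) is a hypothetical stand-in, not taken from the real site, and the sketch assumes `bs4` is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page
SAMPLE_HTML = """
<div id="view">
  <a href="/watch/f568a5e2sdfd783">first</a>
  <a href="/watch/0a1b2c3d4e5f">second</a>
</div>
<div id="other"><a href="/about">not collected</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Same selection as the question's loop: only <a> tags inside <div id="view">
hrefs = [a["href"] for a in soup.find("div", id="view").find_all("a")]
print(hrefs)
```

The `/about` link sits in a different div, so it is not collected; only the two `/watch/` hrefs come back.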

Answer:

This should work for your case:

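    website = "https://website.com/"

    for links in soup.find('div', id='view').find_all('a'):
        # "/watch/f568a5e2sdfd783".split("/") gives ['', 'watch', 'f568a5e2sdfd783']
        parts = links['href'].split("/")
        # drop the leading '' and 'watch' parts, prefix the site, append the extension
        new_link = website + '/'.join(parts[2:]) + ".jpg"
        print(new_link)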
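If the prefix is always `/watch/`, a direct prefix swap avoids the split; the question's wording maps almost literally to `links['href'].replace('/watch/', website, 1) + '.jpg'`. The sketch below wraps that idea in a small helper that only rewrites hrefs starting with the prefix. The names `to_image_url`, `WEBSITE` and `PREFIX` are hypothetical, and it assumes the image id always follows `/watch/` directly.

```python
WEBSITE = "https://website.com/"
PREFIX = "/watch/"


def to_image_url(href: str, website: str = WEBSITE, prefix: str = PREFIX) -> str:
    """Turn '/watch/<id>' into '<website><id>.jpg'.

    hrefs that do not start with `prefix` are returned unchanged, so
    unrelated links inside the div do not get a '.jpg' suffix.
    """
    if not href.startswith(prefix):
        return href
    # slice off the prefix, keep the id, add the extension
    return website + href[len(prefix):] + ".jpg"


print(to_image_url("/watch/f568a5e2sdfd783"))
# https://website.com/f568a5e2sdfd783.jpg
```

Inside the scraping loop the call becomes `print(to_image_url(links['href']))`. On Python 3.9+ `href.removeprefix(prefix)` could replace the slice, but the `startswith` check would still be needed so that non-matching links are left alone.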

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM