I have a BS4 script that scrapes a page for links. It returns values that look like this: "/watch/f568a5e2sdfd783"
I paste those into Excel to mass-convert them into this form: "https://website.com/f568a5e2sdfd783.jpg"
How do I modify the code below to skip the manual Excel step, so that it replaces "/watch/" with "https://website.com/" and appends ".jpg" before printing each link?
Code:
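import time
import requests
from bs4 import BeautifulSoup

page = requests.get(URL)
time.sleep(1)
soup = BeautifulSoup(page.content, 'html.parser')
for links in soup.find('div', id='view').find_all('a'):
    try:
        print(links['href'])
    except KeyError:
        # skip anchors that have no href attribute
        continue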
That should work for your case:
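website = "https://website.com/"
for links in soup.find('div', id='view').find_all('a'):
    # "/watch/f568a5e2sdfd783".split("/") gives ['', 'watch', 'f568a5e2sdfd783']
    parts = links['href'].split("/")
    # drop the leading '' and 'watch', then prepend the site and append ".jpg"
    new_link = website + '/'.join(parts[2:]) + ".jpg"
    print(new_link)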
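As an alternative to splitting on "/", the same conversion can be sketched as a prefix swap. This is only a minimal sketch: the helper name to_image_url and its parameters are hypothetical (they are not from the question or the answer), and "https://website.com/" is the placeholder domain used in the question. It assumes every wanted href starts with "/watch/" and returns None for anything else, so unrelated links can be skipped.

```python
# Hypothetical helper for illustration; not part of the original answer.
BASE_URL = "https://website.com/"  # placeholder domain taken from the question


def to_image_url(href, prefix="/watch/", base=BASE_URL, suffix=".jpg"):
    """Turn '/watch/<id>' into '<base><id>.jpg'; return None if the prefix is missing."""
    if not href.startswith(prefix):
        return None
    # Slice off the prefix, then add the site in front and the extension at the end.
    return base + href[len(prefix):] + suffix


print(to_image_url("/watch/f568a5e2sdfd783"))
# prints: https://website.com/f568a5e2sdfd783.jpg
```

Inside the scraping loop this would be called as to_image_url(links['href']), printing the result only when it is not None.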