I'm trying to scrape a web link from the HTML below without the surrounding tag. This needs to work for multiple items on the same page.
The HTML:
<div class="facets-item-cell-list-right">
<meta itemprop="url" content="https://www.website.com/linktoitem1">
<div class="title">
<span itemprop="name"> item1 </span>
My code:
item = []
link = []
for product in soup.select('div.facets-item-cell-list-right'):
    item.append(product.span.get_text(strip=True))
    link.append(product.meta)
print("Setup Complete", *link, sep='\n')
print("Setup Complete", *item, sep='\n')
This prints:
<meta content="https://www.website.com/linktoitem1" itemprop="url"/>
<meta content="https://www.website.com/linktoitem2" itemprop="url"/>
etc
How can I make it so that the first print function prints only
https://www.website.com/linktoitem1
https://www.website.com/linktoitem2
etc
Is this what you want?
from bs4 import BeautifulSoup
sample = """
<meta content="https://www.website.com/linktoitem1" itemprop="url"/>
<meta content="https://www.website.com/linktoitem2" itemprop="url"/>
"""
link = []
for m in BeautifulSoup(sample, "html.parser").find_all("meta"):
    link.append(m["content"])
print("\n".join(link))
Output:
https://www.website.com/linktoitem1
https://www.website.com/linktoitem2
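Applied to the loop from the question, the same attribute lookup might look like the sketch below. The sample HTML here is an assumption: it is the question's snippet with the tags closed and a second item added for illustration. The `meta[itemprop="url"]` selector is just one way to make sure the right meta tag is picked; `product.meta["content"]` should work as well when each block holds a single meta tag.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the question's snippet with tags closed and a second item added.
html = """
<div class="facets-item-cell-list-right">
  <meta itemprop="url" content="https://www.website.com/linktoitem1">
  <div class="title"><span itemprop="name"> item1 </span></div>
</div>
<div class="facets-item-cell-list-right">
  <meta itemprop="url" content="https://www.website.com/linktoitem2">
  <div class="title"><span itemprop="name"> item2 </span></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

item = []
link = []
for product in soup.select("div.facets-item-cell-list-right"):
    item.append(product.span.get_text(strip=True))
    # Index the tag like a dict to read the attribute value instead of the whole tag.
    link.append(product.select_one('meta[itemprop="url"]')["content"])

print(*link, sep="\n")
print(*item, sep="\n")
```

Appending `product.meta` stores the whole tag object, which is why the original code printed the full `<meta .../>` markup; indexing with `["content"]` stores only the URL string.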
EDIT: based on the link you've shared, try this:
import requests
from bs4 import BeautifulSoup
headers = {
"user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.152 Safari/537.36"
}
url = "https://www.nickollsandperks.co.uk/New-and-Special-Offers/New-Whisky?order=relevance:asc"
page = requests.get(url, headers=headers).text
link = []
for m in BeautifulSoup(page, "html.parser").select('div.facets-item-cell-list-right'):
    link.append(m.find("meta")["content"])
print("\n".join(link))
Output:
https://www.nickollsandperks.co.uk/Longmorn-18-Year-Old-2002-Chorlton-Whisky
https://www.nickollsandperks.co.uk/Bruichladdich-15-Year-Old-2005-Chorlton-Whisky
https://www.nickollsandperks.co.uk/Inchfad-13-Year-Old-2007-Chorlton-Whisky
https://www.nickollsandperks.co.uk/Port-Charlotte-12-Year-Old-2007-Valinch-STC01
https://www.nickollsandperks.co.uk/Longrow-Red-10-Year-Old-Malbec-Cask
https://www.nickollsandperks.co.uk/Kilkerran-8-Year-Old-Cask-Strength-56-9
https://www.nickollsandperks.co.uk/Springbank-18-Year-Old
https://www.nickollsandperks.co.uk/Kilkerran-12-Year-Old-46_1
https://www.nickollsandperks.co.uk/Tomatin-Decades-II
https://www.nickollsandperks.co.uk/Ardbeg-Traigh-Bhan-19-Year-Old-Batch-2
https://www.nickollsandperks.co.uk/Ardbeg-Arrrrrrrdbeg-Committee-Release
https://www.nickollsandperks.co.uk/Teaninich-13-Year-Old-2005-cask-487-Single-Cask-Nation
https://www.nickollsandperks.co.uk/Tomatin-12-Year-Old-2006-cask-800230-Single-Cask-Nation
https://www.nickollsandperks.co.uk/Trinidadian-Rum-16-Year-Old-2003-cask-3-Single-Cask-Nation
https://www.nickollsandperks.co.uk/Craigellachie-13-Year-Old-2005-cask-314984-Single-Cask-Nation
https://www.nickollsandperks.co.uk/Blended-Malt-9-Year-Old-2009-cask-417-Single-Cask-Nation
https://www.nickollsandperks.co.uk/Invergordon-45-Year-Old-1974-cask-7844000025-Single-Cask-Nation
https://www.nickollsandperks.co.uk/Aberfeldy-28-Year-Old-1991-cask-7435-Single-Cask-Nation
https://www.nickollsandperks.co.uk/Glen-Elgin-10-Year-Old-2010-cask-801386-Single-Cask-Nation
https://www.nickollsandperks.co.uk/Kentucky-Bourbon-24-Year-Old-1994-Single-Cask-Nation
https://www.nickollsandperks.co.uk/The-Macallan-Edition-No-3
https://www.nickollsandperks.co.uk/Craigellachie-26-Year-Old-1994-Connoisseurs-Choice-Gordon-And-MacPhail
https://www.nickollsandperks.co.uk/Strathisla-33-Year-Old-1987-Connoisseurs-Choice-Gordon-And-MacPhail
https://www.nickollsandperks.co.uk/Glen-Grant-30-Year-Old-1990-Connoisseurs-Choice-Gordon-And-MacPhail
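If some product blocks on a page turn out to have no meta tag, or a meta tag without a `content` attribute, `m.find("meta")["content"]` would raise an error. A slightly more defensive variant could look like the sketch below. The helper name `extract_links` and the sample HTML are hypothetical and only serve the illustration; no request is made to the real site.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the content URL of each product block, skipping blocks without one."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for block in soup.select("div.facets-item-cell-list-right"):
        meta = block.find("meta", attrs={"itemprop": "url"})
        # .get returns None instead of raising KeyError when the attribute is absent.
        url = meta.get("content") if meta is not None else None
        if url:
            links.append(url)
    return links


# Hypothetical sample: the second block has no meta tag and is skipped.
sample = """
<div class="facets-item-cell-list-right">
  <meta itemprop="url" content="https://www.website.com/linktoitem1">
</div>
<div class="facets-item-cell-list-right">
  <div class="title"><span itemprop="name"> item2 </span></div>
</div>
<div class="facets-item-cell-list-right">
  <meta itemprop="url" content="https://www.website.com/linktoitem3">
</div>
"""

print("\n".join(extract_links(sample)))
```

To use it on a live page, pass in the text of the response, for example `extract_links(requests.get(url, headers=headers).text)`.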