I'm trying to extract information from a "script" tag. My code is as follows:
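import requests
from bs4 import BeautifulSoup
# headers and api are defined elsewhere in my script (not shown here)
response = requests.get("https://www.zalando.es/jordan-air-jordan-mid-zapatillas-altas-blackdark-beetrootwhitehyper-royal-joc11a024-g11.html?hl=1610800800024", headers=headers)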
soup = BeautifulSoup(response.content, 'html.parser')
marca = soup.find("h3", {"class":"OEhtt9 ka2E9k uMhVZi uc9Eq5 pVrzNP _5Yd-hZ"}).text
nombre = soup.find("h1", {"class":"OEhtt9 ka2E9k uMhVZi z-oVg8 pVrzNP w5w9i_ _1PY7tW _9YcI4f"}).text
color = soup.find("span", {"class":"u-6V88 ka2E9k uMhVZi dgII7d z-oVg8 pVrzNP"}).text
precio = soup.find("span", {"class":"uqkIZw ka2E9k uMhVZi FxZV-M z-oVg8 pVrzNP"}).text
talla = soup.find("span", {"class":"u-6V88 ka2E9k uMhVZi FxZV-M z-oVg8 pVrzNP"}).text
imagen = soup.find("img", {"class": "_6uf91T z-oVg8 u-6V88 ka2E9k uMhVZi FxZV-M _2Pvyxl JT3_zV EKabf7 mo6ZnF _1RurXL mo6ZnF PZ5eVw"})['src']
sku355 = api + str(soup.find_all('script')[15]).split('sku":"')[3][:-137]
sku36 = api + str(soup.find_all('script')[15]).split('sku":"')[4][:-139]
sku365 = api + str(soup.find_all('script')[15]).split('sku":"')[5][:-139]
sku375 = api + str(soup.find_all('script')[15]).split('sku":"')[6][:-137]
sku38 = api + str(soup.find_all('script')[15]).split('sku":"')[7][:-139]
sku385 = api + str(soup.find_all('script')[15]).split('sku":"')[8][:-137]
sku39 = api + str(soup.find_all('script')[15]).split('sku":"')[9][:-137]
sku40 = api + str(soup.find_all('script')[15]).split('sku":"')[10][:-139]
sku405 = api + str(soup.find_all('script')[15]).split('sku":"')[11][:-137]
sku41 = api + str(soup.find_all('script')[15]).split('sku":"')[12][:-137]
sku42 = api + str(soup.find_all('script')[15]).split('sku":"')[13][:-139]
sku425 = api + str(soup.find_all('script')[15]).split('sku":"')[14][:-137]
sku43 = api + str(soup.find_all('script')[15]).split('sku":"')[15][:-125]
print(sku355)
print(sku36)
print(sku365)
print(sku375)
print(sku38)
print(sku385)
print(sku39)
print(sku40)
print(sku405)
print(sku41)
print(sku42)
print(sku425)
print(sku43)
Everything works perfectly with these shoes, but when I switch to another product, for example this link, the same code returns something else. What I want is to extract the SKU of each size, regardless of which product link I use.
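A likely reason the code above breaks on other products is that it relies on hard-coded positions: the 16th script tag, the n-th 'sku":"' fragment, and a fixed number of trailing characters to cut off. Those offsets shift whenever the page or the JSON around each SKU changes. A less brittle sketch, assuming the SKUs appear in the page text as "sku":"VALUE" pairs (as the split string in the question suggests), is to scan for that pattern with a regular expression instead of slicing by position. The function name extract_skus and the sample string are hypothetical:

```python
import re

# Matches "sku":"<value>" (optional spaces around the colon) and captures the value.
# Assumption: SKU values never contain an escaped double quote.
SKU_PATTERN = re.compile(r'"sku"\s*:\s*"([^"]+)"')


def extract_skus(page_text):
    """Return every SKU found in page_text, in order of appearance, without duplicates."""
    return list(dict.fromkeys(SKU_PATTERN.findall(page_text)))


if __name__ == "__main__":
    # Hypothetical sample standing in for the text of a <script> tag.
    sample = '{"units":[{"sku":"JOC11A024-G110005000"},{"sku":"JOC11A024-G110055000"}]}'
    for sku in extract_skus(sample):
        print(sku)
```

With the question's code you would call extract_skus(response.text). Unlike the JSON approach in the answer, this does not tell you which size each SKU belongs to.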
I could not reproduce your example; it would help if you improved your question.
Just in case:
if you only want to grab the sizes and their SKUs, try the following:
import requests, json
from bs4 import BeautifulSoup
headers = {"user-agent": "Mozilla/5.0"}
response = requests.get("https://www.zalando.es/jordan-air-jordan-mid-zapatillas-altas-blackdark-beetrootwhitehyper-royal-joc11a024-g11.html?hl=1610800800024", headers=headers)
soup = BeautifulSoup(response.content, 'lxml')
# The product JSON sits inside a CDATA section of the script tag with id z-vegas-pdp-props
json_object = json.loads(soup.select_one('script#z-vegas-pdp-props').contents[0].split('CDATA')[1].split(']>')[0])
for item in json_object[0]['model']['articleInfo']['units']:
    print('sku:{0} - size:{1}'.format(item['id'], item['size']['local']))
Output
sku:JOC11A024-G110005000 - size:35.5
sku:JOC11A024-G110055000 - size:36
sku:JOC11A024-G110006000 - size:36.5
sku:JOC11A024-G110065000 - size:37.5
sku:JOC11A024-G110007000 - size:38
sku:JOC11A024-G110075000 - size:38.5
sku:JOC11A024-G110008000 - size:39
sku:JOC11A024-G110085000 - size:40
sku:JOC11A024-G110009000 - size:40.5
sku:JOC11A024-G110095000 - size:41
sku:JOC11A024-G110010000 - size:42
sku:JOC11A024-G110105000 - size:42.5
sku:JOC11A024-G110011000 - size:43
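As far as I can tell, splitting on 'CDATA' and ']>' in the answer keeps the '[' of '<![CDATA[' and one ']' of ']]>', which is why the parsed result is a one-element list and the code indexes json_object[0]. A sketch that strips the CDATA wrapper explicitly and then builds a dictionary keyed by size (instead of one variable per size as in the question, so a different size range needs no code changes) could look like this. It uses only the standard library; parse_pdp_props, skus_by_size and API_PREFIX are hypothetical names, API_PREFIX stands in for the question's api variable (whose value is not shown), and the sample is a trimmed, hypothetical version of the script contents with sizes written as strings:

```python
import json

CDATA_START = "<![CDATA["
CDATA_END = "]]>"

# Hypothetical placeholder for the question's `api` prefix (its real value is not shown).
API_PREFIX = "https://example.invalid/sku/"


def parse_pdp_props(script_text):
    """Strip an optional <![CDATA[ ... ]]> wrapper and parse the JSON inside."""
    body = script_text.strip()
    if body.startswith(CDATA_START) and body.endswith(CDATA_END):
        body = body[len(CDATA_START):-len(CDATA_END)]
    return json.loads(body)


def skus_by_size(props, prefix=API_PREFIX):
    """Map each local size to prefix + SKU, following model/articleInfo/units."""
    units = props["model"]["articleInfo"]["units"]
    return {unit["size"]["local"]: prefix + unit["id"] for unit in units}


if __name__ == "__main__":
    # Hypothetical, heavily trimmed stand-in for the script's text content.
    sample = (
        '<![CDATA[{"model": {"articleInfo": {"units": ['
        '{"id": "JOC11A024-G110005000", "size": {"local": "35.5"}},'
        '{"id": "JOC11A024-G110055000", "size": {"local": "36"}}'
        ']}}}]]>'
    )
    for size, url in skus_by_size(parse_pdp_props(sample)).items():
        print(size, url)
```

In the scraper itself you would pass soup.select_one('script#z-vegas-pdp-props').contents[0] to parse_pdp_props, and prefix="" gives the bare SKUs. Because the result is a plain dict, it no longer matters how many sizes a product has.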