[英]How do I webscrape nested lists in Python?
网站链接: https ://www.zivame.com/rosaline-chromaticity-knit-cotton-top-florida-key.html?trksrc=category&trkid=search&trkorder=relevance
我想刮什么:短袖款式,宽松舒适(基本上是描述下的要点)
这是我目前使用的代码:
from selenium import webdriver
import re
from bs4 import BeautifulSoup
import requests
result = requests.get("https://www.zivame.com/rosaline-chromaticity-knit-cotton-top-florida-key.html?trksrc=category&trkid=search&trkorder=relevance")
soup = BeautifulSoup(result.text, 'lxml')
page = soup.find('div', id="product-page")
description = page.find('div', id="product-basicdetail")
point1 = description.find('div', id="ff-rm text-size pd-b5")
print(point1)
数据以 JSON 数据的形式出现,您可以直接从源页面抓取数据。
import requests
from lxml import html
r = requests.get('https://www.zivame.com/rosaline-chromaticity-knit-cotton-top-florida-key.html?trksrc=category&trkid=search&trkorder=relevance')
source_page = html.fromstring(r.text)
json_value = source_page.xpath("//script[contains(.,'window.__product=')]/text()")[0]
json_value = json_value.split("{features:{values:[{list:[")[1].split("]}],count:1}}},modelMetaData:")[0]
print(json_value.split(','))
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.