Python requests-HTML library doesn't work sometimes
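I'm using the requests_html library to scrape a website. The get_product_info(url: str) -> dict function I wrote returns the product names, prices, and product URLs found on the page.
I've noticed that when I run the function several times with the same URL, it doesn't always return results.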
from requests_html import HTMLSession

session = HTMLSession()

# MAIN_URL is not defined in the original snippet; assumed here to be the site root.
MAIN_URL = 'https://www.sokmarket.com.tr'
sub_cat2_link = 'https://www.sokmarket.com.tr/bulasik-c-1442'


def get_product_info(url: str) -> dict:
    r2 = session.get(url)
    r2.html.render()  # execute the page's JavaScript before querying the DOM
    product_names = [item.text for item in r2.html.find('main.listing-results strong')]
    product_prices = [item.text for item in r2.html.find('main.listing-results div.pricetag')]
    product_links = [MAIN_URL + item.links.pop() for item in r2.html.find('main.listing-results a.productbox-wrap')]
    return {"prod": product_names, "price": product_prices, "prod_link": product_links}


result = get_product_info(sub_cat2_link)
print(result)
I ran into the same problem. I ended up retrying the render, and so far that has worked for me.
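import time

# r2 is the response returned by session.get(url) in the question's code
for attempt in range(3):
    try:
        r2.html.render()
        # do something
    except Exception:
        time.sleep(5)  # not sure if this is needed
        print(attempt)
    else:
        break  # render succeeded, stop retrying
else:
    # reached only if the loop never hit `break`
    print('all attempts failed')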
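One caveat with the loop above: it only retries when render() raises, while the question describes calls that finish without an error but return empty lists. A minimal sketch of a retry helper that also treats an empty result as a failure might look like the following. The names retry_until_ok, fetch, and is_ok are hypothetical and are not part of requests_html; the attempt count and delay are arbitrary defaults.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_ok(fetch: Callable[[], T],
                   is_ok: Callable[[T], bool],
                   attempts: int = 3,
                   delay: float = 5.0) -> Optional[T]:
    """Call fetch() up to `attempts` times.

    Returns the first result for which is_ok(result) is true,
    or None if every attempt raised or produced a rejected result.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = fetch()
            if is_ok(result):
                return result
            print(f"attempt {attempt}: result rejected by is_ok")
        except Exception as exc:
            print(f"attempt {attempt} failed: {exc!r}")
        # Wait before the next try, but not after the last one.
        if attempt < attempts:
            time.sleep(delay)
    return None
```

With the function from the question, this could be called as retry_until_ok(lambda: get_product_info(sub_cat2_link), lambda r: bool(r["prod"])), so that a page which rendered without any products is fetched again instead of being returned as an empty result.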