Using BeautifulSoup to scrape specific element within a CSS class
scrape element which is preceded by specific element using beautifulsoup and css selector instead of lxml and xpath
I want to scrape the "Services/Products" section from this page: https://www.yellowpages.com/deland-fl/mip/ryan-wells-pumps-20533306?lid=1001782175490
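The text sits inside a dd element, which always follows the dt element containing "Services/Products". This is my current lxml/XPath code: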
import requests
from lxml import html
url = "https://www.yellowpages.com/deland-fl/mip/ryan-wells-pumps-20533306?lid=1001782175490"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:76.0) Gecko/20100101 Firefox/76.0'}
session = requests.Session()
r = session.get(url, timeout=30, headers=headers)
t = html.fromstring(r.content)
# First text node of the <dd> that follows the <dt> containing "Services/Products", or '' if there is none
products = t.xpath('//dd[preceding-sibling::dt[contains(.,"Services/Products")]]/text()[1]')[0] if t.xpath('//dd[preceding-sibling::dt[contains(.,"Services/Products")]]') else ''
Is there a way to get the same text with BeautifulSoup (and, if possible, a CSS selector) instead of lxml and XPath?
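A minimal sketch of the CSS-selector route, assuming BeautifulSoup is backed by soupsieve 2.1 or later, which provides the non-standard `:-soup-contains()` pseudo-class. The HTML string is a hypothetical, trimmed stand-in for the page, not its real markup, and `services_text` is an illustrative helper name. The general sibling combinator `~` plays the role of the XPath `preceding-sibling` axis, and `select_one` returns the first match in document order, like the XPath's `[0]`. One difference: `get_text(strip=True)` returns all of the `dd`'s text, not only its first text node as `text()[1]` does.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's description; the live page may differ.
HTML = """
<section id="business-info">
  <dl>
    <dt>General Info</dt><dd>Family owned.</dd>
    <dt>Services/Products</dt><dd>Pumps|Well Pumps|Water Tanks</dd>
    <dt>Payment method</dt><dd>cash, check</dd>
  </dl>
</section>
"""


def services_text(html: str) -> str:
    """Return the text of the first <dd> after the "Services/Products" <dt>, or ''."""
    soup = BeautifulSoup(html, "html.parser")
    # ":-soup-contains()" is a soupsieve extension; "~ dd" matches <dd> siblings after the <dt>,
    # and select_one keeps the first one in document order.
    dd = soup.select_one(
        'section#business-info dt:-soup-contains("Services/Products") ~ dd'
    )
    # Mirror the XPath version's `else ''` fallback when nothing matches.
    return dd.get_text(strip=True) if dd is not None else ""


print(services_text(HTML))
```

Using `+ dd` instead of `~ dd` would require the `dd` to be the very next element after the `dt`, which is stricter than the original XPath.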
Try BeautifulSoup with Requests; it is much easier. Here is some code:
# BeautifulSoup is an HTML parser; you can search a BeautifulSoup object for specific elements
from bs4 import BeautifulSoup
from requests import get
url = "https://www.yellowpages.com/deland-fl/mip/ryan-wells-pumps-20533306?lid=1001782175490"
obj = BeautifulSoup(get(url).content, "html.parser")
# Get the section that contains the services
business_info = obj.find("section", {"id": "business-info"})
# Get all <dd> elements (you can then pick the one you need from the list)
all_dd = business_info.find_all("dd")
# On this page the third <dd> holds the Services/Products text; the index depends on the page layout
services_and_products = all_dd[2]
# Get the text
text = services_and_products.text
# All done
print(text)
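Hard-coding `all_dd[2]` breaks as soon as a listing has a different set of `dt`/`dd` pairs. One way around that, sketched here on hypothetical markup rather than the real page, is to pair each `dt` label with the next `dd` sibling and look the value up by label; `dl_to_dict` is an illustrative name, not part of any library.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed markup; the live Yellow Pages HTML may differ.
HTML = """
<section id="business-info">
  <dl>
    <dt>General Info</dt>
    <dd>Family owned.</dd>
    <dt>Services/Products</dt>
    <dd>Pumps|Well Pumps|Water Tanks</dd>
  </dl>
</section>
"""


def dl_to_dict(html: str) -> dict:
    """Map each <dt> label in #business-info to the text of the <dd> after it."""
    soup = BeautifulSoup(html, "html.parser")
    info = soup.find("section", {"id": "business-info"})
    pairs = {}
    if info is None:
        return pairs
    for dt in info.find_all("dt"):
        # find_next_sibling skips whitespace text nodes and returns the next <dd> Tag
        dd = dt.find_next_sibling("dd")
        if dd is not None:
            pairs[dt.get_text(strip=True)] = dd.get_text(strip=True)
    return pairs


info = dl_to_dict(HTML)
print(info.get("Services/Products", ""))
```

Looking values up by label keeps working when sections are added, removed, or reordered, which an index cannot guarantee.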
Try something like this on your page:
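# soup is the BeautifulSoup object for the page, e.g. BeautifulSoup(r.content, "html.parser")
inf = soup.select_one('section#business-info dl')
# find_next_sibling("dd") skips any whitespace text node between <dt> and <dd>
target = inf.find("dt", string='Services/Products').find_next_sibling("dd")
for t in target.stripped_strings:
    print(t)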
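Reaching the `dd` through `.next_sibling` (or its older spelling `.nextSibling`) only works when nothing sits between `</dt>` and `<dd>`. In pretty-printed HTML the immediate sibling is a whitespace text node, while `find_next_sibling("dd")` skips text nodes. A small self-contained check, using hypothetical markup:

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup with a newline between </dt> and <dd>,
# as pretty-printed HTML often has.
HTML = """<dl>
<dt>Services/Products</dt>
<dd>Pumps<br>Well Pumps</dd>
</dl>"""

soup = BeautifulSoup(HTML, "html.parser")
dt = soup.find("dt", string="Services/Products")

# The immediate sibling is the "\n" text node, not the <dd> element.
raw = dt.next_sibling
print(isinstance(raw, Tag))

# find_next_sibling filters by tag name, so it lands on the <dd>.
dd = dt.find_next_sibling("dd")
print(dd.name, list(dd.stripped_strings))
```

`stripped_strings` yields each text fragment inside the `dd` with surrounding whitespace removed, which is why the items separated by `<br>` come out as separate strings.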
Output:
Pumps|Well Pumps|Residential Pumps|Water Pumps|Residential Pumps|Well Pumps|Residential Pumps|Commercial Pumps|Well Pumps|Pumps & Water Tanks|Residential & Commercial|Residential & Commercial|Water Tanks|Pump Maintenance|Pump Maintenance|Free Estimates|Service & Repair|Emergency Service Avail|Residential & Commercial|Service & Repair|Residential & Commercial|Pumps|Bonded|Insured|Water Tanks|Deep Wells|4 Wells|Pumps & Water Tanks 4'' Wells|2' - 12' Diameter Wells
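Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.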