簡體   English   中英

美麗的湯提取跨度標簽之間的文本

[英]Beautiful soup extract text between span tags

<span id="priceblock_dealprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 33,990.00 </span>

我需要從上面的html中提取數字33,990.00。

隨着美麗的湯:

from bs4 import BeautifulSoup as bs

content = '''<span id="priceblock_dealprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 33,990.00 </span>'''

soup = bs(content,'html5lib')
print(soup.text.strip())

對於硒來說,這是一項好工作:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()

browser.get(URL)

delay = 30  # seconds
WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'priceblock_dealprice')))
print("Page is ready!")

text = browser.find_element_by_id("priceblock_dealprice").text

為什么要使用selenium 沒必要 如果頁面是JavaScript呈現的,則僅使用selenium 否則,請使用以下命令:

from bs4 import BeautifulSoup
html = '<span id="priceblock_dealprice" class="a-size-medium a-color-price"><span class="currencyINR">&nbsp;&nbsp;</span> 33,990.00 </span>'
soup = BeautifulSoup(html, 'lxml')
text = soup.select_one('span.a-color-price').text.strip()

輸出:

33,990.00

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM