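Extracting with .find() the second of 2 identical 'div' from html page with BS4
I am trying to extract the second of 2 identical 'div' elements from a soup object. When I parse it and extract with the .find() method, the first one is what comes back. How can I tell the script to skip the first one and grab the next one if a certain condition is met? Below is the HTML I am extracting from.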
<div class="a-row a-size-base a-color-secondary"><span>MPAA Rating: PG (Parental Guidance Suggested)</span></div>
</div>
</div></div>
<div class="sg-1"><div class="sg-2">
<div class="a-section a-spacing-none a-spacing-top-small">
<div class="a-row a-size-base a-color-base">
</div>
</div>
<div class="a-section a-spacing-none a-spacing-top-mini">
<div class="a-row a-size-base a-color-secondary"><span>$0.00 with a CONtv trial on Prime Video Channels</span></div>
</div>
Here is the code I am trying:
if '$' not in str(product.find('div', {'class': 'a-row a-size-base a-color-secondary'})):
    print('NOT IN')
    pass
    price = product.find('div', {'class': 'a-row a-size-base a-color-secondary'})
    print(price)
else:
    price = product.find('div', {'class': 'a-row a-size-base a-color-secondary'})
    print(price)
However, the result still gives me this:
NOT IN
<div class="a-row a-size-base a-color-secondary"><span>MPAA Rating: PG (Parental Guidance Suggested)</span></div>
instead of this:
<div class="a-row a-size-base a-color-secondary"><span>$0.00 with a CONtv trial on Prime Video Channels</span></div>
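Any suggestions?
You need find_all and then index into the returned list, because find only returns the first match. You can use select to do the same thing. With bs4 4.7.1 you can also use :contains to target an element by a substring of its inner text (for example, CONtv trial), then use select_one if you want the first match, or select if there are multiple matches. You will want to test for None before trying to access .text.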
from bs4 import BeautifulSoup as bs
import requests
html = '''
<div class="a-row a-size-base a-color-secondary"><span>MPAA Rating: PG (Parental Guidance Suggested)</span></div>
</div>
</div></div>
<div class="sg-1"><div class="sg-2">
<div class="a-section a-spacing-none a-spacing-top-small">
<div class="a-row a-size-base a-color-base">
</div>
</div>
<div class="a-section a-spacing-none a-spacing-top-mini">
<div class="a-row a-size-base a-color-secondary"><span>$0.00 with a CONtv trial on Prime Video Channels</span></div>
</div>
'''
soup = bs(html, 'lxml')
print(soup.find_all('div', {'class': 'a-row a-size-base a-color-secondary'})[1].text)
print(soup.select('.a-color-secondary')[1].text)
print(soup.select_one('.a-color-secondary:contains("CONtv trial")').text)
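The advice above about testing for None before reading .text could be sketched roughly as follows. This is only an illustration under some assumptions: the helper name text_or_none is hypothetical, the HTML is trimmed to the two relevant divs, the built-in 'html.parser' is used instead of lxml, and the selector uses :-soup-contains, which is the spelling that Soup Sieve 2.1 and later prefer over the now deprecated :contains.

```python
from bs4 import BeautifulSoup

# Trimmed-down sample: only the two look-alike divs from the question.
html = '''
<div class="a-row a-size-base a-color-secondary"><span>MPAA Rating: PG (Parental Guidance Suggested)</span></div>
<div class="a-row a-size-base a-color-secondary"><span>$0.00 with a CONtv trial on Prime Video Channels</span></div>
'''

soup = BeautifulSoup(html, 'html.parser')


def text_or_none(tag):
    """Hypothetical helper: return the stripped text, or None when nothing matched."""
    return tag.get_text(strip=True) if tag is not None else None


# select_one returns None when no element matches, so guard before reading text.
price = text_or_none(
    soup.select_one('div.a-color-secondary:-soup-contains("CONtv trial")')
)
missing = text_or_none(
    soup.select_one('div.a-color-secondary:-soup-contains("no such text")')
)

print(price)
print(missing)
```

With this shape, a page that lacks the price row yields None instead of raising AttributeError on .text.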
Loop with find_all:
matches = soup.find_all('div', {'class': 'a-row a-size-base a-color-secondary'})
for item in matches:
    if '$' in str(item):
        print(item.text)
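If only the first div containing a price is wanted, the loop above could be wrapped in a small function that stops at the first hit. A minimal sketch, assuming the same class string as in the question; the names PRICE_CLASSES and first_price_div are made up for illustration, and 'html.parser' stands in for lxml.

```python
from bs4 import BeautifulSoup

html = '''
<div class="a-row a-size-base a-color-secondary"><span>MPAA Rating: PG (Parental Guidance Suggested)</span></div>
<div class="a-row a-size-base a-color-secondary"><span>$0.00 with a CONtv trial on Prime Video Channels</span></div>
'''

# Hypothetical constant: the exact class attribute used in the question.
PRICE_CLASSES = 'a-row a-size-base a-color-secondary'


def first_price_div(product):
    """Return the first matching div whose text contains '$', or None."""
    candidates = product.find_all('div', {'class': PRICE_CLASSES})
    # next() with a default avoids StopIteration when nothing qualifies.
    return next((div for div in candidates if '$' in div.get_text()), None)


soup = BeautifulSoup(html, 'html.parser')
price = first_price_div(soup)
if price is not None:
    print(price.get_text(strip=True))
```

In the question's code, this would replace the repeated product.find(...) calls, which always return the first div regardless of the '$' check.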
Assuming the divs sit directly under <body>, you can use standard Python indexing. In your real code, replace body in the selector with the appropriate element:
data = '''<div class="a-row a-size-base a-color-secondary"><span>MPAA Rating: PG (Parental Guidance Suggested)</span></div>
</div>
</div></div>
<div class="sg-1"><div class="sg-2">
<div class="a-section a-spacing-none a-spacing-top-small">
<div class="a-row a-size-base a-color-base">
</div>
</div>
<div class="a-section a-spacing-none a-spacing-top-mini">
<div class="a-row a-size-base a-color-secondary"><span>$0.00 with a CONtv trial on Prime Video Channels</span></div>
</div>'''
from bs4 import BeautifulSoup
import re
soup = BeautifulSoup(data, 'lxml')
print(soup.select('body > div')[1].text.strip())
Prints:
$0.00 with a CONtv trial on Prime Video Channels
Note the > symbol in select(): it means we want only the <div> elements that are direct children of <body>.
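To make the difference between the child combinator (>) and a plain descendant selector concrete, here is a small sketch. The markup and the id values are invented for the illustration, and an explicit <body> is written out because 'html.parser' does not add one.

```python
from bs4 import BeautifulSoup

# Invented markup: one nested div and two top-level divs under <body>.
html = '''<html><body>
<div id="outer-1"><div id="inner-1">nested</div></div>
<div id="outer-2">top level</div>
</body></html>'''

soup = BeautifulSoup(html, 'html.parser')

# 'body > div' matches only divs that are direct children of <body>.
direct = [div['id'] for div in soup.select('body > div')]

# 'body div' matches every div anywhere below <body>, in document order.
nested = [div['id'] for div in soup.select('body div')]

print(direct)
print(nested)
```

Because the direct-child list skips nested divs, indexing it with [1] picks the second top-level div rather than the second div in the whole document.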