Problem extracting text from a div with BS4
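I am trying to use bs4 and json to extract some key information about the lots up for bidding at this address (https://www.catawiki.com/a/346823-japanese-antiques-auction-samurai). I managed to get most of the information from a dictionary stored in a div (the fields listed in the for loop); however, the "price" and "lot status" are not in the same place. No matter what I do with select, select_one, find, or find_all, these values never show up in the printed results, as if they did not exist in the original markup. What am I doing wrong? Is there a limit to how deep bs4 descends into nested divs? Why is the price clearly visible in the source code but missing from the soup?
Here is my code; the problem occurs in the last two lines: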
from bs4 import BeautifulSoup
import requests
import json
page = requests.get(
'https://www.catawiki.com/a/346823-japanese-antiques-auction-samurai'
)
soup = BeautifulSoup(page.content, 'html.parser')
container = soup.find('main', class_='u-col-9-12 u-move-4-12 u-col-6-9-m u-move-4-9-m')
# extracting the value of 'results'
data_prop = json.loads(container.select_one("div.be-lot-list__loader")['data-props'])
result = data_prop.get('results')
# selecting items from dictionary and attributing values to each one of them
for i in range(len(result)):
    ids = result[i]['id']
    titles = result[i]['title']
    subtitles = result[i]['subtitle']
    favoriteCounts = result[i]['favoriteCount']
    auctionIds = result[i]['auctionId']
    biddingStartTimes = result[i]['biddingStartTime']
# abstracting prices and lot status
prince_lot = container.find_all('div', class_='be-lot__price u-placeholder')
print(prince_lot)
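The price and lot status are loaded from an external URL, https://www.catawiki.com/buyer/api/v1/lots/live?ids=, where ids= takes a comma-separated list of the IDs of the lots on the page. Because these values arrive through that separate request, they are not part of the HTML that requests downloads, so BeautifulSoup never sees them. There is no depth limit involved.
For example: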
from bs4 import BeautifulSoup
import requests
import json
page = requests.get('https://www.catawiki.com/a/346823-japanese-antiques-auction-samurai')
soup = BeautifulSoup(page.content, 'html.parser')
container = soup.find('main', class_='u-col-9-12 u-move-4-12 u-col-6-9-m u-move-4-9-m')
# extracting the value of 'results'
data_prop = json.loads(container.select_one("div.be-lot-list__loader")['data-props'])
results = data_prop.get('results')
lots = requests.get('https://www.catawiki.com/buyer/api/v1/lots/live?ids=' + ','.join(str(result['id']) for result in results)).json()
lots = {l['id']: l for l in lots['lots']}
# uncomment to see all data:
# print(json.dumps(lots, indent=4))
for result in results:
    ids = result['id']
    titles = result['title']
    subtitles = result['subtitle']
    favoriteCounts = result['favoriteCount']
    auctionIds = result['auctionId']
    biddingStartTimes = result['biddingStartTime']
    price = lots[int(ids)]['current_bid_amount']['EUR']
    closed = lots[int(ids)]['closed']
    print(titles)
    print('Price:', price)
    print('Closed:', closed)
    print('-' * 80)
Prints:
Katana - Tamahagane stem - 肥前國住廣任 Hizen kuni ju Hiroto met een NBTHK Hozon certificaat. - Japan - early Edo period, 17th century.
Price: 3700.0
Closed: True
--------------------------------------------------------------------------------
Yoroi (1) - Leather - Samurai - Japan - Meiji period (1868-1912)
Price: 2800.0
Closed: True
--------------------------------------------------------------------------------
Katana, Sword - Tamahagane steel - Rare KIKU engraved- NBTHK - 68cm Nagasa - Japan - 17th century
Price: 11000.0
Closed: True
--------------------------------------------------------------------------------
Tsuba - Iron - Benkei and Mii-Dera Bell - Japan - Edo Period (1600-1868)
Price: 1100
Closed: True
--------------------------------------------------------------------------------
Wakizashi - Steel - Mumei , toegeschreven aan 1e gen. Bizen Yokoyama Sukekane, Hoge kwaliteit montering ! - Japan - ca. 1850
Price: 2400.0
Closed: True
--------------------------------------------------------------------------------
Fuchikashira - Copper, Gold, Shakudo - Japan - Edo Period (1600-1868)
Price: 270
Closed: True
--------------------------------------------------------------------------------
Tsuba - Iron - Flowers - NBTHK Tokubetsu Kichio - Japan - Edo Period (1600-1868)
Price: 340.0
Closed: True
--------------------------------------------------------------------------------
Mengu/ Menpo - Lacquered metal - Lacquered metal facial samurai mask (menpo), with four-piece gorget (nodowa) with blue cords. - Japan - Meiji period (1868-1912)
Price: 390.0
Closed: True
--------------------------------------------------------------------------------
...and so on.
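The lookup in the loop above raises a KeyError if a lot is missing from the API response or has no EUR amount. Below is a minimal sketch of a more tolerant version, assuming the response keeps the shape used in the answer (a "lots" list whose entries carry "id", "current_bid_amount" and "closed"). The helper names build_live_url and summarize are hypothetical, and the sample data is made up purely to mirror those field names, so the sketch runs without touching the network.

```python
# Endpoint taken from the answer above.
LIVE_LOTS_URL = "https://www.catawiki.com/buyer/api/v1/lots/live"


def build_live_url(ids):
    """Return the live-lots URL for an iterable of lot IDs (hypothetical helper)."""
    return LIVE_LOTS_URL + "?ids=" + ",".join(str(i) for i in ids)


def summarize(results, payload, currency="EUR"):
    """Pair each page result with its price and closed flag.

    Missing lots or missing currency amounts yield None instead of raising.
    """
    # Index the API lots by ID for constant-time lookup.
    lots = {lot["id"]: lot for lot in payload.get("lots", [])}
    rows = []
    for result in results:
        lot = lots.get(int(result["id"]), {})
        amounts = lot.get("current_bid_amount") or {}
        rows.append((result["title"], amounts.get(currency), lot.get("closed")))
    return rows


# Made-up sample data; only the field names follow the answer above.
sample_results = [{"id": "1", "title": "Lot A"}, {"id": 2, "title": "Lot B"}]
sample_payload = {
    "lots": [{"id": 1, "current_bid_amount": {"EUR": 100.0}, "closed": True}]
}

for title, price, closed in summarize(sample_results, sample_payload):
    print(title, price, closed)
```

With the real page, the payload would come from requests.get(build_live_url(r['id'] for r in results)).json(), and results from the data-props JSON as in the answer.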