
Problem extracting text from a div with BS4

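I am trying to get some key information about items up for bidding from this address (https://www.catawiki.com/a/346823-japanese-antiques-auction-samurai) using bs4 and json. I managed to get most of the information from a dictionary stored in a div (the fields listed in the for loop), but the price and the lot status are not in the same place. No matter what I try with select, select_one, find, or find_all, those values never show up in the printed result, as if they did not exist in the original markup. What am I doing wrong? Is there a limit to how deep into nested divs bs4 can go? Why is the price clearly visible in the page source but missing from the soup?

Here is my code; the problem occurs in the last two lines: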

from bs4 import BeautifulSoup
import requests
import json
page = requests.get(
    'https://www.catawiki.com/a/346823-japanese-antiques-auction-samurai'
)
soup = BeautifulSoup(page.content, 'html.parser')
container = soup.find('main', class_='u-col-9-12 u-move-4-12 u-col-6-9-m u-move-4-9-m')

# extracting the value of 'results'

data_prop = json.loads(container.select_one("div.be-lot-list__loader")['data-props'])
result = data_prop.get('results')

# selecting items from dictionary and attributing values to each one of them
for i in range(len(result)):
    ids = result[i]['id']
    titles = result[i]['title']
    subtitles = result[i]['subtitle']
    favoriteCounts = result[i]['favoriteCount']
    auctionIds = result[i]['auctionId']
    biddingStartTimes = result[i]['biddingStartTime']

# extracting prices and lot status

price_lot = container.find_all('div', class_='be-lot__price u-placeholder')
print(price_lot)
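
A quick offline check can help separate "BeautifulSoup cannot reach the element" from "the value is not in the downloaded HTML". The sketch below uses a made-up HTML fragment: the nesting and the empty placeholder are assumptions modeled on the class names in the code above, not the site's real markup. It suggests that `find_all` searches descendants at any depth, so an element that is found but has no text most likely arrived empty from the server.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a deeply nested price placeholder with no text,
# as a server might send it before a script fills it in.
html = """
<main>
  <div><div><div><div>
    <div class="be-lot__price u-placeholder"></div>
  </div></div></div></div>
</main>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("main")

# find_all searches all descendants by default (recursive=True),
# so nesting depth does not hide the element.
price_divs = container.find_all("div", class_="be-lot__price u-placeholder")

found = len(price_divs)
texts = [div.get_text(strip=True) for div in price_divs]

print("elements found:", found)
print("texts:", texts)
```

If the same check against the real `page.content` finds the div but no text in it, the value has to come from somewhere other than the initial HTML.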

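The price and lot status are loaded from an external URL, https://www.catawiki.com/buyer/api/v1/lots/live?ids=, where ids= takes a comma-separated list of the IDs of the items on the page. That is why they do not appear in the HTML that requests downloads.

For example: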

from bs4 import BeautifulSoup
import requests
import json

page = requests.get('https://www.catawiki.com/a/346823-japanese-antiques-auction-samurai')
soup = BeautifulSoup(page.content, 'html.parser')
container = soup.find('main', class_='u-col-9-12 u-move-4-12 u-col-6-9-m u-move-4-9-m')

# extracting the value of 'results'
data_prop = json.loads(container.select_one("div.be-lot-list__loader")['data-props'])
results = data_prop.get('results')

lots = requests.get('https://www.catawiki.com/buyer/api/v1/lots/live?ids=' + ','.join(str(result['id']) for result in results)).json()
lots = {l['id']: l for l in lots['lots']}

# uncomment to see all data:
# print(json.dumps(lots, indent=4))

for result in results:
    ids = result['id']
    titles = result['title']
    subtitles = result['subtitle']
    favoriteCounts = result['favoriteCount']
    auctionIds = result['auctionId']
    biddingStartTimes = result['biddingStartTime']

    price = lots[int(ids)]['current_bid_amount']['EUR']
    closed = lots[int(ids)]['closed']

    print(titles)
    print('Price:', price)
    print('Closed:', closed)
    print('-' * 80)

Prints:

Katana - Tamahagane stem -  肥前國住廣任 Hizen kuni ju Hiroto met een NBTHK Hozon certificaat. - Japan - early Edo period, 17th century.
Price: 3700.0
Closed: True
--------------------------------------------------------------------------------
Yoroi (1) - Leather - Samurai - Japan - Meiji period (1868-1912)
Price: 2800.0
Closed: True
--------------------------------------------------------------------------------
Katana, Sword - Tamahagane steel - Rare KIKU engraved- NBTHK - 68cm Nagasa - Japan - 17th century
Price: 11000.0
Closed: True
--------------------------------------------------------------------------------
Tsuba - Iron - Benkei and Mii-Dera Bell - Japan - Edo Period (1600-1868)
Price: 1100
Closed: True
--------------------------------------------------------------------------------
Wakizashi - Steel - Mumei , toegeschreven aan 1e gen. Bizen Yokoyama Sukekane, Hoge kwaliteit montering ! - Japan - ca. 1850
Price: 2400.0
Closed: True
--------------------------------------------------------------------------------
Fuchikashira - Copper, Gold, Shakudo - Japan - Edo Period (1600-1868)
Price: 270
Closed: True
--------------------------------------------------------------------------------
Tsuba - Iron - Flowers - NBTHK Tokubetsu Kichio - Japan - Edo Period (1600-1868)
Price: 340.0
Closed: True
--------------------------------------------------------------------------------
Mengu/ Menpo - Lacquered metal - Lacquered metal facial samurai mask (menpo), with four-piece gorget (nodowa) with blue cords. - Japan - Meiji period (1868-1912)
Price: 390.0
Closed: True
--------------------------------------------------------------------------------

...and so on.
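
The lookup in the answer can also be split into small helpers that are easy to try without network access. This is only a sketch: the helper names (`build_live_url`, `index_lots`, `price_and_status`) and the sample data are hypothetical, and the response shape is assumed to match the fields the code above reads (`lots`, `current_bid_amount`, `closed`). Whether a lot or a currency can actually be missing from the real response is an assumption; the helpers just return None in that case instead of raising KeyError.

```python
from urllib.parse import urlencode

# Endpoint taken from the answer above.
LIVE_URL = "https://www.catawiki.com/buyer/api/v1/lots/live"


def build_live_url(results):
    """Build the live-lots URL for the ids found in the page's data-props."""
    ids = ",".join(str(r["id"]) for r in results)
    # safe="," keeps the commas literal instead of encoding them as %2C.
    return LIVE_URL + "?" + urlencode({"ids": ids}, safe=",")


def index_lots(payload):
    """Map lot id -> lot record, normalizing ids to int."""
    return {int(lot["id"]): lot for lot in payload.get("lots", [])}


def price_and_status(result, lots_by_id, currency="EUR"):
    """Return (price, closed) for one result, or (None, None) if the lot is absent."""
    lot = lots_by_id.get(int(result["id"]))
    if lot is None:
        return None, None
    price = (lot.get("current_bid_amount") or {}).get(currency)
    return price, lot.get("closed")


# Made-up sample data, for illustration only.
sample_results = [
    {"id": "101", "title": "Lot A"},
    {"id": 102, "title": "Lot B"},
    {"id": 103, "title": "Lot C"},
]
sample_payload = {
    "lots": [
        {"id": 101, "current_bid_amount": {"EUR": 3700.0}, "closed": True},
        {"id": 102, "current_bid_amount": None, "closed": False},
    ]
}

url = build_live_url(sample_results)
lots_by_id = index_lots(sample_payload)
rows = [price_and_status(r, lots_by_id) for r in sample_results]

print(url)
for result, (price, closed) in zip(sample_results, rows):
    print(result["title"], "Price:", price, "Closed:", closed)
```

With the real page, the JSON returned by `requests.get(build_live_url(results)).json()` would be passed to `index_lots` in place of the sample payload.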
