
How to capture data from a website as key-value pairs using Python?

import requests
from bs4 import BeautifulSoup

# Assumed request headers; the original snippet used `headers` without defining it.
headers = {'User-Agent': 'Mozilla/5.0'}

test_link = 'https://www.amd.com/en/products/cpu/amd-ryzen-9-3900xt'
r = requests.get(test_link, headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
whole_data = soup.find('div', class_='fieldset-wrapper')

specifications = []        # label texts, e.g. 'Platform'
specifications_value = []  # value texts, e.g. 'Boxed Processor'

# Collect every label (append instead of overwriting the list on each pass).
for variable1 in whole_data.find_all('div', class_='field__label'):
    specifications.append(variable1.text.strip())

# Collect every value into a second, separate list.
for variable2 in whole_data.find_all('div', class_='field__item'):
    specifications_value.append(variable2.text.strip())

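Problem: I am getting the data, but in separate variables and separate for loops. How can I map these two variables to each other as key-value pairs, so that I can check a condition such as: if the key is Platform, give only its value (Boxed Processor)?

I want to capture the data in such a way that if the key is "Platform", only its value (Boxed Processor) is captured. The same applies to all of the other 14 labels.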

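You can loop over a list of expected keys and use :-soup-contains to target the label node for each key. If the match is not None, select the value node(s) that follow it. Otherwise, return ''.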

import requests
from bs4 import BeautifulSoup as bs

links = ['https://www.amd.com/en/products/cpu/amd-ryzen-7-3800xt',
         'https://www.amd.com/en/products/cpu/amd-ryzen-9-3900xt']

all_keys = ['Platform', 'Product Family', 'Product Line', '# of CPU Cores',
            '# of Threads', 'Max. Boost Clock', 'Base Clock', 'Total L2 Cache', 'Total L3 Cache',
            'Default TDP', 'Processor Technology for CPU Cores', 'Unlocked for Overclocking', 'CPU Socket',
            'Thermal Solution (PIB)', 'Max. Operating Temperature (Tjmax)', 'Launch Date', '*OS Support']

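with requests.Session() as s:
    # Send a browser-like User-Agent with every request in this session.
    s.headers.update({'User-Agent': 'Mozilla/5.0'})

    for link in links:
        r = s.get(link)
        soup = bs(r.content, 'lxml')
        specification = {}

        for key in all_keys:
            # Match the value that directly follows the label containing `key`:
            # either a single .field__item or the first item inside .field__items.
            spec = soup.select_one(
                f'.field__label:-soup-contains("{key}") + .field__item, '
                f'.field__label:-soup-contains("{key}") + .field__items .field__item')

            if spec is None:
                # Label not present on this page.
                specification[key] = ''
            elif key == '*OS Support':
                # OS Support lists several values; collect all sibling items.
                specification[key] = [
                    i.text for i in spec.parent.select('.field__item')]
            else:
                specification[key] = spec.text

        print(specification)
        print()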
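
To try the selector logic without hitting the live site, here is a minimal offline sketch. The HTML in SAMPLE_HTML is hypothetical markup that only imitates the label/value layout the selectors above assume, and extract_specs is an illustrative helper name, not part of the answer. It uses the built-in 'html.parser' instead of 'lxml', and assumes a Soup Sieve version that supports :-soup-contains (2.1 or later). Unlike the answer, it calls select() and returns a list whenever a key has several values, rather than special-casing '*OS Support'.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the label/value layout assumed by the selectors.
SAMPLE_HTML = """
<div class="fieldset-wrapper">
  <div class="field">
    <div class="field__label">Platform</div>
    <div class="field__item">Boxed Processor</div>
  </div>
  <div class="field">
    <div class="field__label"># of CPU Cores</div>
    <div class="field__item">12</div>
  </div>
  <div class="field">
    <div class="field__label">*OS Support</div>
    <div class="field__items">
      <div class="field__item">Windows 10 - 64-Bit Edition</div>
      <div class="field__item">Ubuntu x86 64-Bit</div>
    </div>
  </div>
</div>
"""


def extract_specs(html, keys):
    """Return {key: value}; value is '' if missing, a list if several match."""
    soup = BeautifulSoup(html, "html.parser")
    specs = {}
    for key in keys:
        # Same two selector branches as the answer: a single value next to
        # the label, or several values wrapped in a .field__items container.
        selector = (
            f'.field__label:-soup-contains("{key}") + .field__item, '
            f'.field__label:-soup-contains("{key}") + .field__items .field__item'
        )
        values = [node.get_text(strip=True) for node in soup.select(selector)]
        if not values:
            specs[key] = ""          # label not found
        elif len(values) == 1:
            specs[key] = values[0]   # single value
        else:
            specs[key] = values      # multiple values, kept as a list
    return specs


specs = extract_specs(
    SAMPLE_HTML, ["Platform", "# of CPU Cores", "*OS Support", "Launch Date"]
)

# The check the question asks for: read one value by its key.
print(specs["Platform"])
```

Because :-soup-contains is a substring match, a short key can also match a longer label that contains it, so the key list should use label texts that are specific enough to be unambiguous.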

