
Python response decode corrupted

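I am new to Python and data scraping.
I am trying to fetch data about certain car models with a Python script.
The problem is that Python decodes the response into jumbled text that does not match the response content.
I found that the information I need is inside one of the script tags in the HTML head element.
Here is the simplified script I am using: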

import requests
import lxml.html
urls = "https://www.ultimatespecs.com/car-specs/Audi/119438/Audi-A3-(8Y)-Sedan-35-TDI.html"
res = requests.get(urls)
print(res.headers)
tree = lxml.html.fromstring(res.content)
helem = lxml.html.tostring(tree.xpath('//head/script[@type=\'application/ld+json\']')[0])
print(helem)
print(helem.decode('utf-8'))

Response headers:

{'Date': 'Sun, 14 Feb 2021 10:54:09 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=d938bb826c443ab15f20272199e2f18141613300048; expires=Tue, 16-Mar-21 10:54:08 GMT; path=/; domain=.ultimatespecs.com; HttpOnly; SameSite=Lax, PHPSESSID=ea60d27909207143c5ccd860e6fb3b76; path=/', 'Expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'Cache-Control': 'no-store, no-cache, must-revalidate', 'Pragma': 'no-cache', 'Vary': 'Accept-Encoding,User-Agent', 'CF-Cache-Status': 'DYNAMIC', 'cf-request-id': '0841c63a9c0000b61bda381000000001', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Report-To': '{"group":"cf-nel","endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report?s=kB6vGZn5zLDoI%2FeQt9AF8174Aanh5La%2Bvh2beLKlCdnrHv5jbEIhC0h3FUVb56wTidKKSMFq1zuWhbakIydNto3EBXMZRt%2BwLD2FZgMsmHH53aJpanc%3D"}],"max_age":604800}', 'NEL': '{"max_age":604800,"report_to":"cf-nel"}', 'Server': 'cloudflare', 'CF-RAY': '62163fd76b76b61b-TLL', 'Content-Encoding': 'gzip'}

helem as bytes:

b'\r\t\t{\r\t\t"@context": "http://schema.org/",\r\t\t"@type": "Car",\r\t\t"brand": "Audi",\r\t\t"manufacturer": "Audi",\r\t\t"name": "Audi A3 (8Y) Sedan 35 TDI","description":"35 TDI Specs:Power 150 PS (148 hp); Diesel;Average consumption:3.6 l/100km (65 MPG);Dimensions: Length:449.5 cm (176.97 inches); Width:181.6 cm (71.5 inches);Height:142.5 cm (56.1 inches);Weight:1390 kg (3064 lbs);Model Years 2020,2021","productionDate":"2020","mainEntityOfPage": "https://www.ultimatespecs.com/car-specs/Audi/119438/Audi-A3-(8Y)-Sedan-35-TDI.html","image": {\r\t\t\t\t"@type": "ImageObject",\r\t\t\t\t\t"contentUrl": "https://www.ultimatespecs.com/wallpaper.php?id=7243"\r\t\t\t\t\t}\r\t\t\t\t\t,"height": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitCode": "CMT",\r\t\t\t"value": "142.5"\r\t\t\t},"width": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitCode": "CMT",\r\t\t\t"value": "181.6"\r\t\t\t},"weight": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitCode": "KGM",\r\t\t\t"value": "1390"\r\t\t\t},"accelerationTime": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitCode": "SEC",\r\t\t\t"value": "8.4"\r\t\t\t},"driveWheelConfiguration": {\r\t\t\t"@type": "DriveWheelConfigurationValue",\r\t\t\t"@id": "https://schema.org/FrontWheelDriveConfiguration"},"bodyType": "Sedan","cargoVolume": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitCode": "LTR", "value": "425"},"emissionsCO2": "96","fuelCapacity": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitCode": "LTR", "value": "50"\r\t\t\t},"fuelConsumption": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitText": "L/100 km",\r\t\t\t"valueReference": "Average",\r\t\t\t"value": "3.6"\r\t\t\t},"fuelEfficiency": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitText": "US MPG",\r\t\t\t"valueReference": "Average",\r\t\t\t"value": "65"\r\t\t\t},"fuelType": "Diesel","numberOfDoors": "4","vehicleSeatingCapacity": "5","numberOfForwardGears": "7","vehicleTransmission": "Dualclutch Automatic","wheelbase": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitCode": "CMT", "value": "263.6"\r\t\t\t},"speed": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitCode": "KMH", "value": "232"\r\t\t\t},"vehicleConfiguration": "35 TDI","vehicleEngine":[\r\t\t{\r\t\t"@type": "EngineSpecification",\r\t\t"fuelType":"Diesel","engineDisplacement": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitCode": "CMQ",\r\t\t\t"value": "1968"\r\t\t\t},"torque": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitCode": "NU",\r\t\t\t"value": "360"},"enginePower": {\r\t\t\t"@type": "QuantitativeValue",\r\t\t\t"unitCode": "N12",\r\t\t\t"value": "150"}}]}'

helem as text:

"value": "150"}}]},: {cement": {eEngine":[SeatingCapacity": "5","numberOfForwardGears": "7","vehicleTransmission": "Dualclutch Automatic","wheelbase": {(176.97 inches); Width:181.6 cm (71.5 inches);Height:142.5 cm (56.1 inches);Weight:1390 kg (3064 lbs);Model Years 2020,2021","productionDate":"2020","mainEntityOfPage": "https://www.ultimatespecs.com/car-specs/Audi/119438/Audi-A3-(8Y)-Sedan-35-TDI.html","image": {

As you can see, the decoded text overlaps itself several times.
What am I doing wrong?

If I understand correctly, you are looking for the data shown in the output below.


Code

import json
import pprint as pp

import lxml.html
import requests

urls = "https://www.ultimatespecs.com/car-specs/Audi/119438/Audi-A3-(8Y)-Sedan-35-TDI.html"
res = requests.get(urls)
tree = lxml.html.fromstring(res.content)
# .text gives the contents of the first JSON-LD script tag as a str
helem = tree.xpath("//head/script[@type='application/ld+json']")[0].text
# Parse the JSON-LD text into a Python dict
data = json.loads(helem)
pp.pprint(data)
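
The code above can pass the script text straight to json.loads because JSON treats carriage returns and tabs between tokens as insignificant whitespace. Here is a small offline sketch of that step. The `sample` string is a hypothetical, shortened stand-in for the real script contents, written with the same \r\t\t layout seen in the bytes output of the question.

```python
import json

# Hypothetical, shortened stand-in for the text of the JSON-LD script tag.
# It keeps the bare "\r" plus tabs that appear in the question's bytes output.
sample = (
    '\r\t\t{\r\t\t"@context": "http://schema.org/",'
    '\r\t\t"@type": "Car",'
    '\r\t\t"brand": "Audi",'
    '\r\t\t"height": {\r\t\t\t"@type": "QuantitativeValue",'
    '\r\t\t\t"unitCode": "CMT",'
    '\r\t\t\t"value": "142.5"\r\t\t\t}}\r\t'
)

# JSON allows space, tab, line feed and carriage return between tokens,
# so no manual cleanup is needed before parsing.
data = json.loads(sample)

print(data["brand"])
print(data["height"]["value"])
```

Once the text is parsed, individual fields can be read by key instead of being printed as one raw string.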

Output

{'@context': 'http://schema.org/',
 '@type': 'Car',
 'accelerationTime': {'@type': 'QuantitativeValue',
                      'unitCode': 'SEC',
                      'value': '8.4'},
 'bodyType': 'Sedan',
 'brand': 'Audi',
 'cargoVolume': {'@type': 'QuantitativeValue',
                 'unitCode': 'LTR',
                 'value': '425'},
 'description': '35 TDI Specs:Power 150 PS (148 hp); Diesel;Average '
                'consumption:3.6 l/100km (65 MPG);Dimensions: Length:449.5 cm '
                '(176.97 inches); Width:181.6 cm (71.5 inches);Height:142.5 cm '
                '(56.1 inches);Weight:1390 kg (3064 lbs);Model Years 2020,2021',
 'driveWheelConfiguration': {'@id': 'https://schema.org/FrontWheelDriveConfiguration',
                             '@type': 'DriveWheelConfigurationValue'},
 'emissionsCO2': '96',
 'fuelCapacity': {'@type': 'QuantitativeValue',
                  'unitCode': 'LTR',
                  'value': '50'},
 'fuelConsumption': {'@type': 'QuantitativeValue',
                     'unitText': 'L/100 km',
                     'value': '3.6',
                     'valueReference': 'Average'},
 'fuelEfficiency': {'@type': 'QuantitativeValue',
                    'unitText': 'US MPG',
                    'value': '65',
                    'valueReference': 'Average'},
 'fuelType': 'Diesel',
 'height': {'@type': 'QuantitativeValue', 'unitCode': 'CMT', 'value': '142.5'},
 'image': {'@type': 'ImageObject',
           'contentUrl': 'https://www.ultimatespecs.com/wallpaper.php?id=7243'},
 'mainEntityOfPage': 'https://www.ultimatespecs.com/car-specs/Audi/119438/Audi-A3-(8Y)-Sedan-35-TDI.html',
 'manufacturer': 'Audi',
 'name': 'Audi A3 (8Y) Sedan 35 TDI',
 'numberOfDoors': '4',
 'numberOfForwardGears': '7',
 'productionDate': '2020',
 'speed': {'@type': 'QuantitativeValue', 'unitCode': 'KMH', 'value': '232'},
 'vehicleConfiguration': '35 TDI',
 'vehicleEngine': [{'@type': 'EngineSpecification',
                    'engineDisplacement': {'@type': 'QuantitativeValue',
                                           'unitCode': 'CMQ',
                                           'value': '1968'},
                    'enginePower': {'@type': 'QuantitativeValue',
                                    'unitCode': 'N12',
                                    'value': '150'},
                    'fuelType': 'Diesel',
                    'torque': {'@type': 'QuantitativeValue',
                               'unitCode': 'NU',
                               'value': '360'}}],
 'vehicleSeatingCapacity': '5',
 'vehicleTransmission': 'Dualclutch Automatic',
 'weight': {'@type': 'QuantitativeValue', 'unitCode': 'KGM', 'value': '1390'},
 'wheelbase': {'@type': 'QuantitativeValue',
               'unitCode': 'CMT',
               'value': '263.6'},
 'width': {'@type': 'QuantitativeValue', 'unitCode': 'CMT', 'value': '181.6'}}
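
As for why the decoded text in the question looked overlapped: a likely cause, judging from the bytes output, is that the script text separates its lines with bare \r characters and no \n. Most terminals respond to \r by moving the cursor back to the start of the current line, so each segment is drawn over the previous one. The following is a minimal sketch under that assumption, using a hypothetical short fragment in place of the real data.

```python
# Hypothetical fragment that uses bare carriage returns, like the script text.
text = '\r\t\t{\r\t\t"brand": "Audi",\r\t\t"name": "Audi A3"}'

# repr() shows the control characters as escapes, so the terminal
# does not act on them.
print(repr(text))

# Converting "\r\n" and bare "\r" to "\n" puts each segment on its own line.
normalized = text.replace("\r\n", "\n").replace("\r", "\n")
print(normalized)

# Optional: collect the non-empty lines without their indentation.
lines = [line.strip() for line in normalized.splitlines() if line.strip()]
print(lines)
```

The data itself was never corrupted, only its display. That is consistent with json.loads parsing the same text without trouble.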


