Web scraping with Python: requests.get() returns status_code 200 but no JSON data can be extracted

I am trying to scrape Shopee item information using Python.

Take https://shopee.com.my/All%20in%20one%20pc%20Intel%20core%20I3/I5/I7%20Dual-core%208G%20RAM%20128%20gb%20SSD%20With%20optical%20drive%20CD%2023.8%20Inch%20computer%20Office%20Desktop%20All-in-one%20desktop%20Support%20WiFi-i.206039726.5859069631 as an example.

Since the page loads its data via AJAX, I am trying to extract it from: https://shopee.com.my/api/v2/item/get?itemid=5859069631&shopid=206039726

When I open the above link in my browser, it works perfectly and returns all the info I need. But when I try to fetch it using requests.get(), the response is JSON with no actual data:

{'item': None, 'version': 'be8962b139db1273b88c291407137744', 'data': None, 'error_msg': None, 'error': None}
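Because this placeholder still arrives with status code 200, checking `status_code` alone cannot tell it apart from a real result. A minimal sketch of a check on the payload itself follows; `has_item_data` is a hypothetical helper name, and the assumption that a missing `item` always signals the empty response is based only on the output quoted above.

```python
def has_item_data(payload):
    """Return True if the parsed JSON carries an actual item record."""
    return isinstance(payload, dict) and payload.get("item") is not None


# The placeholder quoted above: well-formed JSON, but "item" and "data" are None.
placeholder = {
    "item": None,
    "version": "be8962b139db1273b88c291407137744",
    "data": None,
    "error_msg": None,
    "error": None,
}

print(has_item_data(placeholder))  # False
```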

My code:

import requests

url = 'https://shopee.com.my/api/v2/item/get?itemid=5859069631&shopid=206039726'

response = requests.get(url)

if response.status_code == 200:
    item_info = response.json()
    print(item_info)

The strange thing is that the same code works perfectly with url = 'https://shopee.com.my/api/v4/product/get_shop_info?shopid=206039726' when I try to extract the shop info.

I am not sure why this happens or how I should fix it. Many thanks!

Add a User-Agent HTTP header to obtain the correct result:

import json
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}
url = "https://shopee.com.my/api/v2/item/get?itemid=5859069631&shopid=206039726"

response = requests.get(url, headers=headers)

if response.status_code == 200:
    item_info = response.json()
    # Print inside the branch so item_info is always defined here.
    print(json.dumps(item_info, indent=4))

Prints:

{
    "item": {
        "itemid": 5859069631,
        "price_max_before_discount": -1,
        "item_status": "n",
        "can_use_wholesale": false,
        "brand_id": null,
        "show_free_shipping": true,
        "estimated_days": 6,
        "is_hot_sales": false,
        "is_slash_price_item": false,
        "upcoming_flash_sale": null,
        "slash_lowest_price": null,
        "is_partial_fulfilled": false,
        "condition": 2,
        "show_original_guarantee": true,
        "add_on_deal_info": null,
        "is_non_cc_installment_payment_eligible": false,
        "categories": [
            {
                "display_name": "Computer & Access",
                "catid": 340,
                "image": null,
                "no_sub": true,
                "is_default_subcat": true,
                "block_buyer_platform": null
            },
            {
                "display_name": "Des",
                "catid": 17578,
                "image": null,
                "no_sub": false,
                "is_default_subcat": true,
                "block_buyer_platform": null
            },
            {
                "display_name": "All-in-one Des",
                "catid": 20050,
...
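Once the header is in place, the fix can be wrapped in a small helper that fails loudly if the empty placeholder comes back. This is only a sketch: `fetch_item` and `category_path` are hypothetical names, the 10-second timeout is an arbitrary choice, and the field names (`item`, `categories`, `display_name`) are taken from the output above. By default `requests` identifies itself with a User-Agent of the form `python-requests/<version>`, which is presumably what the endpoint reacts to.

```python
import requests

# Browser-like User-Agent copied from the answer above.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}


def fetch_item(url, timeout=10):
    """Request the item endpoint and return its "item" dict (hypothetical helper)."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # turn HTTP error statuses into exceptions
    item = response.json().get("item")
    if item is None:
        # Status 200 but no data: the placeholder described in the question.
        raise ValueError("response contained no item data")
    return item


def category_path(item):
    """Join the category display names, outermost first."""
    categories = item.get("categories") or []
    return " > ".join(c["display_name"] for c in categories)
```

With the item from the question, `category_path(fetch_item(url))` would be expected to begin with the category names shown in the output above, assuming the endpoint still responds in the same way.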
