
Web scraping with Python: the request returns status_code 200, but response.json() contains no data

I am trying to scrape Shopee item information using Python.

Take https://shopee.com.my/All%20in%20one%20pc%20Intel%20core%20I3/I5/I7%20Dual-core%208G%20RAM%20128%20gb%20SSD%20With%20optical%20drive%20CD%2023.8%20Inch%20computer%20Office%20Desktop%20All-in-one%20desktop%20Support%20WiFi-i.206039726.5859069631 as an example.

Since the page loads its data via AJAX, I am trying to extract it from the API endpoint: https://shopee.com.my/api/v2/item/get?itemid=5859069631&shopid=206039726

When I open the above link in my browser, it works perfectly and returns all the info I need. But when I try to fetch it using requests.get(), the response is a JSON object with no actual data:

{'item': None, 'version': 'be8962b139db1273b88c291407137744', 'data': None, 'error_msg': None, 'error': None}

My code:

import requests

url = 'https://shopee.com.my/api/v2/item/get?itemid=5859069631&shopid=206039726'

response = requests.get(url)

if response.status_code == 200:
    item_info = response.json()
    print(item_info)

The strange thing is that the same code works perfectly with url = 'https://shopee.com.my/api/v4/product/get_shop_info?shopid=206039726' when I try to extract the shop info.

I am not sure why this happens or how I should fix it. Many thanks!

Add a User-Agent HTTP header to obtain the correct result:

import json
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}
url = "https://shopee.com.my/api/v2/item/get?itemid=5859069631&shopid=206039726"

response = requests.get(url, headers=headers)

if response.status_code == 200:
    item_info = response.json()
    # Print inside the check so item_info is always defined at this point.
    print(json.dumps(item_info, indent=4))

Prints:

{
    "item": {
        "itemid": 5859069631,
        "price_max_before_discount": -1,
        "item_status": "n",
        "can_use_wholesale": false,
        "brand_id": null,
        "show_free_shipping": true,
        "estimated_days": 6,
        "is_hot_sales": false,
        "is_slash_price_item": false,
        "upcoming_flash_sale": null,
        "slash_lowest_price": null,
        "is_partial_fulfilled": false,
        "condition": 2,
        "show_original_guarantee": true,
        "add_on_deal_info": null,
        "is_non_cc_installment_payment_eligible": false,
        "categories": [
            {
                "display_name": "Computer & Access",
                "catid": 340,
                "image": null,
                "no_sub": true,
                "is_default_subcat": true,
                "block_buyer_platform": null
            },
            {
                "display_name": "Des",
                "catid": 17578,
                "image": null,
                "no_sub": false,
                "is_default_subcat": true,
                "block_buyer_platform": null
            },
            {
                "display_name": "All-in-one Des",
                "catid": 20050,
...
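
Since the endpoint answered with status 200 in both cases, the status code alone does not show whether the item data actually arrived. As a rough sketch (the helper name extract_item is hypothetical and not part of requests or of Shopee's API), the decoded body can be checked for a populated "item" object before it is used. The two sample payloads are copied or trimmed from the outputs shown in this thread, so the snippet runs without a network connection:

```python
def extract_item(payload):
    """Return the "item" object from a decoded API response, or None if absent."""
    # A 200 status only says the request was served; the body can still be empty.
    if not isinstance(payload, dict):
        return None
    item = payload.get("item")
    return item if isinstance(item, dict) else None


# The empty body from the question (request sent without a User-Agent header).
empty = {"item": None, "version": "be8962b139db1273b88c291407137744",
         "data": None, "error_msg": None, "error": None}
# A trimmed version of the populated body printed above.
full = {"item": {"itemid": 5859069631, "estimated_days": 6}}

for payload in (empty, full):
    item = extract_item(payload)
    if item is None:
        print("no item data; check the request headers")
    else:
        print("itemid:", item["itemid"])
```

In the real script the same check would be applied as extract_item(response.json()), which makes the empty-response case visible immediately instead of failing later on a None value.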

