
How to scrape data from Telemart using Python?

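import bs4
import requests

# Fetch the category page. The product data is loaded separately via XHR,
# so it does not appear in this initial HTML.
response = requests.get(
    'https://www.telemart.pk/mobile-and-tablets/mobile-phone.html')
soup = bs4.BeautifulSoup(response.text, features='lxml')
print(soup)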

I want to scrape the price, image_link, product_link, and title of each item, but the data is loaded via XHR. How can I scrape data that comes from an XHR request?

You can get the content from that page by sending HTTP POST requests to the endpoint the page itself calls via XHR, as shown below.

import requests
from bs4 import BeautifulSoup

link = 'https://www.telemart.pk/index.php/home/listed/click/{}'

payload = {
    'category': '35',
    'rating': '',
    'sub_category': '',
    'brand': '',
    'attributesids': '',
    'ishiddenprice': '0',
    'featured': '',
    'range': '6499;471999',
    'express_delivery': '',
    'text': '',
    'view_type': 'grid'
}

with requests.Session() as s:
    # The first request sets the CSRF cookie; its value must be sent back
    # in the POST payload.
    r = s.get(link)
    csrf_id = r.cookies['csrf_cookie_name']
    payload['csrf_test_name'] = csrf_id

    # The number in the URL appears to be an item offset, advanced 48 at a time.
    for i in range(0, 144, 48):
        res = s.post(
            link.format(i),
            data=payload,
            headers={"x-requested-with": "XMLHttpRequest"},
        )
        soup = BeautifulSoup(res.text, "html.parser")
        for item in soup.select(".product-details"):
            product_name = item.select_one("h2.product-name").get_text(strip=True)
            price = item.select_one(".price > ins").get_text(strip=True)
            print(product_name, price)
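
The loop above hard-codes range(0, 144, 48), which yields the offsets 0, 48 and 96. A minimal sketch of how those request URLs could be generated for any number of items, assuming the number in the URL really is an item offset and that each response holds 48 items (neither is confirmed here). The helpers page_urls and absolute_url, and the parameters total_items and page_size, are hypothetical names introduced for illustration. absolute_url may be useful once you extract product_link and image_link, in case the href or src values turn out to be relative.

```python
from urllib.parse import urljoin

LINK = 'https://www.telemart.pk/index.php/home/listed/click/{}'
SITE_ROOT = 'https://www.telemart.pk/'


def page_urls(total_items, page_size=48, template=LINK):
    """Return one request URL per batch of `page_size` items.

    Assumes the endpoint takes an item offset in the URL (an assumption
    based on the range(0, 144, 48) loop in the answer).
    """
    return [template.format(offset)
            for offset in range(0, total_items, page_size)]


def absolute_url(href, base=SITE_ROOT):
    """Resolve a possibly relative link against the site root.

    Links that are already absolute are returned unchanged.
    """
    return urljoin(base, href)


if __name__ == '__main__':
    for url in page_urls(144):
        print(url)
```

With total_items=144 this produces the same three URLs the answer requests; you would still need to find the real item count yourself, for example by continuing until a response contains no .product-details elements.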

