
Python Requests Chunk Data

I'm trying to scrape a site with Python's requests library, but it's chunking my data. The site in question is a bit weird: it returns HTML from a POST, and when I read the response, requests prints only about 1/5 of the page. Here's the code:

import requests
LIST_ITEMS_URL = 'http://www.solicitador.org/vendas/consultas/ListaBens.jsp'

r = requests.post(
    LIST_ITEMS_URL,
    data={
        'iddistrito': 13,
        'idconcelho': 6,
        'tipo_bem': 1,
        'pageOri': 'PesquisaAvancada.jsp'
    },
    headers={
        'Content-Type': 'application/x-www-form-urlencoded',
        'Content-Length': '111',
        'Cookie': 'JSESSIONID=0002K67DUGhI4ioO6eE3oCeKYSQ:-G1B89M',
        'Upgrade-Insecure-Requests': '1'
    }
)
print(r.content)

Also, when I use a browser API client, the response is about 31 KB, but with requests it is only around 8192 bytes.

Does anyone have any idea what's limiting the response here?

You're missing pagination in your code. A quick glance at the page revealed that you can get the rest of the pages by using the currentPage parameter in your POST request: currentPage: 1 gets the second page, currentPage: 2 gets the third page, and so on.
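A minimal sketch of that loop, in case it helps. It assumes the first page corresponds to currentPage 0 and that you already know how many pages to ask for; fetch_pages, page_payload and page_count are names made up for this example, and the site may signal the last page in some other way.

```python
def page_payload(base_form, page_index):
    """Return a copy of the form fields with currentPage added.

    Assumption: page_index 0 is the first page, 1 the second, and so on.
    """
    payload = dict(base_form)
    payload['currentPage'] = page_index
    return payload


def fetch_pages(session, url, base_form, page_count):
    """POST the form once per page and return the HTML bodies in order.

    `session` is anything with a requests-style post(url, data=...) method,
    for example requests.Session().
    """
    pages = []
    for page_index in range(page_count):
        response = session.post(url, data=page_payload(base_form, page_index))
        response.raise_for_status()  # stop early on an HTTP error
        pages.append(response.text)
    return pages


def main():
    # Imported here so the helpers above can be used without a network.
    import requests

    url = 'http://www.solicitador.org/vendas/consultas/ListaBens.jsp'
    form = {
        'iddistrito': 13,
        'idconcelho': 6,
        'tipo_bem': 1,
        'pageOri': 'PesquisaAvancada.jsp',
    }
    # page_count=5 is a guess based on "only 1/5 of the page" in the question.
    html_pages = fetch_pages(requests.Session(), url, form, page_count=5)
    print(sum(len(page) for page in html_pages))
```

Call main() to run it against the site. Passing the session in as an argument keeps the paging logic separate from the network, so you can try it with a stand-in object first.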

Issues I've found in the given example:

  • Remove the Cookie and Content-Length fields from the headers.
  • Remove data.

With those changes, the POST result should be complete.
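The header part of that advice can be sketched as follows. This is only an illustration: clean_headers, encoded_body_length and post_form are hypothetical helper names, and the sketch keeps sending the form fields. requests works out Content-Length from the body by itself, and a requests.Session stores whatever cookies the server sets, so neither needs to be written by hand. The length helper lets you compare the real size of the encoded form with the hard-coded 111.

```python
from urllib.parse import urlencode

FORM = {
    'iddistrito': 13,
    'idconcelho': 6,
    'tipo_bem': 1,
    'pageOri': 'PesquisaAvancada.jsp',
}

ORIGINAL_HEADERS = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'Content-Length': '111',
    'Cookie': 'JSESSIONID=0002K67DUGhI4ioO6eE3oCeKYSQ:-G1B89M',
    'Upgrade-Insecure-Requests': '1',
}

# Headers that requests (or a requests.Session) fills in on its own.
MANAGED_HEADERS = {'content-length', 'cookie'}


def clean_headers(headers):
    """Return a copy of the headers without the hand-written managed ones."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in MANAGED_HEADERS
    }


def encoded_body_length(form):
    """Length in bytes of the form once it is URL-encoded."""
    return len(urlencode(form).encode('ascii'))


def post_form(session, url, form, headers):
    """POST the form through a requests-style session with cleaned headers."""
    return session.post(url, data=form, headers=clean_headers(headers))


print(sorted(clean_headers(ORIGINAL_HEADERS)))
print(encoded_body_length(FORM))
```

To send it for real, something like post_form(requests.Session(), LIST_ITEMS_URL, FORM, ORIGINAL_HEADERS) should do, with LIST_ITEMS_URL as defined in the question.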


 