
curl request not transformed well in python3


I am using Python 3.7.7 and trying to make a POST request. The request always fails with a 403 error.
In Chrome the problem does not appear: the status code is 200 (OK) and the POST returns the expected data.
To narrow the problem down, I extracted the curl request from the browser and tried to hardcode it in Python.
Here is the working curl command extracted from the browser:

curl 'http://www.zjnsf.gov.cn/h/01/news_list.aspx?t=%u57fa%u91d1%u7ed3%u9898%u9879%u76ee%u6e05%u5355' --data '__VIEWSTATE=%2FwEPDwUJOTIwODkyNjA5D2QWAmYPZBYCZg9kFgICAw9kFgICAQ9kFgICBQ9kFgQCAQ9kFgICAw8PFgIeBFRleHQFDOS%2FoeaBr%2BaQnOe0omRkAgMPZBYCAgcPZBYCZg8PFgQeBFJPV1MC5BIeBUlOREVYZmQWCGYPDxYCHgdWaXNpYmxlaGRkAgEPDxYCHwNoZGQCAg8PFgIfA2dkZAIDDw8WAh8DZ2RkZN0f2oaGWjQWIew4DBiZrFuBSFq0&__VIEWSTATEGENERATOR=E98323FB&ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24newData=&ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl16=&ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl05=&ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl06=' --compressed --insecure

And the Python code for it:

post_data = {}
url = 'http://www.zjnsf.gov.cn/h/01/news_list.aspx?t=%u57fa%u91d1%u7ed3%u9898%u9879%u76ee%u6e05%u5355'
post_data['__VIEWSTATE'] = '%2FwEPDwUJOTIwODkyNjA5D2QWAmYPZBYCZg9kFgICAw9kFgICAQ9kFgICBQ9kFgQCAQ9kFgICAw8PFgIeBFRleHQFDOS%2FoeaBr%2BaQnOe0omRkAgMPZBYCAgcPZBYCZg8PFgQeBFJPV1MC5BIeBUlOREVYZmQWCGYPDxYCHgdWaXNpYmxlaGRkAgEPDxYCHwNoZGQCAg8PFgIfA2dkZAIDDw8WAh8DZ2RkZN0f2oaGWjQWIew4DBiZrFuBSFq0'
post_data['__VIEWSTATEGENERATOR'] = 'E98323FB'
post_data['ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24newData'] = ''
post_data['ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl16'] = ''
post_data['ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl05'] = ''
post_data['ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl06'] = ''
p_data = json.dumps(post_data)
news_search_page = req_session.post(url, data=p_data)

This always responds with a 403 error.
Any idea what might be wrong, or in what direction to investigate?

Maybe you need to add a header with some content. Some APIs require you to state which browser (User-Agent) you are using. You can create a header like this:

HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

The API will then think that you are using Chrome on a Mac.

By the way, the headers are passed to the request like this:

news_search_page = req_session.post(url, data=p_data, headers=HEADERS)

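You can also try passing the dictionary with json instead of data (pass post_data itself here, because p_data is already a JSON string and would be encoded a second time):

news_search_page = req_session.post(url, json=post_data, headers=HEADERS)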
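For context on data versus json: curl --data sends a form-encoded body (key=value pairs joined by '&'), while json.dumps produces a JSON document, so the two are not interchangeable from the server's point of view. A minimal sketch using only the standard library; the field names and values here are shortened, hypothetical stand-ins, not the real form:

```python
import json
from urllib.parse import urlencode

# Hypothetical, shortened form fields (not the real __VIEWSTATE payload).
fields = {"__VIEWSTATEGENERATOR": "E98323FB", "ctl00$newData": ""}

# Form-encoded body, the shape that curl --data sends.
form_body = urlencode(fields)

# JSON body, the shape that json.dumps produces.
json_body = json.dumps(fields)

print(form_body)  # __VIEWSTATEGENERATOR=E98323FB&ctl00%24newData=
print(json_body)  # {"__VIEWSTATEGENERATOR": "E98323FB", "ctl00$newData": ""}
```

If the server expects the same body the browser sent, the form-encoded shape is the one that matches the curl command.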

OK, so I tried it myself. I don't know if I did it right because I don't understand Chinese, but I found that the header with a User-Agent was essential for the request to succeed.

With this code I got a 200 response. I hope you can use it.

import requests
HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
link = 'http://www.zjnsf.gov.cn/h/01/news_list.aspx'
post_data = {
        '__VIEWSTATE': '%2FwEPDwUJOTIwODkyNjA5D2QWAmYPZBYCZg9kFgICAw9kFgICAQ9kFgICBQ9kFgQCAQ9kFgICAw8PFgIeBFRleHQFDOS%2FoeaBr%2BaQnOe0omRkAgMPZBYCAgcPZBYCZg8PFgQeBFJPV1MC5BIeBUlOREVYZmQWCGYPDxYCHgdWaXNpYmxlaGRkAgEPDxYCHwNoZGQCAg8PFgIfA2dkZAIDDw8WAh8DZ2RkZN0f2oaGWjQWIew4DBiZrFuBSFq0',
        '__VIEWSTATEGENERATOR': 'E98323FB',
        'ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24newData': '',
        'ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl16': '',
        'ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl05': '',
        'ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl06': '',
        }
r = requests.post(link, json=post_data, headers=HEADERS)
print(r)
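One thing to watch in the dictionary above: the keys and the __VIEWSTATE value are still percent-encoded exactly as copied from curl (%24 stands for '$', %2F for '/'). A form encoder would escape the '%' again. The sketch below, standard library only, decodes a curl --data string into plain names and values first; the raw string is a shortened, hypothetical stand-in for the real one.

```python
from urllib.parse import parse_qsl, urlencode

# Shortened, hypothetical stand-in for the string passed to curl --data.
raw = "__VIEWSTATE=%2FwEPDw%2BaQ&__VIEWSTATEGENERATOR=E98323FB&ctl00%24ctl00%24newData="

# keep_blank_values=True preserves fields whose value is empty.
decoded = dict(parse_qsl(raw, keep_blank_values=True))
print(decoded)

# Re-encoding the decoded dict reproduces the original body.
reencoded = urlencode(decoded)
print(reencoded == raw)

# Encoding a key that is already encoded escapes '%' a second time.
double_encoded = urlencode({"ctl00%24newData": ""})
print(double_encoded)  # ctl00%2524newData=
```

A dictionary decoded this way can be handed to a form encoder without the names changing on the wire.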

I used Wireshark to find the small differences between the two POST requests. The link was the actual problem: it contains Unicode characters written as '%u57fa', and somewhere inside Python these escapes are misinterpreted.
For example, '%u57fa' is sent as '%25u57fa', so Python encodes the '%' character one more time before making the request.
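The extra encoding can be reproduced with the standard library. The sketch below is a simplified model, written on the assumption that the HTTP client re-quotes the URL and, when it finds a '%' that is not followed by two hex digits, falls back to escaping every '%'. The helper names are hypothetical and are not part of any library.

```python
import string
from urllib.parse import quote

# Characters left untouched when quoting; only the first set keeps '%'.
SAFE_WITH_PERCENT = "!#$%&'()*+,/:;=?@[]~"
SAFE_WITHOUT_PERCENT = "!#$&'()*+,/:;=?@[]~"


def has_only_valid_escapes(url):
    """True if every '%' is followed by two hex digits (a standard %XX escape)."""
    parts = url.split("%")[1:]
    return all(
        len(p) >= 2 and all(c in string.hexdigits for c in p[:2]) for p in parts
    )


def requote(url):
    """Hypothetical helper: keep valid %XX escapes, otherwise escape every '%'."""
    if has_only_valid_escapes(url):
        return quote(url, safe=SAFE_WITH_PERCENT)
    return quote(url, safe=SAFE_WITHOUT_PERCENT)


print(requote("/news_list.aspx?t=%u57fa"))     # /news_list.aspx?t=%25u57fa
print(requote("/news_list.aspx?t=%E5%9F%BA"))  # /news_list.aspx?t=%E5%9F%BA
```

Under this model '%u57fa' is not a valid escape ('u' is not a hex digit), so the whole URL is treated as unescaped text and '%' becomes '%25'.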
Long story short, I changed the link from:
/h/01/news_list.aspx?t=%u57fa%u91d1%u7ed3%u9898%u9879%u76ee%u6e05%u5355
to:
http://www.zjnsf.gov.cn/h/01/news_list.aspx?t=基金结题项目清单


This solves the problem, but I don't really understand what causes it, since the Unicode values for the characters are correct:

57FA -> 基
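A likely explanation is that '%uXXXX' is a non-standard escape (the form produced by JavaScript's legacy escape() function), not the standard '%XX' percent-encoding of UTF-8 bytes, so a URL library has no reason to recognise it. The sketch below converts such escapes back to text with a regular expression; percent_u_to_text is a hypothetical helper name, and the pattern assumes exactly four hex digits per escape.

```python
import re
from urllib.parse import quote


def percent_u_to_text(s):
    """Replace each non-standard %uXXXX escape with the character it names."""
    return re.sub(
        r"%u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        s,
    )


query = "%u57fa%u91d1%u7ed3%u9898%u9879%u76ee%u6e05%u5355"
text = percent_u_to_text(query)

print(text)            # 基金结题项目清单
print(quote(text[0]))  # %E5%9F%BA
```

The decoded text matches the link that worked, and the second line shows the standard UTF-8 percent-encoding of the first character.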


 