curl request does not translate well to python3

Question

I am using Python 3.7.7 and I am trying to make a POST request. This POST seems to have some anomalies because it always generates a 403 error.
In Chrome the problem does not appear: the status code is 200 (OK) and the POST responds with the expected data.
To narrow the problem down, I extracted the curl request from the browser and tried to hardcode it into Python.
So, I have the working curl command extracted from the browser:

curl 'http://www.zjnsf.gov.cn/h/01/news_list.aspx?t=%u57fa%u91d1%u7ed3%u9898%u9879%u76ee%u6e05%u5355' --data '__VIEWSTATE=%2FwEPDwUJOTIwODkyNjA5D2QWAmYPZBYCZg9kFgICAw9kFgICAQ9kFgICBQ9kFgQCAQ9kFgICAw8PFgIeBFRleHQFDOS%2FoeaBr%2BaQnOe0omRkAgMPZBYCAgcPZBYCZg8PFgQeBFJPV1MC5BIeBUlOREVYZmQWCGYPDxYCHgdWaXNpYmxlaGRkAgEPDxYCHwNoZGQCAg8PFgIfA2dkZAIDDw8WAh8DZ2RkZN0f2oaGWjQWIew4DBiZrFuBSFq0&__VIEWSTATEGENERATOR=E98323FB&ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24newData=&ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl16=&ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl05=&ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl06=' --compressed --insecure

And the Python code for it:

import json
import requests

# Assumption: req_session is a requests.Session created earlier in the script.
req_session = requests.Session()

post_data = {}
url = 'http://www.zjnsf.gov.cn/h/01/news_list.aspx?t=%u57fa%u91d1%u7ed3%u9898%u9879%u76ee%u6e05%u5355'
post_data['__VIEWSTATE'] = '%2FwEPDwUJOTIwODkyNjA5D2QWAmYPZBYCZg9kFgICAw9kFgICAQ9kFgICBQ9kFgQCAQ9kFgICAw8PFgIeBFRleHQFDOS%2FoeaBr%2BaQnOe0omRkAgMPZBYCAgcPZBYCZg8PFgQeBFJPV1MC5BIeBUlOREVYZmQWCGYPDxYCHgdWaXNpYmxlaGRkAgEPDxYCHwNoZGQCAg8PFgIfA2dkZAIDDw8WAh8DZ2RkZN0f2oaGWjQWIew4DBiZrFuBSFq0'
post_data['__VIEWSTATEGENERATOR'] = 'E98323FB'
post_data['ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24newData'] = ''
post_data['ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl16'] = ''
post_data['ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl05'] = ''
post_data['ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl06'] = ''
p_data = json.dumps(post_data)
news_search_page = req_session.post(url, data=p_data)

This always responds with a 403 error.
Any idea what might be wrong, or in what direction to investigate?

Answer 1

Maybe you need to add a header with some content. Some APIs require you to show which browser (User-Agent) you are using. You can create a header like this:

HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

And then the API will think that you are using a MacBook with Chrome.

By the way, the headers are added to the code like this:

news_search_page = req_session.post(url, data=p_data, headers=HEADERS)

You can also try passing json instead of data:

news_search_page = req_session.post(url, json=p_data, headers=HEADERS)
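For context, `data=` and `json=` produce different request bodies. In requests, a dict passed as `data=` is form-encoded, a string passed as `data=` is sent as-is, and `json=` serialises the object to JSON. The sketch below uses only the standard library to show the two body formats side by side; the field names are shortened placeholders, not the real form fields.

```python
import json
from urllib.parse import urlencode

# Shortened, hypothetical stand-ins for the real form fields.
fields = {"__VIEWSTATEGENERATOR": "E98323FB", "newData": ""}

# What a browser form (or curl --data) sends: key=value pairs joined by '&'.
form_body = urlencode(fields)

# What json.dumps() produces: a JSON document, which a form handler
# on the server side would not parse as form fields.
json_body = json.dumps(fields)

print(form_body)
print(json_body)
```

Since the working request came from an HTML form, passing the dict directly as `data=post_data` (without `json.dumps`) is probably the closer match to the curl command, though that alone may not explain the 403.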

Answer 2

OK, so now I tried it myself. I don't know if I did it right because I don't understand Chinese, but I found that the header with a User-Agent was essential for making the request.

With this code I got a 200 response. I hope you can use it.

import requests
HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
link = 'http://www.zjnsf.gov.cn/h/01/news_list.aspx'
post_data = {
        '__VIEWSTATE': '%2FwEPDwUJOTIwODkyNjA5D2QWAmYPZBYCZg9kFgICAw9kFgICAQ9kFgICBQ9kFgQCAQ9kFgICAw8PFgIeBFRleHQFDOS%2FoeaBr%2BaQnOe0omRkAgMPZBYCAgcPZBYCZg8PFgQeBFJPV1MC5BIeBUlOREVYZmQWCGYPDxYCHgdWaXNpYmxlaGRkAgEPDxYCHwNoZGQCAg8PFgIfA2dkZAIDDw8WAh8DZ2RkZN0f2oaGWjQWIew4DBiZrFuBSFq0',
        '__VIEWSTATEGENERATOR': 'E98323FB',
        'ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24newData': '',
        'ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl16': '',
        'ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl05': '',
        'ctl00%24ctl00%24ContentPlaceHolder1%24ContentPlaceHolder1%24ctl06': '',
        }
r = requests.post(link, json=post_data, headers=HEADERS)
print(r)
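One thing to watch in the dict above: the keys and the `__VIEWSTATE` value were copied from the curl command already percent-encoded (`%24`, `%2F`, `%2B`). If such a dict is later form-encoded, each `%` gets escaped a second time. Here is a minimal sketch, using a shortened hypothetical body rather than the full one, of decoding a raw curl `--data` string into real field names first:

```python
from urllib.parse import parse_qsl, urlencode

# Shortened, hypothetical sample in the same shape as the curl --data string.
raw_body = "__VIEWSTATE=%2FwEP%2BaQ&ctl00%24ctl00%24newData="

# Decode into real key/value pairs; keep_blank_values preserves empty fields.
decoded = dict(parse_qsl(raw_body, keep_blank_values=True))
print(decoded)

# Re-encoding the decoded dict reproduces the original body.
round_trip = urlencode(decoded)
print(round_trip == raw_body)

# Encoding an already-encoded key escapes the '%' a second time.
double_encoded = urlencode({"ctl00%24ctl00%24newData": ""})
print(double_encoded)
```

With the decoded dict, the field names contain a literal `$` and the encoding is left to the HTTP library.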

Answer 3

I used Wireshark to detect the small differences between the two POST requests. It seems that the link was the actual problem: the Unicode characters are written in a text editor like this: '%u57fa', and somewhere inside Python the characters are misinterpreted.
For example, '%u57fa' is transformed into '%25u57fa', so Python encodes characters like '%' one more time before making the request.
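The extra encoding can be reproduced with the standard library alone. `%u57fa` is not a valid percent-escape (a `%` must be followed by two hex digits), so a quoting routine treats the `%` as a literal character and escapes it. This sketch uses `urllib.parse.quote` as a stand-in; that the HTTP library in use follows the same logic internally is an assumption, although it matches what Wireshark showed.

```python
from urllib.parse import quote

# '%u' is not followed by two hex digits, so '%' is escaped as '%25'.
fragment = "%u57fa"
requoted = quote(fragment)
print(requoted)
```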
Long story short, I changed the link from:
/h/01/news_list.aspx?t=%u57fa%u91d1%u7ed3%u9898%u9879%u76ee%u6e05%u5355
to:
http://www.zjnsf.gov.cn/h/01/news_list.aspx?t=基金结题项目清单

This solves the problem, but I don't really understand what causes it, since the Unicode values for the characters are correct:

57FA -> 基
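A likely explanation is that `%uXXXX` is a non-standard escape form (produced, for example, by JavaScript's legacy `escape()` function) and not the UTF-8 percent-encoding that URL libraries expect. The code points are right, but the notation is not one a standard URL quoter recognises. The helper below is a hypothetical sketch (the name `decode_percent_u` is made up for illustration) that converts such escapes back to characters so the query string can be written as plain text:

```python
import re
from urllib.parse import quote


def decode_percent_u(text):
    """Replace each non-standard %uXXXX escape with the character it names."""
    return re.sub(
        r"%u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


escaped = "%u57fa%u91d1%u7ed3%u9898%u9879%u76ee%u6e05%u5355"
title = decode_percent_u(escaped)
print(title)

# Standard percent-encoding works on the UTF-8 bytes of each character.
utf8_escaped = quote("基")
print(utf8_escaped)
```

The sketch only handles four-digit escapes, which is enough for the characters in this URL.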

3 answers

Answer 1
0 2020-07-06 22:11:29

Answer 2
0 2020-07-07 07:07:02

Answer 3
0 Accepted 2020-07-07 12:43:13
