Python requests encoding issues

I use Python requests to make a GET request to this URL. Here is the code snippet:

import requests

url = 'http://213.139.159.46/prj-wwvauskunft/projects/gus/daten/index.jsp?'
params = {'id': 2619521210}

response = requests.get(
    url,
    params=params
)

print(response.status_code)

text = response.text
content = response.content

I ran the same code in Python 2.7 and Python 3.6.

When I compare the text variable between the two versions, the values are different, but content is the same in both. I am confused as to why content is the same while text differs. Shouldn't text be the same as well, if both versions use the same encoding to convert between text and content?
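As background for this question: in requests, response.text is response.content decoded with response.encoding (falling back to a detected encoding when none is set), so the same bytes can yield different strings if the decoder differs. A minimal sketch of that relationship follows. The sample bytes are made up for illustration and are not the real response body, and this is not necessarily the cause of the difference seen here.

```python
# Hypothetical sample bytes, standing in for response.content.
content = "Größe".encode("utf-8")

# The same bytes decoded with two different codecs.
text_utf8 = content.decode("utf-8")
text_latin1 = content.decode("iso-8859-1")

print(repr(text_utf8))
print(repr(text_latin1))

# The decoded strings differ even though the bytes are identical.
print(text_utf8 == text_latin1)

# Re-encoding each string with the codec that produced it returns the original bytes.
print(text_utf8.encode("utf-8") == content)
print(text_latin1.encode("iso-8859-1") == content)
```

Printing response.encoding in both interpreters would show whether the two runs actually decode with the same codec.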

I used chardet to detect the encoding of content; both versions reported ISO-8859-1. What could be the reason for not using UTF-8? Is it just a preference?

Also, when I do:

content.replace('span', '')

In Python 2, it works. In Python 3, it throws this error: TypeError: a bytes-like object is required, not 'str'. (Using b'span' and b'' solves the problem.)

But when I do:

text.replace('span', '')

Both versions work. Why is that?
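The behaviour described above comes down to types. In Python 3, content is bytes and text is str, and bytes.replace accepts only bytes-like arguments. In Python 2, the literal 'span' is itself a byte string, so content.replace works, and text.replace works because ASCII byte strings are implicitly converted to unicode. A minimal Python 3 sketch, using made-up sample data rather than the real response:

```python
# Hypothetical sample data standing in for response.content / response.text.
content = b"<span>abc</span>"           # bytes in Python 3
text = content.decode("iso-8859-1")     # str in Python 3

error = None
try:
    # Mixing bytes with str arguments raises TypeError in Python 3.
    content.replace("span", "")
except TypeError as exc:
    error = exc

# Matching the argument types to the object type works in both cases.
cleaned_bytes = content.replace(b"span", b"")
cleaned_text = text.replace("span", "")

print(type(error).__name__)
print(cleaned_bytes)
print(cleaned_text)
```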

There is no guarantee of compatibility between Python 2 and Python 3 (neither backward nor forward). See, e.g., Python 2 vs Python 3: Key Differences. For instance, suppose the script is modified by appending the following snippet:

print('type(text)   ', type(text))
print('type(content)', type(content))

Output:

py -2 D:\Python\SO3\61954902.py
200
('type(text)   ', <type 'unicode'>)
('type(content)', <type 'str'>)
py -3 D:\Python\SO3\61954902.py
200
type(text)    <class 'str'>
type(content) <class 'bytes'>

For the sake of completeness, the script is as follows:

type D:\Python\SO3\61954902.py
import requests

url = 'http://213.139.159.46/prj-wwvauskunft/projects/gus/daten/index.jsp?'
params = {'id': 2619521210}

response = requests.get(
    url,
    params=params
)

print(response.status_code)

text = response.text
content = response.content

print('type(text)   ', type(text))
print('type(content)', type(content))

