
Python 3 requests.get().text returns unencoded string

In Python 3, requests.get().text returns a string that has not been decoded. If I execute:

import requests
request = requests.get('https://google.com/search?q=Кто является президентом России?').text.lower()
print(request)

I get something like this:

Кто является презид

I tried changing google.com to google.ru.

If I execute:

import requests
request = requests.get('https://google.ru/search?q=Кто является президентом России?').text.lower()
print(request)

I get something like this:

d0%9a%d1%82%d0%be+%d1%8f%d0%b2%d0%bb%d1%8f%d0%b5%d1%82%d1%81%d1%8f+%d0%bf%d1%80%d0%b5%d0%b7%d0%b8%d0%b4%d0%b5%d0%bd%d1%82%d0%be%d0%bc+%d0%a0%d0%be%d1%81%d1%81%d0%b8%d0

I need to get a normal, decoded string.

You are getting this output because requests was not able to identify the correct encoding of the response. If you are sure about the response encoding, you can set it explicitly:

response = requests.get(url)
response.encoding            # check the encoding requests detected
response.encoding = "utf-8"  # or any other encoding

Then read the content from the .text attribute.
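To see why the encoding setting matters, here is a minimal offline sketch. It relies on the fact that .text decodes the raw response bytes using response.encoding. The sample bytes and the ISO-8859-1 guess are assumptions for illustration, not captured Google output.

```python
# Raw body bytes as a server might send them (UTF-8 encoded Cyrillic).
# This sample is an assumption for illustration, not real Google output.
raw_body = "Кто является президентом России?".encode("utf-8")

# Decoding with the wrong codec produces mojibake instead of Cyrillic.
wrong = raw_body.decode("iso-8859-1")

# Decoding with the right codec recovers the original text. This mirrors
# the effect of setting response.encoding = "utf-8" before reading .text.
right = raw_body.decode("utf-8")

print(repr(wrong))
print(right)
```

If the printed text from a real response looks like the first line rather than the second, a wrong encoding guess is a likely cause.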

I fixed it with the urllib.parse.unquote() function:

import requests
from urllib.parse import unquote

request = unquote(requests.get('https://google.ru/search?q=Кто является президентом России?').text.lower())
print(request)
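The effect of unquote() can be checked offline on the percent-encoded text from the question. This is a minimal sketch using only the standard library; the sample string is a shortened fragment of the output shown above. Note that unquote() leaves + signs untouched, while unquote_plus() also turns them into spaces, which is usually what is wanted for query strings.

```python
from urllib.parse import quote_plus, unquote, unquote_plus

# Shortened fragment of the percent-encoded text from the question.
encoded = "%d0%9a%d1%82%d0%be+%d1%8f%d0%b2%d0%bb%d1%8f%d0%b5%d1%82%d1%81%d1%8f"

decoded = unquote(encoded)            # decodes the %XX escapes, keeps "+"
decoded_plus = unquote_plus(encoded)  # also converts "+" to a space
print(decoded)
print(decoded_plus)

# Going the other way: encode a query value before putting it in a URL.
query = "Кто является президентом России?"
encoded_query = quote_plus(query)
print(encoded_query)
```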
