How to remove a Byte Order Mark in Python

This question is related to a recent change to the Stack Overflow API that I reported here. In that question, I received a response that seemed like it would work, but in practice I'm unable to make it work.

This is my code

import requests
import json
url="https://api.stackexchange.com/2.2/sites/?filter=%21%2AL1%2AAY-85YllAr2%29&pagesize=1&page=1"
response = requests.get(url)
response.text

This outputs

u'\ufeff{"items":[{"site_state":"normal","api_site_parameter":"stackoverflow","name":"Stack Overflow"}],"has_more":true,"quota_max":300,"quota_remaining":294}'

The leading u'\ufeff' (a byte order mark) means that if I call response.json() I get a ValueError: No JSON object could be decoded

The suggestion I was given was to use decode('utf-8-sig'). However, I can't seem to get this to work either:

Try 1:

response.text.decode('utf-8-sig')
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

Try 2:

json.loads(response.text).decode('utf-8-sig')
ValueError: No JSON object could be decoded

What is the appropriate way to remove the leading u'\ufeff'?

response.text is a Unicode object, i.e. it has already been decoded, so you can't decode it again.
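To illustrate the point about already-decoded text: once the string is Unicode, the BOM is just the character U+FEFF at position 0, so it can only be removed as a character, not by decoding. The sketch below is an alternative that is not part of the original answer; strip_bom is a hypothetical helper name, and the sample string is a shortened stand-in for response.text.

```python
import json

BOM = u"\ufeff"  # U+FEFF, the byte order mark as a decoded character


def strip_bom(text):
    """Hypothetical helper: drop one leading U+FEFF from decoded text."""
    if text.startswith(BOM):
        return text[len(BOM):]
    return text


# Stand-in for response.text: already decoded, BOM still in front.
decoded = u'\ufeff{"quota_max": 300, "has_more": true}'
parsed = json.loads(strip_bom(decoded))
```

This works on text you already hold, but it treats the symptom; choosing the right codec before decoding avoids the stray character in the first place.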

What you need to do is tell the response object which encoding it should use:

response = requests.get(url)
response.encoding = "utf-8-sig"
response.text

See the docs for more background info.
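As a rough offline illustration of why "utf-8-sig" helps, the sketch below decodes the same bytes with both codecs. The byte string is a made-up stand-in for the raw body (what requests exposes as response.content), not real API output.

```python
import codecs
import json

# Stand-in for response.content: raw bytes starting with the UTF-8 BOM (EF BB BF).
raw = codecs.BOM_UTF8 + b'{"items": [], "quota_max": 300}'

as_utf8 = raw.decode("utf-8")      # BOM survives as a leading u'\ufeff'
as_sig = raw.decode("utf-8-sig")   # BOM is consumed by the codec

data = json.loads(as_sig)
```

Plain "utf-8" keeps the BOM as the first character, while "utf-8-sig" drops it during decoding, which is the effect that setting response.encoding = "utf-8-sig" relies on when response.text is read.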
