Python UnicodeDecodeError with 'gbk' while using third party libraries

I'm trying the examples in "PyDeepGP" provided by SheffieldML ( https://github.com/SheffieldML/PyDeepGP ). In the example code, a third-party library pods is used to provide some open datasets.

As soon as I ran the example code, I got the error message: "UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 37571: illegal multibyte sequence". It turns out that the error occurs when pods opens a file called 'data_resources.json' and calls file.read() on it.

If I manually run f = open("...\\data_resources.json", encoding="utf8") and then f.read() , everything works. So my Python is evidently choosing 'gbk' to decode a UTF-8 file, for a reason I don't understand.
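The mismatch can be checked directly. In Python 3.6, when encoding= is omitted, open() takes its default from locale.getpreferredencoding(False), which is separate from sys.getdefaultencoding(). The following is a minimal diagnostic sketch; the helper name implicit_open_encoding is hypothetical and is not part of pods or the standard library.

```python
import locale
import os
import sys
import tempfile


def implicit_open_encoding():
    """Return the encoding that open() picks when encoding= is omitted.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path) as f:  # deliberately no encoding= argument
            return f.encoding
    finally:
        os.remove(path)


# Encoding for implicit str <-> bytes conversions; 'utf-8' on Python 3.
print("sys.getdefaultencoding():          ", sys.getdefaultencoding())
# Locale-derived encoding; on a Chinese-locale Windows install this is
# typically 'cp936', which Python treats as an alias of 'gbk'.
print("locale.getpreferredencoding(False):", locale.getpreferredencoding(False))
# What a bare open(path) actually uses.
print("open() without encoding=:          ", implicit_open_encoding())
```

If the last two lines report 'cp936' while the first reports 'utf-8', that would explain why a bare open(path).read() tries to decode the file as GBK.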

However, it is not feasible for me to add encoding="utf8" to every open() call in a third-party library. I want to force my Python to open files as UTF-8 by default.

I checked sys.getdefaultencoding() and it is 'utf-8' as usual. I have tried turning the files.autoGuessEncoding setting in VS Code on and then off. I even tried the solution from http://www.programmersought.com/article/4189689383/ , adding

import _locale
_locale._getdefaultlocale = (lambda *args: ['zh_CN', 'utf8'])

to the code. This works only when I write open(file) as f and then f.read() myself. It fails when I run the whole example code, where the file is opened by the third-party library.
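For comparison, one approach that does not depend on the private _locale hook is to wrap builtins.open itself before the third-party import runs. This is only a sketch under assumptions: the wrapper name open_utf8_by_default is hypothetical, the temporary file is a stand-in for data_resources.json, and the patch only affects code that calls the built-in open() (calls to io.open or codecs.open are not touched). On Python 3.7 and later, the built-in UTF-8 mode (-X utf8 or PYTHONUTF8=1) is the cleaner option, but it is not available on 3.6.

```python
import builtins
import os
import tempfile

_original_open = builtins.open


def open_utf8_by_default(file, mode="r", buffering=-1, encoding=None,
                         *args, **kwargs):
    """Like open(), but text mode defaults to UTF-8, not the locale encoding.

    Hypothetical wrapper for illustration only.
    """
    if encoding is None and "b" not in mode:
        encoding = "utf-8"
    return _original_open(file, mode, buffering, encoding, *args, **kwargs)


# Install the wrapper before importing the third-party library,
# e.g. before `import pods`.
builtins.open = open_utf8_by_default

# Self-check with a temporary UTF-8 file (not the real data_resources.json).
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
try:
    with open(path, "wb") as f:  # binary mode is passed through unchanged
        f.write('{"quote": "\u201chello\u201d"}'.encode("utf-8"))
    with open(path) as f:  # no encoding=, as in the library code
        used_encoding = f.encoding
        text = f.read()
finally:
    os.remove(path)

print(used_encoding, ascii(text))
```

Because the patch is process-wide, it also changes the default for any other code that relied on the locale encoding, so it is best kept to a small launcher script.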

I am currently using Python 3.6.7 on Windows 10 with Anaconda.

Here's the complete error message:

Traceback (most recent call last):
  File "..../SheffieldML-PyDeepGP/examples/example_supervised_learning.py", line 35, in <module>
    import pods
  File "....\lib\site-packages\pods\__init__.py", line 5, in <module>
    from . import datasets
  File "....\lib\site-packages\pods\datasets.py", line 53, in <module>
    json_data=open(path).read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 37571: illegal multibyte sequence

I got exactly the same error. I got a clue from https://github.com/rkern/line_profiler/issues/37 and changed line 53 of datasets.py to

json_data=open(path,'rb').read()

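In binary mode, read() returns raw bytes and no text decoding takes place, so the locale's 'gbk' codec is never involved. This works as long as the code that consumes json_data accepts bytes; json.loads does on Python 3.6 and later.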
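To illustrate why the binary read is enough for JSON, here is a small sketch. It assumes the bytes end up in json.loads (I have not checked what pods does with json_data after line 53), and it uses a temporary file as a stand-in for data_resources.json.

```python
import json
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
try:
    # Write UTF-8 JSON containing non-ASCII characters (curly quotes).
    with open(path, "wb") as f:
        data = {"quote": "\u201chello\u201d"}
        f.write(json.dumps(data, ensure_ascii=False).encode("utf-8"))

    # Binary read: no codec is applied, so the locale encoding is irrelevant.
    with open(path, "rb") as f:
        raw = f.read()
    from_bytes = json.loads(raw)  # json.loads accepts bytes on Python 3.6+

    # Text read with an explicit encoding, for comparison.
    with open(path, encoding="utf-8") as f:
        from_text = json.loads(f.read())
finally:
    os.remove(path)

print(type(raw).__name__)
print(from_bytes == from_text)
```

Both routes give the same parsed object; the difference is only whether the bytes are decoded by open() or by json.loads.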
