ruamel.yaml 0.17.16 python3 UnicodeEncodeError

I use Python 3.5.2 with ruamel.yaml 0.17.16. When I call ruamel_yaml.dump(content, fp), I get the error

"UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 434: ordinal not in range(128)"

in the write_comment function of ruamel/yaml/emitter.py.

I set ruamel_yaml.encoding = True, and I also set encoding='UTF-8' when reading the file, but neither helps.

When I switch to Python 3.7 there is no UnicodeEncodeError, but the generated file turns out to have a 'utf-8' encoding error.

Does ruamel.yaml need to be matched to a particular Python version? If not, how can I solve this problem?

I am not sure why you are getting no error and different results with different Python versions. AFAICT there is nothing 3.5 or 3.7 specific with regard to handling files, and Python 3.5, although end-of-life, is still supported and tested.

You don't provide much code (you should), but from the error I can tell that you are trying to dump something containing a right single quotation mark ( ’ , Unicode code point U+2019).

You should include more of your code so it is clear how the files are opened, and state on which platform you run your code (Windows?). Most likely you are writing to a file opened in text mode ( open('somefile.yaml', 'w') ), where you should write to a file opened for binary ( open('somefile.yaml', 'wb') ).
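As a rough illustration of the text-mode problem, here is a stdlib-only sketch that does not use ruamel.yaml at all. It assumes the text file ends up with an ASCII encoding (forced here with encoding='ascii'; on a real system this might come from the locale), which may or may not be what happens in your setup. The helper try_write and the file name demo.txt are made up for this example.

```python
import os
import tempfile

text = "quote -> \u2019"  # ends with a right single quotation mark

def try_write(path, **open_kwargs):
    """Write `text` in text mode; return the UnicodeEncodeError, or None on success."""
    try:
        with open(path, 'w', **open_kwargs) as fp:
            fp.write(text)
    except UnicodeEncodeError as exc:
        return exc
    return None

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name
    # ASCII text file: U+2019 cannot be encoded
    ascii_error = try_write(path, encoding='ascii')
    # explicit UTF-8 text file: the write succeeds
    utf8_error = try_write(path, encoding='utf-8')
    with open(path, 'rb') as fp:
        raw = fp.read()

print(type(ascii_error).__name__)
print(utf8_error)
print(raw[-3:].hex())
```

If the first write fails and the second succeeds, the encoding of the text file is the culprit. Opening the file with 'wb' sidesteps this, because ruamel.yaml then does the UTF-8 encoding itself.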

The YAML() instance already has the attribute .encoding set to utf-8 by default, so setting that again will have no effect.

import sys
import pathlib
import ruamel.yaml

data = dict(text="here comes the unicode quote -> \u2019")
print('python version:', sys.version_info)

yaml = ruamel.yaml.YAML()
yaml_file = pathlib.Path('somefile.yaml')

# You can open the Path like this, but it is better to have ruamel.yaml do it
# with yaml_file.open('wb') as fp:
#     yaml.dump(data, fp)

yaml.dump(data, yaml_file)

readback = yaml_file.read_bytes()
print('{:02x}{:02x}{:02x}'.format(readback[-4], readback[-3], readback[-2]))

which gives:

python version: sys.version_info(major=3, minor=5, micro=9, releaselevel='final', serial=0)
e28099

which is the expected UTF-8 encoding of the right single quotation mark.
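If you want to double-check that claim, a small stdlib-only sketch like the following decodes those three bytes again and looks up the character name (the variable names are just for illustration):

```python
import unicodedata

raw = bytes.fromhex('e28099')   # the three bytes printed above
char = raw.decode('utf-8')      # decode them as UTF-8

code_point = '{:04X}'.format(ord(char))
name = unicodedata.name(char)
print(code_point, name)
```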
