
Python: how to parse non-ASCII characters in string

In my Python script, I'm trying to read in a text file that contains columns with people's first and last names, some of which have non-ASCII characters like ñ. But when I do so, I get the error UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 66.

From what I've been reading online, I know you can handle this problem by ignoring or dropping the non-ASCII characters, but I don't want to do that. Is there a straightforward way of reading all the non-ASCII characters in a file into a normal string?

Currently, I'm opening my file with infile = open(filename, 'rU').
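The byte 0x96 is not valid UTF-8; in Windows-1252 (cp1252) it is an en dash (–), which suggests the file was saved in cp1252 rather than UTF-8. A minimal sketch, assuming cp1252 is the real encoding; the file name 'names.txt' and its contents are made up for illustration:

```python
# Create a sample file the way a Windows editor might have saved it:
# 'ñ' becomes byte 0xF1 and '–' becomes byte 0x96 in cp1252.
with open('names.txt', 'wb') as f:
    f.write('Peña\tSmith–Jones\n'.encode('cp1252'))

# Opening with the actual encoding decodes every byte correctly.
with open('names.txt', encoding='cp1252') as infile:
    text = infile.read()
print(text)

# If the encoding is truly unknown, errors='replace' at least avoids
# the crash, substituting U+FFFD for undecodable bytes.
with open('names.txt', encoding='utf-8', errors='replace') as infile:
    fallback = infile.read()
```

Note that the 'rU' mode string only controls newline handling; in Python 3 the encoding is chosen via the encoding= keyword of open().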

Not a duplicate question: I'm asking about how to read in a file with Unicode characters, not how to write a Unicode string out to a file.

  1. Make a copy of the file.
  2. Make sure your file really is in a Unicode encoding, and find out which one it uses. Some simple editors, such as Geany, can help you identify the encoding that was used when the file was created. If the file is big, split it and inspect a part of it in the editor.
  3. Open the file with the correct encoding (it may be an old cp-family encoding such as cp1252) and convert it to UTF-8, or use a tool such as an editor to do the conversion.
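Step 3 above can be sketched in a few lines: decode with the source encoding, re-encode as UTF-8. The file names and the cp1252 guess are assumptions for illustration, not part of the original answer:

```python
src, dst = 'names_cp1252.txt', 'names_utf8.txt'

# Create a sample cp1252 file so the sketch is self-contained.
with open(src, 'wb') as f:
    f.write('Muñoz\n'.encode('cp1252'))

# Decode using the source encoding, write back out as UTF-8.
with open(src, encoding='cp1252') as fin, \
     open(dst, 'w', encoding='utf-8') as fout:
    fout.write(fin.read())

# The converted file now holds the UTF-8 byte sequence for 'ñ' (C3 B1).
with open(dst, 'rb') as f:
    data = f.read()
print(data)  # b'Mu\xc3\xb1oz\n'
```

After the conversion, the file opens cleanly with encoding='utf-8' and the original UnicodeDecodeError goes away.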

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
