
Change to recognized encoding when reading a text file?

When a text file is open for reading using (say) UTF-8 encoding, is it possible to change encoding during the reading?

Motivation: It happens that you need to read a text file that was written using a non-default encoding. The text format may itself contain the information about the encoding that was used; HTML, XML, ASCIIDOC and many other formats are examples. In such cases, the lines above the encoding declaration are allowed to contain only ASCII or some default encoding.

In Python, it is possible to read the file in binary mode and translate the lines of bytes type to str on your own. When the information about the encoding is found on some line, you simply switch the encoding used when converting the following lines to Unicode strings.
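A minimal sketch of that manual approach, assuming the declaration looks like an XML/HTML-style encoding="..." attribute (the function name and the regular expression are only illustrative):

```python
import re

def read_lines_switching_encoding(path, default="ascii"):
    """Decode lines with a default encoding until a declaration is found."""
    encoding = default
    lines = []
    with open(path, "rb") as f:
        for raw in f:
            # Illustrative pattern for a declaration such as encoding="utf-8"
            match = re.search(rb'encoding=["\']([\w.-]+)["\']', raw)
            if match:
                # Switch the encoding used for this and the following lines
                encoding = match.group(1).decode("ascii")
            lines.append(raw.decode(encoding))
    return lines
```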

In Python 3, text files are implemented using TextIOBase, which also defines the encoding attribute, the buffer attribute, and other things.

Is there any nice way to change the encoding (used for decoding the bytes) so that the following lines are decoded in the wanted way?
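One possibility, shown here only as a hedged sketch: since Python 3.7 a text file object has a reconfigure() method that accepts an encoding argument, and the underlying buffer can be inspected with peek() before any text has been decoded (the file name and the regular expression below are assumptions about the input):

```python
import re

with open("example.xml", encoding="ascii") as f:      # hypothetical file
    head = f.buffer.peek(1024)                        # raw bytes; position is not advanced
    match = re.search(rb'encoding=["\']([\w.-]+)["\']', head)
    if match:
        # reconfigure() is available since Python 3.7; here it is called
        # before any text has been read through the wrapper.
        f.reconfigure(encoding=match.group(1).decode("ascii"))
    text = f.read()                                   # decoded with the detected encoding
```

Calling reconfigure() after some text has already been read may not be allowed, depending on the Python version, which is why this sketch peeks at the raw buffer instead of reading through the text wrapper first.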

Classic usage is:

  • Open the file in binary mode (bytes strings)
  • Read a chunk and guess the encoding (for instance with a simple scan or a regular expression)

Then:

  • close the file and re-open it in text mode with the found encoding, or
  • move back to the beginning with seek(0), read the whole content as a bytes string, then decode the content using the found encoding (see the sketch after this list).
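A short sketch of that second variant, assuming an XML/HTML-style declaration near the top of the file (the regular expression, the default encoding and the file name are illustrative):

```python
import re

def read_text_autodetect(path, default="utf-8", probe_size=1024):
    """Guess the encoding from the first bytes, then decode the whole file."""
    with open(path, "rb") as f:
        head = f.read(probe_size)
        # Illustrative pattern for a declaration such as encoding="iso-8859-1"
        match = re.search(rb'encoding=["\']([\w.-]+)["\']', head)
        encoding = match.group(1).decode("ascii") if match else default
        f.seek(0)                        # back to the beginning
        return f.read().decode(encoding)

content = read_text_autodetect("page.html")   # hypothetical file name
```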

See this example: Detect character encoding in an XML file (Python recipe). Note: the code is a little old, but still useful.
