Unable to read a file due to a Unicode error in Python

I'm trying to read a file, but when I read it I get a Unicode error.

def reading_File(self, text):
    url_text = "Text1.txt"
    with open(url_text) as f:
        content = f.read()  # Read the whole file

Error:

    content = f.read()  # Read the whole file
  File "/home/soft/anaconda/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 404: ordinal not in range(128)

Why is this happening? I'm running the same code on a Linux system; on Windows it works properly.

According to the question:

"I'm trying to run the same on Linux system, but on Windows it runs properly."

Since we know from the question and some of the other answers that the file's contents are neither ASCII nor UTF-8, it's a reasonable guess that the file is encoded with one of the 8-bit encodings common on Windows.

As it happens, byte 0x92 maps to the character RIGHT SINGLE QUOTATION MARK (U+2019) in the cp125* encodings, such as cp1252, the default on US and Western European Windows systems.
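A quick way to check this is to decode the single byte 0x92 with a few codecs and look up the Unicode name of the result. This is a small sketch; `describe_byte` is a hypothetical helper name and the codec list is just a sample.

```python
import unicodedata


def describe_byte(byte_value, codec):
    """Decode one byte with `codec` and return the Unicode name of the
    resulting character, or None if the codec rejects that byte."""
    try:
        char = bytes([byte_value]).decode(codec)
    except UnicodeDecodeError:
        return None
    return unicodedata.name(char)


# ASCII only covers 0x00-0x7F, and 0x92 is not a valid first byte in UTF-8,
# so both of those return None; the Windows code pages decode it.
for codec in ("ascii", "utf-8", "cp1250", "cp1252", "cp1257"):
    print(codec, describe_byte(0x92, codec))
```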

So the file should probably be opened like this:

# Python3
with open(url_text, encoding='cp1252') as f:
    content = f.read()

# Python2
import codecs
with codecs.open(url_text, encoding='cp1252') as f:
    content = f.read()
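To reproduce the problem end to end, the sketch below writes a sample string (an assumption, not the contents of the asker's Text1.txt) to a temporary file encoded as cp1252, then reads it back once with ASCII, which fails like in the question, and once with cp1252.

```python
import os
import tempfile

# Hypothetical sample text with a "smart" apostrophe (U+2019).
sample = "It\u2019s a test"

# Save it the way a Windows program using cp1252 might.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(sample.encode("cp1252"))
    path = tmp.name

try:
    # Forcing ASCII reproduces the UnicodeDecodeError from the question.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
    except UnicodeDecodeError as exc:
        print("ascii failed:", exc.reason)

    # Opening with the matching encoding recovers the original text.
    with open(path, encoding="cp1252") as f:
        content = f.read()
    print(content)
finally:
    os.remove(path)
```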

You can use codecs.open with the correct encoding to fix this issue:

import codecs
with codecs.open(filename, 'r', 'utf8') as ff:
    content = ff.read()

It looks like the default encoding on your system is ASCII, whereas Python 3 usually defaults to UTF-8. You can specify the encoding explicitly when opening the file:

open(file, encoding='utf-8')

Check your system's default encoding:

>>> import sys
>>> sys.stdout.encoding
'UTF-8'
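Note that sys.stdout.encoding describes the terminal stream. In the Python 3 versions I'm aware of, open() without an encoding= argument falls back to the locale's preferred encoding, which you can print with locale.getpreferredencoding(False). A short sketch that shows both:

```python
import locale
import sys

# Encoding that open() uses for text files when no encoding= is given.
default_file_encoding = locale.getpreferredencoding(False)
print("open() default:", default_file_encoding)

# Encoding of the terminal stream; it can differ from the file default.
# getattr() guards against a replaced stdout that has no .encoding.
print("stdout:", getattr(sys.stdout, "encoding", None))
```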

If it's not UTF-8, switch your system's locale to a UTF-8 locale, for example:

export LANGUAGE=en_US.UTF-8
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8

Assuming the file contains raw bytes rather than text, you could try reading it in binary mode:

with open("myfile", "rb") as f:
    byte = f.read(1)
    # In binary mode, read() returns b"" (not "") at end of file.
    while byte != b"":
        # Do stuff with byte.
        byte = f.read(1)

Once you have the bytes, you can decode them into a string:

myString = byte.decode("utf-8")
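Decoding one byte at a time breaks multi-byte characters: in UTF-8, é is the two bytes 0xC3 0xA9, and neither decodes on its own. A simpler sketch reads the whole file in binary mode and decodes it once. `read_text` is a hypothetical helper, and UTF-8 is only a default; pass the file's real encoding (for example 'cp1252').

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    # Read all raw bytes in one call; no manual EOF loop is needed.
    with open(path, "rb") as f:
        data = f.read()
    # Decode the complete byte string once so multi-byte characters stay intact.
    return data.decode(encoding)


# Usage: write a small UTF-8 file containing a two-byte character, then read it.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write("caf\u00e9".encode("utf-8"))
    path = tmp.name

try:
    text = read_text(path)
    print(text)
finally:
    os.remove(path)
```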

There are two possible reasons for this:

  1. The file contains text in an encoding other than 'ascii' and, according to your comments on other answers, other than 'utf-8'.

  2. The file doesn't contain text at all, it is binary data.

In case 1 you need to figure out how the text was encoded and use that encoding to open the file:

open(url_text, encoding=your_encoding)
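If you don't know the encoding, one rough approach is to try a short list of candidates in order, strict ones first. This is a sketch under assumptions: `guess_decode` is a hypothetical helper, and the candidate list (UTF-8, then cp1252) is a guess based on the file working on Windows. A successful decode only means no invalid bytes were found, not that the text is right, so check the result by eye. Avoid putting 'latin-1' early in the list, since it accepts every byte value and never fails.

```python
def guess_decode(data, candidates=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes `data`
    without errors. The candidate list is an assumption; adjust it."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            # Invalid for this encoding; try the next candidate.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Usage with the file from the question (path assumed):
# with open("Text1.txt", "rb") as f:
#     text, used = guess_decode(f.read())
print(guess_decode(b"It\x92s"))
```

Here b"It\x92s" is rejected by UTF-8 and decoded by cp1252 as "It’s".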

In case 2 you need to open the file in binary mode:

open(url_text, 'rb')
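To help tell case 1 from case 2, you can read the raw bytes and look at them. The sketch below (hypothetical helper `inspect_bytes`) reports whether the data contains NUL bytes, which are common in binary formats but rare in 8-bit text, and lists the first non-ASCII bytes with their offsets, so you can see what surrounds position 404 from the traceback. Treat it as a heuristic only: UTF-16 text, for example, also contains many NUL bytes.

```python
def inspect_bytes(data, limit=10):
    """Return (has_nul, non_ascii) for a bytes object.

    has_nul   -- True if the data contains a NUL byte (0x00)
    non_ascii -- up to `limit` (offset, hex value) pairs for bytes above 0x7F
    """
    has_nul = b"\x00" in data
    # Iterating over bytes yields ints, so each value can be compared directly.
    non_ascii = [(pos, hex(value)) for pos, value in enumerate(data) if value > 0x7F]
    return has_nul, non_ascii[:limit]


# Usage with the file from the question (path assumed):
# with open("Text1.txt", "rb") as f:
#     print(inspect_bytes(f.read()))
print(inspect_bytes(b"It\x92s"))  # sample bytes, not the asker's file
```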

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM