Why can Python open my text file only after I re-save it in Notepad?

I am using Python 3.4.1 on Windows 7. I am trying to read a .txt file exported from another program, and Python cannot read it. However, if I open the file in Notepad, add a space anywhere, and save it, Python then reads it without problems.

I tried the same code and the same file on my Mac and got the same problem. The original text file does not work; the copy opened and saved in Windows Notepad works; the copy opened and saved in TextEdit on the Mac does not work.

I suspect the original encoding of the text file might not be right.

Thanks

Python code

InputFileName=input("Please tell me the input file name:")
#StartLNum=int(input("Please tell me the start line number:"))
#EndLNum=int(input("Please tell me the end line number:"))

StartLNum=18
EndLNum=129

lnum=1
OutputName='out'+InputFileName
fw=open(OutputName,'w')
with open(InputFileName,"r") as fr:
    for line in fr:
        if StartLNum <= lnum <= EndLNum:
            #print(line)
            fw.write(line)
        lnum+=1
fw.close()

Shell

>>> ================================ RESTART ================================
>>> 
Please tell me the input file name:Jul-18-2014.txt
Traceback (most recent call last):
  File "C:\Users\Jeremy\Desktop\read.py", line 13, in <module>
    for line in fr:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xb3 in position 4309: illegal multibyte sequence
>>> ================================ RESTART ================================
>>> 
Please tell me the input file name:Jul-18-2014.txt
>>> 

For comparison, this is the error the same code reports on my Mac (Python 3.4.1, OS X 10.9):

Traceback (most recent call last):
  File "/Users/Jeremy/Desktop/read.py", line 14, in <module>
    for line in fr:
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb3 in position 4174: ordinal not in range(128)

When you save the file in Notepad, it is re-encoded and saved in the default file encoding of your Windows installation. When Notepad opened the file, however, it auto-detected the original encoding.

Python opens files using that same system encoding by default, which is why you can now open the file. Quoting the open() function documentation:

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any encoding supported by Python can be used.
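To see which default the quoted passage refers to on a given machine, something like the following can be run. It is only a quick check; the helper name default_text_encoding is made up for this illustration. On the Windows system in the question it would be expected to report the GBK code page, but the value depends entirely on the local configuration.

```python
import locale


def default_text_encoding():
    """Return the locale's preferred encoding, the value the quoted docs refer to."""
    # Passing False asks for the current setting without modifying the locale.
    return locale.getpreferredencoding(False)


print(default_text_encoding())
```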

You'll have to specify the correct encoding explicitly if you want to open the original file directly in Python:

with open(InputFileName, "r", encoding='utf-8-sig') as fr:

I used 'utf-8-sig' as an example here, as that is a file encoding that Notepad can auto-detect. It could well be that the encoding is UTF-16 or plain UTF-8 or any number of other encodings, however.
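One way to narrow this down is to look at the first few bytes of the original file for a byte order mark (BOM). The sketch below is only an illustration: sniff_bom and sniff_file are hypothetical helper names, and a file with no BOM tells you nothing, so a result of None does not rule out UTF-8 or any other encoding.

```python
import codecs

# UTF-32 is checked first because the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head):
    """Return a codec name suggested by a leading BOM in head (bytes), or None."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return None


def sniff_file(path):
    """Read the first four bytes of a file in binary mode and check them for a BOM."""
    with open(path, "rb") as f:
        return sniff_bom(f.read(4))
```

If sniff_file("Jul-18-2014.txt") returns a name, that name can be passed as the encoding argument to open().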

If you think the file is encoded in a specific ANSI code page, you still have to name that exact code page. Your system is configured to use code page 936 (GBK), but that is not the correct encoding for this file.
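If the file has no BOM, a rough way to test guesses is to read the raw bytes and try a strict decode with each candidate. This is a sketch under assumptions: first_decodable is a hypothetical helper and the candidate list is yours to choose. A successful decode only shows that the bytes are valid in that codec, not that the text comes out right, and most single-byte code pages accept almost any input, so list them last and inspect the result.

```python
def first_decodable(data, candidates):
    """Return the first codec name in candidates that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)  # strict error handling is the default
        except UnicodeDecodeError:
            continue
        return name
    return None


# The byte from the traceback, on its own, is rejected by ASCII and GBK
# but accepted by a single-byte code page such as cp1252.
print(first_decodable(b"\xb3", ["ascii", "gbk", "cp1252"]))
```

For the real file, the bytes would come from open(InputFileName, "rb").read().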

See the codecs module for a list of supported encodings.
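Putting this together with the script from the question, the line-range copy could look roughly like the following once the encoding is known. The function name copy_line_range and its parameters are invented for this sketch, and the encoding you pass in is whatever you have established for your file; 'utf-8-sig' in the usage line is still just the example value from above.

```python
def copy_line_range(src, dst, start, end, encoding):
    """Copy lines start..end (1-based, inclusive) from src to dst."""
    # Both files are opened with the same explicit encoding, so nothing
    # depends on the platform default.
    with open(src, "r", encoding=encoding) as fr, \
            open(dst, "w", encoding=encoding) as fw:
        for lnum, line in enumerate(fr, start=1):
            if lnum > end:
                break  # nothing left to copy
            if lnum >= start:
                fw.write(line)
```

Usage with the values from the question: copy_line_range(InputFileName, 'out' + InputFileName, 18, 129, 'utf-8-sig').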
