Python: Unicode source file adds spaces (actually null bytes) between characters

I am a newbie. However, I managed to extract some lines from a txt file (Unicode) and write them to another file.

lines = InFile.readlines()       # read every line of the source file
OutFile.writelines(lines[3:])    # write everything after the first three lines

It works, but (I believe) because of an encoding issue a space appears between every character in the output file. Example output:

2 0 1 3 - 1 2 - 2 3 ; ; 3 6 0 . 3 7 
2 0 1 3 - 1 2 - 2 4 ; ; 0 . 0 0 

Lines in the source file:

2013-12-23;;360.37
2013-12-24;;0.00

If I save the source txt file as ANSI before running the script, I get the correct result. However, the source file is delivered automatically as Unicode by another program, so converting it manually every time is not practical. I have read through many other coding/encoding/decoding questions, but I am completely lost and don't know how to fix the issue. Which command is the correct one, and where in the script does it belong? Or am I completely wrong, and this has nothing to do with encoding?

I'm fairly certain that your input file is UTF-16 encoded, and the spaces you're seeing are actually null bytes.
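A quick illustration of where the "spaces" come from, using a line from the question. In UTF-16 little-endian (what Windows tools such as Notepad usually label "Unicode"), every ASCII character takes two bytes, the second of which is zero. Decoding those bytes with a single-byte codec (cp1252 is assumed here as a stand-in for "ANSI") leaves a NUL character after every real character, and many editors draw NUL as a space.

```python
# Encode one line from the question as UTF-16 little-endian and inspect the raw bytes.
line = "2013-12-23;;360.37"
raw = line.encode("utf-16-le")
print(raw[:8])  # b'2\x000\x001\x003\x00'

# Decoding the same bytes with a single-byte codec keeps the zero bytes as
# NUL characters, which many editors display as spaces.
misread = raw.decode("cp1252")
print(repr(misread[:8]))  # '2\x000\x001\x003\x00'

# Decoding with the matching codec gives back the original text.
assert raw.decode("utf-16-le") == line
```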

Try

with open("myfile.txt", "r", encoding="utf-16") as infile:
    lines = infile.readlines()

and see if the problem persists.
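A fuller sketch of the same idea, assuming the goal is still to skip the first three lines as in the question. The helper names `detect_encoding` and `copy_lines`, the BOM check, and the choice of UTF-8 for the output are illustrative assumptions, not part of the answer above. Opening the source in text mode with `encoding="utf-16"` lets Python consume the byte-order mark (BOM) and drop the null bytes; giving the output file an explicit encoding keeps it from depending on the platform default.

```python
import codecs


def detect_encoding(path, default="utf-8"):
    """Guess a file's encoding from its byte-order mark (BOM), if it has one."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check the 4-byte UTF-32 BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    if head in (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # strips the BOM while decoding
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the codec reads the BOM to choose the byte order
    return default  # no BOM: fall back to an assumed default


def copy_lines(source, target, skip=3):
    """Copy every line after the first `skip` lines, writing the result as UTF-8."""
    encoding = detect_encoding(source)
    with open(source, "r", encoding=encoding) as infile:
        lines = infile.readlines()
    with open(target, "w", encoding="utf-8") as outfile:
        outfile.writelines(lines[skip:])
```

Usage would be something like `copy_lines("myfile.txt", "output.txt")`. Because the sample data is plain ASCII, the UTF-8 output is byte-for-byte what an ANSI (cp1252) file would contain; if a downstream tool insists on ANSI, `encoding="cp1252"` in the write call is the alternative.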
