python opens text file with a space between every character

Question

Whenever I open a .csv file with the Python command fread = open('input.csv', 'r'), the contents show a space between every single character. I'm guessing something is wrong with the text file, because I can open other text files with the same command and they load correctly. Does anyone know why a text file would load like this in Python?

Thanks.

Update

OK, I got it working with the help of Jarret Hardie's post.

This is the code I used to convert the file to ASCII:

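with open('input.csv', 'rb') as f:
    fread = f.read()                          # raw bytes, no implicit decoding
mytext = fread.decode('utf-16')               # bytes -> text
mytext = mytext.encode('ascii', 'ignore')     # text -> ASCII bytes, non-ASCII chars dropped
with open('input-ascii.csv', 'wb') as fwrite:
    fwrite.write(mytext)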

Thanks!

Answer 1

The post by recursive is probably right: the contents of the file are likely encoded with a multi-byte charset. If that is the case, you can likely read the file in Python itself without having to convert it outside of Python first.

Try something like:

with open('input.csv', 'rb') as f:
    fread = f.read()
mytext = fread.decode('utf-16')

The 'b' flag ensures the file is read as binary data. You'll need to know (or guess) the original encoding; in this example I've used utf-16, but YMMV. This decodes the file's bytes into a Unicode string. If you truly have a file with multi-byte chars, I don't recommend converting it to ASCII, as you may end up losing a lot of the characters in the process.
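
A Python 3 sketch of the same approach, assuming (as the answer does) that the file is UTF-16; `decode_utf16` and `read_utf16` are illustrative names, not part of any library. The last lines show concretely why `encode('ascii', 'ignore')` is lossy: non-ASCII characters are silently dropped rather than transliterated.

```python
def decode_utf16(raw: bytes) -> str:
    """Decode bytes read in 'rb' mode, assuming UTF-16 (a guess; adjust as needed)."""
    # The 'utf-16' codec reads the byte order from a leading BOM and strips it.
    return raw.decode('utf-16')


def read_utf16(path: str) -> str:
    # Binary mode means Python does no implicit decoding; we decode explicitly.
    with open(path, 'rb') as f:
        return decode_utf16(f.read())


# Why converting to ASCII is risky: 'ignore' drops every non-ASCII character.
sample = 'café,naïve'
lossy = sample.encode('ascii', 'ignore')
print(lossy)  # b'caf,nave'
```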

EDIT: Thanks for uploading the file. There are two bytes at the front of the file (a byte-order mark) which indicate that it does, indeed, use a wide charset. If you're curious, open the file in a hex editor as some have suggested; you'll see something in the text version like 'ID|.' (etc.). The dot is the extra byte for each char.

The decode snippet above (read in 'rb' mode, then decode('utf-16')) seems to work on my machine with that file.
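
The two leading bytes mentioned in the answer are most likely a byte-order mark (BOM). A small sketch for checking it from Python, assuming the file starts with a BOM at all: many files, especially UTF-8 ones, have none, in which case this returns None and the encoding still has to be guessed. `sniff_bom` and `peek_encoding` are hypothetical helper names; only the BOM constants come from the standard `codecs` module.

```python
import codecs

# Known byte-order marks and a codec that understands each one.
# UTF-32 BOMs come first because the UTF-32-LE BOM begins with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
]


def sniff_bom(raw: bytes):
    """Return a codec name if `raw` starts with a known BOM, else None (a heuristic)."""
    for bom, encoding in _BOMS:
        if raw.startswith(bom):
            return encoding
    return None


def peek_encoding(path: str):
    # Only the first four bytes are needed to recognise any of the BOMs above.
    with open(path, 'rb') as f:
        return sniff_bom(f.read(4))
```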

Answer 2

The file is encoded in some Unicode encoding, but you are reading it as ASCII. Try converting the file to ASCII before using it in Python.
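
A sketch of why this produces the "space between every character" symptom. It uses latin-1 as a stand-in for "reading it as ASCII" (latin-1 maps every byte to a character, so it shows the NUL bytes instead of raising an error); the sample text is made up for illustration.

```python
# In UTF-16-LE every ASCII character becomes two bytes: the character, then a NUL byte.
text = 'ID,Name'
raw = text.encode('utf-16-le')
print(raw)  # b'I\x00D\x00,\x00N\x00a\x00m\x00e\x00'

# A single-byte decode keeps each NUL as its own character;
# many terminals and editors render NUL as a blank, which looks like a space.
misread = raw.decode('latin-1')
print(repr(misread))  # 'I\x00D\x00,\x00N\x00a\x00m\x00e\x00'

# Decoding with the matching codec gives the original text back.
print(raw.decode('utf-16-le'))  # ID,Name
```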

Answer 3

Isn't a CSV file just a plain text file with values separated by commas? Try opening it in a text editor to see whether the file is correctly formed.

Answer 4

To read an encoded file, just replace open with codecs.open:

import codecs

fread = codecs.open('input.csv', 'r', 'utf-16')
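
On Python 3 the built-in `open()` accepts an `encoding` argument directly, so `codecs.open` is not strictly needed there. A sketch that also parses the rows with the standard `csv` module, assuming the file is UTF-16; `read_utf16_csv` is a hypothetical helper, and the temporary file exists only to keep the example self-contained.

```python
import csv
import os
import tempfile


def read_utf16_csv(path: str):
    """Parse a CSV file assumed to be UTF-16 encoded."""
    # newline='' lets the csv module handle line endings itself, as its docs recommend.
    with open(path, 'r', encoding='utf-16', newline='') as f:
        return list(csv.reader(f))


# Usage: write a small UTF-16 CSV to a temporary file, then read it back.
fd, sample_path = tempfile.mkstemp(suffix='.csv')
with os.fdopen(fd, 'w', encoding='utf-16', newline='') as f:
    csv.writer(f).writerows([['ID', 'Name'], ['1', 'Zoë']])
rows = read_utf16_csv(sample_path)
os.remove(sample_path)
print(rows)  # [['ID', 'Name'], ['1', 'Zoë']]
```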

Answer 5

Here is a quick and easy way, especially if Python can't parse the input correctly:

sed 's/ \(.\)/\1/g'
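
The same substitution can be done in Python with the standard `re` module. This sketch assumes the separators really are space characters; if they are NUL bytes (as other answers suggest), decoding with the correct encoding is the more reliable fix. `unspace` is a hypothetical helper name.

```python
import re


def unspace(line: str) -> str:
    # Same substitution as: sed 's/ \(.\)/\1/g'
    # i.e. drop one space in front of every following character.
    return re.sub(r' (.)', r'\1', line)


print(unspace('I D , N a m e'))  # ID,Name
```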

Answer 6

It never occurred to me, but as truppo said, something must be wrong with the file.

Try opening the file in Excel/BrOffice Calc and using Save As to save it as CSV again.

If the problem persists, try a subset of the data: the first 10, last 10, and some intermediate 10 lines of the file.
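
A sketch of picking those subsets in Python, assuming the lines have already been read into a list (for example with `readlines()`); `sample_lines` is a hypothetical helper, and each slice can then be saved and tested on its own.

```python
def sample_lines(lines, n=10):
    """Return the first n, middle n and last n entries of a list of lines."""
    mid = max(0, len(lines) // 2 - n // 2)  # start of a window centred on the middle
    return lines[:n], lines[mid:mid + n], lines[-n:]


demo = [f'row {i}' for i in range(100)]
first, middle, last = sample_lines(demo)
print(first[0], middle[0], last[-1])  # row 0 row 45 row 99
```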


Answer 8

Open the file in binary mode ('rb'). Check it in a hex editor and look for null padding (00 bytes). Open the file in something like the Scintilla Text Editor to check which characters are actually present in the file.
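
The same null-padding check can be sketched in Python instead of a hex editor. `inspect_bytes` and `inspect_file` are hypothetical names; a large number of 00 bytes, especially in every other position, suggests a UTF-16 file.

```python
def inspect_bytes(raw: bytes, n: int = 16):
    """Return a hex dump of the first n bytes and the number of NUL (0x00) bytes."""
    head = ' '.join(f'{b:02x}' for b in raw[:n])
    return head, raw.count(b'\x00')


def inspect_file(path: str, n: int = 16):
    # Binary mode ('rb') shows the bytes exactly as they are stored on disk.
    with open(path, 'rb') as f:
        return inspect_bytes(f.read(), n)


head, nuls = inspect_bytes('ID,Name'.encode('utf-16-le'), 8)
print(head)  # 49 00 44 00 2c 00 4e 00
print(nuls)  # 7
```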

8 answers

solution1 17 ACCEPTED 2009-03-02 17:36:55
solution2 7 2009-03-02 17:22:06
solution3 1 2009-03-02 17:18:54
solution4 1
solution5 0 2012-05-22 18:48:31
solution6 0 2009-03-02 17:15:28
solution7 0 2009-03-02 17:54:02
solution8 0 2009-03-02 17:55:56
