
python opens text file with a space between every character

Whenever I try to open a .csv file with the python command fread = open('input.csv', 'r') it always opens the file with spaces between every single character. I'm guessing it's something wrong with the text file because I can open other text files with the same command and they are loaded correctly. Does anyone know why a text file would load like this in python?

Thanks.

Update

Ok, I got it with the help of Jarret Hardie's post.

This is the code that I used to convert the file to ascii:

fread = open('input.csv', 'rb').read()        # read the raw bytes
mytext = fread.decode('utf-16')               # decode from utf-16 to unicode
mytext = mytext.encode('ascii', 'ignore')     # re-encode as ascii, dropping characters it can't represent
fwrite = open('input-ascii.csv', 'wb')
fwrite.write(mytext)

Thanks!

The post by recursive is probably right... the contents of the file are likely encoded with a multi-byte charset. If this is, in fact, the case you can likely read the file in python itself without having to convert it first outside of python.

Try something like:

fread = open('input.csv', 'rb').read()
mytext = fread.decode('utf-16')

The 'b' flag ensures the file is read as binary data. You'll need to know (or guess) the original encoding... in this example, I've used utf-16, but YMMV. This will convert the file to unicode. If you truly have a file with multi-byte chars, I don't recommend converting it to ascii as you may end up losing a lot of the characters in the process.
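If you don't want to guess, a third-party package such as chardet can estimate the encoding from the raw bytes. This is only a suggestion on top of the answer above, not something the original poster used; a minimal sketch:

import chardet

raw = open('input.csv', 'rb').read()        # read the raw bytes
guess = chardet.detect(raw)                 # returns e.g. {'encoding': 'UTF-16', 'confidence': 0.99, ...}
print(guess['encoding'])
mytext = raw.decode(guess['encoding'])      # decode using the detected encoding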

EDIT: Thanks for uploading the file. There are two bytes at the front of the file which indicate that it does, indeed, use a wide charset. If you're curious, open the file in a hex editor as some have suggested... you'll see something in the text version like 'ID|.' (etc). The dot is the extra byte for each char.
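If you want to check those leading bytes from python rather than a hex editor, a small sketch like this (assuming the file really does start with a UTF-16 byte-order mark) will show them:

import codecs

head = open('input.csv', 'rb').read(2)                        # the first two bytes of the file
if head in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE):
    print('the file starts with a UTF-16 byte-order mark')
else:
    print(repr(head))                                         # inspect whatever the first two bytes are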

The code snippet above seems to work on my machine with that file.

The file is encoded in some unicode encoding, but you are reading it as ascii. Try to convert the file to ascii before using it in python.

Isn't csv just a simple txt file with values separated by commas? Just try to open it with a text editor to see if the file is correctly formed.

To read the encoded file, just replace open with codecs.open:

fread = codecs.open('input.csv', 'r', 'utf-16')
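For example, reading the decoded lines back (a minimal sketch; the comma splitting is only illustrative, not part of the original answer):

import codecs

fread = codecs.open('input.csv', 'r', 'utf-16')
for line in fread:                             # lines come back already decoded
    fields = line.rstrip('\r\n').split(',')    # naive comma split, just to show the decoded text
    print(fields)
fread.close()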

This is a quick and easy fix, especially if python can't parse the input correctly:

sed 's/ \(.\)/\1/g'
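That sed expression just deletes the space in front of every character. If you would rather do the same cleanup in python, a rough equivalent (a sketch, not from the original answer; 'input-fixed.csv' is a made-up output name) is:

import re

raw = open('input.csv', 'r').read()
fixed = re.sub(r' (.)', r'\1', raw)            # drop the space before each character, like the sed script
open('input-fixed.csv', 'w').write(fixed)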

It never occurred to me, but as truppo said, it must be something wrong with the file.

Try to open the file in Excel/BrOffice Calc and save the file as CSV again.

If the problem persists, try a subset of the data: the first 10 / last 10 / 10 intermediate lines of the file.


Open the file in binary mode, 'rb'. Check it in a HEX editor and check for null padding '00'. Open the file in something like Scintilla Text Editor to check the characters present in the file.
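You can do that null-padding check from python as well; a minimal sketch (it only assumes the interleaved '00' bytes described above):

raw = open('input.csv', 'rb').read()
if b'\x00' in raw:                   # null bytes between characters suggest a wide (e.g. utf-16) encoding
    print('file contains null padding, so it is probably not plain ascii')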
