Problem with using pandas to manipulate a big text file in Python
I have a text file like the following small example:
0,1,2,3,4,5,6
chr1,144566,144597,30,chr1,120000,210000
chr1,154214,154245,34,chr1,120000,210000
chr1,228904,228935,11,chr1,210000,240000
chr1,233265,233297,13,chr1,210000,240000
chr1,233266,233297,58,chr1,210000,240000
chr1,235438,235469,36,chr1,210000,240000
chr1,262362,262393,16,chr1,240000,610000
chr1,347253,347284,12,chr1,240000,610000
chr1,387022,387053,38,chr1,240000,610000
I want to remove the first line and write the file tab-separated instead of comma-separated. The expected output is:
chr1 144566 144597 30 chr1 120000 210000
chr1 154214 154245 34 chr1 120000 210000
chr1 228904 228935 11 chr1 210000 240000
chr1 233265 233297 13 chr1 210000 240000
chr1 233266 233297 58 chr1 210000 240000
chr1 235438 235469 36 chr1 210000 240000
chr1 262362 262393 16 chr1 240000 610000
chr1 347253 347284 12 chr1 240000 610000
chr1 387022 387053 38 chr1 240000 610000
I am trying to do this in Python using pandas. I wrote this code, but it does not return what I want. How can I fix it?
import pandas
file = open('myfile.txt', 'rb')
new =[]
for line in file:
    new.append(line.split(','))
df = pd.DataFrame(new)
df.to_csv('outfile.txt', index=False)
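import pandas as pd
# header=0 treats the first line (0,1,2,...) as column names rather than data
df = pd.read_csv('myfile.txt', header=0)
# write tab-separated, without the index column and without the header row
df.to_csv('outfile.txt', sep='\t', index=False, header=False)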
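The pandas snippet above loads the whole file into memory at once. For a big file, one possible variation is to let `read_csv` hand back the data in chunks and append each chunk to the output. This is only a sketch: the function name `csv_to_tsv_in_chunks` and the default chunk size are assumptions made for illustration, and `dtype=str` is used so that values are written back as they were read instead of being converted to numbers.

```python
import pandas as pd


def csv_to_tsv_in_chunks(src, dst, chunksize=100_000):
    """Convert a comma-separated file to tab-separated, dropping its first line.

    Hypothetical helper: the name and the default chunksize are illustrative.
    """
    # header=0 consumes the first line as column names, so it never reaches the output.
    # dtype=str keeps every field as text; chunksize makes read_csv return an iterator.
    reader = pd.read_csv(src, header=0, dtype=str, chunksize=chunksize)
    with open(dst, "w", newline="") as out:
        for chunk in reader:
            # each chunk is an ordinary DataFrame holding at most `chunksize` rows
            chunk.to_csv(out, sep="\t", index=False, header=False)
```

It would be called as `csv_to_tsv_in_chunks('myfile.txt', 'outfile.txt')`. Only one chunk is held in memory at a time, so the chunk size trades memory use against the number of read and write calls.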
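Depending on the size of the file, it may be more efficient to avoid pandas and use basic Python I/O. That way you do not have to read the whole file into memory; instead you read it line by line and write each line to a new tab-separated file: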
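with open("myfile.txt", "r") as r:
    with open("myfile2.txt", "w") as w:
        next(r, None)  # skip the first (header) line, as the question asks
        for line in r:
            # swap the comma delimiters for tabs; the trailing newline is preserved
            w.write("\t".join(line.split(',')))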
myfile2.txt is now a tab-separated version of myfile.txt.
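Splitting on `','` works for the sample data, but it would break if a field ever contained a quoted comma. As a hedged alternative, the same streaming conversion can be sketched with the standard library `csv` module, which handles quoting and also makes it easy to drop the first line. The function name `csv_to_tsv` and its return value are assumptions for illustration, not part of the original answer.

```python
import csv


def csv_to_tsv(src, dst):
    """Stream src to dst as tab-separated text, dropping the first line.

    Hypothetical helper; returns the number of data rows written.
    """
    # newline="" lets the csv module manage line endings itself
    with open(src, newline="") as r, open(dst, "w", newline="") as w:
        reader = csv.reader(r)  # comma-delimited input
        writer = csv.writer(w, delimiter="\t", lineterminator="\n")
        next(reader, None)  # skip the header line; no error if the file is empty
        count = 0
        for row in reader:
            writer.writerow(row)
            count += 1
    return count
```

Calling `csv_to_tsv("myfile.txt", "myfile2.txt")` reads and writes one row at a time, so memory use stays flat regardless of file size.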