TSV to CSV convert Python
I want to convert this file.tsv to CSV. The conversion runs without errors, but the fields are not separated. This is file.tsv:
protein1 protein2 neighborhood neighborhood_transferred fusion cooccurence homology coexpression coexpression_transferred experiments experiments_transferred database database_transferred textmining textmining_transferred combined_score
9606.ENSP00000003084 9606.ENSP00000301645 0 0 0 0 0 0 0 0 0 0 0 163 129 239
These are the first lines of the resulting CSV file:
"protein1 protein2 neighborhood neighborhood_transferred fusion cooccurence homology coexpression coexpression_transferred experiments experiments_transferred database database_transferred textmining textmining_transferred combined_score"
"9606.ENSP00000003084 9606.ENSP00000301645 0 0 0 0 0 0 0 0 0 0 0 163 129 239"
This is the code:
import csv
print(csv.list_dialects())
with open('File.tsv', 'r', encoding='utf-8', newline='') as fin, \
     open('file2.csv', 'w', encoding='utf-8', newline='') as fout:
    reader = csv.reader(fin, dialect='excel-tab')
    writer = csv.writer(fout, delimiter=' ')
    for row in reader:
        writer.writerow(row)
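The problem is that the code does not split the fields at the spaces; instead, the whole header ends up as a single field on one line.
The ideal result would have the fields separated by commas, like this:
protein1,protein2,neighborhood,neighborhood_transferred,fusion,cooccurence,homology,coexpression,coexpression_transferred,experiments,experiments_transferred,database,database_transferred,textmining,textmining_transferred,combined_score
9606.ENSP00000003084,9606.ENSP00000301645,0,0,0,0,0,0,0,0,0,0,0,163,129,239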
Edit: answer rewritten after exchanging comments with the OP.
The code tells the reader to expect tabs as the delimiter in the input:
reader = csv.reader(fin, dialect='excel-tab')
But the input contains no tabs, only spaces, so use:
reader = csv.reader(fin, delimiter=' ')
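Putting the fix together, the whole conversion might look like the sketch below. It assumes the desired output is comma-separated (the `csv.writer` default), so the writer no longer gets `delimiter=' '`. The function name `space_to_csv` and the shortened sample rows are illustrative only, and in-memory streams stand in for the real files.

```python
import csv
import io


def space_to_csv(fin, fout):
    """Copy space-separated rows from fin to fout as comma-separated rows."""
    reader = csv.reader(fin, delimiter=' ')
    writer = csv.writer(fout)  # default dialect writes commas
    for row in reader:
        writer.writerow(row)


# Shortened sample (assumption: single spaces between fields, as in the question).
sample = io.StringIO(
    "protein1 protein2 combined_score\n"
    "9606.ENSP00000003084 9606.ENSP00000301645 239\n"
)
result = io.StringIO()
space_to_csv(sample, result)
converted = result.getvalue()
```

With real files, the same function can be called inside the original `with open(...) as fin, open(...) as fout:` block, keeping `newline=''` on both handles.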
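Note that this treats two consecutive spaces as two delimiters with an empty field between them. You cannot tell the reader to ignore repeated delimiters the way you can in Excel.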
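If the input may contain runs of spaces, one possible workaround is to skip `csv.reader` on the input side and split each line with `str.split()`, which treats any run of whitespace (spaces or tabs) as one separator. This is only a sketch: it assumes no field contains embedded whitespace or quoting, and the name `whitespace_to_csv` is hypothetical.

```python
import csv
import io

# Two spaces between "a" and "b": csv.reader reports an empty field there.
rows = list(csv.reader(io.StringIO("a  b c"), delimiter=' '))
# rows is [['a', '', 'b', 'c']]


def whitespace_to_csv(fin, fout):
    """Write each whitespace-separated input line as one CSV row."""
    writer = csv.writer(fout)
    for line in fin:
        fields = line.split()  # collapses runs of spaces and tabs
        if fields:             # skip blank lines
            writer.writerow(fields)


out = io.StringIO()
whitespace_to_csv(io.StringIO("a  b c\n\nd\te f\n"), out)
collapsed = out.getvalue()
```

Because `str.split()` knows nothing about CSV quoting, this approach is only suitable for simple numeric or identifier columns like those in the question.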