
TSV to CSV conversion in Python

I want to convert this file.tsv to CSV. The conversion runs, but the fields are not separated. This is file.tsv:

protein1 protein2 neighborhood neighborhood_transferred fusion cooccurence homology coexpression coexpression_transferred experiments experiments_transferred database database_transferred textmining textmining_transferred combined_score
9606.ENSP00000003084 9606.ENSP00000301645 0 0 0 0 0 0 0 0 0 0 0 163 129 239

These are the first lines of the resulting CSV file:

"protein1 protein2 neighborhood neighborhood_transferred fusion cooccurence homology coexpression coexpression_transferred experiments experiments_transferred database database_transferred textmining textmining_transferred combined_score"
"9606.ENSP00000003084 9606.ENSP00000301645 0 0 0 0 0 0 0 0 0 0 0 163 129 239"

This is the code:

import csv

print(csv.list_dialects())

with open('File.tsv', 'r', encoding='utf-8', newline='') as fin, \
     open('file2.csv', 'w', encoding='utf-8', newline='') as fout:

    reader = csv.reader(fin, dialect='excel-tab')
    writer = csv.writer(fout, delimiter=' ')

    for row in reader:
        writer.writerow(row)

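The problem is that the code does not split the fields on spaces; instead, each whole line, including the header, ends up as a single field.
The desired result has the fields separated by commas:

protein1,protein2,neighborhood,neighborhood_transferred,fusion,cooccurence,homology,coexpression,coexpression_transferred,experiments,experiments_transferred,database,database_transferred,textmining,textmining_transferred,combined_score
9606.ENSP00000003084,9606.ENSP00000301645,0,0,0,0,0,0,0,0,0,0,0,163,129,239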

Edit: answer rewritten after exchanging comments with the OP.

The reader is told to expect tabs as the delimiter in the input:

reader = csv.reader(fin, dialect='excel-tab')

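But the file contains no tabs, only spaces, so each line is read as a single field. Use a space as the delimiter instead: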

reader = csv.reader(fin, delimiter=' ')
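
For completeness, here is a minimal sketch of the whole conversion with that change applied. It assumes the fields are separated by exactly one space each. To stay runnable on its own it uses in-memory strings in place of File.tsv and file2.csv, and an abbreviated three-column sample rather than the full header. The function name `convert` is only illustrative. The writer is left at its default comma delimiter, since the question asks for comma-separated output.

```python
import csv
import io


def convert(fin, fout):
    """Copy space-delimited rows from fin to fout as comma-separated rows."""
    reader = csv.reader(fin, delimiter=' ')
    writer = csv.writer(fout)  # default dialect: comma delimiter, \r\n line endings
    for row in reader:
        writer.writerow(row)


# Abbreviated stand-in for the real file (assumption: single spaces only).
sample = (
    "protein1 protein2 combined_score\n"
    "9606.ENSP00000003084 9606.ENSP00000301645 239\n"
)
out = io.StringIO(newline='')
convert(io.StringIO(sample, newline=''), out)
print(out.getvalue())
```

With real files, the same function can be passed the two handles opened with `newline=''`, as in the question.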

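Note that this treats two consecutive spaces as two delimiters with an empty field between them. The csv reader has no dedicated option for merging repeated delimiters the way Excel's import dialog does.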
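
If the input may contain runs of spaces, one possible workaround (not part of the original answer) is to bypass `csv.reader` and split each line with `str.split()`, which treats any run of whitespace as a single separator. This is a sketch that only suits data where no field contains spaces or quoting; the helper name `split_rows` is hypothetical.

```python
import csv
import io


def split_rows(lines):
    """Yield one list of fields per non-empty line, splitting on whitespace runs."""
    for line in lines:
        fields = line.split()  # no argument: consecutive whitespace counts once
        if fields:
            yield fields


doubled = "a  b c\n"  # note the two spaces between a and b

# csv.reader reports an empty field between the two spaces.
print(next(csv.reader(io.StringIO(doubled), delimiter=' ')))  # ['a', '', 'b', 'c']

# Splitting on whitespace runs avoids the empty field.
out = io.StringIO(newline='')
csv.writer(out).writerows(split_rows(io.StringIO(doubled)))
print(out.getvalue())  # a,b,c
```

Because `str.split()` also splits on tabs, the same helper would accept a genuinely tab-separated file, as long as no field is empty.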

