Selecting one row per ID based on another column in Python
A small part of my CSV file looks like the following lines:
481116 ABCF3 466 0 ENSG00000161204 0
485921 ABCF3 466 0 ENSG00000161204 0
489719 ABCF3 466 0 ENSG00000161204 0
498136 ABCF3 466 2 ENSG00000161204 0.0019723866
273359 ABHD10 326 78 ENSG00000144827 0.0301158301
491580 ABHD10 326 0 ENSG00000144827 0
493784 ABHD10 326 0 ENSG00000144827 0
494817 ABHD10 326 1 ENSG00000144827 0.0012484395
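In the file, the columns are separated by ",". The second column contains many repeated IDs, and I want to keep only one row per ID, chosen by the value in the sixth column. In other words, for each ID I want the row with the largest value in column 6. For the excerpt above, the result must be: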
498136 ABCF3 466 2 ENSG00000161204 0.0019723866
273359 ABHD10 326 78 ENSG00000144827 0.0301158301
I tried to do this in Python and wrote some code along the following lines, but none of it worked:
with open('data.csv') as f, open('out.txt', 'w') as out:
    line = [line.split(',') for line in f]
    .
    .
    out.write(','.join(results))
you_data.csv:
481116,ABCF3, 466,0, ENSG00000161204,0
485921,ABCF3, 466,0, ENSG00000161204,0
489719,ABCF3, 466,0, ENSG00000161204,0
498136,ABCF3, 466,2, ENSG00000161204,0.0019723866
273359,ABHD10,326,78,ENSG00000144827,0.0301158301
491580,ABHD10,326,0, ENSG00000144827,0
493784,ABHD10,326,0, ENSG00000144827,0
494817,ABHD10,326,1, ENSG00000144827,0.0012484395
Code:
import csv
from collections import defaultdict

with open('you_data.csv', newline='') as f, open('out.csv', 'w', newline='') as out:
    f_reader = csv.reader(f)
    out_writer = csv.writer(out)
    # Group the rows by the ID in the second column
    d = defaultdict(list)
    for line in f_reader:
        d[line[1]].append(line)
    # For each ID, write the row with the largest value in the sixth column
    for _, v in d.items():
        new_line = sorted(v, key=lambda i: float(i[5]), reverse=True)[0]
        out_writer.writerow(new_line)
out.csv:
498136,ABCF3, 466,2, ENSG00000161204,0.0019723866
273359,ABHD10,326,78,ENSG00000144827,0.0301158301
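As a possible variation, the same selection can be done in a single pass without storing every row per ID or sorting. The sketch below is not part of the original answer: the helper name best_rows and its key_col / value_col parameters are hypothetical, and it assumes the sixth column always parses as a float. An in-memory sample stands in for the file so the snippet runs on its own.

```python
import csv
import io

def best_rows(rows, key_col=1, value_col=5):
    """Return one row per ID: the row with the largest value in value_col.

    Hypothetical helper for illustration; assumes value_col parses as float.
    """
    best = {}  # ID -> best row seen so far (dicts preserve insertion order)
    for row in rows:
        key = row[key_col].strip()
        # Keep the first row seen for an ID, replace it only on a strictly larger value
        if key not in best or float(row[value_col]) > float(best[key][value_col]):
            best[key] = row
    return list(best.values())

# A few rows taken from the sample data above
sample = """481116,ABCF3, 466,0, ENSG00000161204,0
498136,ABCF3, 466,2, ENSG00000161204,0.0019723866
273359,ABHD10,326,78,ENSG00000144827,0.0301158301
494817,ABHD10,326,1, ENSG00000144827,0.0012484395
"""

result = best_rows(csv.reader(io.StringIO(sample)))
for row in result:
    print(','.join(row))
```

To use it on a real file, pass csv.reader(f) for a file opened with newline='' and write the returned rows with csv.writer. When several rows share the same maximum, the first one encountered is kept.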