
Selection of one row based on other columns in Python

A small part of my CSV file looks like the following lines:

481116  ABCF3   466 0   ENSG00000161204 0
485921  ABCF3   466 0   ENSG00000161204 0
489719  ABCF3   466 0   ENSG00000161204 0
498136  ABCF3   466 2   ENSG00000161204 0.0019723866
273359  ABHD10  326 78  ENSG00000144827 0.0301158301
491580  ABHD10  326 0   ENSG00000144827 0
493784  ABHD10  326 0   ENSG00000144827 0
494817  ABHD10  326 1   ENSG00000144827 0.0012484395

The columns are separated by "," in the file. The 2nd column contains many repeated IDs, and I would like to keep only one row per ID, based on the values in the 6th column. In other words, for each ID I want to choose the row with the highest number in column 6. The result for the sample above must look like this:

498136  ABCF3   466 2   ENSG00000161204 0.0019723866
273359  ABHD10  326 78  ENSG00000144827 0.0301158301
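
As a minimal in-memory illustration of the rule just described (one row per ID in column 2, keeping the row with the largest value in column 6), the sketch below works on the sample rows as lists of strings. The function name `select_max_rows` and its parameters `key_col` and `value_col` are hypothetical names chosen for this example; it assumes column 6 always parses as a number.

```python
def select_max_rows(rows, key_col=1, value_col=5):
    """Return one row per key: the row whose value_col is numerically largest."""
    best = {}  # key -> best row seen so far (dicts keep insertion order)
    for row in rows:
        key = row[key_col].strip()
        current = best.get(key)
        # Replace only on a strictly larger value, so the first row wins ties.
        if current is None or float(row[value_col]) > float(current[value_col]):
            best[key] = row
    return list(best.values())


sample = [
    ["481116", "ABCF3", "466", "0", "ENSG00000161204", "0"],
    ["485921", "ABCF3", "466", "0", "ENSG00000161204", "0"],
    ["489719", "ABCF3", "466", "0", "ENSG00000161204", "0"],
    ["498136", "ABCF3", "466", "2", "ENSG00000161204", "0.0019723866"],
    ["273359", "ABHD10", "326", "78", "ENSG00000144827", "0.0301158301"],
    ["491580", "ABHD10", "326", "0", "ENSG00000144827", "0"],
    ["493784", "ABHD10", "326", "0", "ENSG00000144827", "0"],
    ["494817", "ABHD10", "326", "1", "ENSG00000144827", "0.0012484395"],
]

for row in select_max_rows(sample):
    print(",".join(row))
```

The values are compared as floats rather than strings, since a string comparison would rank "9" above "10".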

I have tried to do this in Python and wrote several pieces of code within the following framework, but none of them worked:

with open('data.csv') as f, open('out.txt', 'w') as out:
    line = [line.split(',')for line in f]
    .
    .
    out.write(','.join(results))

you_data.csv:

481116,ABCF3, 466,0, ENSG00000161204,0
485921,ABCF3, 466,0, ENSG00000161204,0
489719,ABCF3, 466,0, ENSG00000161204,0
498136,ABCF3, 466,2, ENSG00000161204,0.0019723866
273359,ABHD10,326,78,ENSG00000144827,0.0301158301
491580,ABHD10,326,0, ENSG00000144827,0
493784,ABHD10,326,0, ENSG00000144827,0
494817,ABHD10,326,1, ENSG00000144827,0.0012484395  

Code:

import csv
from collections import defaultdict

with open('you_data.csv', newline='') as f, open('out.csv', 'w', newline='') as out:
    f_reader = csv.reader(f)
    out_writer = csv.writer(out)
    # Group the rows by the ID in the 2nd column.
    d = defaultdict(list)
    for line in f_reader:
        d[line[1]].append(line)
    # For each ID, write the row with the largest value in the 6th column.
    for v in d.values():
        new_line = max(v, key=lambda row: float(row[5]))
        out_writer.writerow(new_line)

out.csv:

498136,ABCF3, 466,2, ENSG00000161204,0.0019723866
273359,ABHD10,326,78,ENSG00000144827,0.0301158301
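
If rows that share an ID are always adjacent, as they are in the sample, a streaming variant is possible that never holds more than one group in memory. The sketch below is an alternative under that assumption, not the method used above: it pairs `itertools.groupby` with `max`. The function name `write_max_rows` is hypothetical, and `io.StringIO` stands in for the real files so the example is self-contained.

```python
import csv
import io
from itertools import groupby


def write_max_rows(src, dst, key_col=1, value_col=5):
    """Write to dst the row with the largest value_col from each run of equal keys.

    Assumption: rows sharing a key are adjacent in src. If they are not,
    the same ID produces one output row per run.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for _, group in groupby(reader, key=lambda row: row[key_col]):
        # max() consumes the current group before groupby advances.
        writer.writerow(max(group, key=lambda row: float(row[value_col])))


data = """481116,ABCF3, 466,0, ENSG00000161204,0
485921,ABCF3, 466,0, ENSG00000161204,0
489719,ABCF3, 466,0, ENSG00000161204,0
498136,ABCF3, 466,2, ENSG00000161204,0.0019723866
273359,ABHD10,326,78,ENSG00000144827,0.0301158301
491580,ABHD10,326,0, ENSG00000144827,0
493784,ABHD10,326,0, ENSG00000144827,0
494817,ABHD10,326,1, ENSG00000144827,0.0012484395
"""

out = io.StringIO()
write_max_rows(io.StringIO(data), out)
print(out.getvalue(), end="")
```

With real files, the two `io.StringIO` objects would be replaced by handles opened with `newline=''`, as in the code above. If the input is not grouped by ID, the dictionary-based approach is the safer choice.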


