Pandas: how do I create a list of duplicates from one column and keep only the maximum values of the corresponding columns?

Question

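I want to find all duplicates in the first column, Primary Mod Site, and keep only the highest values for all compounds in the dataset (columns B to M).

[Image: Excel sheet]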

Here is the code I have so far:

import pandas as pd

#read desired excel file
df = pd.read_excel("20220825_CISLIB01_Plate-1_Rows-A-B")

#function to find the duplicates in the dataset (sectioning and removing them is the next step)
#can be applied to any dataset with the same format as the original excel files
def getDuplicate(df):
    #collect every row whose "Primary Mod Site" value appears more than once
    return pd.concat(g for _, g in df.groupby("Primary Mod Site") if len(g) > 1)

I'm stuck on what to do next. Any help is greatly appreciated!
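A minimal sketch of the duplicate-finding step on a small hypothetical frame. The column "ONCV-1-1-2" and all values are made up for illustration; only "Primary Mod Site" and "ONCV-1-1-1" appear in the thread. It compares the question's `pd.concat` approach with the boolean mask from `DataFrame.duplicated(keep=False)`, which marks every member of a duplicate group.

```python
import pandas as pd

# Hypothetical toy data; "ONCV-1-1-2" and all values are illustrative only
df = pd.DataFrame({
    "Primary Mod Site": ["A1", "A1", "B2", "C3", "C3", "C3"],
    "ONCV-1-1-1": [5.0, 2.0, 7.0, 1.0, 9.0, 4.0],
    "ONCV-1-1-2": [3.0, 8.0, 6.0, 2.0, 5.0, 7.0],
})

# Question's approach: the column name is passed to groupby as a string
dups_concat = pd.concat(g for _, g in df.groupby("Primary Mod Site") if len(g) > 1)

# Equivalent boolean mask: keep=False flags all rows of each duplicated site
dups_mask = df[df.duplicated("Primary Mod Site", keep=False)]

print(dups_mask)
```

One design note: if no site is duplicated, the mask version simply returns an empty frame, whereas `pd.concat` over an empty generator raises a `ValueError`.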

Answer 1

It would help if you posted your data as code or text so it can be copied.

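However, IIUC, you need to group by column A ("Primary Mod Site") and then take the max of the rest of the columns. This seems to do the trick:

df.groupby("Primary Mod Site").max()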
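A small sketch of what the groupby-max does, on hypothetical toy data (the column "ONCV-1-1-2" and all values are invented for illustration). Each compound column gets its own per-site maximum, so for a given site the values in the output row may come from different original rows.

```python
import pandas as pd

# Hypothetical toy data standing in for the Excel sheet
df = pd.DataFrame({
    "Primary Mod Site": ["A1", "A1", "B2", "C3", "C3"],
    "ONCV-1-1-1": [5.0, 2.0, 7.0, 1.0, 9.0],
    "ONCV-1-1-2": [3.0, 8.0, 6.0, 2.0, 5.0],
})

# One row per site; each compound column holds its own maximum for that site
result = df.groupby("Primary Mod Site", as_index=False).max()
print(result)
```

Here site A1 ends up with 5.0 from its first row and 8.0 from its second row, which is what "keep the highest value for every compound" asks for.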

Answer 2

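Based on what I noticed in the screenshot (e.g. the first 3 rows), the row with the highest value tends to have the highest value in all columns, so something like this might work:

df = df.sort_values("ONCV-1-1-1", ascending=False).drop_duplicates("Primary Mod Site", keep="first", ignore_index=True)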
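A sketch of the sort-then-deduplicate approach on hypothetical toy data ("ONCV-1-1-2" and all values are invented). It keeps one whole row per site, the one with the largest "ONCV-1-1-1", so it only matches the per-column maxima when the observation above holds.

```python
import pandas as pd

# Hypothetical toy data; "ONCV-1-1-1" is the sort key used in the answer
df = pd.DataFrame({
    "Primary Mod Site": ["A1", "A1", "B2", "C3", "C3"],
    "ONCV-1-1-1": [5.0, 2.0, 7.0, 1.0, 9.0],
    "ONCV-1-1-2": [3.0, 8.0, 6.0, 2.0, 5.0],
})

# Highest "ONCV-1-1-1" first, then keep the first (i.e. top) row for each site
top = (
    df.sort_values("ONCV-1-1-1", ascending=False)
      .drop_duplicates("Primary Mod Site", keep="first", ignore_index=True)
)
print(top)
```

In this toy data site A1 keeps 3.0 for "ONCV-1-1-2" even though its other row has 8.0, which shows what happens when the rows are not uniformly highest.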

Or, if you are not sure that observation holds for all rows, this might work:

df = df.groupby("Primary Mod Site").max()
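If whole original rows are wanted rather than per-column maxima, one alternative (not taken from either answer) is to locate, per site, the index of the row with the largest value in a single chosen column via `groupby(...).idxmax()`. The toy data and the column "ONCV-1-1-2" are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the Excel sheet
df = pd.DataFrame({
    "Primary Mod Site": ["A1", "A1", "B2", "C3", "C3"],
    "ONCV-1-1-1": [5.0, 2.0, 7.0, 1.0, 9.0],
    "ONCV-1-1-2": [3.0, 8.0, 6.0, 2.0, 5.0],
})

# Index label of the row with the maximum "ONCV-1-1-1" within each site
best_rows = df.groupby("Primary Mod Site")["ONCV-1-1-1"].idxmax()

# Pull those complete rows out of the original frame
top = df.loc[best_rows].reset_index(drop=True)
print(top)
```

Unlike `groupby(...).max()`, every output row here is an unmodified row from the input; unlike sorting, it does not depend on sort order.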

Note: please post a reproducible example that we can easily copy and paste to test.
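One way to produce such an example, sketched here with a hypothetical stand-in frame: print the first few rows as a plain dict with `DataFrame.to_dict(orient="list")`, which others can paste straight back into `pd.DataFrame`.

```python
import pandas as pd

# Hypothetical stand-in for the frame loaded from the Excel file
df = pd.DataFrame({
    "Primary Mod Site": ["A1", "A1", "B2", "C3"],
    "ONCV-1-1-1": [5.0, 2.0, 7.0, 1.0],
})

# Small, copy-pasteable snapshot of the first rows
snippet = df.head(3).to_dict(orient="list")
print(snippet)

# Anyone reading the post can rebuild the same frame from the printed dict
rebuilt = pd.DataFrame(snippet)
```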

2 answers

Answer 1: score 0, posted 2022-09-13 19:50:19
Answer 2: score 0, posted 2022-09-13 19:52:49