Pandas: how do I list the duplicates in one column and keep only the highest value in the corresponding columns?

Question

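I want to find all the duplicates in the first column, Primary Mod Site, and keep only the highest value for every compound (columns BM) in the dataset. (Screenshot of the Excel sheet.)

For the code, I have: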

import pandas as pd

# Read the desired Excel file
df = pd.read_excel("20220825_CISLIB01_Plate-1_Rows-A-B")

# Function to find the duplicates in the dataset, section them, and remove them.
# Can be applied to any dataset with the same format as the original Excel files.

def getDuplicate():
    gene = df["Primary Mod Site"]
    # Collect every row whose Primary Mod Site value occurs more than once.
    # Group by the Series itself: there is no column named "gene" in df.
    return pd.concat(g for _, g in df.groupby(gene) if len(g) > 1)

I'm stuck on what to do next. Any help is much appreciated!

Answer 1

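It would help if you posted your data as code or text so that it can be copied.

However, IIUC, you need to group by column "A" and then take the maximum of the rest of the columns. This seems to do the trick: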

df.groupby("Primary Mod Site").max()
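As a minimal sketch of the group-then-max idea, here is a small made-up frame. The compound column names and all values are placeholders, not the real data from the Excel file.

```python
import pandas as pd

# Made-up data: the compound column names are placeholders, not the real ones.
df = pd.DataFrame({
    "Primary Mod Site": ["S10", "S10", "T22", "Y5", "Y5"],
    "Compound-1": [1.0, 3.0, 2.0, 5.0, 4.0],
    "Compound-2": [7.0, 6.0, 8.0, 1.0, 9.0],
})

# One row per site, holding the column-wise maximum of each compound column.
# as_index=False keeps "Primary Mod Site" as a regular column.
result = df.groupby("Primary Mod Site", as_index=False).max()
print(result)
```

Sites that appear only once pass through unchanged, so there should be no need to separate the duplicated rows first.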

Answer 2

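From what I can see in the screenshot (for example, the first 3 rows), the row with the highest value tends to have the highest value in all columns, so something like this might work: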

df = df.sort_values("ONCV-1-1-1", ascending=False).drop_duplicates("Primary Mod Site", keep="first", ignore_index=True)

Or, if you are not sure that this observation holds for all rows, this may work:

df = df.groupby("Primary Mod Site").max()
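The two approaches can give different results when no single row holds the maximum in every column. The sketch below illustrates this on made-up data. "Compound-1" and "Compound-2" are placeholder column names, not the real ones.

```python
import pandas as pd

# Made-up data; "Compound-1" and "Compound-2" are placeholder column names.
df = pd.DataFrame({
    "Primary Mod Site": ["S10", "S10", "T22"],
    "Compound-1": [3.0, 1.0, 2.0],
    "Compound-2": [6.0, 7.0, 8.0],
})

# Approach 1: keep the whole row that has the largest "Compound-1" per site.
by_row = (
    df.sort_values("Compound-1", ascending=False)
      .drop_duplicates("Primary Mod Site", keep="first", ignore_index=True)
)

# Approach 2: take the maximum of every column independently per site.
by_column = df.groupby("Primary Mod Site").max()

print(by_row)
print(by_column)
```

For site S10, the first approach reports 6.0 for Compound-2 because it keeps one original row intact, while the second reports 7.0 because each column is maximised separately. Which one is right depends on whether you need real rows or per-column maxima.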

Note: please post a reproducible example that we can copy and paste to test.
