Pandas: how to drop rows that must meet 2 conditions in 2 different columns

Question

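Suppose I have a dataframe that looks like this. I want to drop every row with a given ID if all of that ID's Name values are missing. In this example, every Name value in the rows with ID 2 is missing. Even if I had 100 rows with ID 3 and only one Name value present, I would want to keep them all.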

ID	Name
1	NaN
1	Banana
1	NaN
2	NaN
2	NaN
2	NaN
3	Apple
3	NaN

So the desired output would look like this:

ID	Name
1	NaN
1	Banana
1	NaN
3	Apple
3	NaN

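Everything I have tried so far has been wrong. In this attempt I tried to count the NaN values belonging to each ID, but it still returns too many rows. This is the closest I have come to the result I want.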

df = df[(df['ID']) & (df['Name'].isna().sum()) != 0]
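One likely reason the attempt above misbehaves is that `df['Name'].isna().sum()` is a single count for the whole column, not a count per ID. The sketch below rebuilds the sample data from the tables in the question (the variable names `total_missing` and `missing_per_id` are only illustrative) and contrasts the two:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the tables in the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3],
    "Name": [np.nan, "Banana", np.nan, np.nan, np.nan, np.nan, "Apple", np.nan],
})

# One number for the entire column: it says nothing about individual IDs
total_missing = df["Name"].isna().sum()

# Missing-value count per ID: this is the quantity the question is after
missing_per_id = df["Name"].isna().groupby(df["ID"]).sum()

print(total_missing)
print(missing_per_id)
```

Comparing the per-ID count with the per-ID row count identifies the IDs whose names are all missing.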

Answer 1

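You want to exclude the rows of every ID that has as many NaNs as it has rows. To do that, you can group by ID and count both the rows and the NaNs in each group.

From that result, you can take the IDs whose row count equals their NaN count and exclude them from the original dataframe.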

# Declare column that indicates if `Name` is NaN
df['isna'] = df['Name'].isna().astype(int)

# Declare a dataframe that counts the rows and NaNs per `ID`
counter = df.groupby('ID').agg({'Name':'size', 'isna':'sum'})

# Get ID's from people who have as many NaNs as they have rows
exclude = counter[counter['Name'] == counter['isna']].index.values

# Exclude these IDs from your data
df = df[~df['ID'].isin(exclude)]
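The same idea can probably be written without a helper column. The following is a minimal sketch, not the answerer's code, assuming the sample data from the question; `has_name`, `kept` and `kept_via_filter` are illustrative names. It marks each row by whether its ID has at least one non-missing Name and keeps only those rows:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3],
    "Name": [np.nan, "Banana", np.nan, np.nan, np.nan, np.nan, "Apple", np.nan],
})

# True for every row whose ID group contains at least one non-missing Name
has_name = df["Name"].notna().groupby(df["ID"]).transform("any")
kept = df[has_name]

# Equivalent formulation with groupby().filter(); usually slower on many groups
kept_via_filter = df.groupby("ID").filter(lambda g: g["Name"].notna().any())

print(kept)
```

Both variants leave `df` itself unchanged, whereas the approach above adds an `isna` column to it.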

Answer 2

Using .groupby and .query:

ids = df.groupby(["ID", "Name"]).agg(Count=("Name", "count")).reset_index()["ID"].tolist()
df = df.query("ID.isin(@ids)").reset_index(drop=True)
print(df)

Output:

   ID    Name
0   1     NaN
1   1  Banana
2   1     NaN
3   3   Apple
4   3     NaN
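This answer appears to work because `groupby` drops groups whose key is NaN by default (`dropna=True`, available since pandas 1.1 as far as I know), so an ID whose names are all missing never shows up in the grouped result. A sketch that makes that dependency visible and shows a more direct way to collect the same IDs, assuming the question's sample data (`grouped_ids`, `ids` and `result` are illustrative names):

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 3, 3],
    "Name": [np.nan, "Banana", np.nan, np.nan, np.nan, np.nan, "Apple", np.nan],
})

# Grouping on Name silently discards the NaN keys, so ID 2 is absent here
grouped_ids = sorted(set(df.groupby(["ID", "Name"]).size().index.get_level_values("ID")))

# More explicit: IDs that have at least one non-missing Name
ids = df.loc[df["Name"].notna(), "ID"].unique()

# Keep only the rows belonging to those IDs
result = df[df["ID"].isin(ids)].reset_index(drop=True)
print(result)
```

The explicit version does not depend on the `dropna` default, which may make the intent easier to read.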


Answer 1
2 (accepted) 2022-12-08 18:48:21

Answer 2
1 2022-12-08 18:53:30
