How to remove specific rows from a DataFrame that meet multiple conditions (python pandas)?

Question

I have the following DataFrame:

    id outcome
0    3      no
1    3      no
2    3      no
3    3     yes
4    3      no
5    5      no
6    5      no
7    5     yes
8    5     yes
9    6      no
10   6      no
11   6     yes
12   6     yes
13   6     yes
14   6     yes
15   6     yes
16   6      no
17   6      no
18   6      no
19   7      no
20   7      no
21   7     yes
22   7     yes
23   7      no
24   7      no
25   7      no
26   7     yes

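It is grouped by id.

I need to satisfy several conditions.

If the row after the current row has the same outcome, I need to delete the current row.

If a row is 'yes', the next row must be the first 'no'.

I must also include the last 'yes' row of a 'yes' sequence.

In addition, I want to keep the last 'no' above a 'yes' (so there may be two 'no' values above a 'yes': basically the first and the last 'no' in a run of 'no' rows).

Then I need to delete any 'yes' row that is the last row.

Finally, if an 'id' has only one 'no' row left, it must be deleted as well.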
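To make the conditions above more concrete, here is a minimal sketch of how consecutive runs of the same outcome can be labelled within each id, with the first and last row of every run flagged. It only illustrates the notion of a "run" and does not apply the deletion rules. The column names `run`, `first_in_run` and `last_in_run` are made up for illustration, and the data is the first nine rows of the table above.

```python
import pandas as pd

df = pd.DataFrame({
    'id':      [3, 3, 3, 3, 3, 5, 5, 5, 5],
    'outcome': ['no', 'no', 'no', 'yes', 'no', 'no', 'no', 'yes', 'yes'],
})

g = df.groupby('id')['outcome']
# A row starts a run when the previous row of the same id has a different outcome
# (or when there is no previous row in that id).
starts = g.shift().ne(df['outcome'])
# A row ends a run when the next row of the same id has a different outcome
# (or when there is no next row in that id).
ends = g.shift(-1).ne(df['outcome'])

df['run'] = starts.cumsum()       # running label, one value per run
df['first_in_run'] = starts
df['last_in_run'] = ends
print(df)
```

With these flags, "the last 'yes' of a sequence" is a 'yes' row where `last_in_run` is true, and "the first and last 'no'" are the 'no' rows where `first_in_run` or `last_in_run` is true.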

This should be the output.

    id outcome
2    3      no
3    3     yes
4    3      no
10   6      no
15   6     yes
16   6      no
20   7      no
22   7     yes
23   7      no
25   7      no
26   7     yes

I am currently doing this:

import pandas as pd

df = pd.DataFrame(data={'id':[3,3,3,3,3,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7],
     'outcome': ['no','no','no','yes','no','no','no','yes','yes','no','no','yes','yes','yes','yes','yes','no','no','no', 'no', 'yes', 'no', 'no', 'yes']})

# part 1
g = df.groupby('id')['outcome']
m1 = g.shift().eq('yes') | g.shift(-1).eq('yes')
df = df[m1 & df.outcome.ne('yes') | (df.outcome.eq('yes') & g.shift().ne('yes') ) ]

# part 2
# The following removes any last rows that are a 'yes' per id
m2 = df.groupby(['id'])['outcome'].tail(1) != 'no'
df = df.drop(m2[m2].index)

# part 3
# The following removes any id whose row count is one: since trailing 'yes' rows were removed above, only a 'no' row can be left over
df_count = df.groupby(['id'])['outcome'].count().to_frame('count').reset_index()
df = pd.merge(df, df_count[['id','count']], on=['id'], how='inner')
# df.count is the DataFrame method, so the 'count' column has to be selected with brackets
df = df.drop(df[df['count'] == 1].index)

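However, part 1 keeps the first 'yes' row, rather than the last 'yes' that I need.

I'm also not sure whether parts 2 and 3 are too verbose, and whether I could do something more concise to satisfy all the conditions above.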

Answer 1

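So I used this for part 1 of my code, but it is not efficient and takes a long time to run. By comparison, the alternative (but slightly incorrect) answer runs in seconds.

It's worth mentioning that I am running this code on 70k rows.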


index_to_remove = []
# Outcomes of each id as a list, keyed by id
data = df.groupby('id')['outcome'].apply(list).to_dict()

# Running row position; this assumes a default RangeIndex and rows sorted by id
count = 0
for key, value in data.items():
    for i in range(len(value)):
        # The last row of an id is never dropped here
        if i == len(value) - 1:
            count = count + 1
            continue
        # A 'yes' followed by another 'yes' is not the last 'yes' of its run
        if value[i] == "yes" and value[i+1] == "yes":
            index_to_remove.append(count)
            count = count + 1
            continue
        # A leading 'no' followed by another 'no'
        if i == 0 and value[i] == "no" and value[i+1] == "no":
            index_to_remove.append(count)
            count = count + 1
            continue
        # Third-to-last row: a 'no' followed by two more 'no' rows
        elif value[i] == "no" and i == len(value) - 3:
            if value[i+1] == "no" and value[i+2] == "no":
                index_to_remove.append(count)
                count = count + 1
                continue
        # Second-to-last row: a 'no' followed by a final 'no'
        elif value[i] == "no" and i == len(value) - 2:
            if value[i+1] == "no":
                index_to_remove.append(count)
                count = count + 1
                continue
        # Interior 'no' followed by 'no': dropped when the previous row
        # or the row two ahead is also 'no'
        elif value[i] == "no" and value[i+1] == "no":
            if value[i-1] == "no" or value[i+2] == "no":
                index_to_remove.append(count)
                count = count + 1
                continue

        count = count + 1

# Drop all collected rows in one call instead of one drop per row
df = df.drop(index_to_remove)

print(df)
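As a possible vectorized alternative to the loop, here is a rough sketch rather than a verified replacement. It follows the rules as written in the question: keep the last 'yes' of each run, keep a 'no' that directly precedes or follows a 'yes', then drop a trailing 'yes' per id and any id left with a single row. The helper name `filter_rows` is made up for illustration, the frame is the 24-row one built in the question's code, and a unique index is assumed.

```python
import pandas as pd

def filter_rows(df):
    """Hypothetical helper: apply the question's written rules with grouped shifts."""
    g = df.groupby('id')['outcome']
    prev_row, next_row = g.shift(), g.shift(-1)
    is_yes = df['outcome'].eq('yes')
    is_no = df['outcome'].eq('no')

    # Last 'yes' of a run: the next row of the same id is not 'yes' (or is missing)
    keep_yes = is_yes & next_row.ne('yes')
    # A 'no' directly after a 'yes' (first of its run) or directly before one (last of its run)
    keep_no = is_no & (prev_row.eq('yes') | next_row.eq('yes'))
    out = df[keep_yes | keep_no]

    # Drop a 'yes' that ends up as the last row of its id
    last = out.groupby('id')['outcome'].tail(1)
    out = out.drop(last[last.eq('yes')].index)

    # Drop ids that are left with a single row
    return out[out.groupby('id')['outcome'].transform('count') > 1]

df = pd.DataFrame(data={
    'id': [3,3,3,3,3,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7],
    'outcome': ['no','no','no','yes','no','no','no','yes','yes','no','no','yes',
                'yes','yes','yes','yes','no','no','no','no','yes','no','no','yes'],
})
result = filter_rows(df)
print(result)
```

Comparing against `g.shift(-1)` instead of `g.shift()` is what selects the last 'yes' of a run rather than the first, and `transform('count')` keeps the original index, so no merge is needed. Note that on the 27-row table printed in the question these rules would also drop row 26, which the expected output keeps, so the trailing-'yes' rule may need adjusting to match the intended result.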

Solution 1
0 2021-11-15 19:21:40
