How to apply multiple conditions to drop/select specific rows from a dataframe (pandas)?

I have the following dataframe:

    id outcome
0    3      no
1    3      no
2    3      no
3    3     yes
4    3      no
5    5      no
6    5      no
7    5     yes
8    5     yes
9    6      no
10   6      no
11   6     yes
12   6     yes
13   6     yes
14   6     yes
15   6     yes
16   6      no
17   6      no
18   6      no
19   7      no
20   7      no
21   7     yes
22   7     yes
23   7      no
24   7      no
25   7      no
26   7     yes

It is grouped by id and, within each id, sorted in ascending order by date.

There are a few conditions I want to satisfy.

I want to remove the current row if the row after it has the same outcome.

If a row is 'yes', then the next row kept must be the first 'no' that follows it.

Additionally, I want to keep the last 'no' above a 'yes'. So there can be 2 'no' values above a 'yes': in a run of consecutive 'no's between two 'yes' rows, keep the first and the last 'no'.

This is the desired outcome for the above dataframe:

    id outcome
2    3      no
3    3     yes
4    3      no
6    5      no
8    5     yes
10   6      no
15   6     yes
16   6      no
20   7      no
22   7     yes
23   7      no
25   7      no
26   7     yes

At the moment I have created several masks like this:

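import pandas as pd

# 27 rows, matching the dataframe printed above (index 0-26)
df = pd.DataFrame(data={'id': [3,3,3,3,3,5,5,5,5,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7],
     'outcome': ['no','no','no','yes','no','no','no','yes','yes','no','no','yes','yes','yes','yes','yes','no','no','no','no','no','yes','yes','no','no','no','yes']})

m1 = df['outcome']  # the outcome column as a Series
m2 = m1.groupby(df['id']).shift(-1)  # next outcome within each id (values shifted up by 1)
m3 = m1.groupby(df['id']).shift().eq('yes') & m1.eq('no')  # True for a 'no' directly after a 'yes'

df2 = df[~m1.eq(m2) | m3]  # drop rows whose next row has the same outcome, but keep a 'no' right after a 'yes'
m4 = df2['outcome']
m5 = m4.groupby(df2['id']).shift()  # previous remaining outcome within each id
df3 = df2[~m4.eq(m5)]  # drop rows that repeat the previous remaining outcome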

With the above, however, I do not keep both the first and the last 'no' above a 'yes'.

You are on the right track. Compare each row with its neighbours inside the same id group using shift:

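g = df.groupby('id')['outcome']  # outcomes grouped per id
# True where the previous or the next row in the same id is 'yes'
cond1 = g.shift().eq('yes') | g.shift(-1).eq('yes')
# keep 'no' rows adjacent to a 'yes', plus the first 'yes' of each consecutive run
out = df[(cond1 & df.outcome.ne('yes')) | (df.outcome.eq('yes') & g.shift().ne('yes'))]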



    id outcome
2    3      no
3    3     yes
4    3      no
6    5      no
7    5     yes
10   6      no
11   6     yes
16   6      no
20   7      no
21   7     yes
23   7      no
25   7      no
26   7     yes
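
Note that the output above keeps the first 'yes' of each consecutive run (rows 7, 11 and 21), whereas the desired output in the question keeps the last one (rows 8, 15 and 22). Below is a minimal sketch of a variant that keeps the last 'yes' instead. It assumes the 27-row frame printed in the question, and the names prev_out, next_out, keep_no and keep_last_yes are introduced here for illustration only:

```python
import pandas as pd

# Rebuild the 27-row frame shown in the question (index 0-26).
df = pd.DataFrame({
    'id': [3] * 5 + [5] * 4 + [6] * 10 + [7] * 8,
    'outcome': ['no', 'no', 'no', 'yes', 'no',
                'no', 'no', 'yes', 'yes',
                'no', 'no', 'yes', 'yes', 'yes', 'yes', 'yes', 'no', 'no', 'no',
                'no', 'no', 'yes', 'yes', 'no', 'no', 'no', 'yes'],
})

g = df.groupby('id')['outcome']
prev_out = g.shift()    # previous outcome within the same id (NaN at group start)
next_out = g.shift(-1)  # next outcome within the same id (NaN at group end)

is_yes = df['outcome'].eq('yes')

# 'no' rows: keep those directly before or directly after a 'yes'.
keep_no = ~is_yes & (prev_out.eq('yes') | next_out.eq('yes'))
# 'yes' rows: keep only the last one of each consecutive run.
keep_last_yes = is_yes & next_out.ne('yes')

out_last = df[keep_no | keep_last_yes]
print(out_last.index.tolist())
# [2, 3, 4, 6, 8, 10, 15, 16, 20, 22, 23, 25, 26]
```

The only change from the answer's logic is the 'yes' condition: comparing against the next row (shift(-1)) rather than the previous row (shift()) selects the end of each run instead of its start.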
