Filter my data based on three columns in pandas

Question

All,

I'm confused about how to do this.

Suppose I have the following table (the snippet shows only a couple of ids, but my real data has many):

    id   status      year
    2    active      2018
    2    active      2019
    2    dissolved   2019
    2    dissolved   2020
    3    active      2018
    3    dissolved   2019
    3    active      2019
    3    dissolved   2020

I want to filter it so that, when several rows share the same id and year, only the row with status = dissolved is kept:

    id   status      year
    2    active      2018
    2    dissolved   2019
    2    dissolved   2020
    3    active      2018
    3    dissolved   2019
    3    dissolved   2020

I tried:

 df.sort_values(['id','year']).drop_duplicates(subset=['id', 'year'],keep='last')

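But sometimes a company goes from dissolved back to active, so I end up with the active status when I really want that customer to be dissolved in that year. That's why I want to detect whether the statuses differ and, if they do, keep the dissolved one. In other words, where I currently use keep='last', how can I essentially keep the 'dissolved' status instead?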

How can I do this?
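One way to extend the `drop_duplicates` attempt, sketched under the assumption that `status` only takes the values `active` and `dissolved`: give each status a rank, sort by `id`, `year` and that rank so the dissolved row comes last within each (id, year) pair, and then `keep='last'` retains it. The `status_rank` mapping and the `_rank` helper column are hypothetical names used only for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "id": [2, 2, 2, 2, 3, 3, 3, 3],
        "status": ["active", "active", "dissolved", "dissolved",
                   "active", "dissolved", "active", "dissolved"],
        "year": [2018, 2019, 2019, 2020, 2018, 2019, 2019, 2020],
    }
)

# Hypothetical ranking: the higher rank wins when an (id, year) pair repeats.
# Statuses missing from this mapping become NaN and sort last by default.
status_rank = {"active": 0, "dissolved": 1}

result = (
    df.assign(_rank=df["status"].map(status_rank))
    .sort_values(["id", "year", "_rank"])
    # After sorting, the dissolved row (if any) is last within each pair.
    .drop_duplicates(subset=["id", "year"], keep="last")
    .drop(columns="_rank")
)
print(result)
```

Unlike sorting by year alone, the outcome here no longer depends on the original row order within a year, which is exactly the case where a company flips from dissolved back to active.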

Answer 1

import pandas as pd

x = pd.DataFrame([(1, "active", '1994'), (1, "dissolved", '1994'), (1, "active", '1995'),
                  (1, "dissolved", '1996'), (2, "active", '1996')],
                 columns=('id', 'status', 'year'))

# Resolve duplicate (id, year) pairs: if a pair has more than one row,
# keep only its "dissolved" row(s); otherwise keep the single row as-is.
parts = []
for _, group in x.groupby(["id", "year"]):
    if group["id"].count() > 1:
        parts.append(group[group["status"] == "dissolved"])
    else:
        parts.append(group)

# DataFrame.append was removed in pandas 2.0, so concatenate the pieces once
y = pd.concat(parts, ignore_index=True)

# Now you can sort
y = y.sort_values(["id", "year"])
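The same rule as the loop (single-row groups are kept, larger groups keep only their `dissolved` rows) can be sketched without a Python loop by broadcasting each group's row count back onto its rows with `groupby(...).transform("count")`. This reuses the sample frame `x` from the answer; `group_size` is just an illustrative name.

```python
import pandas as pd

x = pd.DataFrame(
    [(1, "active", "1994"), (1, "dissolved", "1994"), (1, "active", "1995"),
     (1, "dissolved", "1996"), (2, "active", "1996")],
    columns=("id", "status", "year"),
)

# Number of rows in each row's (id, year) group, aligned back to every row.
group_size = x.groupby(["id", "year"])["status"].transform("count")

# Single-row groups pass through; in larger groups only "dissolved" rows survive.
y = x[(group_size == 1) | (x["status"] == "dissolved")].sort_values(["id", "year"])
print(y)
```

Note that with this rule a repeated (id, year) pair containing no dissolved row at all is dropped entirely, just as in the loop.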

Answer 2

As I understand it, you want all rows that share the same id and year and have status == dissolved. Try this:

df[(df.id == df.year) & (df.status == 'dissolved')]
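Note that `df.id == df.year` compares the values in the `id` and `year` columns row by row; it does not test whether an (id, year) pair repeats. A minimal sketch of the condition described in the prose, using `DataFrame.duplicated(keep=False)` to flag every row of a repeated pair, applied to the sample data from the question (`dup_pair` and `dissolved_dups` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [2, 2, 2, 2, 3, 3, 3, 3],
        "status": ["active", "active", "dissolved", "dissolved",
                   "active", "dissolved", "active", "dissolved"],
        "year": [2018, 2019, 2019, 2020, 2018, 2019, 2019, 2020],
    }
)

# True for every row whose (id, year) pair occurs more than once.
dup_pair = df.duplicated(subset=["id", "year"], keep=False)
is_dissolved = df["status"] == "dissolved"

# Rows that share id and year with another row and are dissolved.
dissolved_dups = df[dup_pair & is_dissolved]

# To get the full expected output, also keep rows whose pair is unique.
result = df[~dup_pair | is_dissolved]
print(result)
```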

2 solutions

Solution 1
1 2020-08-07 19:40:23

Solution 2
0 2020-08-07 17:11:28