Remove groups when a specific date is not contained in the date column in Pandas
Given a dataframe as follows:
city district date price
0 bj cy 2019/3/1 NaN
1 bj cy 2019/4/1 6.0
2 sh hp 2019/2/1 4.0
3 sh hp 2019/3/1 4.0
4 bj hd 2019/3/1 7.0
5 bj hd 2019/4/1 NaN
How can I remove the groups of city and district that have no entry for 2019/4/1?
In this case, the group with city sh and district hp should be removed, since it only has data for 2019/2/1 and 2019/3/1.
My desired output looks like this:
city district date price
0 bj cy 2019/3/1 NaN
1 bj cy 2019/4/1 6.0
2 bj hd 2019/3/1 7.0
3 bj hd 2019/4/1 NaN
Sincere thanks for your kind help.
Solution with DataFrameGroupBy.filter:
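# Parse the date strings so they can be compared against a date
df['date'] = pd.to_datetime(df['date'])
# Keep a group only if at least one of its rows is dated 2019-04-01
f = lambda x: x['date'].eq('2019-04-01').any()
df = df.groupby(['city','district']).filter(f)
print(df)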
city district date price
0 bj cy 2019-03-01 NaN
1 bj cy 2019-04-01 6.0
4 bj hd 2019-03-01 7.0
5 bj hd 2019-04-01 NaN
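For anyone who wants to reproduce this, here is a minimal self-contained sketch of the filter approach. The sample frame is rebuilt by hand from the question; the explicit `format` argument and the names `target`, `has_target` and `filtered` are my own assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied by hand from the question
df = pd.DataFrame({
    'city': ['bj', 'bj', 'sh', 'sh', 'bj', 'bj'],
    'district': ['cy', 'cy', 'hp', 'hp', 'hd', 'hd'],
    'date': ['2019/3/1', '2019/4/1', '2019/2/1', '2019/3/1', '2019/3/1', '2019/4/1'],
    'price': [np.nan, 6.0, 4.0, 4.0, 7.0, np.nan],
})

# Assumption: the strings are year/month/day, so the format is stated explicitly
df['date'] = pd.to_datetime(df['date'], format='%Y/%m/%d')

# The date that every kept group must contain
target = pd.Timestamp('2019-04-01')

def has_target(group):
    # True if at least one row of this (city, district) group has the target date
    return group['date'].eq(target).any()

# filter() keeps all rows of the groups for which has_target returns True
filtered = df.groupby(['city', 'district']).filter(has_target)
print(filtered)
```

Note that the original index labels of the surviving rows are kept; call `reset_index(drop=True)` on the result if you want the 0..3 numbering shown in the question.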
Faster solution with GroupBy.transform and GroupBy.any:
# Flag rows dated 2019-04-01, then mark every row of a group that has any flagged row
df = (df[df.assign(t = df['date'].eq('2019-04-01'))
           .groupby(['city','district'])['t'].transform('any')])
print(df)
city district date price
0 bj cy 2019-03-01 NaN
1 bj cy 2019-04-01 6.0
4 bj hd 2019-03-01 7.0
5 bj hd 2019-04-01 NaN
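The transform approach can also be written without the temporary column `t`, by grouping the boolean Series directly on the key columns. The sketch below is one possible variant under that assumption (the names `is_target`, `mask`, `result` and `via_filter` are hypothetical), and it checks that the result matches the filter approach on the sample data. The speed difference is likely because filter calls a Python function once per group, while transform('any') runs as a built-in aggregation; I have not timed it on this data.

```python
import numpy as np
import pandas as pd

# Sample data copied by hand from the question
df = pd.DataFrame({
    'city': ['bj', 'bj', 'sh', 'sh', 'bj', 'bj'],
    'district': ['cy', 'cy', 'hp', 'hp', 'hd', 'hd'],
    'date': ['2019/3/1', '2019/4/1', '2019/2/1', '2019/3/1', '2019/3/1', '2019/4/1'],
    'price': [np.nan, 6.0, 4.0, 4.0, 7.0, np.nan],
})
df['date'] = pd.to_datetime(df['date'], format='%Y/%m/%d')

# Row-level flag: is this row dated 2019-04-01?
is_target = df['date'].eq(pd.Timestamp('2019-04-01'))

# Broadcast "any flagged row in my (city, district) group" back to every row
mask = is_target.groupby([df['city'], df['district']]).transform('any')
result = df[mask]

# Cross-check against the filter-based solution
via_filter = df.groupby(['city', 'district']).filter(
    lambda g: g['date'].eq(pd.Timestamp('2019-04-01')).any()
)
print(result)
print(result.equals(via_filter))
```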