
Remove groups when a specific date is not contained in the date column in Pandas

Given a dataframe as follows:

  city district      date  price
0   bj       cy  2019/3/1    NaN
1   bj       cy  2019/4/1    6.0
2   sh       hp  2019/2/1    4.0
3   sh       hp  2019/3/1    4.0
4   bj       hd  2019/3/1    7.0
5   bj       hd  2019/4/1    NaN

How can I remove the groups of city and district that have no entry for 2019/4/1?

In this case, the group sh and hp should be removed, since it only has data for 2019/2/1 and 2019/3/1.

My desired output looks like this:

  city district      date  price
0   bj       cy  2019/3/1    NaN
1   bj       cy  2019/4/1    6.0
2   bj       hd  2019/3/1    7.0
3   bj       hd  2019/4/1    NaN

Sincere thanks for your kind help.

Solution with DataFrameGroupBy.filter:

import pandas as pd

df['date'] = pd.to_datetime(df['date'])

# keep a group only if at least one of its rows is dated 2019-04-01
f = lambda x: x['date'].eq('2019-04-01').any()
df = df.groupby(['city','district']).filter(f)
print(df)
  city district       date  price
0   bj       cy 2019-03-01    NaN
1   bj       cy 2019-04-01    6.0
4   bj       hd 2019-03-01    7.0
5   bj       hd 2019-04-01    NaN
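
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same filter approach. The sample frame is rebuilt from the table in the question; the names target, has_target_date and kept are only illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "city": ["bj", "bj", "sh", "sh", "bj", "bj"],
    "district": ["cy", "cy", "hp", "hp", "hd", "hd"],
    "date": ["2019/3/1", "2019/4/1", "2019/2/1", "2019/3/1", "2019/3/1", "2019/4/1"],
    "price": [np.nan, 6.0, 4.0, 4.0, 7.0, np.nan],
})
df["date"] = pd.to_datetime(df["date"])

# Illustrative name: the date every surviving group must contain.
target = pd.Timestamp("2019-04-01")

def has_target_date(group):
    """Return True if any row of this (city, district) group has the target date."""
    return group["date"].eq(target).any()

# filter() keeps all rows of the groups for which the function returns True.
kept = df.groupby(["city", "district"]).filter(has_target_date)
print(kept)
```

The original index labels are preserved, so call reset_index(drop=True) on the result if you need the 0..3 numbering shown in the desired output.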

Faster solution with GroupBy.transform and GroupBy.any (filter calls a Python function once per group, which tends to be slow when there are many groups):

# flag rows dated 2019-04-01, then broadcast "any flag in the group" back to every row
df = (df[df.assign(t = df['date'].eq('2019-04-01'))
           .groupby(['city','district'])['t'].transform('any')])
print(df)
  city district       date  price
0   bj       cy 2019-03-01    NaN
1   bj       cy 2019-04-01    6.0
4   bj       hd 2019-03-01    7.0
5   bj       hd 2019-04-01    NaN
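
The transform approach can also be written without the temporary column t, by grouping the boolean Series directly on the two key columns. This is a sketch of an equivalent variant, not the answer's exact code; the names target, is_target, mask and kept are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "city": ["bj", "bj", "sh", "sh", "bj", "bj"],
    "district": ["cy", "cy", "hp", "hp", "hd", "hd"],
    "date": ["2019/3/1", "2019/4/1", "2019/2/1", "2019/3/1", "2019/3/1", "2019/4/1"],
    "price": [np.nan, 6.0, 4.0, 4.0, 7.0, np.nan],
})
df["date"] = pd.to_datetime(df["date"])

target = pd.Timestamp("2019-04-01")  # illustrative name

# Row-level flag: is this row dated on the target date?
is_target = df["date"].eq(target)

# Group the flags by (city, district) and broadcast "any True in the group"
# back to every row, giving a boolean mask aligned with df's index.
mask = is_target.groupby([df["city"], df["district"]]).transform("any")

kept = df[mask]
print(kept)
```

Because mask is an ordinary boolean Series, it can be combined with other conditions using & and | before indexing.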
