Pandas: Groupby and cut within a group

I have a pandas DataFrame that looks like this:

userid   name       date
1           name1    2016-06-04
1           name2    2016-06-05
1           name3    2016-06-04
1           name1    2016-06-06
2           name23   2016-06-01
2           name2    2016-06-01
3           name1    2016-06-03
3           name6    2016-06-03
3           name12   2016-06-03
3           name65   2016-06-04

For each user, I want to keep only the rows that fall on that user's first (earliest) date and drop the rest.

The final df would be as follows:

userid   name       date
1           name1    2016-06-04
1           name3    2016-06-04
2           name23   2016-06-01
2           name2    2016-06-01
3           name1    2016-06-03
3           name6    2016-06-03
3           name12   2016-06-03



The column dtypes are:

userid     int64
name      object
date      object

Each value in the date column is a datetime.date object.

So the task involves grouping by userid, sorting by date, and then keeping only the rows with the first (earliest) date in each group.

You can first sort the DataFrame by the date column with sort_values, then use groupby with apply and boolean indexing to keep all rows whose date equals the first value in each group:

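# Parentheses let the method chain span several lines.
df = (df.sort_values('date')                               # earliest dates first
        .groupby('userid')
        .apply(lambda x: x[x.date == x.date.iloc[0]])      # rows matching the group's first date
        .reset_index(drop=True))

print(df)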
   userid    name       date
0       1   name1 2016-06-04
1       1   name3 2016-06-04
2       2  name23 2016-06-01
3       2   name2 2016-06-01
4       3   name1 2016-06-03
5       3   name6 2016-06-03
6       3  name12 2016-06-03
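
As an alternative to apply, the same filter can probably be written with groupby(...).transform('min'), which broadcasts each user's earliest date back onto that user's rows and needs no sorting. This is only a sketch, not the method from the answer above: the sample frame is rebuilt from the question's table, and the date column is assumed to have been converted with pd.to_datetime rather than kept as datetime.date objects.

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumption: dates are
# converted to datetime64 with pd.to_datetime instead of datetime.date).
df = pd.DataFrame({
    "userid": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
    "name": ["name1", "name2", "name3", "name1", "name23",
             "name2", "name1", "name6", "name12", "name65"],
    "date": pd.to_datetime([
        "2016-06-04", "2016-06-05", "2016-06-04", "2016-06-06",
        "2016-06-01", "2016-06-01",
        "2016-06-03", "2016-06-03", "2016-06-03", "2016-06-04",
    ]),
})

# Earliest date per user, aligned row by row with the original frame.
first_date = df.groupby("userid")["date"].transform("min")

# Keep only the rows whose date equals that user's earliest date.
result = df[df["date"] == first_date].reset_index(drop=True)
print(result)
```

Because the boolean mask is applied to the unsorted frame, the surviving rows keep their original relative order.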
