Pandas: Groupby and cut within a group
I have a pandas DataFrame that looks like this:
userid name date
1 name1 2016-06-04
1 name2 2016-06-05
1 name3 2016-06-04
1 name1 2016-06-06
2 name23 2016-06-01
2 name2 2016-06-01
3 name1 2016-06-03
3 name6 2016-06-03
3 name12 2016-06-03
3 name65 2016-06-04
For each user, I want to keep only the rows that fall on that user's first (earliest) date and drop the rest.
The final df would be as follows:
userid name date
1 name1 2016-06-04
1 name3 2016-06-04
2 name23 2016-06-01
2 name2 2016-06-01
3 name1 2016-06-03
3 name6 2016-06-03
3 name12 2016-06-03
The column dtypes are:
userid int64
name object
date object
The type() of each value in the date column is datetime.date.
So the task involves grouping by userid, sorting by date, and then keeping only the rows with the first (earliest) date in each group.
You can first sort the DataFrame by the date column with sort_values, then use groupby with apply and boolean indexing to keep, within each group, all rows whose date equals the group's first value:
# Parentheses let the method chain span several lines
df = (df.sort_values('date')
        .groupby('userid')
        .apply(lambda x: x[x.date == x.date.iloc[0]])
        .reset_index(drop=True))
print(df)
userid name date
0 1 name1 2016-06-04
1 1 name3 2016-06-04
2 2 name23 2016-06-01
3 2 name2 2016-06-01
4 3 name1 2016-06-03
5 3 name6 2016-06-03
6 3 name12 2016-06-03
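As a possible alternative to apply, the same filter can be sketched with groupby and transform('min'), which broadcasts each user's earliest date back onto that user's rows so that no prior sort is needed. The sample frame below is rebuilt from the question, and converting the date column with pd.to_datetime is an assumption made here for illustration (the question stores datetime.date objects); the names first_date and result are also just illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "userid": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
    "name": ["name1", "name2", "name3", "name1", "name23", "name2",
             "name1", "name6", "name12", "name65"],
    "date": ["2016-06-04", "2016-06-05", "2016-06-04", "2016-06-06",
             "2016-06-01", "2016-06-01", "2016-06-03", "2016-06-03",
             "2016-06-03", "2016-06-04"],
})

# Assumption: use datetime64 values instead of datetime.date objects.
df["date"] = pd.to_datetime(df["date"])

# Earliest date per user, aligned with the original rows.
first_date = df.groupby("userid")["date"].transform("min")

# Keep only the rows that fall on each user's earliest date.
result = df[df["date"] == first_date].reset_index(drop=True)
print(result)
```

Because the boolean mask is applied to the original frame, the surviving rows keep their original relative order and all original columns, including userid.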