Pandas: select all rows from the most recent group
I have a DataFrame df:
id date group
1 1.1 3
1 2.1 3
1 3.1 5
1 4.1 5
2 5.2 2
2 6.2 1
2 9.2 1
2 12.2 1
3 15.3 15
3 20.3 20
For each id, I want all the rows that belong to its most recent group, i.e. the group of the row with the latest date. For example, for id 2 the most recent group (according to the date column) is 1, so only the rows of id 2 with group 1 should be kept. The desired output:
id date group
1 3.1 5
1 4.1 5
2 6.2 1
2 9.2 1
2 12.2 1
3 20.3 20
Thanks.
This takes two steps. First, set up the example DataFrame:
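import numpy as np
import pandas as pd

# np.array casts every column to float, hence values like 1.0 and 5.0 in the output below
df = pd.DataFrame(
    data=np.array([[1,1.1,3],[1,2.1,3],[1,3.1,5],[1,4.1,5],[2,5.2,2],[2,6.2,1],[2,9.2,1],[2,12.2,1],[3,15.3,15],[3,20.3,20]]),
    columns=['id', 'date', 'group']
)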
Step 1. Get the last group per id.
I referred to this question: Pandas dataframe get first row of each group
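#step 1
# last() takes the final row of each id, so df must already be sorted by date within each id
lastgroup = df.groupby('id').last()
# keep only the key columns needed for the filter in step 2
lastgroup = lastgroup.reset_index()[['id', 'group']]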
lastgroup is:
>>> lastgroup
id group
0 1.0 5.0
1 2.0 1.0
2 3.0 20.0
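Step 1 depends on the rows already being in date order within each id, because last() simply takes the final row of each group. If that ordering is not guaranteed, a minimal sketch like the one below picks the group from the row with the largest date instead. The use of idxmax, the shuffled copy, and the integer-typed DataFrame built from a dict are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd

# Same data as above, built from a dict so id and group stay integers (an assumption).
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "date":  [1.1, 2.1, 3.1, 4.1, 5.2, 6.2, 9.2, 12.2, 15.3, 20.3],
    "group": [3, 3, 5, 5, 2, 1, 1, 1, 15, 20],
})

# Shuffle the rows to show that the result does not rely on row order.
shuffled = df.sample(frac=1, random_state=0)

# For each id, idxmax returns the index label of the row with the largest date.
latest_idx = shuffled.groupby("id")["date"].idxmax()

# Look up those rows and keep only the key columns.
lastgroup = shuffled.loc[latest_idx, ["id", "group"]].reset_index(drop=True)
print(lastgroup)
```

The resulting lastgroup can be passed to pd.merge in the same way as the one produced by last().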
Step 2. Filter df by lastgroup using pd.merge:
#step 2
# with no `on` argument, merge does an inner join on the shared columns ('id' and 'group')
result = pd.merge(left=df, right=lastgroup)
The result is:
>>> result
id date group
0 1.0 3.1 5.0
1 1.0 4.1 5.0
2 2.0 6.2 1.0
3 2.0 9.2 1.0
4 2.0 12.2 1.0
5 3.0 20.3 20.0
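As a possible alternative to the two-step merge, the same filter can be sketched as a single boolean mask with groupby(...).transform('last'). This is a different technique from the one used in the answer, and the explicit sort and the integer-typed DataFrame are assumptions added here so the snippet runs on its own.

```python
import pandas as pd

# Same data as above, built from a dict so id and group stay integers (an assumption).
df = pd.DataFrame({
    "id":    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "date":  [1.1, 2.1, 3.1, 4.1, 5.2, 6.2, 9.2, 12.2, 15.3, 20.3],
    "group": [3, 3, 5, 5, 2, 1, 1, 1, 15, 20],
})

# Sort by date so that "last" within each id means "most recent".
df_sorted = df.sort_values(["id", "date"])

# Broadcast each id's last group value back onto every row of that id.
last_group_per_row = df_sorted.groupby("id")["group"].transform("last")

# Keep only the rows whose group matches the most recent group of their id.
result = df_sorted[df_sorted["group"] == last_group_per_row].reset_index(drop=True)
print(result)
```

Like the merge version, this keeps every row of an id whose group value equals the most recent one, including any earlier, non-adjacent rows that happen to share that value.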