Create columns with first and last date from a unique ID
I have the dataset below and I would like to extract the first and last appearance date for each ID. The last and first appearance columns show what the result should be:
id date last_apparence first_apparence
653777 2021-02-19 2021-02-19 2021-02-19
1873547 2021-02-19 2021-02-19 2021-02-19
657443 2021-02-19 2021-02-19 2021-02-19
653777 2021-02-20 2021-02-20 2021-02-19
So, for example, the ID 653777 shows up on the 19th and the 20th; in this case, the first appearance would be the 19th and the last appearance would be the 20th.
I tried the code below, but I get the same value for the entire column:
df['latest_apparence'] = df['date'][df.index[-1]]
My last approach was a groupby, but even after trying a bunch of different groupings, the closest I got was something similar to Excel's COUNTIF formula. I don't get the first/last date, only how many times the ID shows up in the dataset:
df.groupby(['id'])[['date']].count()
Does anybody have any idea what the best way to get this result is?
Thanks
Let us try groupby with transform:
df['latest_apparence'] = df.groupby('id')['date'].transform('max')
df['first_apparence'] = df.groupby('id')['date'].transform('min')
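For anyone who wants to try this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the four rows shown in the question, and parsing `date` with `pd.to_datetime` is an assumption on my part (the question does not say what dtype the column has); the intermediate name `by_id` is just for illustration.

```python
import pandas as pd

# Sample rows copied from the question. Parsing 'date' as datetime is an
# assumption, made so that min/max compare chronologically.
df = pd.DataFrame({
    "id": [653777, 1873547, 657443, 653777],
    "date": pd.to_datetime(
        ["2021-02-19", "2021-02-19", "2021-02-19", "2021-02-20"]
    ),
})

# Group once, then broadcast each group's min/max back onto every row.
by_id = df.groupby("id")["date"]
df["first_apparence"] = by_id.transform("min")
df["latest_apparence"] = by_id.transform("max")

print(df)
```

Note that `transform('max')` gives both rows of ID 653777 the value 2021-02-20, which matches the description in the question (first 19th, last 20th). The sample table in the question shows 2021-02-19 on that ID's first row, so if a running "last seen so far" value is what is actually wanted, a cumulative maximum per group would probably be the closer fit.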
Depending on how you want your data, you can do:
dt = df.groupby('id')['date'].agg(['min','max']).rename(columns={'min':'first_date','max':'last_date'})
print(dt)
# or broadcast the result back to the original rows with transform, as Beny suggested,
# where AGG is the aggregation to apply ('min' or 'max')
# df.groupby('id')['date'].transform(AGG)
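To make the difference between the two shapes concrete, here is a small sketch under the same assumptions as the question's sample data (four rows, `date` parsed as datetime, which the question does not state). It builds the one-row-per-ID summary and then, as one possible way to get the columns back onto the original rows, joins it with `merge`. The names `dt` and `merged` are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; datetime parsing is an assumption.
df = pd.DataFrame({
    "id": [653777, 1873547, 657443, 653777],
    "date": pd.to_datetime(
        ["2021-02-19", "2021-02-19", "2021-02-19", "2021-02-20"]
    ),
})

# Summary table: one row per id with its earliest and latest date.
dt = (
    df.groupby("id")["date"]
      .agg(["min", "max"])
      .rename(columns={"min": "first_date", "max": "last_date"})
      .reset_index()
)
print(dt)

# Optional: attach the summary to every original row with a left join.
merged = df.merge(dt, on="id", how="left")
print(merged)
```

The summary form is handy when you only need one line per ID; the merged form should end up equivalent to the transform approach, just reached in two steps.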