Get first and last occurrence of string in Python groupby
I have a pandas DataFrame with the following columns (attendance data):
Empcode  TranDate    Trn Time
T01      10/09/2018  09.29
T01      10/09/2018  17.54
T02      10/09/2018  13.52
T03      10/09/2018  10.01
T04      10/09/2018  18.01
I want to get the first occurrence of Trn Time as In Time and the last occurrence of Trn Time as Out Time for a given TranDate and Empcode.
If there is only one record for the key, the time should go in Out Time.
    g = df.groupby(['Empcode', 'TranDate'])
    print(pd.DataFrame({'In': g['Trn Time'].nth(0), 'out': g['Trn Time'].nth(-1)}))
The above code works wherever there are two records for an Empcode and TranDate.
If there is only a single record, it does not work.
"if there is only one record for the key the time should come in Out Time"
Then let it be so. Define a function that does exactly this and pass it to GroupBy.apply:
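    def fnc(g):
        # The last entry is always the Out time.
        res = {'Out': g.iat[-1]}
        # Only report an In time when the key has more than one record.
        if len(g) > 1:
            res['In'] = g.iat[0]
        return res

    dfres = df.groupby(['Empcode', 'TranDate'])['Trn Time'].apply(fnc).unstack()
    print(dfres)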
                       In    Out
Empcode TranDate
T01     10/09/2018  09.29  17.54
T02     10/09/2018    NaN  13.52
T03     10/09/2018    NaN  10.01
T04     10/09/2018    NaN  18.01
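For comparison, here is a minimal sketch of a different route to the same table that avoids a custom function. It is not the approach from the answer above: it assumes named aggregation is available (pandas 0.25 or later) and rebuilds the sample data from the question as plain strings. The helper column `n` is only an illustrative name. Note that "first" and "last" skip missing values, whereas `iat` does not, so the two approaches could differ if Trn Time contains NaN.

```python
import pandas as pd

# Sample data copied from the question; times are kept as strings.
df = pd.DataFrame({
    "Empcode": ["T01", "T01", "T02", "T03", "T04"],
    "TranDate": ["10/09/2018"] * 5,
    "Trn Time": ["09.29", "17.54", "13.52", "10.01", "18.01"],
})

g = df.groupby(["Empcode", "TranDate"])["Trn Time"]

# First and last entry per key, plus the number of records per key.
# "n" is a hypothetical helper column, dropped again below.
res = g.agg(In="first", Out="last", n="size")

# With a single record the time belongs in Out only, so blank out In.
res["In"] = res["In"].where(res["n"] > 1)
res = res.drop(columns="n")

print(res)
```

This keeps the work in built-in aggregations, which tends to scale better than `apply` with a Python function on large frames, at the cost of one extra masking step.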