Pandas: keep the latest rows for the same ID with some conditional column values
I want to keep the latest row for each ID, plus any rows that match a condition on certain column values. Sample input:
ID Timestamp Survey Outcome
12 11/26/2021 INCOMPLETE Survey
95 11/26/2021 INCOMPLETE Survey
95 11/27/2021 COMPLETE Survey
95 11/28/2021 RANG-But did not connect
12 11/29/2021 COMPLETE Survey
24 11/26/2021 RANG-But did not connect
24 11/27/2021 INCOMPLETE Survey
95 11/28/2021 RANG-But did not connect
24 11/28/2021 INCOMPLETE Survey
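Here ID 12 has two rows, so I keep only the latest one (11/29/2021). For ID 95, however, once the survey is complete it should not have any other outcome, such as RANG-But did not connect. So I want to keep the row with the latest timestamp for each ID, and for IDs where a completed survey is followed by rows showing the survey as incomplete or not connected, keep every row from the COMPLETE Survey row onward.
So my sample output would be: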
ID Timestamp Survey Outcome
95 11/27/2021 COMPLETE Survey
95 11/28/2021 RANG-But did not connect
12 11/29/2021 COMPLETE Survey
95 11/28/2021 RANG-But did not connect
24 11/28/2021 INCOMPLETE Survey
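To try the answers that follow, the sample input can be rebuilt as a DataFrame. This is a minimal sketch: the variable name `df` and the explicit `%m/%d/%Y` date format are assumptions, and the values are copied from the sample table in the question.

```python
import pandas as pd

# Sample input from the question; the default RangeIndex (0..8) matches
# the row labels shown in the first answer's output.
df = pd.DataFrame({
    'ID': [12, 95, 95, 95, 12, 24, 24, 95, 24],
    'Timestamp': ['11/26/2021', '11/26/2021', '11/27/2021', '11/28/2021',
                  '11/29/2021', '11/26/2021', '11/27/2021', '11/28/2021',
                  '11/28/2021'],
    'Survey Outcome': ['INCOMPLETE Survey', 'INCOMPLETE Survey',
                       'COMPLETE Survey', 'RANG-But did not connect',
                       'COMPLETE Survey', 'RANG-But did not connect',
                       'INCOMPLETE Survey', 'RANG-But did not connect',
                       'INCOMPLETE Survey'],
})

# Parse the month/day/year strings so that sorting is chronological.
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y')
```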
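First sort by ID and Timestamp with DataFrame.sort_values, then use GroupBy.cummax to select every row from COMPLETE Survey onward within each ID. Finally, for IDs that have no such match, add their last row, obtained with DataFrame.drop_duplicates and filtered with isin: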
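import pandas as pd

# Parse the dates and sort chronologically within each ID
df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.sort_values(['ID','Timestamp'])
# Per ID, the mask becomes True at the first COMPLETE Survey row and stays True
m = df['Survey Outcome'].eq('COMPLETE Survey')
df1 = df[m.groupby(df['ID']).cummax()]
# Latest row of every ID
df2 = df.drop_duplicates('ID', keep='last')
# DataFrame.append was removed in pandas 2.0, so combine with pd.concat
df = pd.concat([df1, df2[~df2['ID'].isin(df1['ID'])]]).sort_index()
print(df)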
ID Timestamp Survey Outcome
2 95 2021-11-27 COMPLETE Survey
3 95 2021-11-28 RANG-But did not connect
4 12 2021-11-29 COMPLETE Survey
7 95 2021-11-28 RANG-But did not connect
8 24 2021-11-28 INCOMPLETE Survey
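The same selection can also be written as a single boolean mask, with no concatenation step. The following is a sketch of that variant, not the answerer's code: the names `is_complete`, `after_complete`, `has_complete`, `is_last` and `result` are illustrative, and the sample data is rebuilt so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12, 95, 95, 95, 12, 24, 24, 95, 24],
    'Timestamp': ['11/26/2021', '11/26/2021', '11/27/2021', '11/28/2021',
                  '11/29/2021', '11/26/2021', '11/27/2021', '11/28/2021',
                  '11/28/2021'],
    'Survey Outcome': ['INCOMPLETE Survey', 'INCOMPLETE Survey',
                       'COMPLETE Survey', 'RANG-But did not connect',
                       'COMPLETE Survey', 'RANG-But did not connect',
                       'INCOMPLETE Survey', 'RANG-But did not connect',
                       'INCOMPLETE Survey'],
})
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y')
df = df.sort_values(['ID', 'Timestamp'])

is_complete = df['Survey Outcome'].eq('COMPLETE Survey')
# True from the first COMPLETE Survey row of each ID onward
# (astype(bool) is only a defensive cast).
after_complete = is_complete.groupby(df['ID']).cummax().astype(bool)
# True for every row of an ID that has at least one COMPLETE Survey row.
has_complete = is_complete.groupby(df['ID']).transform('any')
# True for the last (latest) row of each ID in the sorted frame.
is_last = ~df.duplicated('ID', keep='last')

# Keep rows from COMPLETE Survey onward; for IDs without one, keep the latest row.
result = df[after_complete | (~has_complete & is_last)].sort_index()
print(result)
```

On the sample data this should select the same rows as the output above; building one mask just keeps the rule in a single expression.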
You can use:
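import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
# sort_values returns a new frame, so assign the sorted, re-indexed result back
df = df.sort_values(by=['ID', 'Timestamp']).reset_index(drop=True)
# Per ID: keep rows from the first COMPLETE Survey onward, else rows from the latest timestamp
df = df.groupby('ID').apply(lambda x: x.loc[x[x['Survey Outcome'] == 'COMPLETE Survey'].index[0]: ] if
x['Survey Outcome'].isin(['COMPLETE Survey']).any() else x.loc[x['Timestamp'].idxmax():]).reset_index(drop=True)
print(df)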
OUTPUT
ID Timestamp Survey Outcome
0 12 2021-11-29 COMPLETE Survey
1 24 2021-11-28 INCOMPLETE Survey
2 95 2021-11-27 COMPLETE Survey
3 95 2021-11-28 RANG-But did not connect
4 95 2021-11-28 RANG-But did not connect
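The per-group rule can also be spelled out as a small named function and applied by looping over the groups. This sidesteps groupby.apply, whose handling of the grouping column has, as far as I know, changed across pandas versions. This is a sketch under assumptions: `keep_from_complete` and `result` are hypothetical names, not part of the answer above, and the sample data is rebuilt so the block runs on its own.

```python
import pandas as pd

def keep_from_complete(group):
    """Keep rows from the first COMPLETE Survey onward, else the latest row(s)."""
    complete = group['Survey Outcome'].eq('COMPLETE Survey')
    if complete.any():
        # idxmax on a boolean Series gives the label of the first True.
        return group.loc[complete.idxmax():]
    # No completed survey: keep the row(s) carrying the latest timestamp.
    return group[group['Timestamp'].eq(group['Timestamp'].max())]

df = pd.DataFrame({
    'ID': [12, 95, 95, 95, 12, 24, 24, 95, 24],
    'Timestamp': ['11/26/2021', '11/26/2021', '11/27/2021', '11/28/2021',
                  '11/29/2021', '11/26/2021', '11/27/2021', '11/28/2021',
                  '11/28/2021'],
    'Survey Outcome': ['INCOMPLETE Survey', 'INCOMPLETE Survey',
                       'COMPLETE Survey', 'RANG-But did not connect',
                       'COMPLETE Survey', 'RANG-But did not connect',
                       'INCOMPLETE Survey', 'RANG-But did not connect',
                       'INCOMPLETE Survey'],
})
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%m/%d/%Y')
# Sort and re-index so that label slicing inside each group is positional.
df = df.sort_values(['ID', 'Timestamp']).reset_index(drop=True)

result = pd.concat(
    [keep_from_complete(g) for _, g in df.groupby('ID')],
    ignore_index=True,
)
print(result)
```

The loop is slower than a vectorized mask on large frames, but each branch of the rule is easy to read and to test separately.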