I have a dataframe called df that looks similar to this. I have simplified it here: in the real data, each Client ID has up to 74 entries in the Visit Date column, and there are several hundred unique Client IDs.
Visit Date    Client ID
2016-05-25    C1009404
2016-06-30    C1009404
2016-07-14    C1009404
2016-07-20    C1009405
2016-08-03    C1009405
2016-08-08    C1009405
2016-08-10    C1009405
2016-08-15    C1009406
2016-08-17    C1009406
2016-08-24    C1009406
I want to convert it from long-to-wide, such that it looks like this:
Client ID    Visit_1       Visit_2       Visit_3       Visit_4
C1009404     2016-05-25    2016-06-30    2016-07-14
C1009405     2016-07-20    2016-08-03    2016-08-08    2016-08-10
C1009406     2016-08-15    2016-08-17    2016-08-24
I have tried the following code:
df_wide = df.groupby(['Client ID'], as_index=False).agg(lambda x: ', '.join(set(x.astype(str))))
df_wide = pd.concat([df_wide[['Client ID','ENROLLED_DT']], df_wide['VISIT_DT'].str.split(',', expand=True)], axis=1)
df_wide = df_wide.rename(columns={0: 'Visit_1', 1: 'Visit_2', 2: 'Visit_3', 3: 'Visit_4'})
It produces the desired result, but the dates are no longer in order. How do I do this but keep the dates in order, ascending from left to right?
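You may need to create a helper key for the pivot: number each client's visits with groupby(...).cumcount(), then pivot on that key. This keeps the visits in their existing row order, so sort by the visit date first if the rows are not already in date order.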
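# Column names are assumed to be 'ClientID' and 'VisitDate' (no spaces), as in the output below.
# pivot() is called with keyword arguments; newer pandas versions no longer accept them positionally.
df.assign(key=df.groupby('ClientID').cumcount() + 1).\
    pivot(index='ClientID', columns='key', values='VisitDate').\
    fillna('').\
    add_prefix('Visit_')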
Out[152]:
key Visit_1 Visit_2 Visit_3 Visit_4
ClientID
C10094042 2016-05-25 2016-06-30 2016-07-14
C10094056 2016-07-20 2016-08-03 2016-08-08 2016-08-10
C10094061 2016-08-15 2016-08-17 2016-08-24
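
For reference, here is a self-contained sketch of the same cumcount-and-pivot idea with an explicit sort added, since the question asks for dates ascending from left to right. It assumes the column names from the question's table ('Client ID' and 'Visit Date'). The sample rows are typed in by hand, with the first two visits of C1009404 deliberately swapped to show that the sort matters. The names df_sorted and df_wide are just illustrative.

```python
import pandas as pd

# Sample data from the question; the first two rows are swapped on purpose
# so the input is NOT already in date order (an assumption for illustration).
df = pd.DataFrame({
    "Visit Date": ["2016-06-30", "2016-05-25", "2016-07-14",
                   "2016-07-20", "2016-08-03", "2016-08-08", "2016-08-10",
                   "2016-08-15", "2016-08-17", "2016-08-24"],
    "Client ID": ["C1009404"] * 3 + ["C1009405"] * 4 + ["C1009406"] * 3,
})

# Parse the dates so they sort chronologically rather than as strings.
df["Visit Date"] = pd.to_datetime(df["Visit Date"])

# Sort within each client, then number the visits 1, 2, 3, ...
df_sorted = df.sort_values(["Client ID", "Visit Date"])
df_sorted["key"] = df_sorted.groupby("Client ID").cumcount() + 1

# Pivot long-to-wide on the helper key and tidy up the column labels.
df_wide = (
    df_sorted.pivot(index="Client ID", columns="key", values="Visit Date")
    .add_prefix("Visit_")        # 1 -> Visit_1, 2 -> Visit_2, ...
    .rename_axis(columns=None)   # drop the leftover 'key' label
    .reset_index()               # make 'Client ID' a regular column again
)

print(df_wide)
```

Clients with fewer visits get NaT in the trailing Visit_ columns. If you want blanks instead, you could convert the dates back to strings before pivoting and call fillna('') as in the answer above.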