I have the following dataframe:
ID | Date |
---|---|
1 | 5/4/2021 8:17 |
1 | 5/25/2021 6:20 |
1 | 5/2/2021 22:15 |
2 | 7/12/2021 2:20 |
2 | 7/4/2021 21:28 |
2 | |
2 | |
For repeated IDs, I want to sort the dates from oldest to latest and then add a new column holding an incrementing index for that ID based on the date. If a row has no date, it should just get the first index (1). This is how I want the new dataframe to look:
ID | Date | Index |
---|---|---|
1 | 5/2/2021 22:15 | 1 |
1 | 5/4/2021 8:17 | 2 |
1 | 5/25/2021 6:20 | 3 |
2 | 7/4/2021 21:28 | 1 |
2 | 7/12/2021 2:20 | 2 |
2 | | 1 |
2 | | 1 |
Use `to_datetime` and `DataFrame.sort_values` first, then `GroupBy.cumcount` with `numpy.where` to set 1 where `Date` is missing:
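import numpy as np
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])
# NaT (missing dates) sorts last within each ID
df = df.sort_values(['ID', 'Date'], ignore_index=True)
# cumcount is 0-based, so add 1; rows without a date get 1
df['Index'] = np.where(df['Date'].notna(), df.groupby('ID').cumcount().add(1), 1)
print(df)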
ID Date Index
0 1 2021-05-02 22:15:00 1
1 1 2021-05-04 08:17:00 2
2 1 2021-05-25 06:20:00 3
3 2 2021-07-04 21:28:00 1
4 2 2021-07-12 02:20:00 2
5 2 NaT 1
6 2 NaT 1
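For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same steps. The sample frame is rebuilt from the question's table, with missing dates assumed to be stored as `None`, and `add_date_index` is only an illustrative helper name, not part of pandas.

```python
import numpy as np
import pandas as pd


def add_date_index(df: pd.DataFrame) -> pd.DataFrame:
    """Sort by ID and Date, then number the dated rows within each ID."""
    out = df.copy()
    # Parse the date strings; missing values become NaT.
    out["Date"] = pd.to_datetime(out["Date"])
    # NaT sorts last within each ID by default (na_position="last").
    out = out.sort_values(["ID", "Date"], ignore_index=True)
    # cumcount() is 0-based, so add 1 to start the counter at 1.
    counter = out.groupby("ID").cumcount().add(1)
    # Keep the counter for dated rows; rows without a date get 1.
    out["Index"] = np.where(out["Date"].notna(), counter, 1)
    return out


# Sample data rebuilt from the question (None stands in for the empty cells).
sample = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2],
    "Date": ["5/4/2021 8:17", "5/25/2021 6:20", "5/2/2021 22:15",
             "7/12/2021 2:20", "7/4/2021 21:28", None, None],
})

result = add_date_index(sample)
print(result)
```

Because the rows with missing dates sort to the end of each group, the running counter for the dated rows is not disturbed by them; `np.where` then simply overwrites the counter with 1 on those rows.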