Pandas: add a column using groupby on a dataframe after sorting by a date column
I have the following dataframe:
ID | Date |
---|---|
1 | 5/4/2021 8:17 |
1 | 5/25/2021 6:20 |
1 | 5/2/2021 22:15 |
2 | 7/12/2021 2:20 |
2 | 7/4/2021 21:28 |
2 | |
2 | |
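To reproduce the problem, the sample frame can be built like this. This is a sketch that assumes the two blank dates are genuinely missing values (None here) rather than some placeholder string; the explicit format string is also an assumption based on the month/day/year layout of the sample.

```python
import pandas as pd

# Sample data from the question; None stands in for the blank Date cells (assumption)
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2],
    "Date": ["5/4/2021 8:17", "5/25/2021 6:20", "5/2/2021 22:15",
             "7/12/2021 2:20", "7/4/2021 21:28", None, None],
})

# Parse the month/day/year strings; missing values become NaT
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M")
print(df)
```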
For repeated IDs, I want to sort the dates from oldest to latest and then add a new column that holds an incrementing index for that ID based on the date. If an ID has no date, just assign the first index (1). This is how I want the new dataframe to look:
ID | Date | Index |
---|---|---|
1 | 5/2/2021 22:15 | 1 |
1 | 5/4/2021 8:17 | 2 |
1 | 5/25/2021 6:20 | 3 |
2 | 7/4/2021 21:28 | 1 |
2 | 7/12/2021 2:20 | 2 |
2 | | 1 |
2 | | 1 |
Use to_datetime with DataFrame.sort_values first, then GroupBy.cumcount with numpy.where to set the index to 1 wherever Date is missing:
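import numpy as np
import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])  # missing dates become NaT
df = df.sort_values(['ID', 'Date'], ignore_index=True)  # NaT rows sort last within each ID
df['Index'] = np.where(df['Date'].notna(), df.groupby('ID').cumcount().add(1), 1)
print(df)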
ID Date Index
0 1 2021-05-02 22:15:00 1
1 1 2021-05-04 08:17:00 2
2 1 2021-05-25 06:20:00 3
3 2 2021-07-04 21:28:00 1
4 2 2021-07-12 02:20:00 2
5 2 NaT 1
6 2 NaT 1
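If the original row order should be kept instead of the sorted order, one variant (not part of the answer, just a sketch on the question's sample data with None standing in for the blank dates) runs cumcount only over the rows that have a date and lets pandas index alignment write the numbers back to the right rows:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2],
    "Date": ["5/4/2021 8:17", "5/25/2021 6:20", "5/2/2021 22:15",
             "7/12/2021 2:20", "7/4/2021 21:28", None, None],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M")

# Sort only the dated rows; the original index labels travel with them
dated = df[df["Date"].notna()].sort_values(["ID", "Date"])

# Number each ID's dates 1, 2, 3, ... and assign back by index label;
# rows without a date are not in `dated`, so they receive NaN
df["Index"] = dated.groupby("ID").cumcount().add(1)

# Missing dates get the first index, as the question asks
df["Index"] = df["Index"].fillna(1).astype(int)
print(df)
```

Because the assignment aligns on the index, df itself is never reordered; sort it afterwards only if the sorted view is wanted.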