Pandas: add a column to a dataframe using groupby, sorted by a date column

Question

I have the following dataframe:

ID	Date
1	5/4/2021 8:17
1	5/25/2021 6:20
1	5/2/2021 22:15
2	7/12/2021 2:20
2	7/4/2021 21:28
2
2

For repeated IDs, I want to sort the dates from oldest to latest and then add a new column that holds an incrementing index for that ID based on the date. If a row has no date, it should just get the first index (1). This is how I want the new dataframe to look:

ID	Date	Index
1	5/2/2021 22:15	1
1	5/4/2021 8:17	2
1	5/25/2021 6:20	3
2	7/4/2021 21:28	1
2	7/12/2021 2:20	2
2		1
2		1

Answer 1

Use to_datetime with DataFrame.sort_values first, then GroupBy.cumcount with numpy.where to set 1 where Date is missing:

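import numpy as np
import pandas as pd

# Parse dates (missing values become NaT), then sort by ID and date
df['Date'] = pd.to_datetime(df['Date'])
df = df.sort_values(['ID','Date'], ignore_index=True)

# Number rows 1..n per ID; rows with no date get 1
df['Index'] = np.where(df['Date'].notna(), df.groupby('ID').cumcount().add(1), 1)
print(df)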
   ID                Date  Index
0   1 2021-05-02 22:15:00      1
1   1 2021-05-04 08:17:00      2
2   1 2021-05-25 06:20:00      3
3   2 2021-07-04 21:28:00      1
4   2 2021-07-12 02:20:00      2
5   2                 NaT      1
6   2                 NaT      1
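
For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. The sample data is copied from the question. The helper name add_date_index and the explicit format string are assumptions made for illustration and are not part of the original answer; the format assumes the dates are month/day/year strings.

```python
import numpy as np
import pandas as pd


def add_date_index(frame: pd.DataFrame) -> pd.DataFrame:
    """Sort by ID and Date, number dated rows 1..n per ID, give undated rows 1."""
    out = frame.copy()
    # Assumption: the strings are month/day/year; None becomes NaT.
    out["Date"] = pd.to_datetime(out["Date"], format="%m/%d/%Y %H:%M")
    # NaT rows are placed last within each ID (na_position="last" is the default).
    out = out.sort_values(["ID", "Date"], ignore_index=True)
    # cumcount() is 0-based within each group, so add 1.
    position = out.groupby("ID").cumcount().add(1)
    # Keep the running position where a date exists, otherwise fall back to 1.
    out["Index"] = np.where(out["Date"].notna(), position, 1)
    return out


# Sample data from the question; None marks the two empty Date cells.
df = pd.DataFrame(
    {
        "ID": [1, 1, 1, 2, 2, 2, 2],
        "Date": [
            "5/4/2021 8:17",
            "5/25/2021 6:20",
            "5/2/2021 22:15",
            "7/12/2021 2:20",
            "7/4/2021 21:28",
            None,
            None,
        ],
    }
)

result = add_date_index(df)
print(result["Index"].tolist())  # [1, 2, 3, 1, 2, 1, 1]
```

Because the missing dates sort last within each ID, cumcount would otherwise continue numbering them (3 and 4 for ID 2 here); the np.where step is what resets them to 1.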

Answer 1 is dated 2021-11-04 06:03:33.
