Python inserting consecutive numbers per group
I have the dataframe below:
date ticker NATR
0 2001-02-23 ABC 9.189955
1 2001-02-23 ADP 3.300756
2 2001-02-23 AGL1 4.443902
3 2001-02-24 ALD 7.733580
4 2001-02-24 ALL 8.217828
5 2001-02-24 ALQ 2.538381
6 2001-02-24 ALU 10.394890
7 2001-02-25 ALZ 4.970826
8 2001-02-25 AMC 4.173612
9 2001-02-25 AMP 4.012471
10 2001-02-25 ANN 8.280537
11 2001-02-26 ANZ 3.775175
12 2001-02-26 AOR 7.413381
13 2001-02-26 AQP 7.253565
14 2001-02-26 ART 4.439084
15 2001-02-26 ASX 5.089084
16 2001-02-26 AUN 51.088334
17 2001-02-27 AUT1 10.018372
18 2001-02-27 AWC 5.429162
19 2001-02-27 AWE 10.349716
I need to insert a points tally based on the smallest 'NATR' for each date. The lowest 'NATR' for each date gets 1 point, and the points increase consecutively up to the number of rows for that date. For example:
date ticker NATR Points
0 2001-02-23 ABC 9.189955 3
1 2001-02-23 ADP 3.300756 1
2 2001-02-23 AGL1 4.443902 2
3 2001-02-24 ALD 7.733580 2
4 2001-02-24 ALL 8.217828 3
5 2001-02-24 ALQ 2.538381 1
I have tried the following code, which raises a ValueError:
df.insert(loc=3, column='points',value=np.arange(len(df.groupby('date'))))
When I remove the df.groupby('date'), the points are numbered over the entire length of the dataframe instead of resetting for each date.
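The ValueError comes from len(df.groupby('date')), which counts the number of groups (dates), not the number of rows, so np.arange produces an array shorter than the dataframe. The minimal sketch below, assuming pandas and using only the first two dates of the question's data, reproduces the error and then keeps the df.insert call but passes it a per-group rank, which has exactly one value per row:

```python
import numpy as np
import pandas as pd

# Subset of the question's data: the first two dates only.
df = pd.DataFrame({
    "date": ["2001-02-23"] * 3 + ["2001-02-24"] * 4,
    "ticker": ["ABC", "ADP", "AGL1", "ALD", "ALL", "ALQ", "ALU"],
    "NATR": [9.189955, 3.300756, 4.443902, 7.733580, 8.217828, 2.538381, 10.394890],
})

# len(df.groupby("date")) is the number of dates (2), not the number of rows (7),
# so this array is too short and pandas refuses the insert.
insert_error = None
try:
    df.insert(loc=3, column="points", value=np.arange(len(df.groupby("date"))))
except ValueError as err:
    insert_error = err
    print("ValueError:", err)

# A per-group rank returns one value per row, aligned on df's index,
# so it can be handed straight to df.insert.
df.insert(loc=3, column="Points",
          value=df.groupby("date")["NATR"].rank(method="dense").astype(int))
print(df)
```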
You can use groupby + rank:
df['Points'] = df.groupby('date')['NATR'].rank(method='dense').astype(int)
date ticker NATR Points
0 2001-02-23 ABC 9.189955 3
1 2001-02-23 ADP 3.300756 1
2 2001-02-23 AGL1 4.443902 2
3 2001-02-24 ALD 7.733580 2
4 2001-02-24 ALL 8.217828 3
5 2001-02-24 ALQ 2.538381 1
6 2001-02-24 ALU 10.394890 4
7 2001-02-25 ALZ 4.970826 3
8 2001-02-25 AMC 4.173612 2
9 2001-02-25 AMP 4.012471 1
10 2001-02-25 ANN 8.280537 4
11 2001-02-26 ANZ 3.775175 1
12 2001-02-26 AOR 7.413381 5
13 2001-02-26 AQP 7.253565 4
14 2001-02-26 ART 4.439084 2
15 2001-02-26 ASX 5.089084 3
16 2001-02-26 AUN 51.088334 6
17 2001-02-27 AUT1 10.018372 2
18 2001-02-27 AWC 5.429162 1
19 2001-02-27 AWE 10.349716 3
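The method argument only matters when two rows on the same date share a NATR value. The sketch below uses hypothetical data with a deliberate tie (the tickers and values are made up for illustration) to compare three of the options that pandas' rank accepts:

```python
import pandas as pd

# Hypothetical data: on 2001-02-23, tickers X and Y tie at NATR 3.3.
df = pd.DataFrame({
    "date": ["2001-02-23"] * 4 + ["2001-02-24"] * 2,
    "ticker": ["W", "X", "Y", "Z", "U", "V"],
    "NATR": [4.4, 3.3, 3.3, 9.1, 5.0, 2.0],
})

g = df.groupby("date")["NATR"]
for method in ["dense", "min", "first"]:
    # dense: ties share a rank, next rank follows immediately
    # min:   ties share the lowest rank, a gap follows
    # first: ties are broken by row order, so ranks run 1..n per date
    df[method] = g.rank(method=method).astype(int)
print(df)
```

If every row must get a distinct point value even when NATR values tie, method='first' is the option that guarantees consecutive numbers 1..n within each date; 'dense' gives tied rows the same points.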
You can use cumcount:
df = df.sort_values(['date', 'NATR'])
df['Points'] = df.groupby('date').cumcount() + 1
df
Out[1]:
date ticker NATR Points
1 2001-02-23 ADP 3.300756 1
2 2001-02-23 AGL1 4.443902 2
0 2001-02-23 ABC 9.189955 3
5 2001-02-24 ALQ 2.538381 1
3 2001-02-24 ALD 7.733580 2
4 2001-02-24 ALL 8.217828 3
6 2001-02-24 ALU 10.394890 4
9 2001-02-25 AMP 4.012471 1
8 2001-02-25 AMC 4.173612 2
7 2001-02-25 ALZ 4.970826 3
10 2001-02-25 ANN 8.280537 4
11 2001-02-26 ANZ 3.775175 1
14 2001-02-26 ART 4.439084 2
15 2001-02-26 ASX 5.089084 3
13 2001-02-26 AQP 7.253565 4
12 2001-02-26 AOR 7.413381 5
16 2001-02-26 AUN 51.088334 6
18 2001-02-27 AWC 5.429162 1
17 2001-02-27 AUT1 10.018372 2
19 2001-02-27 AWE 10.349716 3
From there, if you want the original row order back, run df = df.sort_index(). The rank answer is better, though, since it works without sorting the dataframe.
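A variation on the cumcount approach that skips the sort_index step: sort a temporary copy, number the rows within each date, and assign the result back. The assignment aligns on the index, so the original dataframe keeps its row order. This is a sketch assuming pandas, using the first two dates of the question's data:

```python
import pandas as pd

# Subset of the question's data: the first two dates only.
df = pd.DataFrame({
    "date": ["2001-02-23"] * 3 + ["2001-02-24"] * 4,
    "ticker": ["ABC", "ADP", "AGL1", "ALD", "ALL", "ALQ", "ALU"],
    "NATR": [9.189955, 3.300756, 4.443902, 7.733580, 8.217828, 2.538381, 10.394890],
})

# cumcount numbers rows 0, 1, 2, ... within each date in the sorted copy's order;
# assigning back to df aligns on the index, so df itself is never reordered.
df["Points"] = df.sort_values(["date", "NATR"]).groupby("date").cumcount() + 1

# With no ties in NATR, this matches a per-date rank.
ranked = df.groupby("date")["NATR"].rank(method="first").astype(int)
assert (df["Points"] == ranked).all()
print(df)
```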