Python: insert consecutive numbers within each group

Question

I have the dataframe below:

         date ticker       NATR
0  2001-02-23    ABC   9.189955
1  2001-02-23    ADP   3.300756
2  2001-02-23   AGL1   4.443902
3  2001-02-24    ALD   7.733580
4  2001-02-24    ALL   8.217828
5  2001-02-24    ALQ   2.538381
6  2001-02-24    ALU  10.394890
7  2001-02-25    ALZ   4.970826
8  2001-02-25    AMC   4.173612
9  2001-02-25    AMP   4.012471
10 2001-02-25    ANN   8.280537
11 2001-02-26    ANZ   3.775175
12 2001-02-26    AOR   7.413381
13 2001-02-26    AQP   7.253565
14 2001-02-26    ART   4.439084
15 2001-02-26    ASX   5.089084
16 2001-02-26    AUN  51.088334
17 2001-02-27   AUT1  10.018372
18 2001-02-27    AWC   5.429162
19 2001-02-27    AWE  10.349716

I need to insert a points tally based on the smallest 'NATR' for each date. The lowest 'NATR' on each date gets 1 point, and the points increase by one for each next-larger value, up to the number of rows for that date. For example:

         date ticker       NATR  Points
0  2001-02-23    ABC   9.189955       3
1  2001-02-23    ADP   3.300756       1
2  2001-02-23   AGL1   4.443902       2
3  2001-02-24    ALD   7.733580       2
4  2001-02-24    ALL   8.217828       3
5  2001-02-24    ALQ   2.538381       1

I have tried the following code, which raises a ValueError:

df.insert(loc=3, column='points',value=np.arange(len(df.groupby('date'))))

When I remove the df.groupby('date'), the points are added across the entire length of the dataframe instead of resetting for each date.

Answer 1

You can use groupby + rank:

df['Points'] = df.groupby('date')['NATR'].rank(method='dense').astype(int)

          date ticker       NATR  Points
0   2001-02-23    ABC   9.189955       3
1   2001-02-23    ADP   3.300756       1
2   2001-02-23   AGL1   4.443902       2
3   2001-02-24    ALD   7.733580       2
4   2001-02-24    ALL   8.217828       3
5   2001-02-24    ALQ   2.538381       1
6   2001-02-24    ALU  10.394890       4
7   2001-02-25    ALZ   4.970826       3
8   2001-02-25    AMC   4.173612       2
9   2001-02-25    AMP   4.012471       1
10  2001-02-25    ANN   8.280537       4
11  2001-02-26    ANZ   3.775175       1
12  2001-02-26    AOR   7.413381       5
13  2001-02-26    AQP   7.253565       4
14  2001-02-26    ART   4.439084       2
15  2001-02-26    ASX   5.089084       3
16  2001-02-26    AUN  51.088334       6
17  2001-02-27   AUT1  10.018372       2
18  2001-02-27    AWC   5.429162       1
19  2001-02-27    AWE  10.349716       3
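
The question's data has no repeated 'NATR' within a date, so the choice of rank method does not matter there. If ties can occur, it does. The sketch below uses a small made-up frame (the tied values are hypothetical, not the asker's data) to compare three of the methods that rank accepts:

```python
import pandas as pd

# Hypothetical sample: ADP and AGL1 are tied on 2001-02-23.
df = pd.DataFrame({
    "date": ["2001-02-23"] * 3 + ["2001-02-24"] * 3,
    "ticker": ["ABC", "ADP", "AGL1", "ALD", "ALL", "ALQ"],
    "NATR": [9.0, 3.0, 3.0, 7.7, 8.2, 2.5],
})

grouped = df.groupby("date")["NATR"]

# dense: tied rows share a rank and the next rank follows with no gap
df["dense"] = grouped.rank(method="dense").astype(int)
# first: ties are broken by the order in which rows appear
df["first"] = grouped.rank(method="first").astype(int)
# min: tied rows share the lowest rank and the following rank is skipped
df["min"] = grouped.rank(method="min").astype(int)

print(df)
```

With method='dense' the tally for a date can stop short of the number of rows on that date when ties exist. If every row must get a distinct number from 1 up to the group size, method='first' is probably the closer match.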

Answer 2

You can use cumcount:

df = df.sort_values(['date', 'NATR'])
df['Points'] = df.groupby('date').cumcount() + 1
df
Out[1]: 
          date ticker              NATR  Points
1   2001-02-23    ADP          3.300756       1
2   2001-02-23   AGL1          4.443902       2
0   2001-02-23    ABC          9.189955       3
5   2001-02-24    ALQ          2.538381       1
3   2001-02-24    ALD           7.73358       2
4   2001-02-24    ALL 8.217827999999999       3
6   2001-02-24    ALU          10.39489       4
9   2001-02-25    AMP          4.012471       1
8   2001-02-25    AMC          4.173612       2
7   2001-02-25    ALZ 4.970826000000001       3
10  2001-02-25    ANN 8.280536999999999       4
11  2001-02-26    ANZ          3.775175       1
14  2001-02-26    ART 4.439083999999999       2
15  2001-02-26    ASX          5.089084       3
13  2001-02-26    AQP 7.253564999999999       4
12  2001-02-26    AOR 7.413380999999999       5
16  2001-02-26    AUN         51.088334       6
18  2001-02-27    AWC          5.429162       1
17  2001-02-27   AUT1         10.018372       2
19  2001-02-27    AWE         10.349716       3

From there, if you want the original row order back, run df = df.sort_index(). The rank answer is better, though.
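
If the reordering is unwanted, one possible variant is to compute the counter on a sorted copy and assign it back, relying on pandas aligning the assignment by index label. This is a minimal sketch using the first six rows from the question, not part of the original answer:

```python
import pandas as pd

# First six rows of the question's dataframe.
df = pd.DataFrame({
    "date": ["2001-02-23"] * 3 + ["2001-02-24"] * 3,
    "ticker": ["ABC", "ADP", "AGL1", "ALD", "ALL", "ALQ"],
    "NATR": [9.189955, 3.300756, 4.443902, 7.733580, 8.217828, 2.538381],
})

# Sort a temporary copy, then number the rows within each date starting at 1.
points = df.sort_values(["date", "NATR"]).groupby("date").cumcount() + 1

# Assignment aligns on the index, so df keeps its original row order.
df["Points"] = points

print(df)
```

The result matches the expected 'Points' column shown in the question, and df itself is never re-sorted.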

Solution 1
4 2020-12-26 09:10:53

Solution 2
2 Accepted 2020-12-26 09:09:49
