Add a ranking ordered column to a Pandas DataFrame
I know this question may seem trivial, but I can't find the solution anywhere. I have a really large pandas dataframe df that looks something like this:
   conference                                            IF2013  AR2013
0  HOTMOBILE                                          16.333333   31.50
1  FOGA                                               13.772727   60.00
2  IEA/AIE                                            10.433735   28.20
3  IEEE Real-Time and Embedded Technology and App...  10.250000   29.00
4  Symposium on Computational Geometry                 9.880342   35.00
5  WISA                                                9.693878   43.60
6  ICMT                                                8.750000   22.00
7  Haskell                                             8.703704   39.00
I would like to add an extra column at the end that orders it 1, 2, 3, 4, etc., so it looks like this:
   conference                                            IF2013  AR2013  Ranking
0  HOTMOBILE                                          16.333333   31.50        1
1  FOGA                                               13.772727   60.00        2
2  IEA/AIE                                            10.433735   28.20        3
3  IEEE Real-Time and Embedded Technology and App...  10.250000   29.00        4
I can't seem to figure out how to add an extra column that is simply filled with a Series of consecutive numbers.
I guess you are looking for the rank function:
df['rank'] = df['IF2013'].rank()
This way your result will be independent of the index.
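Keep in mind that rank() ranks in ascending order by default and returns floats, so called as above it gives the largest IF2013 the highest number. A minimal sketch, assuming you want the largest IF2013 to be ranked 1 as in the desired output; the three rows are taken from the question, while the shuffled order and the index labels 10, 20, 30 are made up for illustration:

```python
import pandas as pd

# Three rows from the question, deliberately shuffled and given an
# arbitrary index so the result cannot depend on row order or labels.
df = pd.DataFrame(
    {
        "conference": ["IEA/AIE", "HOTMOBILE", "FOGA"],
        "IF2013": [10.433735, 16.333333, 13.772727],
    },
    index=[10, 20, 30],
)

# ascending=False gives rank 1 to the largest value.
# method="first" breaks ties by order of appearance, so ranks stay unique.
# astype(int) converts the float ranks that rank() returns into integers.
df["Ranking"] = df["IF2013"].rank(ascending=False, method="first").astype(int)
print(df)
```

Whether method="first" is the right tie-breaking rule depends on your data; the default method="average" would give tied rows the same fractional rank instead.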
You could add a column with range:
df['Ranking'] = range(1, len(df) + 1)
Example:
import pandas as pd
from io import StringIO
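data = """
conference                                            IF2013  AR2013
HOTMOBILE                                          16.333333   31.50
FOGA                                               13.772727   60.00
IEA/AIE                                            10.433735   28.20
IEEE Real-Time and Embedded Technology and App...  10.250000   29.00
Symposium on Computational Geometry                 9.880342   35.00
WISA                                                9.693878   43.60
ICMT                                                8.750000   22.00
Haskell                                             8.703704   39.00
"""
# Columns are separated by two or more spaces, because the conference
# names themselves contain single spaces; a regex separator needs the python engine.
df = pd.read_csv(StringIO(data), sep=r'\s{2,}', engine='python')
# Number the rows 1..n in their current order.
df['Ranking'] = range(1, len(df) + 1)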
In [183]: df
Out[183]:
   conference                                            IF2013  AR2013  Ranking
0  HOTMOBILE                                          16.333333    31.5        1
1  FOGA                                               13.772727    60.0        2
2  IEA/AIE                                            10.433735    28.2        3
3  IEEE Real-Time and Embedded Technology and App...  10.250000    29.0        4
4  Symposium on Computational Geometry                 9.880342    35.0        5
5  WISA                                                9.693878    43.6        6
6  ICMT                                                8.750000    22.0        7
7  Haskell                                             8.703704    39.0        8
EDIT
Benchmarking:
In [202]: %timeit df['rank'] = range(1, len(df) + 1)
10000 loops, best of 3: 127 us per loop
In [203]: %timeit df['rank'] = df.AR2013.rank(ascending=False)
1000 loops, best of 3: 248 us per loop
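The %timeit lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The helper names with_range and with_rank and the call count of 1000 are arbitrary choices for this sketch, and the numbers will differ by machine and pandas version, so treat them as indicative only:

```python
import timeit

import pandas as pd

# Only the AR2013 column from the question is needed for this comparison.
df = pd.DataFrame({"AR2013": [31.5, 60.0, 28.2, 29.0, 35.0, 43.6, 22.0, 39.0]})


def with_range():
    # Consecutive numbers in the current row order.
    df["rank"] = range(1, len(df) + 1)


def with_rank():
    # Rank by value, largest AR2013 first.
    df["rank"] = df["AR2013"].rank(ascending=False)


calls = 1000
for fn in (with_range, with_rank):
    seconds = timeit.timeit(fn, number=calls)
    print(f"{fn.__name__}: {seconds / calls * 1e6:.1f} us per call")
```

Note that the two statements are not interchangeable: range numbers the rows as they currently stand, while rank orders them by value.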
You can try:
df['rank'] = df.index + 1
print(df)
#   conference                                            IF2013  AR2013  rank
#0  HOTMOBILE                                          16.333333    31.5     1
#1  FOGA                                               13.772727    60.0     2
#2  IEA/AIE                                            10.433735    28.2     3
#3  IEEE Real-Time and Embedded Technology and App...  10.250000    29.0     4
#4  Symposium on Computational Geometry                 9.880342    35.0     5
#5  WISA                                                9.693878    43.6     6
#6  ICMT                                                8.750000    22.0     7
#7  Haskell                                             8.703704    39.0     8
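df.index + 1 only yields 1, 2, 3, ... while the frame still has its default 0..n-1 index. A small sketch of what to do after sorting or filtering, when the index labels no longer match the row positions; the three rows are taken from the question, reordered here for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "conference": ["ICMT", "HOTMOBILE", "FOGA"],
        "IF2013": [8.75, 16.333333, 13.772727],
    }
)

# Sort by the ranking criterion first; the old index labels travel with the rows.
df = df.sort_values("IF2013", ascending=False)

# At this point df.index + 1 would give 2, 3, 1, following the old labels.
# Resetting the index restores 0..n-1, so index + 1 is consecutive again.
df = df.reset_index(drop=True)
df["rank"] = df.index + 1
print(df)
```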
Or use rank with parameter ascending=False:
# Rank on the numeric IF2013 column (ranking the conference names would order
# them alphabetically); astype(int) turns the float ranks into integers.
df['rank'] = df['IF2013'].rank(ascending=False).astype(int)
print(df)
#   conference                                            IF2013  AR2013  rank
#0  HOTMOBILE                                          16.333333    31.5     1
#1  FOGA                                               13.772727    60.0     2
#2  IEA/AIE                                            10.433735    28.2     3
#3  IEEE Real-Time and Embedded Technology and App...  10.250000    29.0     4
#4  Symposium on Computational Geometry                 9.880342    35.0     5
#5  WISA                                                9.693878    43.6     6
#6  ICMT                                                8.750000    22.0     7
#7  Haskell                                             8.703704    39.0     8