Add a ranking ordered column to a Pandas Dataframe
I know this question seems trivial, but I can't find a solution anywhere. I have a large pandas dataframe df that looks like this:
   conference                                            IF2013  AR2013
0  HOTMOBILE                                          16.333333   31.50
1  FOGA                                               13.772727   60.00
2  IEA/AIE                                            10.433735   28.20
3  IEEE Real-Time and Embedded Technology and App...  10.250000   29.00
4  Symposium on Computational Geometry                 9.880342   35.00
5  WISA                                                9.693878   43.60
6  ICMT                                                8.750000   22.00
7  Haskell                                             8.703704   39.00
I would like to add an extra column at the end that ranks the rows 1, 2, 3, 4, etc., so that it looks like this:
   conference                                            IF2013  AR2013  Ranking
0  HOTMOBILE                                          16.333333   31.50        1
1  FOGA                                               13.772727   60.00        2
2  IEA/AIE                                            10.433735   28.20        3
3  IEEE Real-Time and Embedded Technology and App...  10.250000   29.00        4
I can't seem to figure out how to add an extra column that is simply filled with a consecutive range of numbers.
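I think you are looking for the rank function:
df['rank'] = df['IF2013'].rank(ascending=False)
This way, your result does not depend on the index. rank orders ascending by default, so ascending=False is what gives the highest IF2013 rank 1.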
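To illustrate how rank behaves, here is a minimal sketch on a small made-up frame (the score column and its values are hypothetical, not taken from the question). It assumes the highest value should receive rank 1, and shows how the method argument treats ties.

```python
import pandas as pd

# Hypothetical data with a tie, for illustration only
df = pd.DataFrame({"score": [10.0, 30.0, 20.0, 30.0]})

# Default: ascending order, tied rows share the average rank, result is float
df["rank_default"] = df["score"].rank()

# Highest score gets rank 1; tied rows share the lowest rank of their group
df["rank_min"] = df["score"].rank(ascending=False, method="min").astype(int)

# Ties are broken by order of appearance, so every rank is unique
df["rank_first"] = df["score"].rank(ascending=False, method="first").astype(int)

print(df)
```

If you need strictly consecutive numbers even when values repeat, method="first" is probably the closest match to the 1, 2, 3, 4 column asked for.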
You can add the column with range:
df['Ranking'] = range(1, len(df) + 1)
Example:
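import pandas as pd
from io import StringIO
# Columns are separated by two or more spaces, because conference names contain single spaces
data = """
conference                                         IF2013     AR2013
HOTMOBILE                                          16.333333  31.50
FOGA                                               13.772727  60.00
IEA/AIE                                            10.433735  28.20
IEEE Real-Time and Embedded Technology and App...  10.250000  29.00
Symposium on Computational Geometry                9.880342   35.00
WISA                                               9.693878   43.60
ICMT                                               8.750000   22.00
Haskell                                            8.703704   39.00
"""
# A regular-expression separator requires the Python parsing engine
df = pd.read_csv(StringIO(data), sep=r'\s{2,}', engine='python')
# Number the rows 1..n in their current order
df['Ranking'] = range(1, len(df) + 1)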
In [183]: df
Out[183]:
   conference                                            IF2013  AR2013  Ranking
0  HOTMOBILE                                          16.333333    31.5        1
1  FOGA                                               13.772727    60.0        2
2  IEA/AIE                                            10.433735    28.2        3
3  IEEE Real-Time and Embedded Technology and App...  10.250000    29.0        4
4  Symposium on Computational Geometry                 9.880342    35.0        5
5  WISA                                                9.693878    43.6        6
6  ICMT                                                8.750000    22.0        7
7  Haskell                                             8.703704    39.0        8
Edit
Benchmark:
In [202]: %timeit df['rank'] = range(1, len(df) + 1)
10000 loops, best of 3: 127 us per loop
In [203]: %timeit df['rank'] = df.AR2013.rank(ascending=False)
1000 loops, best of 3: 248 us per loop
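Note that the two timed statements are not interchangeable: range numbers the rows by their current position, while rank looks at the values. If the frame is not already sorted, one option is to sort first and then number the rows. A sketch, assuming IF2013 is the column to rank by and using a shortened, deliberately shuffled copy of the data:

```python
import pandas as pd

# Three rows from the question, deliberately out of order
df = pd.DataFrame({
    "conference": ["WISA", "HOTMOBILE", "FOGA"],
    "IF2013": [9.693878, 16.333333, 13.772727],
})

# Sort by the ranking column, highest first, and rebuild a 0..n-1 index
df = df.sort_values("IF2013", ascending=False).reset_index(drop=True)

# Rows are now in ranking order, so consecutive numbers are the ranks
df["Ranking"] = range(1, len(df) + 1)

print(df)
```

This reorders the frame itself; if the original row order has to be kept, rank is likely the better fit.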
You can try:
df['rank'] = df.index + 1
print(df)
#   conference                                            IF2013  AR2013  rank
#0  HOTMOBILE                                          16.333333    31.5     1
#1  FOGA                                               13.772727    60.0     2
#2  IEA/AIE                                            10.433735    28.2     3
#3  IEEE Real-Time and Embedded Technology and App...  10.250000    29.0     4
#4  Symposium on Computational Geometry                 9.880342    35.0     5
#5  WISA                                                9.693878    43.6     6
#6  ICMT                                                8.750000    22.0     7
#7  Haskell                                             8.703704    39.0     8
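Keep in mind that df.index + 1 only produces 1, 2, 3, ... while the frame still has its default integer index. After filtering or sorting, the index labels can have gaps or be out of order. A minimal sketch of that pitfall, using a made-up filter threshold, with numpy.arange as a position-based alternative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "conference": ["HOTMOBILE", "FOGA", "IEA/AIE", "ICMT"],
    "AR2013": [31.5, 60.0, 28.2, 22.0],
})

# Hypothetical filter: keep rows with AR2013 below 50, which drops index label 1
subset = df[df["AR2013"] < 50].copy()

# Index labels are now 0, 2, 3, so this yields 1, 3, 4
subset["rank_from_index"] = subset.index + 1

# Position-based numbering ignores the labels and yields 1, 2, 3
subset["rank_from_position"] = np.arange(1, len(subset) + 1)

print(subset)
```

Calling reset_index(drop=True) before using df.index + 1 should avoid the gap as well.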
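Or use rank with the parameter ascending=False:
# Rank on the numeric IF2013 column (not on the conference names); rank returns floats, so cast to int
df['rank'] = df['IF2013'].rank(ascending=False).astype(int)
print(df)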
#   conference                                            IF2013  AR2013  rank
#0  HOTMOBILE                                          16.333333    31.5     1
#1  FOGA                                               13.772727    60.0     2
#2  IEA/AIE                                            10.433735    28.2     3
#3  IEEE Real-Time and Embedded Technology and App...  10.250000    29.0     4
#4  Symposium on Computational Geometry                 9.880342    35.0     5
#5  WISA                                                9.693878    43.6     6
#6  ICMT                                                8.750000    22.0     7
#7  Haskell                                             8.703704    39.0     8