How to sort strings with numbers in Pandas?
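I have a Python Pandas DataFrame in which a column named status contains three kinds of possible values: ok, must read x more books, and does not read any book yet, where x is an integer greater than 0.
I want to sort the status values according to the order above.
Example: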
     name                      status
0    Paul                          ok
1    Jean      must read 1 more books
2  Robert      must read 2 more books
3    John  does not read any book yet
I found some interesting hints that use Pandas Categorical and map, but I don't know how to handle the variable number embedded in the strings.
How can I do this?
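Use:
# Assumes `import pandas as pd` and the DataFrame `df` from the question.
# Extract the number from each status; rows without a number become NaN.
a = df['status'].str.extract(r'(\d+)', expand=False).astype(float)
# Rank 'ok' above every number and the "not yet" status below every number.
d = {'ok': a.max() + 1, 'does not read any book yet': -1}
# Negate so that higher ranks come first, then reorder the rows by position.
df1 = df.iloc[(-df['status'].map(d).fillna(a)).argsort()]
print(df1)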
     name                      status
0    Paul                          ok
2  Robert      must read 2 more books
1    Jean      must read 1 more books
3    John  does not read any book yet
Explanation:
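The one-liner above packs several steps together. As a rough sketch of what each step produces, it can be unpacked as follows. The intermediate names rank and order are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Paul', 'Jean', 'Robert', 'John'],
    'status': ['ok', 'must read 1 more books',
               'must read 2 more books', 'does not read any book yet'],
})

# Step 1: pull the first run of digits out of each status;
# rows that contain no digits become NaN.
a = df['status'].str.extract(r'(\d+)', expand=False).astype(float)

# Step 2: give the two fixed strings sentinel ranks on either side
# of the numeric range.
d = {'ok': a.max() + 1, 'does not read any book yet': -1}

# Step 3: map the fixed strings, then fill the unmapped rows
# with their extracted number (aligned on the index).
rank = df['status'].map(d).fillna(a)

# Step 4: negate so the largest rank comes first and take the
# positions that would sort the negated values.
order = (-rank).argsort().to_numpy()

# Step 5: reorder the rows by position.
df1 = df.iloc[order]
print(df1)
```

With the sample data, rank holds 3.0, 1.0, 2.0 and -1.0, so Paul comes first and John last. Note that the numbered statuses end up in descending order of x.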
Alternatively, you can use sorted with a custom key function to compute the indices that would sort the array (much like numpy.argsort), then pass them to pd.DataFrame.iloc:
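import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Paul', 'Jean', 'Robert', 'John'],
                   'status': ['ok', 'must read 20 more books',
                              'must read 3 more books', 'does not read any book yet']})

def sort_key(x):
    # x is an (index, status) pair produced by enumerate
    if x[1] == 'ok':
        return -1  # 'ok' sorts first
    elif x[1] == 'does not read any book yet':
        return np.inf  # this status sorts last
    else:
        return int(x[1].split()[2])  # the x in 'must read x more books'

# Positions that would sort the status column
idx = [idx for idx, _ in sorted(enumerate(df['status']), key=sort_key)]
df = df.iloc[idx, :]
print(df)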
     name                      status
0    Paul                          ok
2  Robert      must read 3 more books
1    Jean     must read 20 more books
3    John  does not read any book yet
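If pandas 1.1 or later is available, the same ordering can probably be expressed without building the index list by hand, since sort_values accepts a key argument from that version on. The following is a minimal sketch under that assumption. status_rank is a hypothetical helper name, not something taken from the answers above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Paul', 'Jean', 'Robert', 'John'],
    'status': ['ok', 'must read 20 more books',
               'must read 3 more books', 'does not read any book yet'],
})

def status_rank(status):
    """Hypothetical helper: map one status string to a sortable number."""
    if status == 'ok':
        return -1.0       # sorts before every positive x
    if status == 'does not read any book yet':
        return np.inf     # sorts after every finite x
    # Assumes the fixed wording 'must read x more books'.
    return float(status.split()[2])

# key receives the whole column as a Series and must return
# a Series of the same length holding the values to sort by.
sorted_df = df.sort_values('status', key=lambda col: col.map(status_rank))
print(sorted_df)
```

This keeps the ranking logic in one small function and leaves the original index labels attached to the rows, as in the output above.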