
How to sort strings with numbers in Pandas?

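I have a Python Pandas DataFrame in which a column named status contains three kinds of possible values: ok, must read x more books, and does not read any book yet, where x is an integer greater than 0.

I want to sort the status values according to the order above.

Example: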

  name    status
0 Paul    ok
1 Jean    must read 1 more books
2 Robert  must read 2 more books
3 John    does not read any book yet

I found some interesting hints using Pandas Categorical and map, but I don't know how to handle the variable number embedded in the strings.

How can I achieve that?

Use:

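# extract the integer from each status; rows without digits become NaN
a = df['status'].str.extract(r'(\d+)', expand=False).astype(float)

# sort values for the two statuses that contain no number
d = {'ok': a.max() + 1, 'does not read any book yet': -1}

df1 = df.iloc[(-df['status'].map(d).fillna(a)).argsort()]
print(df1)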
     name                      status
0    Paul                          ok
2  Robert      must read 2 more books
1    Jean      must read 1 more books
3    John  does not read any book yet

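Explanation

  1. First, extract the integers with the regex \d+ using extract
  2. Then dynamically create a dictionary to map the non-numeric values
  3. Replace the NaNs with the numeric Series using fillna
  4. Get the positions with argsort
  5. Select the rows in sorted order with iloc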
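
The steps above can be traced on the question's example data. The following is a minimal sketch that splits the one-liner into named intermediates. The names numbers, special, rank and order are introduced here for illustration and are not part of the original answer, and the call to to_numpy() is an added precaution rather than something the answer requires.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Paul', 'Jean', 'Robert', 'John'],
    'status': ['ok', 'must read 1 more books',
               'must read 2 more books', 'does not read any book yet'],
})

# Step 1: pull the integer out of each status; rows without digits become NaN.
numbers = df['status'].str.extract(r'(\d+)', expand=False).astype(float)

# Step 2: give the two non-numeric statuses values above and below every count.
special = {'ok': numbers.max() + 1, 'does not read any book yet': -1}

# Step 3: map the special statuses, then fill the remaining NaNs with the counts.
rank = df['status'].map(special).fillna(numbers)

# Step 4: argsort of the negated rank yields positions in descending rank order.
order = (-rank).argsort().to_numpy()

# Step 5: select the rows by position.
df1 = df.iloc[order]
print(df1)
```

Because the rank is negated before sorting, rows with more books remaining come before rows with fewer.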

You can use sorted with a custom key function to calculate the indices that would sort the array (much like numpy.argsort). Then feed them to pd.DataFrame.iloc:

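import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Paul', 'Jean', 'Robert', 'John'],
                   'status': ['ok', 'must read 20 more books',
                              'must read 3 more books', 'does not read any book yet']})

def sort_key(x):
    # x is an (index, status) pair produced by enumerate
    if x[1] == 'ok':
        return -1
    elif x[1] == 'does not read any book yet':
        return np.inf
    else:
        # 'must read N more books': the count is the third word
        return int(x[1].split()[2])

# positions that would sort the statuses, analogous to numpy.argsort
idx = [i for i, _ in sorted(enumerate(df['status']), key=sort_key)]

df = df.iloc[idx, :]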

print(df)

     name                      status
0    Paul                          ok
2  Robert      must read 3 more books
1    Jean     must read 20 more books
3    John  does not read any book yet
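
As a possible alternative, the same ordering can be sketched with the key argument of DataFrame.sort_values, assuming pandas 1.1 or later (where that argument was added). This is not taken from either answer above; status_rank is a hypothetical helper name chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'name': ['Paul', 'Jean', 'Robert', 'John'],
    'status': ['ok', 'must read 20 more books',
               'must read 3 more books', 'does not read any book yet'],
})

def status_rank(status):
    """Hypothetical helper: map a Series of status strings to sortable numbers."""
    # Extract the book count; statuses without digits become NaN.
    count = status.str.extract(r'(\d+)', expand=False).astype(float)
    # Place 'ok' before every count and the 'does not read' status after.
    count = count.mask(status == 'ok', -1.0)
    count = count.mask(status == 'does not read any book yet', np.inf)
    return count

# The key callable receives the whole column and must return a same-length Series.
result = df.sort_values('status', key=status_rank)
print(result)
```

On this data the row order matches the output shown above. The helper works on the whole column at once, so no explicit index list is needed.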

