
How to sort values of a pandas dataframe by a particular column in a particular manner (using a lambda function, like sorted in the standard library)

Given the following data:

import pandas as pd
import io

df = pd.read_csv(
    io.StringIO(
        "bit,val\nbit_0,40.9\nbit_1,49.6\nbit_2,50.5\nbit_3,37.7\nbit_4,52.0\nbit_5,55.1\nbit_6,40.6\nbit_7,37.8\nbit_8,39.2\nbit_9,51.1\nbit_10,48.4\nbit_11,49.8\nbit_12,51.7\nbit_13,46.7\nbit_14,40.8\nbit_15,41.1\nbit_16,36.7\nbit_17,50.8\nbit_18,41.6\nbit_19,41.3\n"
    )
)

df = df.sample(len(df), random_state=1).reset_index(drop=True)

which looks like:

       bit   val
0    bit_3  37.7
1   bit_16  36.7
2    bit_6  40.6
3   bit_10  48.4
4    bit_2  50.5
5   bit_14  40.8
6    bit_4  52.0
7   bit_17  50.8
8    bit_7  37.8
9    bit_1  49.6
10  bit_13  46.7
11   bit_0  40.9
12  bit_19  41.3
13  bit_18  41.6
14   bit_9  51.1
15  bit_15  41.1
16   bit_8  39.2
17  bit_12  51.7
18  bit_11  49.8
19   bit_5  55.1

I would like to sort the data by bit, based on the trailing number.

If this were a standard Python list, the following would work:

sorted(df["bit"].to_list(), key=lambda x: int(x.split("_")[-1]))

However, I am not sure how to apply this to a dataframe.

Try natsort:

from natsort import index_natsorted
df = df.iloc[index_natsorted(df.bit)]
df
Out[195]: 
       bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3

Use df.sort_values with a key that splits the column with .str.split("_", expand=True) and converts the numeric part to int with .astype(int), as follows:

df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int))

Output:

       bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3
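
Note that, unlike the key of the built-in sorted, the key passed to sort_values is called once with the whole column (a Series) and is expected to return a Series of the same shape. A minimal sketch, assuming pandas >= 1.1.0, of reusing a per-element function by mapping it over the column (trailing_int is a hypothetical helper name, and the four-row frame is just a subset of the data above):

```python
import io

import pandas as pd

# Small illustrative frame: a shuffled subset of the data above.
df = pd.read_csv(
    io.StringIO("bit,val\nbit_3,37.7\nbit_16,36.7\nbit_10,48.4\nbit_2,50.5\n")
)


def trailing_int(label):
    """Per-element key, the same logic as the lambda given to sorted()."""
    return int(label.split("_")[-1])


# key receives the entire "bit" Series, so map the per-element
# function over it to obtain a Series of integer sort keys.
result = df.sort_values("bit", key=lambda s: s.map(trailing_int))
print(result)
```

This is usually slower than the vectorized .str version on large frames, but it lets you keep any key function you already wrote for sorted.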

If you need to reset the index, just append .reset_index(drop=True):

df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int)).reset_index(drop=True)

Output:

       bit   val
0    bit_0  40.9
1    bit_1  49.6
2    bit_2  50.5
3    bit_3  37.7
4    bit_4  52.0
5    bit_5  55.1
6    bit_6  40.6
7    bit_7  37.8
8    bit_8  39.2
9    bit_9  51.1
10  bit_10  48.4
11  bit_11  49.8
12  bit_12  51.7
13  bit_13  46.7
14  bit_14  40.8
15  bit_15  41.1
16  bit_16  36.7
17  bit_17  50.8
18  bit_18  41.6
19  bit_19  41.3
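
As a possible alternative to chaining .reset_index(drop=True), sort_values also accepts ignore_index=True (available since pandas 1.0), which relabels the result 0..n-1 in the same call. A small sketch on a three-row illustrative frame, assuming pandas >= 1.1.0 for the key argument:

```python
import io

import pandas as pd

# Three illustrative rows in unsorted order.
df = pd.read_csv(io.StringIO("bit,val\nbit_10,48.4\nbit_2,50.5\nbit_0,40.9\n"))

sorted_df = df.sort_values(
    "bit",
    # Sort on the integer after the underscore rather than on the raw string.
    key=lambda s: s.str.split("_", expand=True)[1].astype(int),
    # Discard the original row labels and renumber from 0.
    ignore_index=True,
)
print(sorted_df)
```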

With pandas >= 1.1.0, you can use key just as you would with sorted.
In my solution I sort on the bit column, but strip the bit_ prefix for the sort key:

df.sort_values(
    by='bit', 
    key=lambda x: x.str.replace('bit_', '').astype(int),
)

    bit     val
11  bit_0   40.9
9   bit_1   49.6
4   bit_2   50.5
0   bit_3   37.7
6   bit_4   52.0

Documentation for .sort_values():
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
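
On pandas versions older than 1.1.0, where sort_values has no key argument, a similar result can be had with a temporary helper column. This is only a sketch: bit_num is a hypothetical column name chosen for illustration, and the three-row frame stands in for the real data.

```python
import io

import pandas as pd

# Three illustrative rows in unsorted order.
df = pd.read_csv(io.StringIO("bit,val\nbit_10,48.4\nbit_2,50.5\nbit_0,40.9\n"))

sorted_df = (
    # Add a temporary integer column holding the number after "bit_".
    df.assign(bit_num=df["bit"].str.replace("bit_", "", regex=False).astype(int))
    # Sort on the helper column, then drop it again.
    .sort_values("bit_num")
    .drop(columns="bit_num")
)
print(sorted_df)
```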

An efficient approach is to build a Series sorted the way you want, then pass its index back to the dataframe:

# create series of bit integers, sort them
bit_vals = df.bit.str.split("_", expand=True).loc[:, 1].astype(int)
sort_series = bit_vals.sort_values()    

# pass back to dataframe (the index holds labels, so select with .loc)
df = df.loc[sort_series.index]

Result:

       bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3

You can reset the dataframe index afterwards if needed.
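
One detail worth keeping in mind with this approach: the index of the sorted Series holds row labels, not positions. With the default 0..n-1 index the two coincide, so label-based and position-based selection give the same rows. The sketch below uses a hypothetical string index to show the label-based selection, which keeps working when the index is not the default one:

```python
import pandas as pd

# Hypothetical frame with a non-default (string) index, for illustration only.
df = pd.DataFrame(
    {"bit": ["bit_10", "bit_2", "bit_0"], "val": [48.4, 50.5, 40.9]},
    index=["a", "b", "c"],
)

# Integer part of each label, sorted ascending; the index travels with it.
bit_vals = df["bit"].str.split("_", expand=True).loc[:, 1].astype(int)
sort_series = bit_vals.sort_values()

# sort_series.index now holds the labels in sorted order,
# so select rows by label with .loc rather than by position.
by_label = df.loc[sort_series.index]
print(by_label)
```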

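You can combine str.extract, Series.argsort and df.loc (argsort returns positions, which match the labels here because df has the default integer index):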

In [1038]: ix = df.bit.str.extract(r'(\d+)', expand=False).astype(int).argsort().tolist()

In [1039]: df.loc[ix]
Out[1039]: 
       bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3
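
If some labels might contain no digits at all, str.extract yields a missing value for them and .astype(int) would fail. A hedged sketch of one way to cope, assuming pandas >= 1.1.0: convert with pd.to_numeric instead and let sort_values place the missing keys last. The "unknown" row is made up purely for illustration.

```python
import pandas as pd

# Illustrative frame; the "unknown" label is a made-up edge case.
df = pd.DataFrame(
    {"bit": ["bit_10", "bit_2", "unknown", "bit_0"], "val": [48.4, 50.5, 0.0, 40.9]}
)

sorted_df = df.sort_values(
    "bit",
    # Rows without digits get a missing key instead of raising an error.
    key=lambda s: pd.to_numeric(s.str.extract(r"(\d+)", expand=False)),
    # Missing keys are placed at the end (this is also the default).
    na_position="last",
)
print(sorted_df)
```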

