How to sort a DataFrame based on particular (string) columns using Python pandas?
How to sort values of a pandas DataFrame by a particular column in a particular manner (using a lambda function, like sorted in the standard library)
Given the following data:
import pandas as pd
import io
df = pd.read_csv(
    io.StringIO(
        "bit,val\nbit_0,40.9\nbit_1,49.6\nbit_2,50.5\nbit_3,37.7\nbit_4,52.0\nbit_5,55.1\nbit_6,40.6\nbit_7,37.8\nbit_8,39.2\nbit_9,51.1\nbit_10,48.4\nbit_11,49.8\nbit_12,51.7\nbit_13,46.7\nbit_14,40.8\nbit_15,41.1\nbit_16,36.7\nbit_17,50.8\nbit_18,41.6\nbit_19,41.3\n"
    )
)
df = df.sample(len(df), random_state=1).reset_index(drop=True)
which looks like:
bit val
0 bit_3 37.7
1 bit_16 36.7
2 bit_6 40.6
3 bit_10 48.4
4 bit_2 50.5
5 bit_14 40.8
6 bit_4 52.0
7 bit_17 50.8
8 bit_7 37.8
9 bit_1 49.6
10 bit_13 46.7
11 bit_0 40.9
12 bit_19 41.3
13 bit_18 41.6
14 bit_9 51.1
15 bit_15 41.1
16 bit_8 39.2
17 bit_12 51.7
18 bit_11 49.8
19 bit_5 55.1
I want to sort the data by the bit column, ordered by the trailing number.
If this were a standard Python list, the following would work:
sorted(df["bit"].to_list(), key=lambda x: int(x.split("_")[-1]))
However, I'm not sure how to apply this to a DataFrame.
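For context, here is a small sketch of why a key is needed at all. The five-element list is a made-up subset of the bit values, not the full data: plain string comparison puts bit_10 before bit_2, while the integer key from the question gives the intended order.

```python
# Hypothetical subset of the "bit" labels, for illustration only.
bits = ["bit_3", "bit_16", "bit_10", "bit_2", "bit_0"]

# Strings compare character by character, so "bit_10" sorts before "bit_2".
lexicographic = sorted(bits)

# Comparing on the trailing integer gives the numeric order.
numeric = sorted(bits, key=lambda x: int(x.split("_")[-1]))

print(lexicographic)  # ['bit_0', 'bit_10', 'bit_16', 'bit_2', 'bit_3']
print(numeric)        # ['bit_0', 'bit_2', 'bit_3', 'bit_10', 'bit_16']
```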
Try using natsort:
from natsort import index_natsorted
df = df.iloc[index_natsorted(df.bit)]
df
Out[195]:
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
19 bit_5 55.1
2 bit_6 40.6
8 bit_7 37.8
16 bit_8 39.2
14 bit_9 51.1
3 bit_10 48.4
18 bit_11 49.8
17 bit_12 51.7
10 bit_13 46.7
5 bit_14 40.8
15 bit_15 41.1
1 bit_16 36.7
7 bit_17 50.8
13 bit_18 41.6
12 bit_19 41.3
Use df.sort_values with .str.split("_", expand=True), converting the numeric part to int with .astype(int), like this:
df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int))
Output:
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
19 bit_5 55.1
2 bit_6 40.6
8 bit_7 37.8
16 bit_8 39.2
14 bit_9 51.1
3 bit_10 48.4
18 bit_11 49.8
17 bit_12 51.7
10 bit_13 46.7
5 bit_14 40.8
15 bit_15 41.1
1 bit_16 36.7
7 bit_17 50.8
13 bit_18 41.6
12 bit_19 41.3
If you need to reset the index, just add .reset_index(drop=True):
df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int)).reset_index(drop=True)
Output:
bit val
0 bit_0 40.9
1 bit_1 49.6
2 bit_2 50.5
3 bit_3 37.7
4 bit_4 52.0
5 bit_5 55.1
6 bit_6 40.6
7 bit_7 37.8
8 bit_8 39.2
9 bit_9 51.1
10 bit_10 48.4
11 bit_11 49.8
12 bit_12 51.7
13 bit_13 46.7
14 bit_14 40.8
15 bit_15 41.1
16 bit_16 36.7
17 bit_17 50.8
18 bit_18 41.6
19 bit_19 41.3
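As a possible variation, sort_values also accepts ignore_index=True, which should give the same renumbered index without the separate reset_index call. This is a minimal sketch on a made-up four-row frame, assuming pandas >= 1.1.0 (needed for key):

```python
import pandas as pd

# Hypothetical four-row sample in the same shape as the question's data.
df = pd.DataFrame(
    {"bit": ["bit_3", "bit_16", "bit_10", "bit_2"], "val": [37.7, 36.7, 48.4, 50.5]}
)

result = df.sort_values(
    "bit",
    # Take the piece after the last "_" and compare it as an integer.
    key=lambda s: s.str.split("_").str[-1].astype(int),
    # Relabel the rows 0..n-1 instead of keeping the original labels.
    ignore_index=True,
)
print(result)
```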
With pandas >= 1.1.0, you can use key much as you would with sorted.
In my solution I sort by the bit column, but for the comparison I strip the bit_ prefix:
df.sort_values(
    by='bit',
    key=lambda x: x.str.replace('bit_', '').astype(int),
)
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
Documentation for .sort_values():
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
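One difference from sorted is worth spelling out: the key passed to sort_values receives the whole column as a Series and has to return a Series of the same shape, rather than being called once per element. If you already have a per-element function, wrapping it in Series.map is one way to reuse it. A rough sketch, where trailing_int is a hypothetical helper and the frame is a made-up sample:

```python
import pandas as pd


def trailing_int(label: str) -> int:
    """Per-element key, the same logic as the lambda used with sorted()."""
    return int(label.split("_")[-1])


# Hypothetical four-row sample in the same shape as the question's data.
df = pd.DataFrame(
    {"bit": ["bit_3", "bit_16", "bit_10", "bit_2"], "val": [37.7, 36.7, 48.4, 50.5]}
)

# The key gets the whole Series; map applies the element-wise function to it.
result = df.sort_values("bit", key=lambda s: s.map(trailing_int))
print(result)
```

The vectorized .str versions are usually the better fit for large columns; map is mainly convenient when the per-element logic already exists.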
An efficient approach is to create a Series sorted the way you want, then pass its index to the DataFrame:
# create series of bit integers, sort them
bit_vals = df.bit.str.split("_", expand=True).loc[:, 1].astype(int)
sort_series = bit_vals.sort_values()
# pass the sorted index labels back to the dataframe (.loc selects by label)
df = df.loc[sort_series.index]
Result:
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
19 bit_5 55.1
2 bit_6 40.6
8 bit_7 37.8
16 bit_8 39.2
14 bit_9 51.1
3 bit_10 48.4
18 bit_11 49.8
17 bit_12 51.7
10 bit_13 46.7
5 bit_14 40.8
15 bit_15 41.1
1 bit_16 36.7
7 bit_17 50.8
13 bit_18 41.6
12 bit_19 41.3
You can reset the DataFrame index afterwards if needed.
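A note on labels versus positions, since it only matters once the index is no longer the default 0..n-1: the index of a sorted Series holds labels, which pair with .loc, while argsort returns integer positions, which pair with .iloc. The sketch below uses a made-up frame with hypothetical string labels to show both routes giving the same row order:

```python
import pandas as pd

# Hypothetical sample with non-default index labels.
df = pd.DataFrame(
    {"bit": ["bit_3", "bit_16", "bit_10", "bit_2"], "val": [37.7, 36.7, 48.4, 50.5]},
    index=["w", "x", "y", "z"],
)

# Integer suffix of each label, keeping the frame's index.
nums = df["bit"].str.split("_").str[-1].astype(int)

# Route 1: the sorted Series' index holds labels, so select with .loc.
by_label = df.loc[nums.sort_values().index]

# Route 2: argsort gives integer positions, so select with .iloc.
by_position = df.iloc[nums.argsort().to_numpy()]

print(by_label)
```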
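You can combine str.extract with Series.argsort and df.loc (argsort returns integer positions, so this relies on the default 0..n-1 index; with any other index use df.iloc instead):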
In [1038]: ix = df.bit.str.extract(r'(\d+)', expand=False).astype(int).argsort().tolist()
In [1039]: df.loc[ix]
Out[1039]:
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
19 bit_5 55.1
2 bit_6 40.6
8 bit_7 37.8
16 bit_8 39.2
14 bit_9 51.1
3 bit_10 48.4
18 bit_11 49.8
17 bit_12 51.7
10 bit_13 46.7
5 bit_14 40.8
15 bit_15 41.1
1 bit_16 36.7
7 bit_17 50.8
13 bit_18 41.6
12 bit_19 41.3
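One caveat with the extract approach: a value with no digits yields NaN, and .astype(int) will then raise. A possible workaround, sketched here on made-up data (the "unknown" row and its value are invented for illustration), is to convert with pd.to_numeric and let na_position decide where those rows go:

```python
import pandas as pd

# Hypothetical sample; the "unknown" row has no trailing number.
df = pd.DataFrame(
    {"bit": ["bit_3", "bit_16", "unknown", "bit_2"], "val": [37.7, 36.7, 0.0, 50.5]}
)


def trailing_number(s: pd.Series) -> pd.Series:
    """Trailing digits as numbers; NaN where a value has none."""
    return pd.to_numeric(s.str.extract(r"(\d+)$", expand=False))


# Rows without a number are placed at the end.
result = df.sort_values("bit", key=trailing_number, na_position="last")
print(result)
```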