pandas sort dataframe by column that includes numbers and letters
I need to sort a dataframe by one column, which contains a combination of numbers and letters.
import pandas as pd

df = pd.DataFrame([{"user": "seth",
"name": "1"},
{"user" : "chris",
"name": "10A"},
{"user" : "aaron",
"name": "4B"},
{"user" : "dan",
"name": "10B"}])
My code:
df1 = df.sort_values(by=['name'])
This gets me:
df1 = [{"user": "seth",
"name": "1"},
{"user" : "chris",
"name": "10A"},
{"user" : "dan",
"name": "10B"},
{"user" : "aaron",
"name": "4B"}]
I want:
df1 = [{"user": "seth",
"name": "1"},
{"user" : "aaron",
"name": "4B"},
{"user" : "chris",
"name": "10A"},
{"user" : "dan",
"name": "10B"}]
A different question of mine was flagged as similar to another question, so I tried the code from there:
df.reindex(index=natsorted(df.name))
It returns a sorted dataframe, but all values have been replaced by NaNs.
df.iloc(natsorted(df.name))
It raises an error:
TypeError: unhashable type: 'list'
To slightly correct Quang's comment, this works fine:
import natsort
df1.iloc[natsort.index_humansorted(df1.name)]
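As a rough illustration of why the two attempts in the question misbehave while positional indexing works: reindex() looks rows up by index label, and iloc expects integer positions inside square brackets. The sketch below uses only pandas, and the positions are written out by hand as an assumption rather than computed by natsort.

```python
import pandas as pd

df = pd.DataFrame({"user": ["seth", "chris", "aaron", "dan"],
                   "name": ["1", "10A", "4B", "10B"]})

# reindex() matches index *labels*. The index here is 0..3, so labels such
# as "1" or "10A" are not found and every row comes back filled with NaN.
by_label = df.reindex(index=["1", "4B", "10A", "10B"])

# iloc[] takes integer *positions* in square brackets. These positions are
# hard-coded for illustration; a natural-sort helper would compute them.
by_position = df.iloc[[0, 2, 1, 3]]
```

Calling iloc with parentheses instead of square brackets does not index at all, which would be consistent with the TypeError shown in the question.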
You could use a regular expression to extract the numbers and letters, sort them, and assign the result as a categorical column.
s = df["name"].str.extract(r"(\d+)?(\w|)")
s[0] = s[0].astype(int)
print(s)
    0  1
0   1
1  10  A
2   4  B
3  10  B
df['name'] = pd.Categorical(df['name'], s.sort_values([0, 1]).astype(str).agg(''.join, axis=1))
print(df.sort_values('name'))
    user name
0   seth    1
2  aaron   4B
1  chris  10A
3    dan  10B
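If you would rather not convert the name column to a categorical, a variant of the same idea is to sort a helper frame of the extracted parts and reuse its index. This is only a sketch: the column names num and suffix are made up for illustration, and it assumes every name is a run of digits optionally followed by non-digit characters.

```python
import pandas as pd

df = pd.DataFrame({"user": ["seth", "chris", "aaron", "dan"],
                   "name": ["1", "10A", "4B", "10B"]})

# Split each name into a numeric prefix and an optional non-digit suffix.
parts = df["name"].str.extract(r"^(\d+)(\D*)$")
parts.columns = ["num", "suffix"]  # hypothetical helper column names
parts["num"] = parts["num"].astype(int)

# Sort the helper frame by number, then suffix, and reuse its index
# to reorder the original rows. The name column itself is left unchanged.
result = df.loc[parts.sort_values(["num", "suffix"]).index]
```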
sort_values now has a key parameter:
df = pd.DataFrame([{"user": "seth",
"name": "1"},
{"user" : "chris",
"name": "10A"},
{"user" : "aaron",
"name": "4B"},
{"user" : "dan",
"name": "10B"}])
df.sort_values('name', key=lambda x: x.str.extract(r'(\d+)').squeeze().astype(int))
Output:
    user name
0   seth    1
2  aaron   4B
1  chris  10A
3    dan  10B
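Note that the key above compares only the numeric part, so the relative order of 10A and 10B is not decided by the letter. One possible way to include the suffix, sketched here under the assumption that the numeric prefix never exceeds 10 digits, is to zero-pad the leading digits so that plain string comparison gives the natural order:

```python
import pandas as pd

df = pd.DataFrame({"user": ["seth", "chris", "aaron", "dan"],
                   "name": ["1", "10A", "4B", "10B"]})

def padded_key(col):
    # Left-pad the leading digits to a fixed width (10 is an arbitrary
    # assumption) so "4B" becomes "0000000004B" and sorts before "10A".
    return col.str.replace(r"^\d+", lambda m: m.group().zfill(10), regex=True)

# The key parameter requires pandas >= 1.1.0.
result = df.sort_values("name", key=padded_key)
```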
You can now also do (with pandas >= 1.1.0):
import natsort
sorted_df = df1.sort_values("name", key=natsort.natsort_keygen())