
pandas sort dataframe by column that includes numbers and letters

I need to sort a dataframe by one column, which contains a combination of numbers and letters.

import pandas as pd

df = pd.DataFrame([{"user": "seth",
                    "name": "1"},
                   {"user": "chris",
                    "name": "10A"},
                   {"user": "aaron",
                    "name": "4B"},
                   {"user": "dan",
                    "name": "10B"}])

My code:

df1 = df.sort_values(by=['name'])

This gets me:

df1 = [{"user": "seth",
       "name": "1"},
     {"user" : "chris",
       "name": "10A"},
     {"user" : "dan",
       "name": "10B"},
     {"user" : "aaron",
       "name": "4B"}]

I want:

df1 =    [{"user": "seth",
           "name": "1"},
         {"user" : "aaron",
           "name": "4B"},
         {"user" : "chris",
           "name": "10A"},
         {"user" : "dan",
           "name": "10B"}]
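The unwanted order comes from plain string comparison: strings are compared character by character, so "10A" sorts before "4B" because "1" < "4". A minimal pure-Python illustration of the difference, assuming every label is an integer followed by an optional run of letters:

```python
import string

names = ["1", "10A", "4B", "10B"]

# Plain string comparison, character by character: "10A" < "4B" because "1" < "4"
lexicographic = sorted(names)

# Compare the leading integer numerically first, then the letter suffix.
# Assumption: labels look like digits followed by optional ASCII letters.
natural = sorted(
    names,
    key=lambda s: (int(s.rstrip(string.ascii_letters)), s.lstrip(string.digits)),
)

print(lexicographic)  # ['1', '10A', '10B', '4B']
print(natural)        # ['1', '4B', '10A', '10B']
```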

My question was flagged as similar to a different one, and I tried the code from there:

   df.reindex(index=natsorted(df.name))

This returns a sorted DataFrame, but every value has been replaced by NaN.

  df.iloc(natsorted(df.name))

It raises an error:

TypeError: unhashable type: 'list'
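Both attempts fail for different reasons. reindex matches index labels, and natsorted(df.name) produces values of the name column rather than labels of the default 0..3 index, so every row comes back as NaN. df.iloc(...) uses parentheses instead of square brackets, so the list is passed to the indexer as an argument instead of selecting rows, which is where the TypeError comes from. A small sketch of both behaviours, with the target order hard-coded for illustration (wanted is a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame([{"user": "seth", "name": "1"},
                   {"user": "chris", "name": "10A"},
                   {"user": "aaron", "name": "4B"},
                   {"user": "dan", "name": "10B"}])

# Desired order, written out by hand for this illustration
wanted = ["1", "4B", "10A", "10B"]

# reindex matches *index labels*; the index is 0..3, so none of these labels
# exist and every cell in the result is NaN
by_label = df.reindex(index=wanted)

# Making "name" the index first lets reindex find the labels
by_name = df.set_index("name").reindex(wanted).reset_index()

# iloc takes square brackets and integer *positions* (assumes names are unique)
positions = [df["name"].tolist().index(name) for name in wanted]
by_position = df.iloc[positions]
```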

To slightly correct Quang's comment, this works fine:

import natsort

df1.iloc[natsort.index_humansorted(df1.name)]
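natsort's index_* functions return row positions in natural order, which is why they pair with iloc. A minimal pure-Python sketch of that idea, assuming a simple "split into digit and non-digit runs" rule (natsort itself handles many more cases, and index_humansorted is additionally locale-aware); natural_key and natural_positions are hypothetical helpers, not part of natsort:

```python
import re


def natural_key(text: str) -> list:
    """Split '10A' into ['', 10, 'A'] so that digit runs compare as integers."""
    # With a capturing group, re.split alternates non-digit text (even indices)
    # and digit runs (odd indices), so elements at the same index share a type.
    parts = re.split(r"(\d+)", text)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


def natural_positions(values) -> list:
    """Return the positions that would sort `values` naturally (an argsort)."""
    return sorted(range(len(values)), key=lambda i: natural_key(values[i]))


names = ["1", "10A", "4B", "10B"]
print(natural_positions(names))  # [0, 2, 1, 3]
```

With a DataFrame, df.iloc[natural_positions(df["name"].tolist())] would then reorder the rows the same way as the natsort call.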

You could use a regular expression to extract the numbers and letters, sort them, and assign the result as a categorical column.

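# Split each label into its leading number (column 0) and optional letter (column 1);
# raw strings avoid invalid-escape warnings for \d and \w
s = df["name"].str.extract(r"(\d+)?(\w|)")
s[0] = s[0].astype(int)

print(s)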

   0  1
0   1   
1  10  A
2   4  B
3  10  B



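# Join number and letter back together in sorted order and use that as the category order
# (drop_duplicates because categories must be unique)
df['name'] = pd.Categorical(df['name'], s.sort_values([0, 1]).astype(str).agg(''.join, axis=1).drop_duplicates())

print(df.sort_values('name'))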

   user name
0   seth    1
2  aaron   4B
1  chris  10A
3    dan  10B
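The Categorical trick works because sorting a categorical column follows the order of its categories rather than string order. A sketch of the same idea that builds the category order with a plain Python sort key instead of the regex frame, assuming every label is an integer followed by an optional non-digit suffix (split_label is a hypothetical helper). Passing ordered=True additionally makes comparisons and min()/max() follow that order:

```python
import re

import pandas as pd


def split_label(label: str) -> tuple:
    """Hypothetical helper: '10A' -> (10, 'A'); assumes a leading integer."""
    match = re.fullmatch(r"(\d+)(\D*)", label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    return int(match.group(1)), match.group(2)


df = pd.DataFrame({"user": ["seth", "chris", "aaron", "dan"],
                   "name": ["1", "10A", "4B", "10B"]})

# Unique labels in natural order become the category order
order = sorted(df["name"].unique(), key=split_label)
df["name"] = pd.Categorical(df["name"], categories=order, ordered=True)

result = df.sort_values("name")
print(result)
```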

Update: with pandas 1.1.0+, sort_values has a key parameter:

df = pd.DataFrame([{"user": "seth",
       "name": "1"},
     {"user" : "chris",
       "name": "10A"},
     {"user" : "aaron",
       "name": "4B"},
     {"user" : "dan",
       "name": "10B"}])

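# Sort by the leading integer only; labels with the same number (10A, 10B) are not explicitly ordered by their letter
df.sort_values('name', key=lambda x: x.str.extract(r'(\d+)', expand=False).astype(int))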

Output:

    user name
0   seth    1
2  aaron   4B
1  chris  10A
3    dan  10B
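The key above compares only the number, so the letter never breaks a tie. The key function receives the whole column as a Series and must return a Series of the same length, so one natsort-free option is to zero-pad every run of digits so that plain string order matches numeric order. A sketch, assuming no number has more than 10 digits (PAD_WIDTH and pad_numbers are hypothetical names):

```python
import pandas as pd

df = pd.DataFrame([{"user": "seth", "name": "1"},
                   {"user": "chris", "name": "10A"},
                   {"user": "aaron", "name": "4B"},
                   {"user": "dan", "name": "10B"}])

PAD_WIDTH = 10  # assumption: no number in the column has more than 10 digits


def pad_numbers(names: pd.Series) -> pd.Series:
    """Zero-pad every run of digits so that string order matches numeric order."""
    # A callable replacement requires regex=True
    return names.str.replace(r"\d+", lambda m: m.group(0).zfill(PAD_WIDTH), regex=True)


result = df.sort_values("name", key=pad_numbers)
print(result)
```

Unlike the number-only key, "10A" and "10B" now differ in the padded key ("0000000010A" vs "0000000010B"), so the letter decides their order.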

You can now also do (with pandas >= 1.1.0):

import natsort

sorted_df = df1.sort_values("name", key=natsort.natsort_keygen())
