简体   繁体   中英

Sort string columns with numbers in it in Pandas

I want to order my table by a column. The column is a string that has numbers in it, for example ASH11, ASH2, ASH1, etc. The problem is that using the method sort_values is going to do a "character" order, so the columns from the example will be order like this --> ASH1, ASH11, ASH2. And I want the order like this --> AS20H1, AS20H2, AS20H11 (taking into account the last number).

I though about taking the last characters of the string but sometimes would be only the last and in other cases the last two. The other way around (taking the characters from the beggining) doesnt work either because the strings are not always from the same lenght (ie some cases the name is ASH1, ASGH22, ASHGT3, etc)

Use key parameter (new in 1.1.0 )

df.sort_values(by=['xxx'], key=lambda col: col.map(lambda x: int(re.split('(\d+)',x)[-2])))

You could maybe extract the integers from your column and then use it to sort your dataFrame

  df["new_index"] = df.yourColumn.str.extract('(\d+)')
  df.sort_values(by=["new_index"], inplace=True)

In case you get some NA in your "new_index" column you can use the option na_position in the sort_values method in order to choose where to put them (beginning or end)

Using list comprehension and regular expression:

>>> import pandas as pd
>>> import re #Regular expression

>>> a = pd.DataFrame({'label':['AS20H1','AS20H2','AS20H11','ASH1','ASGH22','ASHGT3']})
>>> a
     label
0   AS20H1
1   AS20H2
2  AS20H11
3     ASH1
4   ASGH22
5   ASHGT3

r'(\d+)(?..*\d)' Matches the last number in a string

>>> a['sort_int'] = [ int(re.search(r'(\d+)(?!.*\d)',i).group(0)) for i in a['label']]
>>> a
     label  sort_int
0   AS20H1         1
1   AS20H2         2
2  AS20H11        11
3     ASH1         1
4   ASGH22        22
5   ASHGT3         3

>>> a.sort_values(by='sort_int',ascending=True)
     label  sort_int
0   AS20H1         1
3     ASH1         1
1   AS20H2         2
5   ASHGT3         3
2  AS20H11        11
4   ASGH22        22

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM