
Python pandas dataframe sort_values does not work

I have the following pandas data frame which I want to sort by 'test_type'

  test_type         tps          mtt        mem        cpu       90th
0  sso_1000  205.263559  4139.031090  24.175933  34.817701  4897.4766
1  sso_1500  201.127133  5740.741266  24.599400  34.634209  6864.9820
2  sso_2000  203.204082  6610.437558  24.466267  34.831947  8005.9054
3   sso_500  189.566836  2431.867002  23.559557  35.787484  2869.7670

Here is my code to load the data frame and sort it. The first print outputs the data frame above.

        import pandas as pd

        df = pd.read_csv(file)  # reads from a CSV file
        print(df)
        df = df.sort_values(by=['test_type'], ascending=True)
        print('\nAfter sort...')
        print(df)

After doing the sort and printing the dataframe content, the data frame still looks like below.

Program output:

After sort...
  test_type         tps          mtt        mem        cpu       90th
0  sso_1000  205.263559  4139.031090  24.175933  34.817701  4897.4766
1  sso_1500  201.127133  5740.741266  24.599400  34.634209  6864.9820
2  sso_2000  203.204082  6610.437558  24.466267  34.831947  8005.9054
3   sso_500  189.566836  2431.867002  23.559557  35.787484  2869.7670

I expect row 3 (the sso_500 row) to be on top after sorting. Can someone help me figure out why it's not working as it should?

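sort_values is working as designed: test_type holds strings, so the rows are sorted lexicographically, and 'sso_1000' comes before 'sso_500' because the character '1' sorts before '5'. Presumably, what you're trying to do is sort by the numerical value after sso_. You can do this as follows: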

import numpy as np

df.iloc[np.argsort(df.test_type.str.split('_').str[-1].astype(int).values)]

This

  1. Splits the strings at _

  2. Converts what comes after this character to its numerical value

  3. Finds the indices that sort those numerical values

  4. Reorders the DataFrame according to these indices
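The four steps above can also be written out one at a time. This is only a sketch: the sample frame is rebuilt from the test_type values in the question, and the intermediate names (parts, numbers, order) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question's test_type column (other columns omitted)
df = pd.DataFrame({'test_type': ['sso_1000', 'sso_1500', 'sso_2000', 'sso_500']})

# 1. Split each string at '_'
parts = df['test_type'].str.split('_')

# 2. Take the last piece and convert it to an integer
numbers = parts.str[-1].astype(int)

# 3. Positions that would put the numbers in ascending order
order = np.argsort(numbers.values)

# 4. Reorder the rows by position
sorted_df = df.iloc[order]
print(sorted_df)
```

Because np.argsort returns positions rather than index labels, positional indexing is used in the last step.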

Example

In [15]: df = pd.DataFrame({'test_type': ['sso_1000', 'sso_500']})

In [16]: df.sort_values(by=['test_type'], ascending=True)
Out[16]: 
  test_type
0  sso_1000
1   sso_500

In [17]: df.iloc[np.argsort(df.test_type.str.split('_').str[-1].astype(int).values)]
Out[17]: 
  test_type
1   sso_500
0  sso_1000
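The reason Out[16] keeps sso_1000 first can be checked with plain Python strings, which compare character by character just like the values in the column. A small sketch (the variable names are only for illustration):

```python
# Strings are compared character by character, not by numeric value
first_is_smaller = 'sso_1000' < 'sso_500'  # True, because '1' sorts before '5'

names = ['sso_1000', 'sso_1500', 'sso_2000', 'sso_500']

# Plain string sort: the order is unchanged, as in the question
lexicographic = sorted(names)

# Sorting on the integer after '_' gives the order the question expects
numeric = sorted(names, key=lambda s: int(s.split('_')[-1]))

print(first_is_smaller)
print(lexicographic)
print(numeric)
```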

Alternatively, you could extract the numbers from test_type, sort them, and then reindex the DataFrame according to the resulting index.

df.reindex(df['test_type'].str.extract(r'(\d+)', expand=False)
                          .astype(int).sort_values().index).reset_index(drop=True)
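If a newer pandas is available, the same idea might be expressed with the key argument of sort_values. This is a sketch and not part of the original answers; it assumes pandas 1.1 or later, where key was introduced, and reuses the sample values from the question.

```python
import pandas as pd

df = pd.DataFrame({'test_type': ['sso_1000', 'sso_1500', 'sso_2000', 'sso_500']})

# key receives the column as a Series and must return a Series of the same length;
# here it returns the digits in each string converted to integers
result = df.sort_values(
    by='test_type',
    key=lambda col: col.str.extract(r'(\d+)', expand=False).astype(int),
).reset_index(drop=True)

print(result)
```

The column itself is left unchanged; only the sort order is derived from the extracted numbers.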

