Finding the n largest values of a pandas dataframe column when the values are strings

I am trying to find the highest values in a column of my dataframe. However, because the values contain %, they are strings rather than integers, which prevents me from using nlargest. I would like to know whether I can convert the strings to integers.

Here is an example of my code:

import pandas as pd
import re

test_data = {
    'Animal': ['Otter', 'Turtle', 'Chicken'],
    'Squeak Appeal': [12.8, 1.92, 11.4],
    'Richochet Chance': ['8%', '30%', '16%'],
}
test_df = pd.DataFrame(
    test_data,
    columns=['Animal', 'Squeak Appeal', 'Richochet Chance'],
)

My attempts to use nlargest:

r_chance = test_df.nlargest(2, ['Richochet Chance'])
# TypeError: Column 'Richochet Chance' has dtype object, cannot use method 'nlargest' with this dtype
r_chance = test_df.nlargest(2, re.sub("[^0-9]", ""(['Richochet Chance'])))
# TypeError: 'str' object is not callable

If there is no sensible way to do this I shan't remain in denial. I just wondered if I could avoid looping through a large df and converting strings to integers for multiple columns.

Let's convert that column to floats, take the index labels of the n largest values, and use those labels to select the rows from the original dataframe:

idx = (test_df['Richochet Chance']
          .str.strip('%')          # remove the ending %
          .astype(float)           # convert to float 
          .nlargest(2).index       # nlargest and index
      )
test_df.loc[idx]

Output:

    Animal  Squeak Appeal Richochet Chance
1   Turtle           1.92              30%
2  Chicken          11.40              16%
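
Since the question mentions multiple columns, here is a minimal sketch of the same idea applied to several percent columns at once. The 'Bounce Chance' column and the names percent_cols, numeric and top_rows are hypothetical, added only for illustration. It assumes every value is a number followed by a single trailing %.

```python
import pandas as pd

test_df = pd.DataFrame({
    'Animal': ['Otter', 'Turtle', 'Chicken'],
    'Squeak Appeal': [12.8, 1.92, 11.4],
    'Richochet Chance': ['8%', '30%', '16%'],
    'Bounce Chance': ['55%', '5%', '70%'],  # hypothetical column, not in the question
})

percent_cols = ['Richochet Chance', 'Bounce Chance']

# Strip the trailing '%' and parse each column as numbers.
# The string handling is vectorised per column; there is no row-by-row loop.
numeric = test_df[percent_cols].apply(lambda s: pd.to_numeric(s.str.rstrip('%')))

# For each percent column, pick the 2 largest values and use their index
# labels to select rows from the original (still string-valued) frame.
top_rows = {
    col: test_df.loc[numeric[col].nlargest(2).index]
    for col in percent_cols
}

print(top_rows['Bounce Chance'])
```

This still iterates over the column names, but not over the rows, so the cost per column is a single vectorised pass. If the display format does not matter, assigning the numeric frame back with test_df[percent_cols] = numeric would let test_df.nlargest work directly.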
