Finding the n maximum values (when strings) of pandas dataframe column

I am trying to find the highest values of a column in my dataframe. However, because the values contain %, they are strings rather than integers, which prevents me from using nlargest. I would like to know if I can convert the strings to integers.

Here is an example of my code:

import pandas as pd
import re
test_data = {
    'Animal': ['Otter', 'Turtle', 'Chicken'],
    'Squeak Appeal': [12.8, 1.92, 11.4],
    'Richochet Chance': ['8%', '30%', '16%'],
}
test_df = pd.DataFrame(
    test_data,
    columns=['Animal', 'Squeak Appeal', 'Richochet Chance']
)

My attempts to use nlargest:

r_chance = test_df.nlargest(2, ['Richochet Chance'])
# TypeError: Column 'Richochet Chance' has dtype object, cannot use method 'nlargest' with this dtype
r_chance = test_df.nlargest(2, re.sub("[^0-9]", ""(['Richochet Chance'])))
# TypeError: 'str' object is not callable

If there is no sensible way to do this I shan't remain in denial. I just wondered if I could avoid looping through a large df and converting strings to integers for multiple columns.

Let's convert that column into floats and extract the index labels of the top rows:

idx = (test_df['Richochet Chance']
          .str.strip('%')          # remove the ending %
          .astype(float)           # convert to float 
          .nlargest(2).index       # nlargest and index
      )
test_df.loc[idx]

Output:

    Animal  Squeak Appeal Richochet Chance
1   Turtle           1.92              30%
2  Chicken          11.40              16%
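
Since the question also mentions multiple columns, here is a minimal sketch of how the same idea might be extended. The `Bounce Chance` column and the names `pct_cols`, `numeric` and `top2` are made up for illustration and are not part of the original data. It assumes every value in those columns is a number followed by `%`.

```python
import pandas as pd

test_df = pd.DataFrame({
    'Animal': ['Otter', 'Turtle', 'Chicken'],
    'Squeak Appeal': [12.8, 1.92, 11.4],
    'Richochet Chance': ['8%', '30%', '16%'],
    'Bounce Chance': ['55%', '5%', '70%'],  # hypothetical second percent column
})

# Columns that hold percentages as strings (assumed list)
pct_cols = ['Richochet Chance', 'Bounce Chance']

# Strip the trailing % and convert each column to numbers in one vectorised
# step per column; there is no Python-level loop over the rows
numeric = test_df[pct_cols].apply(lambda s: pd.to_numeric(s.str.rstrip('%')))

# Index labels of the 2 largest values in each percent column
top2 = {col: numeric[col].nlargest(2).index.tolist() for col in pct_cols}

# Look the rows up in the untouched original frame
print(test_df.loc[top2['Richochet Chance']])
```

The conversion is kept in a separate frame, so `test_df` still shows the original `%` strings when the rows are selected with `.loc`.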
