Remove the minimum and maximum, then calculate the average

Question

I have columns of numbers and need to remove exactly one minimum and one maximum, then calculate the average of the numbers that remain. The hitch is that the min/max could be anywhere in the column, some rows may be blank (null) or contain a zero, and the column might have only 3 values. All numbers will be between 0 and 100. For example:

Value    Property
80          H
30.5        D
40          A
30.5        A
72          H
56          D
64.2        H

If there is more than one min or max, only one of each may be removed.

To calculate the minimum and maximum of a column, I did the following:

maximum = df['Value'].max()
minimum = df['Value'].min()

When calculating the average, I also filter for values that are not null and not equal to zero. However, I do not know how to remove only one max and one min, or how to add a check that the column has enough rows/values (3 or more).

I hope you can provide some help or tips on this.

Answer 1

Let us use idxmin and idxmax. Each returns the index label of the first occurrence only, so a duplicated min or max is dropped just once:

out = df.drop([df.Value.idxmax(),df.Value.idxmin()])
Out[27]: 
   Value Property
2   40.0        A
3   30.5        A
4   72.0        H
5   56.0        D
6   64.2        H
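
The drop above stops short of the average and does not deal with blanks, zeros, or very short columns. A minimal sketch of the whole calculation follows. It assumes that blanks and zeros should be excluded and that NaN is an acceptable result when fewer than three usable values remain; the helper name trimmed_mean is made up for illustration.

```python
import numpy as np
import pandas as pd


def trimmed_mean(values: pd.Series) -> float:
    """Mean after dropping exactly one minimum and one maximum.

    Hypothetical helper: blanks (NaN) and zeros are excluded first, and
    NaN is returned when fewer than three usable values remain.
    """
    # Reset the index so labels are unique and drop() removes one row only.
    usable = values.dropna().reset_index(drop=True)
    usable = usable[usable != 0]
    if len(usable) < 3:
        return np.nan
    # Drop in two steps so that a column of identical values still loses
    # two rows (idxmin and idxmax would otherwise point at the same label).
    trimmed = usable.drop(usable.idxmin())
    trimmed = trimmed.drop(trimmed.idxmax())
    return trimmed.mean()


df = pd.DataFrame({
    "Value": [80, 30.5, 40, 30.5, 72, 56, 64.2],
    "Property": ["H", "D", "A", "A", "H", "D", "H"],
})
result = trimmed_mean(df["Value"])
print(round(result, 2))
```

Dropping by label keeps the second 30.5 in the data, which matches the "only one of each" requirement in the question.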

Answer 2

If the objective is to calculate the average without one min and one max, you can just do

(df['Value'].sum() - df['Value'].min() - df['Value'].max())/(len(df)-2)

which outputs 52.54 for your data. Note that sum(), min() and max() skip NaNs, but len(df) counts every row, so the denominator is too large when the column contains blanks; df['Value'].count() - 2 avoids that. Zeros are not excluded either. This does not modify your df which, if I read the question right, was not the objective anyway.
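
As a rough sketch of the same arithmetic with blanks and zeros filtered out first, the version below adds one NaN and one zero to the question's data. The fallback to NaN for columns with fewer than three usable values is an assumption, not something the question specifies.

```python
import numpy as np
import pandas as pd

# The question's data with one blank (NaN) and one zero added.
s = pd.Series([80, 30.5, np.nan, 40, 30.5, 0, 72, 56, 64.2])

# Keep only usable values: not blank and not zero.
valid = s[s.notna() & (s != 0)]

if valid.count() >= 3:
    # sum, min, max and count all see the same filtered values.
    avg = (valid.sum() - valid.min() - valid.max()) / (valid.count() - 2)
else:
    avg = np.nan  # assumption: too few values to trim one min and one max

print(round(avg, 2))
```

Because min() and max() are each subtracted once, a duplicated extreme value stays in the sum and only one copy is removed.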

Answer 3

Lately I struggled a bit with a similar problem. Eventually I came across the numpy.ma library and found it to be an elegant solution.

import numpy.ma as ma
df['Value'].values

# output -> array([80. , 30.5, 40. , 30.5, 72. , 56. , 64.2])

col_name= 'Value'
ma.masked_outside(df[col_name].values, df[col_name].min()+0.02, df[col_name].max()-0.05)

# output -> masked_array(data=[--, --, 40.0, --, 72.0, 56.0, 64.2],
#             mask=[ True,  True, False,  True, False, False, False],
#       fill_value=1e+20)

# mean for values without outliers
ma.masked_outside(df[col_name].values, df[col_name].min()+0.02, df[col_name].max()-0.05).mean()
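
Note that the range-based mask above hides both 30.5 values, as the printed mask shows, whereas the question asks for only one min and one max to be removed. A possible variation that masks a single occurrence of each is sketched below. It assumes the array has no NaNs and that the minimum and maximum differ; the variable names are illustrative.

```python
import numpy as np
import numpy.ma as ma

values = np.array([80.0, 30.5, 40.0, 30.5, 72.0, 56.0, 64.2])

# Start from an all-False mask, then mask a single min and a single max.
mask = np.zeros(values.shape, dtype=bool)
mask[np.argmin(values)] = True  # argmin returns the first minimum only
mask[np.argmax(values)] = True  # argmax returns the first maximum only

trimmed = ma.masked_array(values, mask=mask)
mean_one_each = trimmed.mean()  # the second 30.5 is kept

# For comparison: the range-based mask removes every value equal to the min.
mean_range = ma.masked_outside(
    values, values.min() + 0.02, values.max() - 0.05
).mean()

print(round(mean_one_each, 2), round(mean_range, 2))
```

The two means differ because the range-based version averages four values instead of five.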

Answer 1
1 2021-04-06 01:50:48

Answer 2
1 accepted 2021-04-06 07:31:28

Answer 3
0 2022-02-27 12:29:02
