(Python, DataFrame): Record the average of all numbers in a column that are smaller than the n'th percentile

I have a DataFrame similar to Table 1 below and would like to create a DataFrame or Series that looks like Table 2.

For example: I would find the nth percentile of column A, then take the average of all numbers in A that are less than that percentile.

I've used the code below to get the average and range of each column, but I seem to be missing something to get the conditional average.

min = df.min(axis='index')
max = df.max(axis='index')
mean = df.mean(axis='index')
df[df < np.percentile(df, 0.4)].mean()

The last line doesn't seem to work, and I believe it gives the average of every row.

Table 1

Date    A   B   C   D   E   F
02/10/2017  10  5   1   2   1   1
01/10/2017  10  4   9   4   3   5
30/09/2017  4   8   5   6   2   4
29/09/2017  8   2   7   9   10  5
28/09/2017  3   8   2   7   10  8
27/09/2017  7   3   8   9   9   7
26/09/2017  4   1   2   9   3   4
25/09/2017  10  1   6   6   3   5
24/09/2017  8   3   5   5   6   7
23/09/2017  7   9   5   7   1   3
22/09/2017  2   9   10  5   8   1

Table 2

Index   Avg<40th Percentile
A   3.25
B   1.333333333
C   1.666666667
D   4
E   1.333333333
F   1.666666667

Use

df.where(df < df.quantile(0.4)).mean()

Date         NaN
A       3.250000
B       1.333333
C       1.666667
D       4.000000
E       1.333333
F       1.666667
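
To make the one-liner above easier to check, here is a minimal, self-contained sketch that rebuilds Table 1 and computes the per-column average of values below the 40th percentile. The variable names (numeric, thresholds, below_mean) are illustrative and not from the original post. Moving Date into the index is an assumption made here so that only numeric columns are processed; depending on the pandas version, calling quantile on a frame that still contains a non-numeric Date column may raise an error rather than return NaN for it. The sketch also shows why the attempt in the question likely misbehaves: np.percentile takes q on a 0 to 100 scale (so 0.4 is the 0.4th percentile), and without an axis argument it computes one percentile over the flattened array instead of one per column.

```python
import numpy as np
import pandas as pd

# Table 1 from the question
df = pd.DataFrame({
    "Date": ["02/10/2017", "01/10/2017", "30/09/2017", "29/09/2017",
             "28/09/2017", "27/09/2017", "26/09/2017", "25/09/2017",
             "24/09/2017", "23/09/2017", "22/09/2017"],
    "A": [10, 10, 4, 8, 3, 7, 4, 10, 8, 7, 2],
    "B": [5, 4, 8, 2, 8, 3, 1, 1, 3, 9, 9],
    "C": [1, 9, 5, 7, 2, 8, 2, 6, 5, 5, 10],
    "D": [2, 4, 6, 9, 7, 9, 9, 6, 5, 7, 5],
    "E": [1, 3, 2, 10, 10, 9, 3, 3, 6, 1, 8],
    "F": [1, 5, 4, 5, 8, 7, 4, 5, 7, 3, 1],
})

# Assumption: keep Date out of the computation by making it the index
numeric = df.set_index("Date")

# One threshold per column; quantile takes q in [0, 1]
thresholds = numeric.quantile(0.4)

# Keep values strictly below their column's threshold (others become NaN),
# then average each column; mean() skips NaN by default
below_mean = numeric.where(numeric < thresholds).mean()
print(below_mean)  # should agree with Table 2

# NumPy equivalent of the thresholds: q is in [0, 100] and axis=0 is needed
# to get one percentile per column
np_thresholds = np.percentile(numeric.to_numpy(), 40, axis=0)
```

The comparison numeric < thresholds aligns the Series of thresholds with the DataFrame's column labels, so each column is tested against its own percentile.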
