pandas: get count within each column based on different arithmetic conditions

Question

I have a data frame as below. I calculate percentiles based on the inputs provided. I'd like to get, for each column, the count of rows that match a certain condition. For example, the count where a1 > value1, likewise a2 > value2, and so on for the other columns.

import pandas as pd

df = pd.DataFrame([[10,11,20],[580,11,20],
    [500,11,20],
    [110,111,420],[11,11,20],[80,91,90],
    [80,91,'NA'],
    [10,11,13],[0,14,1111],
    [20,104,111],[220,314,1000],[200,30,2000],
    [61,31,10],[516,71,20],[10,30,330]],
    columns=['a1','a2','a3'])

Calculate and describe the columns of interest at the input percentiles, dropping NAs:

print( (df[["a1","a2","a3"]].dropna()).describe(percentiles =[0.90,0.91,
    0.92,0.93,0.94,0.95,0.96,0.97,0.98,0.99] ))

I face two issues:

Column a3 is removed. How do I keep it from being thrown away, and instead drop only that row or ignore the NA?
I can get the count for each column as:

print(len(df[(df['a1']>200) ]))
print(len(df[(df['a2']>100) ]))

However, this gets tricky and unreadable when the data frame has ~10 columns. How do I get the counts in data frame form, one per column, for a set of conditions (a1 > 100, a2 > 90, a3 > 56)?

Thank you.

Answer 1

First convert all columns to numeric so that the string 'NA' becomes NaN. Then pass DataFrame.gt a dictionary whose keys are the column names and whose values are the thresholds; this returns a boolean DataFrame. To count the True values per column, use sum (each True is counted as 1):

df = df.apply(pd.to_numeric, errors='coerce')

s = df.gt({'a1': 100, 'a2': 90, 'a3': 56}).sum()
print (s)
a1    6
a2    5
a3    7
dtype: int64

Details:

print(df.gt({'a1': 100, 'a2': 90, 'a3': 56}))


       a1     a2     a3
0   False  False  False
1    True  False  False
2    True  False  False
3    True   True   True
4   False  False  False
5   False   True   True
6   False   True  False
7   False  False  False
8   False  False   True
9   False   True   True
10   True   True   True
11   True  False   True
12  False  False  False
13   True  False  False
14  False  False   True
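Since the question asks for the counts "in a data frame manner", the boolean mask above can be folded into a small summary table. The following is only a sketch: the names thresholds, limits, mask and summary are illustrative, and the thresholds are wrapped in a pd.Series so that the alignment on column labels is explicit.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 11, 20], [580, 11, 20], [500, 11, 20], [110, 111, 420],
     [11, 11, 20], [80, 91, 90], [80, 91, 'NA'], [10, 11, 13],
     [0, 14, 1111], [20, 104, 111], [220, 314, 1000], [200, 30, 2000],
     [61, 31, 10], [516, 71, 20], [10, 30, 330]],
    columns=['a1', 'a2', 'a3'])
df = df.apply(pd.to_numeric, errors='coerce')  # 'NA' string -> NaN

# Illustrative names: one threshold per column, kept in a single place
thresholds = {'a1': 100, 'a2': 90, 'a3': 56}
limits = pd.Series(thresholds)

# Compare each column with its own threshold (aligned on column labels)
mask = df[limits.index].gt(limits)

# One row per column: threshold, rows above it, and non-missing rows.
# NaN compares as False, so it never counts as "above".
summary = pd.DataFrame({
    'threshold': limits,
    'count_above': mask.sum(),
    'non_null': df[limits.index].notna().sum(),
})
print(summary)
```

With ~10 columns only the thresholds dictionary grows; the rest of the code stays the same.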

Your solution works well for me once the columns are converted to numeric and dropna is removed (describe skips NaN on its own, as the count row shows):

df = df.apply(pd.to_numeric, errors='coerce')

L = [0.90,0.91, 0.92,0.93,0.94,0.95,0.96,0.97,0.98,0.99]
print( df[["a1","a2","a3"]].describe(percentiles=L))
               a1          a2           a3
count   15.000000   15.000000    14.000000
mean   160.533333   62.800000   370.357143
std    204.229166   79.165469   596.271054
min      0.000000   11.000000    10.000000
50%     80.000000   30.000000    55.000000
90%    509.600000  108.200000  1077.700000
91%    511.840000  109.180000  1092.130000
92%    514.080000  110.160000  1106.560000
93%    517.280000  115.060000  1191.010000
94%    526.240000  143.480000  1306.580000
95%    535.200000  171.900000  1422.150000
96%    544.160000  200.320000  1537.720000
97%    553.120000  228.740000  1653.290000
98%    562.080000  257.160000  1768.860000
99%    571.040000  285.580000  1884.430000
max    580.000000  314.000000  2000.000000
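Why a3 disappeared in the first place can be seen on a smaller frame. This is a minimal sketch on a hypothetical three-row frame (raw and clean are illustrative names, not part of the question): the string 'NA' is not a missing value, so dropna leaves it in place, the column stays object dtype, and describe reports only the numeric columns.

```python
import pandas as pd

# Hypothetical toy frame with the same problem as the question's data
raw = pd.DataFrame([[10, 20], [80, 'NA'], [220, 1000]], columns=['a1', 'a3'])

print(raw.dtypes)                    # a3 is object because of the string 'NA'
print(len(raw.dropna()))             # no row is dropped: 'NA' is a string, not NaN
print(list(raw.describe().columns))  # describe keeps numeric columns only

# Coerce everything to numbers; values that cannot be parsed become NaN
clean = raw.apply(pd.to_numeric, errors='coerce')
print(clean.dtypes)
print(clean.describe().loc['count'])  # per-column count of non-missing values
```

After the conversion each column is summarised over its own non-missing values, so no row has to be discarded for the other columns.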

EDIT1: If you need to compare each column against its own quantile, for columns taken from a list, use:

df = df.apply(pd.to_numeric, errors='coerce')

cols = ['a1','a2','a3']
print (df[cols].quantile(0.5))
a1    80.0
a2    30.0
a3    55.0
Name: 0.5, dtype: float64

print (df[cols].gt(df[cols].quantile(0.5)))
       a1     a2     a3
0   False  False  False
1    True  False  False
2    True  False  False
3    True   True   True
4   False  False  False
5   False   True   True
6   False   True  False
7   False  False  False
8   False  False   True
9   False   True   True
10   True   True   True
11   True  False   True
12  False   True  False
13   True   True  False
14  False  False   True
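To turn that comparison into counts, the same sum step applies. The sketch below assumes the same data as in the question; levels, q and counts_by_level are illustrative names, and the second part simply repeats the comparison for each quantile level.

```python
import pandas as pd

df = pd.DataFrame(
    [[10, 11, 20], [580, 11, 20], [500, 11, 20], [110, 111, 420],
     [11, 11, 20], [80, 91, 90], [80, 91, 'NA'], [10, 11, 13],
     [0, 14, 1111], [20, 104, 111], [220, 314, 1000], [200, 30, 2000],
     [61, 31, 10], [516, 71, 20], [10, 30, 330]],
    columns=['a1', 'a2', 'a3'])
df = df.apply(pd.to_numeric, errors='coerce')

cols = ['a1', 'a2', 'a3']

# Rows strictly above each column's own median
counts = df[cols].gt(df[cols].quantile(0.5)).sum()
print(counts)

# Same idea for several quantile levels at once (illustrative names)
levels = [0.5, 0.9]
q = df[cols].quantile(levels)  # one row per level, one column per name
counts_by_level = pd.DataFrame(
    {lv: df[cols].gt(q.loc[lv]).sum() for lv in levels}
).T                            # rows: quantile level, columns: a1, a2, a3
print(counts_by_level)
```

Note that gt is a strict comparison, so values equal to the quantile itself are not counted.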

Solution 1
1 accepted 2022-09-16 11:28:25
