
Pythonic way to count, per row, how many columns meet a condition

I'm trying to write an ordinary loop under specific conditions.

I want to iterate over rows, checking a condition, and then iterate over columns, counting how many times the condition was met.

This count should generate a new column in my dataframe indicating the total count for each row.

I tried to use apply and applymap with no success.

I managed to write the following code, which reaches my goal.

But I bet there are more efficient ways, or even built-in pandas functions, to do it.
Does anyone know how?

sample code:

import pandas as pd

df = pd.DataFrame({'1column': [11, 22, 33, 44],
                   '2column': [32, 42, 15, 35],
                   '3column': [33, 77, 26, 64],
                   '4column': [99, 11, 110, 22],
                   '5column': [20, 64, 55, 33],
                   '6column': [10, 77, 77, 10]})
check_columns = ['3column', '5column', '6column']

df1 = df.copy()
df1['bignum_count'] = 0
for column in check_columns:
    inner_loop_count = []
    bigseries = df[column] >= 50  # boolean Series: True where the value is >= 50
    for big in bigseries:
        if big:
            inner_loop_count.append(1)
        else:
            inner_loop_count.append(0)

    df1['bignum_count'] += inner_loop_count

# View the dataframe
df1

results:

   1column  2column  3column  4column  5column  6column  bignum_count
0       11       32       33       99       20       10             0
1       22       42       77       11       64       77             3
2       33       15       26      110       55       77             2
3       44       35       64       22       33       10             1

Select the columns of interest, check which values are greater than or equal to ( ge ) a threshold, and sum the resulting booleans across each row:

df['bignum_count'] = df[check_columns].ge(50).sum(1)

print(df)

   1column  2column  3column  4column  5column  6column  bignum_count
0       11       32       33       99       20       10             0
1       22       42       77       11       64       77             3
2       33       15       26      110       55       77             2
3       44       35       64       22       33       10             1
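
To see why this one-liner works, it can help to look at the intermediate boolean mask. The following is a minimal sketch that rebuilds the sample frame from the question; the variable names mask and counts are introduced here for illustration only and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'1column': [11, 22, 33, 44],
                   '2column': [32, 42, 15, 35],
                   '3column': [33, 77, 26, 64],
                   '4column': [99, 11, 110, 22],
                   '5column': [20, 64, 55, 33],
                   '6column': [10, 77, 77, 10]})
check_columns = ['3column', '5column', '6column']

# Step 1: the element-wise comparison returns a boolean DataFrame
# with the same shape as df[check_columns].
mask = df[check_columns].ge(50)

# Step 2: True counts as 1 and False as 0 when summed, so summing
# along axis=1 gives the number of True values in each row.
counts = mask.sum(axis=1)

print(mask)
print(counts.tolist())
```

Here sum(1) in the answer and sum(axis=1) in the sketch are the same call; the keyword form just makes the direction of the sum explicit.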

Use DataFrame.ge for >= , then count the True values with sum :

df['bignum_count'] = df[check_columns].ge(50).sum(axis=1)
#alternative
#df['bignum_count'] = (df[check_columns]>=50).sum(axis=1)
print(df)
   1column  2column  3column  4column  5column  6column  bignum_count
0       11       32       33       99       20       10             0
1       22       42       77       11       64       77             3
2       33       15       26      110       55       77             2
3       44       35       64       22       33       10             1
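
If the same count is needed with different columns or thresholds, the expression can be wrapped in a small function. The sketch below assumes a hypothetical helper name, count_at_least, which is not a pandas function, and checks its result against a plain Python loop equivalent to the one in the question.

```python
import pandas as pd


def count_at_least(frame, columns, threshold):
    """Hypothetical helper: per row, count how many of `columns` are >= threshold."""
    return (frame[columns] >= threshold).sum(axis=1)


df = pd.DataFrame({'1column': [11, 22, 33, 44],
                   '2column': [32, 42, 15, 35],
                   '3column': [33, 77, 26, 64],
                   '4column': [99, 11, 110, 22],
                   '5column': [20, 64, 55, 33],
                   '6column': [10, 77, 77, 10]})
check_columns = ['3column', '5column', '6column']

# Vectorized count, stored as a new column
df['bignum_count'] = count_at_least(df, check_columns, 50)

# Reference result from an explicit row-by-row loop, for comparison only
expected = [sum(1 for col in check_columns if row[col] >= 50)
            for _, row in df.iterrows()]

print(df['bignum_count'].tolist() == expected)
```

The vectorized version avoids the Python-level loop over rows, which is usually where the explicit approach becomes slow on larger frames.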


 