Select only those rows from a DataFrame where certain suffixed columns have values not equal to zero

Question

I want to select only those rows from a dataframe where certain columns with a given suffix have values not equal to zero. There are many such columns, so I need a general solution.

For example:

import pandas as pd
data = {
    'ID' : [1,2,3,4,5],
    'M_NEW':[10,12,14,16,18],
    'M_OLD':[10,12,14,16,18],
    'M_DIFF':[0,0,0,0,0],
    'CA_NEW':[10,12,16,16,18],
    'CA_OLD':[10,12,14,16,18],
    'CA_DIFF':[0,0,2,0,0],
    'BC_NEW':[10,12,14,16,18],
    'BC_OLD':[10,12,14,16,17],
    'BC_DIFF':[0,0,0,0,1]
}
df = pd.DataFrame(data)
df

The dataframe would be:

   ID  M_NEW  M_OLD  M_DIFF  CA_NEW  CA_OLD  CA_DIFF  BC_NEW  BC_OLD  BC_DIFF
0   1     10     10       0      10      10        0      10      10        0
1   2     12     12       0      12      12        0      12      12        0
2   3     14     14       0      16      14        2      14      14        0
3   4     16     16       0      16      16        0      16      16        0
4   5     18     18       0      18      18        0      18      17        1

The desired output is (because of the 2 in CA_DIFF and the 1 in BC_DIFF):

   ID  M_NEW  M_OLD  M_DIFF  CA_NEW  CA_OLD  CA_DIFF  BC_NEW  BC_OLD  BC_DIFF
0   3     14     14       0      16      14        2      14      14        0
1   5     18     18       0      18      18        0      18      17        1

This works when I chain multiple conditions, but what if there are many more DIFF columns, say 20? Can someone provide a general solution? Thanks.

Answer 1

You can do this:


...
# get all columns whose name ends with _DIFF
columns = df.columns[df.columns.str.endswith('_DIFF')]

# keep rows where any of those columns is not equal to 0
# (ne(0) also catches negative differences, which a "> 0" test would miss)
df[df[columns].ne(0).any(axis=1)]
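
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It is not the only way to do it: it picks the columns with `DataFrame.filter` and a regex anchored at the end of the name, and it calls `reset_index(drop=True)` so the rows are renumbered 0, 1 as in the desired output. The data is a trimmed copy of the question's frame (only `ID` and the `_DIFF` columns), and the names `diff_cols`, `mask` and `result` are just illustrative.

```python
import pandas as pd

# Trimmed copy of the question's data: ID plus the three _DIFF columns.
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'M_DIFF': [0, 0, 0, 0, 0],
    'CA_DIFF': [0, 0, 2, 0, 0],
    'BC_DIFF': [0, 0, 0, 0, 1],
})

# Select every column whose name ends with "_DIFF", however many there are.
diff_cols = df.filter(regex=r'_DIFF$')

# True for each row where at least one _DIFF value is not equal to zero.
mask = diff_cols.ne(0).any(axis=1)

# Keep the matching rows and renumber them from 0.
result = df[mask].reset_index(drop=True)
print(result)
```

Without `reset_index(drop=True)` the selected rows keep their original labels (2 and 4), which is usually fine unless the index has to match exactly.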

Answer 2

You could use the function below, combined with pipe, to filter rows based on various conditions:

In [22]: def filter_rows(df, dtype, columns, condition, any_True = True):
    ...:     temp = df.copy()
    ...:     if dtype:
    ...:         temp = df.select_dtypes(dtype)
    ...:     if columns:
    ...:         booleans = temp.loc[:, columns].transform(condition)
    ...:     else:
    ...:         booleans = temp.transform(condition)
    ...:     if any_True:
    ...:         booleans = booleans.any(axis = 1)
    ...:     else:
    ...:         booleans = booleans.all(axis = 1)
    ...: 
    ...:     return df.loc[booleans]

In [24]: df.pipe(filter_rows,
                 dtype=None, 
                 columns=lambda df: df.columns.str.endswith("_DIFF"),
                 condition= lambda df: df.ne(0)
                 )

Out[24]: 
   ID  M_NEW  M_OLD  M_DIFF  CA_NEW  CA_OLD  CA_DIFF  BC_NEW  BC_OLD  BC_DIFF
2   3     14     14       0      16      14        2      14      14        0
4   5     18     18       0      18      18        0      18      17        1
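
The `any_True` switch in the function above decides between "at least one column matches" and "every column matches". A small sketch of that difference, stripped of the helper and using a trimmed copy of the question's data (the names `is_diff`, `nonzero`, `any_rows` and `all_rows` are only illustrative):

```python
import pandas as pd

# Trimmed copy of the question's data: ID plus the three _DIFF columns.
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'M_DIFF': [0, 0, 0, 0, 0],
    'CA_DIFF': [0, 0, 2, 0, 0],
    'BC_DIFF': [0, 0, 0, 0, 1],
})

# Boolean array marking the columns whose name ends with "_DIFF".
is_diff = df.columns.str.endswith('_DIFF')

# Element-wise test: which _DIFF cells are not equal to zero?
nonzero = df.loc[:, is_diff].ne(0)

# Rows where at least one _DIFF column is nonzero.
any_rows = df[nonzero.any(axis=1)]

# Rows where every _DIFF column is nonzero.
all_rows = df[nonzero.all(axis=1)]

print(any_rows)
print(all_rows)
```

With this data `M_DIFF` is zero everywhere, so the `all` variant selects nothing, while the `any` variant keeps the rows with ID 3 and 5.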

Answer 1: 0, accepted, 2021-03-06 15:32:00

Answer 2: 0, 2021-03-06 23:53:47
