Select only those rows from a DataFrame where certain columns with a suffix have values not equal to zero
I want to select only those rows from a dataframe where certain columns with a given suffix have values not equal to zero. The number of such columns is large, so I need a generalised solution.
For example:
import pandas as pd
data = {
'ID' : [1,2,3,4,5],
'M_NEW':[10,12,14,16,18],
'M_OLD':[10,12,14,16,18],
'M_DIFF':[0,0,0,0,0],
'CA_NEW':[10,12,16,16,18],
'CA_OLD':[10,12,14,16,18],
'CA_DIFF':[0,0,2,0,0],
'BC_NEW':[10,12,14,16,18],
'BC_OLD':[10,12,14,16,17],
'BC_DIFF':[0,0,0,0,1]
}
df = pd.DataFrame(data)
df
The dataframe would be:
   ID  M_NEW  M_OLD  M_DIFF  CA_NEW  CA_OLD  CA_DIFF  BC_NEW  BC_OLD  BC_DIFF
0   1     10     10       0      10      10        0      10      10        0
1   2     12     12       0      12      12        0      12      12        0
2   3     14     14       0      16      14        2      14      14        0
3   4     16     16       0      16      16        0      16      16        0
4   5     18     18       0      18      18        0      18      17        1
The desired output is (because of the 2 in CA_DIFF and the 1 in BC_DIFF):
   ID  M_NEW  M_OLD  M_DIFF  CA_NEW  CA_OLD  CA_DIFF  BC_NEW  BC_OLD  BC_DIFF
0   3     14     14       0      16      14        2      14      14        0
1   5     18     18       0      18      18        0      18      17        1
This works using multiple conditions, but what if there are many more DIFF columns, say 20? Can someone provide a general solution? Thanks.
You can do this:
...
# get all columns whose names end with _DIFF
columns = df.columns[df.columns.str.endswith('_DIFF')]
# keep rows where any of those columns is non-zero (negative differences count too)
df[df[columns].ne(0).any(axis=1)]
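The same idea can be written as a self-contained script. This is only a sketch: it uses `DataFrame.filter` with an anchored regex to pick the suffixed columns, and the trimmed-down frame (only `ID` and the three `_DIFF` columns) is an assumption made to keep the example short.

```python
import pandas as pd

# Trimmed-down version of the question's data (assumption: only the
# ID and *_DIFF columns matter for the row selection).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "M_DIFF": [0, 0, 0, 0, 0],
    "CA_DIFF": [0, 0, 2, 0, 0],
    "BC_DIFF": [0, 0, 0, 0, 1],
})

# Select every column whose name ends with the suffix; the "$" anchors
# the match at the end of the column name.
diff_cols = df.filter(regex=r"_DIFF$")

# True for rows where at least one *_DIFF value is not equal to zero.
mask = diff_cols.ne(0).any(axis=1)

# Rows with ID 3 and 5 remain.
result = df[mask]
print(result)
```

Because the column selection is driven by the name pattern, the same three lines should keep working when there are 20 or more `_DIFF` columns.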
You could use the function below, combined with pipe, to filter rows based on various conditions:
In [22]: def filter_rows(df, dtype, columns, condition, any_True=True):
    ...:     temp = df.copy()
    ...:     if dtype:
    ...:         temp = df.select_dtypes(dtype)
    ...:     if columns:
    ...:         booleans = temp.loc[:, columns].transform(condition)
    ...:     else:
    ...:         booleans = temp.transform(condition)
    ...:     if any_True:
    ...:         booleans = booleans.any(axis=1)
    ...:     else:
    ...:         booleans = booleans.all(axis=1)
    ...:
    ...:     return df.loc[booleans]
In [24]: df.pipe(filter_rows,
                 dtype=None,
                 columns=lambda df: df.columns.str.endswith("_DIFF"),
                 condition=lambda df: df.ne(0)
                 )
Out[24]:
   ID  M_NEW  M_OLD  M_DIFF  CA_NEW  CA_OLD  CA_DIFF  BC_NEW  BC_OLD  BC_DIFF
2   3     14     14       0      16      14        2      14      14        0
4   5     18     18       0      18      18        0      18      17        1
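The `any_True` flag switches between "at least one matching column satisfies the condition" and "every matching column satisfies it". A minimal sketch of that difference, without the helper function, might look like this; the small frame and the column names `A_DIFF` and `B_DIFF` are hypothetical and chosen only so that the two results differ.

```python
import pandas as pd

# Hypothetical data: row 1 has no differences, row 2 has one, row 3 has two.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "A_DIFF": [0, 2, 3],
    "B_DIFF": [0, 0, -1],
})

# Boolean array marking the columns whose names end with "_DIFF".
is_diff = df.columns.str.endswith("_DIFF")

# Element-wise test: True where the value is not equal to zero.
nonzero = df.loc[:, is_diff].ne(0)

# any(axis=1): keep rows with at least one non-zero difference (IDs 2 and 3).
any_rows = df[nonzero.any(axis=1)]

# all(axis=1): keep rows where every difference is non-zero (ID 3 only).
all_rows = df[nonzero.all(axis=1)]

print(any_rows)
print(all_rows)
```

On the question's own data the `all` variant would return no rows, since `M_DIFF` is zero everywhere, so `any` is the behaviour the question asks for.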