Filter Dataframe Based on Difference Between Multiple Columns
I am working on the following dataframe, df:
name val_1 val_2 val_3
AAA 20 25 30
BBB 15 20 35
CCC 25 40 45
DDD 20 20 25
I need to keep only those names where any val column increases by more than 10 from the previous val column. If the columns increase by less than 10 from the previous column, or do not increase at all, we need to drop that name.
Desired output:
name
BBB #val3 increases by 15
CCC #val2 increases by 15
What would be the smartest way of doing it? Any suggestions would be appreciated. Thanks!
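# diff(axis=1) compares each val column with the one to its left (the default axis=0 compares rows)
subset = df[df[['val_1', 'val_2', 'val_3']].diff(axis=1).ge(10).any(axis=1)]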
Output (assuming the val_3 of AAA is 20 instead of 30):
>>> subset
name val_1 val_2 val_3
1 BBB 15 20 35
2 CCC 25 40 45
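For reference, here is a minimal self-contained sketch of a column-wise reading of the question, run on the original data (AAA's val_3 left at 30). It assumes that "increase from the previous val column" means the difference between adjacent columns within the same row, hence diff(axis=1). The names val_cols and col_diff are illustrative only.

```python
import pandas as pd

# The dataframe from the question, unchanged.
df = pd.DataFrame({
    "name": ["AAA", "BBB", "CCC", "DDD"],
    "val_1": [20, 15, 25, 20],
    "val_2": [25, 20, 40, 20],
    "val_3": [30, 35, 45, 25],
})

val_cols = ["val_1", "val_2", "val_3"]
# Difference of each column from the column to its left; val_1 has no predecessor, so it becomes NaN.
col_diff = df[val_cols].diff(axis=1)
# Keep rows where at least one column-to-column increase is 10 or more (NaN compares as False).
subset = df[col_diff.ge(10).any(axis=1)]
print(subset["name"].tolist())
```

If an increase of exactly 10 should not count, gt(10) can be used in place of ge(10).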
The way I understand it, you want to keep the two rows when they have a difference of at least 10 on all columns.
For this, you need to build a mask with diff + ge + all and combine the mask with its shift:
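# True for rows that exceed the previous row by at least 10 in every val_ column
m = df.filter(like='val_').diff().ge(10).all(axis=1)
# also keep the row just before each match; fill_value=False keeps the shifted mask boolean
out = df[m | m.shift(-1, fill_value=False)]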
Output:
name val_1 val_2 val_3
1 BBB 15 20 35
2 CCC 25 40 45
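To see what the mask and its shift each contribute, here is a small sketch on the question's data. It follows the row-wise reading used in this answer. The names row_diff and prev are illustrative, and fill_value=False is passed to shift so that the shifted mask stays boolean.

```python
import pandas as pd

# The dataframe from the question, unchanged.
df = pd.DataFrame({
    "name": ["AAA", "BBB", "CCC", "DDD"],
    "val_1": [20, 15, 25, 20],
    "val_2": [25, 20, 40, 20],
    "val_3": [30, 35, 45, 25],
})

# Row-to-row differences for every val_ column; the first row has no predecessor (NaN).
row_diff = df.filter(like="val_").diff()
# True where a row exceeds the previous row by at least 10 in all val_ columns.
m = row_diff.ge(10).all(axis=1)
# Shifting the mask up by one position flags the row just before each match.
prev = m.shift(-1, fill_value=False)
out = df[m | prev]

print(m.tolist())
print(prev.tolist())
print(out["name"].tolist())
```

Only CCC satisfies the mask itself; BBB is kept because it is the row CCC is compared against.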