
Filter Dataframe Based on Difference Between Multiple Columns

I am working on the following dataframe, df:

name  val_1  val_2  val_3
AAA      20     25     30
BBB      15     20     35
CCC      25     40     45
DDD      20     20     25
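For anyone who wants to reproduce the problem, the table above can be built roughly like this. It is a minimal sketch that assumes the val columns hold plain integers and that the frame uses the default integer index.

```python
import pandas as pd

# Rebuild the example frame from the question (values copied from the table).
df = pd.DataFrame({
    "name": ["AAA", "BBB", "CCC", "DDD"],
    "val_1": [20, 15, 25, 20],
    "val_2": [25, 20, 40, 20],
    "val_3": [30, 35, 45, 25],
})
print(df)
```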

I need to keep only the names where any val column increases by more than 10 from the previous val column. If the columns increase by less than 10 from the previous column, or don't increase at all, we need to drop that name.

Desired output:

name
BBB  # val_3 increases by 15
CCC  # val_2 increases by 15

What would be the smartest way of doing it? Any suggestions would be appreciated. Thanks!

# diff() defaults to axis=0, so each row is compared with the row above it
subset = df[df[['val_1', 'val_2', 'val_3']].diff().ge(10).any(axis=1)]

Output (assuming val_3 for AAA is 20 instead of 30):

>>> subset
  name  val_1  val_2  val_3
1  BBB     15     20     35
2  CCC     25     40     45
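If the intended comparison is between adjacent val columns within the same row, as the comments in the desired output suggest, a variant along the following lines may fit better. This is a sketch rather than the answer's own method: it assumes the question's original data (val_3 for AAA is 30), uses diff(axis=1) to difference across columns, and uses gt(10) for "more than 10". The names val_cols, col_diff and mask are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["AAA", "BBB", "CCC", "DDD"],
    "val_1": [20, 15, 25, 20],
    "val_2": [25, 20, 40, 20],
    "val_3": [30, 35, 45, 25],
})

val_cols = ["val_1", "val_2", "val_3"]

# Difference each val column against the column to its left;
# the first column has no predecessor, so it becomes NaN.
col_diff = df[val_cols].diff(axis=1)

# Keep rows where at least one step is strictly greater than 10.
# NaN > 10 evaluates to False, so the first column never matches.
mask = col_diff.gt(10).any(axis=1)

subset = df.loc[mask, "name"]
print(subset.tolist())  # ['BBB', 'CCC']
```

With this reading, no change to the sample data is needed to reach the desired output.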

The way I understand it, you want to keep the two rows when they have a difference of at least 10 on all columns.

For this, you need to build a mask with diff + ge + all and combine the mask with its shift:

m = df.filter(like='val_').diff().ge(10).all(1)
out = df[m|m.shift(-1)]

Output:

  name  val_1  val_2  val_3
1  BBB     15     20     35
2  CCC     25     40     45
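To see why the shifted mask is needed, it can help to look at the intermediate steps. The sketch below uses the question's data; passing fill_value=False to shift is an addition that is not in the answer, made here so the shifted mask stays boolean instead of picking up a NaN in the last position. The name prev_partner is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["AAA", "BBB", "CCC", "DDD"],
    "val_1": [20, 15, 25, 20],
    "val_2": [25, 20, 40, 20],
    "val_3": [30, 35, 45, 25],
})

# True for a row whose every val column is at least 10 above the row before it.
m = df.filter(like="val_").diff().ge(10).all(axis=1)
print(m.tolist())             # [False, False, True, False]

# Shift the mask up by one so the earlier row of each matching pair is marked too.
prev_partner = m.shift(-1, fill_value=False)
print(prev_partner.tolist())  # [False, True, False, False]

out = df[m | prev_partner]
print(out["name"].tolist())   # ['BBB', 'CCC']
```

Only CCC satisfies the row-to-row condition directly; BBB is kept because it is the row CCC is compared against.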

