
Pandas - how to filter a dataframe by regex comparisons on multiple column values

I have a dataframe like the following, where every value is stored as a string:

df
  property  value  count
0   propAb   True     10
1   propAA  False     10
2   propAB   blah     10
3   propBb      3      8
4   propBA      4      7
5   propCa    100      4
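
To make the question reproducible, the frame above could be built as follows. This is only a sketch: it assumes every column, including count, holds strings, as the question states.

```python
import pandas as pd

# Every column is stored as strings, as described in the question.
df = pd.DataFrame(
    {
        "property": ["propAb", "propAA", "propAB", "propBb", "propBA", "propCa"],
        "value": ["True", "False", "blah", "3", "4", "100"],
        "count": ["10", "10", "10", "8", "7", "4"],
    }
)
print(df)
```

Note that `df.count` refers to the DataFrame method, so the count column has to be accessed as `df["count"]`.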

I am trying to find a way to filter the dataframe by applying a series of regex-style rules to the property and value columns together.

For example, a sample rule might look like this:

  • "if property starts with 'propA' and value is not 'True', drop the row".

Another rule might be more mathematical, like:

  • "if property starts with 'propB' and value < 4, drop the row".

Is there a way to accomplish this without iterating over all rows once for every rule I want to apply?

You still have to apply each rule (how else?), but you can let pandas handle the rows. Also, instead of removing the rows you do not want, keep the rows you do want. Here is how the first two rules can be applied:

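# Rule 1: property starts with 'propA' and value is not the string 'True'
rule1 = df.property.str.startswith('propA') & (df.value != 'True')
df = df[~rule1] # Keep everything that does NOT match
# Rule 2: fails as written while value holds strings
rule2 = df.property.str.startswith('propB') & (df.value < 4)
df = df[~rule2] # Keep everything that does NOT match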

Note that the second rule will not work as written: value is a string column, not a numeric one, so it has to be converted to numbers before it can be compared with 4.
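
One possible way around the string-versus-number problem is to convert value with `pd.to_numeric` before comparing. The sketch below assumes the sample data from the question and assumes that non-numeric entries such as 'blah' should simply never match the numeric rule; `numeric_value` and `filtered` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "property": ["propAb", "propAA", "propAB", "propBb", "propBA", "propCa"],
        "value": ["True", "False", "blah", "3", "4", "100"],
        "count": ["10", "10", "10", "8", "7", "4"],
    }
)

# Rule 1: property starts with 'propA' and value is not the string 'True'.
rule1 = df["property"].str.startswith("propA") & (df["value"] != "True")

# Rule 2: property starts with 'propB' and value < 4.
# errors="coerce" turns strings that are not numbers into NaN,
# and NaN < 4 is False, so those rows never match this rule.
numeric_value = pd.to_numeric(df["value"], errors="coerce")
rule2 = df["property"].str.startswith("propB") & (numeric_value < 4)

# Keep the rows that match neither rule.
filtered = df[~(rule1 | rule2)]
print(filtered)
```

Building both masks against the original frame and combining them with `|` filters once instead of once per rule.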

For the first rule:

df = df.drop(df[df.property.str.startswith('propA') & (df.value != 'True')].index)

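And for the second rule:

# value holds strings, so convert it before comparing (assumes pandas is imported as pd)
df = df.drop(df[df.property.str.startswith('propB') & (pd.to_numeric(df.value, errors='coerce') < 4)].index)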
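
Since the question asks about regex-style rules, the same idea might be generalised with `Series.str.match`, which tests a regular expression at the start of each string. The sketch below is an illustration under assumptions, not the method from either answer: the `drop_rules` table and the names `numeric_value`, `drop_mask` and `result` are hypothetical, and the sample data is taken from the question.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame(
    {
        "property": ["propAb", "propAA", "propAB", "propBb", "propBA", "propCa"],
        "value": ["True", "False", "blah", "3", "4", "100"],
        "count": ["10", "10", "10", "8", "7", "4"],
    }
)

# Numeric view of the string column; non-numeric entries become NaN.
numeric_value = pd.to_numeric(df["value"], errors="coerce")

# Hypothetical rule table: (regex tested against `property`, boolean mask over `value`).
# A row is dropped when both parts of at least one rule are true.
drop_rules = [
    (r"^propA", df["value"] != "True"),
    (r"^propB", numeric_value < 4),
]

# One vectorised mask per rule; no explicit loop over rows.
masks = [df["property"].str.match(pattern) & condition for pattern, condition in drop_rules]

# OR the per-rule masks together, then keep the rows that match none of them.
drop_mask = reduce(operator.or_, masks)
result = df[~drop_mask]
print(result)
```

The Python loop here runs once per rule rather than once per row, so adding a rule means adding one entry to the table.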
