Function to match columns and values to rows - intersection of DataFrames

Let's say I have the following DataFrame:

>>> import pandas as pd
>>> df = pd.DataFrame([[5,2,3,11],[5,3,3,8],[9,4,11,12],[5,14,15,16]],columns=["a","b","c","d"])
>>> df
   a   b   c   d
0  5   2   3  11
1  5   3   3   8
2  9   4  11  12
3  5  14  15  16

If I wanted to match all the rows where 'a' is equal to 5, 'b' is less than 10 and 'c' is greater than or equal to 3, I could do the following:

df[(df['a'] == 5) & (df['b'] < 10) & (df['c'] >= 3)]

and that would give me the result I was after:

   a  b  c   d
0  5  2  3  11
1  5  3  3   8

Typing out that code to match the rows was laborious, so I decided to write a function called row_matcher that takes 2 arguments: a Pandas DataFrame and a list of lists of length 3, each holding the column of choice, the operator and the value.

def get_operator_fn(op):
    import operator
    return {
        '<' : operator.lt,
        '<=' : operator.le,
        '==' : operator.eq,
        '!=' : operator.ne,
        '>=' : operator.ge,
        '>' : operator.gt,
        '&' : operator.and_
        }[op]

def row_matcher(df,parameters):
    import pandas as pd
    """Parameter should be [column,operation,value]
    Example: ['trial',"==",1]
    """
    operations = [df[get_operator_fn(operation)(df[column],value)] for column,operation,value in parameters]
    return reduce(lambda left,right: pd.merge(left,right,how='inner'), operations)

>>> row_matcher(df,[["a","==",5],["b","<",10],["c",">=",3]])

Unfortunately, this code raises an error on the return reduce(...) line: TypeError: Could not compare <type 'str'> type with Series

I tried replacing the return reduce(...) line with: df[reduce(operator.and_,operations)]

This still results in an error: TypeError: unsupported operand type(s) for &: 'str' and 'str'

I would appreciate any assistance.

I think this would be a lot simpler using the query() method. With it, your initial example can be written as:

df.query('a==5 & b<10 & c>=3')

Honestly, if you use the query() method, I don't think you'd gain much from your function, unless you're reading in lots of conditions from an external file. If you still want to write the row_matcher function, just use string joins to combine your list of lists into a single string that obeys the query() syntax. Once you have that single string, pass it to the query() method.
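The string-join idea could look roughly like the sketch below. It is only an illustration, not a tested replacement for the original function: it assumes the column names are valid Python identifiers and that the values are plain Python numbers or strings, since repr() is used to write them into the expression.

```python
import pandas as pd


def row_matcher(df, parameters):
    """Filter df with a list of [column, operator, value] triples.

    Example parameter: ['trial', '==', 1]
    """
    # Turn each triple into one parenthesised condition, e.g. "(a == 5)".
    # repr() keeps string values quoted inside the expression.
    conditions = [
        "({} {} {!r})".format(column, op, value)
        for column, op, value in parameters
    ]
    # Join the conditions into a single expression in query() syntax.
    return df.query(" & ".join(conditions))


df = pd.DataFrame(
    [[5, 2, 3, 11], [5, 3, 3, 8], [9, 4, 11, 12], [5, 14, 15, 16]],
    columns=["a", "b", "c", "d"],
)
result = row_matcher(df, [["a", "==", 5], ["b", "<", 10], ["c", ">=", 3]])
print(result)
```

Because the conditions end up in an evaluated expression, this is best kept to trusted input such as your own configuration files.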

You may need to install the numexpr module to use the query() method. You can get around this by passing the keyword argument engine='python' to query(). This may be less efficient than using numexpr, so it might be worthwhile to install the module if performance becomes an issue.
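As a small illustration of that workaround, the example query can be run with the Python engine like this (the DataFrame is the one from the question; the variable name subset is just for this sketch):

```python
import pandas as pd

df = pd.DataFrame(
    [[5, 2, 3, 11], [5, 3, 3, 8], [9, 4, 11, 12], [5, 14, 15, 16]],
    columns=["a", "b", "c", "d"],
)

# engine='python' evaluates the expression without relying on numexpr.
subset = df.query("a==5 & b<10 & c>=3", engine="python")
print(subset)
```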
