Filtering a DataFrame on dynamic columns and values in Python pandas?
The goal is to filter a DataFrame on a dynamic number of columns, each with its own value. To achieve this, I've created a filter mask from a dictionary, which I should be able to reuse each time.
However, this filter mask ends up as a string and therefore raises a 'KeyError'. Here is an example of how my logic works.
import pandas as pd

# Create a list of dictionaries with the data for each row
data = [{'col1': 1, 'col2': 'a', 'col3': True, 'col4': 1.0},
        {'col1': 2, 'col2': 'b', 'col3': False, 'col4': 2.0},
        {'col1': 1, 'col2': 'c', 'col3': True, 'col4': 3.0},
        {'col1': 2, 'col2': 'd', 'col3': False, 'col4': 4.0},
        {'col1': 1, 'col2': 'e', 'col3': True, 'col4': 5.0}]
df = pd.DataFrame(data)
filter_dict = {'col1': 1, 'col3': True,}

def create_filter_query_for_df(filter_dict):
    query = ""
    for i, (column, values) in enumerate(filter_dict.items()):
        if i > 0:
            query += " & "
        if isinstance(values,float) or isinstance(values,int):
            query += f"(data['{column}'] == {values})"
        else:
            query += f"(data['{column}'] == '{values}')"
    return query

df[create_filter_query_for_df(filter_dict)]
Result is:
KeyError: "(data['col1'] == 1) & (data['col3'] == True)"
The issue is that create_filter_query_for_df() returns a string, whereas it should return a boolean mask. If you build the mask as follows:
mask1 = "(data['col1'] == 1) & (data['col3'] == True)" # the same error is returned;
# However, if you write it as below (indexing the DataFrame df), it succeeds
mask2 = (df['col1'] == 1) & (df['col3'] == True)
The type of mask1 will be str. The type of mask2 will be boolean.
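To make the difference between the two masks concrete, here is a minimal sketch. It assumes a reduced three-row frame with the same column names as the question; the variable `raised` is only there for illustration. Indexing a DataFrame with a plain string is treated as a column-label lookup, which is why the string mask raises KeyError, whereas a boolean Series selects rows.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 1], 'col3': [True, False, True]})

# A string is interpreted as a column label, not as an expression.
mask1 = "(df['col1'] == 1) & (df['col3'] == True)"
try:
    df[mask1]
    raised = False
except KeyError:
    raised = True  # no column carries that label

# Evaluated directly, the same expression yields a boolean Series.
mask2 = (df['col1'] == 1) & (df['col3'] == True)
filtered = df[mask2]

print(type(mask1).__name__)               # str
print(type(mask2).__name__, mask2.dtype)  # Series bool
print(filtered.index.tolist())            # [0, 2]
```

So the string never gets evaluated; pandas simply looks for a column with that exact name.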
However, I can't use bool(mask1), because then I can no longer use it as a filter condition. I'm quite stuck, so I need some help here.
Apologies if I took a completely wrong approach in trying to get to the filter; it seemed quite a suitable solution to me.
Thanks in advance!
The result of filtering based on mask2 is as follows:
mask2 = (df['col1'] == 1) & (df['col3'] == True)
df[mask2]

   col1 col2  col3  col4
0     1    a  True   1.0
2     1    c  True   3.0
4     1    e  True   5.0
To reach the same result with a string, we can use df.query like so:
df.query('(col1 == 1) & (col3 == True)')

   col1 col2  col3  col4
0     1    a  True   1.0
2     1    c  True   3.0
4     1    e  True   5.0
Note that the required syntax is actually a bit different: df.query refers to columns by their bare names (col1) rather than through data['col1']. So, let's simplify your function to end up with the string that we need:
def create_filter_query_for_df(filter_dict):
    # !r inserts repr(val): numbers and booleans stay as-is, strings get quoted
    list_pairs = [f"({col} == {val!r})" for col, val in filter_dict.items()]
    query = ' & '.join(list_pairs)
    # '(col1 == 1) & (col3 == True)'
    return query

df.query(create_filter_query_for_df(filter_dict))

   col1 col2  col3  col4
0     1    a  True   1.0
2     1    c  True   3.0
4     1    e  True   5.0
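For filters that include string values or column labels that are not valid Python identifiers, the query string needs a little more care. The following is only a sketch: `build_query` and the column `my col` are hypothetical names, and it assumes a pandas version that supports backtick quoting in query (0.25 or later, to my knowledge).

```python
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 1],
    'col2': ['a', 'b', 'a'],
    'my col': [True, False, True],  # hypothetical column with a space
})

def build_query(filter_dict):
    """Hypothetical helper: build a df.query string from {column: value}."""
    parts = []
    for col, val in filter_dict.items():
        # Backticks let query() refer to labels that are not valid identifiers;
        # !r quotes strings ('a') and leaves numbers and booleans as-is.
        parts.append(f"(`{col}` == {val!r})")
    return " & ".join(parts)

query = build_query({'col2': 'a', 'my col': True})
print(query)  # (`col2` == 'a') & (`my col` == True)
result = df.query(query)
print(result.index.tolist())  # [0, 2]
```

Relying on repr is fine for plain numbers, booleans and strings; values such as timestamps would need different handling.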
Alternative approach
Incidentally, if you are only using the & operator, another way to approach this problem could be as follows:
- Use a list comprehension to create one boolean pd.Series per entry in filter_dict and use them as input for pd.concat with the axis parameter set to 1.
- Chain df.all with the axis parameter again set to 1 (to evaluate whether all values for each row in the resulting temporary df equal True).
- The result is a single pd.Series with booleans that we can use to filter the df.
my_mask = (pd.concat([df[k].eq(v) for k, v in filter_dict.items()],
                     axis=1)
           .all(axis=1))
df[my_mask]

   col1 col2  col3  col4
0     1    a  True   1.0
2     1    c  True   3.0
4     1    e  True   5.0
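The chained expression can be unpacked into its intermediate steps. This sketch rebuilds the question's frame in column form; the names `conditions` and `temp` are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 1, 2, 1],
                   'col2': list('abcde'),
                   'col3': [True, False, True, False, True],
                   'col4': [1.0, 2.0, 3.0, 4.0, 5.0]})
filter_dict = {'col1': 1, 'col3': True}

# Step 1: one boolean Series per filter entry.
conditions = [df[k].eq(v) for k, v in filter_dict.items()]

# Step 2: place them side by side as columns of a temporary DataFrame.
temp = pd.concat(conditions, axis=1)

# Step 3: a row passes only if every condition column is True.
my_mask = temp.all(axis=1)

print(temp.columns.tolist())  # ['col1', 'col3']
print(my_mask.tolist())       # [True, False, True, False, True]
```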
Of course, this approach may not be ideal (or may not work at all) if your actual requirements are a bit more complex.
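As one example of a slightly more complex requirement, a filter value might be a list of acceptable values rather than a single one. A possible sketch, under the assumption that all conditions are still combined with &, is shown below; `build_mask` is a hypothetical helper and the small frame is made up for the demonstration.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 1, 3],
                   'col2': ['a', 'b', 'c', 'd']})

def build_mask(frame, filter_dict):
    """Hypothetical helper: AND together one condition per column."""
    conditions = []
    for col, val in filter_dict.items():
        if isinstance(val, (list, tuple, set)):
            conditions.append(frame[col].isin(val))  # match any listed value
        else:
            conditions.append(frame[col].eq(val))    # match a single value
    # Start from an all-True mask so an empty filter_dict keeps every row.
    all_true = pd.Series(True, index=frame.index)
    return reduce(operator.and_, conditions, all_true)

mask = build_mask(df, {'col1': [1, 3], 'col2': 'a'})
print(df[mask].index.tolist())  # [0]
```

Because the mask is built from Series operations rather than a string, no quoting or escaping of values is needed.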