Filtering a dataframe using another dataframe
import pandas as pd

data = {'a':['a','b','c','d','e','f','g'],
        'b':['Y','N','Y','Y','Y','N','Y'],
        'c':['Qualified','Unqualified','Qualified','Unqualified','Qualified','Unqualified','Qualified']}
df = pd.DataFrame(data)
df_para = {'Y/N':['','y','n'],
           'Q/U':['unqualified','','unqualified']}
df_para = pd.DataFrame(df_para)
I want to filter df using df_para. My code is:
df_output = pd.DataFrame()
for para in df_para.iterrows():
    df_result = df
    # filter Q/U
    if '' not in df_para['Q/U']:
        mask_qu = df_result['c'].str.lower().isin(df_para['Q/U'])
        df_result = df_result.loc[(mask_qu)]
    # filter Y/N
    if '' not in df_para['Y/N']:
        mask_yn = df_result['b'].str.lower().isin(df_para['Y/N'])
        df_result = df_result.loc[(mask_yn)]
    df_output = df_output.append(df_result)
With my code, it returns all the rows of df three times. However, df_output should look like this:
a b c
1 b N Unqualified
3 d Y Unqualified
5 f N Unqualified
0 a Y Qualified
2 c Y Qualified
3 d Y Unqualified
4 e Y Qualified
6 g Y Qualified
1 b N Unqualified
5 f N Unqualified
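How can I fix this?
The reason is that the `in` operator tests the index:
Using the Python `in` operator on a Series tests for membership in the index, not membership among the values.
If this behavior is surprising, keep in mind that using `in` on a Python dictionary tests keys, not values, and Series are dict-like.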
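A minimal illustration of this behavior, using a small throwaway Series built from the `Q/U` values rather than the full question data. The condition `'' not in df_para['Q/U']` in the question is therefore always true, because `''` is never one of the index labels 0, 1, 2:

```python
import pandas as pd

# Same values as df_para['Q/U'] in the question; default RangeIndex 0..2
s = pd.Series(['unqualified', '', 'unqualified'])

# `in` on a Series looks at the index labels, not the values
print('' in s)              # False: '' is not an index label
print(0 in s)               # True: 0 is an index label

# To test the values instead, check the underlying array or use isin
print('' in s.values)       # True
print(s.isin(['']).any())   # True
```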
# (column in df, column in df_para) pairs used for filtering
cols = [('c','Q/U'), ('b','Y/N')]
# for each unique value in each df_para column, select the matching rows of df
dfs = [df[df[a].str.lower().eq(x)] for a, b in cols for x in df_para[b].unique()]
# concatenate the sub-DataFrames
df_out = pd.concat(dfs)
print(df_out)
a b c
1 b N Unqualified
3 d Y Unqualified
5 f N Unqualified
0 a Y Qualified
2 c Y Qualified
3 d Y Unqualified
4 e Y Qualified
6 g Y Qualified
1 b N Unqualified
5 f N Unqualified
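The comprehension above treats each `df_para` column independently, so it never combines `Y/N` and `Q/U` from the same `df_para` row. It still reproduces the expected output here because both `N` rows (`b` and `f`) happen to be `Unqualified`. If the intent is that each `df_para` row is one filter, with `''` meaning "no filter on this column" and the two conditions combined with AND (an interpretation inferred from the expected output, not stated in the question), the original loop can be repaired along these lines. It is a sketch that reads values from the current row yielded by `iterrows()` instead of from whole columns, and it uses `pd.concat` because `DataFrame.append` was removed in pandas 2.0:

```python
import pandas as pd

data = {'a': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
        'b': ['Y', 'N', 'Y', 'Y', 'Y', 'N', 'Y'],
        'c': ['Qualified', 'Unqualified', 'Qualified', 'Unqualified',
              'Qualified', 'Unqualified', 'Qualified']}
df = pd.DataFrame(data)
df_para = pd.DataFrame({'Y/N': ['', 'y', 'n'],
                        'Q/U': ['unqualified', '', 'unqualified']})

parts = []
for _, para in df_para.iterrows():
    # Start with every row selected; each non-empty parameter narrows the mask
    mask = pd.Series(True, index=df.index)
    if para['Q/U'] != '':
        mask &= df['c'].str.lower().eq(para['Q/U'])
    if para['Y/N'] != '':
        mask &= df['b'].str.lower().eq(para['Y/N'])
    parts.append(df[mask])

# Stack the per-parameter results; duplicate index labels are kept on purpose
df_output = pd.concat(parts)
print(df_output)
```

Unlike the comprehension, this version gives a row such as `Y/N = 'n'`, `Q/U = 'qualified'` the intersection of both conditions rather than the union.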
Try this:
import pandas as pd

data = {'a':['a','b','c','d','e','f','g'],
        'b':['Y','N','Y','Y','Y','N','Y'],
        'c':['Qualified','Unqualified','Qualified','Unqualified','Qualified','Unqualified','Qualified']}
df = pd.DataFrame(data)
df_result = df[df["c"] == "Unqualified"]
print(df_result)
print(type(df_result))