
Filtering a dataframe using another dataframe

import pandas as pd

data = {'a':['a','b','c','d','e','f','g'],
        'b':['Y','N','Y','Y','Y','N','Y'],
        'c':['Qualified','Unqualified','Qualified','Unqualified','Qualified','Unqualified','Qualified']}
df = pd.DataFrame(data)

df_para = {'Y/N':['','y','n'],
        'Q/U':['unqualified','','unqualified']}
df_para = pd.DataFrame(df_para)

I want to filter df using df_para. My code is:

df_output = pd.DataFrame()

for para in df_para.iterrows():
    df_result = df
     # filter Q/U
    if '' not in df_para['Q/U']:
        mask_qu = df_result['c'].str.lower().isin(df_para['Q/U'])
        df_result = df_result.loc[(mask_qu)]
        
    # filter Y/N
    if '' not in df_para['Y/N']:
        mask_yn = df_result['b'].str.lower().isin(df_para['Y/N'])
        df_result = df_result.loc[(mask_yn)]

    df_output = df_output.append(df_result)

When I run my code, it returns every row of df three times. However, df_output should look like this:

   a   b   c
1   b   N   Unqualified
3   d   Y   Unqualified
5   f   N   Unqualified
0   a   Y   Qualified
2   c   Y   Qualified
3   d   Y   Unqualified
4   e   Y   Qualified
6   g   Y   Qualified
1   b   N   Unqualified
5   f   N   Unqualified

How can I fix this?

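The reason is that the in operator tests the index.

Using the Python in operator on a Series tests for membership in the index, not membership among the values.

If this behavior is surprising, keep in mind that using in on a Python dictionary tests keys, not values, and Series are dict-like.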
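To make the index-versus-values distinction concrete, here is a minimal sketch using a Series with the same values as the 'Q/U' column from the question. The variable name s is only illustrative.

```python
import pandas as pd

# Same values as df_para['Q/U'] in the question; default index is 0, 1, 2
s = pd.Series(['unqualified', '', 'unqualified'])

# `in` on a Series checks the index labels, not the values
print('' in s)          # False: '' is not an index label
print(1 in s)           # True: 1 is an index label

# To test the values, check the underlying array or compare element-wise
print('' in s.values)   # True
print(s.eq('').any())   # True
```

This is why the condition `'' not in df_para['Q/U']` in the question is always true: the empty string is never one of the index labels, so both filters run on every iteration.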

# (df column, df_para column) pairs to filter on
cols = [('c','Q/U'), ('b','Y/N')]

# for each unique value in a df_para column, keep the rows of df that match it
dfs = [df[df[a].str.lower().eq(x)] for a, b in cols for x in df_para[b].unique()]

# join the filtered sub-DataFrames
df_out = pd.concat(dfs)
print(df_out)
   a  b            c
1  b  N  Unqualified
3  d  Y  Unqualified
5  f  N  Unqualified
0  a  Y    Qualified
2  c  Y    Qualified
3  d  Y  Unqualified
4  e  Y    Qualified
6  g  Y    Qualified
1  b  N  Unqualified
5  f  N  Unqualified
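The code above filters each column independently. If each row of df_para is instead meant as one combined set of conditions, with an empty string meaning "no constraint on this column", a row-wise variant might look like the sketch below. That reading of the question is an assumption, and the names col_map, parts and df_rowwise are illustrative only.

```python
import pandas as pd

data = {'a': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
        'b': ['Y', 'N', 'Y', 'Y', 'Y', 'N', 'Y'],
        'c': ['Qualified', 'Unqualified', 'Qualified', 'Unqualified',
              'Qualified', 'Unqualified', 'Qualified']}
df = pd.DataFrame(data)
df_para = pd.DataFrame({'Y/N': ['', 'y', 'n'],
                        'Q/U': ['unqualified', '', 'unqualified']})

# df_para column -> df column it filters (hypothetical helper mapping)
col_map = {'Q/U': 'c', 'Y/N': 'b'}

parts = []
for _, para in df_para.iterrows():
    # start with every row selected, then narrow down per condition
    mask = pd.Series(True, index=df.index)
    for para_col, df_col in col_map.items():
        wanted = para[para_col]
        if wanted != '':  # assumption: '' means "do not filter on this column"
            mask &= df[df_col].str.lower().eq(wanted)
    parts.append(df[mask])

# one concat at the end instead of repeated appends
df_rowwise = pd.concat(parts)
print(df_rowwise)
```

On this data the row-wise result has the same rows as the output above, because every 'N' row happens to be 'Unqualified'. The two approaches would differ if some 'N' row were 'Qualified'. The pieces are collected in a list and concatenated once because DataFrame.append, used in the question, was removed in pandas 2.0.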

Try this:

import pandas as pd
import numpy as np

data = {'a':['a','b','c','d','e','f','g'],
        'b':['Y','N','Y','Y','Y','N','Y'],
        'c':['Qualified','Unqualified','Qualified','Unqualified','Qualified','Qualified','Unqualified']}
df = pd.DataFrame(data)


df_result = df[df["c"] == "Unqualified"]
print(df_result)
print(type(df_result))
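The comparison above is case-sensitive, while the filter values in the question's df_para are lowercase. A minimal sketch of a case-insensitive variant, reusing this answer's data, could be:

```python
import pandas as pd

data = {'a': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
        'b': ['Y', 'N', 'Y', 'Y', 'Y', 'N', 'Y'],
        'c': ['Qualified', 'Unqualified', 'Qualified', 'Unqualified',
              'Qualified', 'Qualified', 'Unqualified']}
df = pd.DataFrame(data)

# Lowercase the column before comparing so 'Unqualified' matches 'unqualified'
df_result = df[df['c'].str.lower() == 'unqualified']
print(df_result)
```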

