Compare two columns in two dataframes with a condition on another column
I have a multilevel dataframe and I want to compare the values in column secret, subject to a condition on column group. If group = A, the value in the other dataframe is allowed to be empty or na. Otherwise, output INVALID for the mismatching ones.
Multilevel dataframe:
     name         secret           group
     df1    df2   df1     df2      df1  df2
id
1    Tim    Tim   random  na       A    A
2    Tom    Tom   tree             A    A
3    Alex   Alex  apple   apple    B    B
4    May    May   file    cheese   C    C
Expected output for secret:
id  name  secret   group
1   Tim   na       A
2   Tom            A
3   Alex  apple    B
4   May   INVALID  C
So far I have:
result_df['result'] = multilevel_df.groupby(level=0, axis=0).apply(lambda x: secret_check(x))
# take care of the rest by comparing column by column
result_df = multilevel_df.groupby(level=0, axis=1).apply(lambda x: validate(x))

def validate(x):
    if x[0] == x[1]:
        return x[1]
    else:
        return 'INVALID'

def secret_check(x):
    if (x['group'] == 'A' and pd.isnull(['secret']): #this line is off
        return x[1]
    elif x[0] == x[1]:
        return x[1]
    else:
        return 'INVALID'
If I understand you correctly, you want to mark "secret" in df2 as invalid if the secrets in df1 and df2 differ and the group is not A. Here you go:
condition = ((df[('secret', 'df1')] != df[('secret', 'df2')]) &
             (df[('group', 'df1')] != 'A'))
df.loc[condition, ('secret', 'df2')] = 'INVALID'
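For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the frame is rebuilt from the question's sample data and that the empty and na secrets are stored as plain strings; the names `cols` and `rows` are only for illustration.

```python
import pandas as pd

# Sample data from the question; empty/na secrets are assumed to be plain strings.
cols = pd.MultiIndex.from_tuples([
    ('name', 'df1'), ('name', 'df2'),
    ('secret', 'df1'), ('secret', 'df2'),
    ('group', 'df1'), ('group', 'df2'),
])
rows = [
    ['Tim', 'Tim', 'random', 'na', 'A', 'A'],
    ['Tom', 'Tom', 'tree', '', 'A', 'A'],
    ['Alex', 'Alex', 'apple', 'apple', 'B', 'B'],
    ['May', 'May', 'file', 'cheese', 'C', 'C'],
]
df = pd.DataFrame(rows, index=pd.Index([1, 2, 3, 4], name='id'), columns=cols)

# Secrets differ AND the group is not A -> mark the df2 secret as INVALID.
condition = (
    (df[('secret', 'df1')] != df[('secret', 'df2')])
    & (df[('group', 'df1')] != 'A')
)
df.loc[condition, ('secret', 'df2')] = 'INVALID'

result = df[('secret', 'df2')].tolist()
print(result)  # ['na', '', 'apple', 'INVALID']
```

Note that with this condition any mismatch in group A is accepted, not only empty or na values.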
Assuming we have the following dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame({0: {0: 1, 1: 2, 2: 3, 3: 4},
                   1: {0: 'Tim', 1: 'Tom', 2: 'Alex', 3: 'May'},
                   2: {0: 'Tim', 1: 'Tom', 2: 'Alex', 3: 'May'},
                   3: {0: 'random', 1: 'tree', 2: 'apple', 3: 'file'},
                   4: {0: 'na', 1: '', 2: 'apple', 3: 'cheese'},
                   5: {0: 'A', 1: 'A', 2: 'B', 3: 'C'},
                   6: {0: 'A', 1: 'A', 2: 'B', 3: 'C'}})
df
# Replace the integer column labels with a two-level (field, source) index.
df.columns = pd.MultiIndex.from_tuples([('id', ''), ('name', 'df1'), ('name', 'df2'),
                                        ('secret', 'df1'), ('secret', 'df2'),
                                        ('group', 'df1'), ('group', 'df2')])
df
In[1]:
   id  name        secret          group
       df1   df2   df1     df2     df1  df2
0  1   Tim   Tim   random  na      A    A
1  2   Tom   Tom   tree            A    A
2  3   Alex  Alex  apple   apple   B    B
3  4   May   May   file    cheese  C    C
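If the data actually starts out as two separate dataframes, a frame with this column layout can be assembled with `pd.concat` and its `keys` argument. The sketch below is only an illustration: `df1_src`, `df2_src` and `combined` are hypothetical names, and both frames are assumed to share the same `id` index.

```python
import pandas as pd

# Hypothetical source frames holding the question's sample data, indexed by id.
df1_src = pd.DataFrame({'id': [1, 2, 3, 4],
                        'name': ['Tim', 'Tom', 'Alex', 'May'],
                        'secret': ['random', 'tree', 'apple', 'file'],
                        'group': ['A', 'A', 'B', 'C']}).set_index('id')
df2_src = pd.DataFrame({'id': [1, 2, 3, 4],
                        'name': ['Tim', 'Tom', 'Alex', 'May'],
                        'secret': ['na', '', 'apple', 'cheese'],
                        'group': ['A', 'A', 'B', 'C']}).set_index('id')

# keys= adds an outer column level ('df1'/'df2'); swaplevel moves the field
# name to the top level, and sort_index groups the columns by field.
combined = pd.concat([df1_src, df2_src], axis=1, keys=['df1', 'df2'])
combined = combined.swaplevel(axis=1).sort_index(axis=1)
print(combined)
```

After `sort_index` the top-level fields appear in alphabetical order (group, name, secret), which differs from the display above but does not affect lookups such as `combined[('secret', 'df2')]`.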
You can use np.select() to return results based on conditions, .droplevel() to get out of a multiindex dataframe, and df.loc[:,~df.columns.duplicated()] to drop duplicate columns. Since we are writing the answer into the df1 columns, the df2 columns are not needed.
df[('secret', 'df1')] = np.select(
    [(df[('group', 'df2')] != 'A') &
     (df[('secret', 'df1')] != df[('secret', 'df2')])],  # condition 1
    [df[('secret', 'df1')] + ' > ' + df[('secret', 'df2')]],  # result 1
    df[('secret', 'df2')])  # alternative if the condition is not met
df.columns = df.columns.droplevel(1)  # keep only the top-level names
df = df.loc[:, ~df.columns.duplicated()]  # keep the first (df1) copy of each column
df
Out[1]:
   id  name  secret         group
0  1   Tim   na             A
1  2   Tom                  A
2  3   Alex  apple          B
3  4   May   file > cheese  C
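If you need the literal INVALID label from the question, and want group A to tolerate only empty or na values rather than any mismatch, a variant could look like the sketch below. It uses `Series.where` instead of `np.select`, and it rests on assumptions: "empty" is taken to mean a real NaN, an empty string, or the string 'na', and the last row is an extra hypothetical case added to show a genuine mismatch inside group A.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples([
    ('secret', 'df1'), ('secret', 'df2'), ('group', 'df1'), ('group', 'df2'),
])
df = pd.DataFrame([
    ['random', 'na', 'A', 'A'],
    ['tree', '', 'A', 'A'],
    ['apple', 'apple', 'B', 'B'],
    ['file', 'cheese', 'C', 'C'],
    ['door', 'window', 'A', 'A'],  # hypothetical row: group A, real mismatch
], columns=cols)

s1, s2 = df[('secret', 'df1')], df[('secret', 'df2')]

# Assumption: NaN, '' and the literal string 'na' all count as "empty".
is_empty = s2.isna() | s2.isin(['', 'na'])

# A row passes if the secrets match, or if it is in group A and df2 is empty.
allowed = (s1 == s2) | ((df[('group', 'df2')] == 'A') & is_empty)

# Keep the df2 value where allowed, otherwise write INVALID.
checked = s2.where(allowed, 'INVALID')
print(checked.tolist())  # ['na', '', 'apple', 'INVALID', 'INVALID']
```

The first four results match the expected output in the question; the fifth row is flagged because its df2 secret is neither empty nor equal to the df1 secret.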