Python - lambda function works by itself, but in a for loop - it doesn't

Question

Very weird case. I created some lists of ID numbers from DF columns, which I want to look up in another DF (final_df), setting a 'YES'/'NO' value in a corresponding column depending on whether each ID is found. It works perfectly when I run the lambda function by itself, but when I put it in a for loop, it doesn't.

df1['id_column'] = ['ABCDEF', 'BDCJG', 'HJAYR']
df2['id_column'] = ['NBJOAN', 'NAJOJ', 'NAIRG']

# The real version has duplicates, so I convert it to sets here
df1_id_list = set(df1['id_column'])
df2_id_list = set(df2['id_column'])

This works just fine:

final_df['df1'] = final_df['id_column'].apply(lambda x: 'YES' if x in df1_id_list else 'NO')

But THIS raises KeyError: 'df1_id_list':

df_list = ['df1', 'df2']

for df in df_list:
    final_df[df] = final_df['id_column'].apply(lambda x: 'YES' if x in vars()[df + '_id_list'] else 'NO')

I don't want to lose scalability, so why on Earth does the second one not work?

Answer 1

The reason vars() wasn't working is that vars() without an argument acts like locals(), and since the lambda has its own local scope, there is no variable named df1_id_list or df2_id_list inside it. You can use eval() instead (edited based on the corrected question):

df_list = ['df1', 'df2']
for df in df_list:
    final_df[df] = final_df['id_column'].apply(lambda x: 'YES' if x in eval(df + '_id_list') else 'NO')
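A minimal, pandas-free sketch of the scoping issue described above. The variable name stands in for the loop variable df, and the set holds the sample IDs from the question; both are illustrative stand-ins.

```python
# Stand-in for the question's module-level set of IDs
df1_id_list = {'ABCDEF', 'BDCJG', 'HJAYR'}
name = 'df1'  # plays the role of the loop variable df

# At module level, vars() with no argument is the module namespace,
# so the constructed name is found here.
found_at_module_level = (name + '_id_list') in vars()

# Inside a lambda, vars() with no argument behaves like locals(): the
# namespace holds only the lambda's own parameter x.
lambda_namespace = (lambda x: sorted(vars()))('ABCDEF')

# So the same lookup the question performs raises KeyError inside the lambda
error = None
try:
    (lambda x: x in vars()[name + '_id_list'])('ABCDEF')
except KeyError as err:
    error = err

print(found_at_module_level)  # True
print(lambda_namespace)       # ['x']
print(repr(error))            # KeyError('df1_id_list')
```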

You can also try globals(), which returns a dictionary of the module's global variables (so this works as long as df1_id_list and df2_id_list are defined at module level):

df_list = ['df1', 'df2']
for df in df_list:
    final_df[df] = final_df['id_column'].apply(lambda x: 'YES' if x in globals()[df + '_id_list'] else 'NO')
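Both approaches above rebuild a variable name from a string at run time. An alternative that neither answer uses, sketched here under the assumption that the sets can be collected in one place, is a dictionary keyed by the output column name. The id_sets name is illustrative, and the final_df sample values are borrowed from Answer 2.

```python
import pandas as pd

# Sample data mirroring the question
df1 = pd.DataFrame({'id_column': ['ABCDEF', 'BDCJG', 'HJAYR']})
df2 = pd.DataFrame({'id_column': ['NBJOAN', 'NAJOJ', 'NAIRG']})
final_df = pd.DataFrame({'id_column': ['ABCDEF', 'NAJOJ', 'ZZZZZ']})

# Map each output column name to its set of IDs explicitly,
# instead of looking variables up by a constructed name.
id_sets = {
    'df1': set(df1['id_column']),
    'df2': set(df2['id_column']),
}

for col, ids in id_sets.items():
    # apply() runs immediately in each iteration, so the lambda sees the
    # current ids set; no vars()/eval()/globals() lookup is needed.
    final_df[col] = final_df['id_column'].apply(lambda x: 'YES' if x in ids else 'NO')

print(final_df)
```

Adding another source then only means adding one entry to the dictionary.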

Answer 2

Is there a reason you want to use your internal variable names as column names in the DataFrame, rather than just an index like this?

import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame()
final_df = pd.DataFrame()
df1['id_column'] = ['ABCDEF', 'BDCJG', 'HJAYR']
df2['id_column'] = ['NBJOAN', 'NAJOJ', 'NAIRG']
final_df['id_column'] = ['ABCDEF', 'NAJOJ', 'ZZZZZ']

df1_id_list = set(df1['id_column'])
df2_id_list = set(df2['id_column'])

df_list = [df1, df2]
for index, df in enumerate(df_list, start=1):
    ids = set(df['id_column'])  # build the set once per DataFrame, not once per row
    final_df[f'df_{index}'] = final_df['id_column'].apply(lambda x: 'YES' if x in ids else 'NO')
print(final_df)

Output:

  id_column df_1 df_2
0    ABCDEF  YES   NO
1     NAJOJ   NO  YES
2     ZZZZZ   NO   NO
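Both answers call a Python lambda once per row. If the goal is only a YES/NO membership flag, pandas' vectorized Series.isin performs the same test for the whole column at once. A minimal sketch using the same sample data (the sources dictionary name is illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'id_column': ['ABCDEF', 'BDCJG', 'HJAYR']})
df2 = pd.DataFrame({'id_column': ['NBJOAN', 'NAJOJ', 'NAIRG']})
final_df = pd.DataFrame({'id_column': ['ABCDEF', 'NAJOJ', 'ZZZZZ']})

# Output column name -> DataFrame whose IDs should be looked up
sources = {'df_1': df1, 'df_2': df2}

for col, src in sources.items():
    # Boolean mask: True where the ID appears anywhere in src['id_column']
    mask = final_df['id_column'].isin(src['id_column'])
    # Translate the booleans into the 'YES'/'NO' labels
    final_df[col] = mask.map({True: 'YES', False: 'NO'})

print(final_df)
```

Series.isin accepts any list-like, so the duplicate IDs mentioned in the question do not need to be removed with set() first.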

solution1
0 2020-06-10 12:54:37

solution2
0 2020-06-10 13:03:49