
Python - lambda function works by itself, but in a for loop - it doesn't

A very weird case. I created some lists of ID numbers from DataFrame columns, which I want to look up in another DataFrame (final_df), setting a 'YES'/'NO' value in the corresponding column depending on whether each ID is found. It works perfectly when I run the lambda function by itself, but when I put it in a for loop, it doesn't.

df1['id_column'] = ['ABCDEF', 'BDCJG', 'HJAYR']
df2['id_column'] = ['NBJOAN', 'NAJOJ', 'NAIRG']

# The real version has duplicates so I convert it to sets here
df1_id_list = set(df1['id_column'])
df2_id_list = set(df2['id_column'])

This works just fine:

final_df['df1'] = final_df['id_column'].apply(lambda x: 'YES' if x in df1_id_list else 'NO')

But THIS raises KeyError: 'df1_id_list':

df_list = ['df1', 'df2']

for df in df_list:
   final_df[df] = final_df['id_column'].apply(lambda x: 'YES' if x in vars()[df + '_id_list'] else 'NO')

I don't want to lose scalability, so why on Earth does the second one not work?

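vars() wasn't working because, called without an argument, it acts like locals(), and a lambda has its own local scope. Inside the lambda the only local name is x, so the dictionary returned by vars() has no key named df1_id_list or df2_id_list, hence the KeyError. You can use eval() instead, which falls back to the global namespace when a name is not found locally, so it works as long as the sets are defined at module level (edited based on the corrected question):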

df_list = ['df1', 'df2']
for df in df_list:
   final_df[df] = final_df['id_column'].apply(lambda x: 'YES' if x in eval(df + '_id_list') else 'NO')
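The scoping behaviour described above can be checked without pandas. This is only a small sketch: the set literal stands in for the question's df1_id_list, and the names names_seen_by_vars and found are made up for the illustration. It assumes the code runs at module level, as in a script or notebook cell.

```python
# Module-level name, standing in for the question's df1_id_list.
df1_id_list = {'ABCDEF', 'BDCJG', 'HJAYR'}

# Inside a lambda, vars() (like locals()) sees only the lambda's own
# local names, which here is just the parameter x.
names_seen_by_vars = (lambda x: sorted(vars()))('ABCDEF')
print(names_seen_by_vars)  # ['x']

# Indexing that dictionary with a module-level name therefore fails.
try:
    (lambda x: x in vars()['df1_id_list'])('ABCDEF')
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'df1_id_list'

# globals() returns the module namespace, where the set is defined.
found = (lambda x: x in globals()['df1_id_list'])('ABCDEF')
print(found)  # True
```

The standalone line in the question works because a bare name such as df1_id_list is resolved through the normal local-then-global lookup, whereas vars()[...] only consults the lambda's local dictionary.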

You can also try globals(), which returns a dictionary of the variables defined at module (global) level:

df_list = ['df1', 'df2']
for df in df_list:
    final_df[df] = final_df['id_column'].apply(lambda x: 'YES' if x in globals()[df + '_id_list'] else 'NO')
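If the sets can live in a dictionary rather than in separately named variables, no name lookup through eval() or globals() is needed, and adding another source is just one more dictionary entry. The following is a minimal sketch of that idea, not something taken from the question: id_sets is a hypothetical name, the sample IDs in final_df are invented for the demonstration, and Series.isin with Series.map is used in place of apply with a lambda.

```python
import pandas as pd

# Sample data for illustration only.
final_df = pd.DataFrame({'id_column': ['ABCDEF', 'NAJOJ', 'ZZZZZ']})

# Hypothetical container: one set of IDs per source,
# keyed by the name of the output column.
id_sets = {
    'df1': {'ABCDEF', 'BDCJG', 'HJAYR'},
    'df2': {'NBJOAN', 'NAJOJ', 'NAIRG'},
}

for name, ids in id_sets.items():
    # isin() yields a boolean Series; map() converts it to the labels.
    final_df[name] = final_df['id_column'].isin(ids).map({True: 'YES', False: 'NO'})

print(final_df)
```

Because the loop iterates over the dictionary's items, the column name and the set it refers to always stay paired, which is what the string concatenation with '_id_list' was trying to achieve.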

Is there a reason you want to write your internal variable names into the DataFrame rather than just using an index, like this:

import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame()
final_df = pd.DataFrame()
df1['id_column'] = ['ABCDEF', 'BDCJG', 'HJAYR']
df2['id_column'] = ['NBJOAN', 'NAJOJ', 'NAIRG']
final_df['id_column'] = ['ABCDEF', 'NAJOJ', 'ZZZZZ']

df1_id_list = set(df1['id_column'])
df2_id_list = set(df2['id_column'])

df_list = [df1, df2]
for index, df in enumerate(df_list, start=1):
    # Build the lookup set once per DataFrame rather than once per row.
    ids = set(df['id_column'])
    final_df[f'df_{index}'] = final_df['id_column'].apply(lambda x: 'YES' if x in ids else 'NO')
print(final_df)

Output:

  id_column df_1 df_2
0    ABCDEF  YES   NO
1     NAJOJ   NO  YES
2     ZZZZZ   NO   NO
