
Need help on Pandas Dataframe creation by passing a dynamic argument list to a function

I have a table T1 as shown below (stored as dataframe df3 with columns col1, col2 and col3)

df2 has columns 'l', 'm', 'n'...

df1 has columns 'a', 'b', 'c'

col1    col2    col3
x       add     {'a': 'df1', 'l': 'df2', 'n': 'df2'}
y       sub     {'b': 'df1', 'm': 'df2'}
z       sqrt    {'c': 'df1'}

The value x in col1 is to be calculated with the operation add from col2, using the key:value pairs in col3 as parameters (column a in df1, column l in df2, ...).

Likewise, the value y in col1 is to be calculated with the operation sub from col2, using the parameters in col3 (column b in df1, column m in df2). The number of key:value pairs in col3 can be larger or smaller depending on the operation/function named in col2; for sqrt, for instance, there is only one pair.

I want to get the output in the form of a dataframe df4, as shown below

x                                y                      z
df1['a'] + df2['l'] + df2['n']   df1['b'] - df2['m']    sqrt(df1['c'])

I am trying to achieve this by building a function as shown below, but I am not sure how to build and pass a dynamic argument list to it, where the number of arguments depends on the number of key:value pairs in col3. In my case I have 3 for add, 2 for sub, and only 1 for sqrt.

for ix, row in df3.iterrows():
    call_operation = row['col2']
    target_value = row['col1']
    # df4[target_value] = getattr(module, call_operation)(df2[b], df1[a])
    df4[target_value] = getattr(module, call_operation)( <dynamic argument list from col3> )
import builtins
import math

import pandas as pd

# dummy data
df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'l': [4, 5, 6],
                    'n': [7, 8, 9]})

# get your dfs in a dict so we can look them up by name
dfs = {'df1': df1, 'df2': df2}

# frame that collects the results (one column per target)
results = pd.DataFrame(index=df1.index)

# let's say you are in your for loop on the first row:
ix = 0
target_name = 'x'
call_operation = 'sum'  # built-in sum() standing in for 'add'
col3 = {'a': 'df1', 'l': 'df2', 'n': 'df2'}

# actual logic: collect one value per key:value pair, then call the operation
values = []
for k, v in col3.items():
    values.append(dfs[v][k].iloc[ix])
results.loc[ix, target_name] = getattr(builtins, call_operation)(values)

Depending on how many types of operations you have in your real data, you could use getattr(), if statements, or a combination of both.

if call_operation == 'sqrt':
    results.loc[ix, target_name] = getattr(math, 'sqrt')(values[0])

etc.
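
One way to avoid a growing chain of if statements is a lookup table that maps each operation name to a callable and unpacks the collected values with `*`. The following is only a sketch: the names `OPERATIONS` and `apply_operation` are hypothetical and not part of the original answer, and it assumes sub means "first value minus all the others".

```python
import math
import operator
from functools import reduce

# Hypothetical dispatch table: operation name -> callable.
# add and sub accept any number of positional arguments; sqrt takes exactly one.
OPERATIONS = {
    "add": lambda *args: reduce(operator.add, args),
    "sub": lambda *args: reduce(operator.sub, args),  # assumption: first minus the rest
    "sqrt": lambda x: math.sqrt(x),
}


def apply_operation(name, values):
    """Look up `name` and call it with `values` unpacked as positional arguments."""
    return OPERATIONS[name](*values)


print(apply_operation("add", [1, 4, 7]))   # 12
print(apply_operation("sub", [2, 5]))      # -3
print(apply_operation("sqrt", [9]))        # 3.0
```

Because the list is unpacked at the call site, the number of key:value pairs in col3 decides how many arguments the function receives, and an unknown operation name fails early with a KeyError.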

This row-by-row approach doesn't feel like a proper use of pandas, though; whether that matters depends on the size of your actual dataset, which I don't know.
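
If the dataset is large, a column-wise variant may be a better fit: each row of df3 selects whole columns (Series), and the operation is applied to them in one call. This is a rough sketch under assumptions: the values for columns b, c and m are invented for illustration, the helper functions `add` and `sub` and the `operations` dict are hypothetical, and df1 and df2 are assumed to share the same index.

```python
import numpy as np
import pandas as pd

# Dummy data; the values for b, c and m are made up for this example.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30], "c": [4, 9, 16]})
df2 = pd.DataFrame({"l": [4, 5, 6], "m": [1, 2, 3], "n": [7, 8, 9]})
dfs = {"df1": df1, "df2": df2}

df3 = pd.DataFrame({
    "col1": ["x", "y", "z"],
    "col2": ["add", "sub", "sqrt"],
    "col3": [
        {"a": "df1", "l": "df2", "n": "df2"},
        {"b": "df1", "m": "df2"},
        {"c": "df1"},
    ],
})


def add(*cols):
    """Element-wise sum of any number of Series."""
    total = cols[0]
    for col in cols[1:]:
        total = total + col
    return total


def sub(first, *rest):
    """Element-wise: first Series minus every following Series."""
    result = first
    for col in rest:
        result = result - col
    return result


# Hypothetical mapping from the names in col2 to callables.
operations = {"add": add, "sub": sub, "sqrt": np.sqrt}

df4 = pd.DataFrame(index=df1.index)
for row in df3.itertuples(index=False):
    # Build the dynamic argument list: one Series per key:value pair in col3.
    columns = [dfs[frame][col] for col, frame in row.col3.items()]
    df4[row.col1] = operations[row.col2](*columns)

print(df4)
```

The loop still runs once per row of df3, but each iteration works on entire columns, so the cost scales with the number of operations rather than the number of data rows.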
