I have a table T1 as shown below (stored as dataframe df3 with columns col1, col2 and col3)
df2 has columns 'l', 'm', 'n'...
df1 has columns 'a', 'b', 'c'
col1  col2  col3
x     add   {'a':'df1','l':'df2','n':'df2'}
y     sub   {'b':'df1','m':'df2'}
z     sqrt  {'c': 'df1'}
Value x in col1 is to be calculated by applying the operation add from col2 to the parameters given as key:value pairs in col3 (a in df1, l in df2, ...).
Likewise, value y in col1 is to be calculated by applying the operation sub from col2 to the parameters in col3 (b in df1, m in df2). The number of k:v pairs in col3 can be larger or smaller depending on the operation/function named in col2; for sqrt, for instance, there is only 1 pair.
I want to get the output in the form of a dataframe df4, as shown below:
x                            y                    z
df1['a']+df2['l']+df2['n']   df1['b'] - df2['m']  df1['c']
I am trying to achieve this by building a function as shown below, but I am not sure how to build and pass a dynamic argument list to this function when the number of arguments depends on the number of k:v pairs in col3. In my case there are 3 for add, 2 for sub and only 1 for sqrt.
for ix, row in df3.iterrows():
    call_operation = row['col2']
    target_value = row['col1']
    # df4[target_value] = getattr(module, call_operation)(df2[b], df1[a])
    df4[target_value] = getattr(module, call_operation)(<dynamic argument list from col3>)
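import builtins  # Python 3 name of the old __builtin__ module
import pandas as pd

# dummy data
df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'l': [4, 5, 6],
                    'n': [7, 8, 9]})
# put your dfs in a dict so we can look them up by name
dfs = {'df1': df1, 'df2': df2}
# assumed here: an output frame that shares the index of the inputs
results = pd.DataFrame(index=df1.index)
# let's say you are in your for loop on the first row:
ix = 0
target_name = 'x'
call_operation = 'sum'
col3 = {'a': 'df1', 'l': 'df2', 'n': 'df2'}
# actual logic: collect one value per k:v pair, then call the operation by name
vars = []
for k, v in col3.items():
    vars.append(dfs[v][k].iloc[ix])
results.loc[results.index[ix], target_name] = getattr(builtins, call_operation)(vars)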
Depending on how many types of operations you have in your real data, you could use getattr(), if statements, or a combination of both.
if call_operation == 'sqrt':
    # requires `import math`; sqrt takes a single argument
    getattr(math, 'sqrt')(vars[0])
etc.
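One way to combine the two approaches is a plain dictionary that maps each operation name to a callable, which is then called with argument unpacking so the number of inputs can vary. The following is only a minimal sketch, assuming the operation names add, sub and sqrt from the question; OPERATIONS and apply_operation are hypothetical names introduced for illustration, not part of any library.

```python
import math
import operator
from functools import reduce

# Hypothetical dispatch table: operation name -> callable.
# add and sub accept any number of positional arguments; sqrt accepts one.
OPERATIONS = {
    'add': lambda *args: reduce(operator.add, args),
    'sub': lambda *args: reduce(operator.sub, args),
    'sqrt': lambda x: math.sqrt(x),
}


def apply_operation(name, values):
    """Look up the operation by name and unpack values as positional arguments."""
    try:
        func = OPERATIONS[name]
    except KeyError:
        raise ValueError(f"unsupported operation: {name!r}")
    return func(*values)


print(apply_operation('add', [1, 4, 7]))  # 1 + 4 + 7
print(apply_operation('sub', [2, 5]))     # 2 - 5
print(apply_operation('sqrt', [9]))       # square root of 9
```

Compared with getattr() on a module, an explicit dictionary restricts the callable names to the ones you list, and an unknown operation fails with a clear error.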
This doesn't feel like a proper use of pandas, though; I'm not sure of the size of your actual dataset.
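If the data is large, operating on whole columns instead of one cell at a time may be a better fit for pandas. The sketch below is one possible way to build df4 directly from df3, under the assumption that col3 holds real dicts (not strings) and that df1 and df2 share the same index. The dummy values and the operations dict are made up for illustration.

```python
import operator
from functools import reduce

import numpy as np
import pandas as pd

# Made-up dummy data covering every column named in the question's table.
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30], 'c': [4, 9, 16]})
df2 = pd.DataFrame({'l': [4, 5, 6], 'm': [1, 2, 3], 'n': [7, 8, 9]})
dfs = {'df1': df1, 'df2': df2}

df3 = pd.DataFrame({
    'col1': ['x', 'y', 'z'],
    'col2': ['add', 'sub', 'sqrt'],
    'col3': [{'a': 'df1', 'l': 'df2', 'n': 'df2'},
             {'b': 'df1', 'm': 'df2'},
             {'c': 'df1'}],
})

# Hypothetical mapping from operation name to a function of whole columns.
operations = {
    'add': lambda *cols: reduce(operator.add, cols),
    'sub': lambda *cols: reduce(operator.sub, cols),
    'sqrt': lambda col: np.sqrt(col),
}

df4 = pd.DataFrame(index=df1.index)
for row in df3.itertuples(index=False):
    # Resolve each {column: frame name} pair in col3 to the actual Series.
    columns = [dfs[frame_name][col_name]
               for col_name, frame_name in row.col3.items()]
    # Unpack the Series so the operation receives as many arguments as col3 lists.
    df4[row.col1] = operations[row.col2](*columns)

print(df4)
```

Each target column is computed in a single vectorised call, so the loop runs once per row of df3 rather than once per data row.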