
Need help on Pandas Dataframe creation by passing a dynamic argument list to a function

I have a table T1 as shown below (stored as dataframe df3 with columns col1, col2 and col3).

df2 has columns 'l', 'm', 'n'...

df1 has columns 'a', 'b', 'c'

col1    col2    col3
x       add     {'a': 'df1', 'l': 'df2', 'n': 'df2'}
y       sub     {'b': 'df1', 'm': 'df2'}
z       sqrt    {'c': 'df1'}

Value x in col1 is to be calculated using the operation add in col2, with the key:value pairs in col3 as parameters (a in df1, l in df2, ...).

Likewise, value y in col1 is to be calculated using the operation sub in col2 with the parameters in col3 (b in df1, m in df2). The number of k:v pairs in col3 can be larger or smaller depending on the operation/function defined in col2; for sqrt, for instance, there is only 1 pair.

I want to get the output in the form of a dataframe df4, as shown below.

x                             y                      z
df1['a']+df2['l']+df2['n']    df1['b'] - df2['m']    sqrt(df1['c'])

I am trying to achieve this by building a function as shown below, but I am not sure how to build and pass a dynamic argument list to this function, where the number of arguments to pass depends on the number of k:v pairs in col3. In my case I have 3 for add, 2 for sub, and only 1 for sqrt.

for ix, row in df3.iterrows():
    call_operation = row['col2']
    target_value = row['col1']
    # df4[target_value] = getattr(module, call_operation)(df2[b], df1[a])
    df4[target_value] = getattr(module, call_operation)( <dynamic argument list from col3> )

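import builtins  # Python 3 name of the old __builtin__ module

import pandas as pd

# dummy data
df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'l': [4, 5, 6],
                    'n': [7, 8, 9]})

# keep the dfs in a dict so we can look them up by name
dfs = {'df1': df1, 'df2': df2}

# container for the results (not defined in the original snippet; assumed here)
results = pd.DataFrame(index=df1.index)

# let's say you are in your for loop on the first row:
ix = 0
target_name = 'x'
call_operation = 'sum'
col3 = {'a': 'df1', 'l': 'df2', 'n': 'df2'}

# actual logic: collect one value per key:value pair, then call the operation
vars = []
for k, v in col3.items():
    vars.append(dfs[v][k].iloc[ix])
results.loc[ix, target_name] = getattr(builtins, call_operation)(vars)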
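
The part of the question about passing a variable number of arguments can be handled by collecting the arguments in a list and unpacking it with `*` at the call site. The following is only a sketch: the `add` function is a hypothetical stand-in for whatever the asker's `module` provides (it is not shown in the question), and it works on whole columns rather than on single values.

```python
import pandas as pd

# Dummy data, same shape as in the answer above.
df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'l': [4, 5, 6], 'n': [7, 8, 9]})
dfs = {'df1': df1, 'df2': df2}


def add(*columns):
    """Hypothetical operation: element-wise sum of any number of Series."""
    total = columns[0]
    for col in columns[1:]:
        total = total + col
    return total


col3 = {'a': 'df1', 'l': 'df2', 'n': 'df2'}

# Resolve each column-name -> dataframe-name pair into the actual Series.
args = [dfs[df_name][col_name] for col_name, df_name in col3.items()]

# The * operator spreads the list into separate positional arguments,
# so the same call works for 1, 2 or 3 key:value pairs.
x = add(*args)
```

Because `add` is declared with `*columns`, it accepts however many Series the dictionary in col3 happens to name.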

Depending on how many types of operations you have in your real data, you could use getattr(), if statements, or a combination of both.

import math

# sqrt takes a single argument, so pass only the first collected value
if call_operation == 'sqrt':
    result = getattr(math, 'sqrt')(vars[0])

etc.
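
One alternative to mixing getattr() and if statements is a dispatch dictionary that maps each operation name to a callable. This is a sketch under assumptions: the names `OPERATIONS` and `apply_operation` are made up for illustration, and `sub` is assumed to mean "first value minus all the others".

```python
import math
import operator
from functools import reduce

# Hypothetical dispatch table: operation name -> callable taking a list of values.
OPERATIONS = {
    'add': lambda values: reduce(operator.add, values),
    'sub': lambda values: reduce(operator.sub, values),  # first minus the rest
    'sqrt': lambda values: math.sqrt(values[0]),         # uses one value only
}


def apply_operation(name, values):
    """Look up the operation by name and apply it to the collected values."""
    try:
        func = OPERATIONS[name]
    except KeyError:
        raise ValueError(f"unsupported operation: {name!r}") from None
    return func(values)


total = apply_operation('add', [1, 4, 7])
difference = apply_operation('sub', [2, 5])
root = apply_operation('sqrt', [9])
```

Compared with getattr(), an explicit table restricts the callable names to the ones you listed, and an unknown name in col2 fails with a clear error.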

This doesn't feel like a proper use of pandas, though; I'm not sure of the size of your actual dataset.
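
A more pandas-like variant would operate on whole columns instead of one value at a time, looping only over the few rows of df3. The sketch below assumes that df1 and df2 share the same index, that `sub` means "first column minus the rest", and that the values for columns b, c and m are invented dummy data; the `operations` dictionary is hypothetical as well.

```python
import operator
from functools import reduce

import numpy as np
import pandas as pd

# Dummy data; the values for 'b', 'c' and 'm' are invented for illustration.
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30], 'c': [4, 9, 16]})
df2 = pd.DataFrame({'l': [4, 5, 6], 'm': [1, 2, 3], 'n': [7, 8, 9]})
dfs = {'df1': df1, 'df2': df2}

df3 = pd.DataFrame({
    'col1': ['x', 'y', 'z'],
    'col2': ['add', 'sub', 'sqrt'],
    'col3': [{'a': 'df1', 'l': 'df2', 'n': 'df2'},
             {'b': 'df1', 'm': 'df2'},
             {'c': 'df1'}],
})

# Hypothetical operations that accept whole Series as positional arguments.
operations = {
    'add': lambda *cols: reduce(operator.add, cols),
    'sub': lambda *cols: reduce(operator.sub, cols),
    'sqrt': lambda col: np.sqrt(col),
}

df4 = pd.DataFrame(index=df1.index)
for row in df3.itertuples(index=False):
    # Turn {'a': 'df1', ...} into [df1['a'], ...] and unpack it into the call.
    args = [dfs[df_name][col_name] for col_name, df_name in row.col3.items()]
    df4[row.col1] = operations[row.col2](*args)
```

Each target column is then computed in a single vectorized step, so the cost of the Python loop depends on the number of rows in df3 rather than on the size of df1 and df2.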
