What is the most pythonic way to apply a function on and return multiple columns?
When working with Pandas, I often run into an existing function that takes multiple arguments and returns multiple values:
def foo(val_a, val_b):
    """
    Some example function that takes in and returns multiple values.
    Can be a lot more complex.
    """
    sm = val_a + val_b
    sb = val_a - val_b
    mt = val_a * val_b
    dv = val_a / val_b
    return sm, sb, mt, dv
Suppose I have a DataFrame:
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
df
Out[6]:
   0  1
0  1  2
1  3  4
2  5  6
3  7  8
What I want is to apply foo to df, using columns 0 and 1 as the arguments, and put the results into new columns of df, without modifying foo, like this:
df_out
Out[7]:
   0  1  sm  sb  mt     dv
0  1  2   3  -1   2    0.5
1  3  4   7  -1  12   0.75
2  5  6  11  -1  30  0.833
3  7  8  15  -1  56  0.875
What is the most pythonic way to achieve this?
>>> pd.concat([df, pd.DataFrame.from_records(foo(df[0], df[1])).T], axis=1)
   0  1     0    1     2         3
0  1  2   3.0 -1.0   2.0  0.500000
1  3  4   7.0 -1.0  12.0  0.750000
2  5  6  11.0 -1.0  30.0  0.833333
3  7  8  15.0 -1.0  56.0  0.875000
Speed: 1.13 ms per loop
If you care about speed, this outperforms apply and produces the output you want.
>>> import numpy as np
>>> pd.concat([df, pd.DataFrame.from_records(np.vectorize(foo)(df[0], df[1])).T], axis=1)
Speed: 728 µs per loop
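The timings quoted above depend on the machine, the pandas version, and the size of the data, so it is worth measuring them yourself. Below is a minimal sketch of one way to compare the two calls with timeit. The helper names direct and vectorized and the cols labels are my own, not from the answer, and I build the result from a dict instead of from_records so the columns get names.

```python
import timeit

import numpy as np
import pandas as pd


def foo(val_a, val_b):
    return val_a + val_b, val_a - val_b, val_a * val_b, val_a / val_b


df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
cols = ["sm", "sb", "mt", "dv"]  # hypothetical labels for the four results


def direct(frame):
    # foo receives whole columns, so each returned value is a Series.
    results = foo(frame[0], frame[1])
    return pd.DataFrame(dict(zip(cols, results)))


def vectorized(frame):
    # np.vectorize calls foo element by element and returns a tuple of arrays.
    results = np.vectorize(foo)(frame[0], frame[1])
    return pd.DataFrame(dict(zip(cols, results)), index=frame.index)


for func in (direct, vectorized):
    seconds = timeit.timeit(lambda: func(df), number=200) / 200
    print(f"{func.__name__}: {seconds * 1e6:.0f} µs per call")
```

On a four-row frame the fixed overhead dominates, so the ranking may well change on larger data.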
# Apply foo, build a DataFrame from the returned values, then merge it into the existing DataFrame.
merged = pd.merge(
    df,
    df.apply(lambda x: pd.Series(foo(x[0], x[1])), axis=1),
    left_index=True,
    right_index=True,
)
# Change the column names.
merged.columns = [0, 1, 'sm', 'sb', 'mt', 'dv']
merged
Out[1478]:
   0  1    sm   sb    mt        dv
0  1  2   3.0 -1.0   2.0  0.500000
1  3  4   7.0 -1.0  12.0  0.750000
2  5  6  11.0 -1.0  30.0  0.833333
3  7  8  15.0 -1.0  56.0  0.875000
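On pandas versions where DataFrame.apply accepts result_type (0.23 and later, as far as I know), the pd.Series wrapper can be left out. This is a sketch of that variant, not part of the answer above. It uses join instead of merge, which also aligns on the index.

```python
import pandas as pd


def foo(val_a, val_b):
    return val_a + val_b, val_a - val_b, val_a * val_b, val_a / val_b


df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])

# result_type="expand" turns each returned tuple into one row of a new DataFrame.
expanded = df.apply(lambda row: foo(row[0], row[1]), axis=1, result_type="expand")
expanded.columns = ["sm", "sb", "mt", "dv"]

# join aligns on the index, so the original columns 0 and 1 are kept as they are.
merged = df.join(expanded)
print(merged)
```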
cols = ['sm', 'sb', 'mt', 'dv']
df[cols] = pd.DataFrame(df.apply(lambda x: foo(x[0], x[1]), axis=1).values.tolist(), columns=cols)
print(df)
   0  1  sm  sb  mt        dv
0  1  2   3  -1   2  0.500000
1  3  4   7  -1  12  0.750000
2  5  6  11  -1  30  0.833333
3  7  8  15  -1  56  0.875000
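One thing to watch with the snippet above: the DataFrame built from the list gets a fresh 0..n-1 index. That matches df here, but pandas aligns on index labels when combining frames, so a df with any other index would need the index passed along. A minimal sketch, assuming a made-up index of 10, 20, 30, 40:

```python
import pandas as pd


def foo(val_a, val_b):
    return val_a + val_b, val_a - val_b, val_a * val_b, val_a / val_b


# Hypothetical variant of df with a non-default index.
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]], index=[10, 20, 30, 40])
cols = ["sm", "sb", "mt", "dv"]

# Without result_type, apply returns a Series holding one tuple per row.
rows = df.apply(lambda x: foo(x[0], x[1]), axis=1).tolist()

# Reuse df's index so the new columns line up with the original rows.
new_cols = pd.DataFrame(rows, columns=cols, index=df.index)
out = pd.concat([df, new_cols], axis=1)
print(out)
```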
Solution with concat:
cols = ['sm', 'sb', 'mt', 'dv']
df[cols] = pd.concat(foo(df[0], df[1]), axis=1, keys=cols)
print(df)
   0  1  sm  sb  mt        dv
0  1  2   3  -1   2  0.500000
1  3  4   7  -1  12  0.750000
2  5  6  11  -1  30  0.833333
3  7  8  15  -1  56  0.875000
It is also possible to create a new DataFrame and then concat it to the original:
cols = ['sm', 'sb', 'mt', 'dv']
df1 = pd.concat(foo(df[0], df[1]), axis=1, keys=cols)
print(df1)
   sm  sb  mt        dv
0   3  -1   2  0.500000
1   7  -1  12  0.750000
2  11  -1  30  0.833333
3  15  -1  56  0.875000
df = pd.concat([df, df1], axis=1)
print(df)
   0  1  sm  sb  mt        dv
0  1  2   3  -1   2  0.500000
1  3  4   7  -1  12  0.750000
2  5  6  11  -1  30  0.833333
3  7  8  15  -1  56  0.875000
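If you would rather not overwrite df, the same idea can be written with DataFrame.assign, which returns a new DataFrame. This is a sketch of an alternative, not something the answer above proposes. Like the concat version, it assumes foo works on whole Series, which holds here because foo only does arithmetic.

```python
import pandas as pd


def foo(val_a, val_b):
    return val_a + val_b, val_a - val_b, val_a * val_b, val_a / val_b


df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]])
cols = ["sm", "sb", "mt", "dv"]

# zip pairs each label with the Series that foo returned for it;
# assign then adds them as new columns on a copy of df.
df_out = df.assign(**dict(zip(cols, foo(df[0], df[1]))))
print(df_out)
```

The original df keeps only columns 0 and 1, which is handy when the intermediate result is needed again later.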